Posts Tagged ‘Excel’

A Real Head-Scratcher Courtesy The CLR And Office Teams

"Why the hell is this application insisting on loading an old version of the CLR?" I'm guessing that's what you're asking yourself if you've gotten here through your search engine of choice. Well, relax. I'm about to explain.

It is reasonably well-known that there are .config file settings that can force an application to load a specific CLR version. It is also possible to write unmanaged code using the hosting API's like CorBindToRuntimeEx to bind to a specific version. I'm not talking about either of these scenarios.

There is a set of undocumented (as far as I know) registry keys that can override either of these vectors. They are located here:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\policy\AppPatch\v2.0.50727.00000

If you take a look at this key on your own machine, you'll probably see a handful of applications listed.

The root of the Microsoft Office .NET Framework version bug

Applications can specify a target version of the framework that will be loaded into the process, even if there is no .config file present, and even if you write some C++ to call CorBindToCurrentRuntime. Mscoree will simply report that the current runtime is 1.1.4322, even if the user has 2.0 installed on their machine. This is, I suppose, superfically similar to the strategy used in the Image File Execution Options built into NT.

I can see where this would be handy–presumably Microsoft has a handful of .NET 1.1 apps in the wild that showed bugs against 2.0, and it's not exactly practical or foolproof to try to drop a .config file on those apps when installing the framework (naturally, somebody can just install the application after the framework, then be puzzled as to why it isn't working).

Good solution, right? It's a shame they fouled it up.

Notice that Word and Excel are both listed as special applications on my machine. I have these values on my laptop:

[HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\.NETFramework\policy\ AppPatch\v2.0.50727.00000\excel.exe\{2CCAA9FE-6884-4AF2-99DD-5217B94115DF}]
"Minimum File Version Number"="11.0.0.0"
"Maximum File Version Number"="11.0.9999.9999"
"Target Version"="v1.1.4322"

It appears as though the someone is trying to force a relatively older build of Excel/Word 2003 to load v1.1. Fair enough, the Excel build on my machine is up to 11.6 at this point. Unfortunately, those developers failed miserably in the attempt. Every single other application listed under AppPatch in my registry looks like this:

"Minimum File Version"="11.0.0.0"
"Maximum File Version"="11.0.9999.9999"
"Target Version"="v1.1.4322"

When the version filter isn't specified (or actually in this case, miswritten as "Minimum/Maximum File Version Number" instead of "Minimum/Maximum File Version"), all versions of the executable will load the target version. So these keys effectively break 2.0 addins for all versions of Word and Excel.

There is an update available that fixes this issue, here. The KB article that accompanies it doesn't really explain what the issue is, which is why I've summarized it here. Office update is not exactly an automatic affair like its Windows counterpart, so this bug makes it essentially impossible to write a fault-free Office addin that targets 2.0 and does not hack up the user's registry upon installation. I can't say I advocate doing that, but I would understand if you did. As for myself, I am most likely going to target 1.1 much longer than I had hoped would be necessary.

Finally, This Site Produces Some Results

In my last post, I complained about an obscure Excel crash. Well, this morning the client got back to me saying that he opened up the original workbook (with no workarounds in place), and obviously, got the crash.

However, something was different this time. Dr. Watson showed him a link for more information, which he followed and found a download link for Office 2003 SP2! Released on September 27th! It fixed the issue! Less than a week after I lost a whole day debugging and working around it.

Thanks to Microsoft for monitoring the blogosphere closely and for being so responsive to my issue. Hopefully, the legion of call center workers on The Sub-Continent who refuse to escalate my other premier support case to anyone even remotely familiar with the Excel code base helped in churning out this incredibly timed hotfix.

Following Excel User-Defined Functions with WinDbg: A Debugging Odyssey

One of several applications I work on is a COM add-in for Excel. To put it mildly, this thing can be a real bear to support sometimes.

I was looking into a very peculiar access violation (AV) today. It seemed to only ever happen if—

  • My add-in was installed.
  • The Analysis Toolpak was installed.
  • I tried to load a very large and very particular workbook from a client.

I took a look at a crash dump and—luckily—was able to reproduce this in my office. This is a very good thing, because when I can’t reproduce it often means that I will have to dress up.

The AV was occurring very deep in some Excel code. It didn’t look like my add-in was doing anything special, or in fact, anything at all.

My first instinct might have been, “Why can’t the Excel team fix their stupid bugs!!?!” But, I have more empathy than that. The AV happened in a function that appeared to put the EBP register through a blender set to “pulverize.”

In other words, it required effort to get an accurate stack trace within this function, and I was far too lazy to pick up the pieces. Any reasonable person would be too—and maybe the bug only occurs with the optimizer turned on. So I will give the Excel team a pass on this one.

So what’s a clever debugger to do? Well, I (begrudgingly) know enough about Excel to know that the Analysis Toolpak is a bit of a dinosaur—apparently just like many Office users. It is actually ANALYS32.XLL, nothing more than a standard DLL with special exports. It is a relic of add-in architectures gone by.

So I used GFLAGS.EXE to set WinDbg as the startup debugger for Excel, and fired it up.

Now, I had absolutely no idea which worksheet functions were actually in the toolpak. Frankly, I had no particular desire to do the research. So, I set a breakpoint on all of them:

0:00> bm ANALYS32!*

The bm command is the set symbol breakpoint command. Since I only have export symbols for the Analysis Toolpak, this effectively puts a break on every exported function from that XLL.

After that I hit “go” and opened up the workbook. The first break was on the XIRR formula:

ANALYS32!xirr:
3d00fbb1 55               push    ebp

At this point I had second thoughts about breaking on all of the formulas, so I cleared those breakpoints.

0:00> bc *

And I figured that it was pretty likely that Excel would be calling any user-defined-function from the same parent procedure, so I stepped one frame down on the stack.

0:00> .frame 1

The .frame command changes the local context. It’s basically the same as clicking on a higher function in the stack trace in Visual Studio. At that point I saw this in my disassembly window:

3046aaac ff5004      call    dword ptr [eax+0x4]  ; call to UDF
3046aaaf 8b4d14      mov     ecx, [ebp+0x14]      ; return address

The first instruction is the call to the user-defined function. The second is the return address, or the next instruction to be executed when the user-defined function returns.

I was still in the information-gathering stage here, so I decided to just dump out the functions being called here rather than try to really dig into any disassembly where the AV was occurring.

I decided to print out messages on both of these instructions—in other words, before and after every call to a UDF. I did that with two breakpoints:

0:000> bp 3046aaac "dds @eax+4 L1; g"
0:000> bp 3046aaaf ".echo return; g"

Let me point out a few things about these. First, I specified the breakpoint location directly by instruction address, and not using any symbols. Second, let me explain what I mean by “dds @eax+4 L1.” dds is the display data with symbols command. And as you recall, the instruction we are breaking on is:

call    dword ptr [eax+0x4]

So what I am trying to do here is print out the name of the function being called, if we indeed have the name. If there’s a symbol for (@eax+0x4) when we get here, dds will print it out for us.

After this I hit “go,” and this is what I saw:

00137974  3d00fbb1 ANALYS32!xirr
return
00137974  3d00fbb1 ANALYS32!xirr
return
00137780  3d00fbb1 ANALYS32!xirr
00133d58  3d00fbb1 ANALYS32!xirr
return
00133d58  3d00fbb1 ANALYS32!xirr
return
return

<KABOOM>

Hm, that’s interesting. The XIRR’s by themselves seem to work fine, but we get an AV as soon as the nested XIRR function returns.

At this point I disabled the Analysis Toolpak and opened the worksheet up. Searching through the formulas, I found the XIRR functions that my client was using. I couldn’t figure out how any of them were dependent upon one another, but I did notice that they were all feeding on a TODAY() function. TODAY is what you call a volatile function, and can cause a lot of crazy BS to go down.

On a hunch, I plugged in a nonvolatile alternative that I had handy in an XLA module. After that, the workbook loaded fine.

So there you have it. Apparently, Excel fails miserably if my add-in loads, there are nested XIRR formulas on a sheet, and they are using the TODAY formula as an input.

I never figured out what the “real” conflict was. It’s not worth the effort and Excel shouldn’t crash, regardless, I told myself.

It’s ironic that I never would have been able to figure this much out if the conflict had been with an XLA add-in or an Automation add-in. Well ok, not never, but I probably would have given up first.