Welcome to McFunley.com Sign in | Join | Faq

Following Excel User-Defined Functions with WinDbg: A Debugging Odyssey

posted on Thursday, September 22, 2005 7:20 PM by mcfunley

One of several applications I work on is a COM add-in for Excel. To put it mildly, this thing can be a real bear to support sometimes.

 

I was looking into a very peculiar access violation (AV) today. It seemed to only ever happen if—

 

  • My add-in was installed.
  • The Analysis Toolpak was installed.
  • I tried to load a very large and very particular workbook from a client.

 

I took a look at a crash dump and—luckily—was able to reproduce this in my office. This is a very good thing, because when I can’t reproduce it often means that I will have to dress up.

The AV was occurring very deep in some Excel code. It didn’t look like my add-in was doing anything special, or in fact, anything at all.

 

My first instinct might have been, “Why can’t the Excel team fix their stupid bugs!!?!”  But, I have more empathy than that. The AV happened in a function that appeared to put the EBP register through a blender set to “pulverize.”

 

In other words, it required effort to get an accurate stack trace within this function, and I was far too lazy to pick up the pieces. Any reasonable person would be too—and maybe the bug only occurs with the optimizer turned on. So I will give the Excel team a pass on this one.

 

So what’s a clever debugger to do? Well, I (begrudgingly) know enough about Excel to know that the Analysis Toolpak is a bit of a dinosaur—apparently just like many Office users. It is actually ANALYS32.XLL, nothing more than a standard DLL with special exports. It is a relic of add-in architectures gone by.

 

So I used GFLAGS.EXE to set WinDbg as the startup debugger for Excel, and fired it up.

 

Now, I had absolutely no idea which worksheet functions were actually in the toolpak. Frankly, I had no particular desire to do the research. So, I set a breakpoint on all of them:

 

bm ANALYS32!*

 

The bm command is the set symbol breakpoint command. Since I only have export symbols for the Analysis Toolpak, this effectively puts a break on every exported function from that XLL.

 

After that I hit “go” and opened up the workbook. The first break was on the XIRR formula:

 

ANALYS32!xirr:

3d00fbb1 55               push    ebp

 

At this point I had second thoughts about breaking on all of the formulas, so I cleared those breakpoints.

 

bc *

 

And I figured that it was pretty likely that Excel would be calling any user-defined-function from the same parent procedure, so I stepped one frame down on the stack.

 

.frame 1

 

The .frame command changes the local context. It’s basically the same as clicking on a higher function in the stack trace in Visual Studio. At that point I saw this in my disassembly window:

 

3046aaac ff5004           call    dword ptr [eax+0x4]  ; call to UDF

3046aaaf 8b4d14           mov     ecx,[ebp+0x14]   ; return address

 

The first instruction is the call to the user-defined function. The debugger highlights the return address, or the next instruction to be executed when the user-defined function returns.

 

I was still in the information-gathering stage here, so I decided to just dump out the functions being called here rather than try to really dig into any disassembly where the AV was occurring.

 

I decided to print out messages on both of these instructions—in other words, before and after every call to a UDF. I did that with two breakpoints:

 

0:000> bp 3046aaac "dds @eax+4 L1; g"

0:000> bp 3046aaaf ".echo return; g"

 

Let me point out a few things about these. First, I specified the breakpoint location directly by instruction address, and not using any symbols. Second, let me explain what I mean by “dds @eax+4 L1.dds is the display data with symbols command. And as you recall, the instruction we are breaking on is:

 

call    dword ptr [eax+0x4]

 

So what I am trying to do here is print out the name of the function being called, if we indeed have the name.  If there’s a symbol for (@eax+0x4) when we get here, dds will print it out for us.

 

After this I hit “go,” and this is what I saw:

 

00137974  3d00fbb1 ANALYS32!xirr

return

00137974  3d00fbb1 ANALYS32!xirr

return

00137780  3d00fbb1 ANALYS32!xirr

00133d58  3d00fbb1 ANALYS32!xirr

return

00133d58  3d00fbb1 ANALYS32!xirr

return

return

<KABOOM>

 

Hm, that’s interesting. The XIRR’s by themselves seem to work fine, but we get an AV as soon as the nested XIRR function returns.

 

At this point I disabled the Analysis Toolpak and opened the worksheet up. Searching through the formulas, I found the XIRR functions that my client was using. I couldn’t figure out how any of them were dependent upon one another, but I did notice that they were all feeding on a TODAY() function. TODAY is what you call a volatile function, and can cause a lot of crazy BS to go down.

 

On a hunch, I plugged in a nonvolatile alternative that I had handy in an XLA module. After that, the workbook loaded fine.

 

So there you have it. Apparently, Excel fails miserably if my add-in loads, there are nested XIRR formulas on a sheet, and they are using the TODAY formula as an input.

 

I never figured out what the “real” conflict was. It’s not worth the effort and Excel shouldn’t crash, regardless, I told myself.

 

It’s ironic that I never would have been able to figure this much out if the conflict had been with an XLA add-in or an Automation add-in. Well ok, not never, but I probably would have given up first.

 

Comments

# Finally, This Site Produces Some Results @ Friday, September 30, 2005 8:54 PM

  

SKIN NAME : ImageHeader