Posts Tagged ‘Financial Industry’

Following Excel User-Defined Functions with WinDbg: A Debugging Odyssey

One of several applications I work on is a COM add-in for Excel. To put it mildly, this thing can be a real bear to support sometimes.

I was looking into a very peculiar access violation (AV) today. It seemed to only ever happen if—

  • My add-in was installed.
  • The Analysis Toolpak was installed.
  • I tried to load a very large and very particular workbook from a client.

I took a look at a crash dump and—luckily—was able to reproduce this in my office. This is a very good thing, because when I can’t reproduce it often means that I will have to dress up.

The AV was occurring very deep in some Excel code. It didn’t look like my add-in was doing anything special, or in fact, anything at all.

My first instinct might have been, “Why can’t the Excel team fix their stupid bugs!!?!” But, I have more empathy than that. The AV happened in a function that appeared to put the EBP register through a blender set to “pulverize.”

In other words, it required effort to get an accurate stack trace within this function, and I was far too lazy to pick up the pieces. Any reasonable person would be too—and maybe the bug only occurs with the optimizer turned on. So I will give the Excel team a pass on this one.

So what’s a clever debugger to do? Well, I (begrudgingly) know enough about Excel to know that the Analysis Toolpak is a bit of a dinosaur—apparently just like many Office users. It is actually ANALYS32.XLL, nothing more than a standard DLL with special exports. It is a relic of add-in architectures gone by.

So I used GFLAGS.EXE to set WinDbg as the startup debugger for Excel, and fired it up.

Now, I had absolutely no idea which worksheet functions were actually in the toolpak. Frankly, I had no particular desire to do the research. So, I set a breakpoint on all of them:

0:00> bm ANALYS32!*

The bm command is the set symbol breakpoint command. Since I only have export symbols for the Analysis Toolpak, this effectively puts a break on every exported function from that XLL.

After that I hit “go” and opened up the workbook. The first break was on the XIRR formula:

ANALYS32!xirr:
3d00fbb1 55               push    ebp

At this point I had second thoughts about breaking on all of the formulas, so I cleared those breakpoints.

0:00> bc *

And I figured that it was pretty likely that Excel would be calling any user-defined-function from the same parent procedure, so I stepped one frame down on the stack.

0:00> .frame 1

The .frame command changes the local context. It’s basically the same as clicking on a higher function in the stack trace in Visual Studio. At that point I saw this in my disassembly window:

3046aaac ff5004      call    dword ptr [eax+0x4]  ; call to UDF
3046aaaf 8b4d14      mov     ecx, [ebp+0x14]      ; return address

The first instruction is the call to the user-defined function. The second is the return address, or the next instruction to be executed when the user-defined function returns.

I was still in the information-gathering stage here, so I decided to just dump out the functions being called here rather than try to really dig into any disassembly where the AV was occurring.

I decided to print out messages on both of these instructions—in other words, before and after every call to a UDF. I did that with two breakpoints:

0:000> bp 3046aaac "dds @eax+4 L1; g"
0:000> bp 3046aaaf ".echo return; g"

Let me point out a few things about these. First, I specified the breakpoint location directly by instruction address, and not using any symbols. Second, let me explain what I mean by “dds @eax+4 L1.” dds is the display data with symbols command. And as you recall, the instruction we are breaking on is:

call    dword ptr [eax+0x4]

So what I am trying to do here is print out the name of the function being called, if we indeed have the name. If there’s a symbol for (@eax+0x4) when we get here, dds will print it out for us.

After this I hit “go,” and this is what I saw:

00137974  3d00fbb1 ANALYS32!xirr
return
00137974  3d00fbb1 ANALYS32!xirr
return
00137780  3d00fbb1 ANALYS32!xirr
00133d58  3d00fbb1 ANALYS32!xirr
return
00133d58  3d00fbb1 ANALYS32!xirr
return
return

<KABOOM>

Hm, that’s interesting. The XIRR’s by themselves seem to work fine, but we get an AV as soon as the nested XIRR function returns.

At this point I disabled the Analysis Toolpak and opened the worksheet up. Searching through the formulas, I found the XIRR functions that my client was using. I couldn’t figure out how any of them were dependent upon one another, but I did notice that they were all feeding on a TODAY() function. TODAY is what you call a volatile function, and can cause a lot of crazy BS to go down.

On a hunch, I plugged in a nonvolatile alternative that I had handy in an XLA module. After that, the workbook loaded fine.

So there you have it. Apparently, Excel fails miserably if my add-in loads, there are nested XIRR formulas on a sheet, and they are using the TODAY formula as an input.

I never figured out what the “real” conflict was. It’s not worth the effort and Excel shouldn’t crash, regardless, I told myself.

It’s ironic that I never would have been able to figure this much out if the conflict had been with an XLA add-in or an Automation add-in. Well ok, not never, but I probably would have given up first.

Jealous Much?

Here are some pictures from the office we just moved into last week.

View from 55 Water Street

Above is the view from my desk, looking west into the financial district.

View from 55 Water Street

This one is from the north side, looking towards midtown.

View from 55 Water Street

Looking northeast over the South Street Seaport, the Brooklyn Bridge, and the Manhattan Bridge.

View from 55 Water Street

Looking south at the Statue of Liberty.

The End-of-Release-Cycle Motivational Email

Feel free to use this with your colleagues and/or subordinates.

Plug-sky

Please take a moment to subscribe to my friend Udi Falkson's blog (rss). Udi has recently split for the coast, leaving Investment Bank IT ennui behind him.

WTF

This Daily WTF reminded me of a guy I used to work with who would use this pattern in every single function he wrote.

Public Const cSTRINGFOO As String = "foo"

Public Function Foo() As Boolean
    Dim ret As Boolean = False
    Try
        Console.WriteLine(cSTRINGFOO)
        ret = True
    Catch ex As Exception
        ret = False
        Throw New Exception("", ex)
    End Try
    Foo = ret
    Exit Function
End Function

I’m not sure what he was expecting to happen here, but there were some clues:

  • Every function he wrote returned true or false to indicate that it had successfully completed without an exception.
  • Every exception was wrapped in another, except where he forgot and swallowed the exceptions.
  • He would always use constants for string literals, even if he only used the literal once (like above), and even if the constant was much longer than the literal.

Mind you, I discovered this self-imposed standard a while after he quit, so I never really got to the bottom of it.

Managed Debugging with WinDbg, Part 1 of N

There seems to be relatively little information about WinDbg available, so I will try to post some things as I figure them out myself.

We had a severe-to-extremely-severe production problem last week, and my recent activity with WinDbg was another example of “learning with a gun to your forehead.”

Getting the Right Extensions

If you are doing any kind of managed debugging, you will want the current set of extensions which can be found here.

The simplest thing to do is just drop the .dlls in the install directory for WinDbg. The two dll’s in this set that will be most important to you are psscor.dll and sieextpub.dll. Psscor has a lot of tools for dumping the contents of managed objects, and the second has some powerful functions for showing application and thread state.

There is a third extension, sos.dll, whose functionality is mostly overlapping with psscor.

Extensions Basics

Load an extension into WinDbg like so:

0:000> .load psscor

At any point, you can see the extensions you have loaded with this command:

0:000> .chain

All of the extensions I’ve mentioned come with help commands. The help for the topmost extension in the chain can be called like this:

0:000> !help

But you can always refer back to extensions further down in this way:

0:000> !sieextpub.help

An Example – Debugging an ASP.NET Hang

This was the scenario we found ourselves in last week. The first thing you will need to do is get a hang dump of the worker process using AdPlus, which I don’t have time to cover here. However, it is relatively straightforward.

Once we’ve got the dump, the most obvious thing to try is to see what the process is doing. To do that, load psscor and use this command:

0:000> ~*e!clrstack

This will dump out the managed stack of all of the threads in the process (the ~*e means that we want to iterate through all of the threads and perform the specified action for each).

That will give us a general idea of what is going on. The stack of one or more particular threads is bound to be interesting, and we can narrow it down to the stack trace of a single thread using:

0:000> ~113e!clrstack

Here I’ve replaced * (all threads) with a single thread, 113. Assuming this is a managed thread, you should see some output like this.

Thread 113
ESP         EIP     

…

0x0dc5f6cc  0x0fa3bb17 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0dc5f710  0x0fa3aedf [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0dc5f74c  0x0fa3a94b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)

…

For this example, I’ll just show how to figure out which page in the application is executing. Since we’re running inside a page class for a lot of the response, the this pointer should be all we need.

Grab two stack pointers (the ESP register) and use the psscor.DumpStackObjects command, or dso for short:

0:000> !dso 0x0dc5f6cc  0x0dc5f74c 

Thread 0
ESP/REG    Object     Name
0x0dc5f6cc 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f6d0 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f6d8 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f714 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f720 0x070ca198 System.Globalization.CultureInfo
0x0dc5f724 0x070fb010 System.Threading.Thread
0x0dc5f728 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f73c 0x31f0d208 System.Collections.Specialized.HybridDictionary

Bingo – in this case, _ASP.incomeStatement_aspx is an instance of the page class.

Note that the first argument to dso is the upper stack pointer, and the second is the lower stack pointer. I’ll post some more stuff when I have time.

Release Day 422: A Vignette


McFunley: did you name this page something impossible to remember correctly on purpose?
McFunley: this is a good anti-querystring hacking technique
DarrelHerbst: the twocheckboxcolumn.aspx?
DarrelHerbst: i remember it
DarrelHerbst: your mind is addled with coffee
McFunley: i typed "twocolumncheckbox" and "twocheckcolumnbox" and "boxcolumnchecktwo" before I got it right
DarrelHerbst: addled


Significant Digits for the Innumerate

Suppose you ask me, “how wide is that picture frame?” (No, nevermind why you care how wide the picture frame is. This is a little device I call, “using narrative to make a semi-technical thing sound approachable.)

“About nine inches,” I would say.

Now, suppose your friend Friedrich asks you how wide the picture frame is, and I’m not in the room (let’s say I’m asleep—that’s where I’m a pirate!). You happen to know that Friedrich is of the German persuasion, and if you give him a number in English units he’ll scoff at you and possibly destroy your apartment with a Stuka divebomber.

Thinking quickly, you refer to your wristwatch calculator which you have been wearing since 1988 (in this narrative, your sense of style is atrocious). “There are 2.54 centimeters in an inch – I remember that from high school,” you think to yourself. Doing the multiplication, you smugly report to Friedrich that the picture frame is 22.86 centimeters wide.

Everybody’s happy, right? Not so fast, Copernicus.

I only told you the frame was “about nine inches” wide, remember? That could possibly mean 8.556 inches, 9.210 inches—you really can’t be sure. As a measuring device, I am lazy and inept. Do you want to be held responsible for my lax attitude? No, of course you don’t.

You are really only justified in telling Freddy-boy that the frame is about 20 centimeters wide. Giving him the extra digits of that number just give it the illusion of precision where none exists.

Friedrich measures the frame himself and gets less than 22 centimeters. He punches you in the stomach and leaves you wincing in pain on the floor, cursing me for not giving you a better figure in the first place. Sorry, I just couldn’t be bothered. (You made your bed, now lie in it, dammit!!!)

Oh yes, my point. Well, this kind of thing applies to a lot more than the picture on my desk. If I tell you that you can get $1.32 for a Euro, and $1.89 for a British Pound, you might similarly do the math and tell me that therefore, you can get 0.6984126984126984126984126984127 British Pounds per Euro. Well, you’d be nuts.

I’m Turning into a Pile of Sand

The humidity in my office dipped below 20% this week, the same as last winter. If you need a frame of reference, the Sahara desert is usually at least 25%. Some people are able to tolerate this, but I’m apparently not one of them. Especially not when I’m in the 70-80 hour per week range. I have these questions:

  1. What are the long-term health detriments associated with being DRY?
  2. Would OSHA for New York City be likely to do anything about it being DRY?
  3. Is it socially acceptable to put chap stick on your eyelids?
  4. Seriously, what the f*ck?