We had our production release a few days ago, which means some random issues any way you slice it. It’s been much more serious in the past, but I looked at an interesting IIS hang dump today.
Although most of the debugging I’ve gone through here has been managed code, the dump in question is from a classic ASP/COM application. In our production environment, we run a legacy application side-by-side with a new ASP.NET application. To the user, it’s all the same app; to us, it’s a bit of an integration nightmare.
In this particular case, the server was unresponsive and not serving any requests. We took a dump of all of the worker processes before being forced to do an iisreset.
The first thing I did after opening the dump was to take a quick look at the stacks on all of the threads to get an idea of the overall activity. I noticed that there were a ton (ok, about twenty) with a stack more or less like this:
0:022> k
ChildEBP RetAddr
030de954 7c822114 ntdll!KiFastSystemCallRet
030de958 77e6711b ntdll!NtWaitForMultipleObjects+0xc
030dea00 7739cd08 kernel32!WaitForMultipleObjectsEx+0x11a
030dea5c 77697483 user32!RealMsgWaitForMultipleObjectsEx+0x141
030dea84 776974f2 ole32!CCliModalLoop::BlockFn+0x80
030deaac 7778866b ole32!ModalLoop+0x5b
030deac8 77788011 ole32!ThreadSendReceive+0xa0
030deae4 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112
030debc4 776975b8 ole32!CRpcChannelBuffer::SendReceive2+0xc1
030debe0 7769756a ole32!CCliModalLoop::SendReceive+0x1e
030dec4c 776c4eee ole32!CAptRpcChnl::SendReceive+0x6f
030deca0 77ce127e ole32!CCtxComChnl::SendReceive+0x91
030decbc 77ce13ca rpcrt4!NdrProxySendReceive+0x43
030df0a4 77d0c947 rpcrt4!NdrClientCall2+0x206
030df0bc 77d0c911 oleaut32!IDispatch_RemoteInvoke_Proxy+0x1c
030df37c 6b61c892 oleaut32!IDispatch_Invoke_Proxy+0xb6
WARNING: Stack unwind information not available. Following frames may be wrong.
030df3e4 6b61fb5c vbscript!DllRegisterServer+0x8285
00000000 00000000 vbscript!DllRegisterServer+0xb54f
This is an automation call — either to a local or remote com server. Notice that we’re warned that bottom two frames on the stack “may be wrong.” This is because there is optimized code running that does not set up a clean stack frame.
That is, code compiled without the optimizer will usually do something like this at the start of each function:
push ebp ; Save the old stack base
mov ebp, esp ; Stack base becomes the current top of the stack
sub esp, 0xc ; Save space for local variables
mov eax, [ebp+8] ; example reference to one of the parameters
Optimized functions can omit that, and refer to parameters and local variables relative to the ESP register instead of the EBP register. All of this matters to us now because this makes it significantly harder for the debugger to construct a clean stack trace.
So anyway, we’re not 100% sure at this point that the call is originating on an .asp page (executing in vbscript.dll), but since that’s pretty much all this process does, it’s more than a safe bet.
How do we start digging into what these calls are? Well, a good starting point is SIEExtPub, which is a debugger extension just for COM stuff. It’s maintained by and for Microsoft’s SIE group, which employs some of the most badass engineers you will ever meet.
SIEExtPub has a !comcalls
function which purports to show the automation calls on all threads. This was the first thing I tried, and in a similar situation it’s what I would recommend.
Unfortunately, this didn’t get me anywhere. The output for all of the threads looked like this:
Thread 49 - STA
Target Process ID: 13a82170 = 329785712
Target Thread ID: d48526a2 (STA - Possible junk values)
It’s possible I don’t totally understand the output, but the reason I’m saying this looks bogus is because the Process ID and Thread ID values are outside of the range that I’m used to seeing. Typically (although I’m not sure it’s by rule), PID’s and TID’s are word values (ie, not more than 0xffff).
It was time for a radical re-evaluation of the whole scene. I decided to dissect one of the function calls, as close to “our code” as I could get. That would be this function:
030df37c 6b61c892 oleaut32!IDispatch_Invoke_Proxy+0xb6
Which is the last one on the stack before we are warned that “the following frames might be wrong.” Below that, it’s likely to get dicey.
This isn’t a documented API function, but I’m going to guess that its parameters will closely follow those to IDispatch::Invoke or possibly IDispatchEx::InvokeEx since this is being called from a script. I’ve done enough C++ COM coding to know what to expect for those.
I googled the name of the function too, and found this definition in the ReactOS source:
HRESULT CALLBACK IDispatch_Invoke_Proxy(
IDispatch* This,
DISPID dispIdMember,
REFIID riid,
LCID lcid,
WORD wFlags,
DISPPARAMS* pDispParams,
VARIANT* pVarResult,
EXCEPINFO* pExcepInfo,
UINT* puArgErr)
Which is basically the same as IDispatch::Invoke
. I decided to take a stab at the parameters being passed. I needed to dump the stack frame out to see them. To do that, I grabbed the child EBP pushed onto the stack at that call:
ChildEBP RetAddr
030df0bc 77d0c911 oleaut32!IDispatch_RemoteInvoke_Proxy+0x1c
And I used that with the dds
(dump dwords with symbols) command:
0:022> dds 030df0bc
030df0bc 030df37c
030df0c0 77d0c911 oleaut32!IDispatch_Invoke_Proxy+0xb6 ; return address
030df0c4 141004b4 ; this
030df0c8 60030005 ; dispIdMember
030df0cc 6b655340 vbscript!DllRegisterServer+0x40d33 ; &IID_NULL
030df0d0 00000409 ; en-US
030df0d4 00020001 ; DISPATCH_METHOD
030df0d8 030df438 ; pDispParams
030df0dc 030df360 ; pVarResult
030df0e0 030df454 ; pExcepInfo
030df0e4 030df428 ; puArgErr
I’d like to point out a few things here before moving on. First, notice that it gives us some text next to the third parameter to the function. It does this because the address is within the vbscript.dll module (meaning it’s a constant or global variable). The name it gives us isn’t very helpful, though, because the only symbols we have for vbscript are export symbols. Unless you work for Microsoft, in which case this whole thing should be much, much easier for you.
If you read the documentation for Invoke
, you’ll notice that they say the riid
parameter is reserved and must be set to IID_NULL
. If we’re looking for signs that what we’re looking at is making sense, we should remember that IID_NULL
would be compiled into a dll as a constant (bing). We can dump that GUID-sized memory chunk out and see that it is indeed NULL (bing):
0:022> dd 6b655340 l4
6b655340 00000000 00000000 00000000 00000000
And finally, the next parameter should be a locale and it is set to 0x409
. If you handle locales a good bit you might recognize that as 1033 decimal, or en-US (bing).
Generally speaking, I think it’s a good idea to stop every now and then when debugging and do sanity checks like that.
So, moving on, if we want to look at the parameters being passed to the function we should dig into the pDispParams
parameter (number six). This is a pointer to a DISPPARAMS
structure, which can be found in oaidl.h.
typedef struct tagDISPPARAMS
{
VARIANTARG *rgvarg;
DISPID *rgdispidNamedArgs;
UINT cArgs;
UINT cNamedArgs;
} DISPPARAMS;
The first field in the structure is the array of arguments, and the third is the length of that array. If we dump out that parameter,
0:022> dd 030df438 l4
030df438 057a40f0 00000000 00000002 00000000
The dwords are the fields in the DISPPARAMS
structure, from right to left. We can see that there are two parameters. The first field in the structure points to the array of VARIANTs
.
As a quick review, a VARIANT
structure is used heavily in COM/VB and looks something like this:
typedef struct tagVARIANT
{
VARTYPE vt;
WORD reserved1;
WORD reserved2;
WORD reserved3;
union
{
// DWORD-sized value
}
} VARIANT;
The logical thing to do now is to dump out the array:
0:022> dd 057a40f0 L8
057a40f0 006f0008 00790064 06f50524 40e2e04f
057a4100 012d0008 6b60dc2d 06f504c8 0134d1a8
In each line, the low word of the first dword (0x0008
) is the variant type (vt) and the variant value is the third dword. You can look it up in the headers, but I know from memory that 8 is the variant type for a BSTR
.
A BSTR
is the type of string used by Visual Basic / ASP internally (and is used widely in COM in general). It is a length-prefixed string, but also null-terminated.
I dumped out the string parameters like so:
0:022> du 06f504c8
06f504c8 "A URL"
0:022> du 06f50524
06f50524 "A query string"
(No, these aren’t the real parameters. I’m obfuscating them for security reasons).
This is where I stopped, because the strings were very familiar to me and I knew immediately the function being called.
If I hadn’t narrowed the issue down at this point I might have tried looking in the dumps of other processes for activity, and/or tried to look for familiar strings further down on the stack. The issue isn’t totally resolved at this point, but I at least have somewhere to go for further instrumentation and other measures.
Hopefully, somebody found this slightly interesting. Until next time, amigos. If you’re out on your bike tonight, please, wear white.