Archive for the ‘WinDbg’ Category

Debugging and Symbols FAQ

This is something I typed up internally, to help resolve the confusion that precipitates when Visual Studio begins stepping through comments. Hopefully this will be helpful to someone else.

What is required for source mode debugging?

  • Binaries (.dll, .exe, .ocx, …).
  • Symbols (the .pdb file).
  • Source files.

The symbol files tell your debugger how points in the binary relate to the source files. When you set a breakpoint in a source file, the debugger uses the symbol file to look up the corresponding instruction in the binary.

I have all of these things. Why is my debugger behaving strangely or not working at all?

Well, there are a bunch of different things that can go wrong.

Your symbols might be mismatched.

You can't (as an example) use the symbols for version 2.0 of a DLL to debug version 1.0 of that DLL. Perhaps, in this case, you have copied a new binary into your run directory without copying the corresponding .pdb file. It's also possible that your IDE or compiler did that for you.

Some debuggers, like WinDbg, will NOT load a mismatched PDB unless you force this to happen. Others, like Visual Studio 2003, will do this without so much as a complaint. That can be very confusing in some situations.

The module you are trying to debug is not loaded into the process.

If you are attempting to hit a breakpoint, a good sanity check is to make sure that the module you are trying to debug is being loaded. If it isn't, you are experiencing something other than a PDB problem. For example, your application could be experiencing an exception before your code is called. Make sure you have "break on exceptions" turned on.

In Visual Studio, you can look at the "modules" window to make sure your binary is loaded. The corresponding command in WinDbg is lm.

You are attached with the wrong debugger.

This is another non-symbol issue. There are many different kinds of debuggers:

  • Native debuggers: cdb, ntsd, windbg, Visual Studio.
  • CLR debuggers: cordbg, mdbg, deblector, Visual Studio.
  • Oddballs: the VB6 IDE, VS.NET's script debugger, etc.

You should make sure you are attaching to or starting your process using the right kind of debugger. Visual Studio gives you a dialog to choose the appropriate debugger(s).

The source file is not recognized.

Say I build a .dll from c:\foo\x.cpp, and you take that library from me and try to set a breakpoint in c:\bar\x.cpp. You might need to tell the debugger that your source path is different. Visual Studio is usually pretty good at figuring this out on its own and/or asking you when it is having a problem.

The _NT_SOURCE_PATH environment variable is honored as a search path for source files by most of the Windows debuggers. The .srcpath command in WinDbg/CDB can be used to display and change the source search path in those debuggers.

The symbols are corrupt.

This is rare, but it can happen. The fix is usually to just rebuild the binary and regenerate the PDB.

The binary is not debuggable.

.NET assemblies have the added limitation that they must be built in /debug mode for conventional CLR debugging to work. You can tell if a module is debuggable by opening it up in Reflector or ILDASM. You should see this attribute:

[assembly: Debuggable(true, true)]

Where does the debugger get the symbols?

Most Windows debuggers work like this:

  • The folder containing the binary is searched for a matching PDB.
  • The debugger's symbol search path is then searched. Most Windows debuggers (including Visual Studio) take their default search path from the _NT_SYMBOL_PATH environment variable.

How can I tell where the debugger is getting its symbols?

In WinDbg/CDB, the !sym noisy command can be used to show verbose output when the debugger is trying to resolve symbols. Visual Studio 2005 shows the symbol path as a column in the Modules window.

I am not aware of a simple way to identify where Visual Studio 2003 is loading its symbols from. The easiest way might be to use the handle utility from Sysinternals:

C:\src>handle foo.pdb

Handle v3.01
Copyright (C) 1997-2005 Mark Russinovich
Sysinternals - www.sysinternals.com

devenv.exe      pid: 4832   C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\Temp
orary ASP.NET Files\someapp\dbf846f9\1ab0397c\assembly\dl2\c33a0bcb\eca89e48_3
b78c601\foo.PDB

In this case ASP.NET has copied the binary and the PDB to a temporary location. This is another way a PDB can wind up being mismatched.

How are PDB's matched to binaries?

The short version: the PDB contains a date stamp and a checksum which is matched against the binaries.

How can I tell if a PDB matches a binary?

The symchk utility that comes with the Debugging Tools for Windows can be used to do this. There are many options for this utility, so you should read the output of symchk /?.

Some Twists on Blocked Finalizers

Blocked finalizer threads have gotten some recent publicity on Tess Ferrandez's blog. I recently ran into this myself, although the particulars were slightly different.

Our problem was manifesting itself in a very long-running console application that both connected to a database and made web service calls. Eventually the app would start hitting OutOfMemoryExceptions. (Completely unrelated to these OutOfMemoryExceptions. We have a high incidence of OutOfMemoryExceptions around here.)

Fixing the Wrong Problem and Moving the Goalposts

My initial reaction to this problem was (sensibly, I think) to attempt to reduce the amount of memory that the application was using. I admit that I was also focused on different problems at the time, so any makeshift solution I could get to work here would have been just fine with me.

We threw in some buffer pools, et cetera, and managed to get the app running slightly longer, but the OOM's always resurfaced. Every time they did, the heap looked completely different due to the semi-drastic changes we were making to the app's memory profile. After the third or maybe fourth change it dawned on me that perhaps we were just confusing the real issue.

The Proverbial Lightbulb Over the Head

The turning point came when I noticed that most of the overly-abundant objects on the heap in a particular dump were types that were likely to be associated with finalizers.

0:000> !dumpheap -stat
------------------------------
...
2,431,972    48,639,440 System.Threading.WaitHandle/__WaitHandleHandleProtector
...

This hadn't been the case in the previous dumps. Those were eaten alive by byte arrays and strings. By this point, we had done such a fantastic job of optimizing the application for memory usage that the problem came more clearly into focus.

The Road to Victory

At this point I used !threads in SOS to locate the finalizer thread (some useless info trimmed out here):

0:000> !threads
ID APT   Exception
0  STA System.OutOfMemoryException
4  MTA (Finalizer)

Thread 4 was stuck in a WaitForSingleObject and, not unrelated to the problem here, the native stacks of almost all of the other threads in the process were in some variation of "WaitUntilGCComplete." Probably should have noticed that sooner. So sue me.

Anyway, the top of the finalizer's stack was interesting:

0:004> ~4k
ChildEBP RetAddr
00f0f658 7c822124 ntdll!KiFastSystemCallRet
00f0f65c 77e6baa8 ntdll!NtWaitForSingleObject+0xc
00f0f6cc 77e6ba12 kernel32!WaitForSingleObjectEx+0xac
00f0f6e0 776c54ef kernel32!WaitForSingleObject+0x12
00f0f6fc 77789905 ole32!GetToSTA+0x6f
00f0f71c 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0xcb

The finalizer is waiting to make a call over to an STA thread. Why's it doing that? And wait, what STA thread? Funny you should ask.

Visual Basic Complications

No technical problem at my post-VBScript company would be complete without Visual Basic putting a unique spin on things. This one is no exception. The authors of this app are fairly oblivious to COM apartments and didn't intentionally create any STA threads. They also weren't using COM Interop directly.

So how did thread zero end up being an STA thread (the astute among you may have noticed this in the !threads output above)? The MSDN documentation seems to clearly say that MTA is the default.

The reason was that the app was written in Visual Basic. If the Main() method in a VB program is lacking the STAThreadAttribute and the MTAThreadAttribute, the VB compiler will stick an STAThreadAttribute on it for you when the MSIL is emitted (the C# compiler doesn't do this). Presumably–and I'm guessing here but I think this is a good guess–this is done for compatibility with VB6.

The main thread of this application was sleeping the vast majority of the time. Admittedly, this is not a fantastic example of multithreaded design, but it wouldn't have caused any problems if the thread were MTA. As Raymond has pointed out, sleeping on an STA thread can have some ill effects.

The Fix

This problem was solved by decorating the main method with <MTAThread()>. This is possibly the smallest fix to an apparently massive problem I have ever personally witnessed.

How to Notice This Problem Much Sooner

There are a few ways this could have been less painful. The easiest might have been to check the output of SOS's !finalizequeue command. If I had done that I would have seen the following (again trimmed for clarity):

Heap 0
generation 0 has 0 finalizable objects
generation 1 has 0 finalizable objects
generation 2 has 11,212 finalizable objects
Ready for finalization 1,163,295 objects

Which does not look healthy. Looking more closely at the native stacks would have also probably yielded a solution on day one.

Debugger Exercise: Displaying a Function’s Return Value

I found myself needing to automatically manipulate the return value of a managed function in release code today. I thought this would make an interesting little writeup.

For the purposes of the demonstration, let's use this example program:

class Program
{
    static void Main()
    {
        while (!Console.KeyAvailable)
        {
            Console.WriteLine(FooBar(false));
            Console.WriteLine(FooBar2());
        }
    }

    static string FooBar(bool b)
    {
        if (b)
        {
            return "b";
        }
        return "a";
    }

    static string FooBar2()
    {
        return FooBar(true);
    }
}

The function we're interested in will be FooBar. I've made things interesting by calling the function from two places–let's assume we're not sure where the function is called from. We'll make things more interesting still by accepting the limitation that we won't cheat and set a breakpoint on the ret instruction in FooBar. The function I needed to instrument today was a little monolithic, so let's take it as a given that we need to be able to handle a function that might be long and might have more than one exit point.

Compile this program and start it in WinDbg. Start by loading SOS:

0:003> .loadby sos mscorwks

The first thing we'll do is set a breakpoint on FooBar. (I'm going to do this manually so that these steps will work with the CLR v1.1. The new version of SOS makes this slightly easier. I'm also going to cover this quickly, so if you want a clearer explanation of what's going on in this next step see Eran Sandler's writeup here.)

To set the breakpoint, let the program run for a second so that the method is compiled by the JIT. Then we need to find and dump out the MethodTable for the class:

0:003> !name2ee retaddr!retaddr.Program
Module: 00912d5c (retaddr.exe)
Token: 0x02000002
MethodTable: 00913140
EEClass: 009112ec
Name: retaddr.Program

0:003> !dumpmt -md 00913140
EEClass: 009112ec
Module: 00912d5c
Name: retaddr.Program
mdToken: 02000002
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 8
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
...
00d00070   00913120      JIT retaddr.Program.Main()
00d000d8   00913128      JIT retaddr.Program.FooBar(Boolean)
00d00130   00913130      JIT retaddr.Program.FooBar2()
...

Now that we have the MethodTable, we need to dump out the MethodDesc for FooBar to get the virtual address of the jitted code.

0:003> !dumpmd 00913128
Method Name: retaddr.Program.FooBar(Boolean)
Class: 009112ec
MethodTable: 00913140
mdToken: 06000002
Module: 00912d5c
IsJitted: yes
m_CodeOrIL: 00d000d8

The compiled method begins at m_CodeOrIL (this is called "Method VA" in SOS 1.1). We can set a breakpoint there with "bp 00d000d8."

Here comes the trick. We know that the return address to the method should be on the top of the stack when the method is called. In other words, the ESP register points at the return address. We can set a dynamic breakpoint on that return address using a command like this:

bp 00d000d8 "bp poi(@esp); g"

If you try this, you'll notice that we break right after FooBar has returned to its caller. As with any function obeying one of the standard calling conventions, the return value is stored in the EAX register. We can give the dynamic breakpoint a command of its own that manipulates this value like this:

bp 00d000d8 "bp poi(@esp) \"!do -nofields @eax; g\"; g"

If we run the sample program now, we'll see this output repeating itself:

breakpoint 1 redefined
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 20(0x14) bytes
String: a

breakpoint 2 redefined
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 20(0x14) bytes
String: b

Voila.

Beware the OutOfMemoryException Red Herring

Low memory conditions have a way of executing lightly-or-never-tested code paths. Bugs in these paths can sometimes cast a penumbra over the real reason your application is crashing.

Worse, the resulting exceptions or crashes are likely to be more or less nondeterministic. Worse still, the most obvious steps with SOS rarely work once the CLR or the IIS worker process has started to OOM and has tossed you an access violation.

As a concrete example–this is a !clrstack from an application that appears to be crashing with an IOException.

0:026> !clrstack
Thread 26
ESP         EIP
0x10e3ea90  0x77e55dea [FRAME: GCFrame]

This doesn't tell us much, but if we look at the native stack:

0:026> k
ChildEBP RetAddr
10e3e9e8 79216aed kernel32!RaiseException+0x53
WARNING: Stack unwind information not available. Following frames may be wrong.
10e3ea40 79216a70 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3ea68 79216a24 mscorsvr!GetAssemblyMDImport+0x2e157
10e3ea78 79265ec2 mscorsvr!GetAssemblyMDImport+0x2e10b
10e3eab4 7924e9d3 mscorsvr!CoEEShutDownCOM+0x12dc1
10e3eac8 7c82eeb2 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x7e73
10e3eaec 7c82ee84 ntdll!ExecuteHandler2+0x26
10e3eb94 7c82ecc6 ntdll!ExecuteHandler+0x24
10e3eb94 77e55dea ntdll!KiUserExceptionDispatcher+0xe
10e3eee0 79216aed kernel32!RaiseException+0x53
10e3ef38 7924c8d0 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3efe8 791b3208 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x5d70
10e3eff4 791b3ad7 mscorsvr!Ordinal76+0x3208
10e3f108 791d3ef0 mscorsvr!Ordinal76+0x3ad7
10e3f1c4 791d3fa4 mscorsvr!GetCompileInfo+0x13e3
10e3f1ec 7931392f mscorsvr!GetCompileInfo+0x1497
10e3f21c 791cc3c8 mscorsvr!ReleaseFusionInterfaces+0x5f3d7
10e3f264 793139e1 mscorsvr!Ordinal76+0x1c3c8
10e3f2d8 7924a3f2 mscorsvr!ReleaseFusionInterfaces+0x5f489
10e3f2f0 791d4096 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x3892

We can see from the frames in the middle (the lower kernel32!RaiseException through the subsequent ntdll calls) that the current exception is being raised from inside another exception handler (a catch block, in managed terms). The current exception doesn't appear to reflect this:

0:026> !cen
System.IO.IOException (0xbab602fc)

0:026> !do 0xbab602fc
Name: System.IO.IOException
-----------------
Exception bab602fc in MT 79bd1234: System.IO.IOException

But with some wizardry, we can coerce the relevant information. We can hypothesize that the original exception would be passed as a parameter somewhere before the original call to kernel32!RaiseException. Isolating just that frame on the stack, and the frame before it:

0:026> kb
…
10e3eee0 79216aed e0434f4d 00000001 00000000 kernel32!RaiseException+0x53
…
10e3ef38 7924c8d0 bab5ce60 00000000 bab5ce60 mscorsvr!GetAssemblyMDImport+0x2e1d4

The address 0xbab5ce60 (second line) looks like a good candidate, being close to the address of the current exception while not being equal to it.

0:026> !do bab5ce60
Name: System.IO.IOException
-----------------
Exception bab5ce60 in MT 79bd1234: System.IO.IOException
_message: Unable to create a transport connection.
_innerException:
   Exception 06c6003c in MT 79b96cec: System.OutOfMemoryException
   _remoteStackTraceString:

Bingo. (Mind you, we haven't taken any steps yet to actually fix the problem.) This was relatively easy, since all of the relevant information could be reconstructed from the offending stack. In other situations, you may need to !DumpAllExceptions to see what has been happening recently in the application.

Note: a very good hint that you are looking at an OOM situation is that your crash dump is exactly two gigabytes in size.

Psychic Debugging: Random Crashes

Raymond Chen has had an entry or two about "Psychic Debugging." This skill is a lot like everyday psychic abilities by virtue of being really just intuition and guesswork, but dissimilar in that it produces results that benefit society.

Today one of the developers was working on a weird problem he found in his development environment. He could test his functionality once (it's an ASPX page), but the second time he tried to use it there was usually a "random" exception on the page that didn't make any sense. Sometimes, instead of an exception, his worker process just crashed. He also told me that he was trying to use a new API that I knew mapped to a third-party COM object.

This is where the mystical powers came in handy. I tried running his page once, then running a different page–same weird exceptions and crashes. It was here that I started to suspect that the heap was being corrupted by the COM library.

(Let me quickly explain what heap corruption is: suppose I have two objects, X and Y, next to one another in memory. Now suppose I make a mistake writing my program and accidently write beyond the end of object X. I've now screwed up object Y, but we might not know about it until we try to use object Y. We might not try to use object Y for quite a while, so this makes the real cause of the bug awfully difficult to find. To make matters worse, the next time we run the program our bug writing to object X might scribble all over a completely different object that will crash the program somewhere else.)

On a hunch I attached a debugger after the first load of the page. When the exception occurred, I ran the !verifyheap command in the SOS extension. This reported that there was a corrupt object on the heap. I ran the same test a few times to confirm the theory.

If you're lucky, you might see a debug break like this in one of the Windows Debuggers as soon as the heap corruption occurs:

HEAP[heapcorrupt.exe]: Heap block at 001AD700 modified at 001AD780
  past requested size of 78
...
ntdll!DbgBreakPoint:

This is just a debug break, so it won't tear down the process immediately. Depending on the application you're working on the real problems could start much later, long after the offending code has finished executing.

All I can say is that it's a goddamned miracle that the developer was doing a good job testing his functionality and noticed this. The mayhem that would have ensued if this made it to QA would have made it far more difficult to track down the source of the problem. The happy ending here is that the vendor has a fully-managed version of their component that we can switch to–hopefully that will be more difficult for them to screw up.

Quick Debugger Tips

In my javascript debugging post I showed a stack trace and mentioned only that it seemed to be running javascript code. Here it is:

0:009> ~0k
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
0012f97c 6009db8b js3250!JSLL_MinInt+0x45e2
0012f9c8 6009cdd9 js3250!js_GetSrcNoteOffset+0x5358
0012f9f0 600c7c01 js3250!js_GetSrcNoteOffset+0x45a6
0012fa08 600856b5 js3250!js_GetScriptLineExtent+0x39e6
0012fa28 600b036f js3250!JS_NewStringCopyZ+0x44
0012fa40 600b3e93 js3250!js_FindProperty+0x26c5

This is a good stack to use to point out a few things about debugging without symbols.

If you're used to stepping through debug code in Visual Studio, your first instict may be to trust the stack printed out here. This would be wrong, for a few reasons.

First, there's the line telling us: "Stack unwind information not available. Following frames may be wrong." Stop me if I've covered this before, but it probably means that we're looking at optimized code. Optimized stack frames often have the pointers to each previous stack frame left out (you'll see this called "frame pointer omission," or FPO) which makes walking the stack much more difficult for the debugger.

Second, notice the offsets following the function names - most are very high. For example,

js3250!js_GetSrcNoteOffset+0x5358

The +0×5358 indicates that the return address is to a point more than 20KB after the export function given. 20KB is a lot of machine code, so this almost definitely an internal function call and not a return address into the body of js_GetSrcNoteOffset.

I say "almost definitely" instead of "definitely" with good reason. I have personally seen some VB6 functions several thousands of lines long that would certainly generate more than 20KB of machine code. But I guess if you're dealing with that then you have enough of your own problems without worrying about symbol subtleties in the Windows debuggers.

So the debugger here is showing us the closest symbol to the return address available, which turns out to not be very close at all.

What can we say about this thread's stack, given just this information? Only that it is very likely to be running javascript code. Not much else.

Conventional Javascript Debugging is for Wimps

Recently, Firefox has been maxing out my CPU "when I have more than three tabs open." As is typical with bug reports from users, this one is very poorly worded and essentially useless. It turns out that that was not a good explanation of what was going on. I messed around a little more and figured out that the problem had these characteristics:

  • The CPU was maxed out on any ESPN.com page.
  • Thread #0 was the offending thread.
  • Firefox's working set increased steadily while looking at ESPN.
  • Firefox eventually crashes with an A/V if the browser is left on ESPN.

Firefox 1.5 CPU Spike

I'm not sure if this is some strange interaction of my specific combination of extensions and settings, or if this happens for all Firefox users. This seems like a possible security vulnerability, but I can't say one way or the other.

I suppose I could have downloaded and compiled the source code for Firefox to figure things out, but that was way more effort than I felt like giving. I did managed to debug and fix this issue for myself–without source or symbols–which I think makes for an interesting writeup.

Since the problem was on every page on the website, and continued after the page finished loading, I assumed it had to be a javascript or Flash problem (both of which are abused by ESPN to an irritating degree). I don't even have the Flash plugin for Firefox installed, so that narrowed the scope of my investigation.

The first thing I did was try to debug the Javascript normally; this was hampered by several factors:

  • The Javascript debugger for Firefox, Venkman, hasn't officially come out for version 1.5 yet. I've complained about broken extensions before, but let me just reiterate how stupid and unprofessional I think broken extensions are.
  • Although this kind person has done his own workaround for 1.5, it was either brought to its knees by the ESPN site or was broken by something else.

To summarize the story to this point, I had a (likely) javascript problem on a page I visit very frequently, coupled with a browser bug that maxed the CPU and made the issue difficult to diagnose through normal means.

I attached to Firefox.exe in Windbg and started randomly breaking in and checking on thread zero while the spike was in progress. Sometimes, frames in the js3250 module were on the stack:

0:009> ~0k
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
0012f97c 6009db8b js3250!JSLL_MinInt+0x45e2
0012f9c8 6009cdd9 js3250!js_GetSrcNoteOffset+0x5358
0012f9f0 600c7c01 js3250!js_GetSrcNoteOffset+0x45a6
0012fa08 600856b5 js3250!js_GetScriptLineExtent+0x39e6
0012fa28 600b036f js3250!JS_NewStringCopyZ+0x44
0012fa40 600b3e93 js3250!js_FindProperty+0x26c5

The information for js3250 confirms (if the module name and the names of the export functions weren't enough for you) that it is the Mozilla Javascript implementation.

0:009> lmv m js3250
start    end        module name
60080000 600e9000   js3250     (export symbols)
   C:\Program Files\Mozilla Firefox\js3250.dll
    Loaded symbol image file:
        C:\Program Files\Mozilla Firefox\js3250.dll
    Image path: C:\Program Files\Mozilla Firefox\js3250.dll
    Image name: js3250.dll
    Timestamp:        Fri Nov 11 20:05:34 2005 (43753FDE)
    CheckSum:         00073A1C
    ImageSize:        00069000
    File version:     4.0.0.0
    Product version:  4.0.0.0
    File flags:       0 (Mask 3F)
    File OS:          10004 DOS Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04e4
    CompanyName:      Netscape Communications Corporation
    ProductName:      NETSCAPE
    InternalName:     JS3240
    OriginalFilename: js3240.dll
    ProductVersion:   4.0
    FileVersion:      4.0
    FileDescription:  Netscape 32-bit JavaScript Module
    LegalCopyright:   Copyright Netscape Communications. 1994-96
    LegalTrademarks:  Netscape, Mozilla

All of this is reasonably good evidence that javascript is the right place to start. I took a look at the exports for js3250 ("x js3250!*" in Windbg) — it appeared as though this module was implemented as a few dozen C-style exports.

I thought that a logical thing to look for was an "execute a javascript function" function of some kind, and there were a few exports named some variation of "Call function."

0:009> x js3250!*Call*Function*
6008541f js3250!JS_CallFunction (<no parameter info>)
60085464 js3250!JS_CallFunctionName (<no parameter info>)
600854ff js3250!JS_CallFunctionValue (<no parameter info>)

I set a breakpoint on all of these.

0:009> bm js3250!*Call*Function*
  1: 6008541f @!"js3250!JS_CallFunction"
  2: 60085464 @!"js3250!JS_CallFunctionName"
  3: 600854ff @!"js3250!JS_CallFunctionValue"

After resuming the program, I started hitting these breakpoints constantly. I did a sanity check and made sure that this didn't occur on a cleaner site (google), and this was the case. I didn't have any immediate success figuring out what script functions were being called. You can dig up the source for these API's if you like, but I'm guessing that the script is already processed into different data structures by the time the javascript engine gets here.

I looked at some of the stacks when these functions were called, and I noticed this pretty far down:

0012fc7c 00534e18 js3250!JS_EvaluateUCScriptForPrincipals+0x70

I hoped this would lead me to some bits of javascript pointed to from nearby locations on the stack, so I set a breakpoint there. When it was hit, I started dumping out strings (using "dda @esp" and and "ddu @esp") and found this:

http://espn-att.starwave.com/motion/fsp/fsp.js

As an only slightly educated guess, I used the Adblock extension to block this script.

After reloading the page, the problem was gone! There don't seem to be any site-breaking problems associated with turning this script off. This is all or part of the "ESPN Motion" business that tries to display sound and video on the site. Hey, ESPN: this is a terrible idea in the first place. Your website shouldn't start yelling at me when I visit it. It's no excuse for Firefox to A/V, but I thought everyone with a three-digit IQ stopped doing this in 1996.

The Debugger Extension, Part 6 - Scanning Threads

The Debugger Extension

We already have an extension that might be pretty useful in some scenarios, but another common situation is determining what a particular thread is doing. You might want to look at instances of a particular type on the stack of a thread that is maxing out your CPU, or you might want to look at two or more threads that appear to be deadlocked.

We can get something like this out of the extension we have written already if we alter it to search only the stack for the current thread. How do we do that? On an x86 machine, the stack looks something like this:

Thread Stack

Finding one end of the stack region is very easy. The top of the stack (but the "bottom" address, since the stack grows down) should always be in the ESP register. To get the base of the stack we need to be able to read an NT structure called the Thread Environment Block, or TEB.

The TEB is defined as follows in the Platform SDK.

typedef struct _TEB {
    BYTE Reserved1[1952];
    PVOID Reserved2[412];
    PVOID TlsSlots[64];
    BYTE Reserved3[8];
    PVOID Reserved4[26];
    PVOID ReservedForOle;
    PVOID Reserved5[4];
    PVOID TlsExpansionSlots;
} TEB, *PTEB;

We're all ecstatic that the TEB is undocumented when this allows the kernel team to freely implement new features, I guess, but this is no help to us right now. This is closer to what the TEB header really looks like.

typedef struct tagTEB_INTERNAL
{
    DWORD dwExceptionList;
    DWORD dwStackBase;
    DWORD dwStackLimit;
    DWORD lpTIB;
    DWORD lpFiberInfo;
    DWORD lpUserPointer;
    DWORD lpSelf;
    DWORD lpEnvironmentPointer;
    DWORD dwProcessId;
    DWORD dwThreadId;
    DWORD dwActiveRPCHandle;
    DWORD lpPEB;
    DWORD dwLastError;
    // More fields follow but are not included here.
} TEB_INTERNAL, *PTEB_INTERNAL;

I took that from chapter six of Microsoft Windows Internals by Mark Russinovich. He's done many useful and awe-inspiring things besides discovering the Sony Rootkit DRM. The DbgEng SDK exposes a method to us (IDebugSystemObjects::GetCurrentThreadTeb) that makes it trivial to write a function to read in this structure in the debugger (download the source if you want to see it).

We can now write a templated search function (much like those we wrote in part four) to search only the current stack. Since the stack will contain handles/pointers to the objects, we'll also need a function that searches with a level of indirection.

// Performs a range search on the current
// thread's stack.
//
template<class search_command>
inline void SearchStack(ULONG64 pattern)
{
    TEB_INTERNAL teb = {0};
    HRESULT hr = GetCurrentTEB(&teb);
    if( FAILED(hr) )
    {
        Out("Could not retrieve the TEB.\n");
        return;
    }

    ULONG64 esp = 0;
    hr = m_Registers->GetStackOffset(&esp);
    if( FAILED(hr) )
    {
        Out("Could not read the stack pointer.\n");
        return;
    }
    Out("Thread %d:\n", teb.dwThreadId);

    // Thunk the dword to a 64-bit integer.
    // Otherwise we'll take the previous dword
    // field in the TEB structure with us into
    // the Search call.
    ULARGE_INTEGER li = { teb.dwStackBase, 0L };
    this->SearchPointers<search_command>(
         pattern,
         esp,
         li.QuadPart);
}

// Searches for the pattern with a level
// of indirection.
//
template<class search_command>
inline void SearchPointers(ULONG64 pattern,
     ULONG64 start, ULONG64 end)
{
    search_command sc;
    int hits = 0;
    for(ULONG64 offs = start;
        offs <= end;
        offs += m_PtrSize)
    {
        ULONG64 ptr = 0;
        HRESULT hr = m_Data->ReadPointersVirtual(
            1L, offs, &ptr);
        ULONG64 ptrVal = 0;
        hr = m_Data->ReadPointersVirtual(
            1L, ptr, &ptrVal);
        if( hr == S_OK && ptrVal == pattern )
        {
            if( sc.HandleMatch(ptr) )
            {
                ++hits;
            }
        }
    }
    sc.ShowResults(hits);
}

The EngExtCpp framework makes it very easy to add a switch to enable searching with this method:

0:000> !atstat /?
!atstat [/s] <The MethodTable for SampleApp.ArbitraryType.>
  <The MethodTable for SampleApp.ArbitraryType.>
  /s - Searches only the current stack.
Displays statistics about ArbitraryType instances in memory.

That lets us build some cool composite commands in WinDbg like this:

0:000> ~*e!atstat /s 009131b0
Thread 2624:
Searching for ArbitraryTypes...
--------------------------------------------
01272bf8 : Purple
01272bf8 : Purple
01272bdc : Blue
01272bf8 : Purple
--------------------------------------------
Found 4 total instances.
Totals:
  Blue: 1
  Purple: 3

Thread 1924:
Searching for ArbitraryTypes...
--------------------------------------------
--------------------------------------------
Found 0 total instances.
Totals:

Thread 2096:
Searching for ArbitraryTypes...
--------------------------------------------
--------------------------------------------
Found 0 total instances.
Totals:

I think that's where we'll leave it for now. Not bad for a day of work or so, when you consider that we're empowered to crank out similar utilities in no time at all. I hope you've enjoyed the debugger extension series.

The Debugger Extension, Part 5 - Manipulating Managed Types

The Debugger Extension

In the last post in this series, we succeeded in writing a working extension that searched memory for instances of a particular type. Trouble is, we haven't done anything useful yet. We've merely duplicated a very small subset of the functionality offered by SOS's !DumpHeap command, and poorly at that.

In the problem setup, we said we wanted to show statistics about a particular property of these instances–for the purposes of this example, we're calling that their "Color." A sensible step in this direction would be to write some utility C++ code to accompany the Colors enumeration that we wrote earlier in C#.

// Some definitions that correspond to the managed
// SampleApp.Colors enum.
typedef ULONG Color;
const Color COLOR_RED = 0;
const Color COLOR_GREEN = 1;
const Color COLOR_BLUE = 2;
const Color COLOR_PURPLE = 3;
const Color MAX_COLOR = COLOR_PURPLE;

PCSTR g_szColorNames[] = { "Red", "Green",
    "Blue", "Purple" };

bool IsColor(Color c)
{
    if( c <= MAX_COLOR )
    {
        return true;
    }
    return false;
}

PCSTR ColorName(Color c)
{
    if( !IsColor(c) )
    {
        return NULL;
    }
    return g_szColorNames[static_cast<int>(c)];
}

Part of the point of this series has been to develop a framework that deals with instances of .NET objects. To that end, we should try to write a generic base class that loads a managed instance in the debuggee into the debugger's process. This is my implementation of such a class.

// ------------------------------------------------------
// mtypes.h
//        Some base classes for dealing with instances
//        of managed objects.
//
#pragma once

template<class object_fields>
class ManagedInstance
{
protected:
    object_fields m_Fields;
    ULONG64 m_offset;
    bool m_valid;

    // Can be used by derived classes can to refer
    // to this class.
    typedef ManagedInstance<object_fields> base_t;

    // Constructor - pass the offset of the managed
    // object. Check the result of IsValid() before
    // using an instance derived from this class.
    ManagedInstance(ULONG64 offset) : m_Fields()
    {
        // Load the data for the fields from the
        // debugee process / memory dump
        IDebugDataSpaces* pData = g_Ext->m_Data;
        m_offset = offset;
        ULONG read = 0L;
        HRESULT hr = pData->ReadVirtual(
            m_offset,
            reinterpret_cast<void*>(&m_Fields),
            sizeof(object_fields),
            &read);
        m_valid = SUCCEEDED(hr) &&
            (read == sizeof(object_fields));
    }

    virtual ~ManagedInstance() {}

public:
    // Returns true if the object was successfully
    // read from the debugee. Doesn't validate that
    // it actually is a managed object of the desired
    // type, but overridden implementations should
    // do this.
    virtual bool IsValid() { return m_valid; }
};

// This can be used as a base class for classes used
// as the object_fields template parameter.
class ObjectFields
{
public:
    ULONG pMethodTable;
};

The template parameter for the ManagedInstance class takes a POD ("plain old data") type that should just list the fields in the instance. The constructor for ManagedInstance loads the data at the specified offset as those fields. I declared a virtual function that indicates whether or not the managed instance is valid. In this base class, all we can really say about that is whether or not we could read the data at the provided address.

I've also defined a base class for the fields of a managed object, and put the MethodTable address in it. Given these classes, it's not a lot of work to write the implementations for ArbitraryType.

// Represents the fields of a
// SampleApp.ArbitraryType instance.
class ArbitraryTypeFields
     : public ObjectFields
{
public:
    Color col;
    ULONG id;
};

// Represents a single ArbitraryType instance.
class ArbitraryType :
    public ManagedInstance<ArbitraryTypeFields>
{
public:
    ArbitraryType(ULONG64 offset)
        : base_t(offset) {}
    virtual bool IsValid();
    Color GetColor() { return m_Fields.col; }
};

// Returns true if the loaded data is a valid
// ArbitraryType instance.
bool ArbitraryType::IsValid()
{
    if( base_t::IsValid() &&
        IsColor(m_Fields.col) )
    {
        return true;
    }
    return false;
}

In the last post, the only criteria we used for finding ArbitraryType instances was that we had found a INT_PTR containing the address of its MethodTable. That's obviously going to result in false positives, because the CLR's execution engine will certainly have this pointer in several places in its own internal data structures. We can do a little better now by making sure that the Color field is within the range of expected values. While we're changing our HandleMatch function to implement this, we'll add an STL map to the mix to keep track of the colors we find.

class AtStatCmd : public SearchCommand
{
protected:
    typedef map<Color, int>  CountMap_t;
    CountMap_t m_counts;

public:
    virtual void ShowResults(int totalHits);
    virtual bool HandleMatch(ULONG64 offset);
    AtStatCmd();
};

bool AtStatCmd::HandleMatch(ULONG64 offset)
{
    ArbitraryType at(offset);
    if( at.IsValid() )
    {
        Color c = at.GetColor();
        g_Ext->Out("%08I64x : %s\n",
           offset, ColorName(c));
        m_counts[c]++;
        return true;
    }
    return false;
}

The output of the extension in WinDbg now looks like this:

    0:000> !atstat 009131b0
    Searching for ArbitraryTypes...
    --------------------------------------------
    Searching 00000000 to 7fffffff.
    009100cc : Red
    009130e0 : Red
    01271ce4 : Red
    01271d00 : Blue
    01271d1c : Red
    01271d38 : Blue
    01271d54 : Red
    --------------------------------------------
    Found 7 total instances.
    Totals:
      Red: 5
      Blue: 2

Incidentally, this is debugging the same sample code that, in the previous post, was declared to have 20 instances in memory. As you can see, adding some basic validation reduced the number of bogus hits considerably. A more complex object–one you might actually use, I suppose–could have an even smaller incidence of errors.

There's one obvious problem with this that I can think of, and that is the fact these objects are not necessarily alive (rooted) on the GC heap. They could very well be collected and sitting in freed memory. In my case, this is not a concern since I am interested mostly in debugging leaked memory. This may be an issue for other users, however.

Since the extension we set out to build is basically complete, I'll post the code now even though I have one more feature in mind for the next post. You can download the code here.

The Debugger Extension, Part 4 - Searching Memory

The Debugger Extension

Now that we know how to solve our problem conceptually, we can put pen to paper. Metaphorically speaking, I suppose. As I said in the last post, our strategy will be to search memory for INT_PTRs matching the MethodTable for our ArbitraryType. When we find matches, we'll perform some further validation to reduce the likelihood of false positives.

It's not unreasonable to assume that if we're successful in writing an extension to look at this class, we might want to do something like it again in the future. So let's define an interface for commands that search through memory.

// ------------------------------------------------------
// SearchCommand.h
//        Defines the interface for extension commands
//        that search through memory.
//
#pragma once

class SearchCommand
{
public:
    // Called whenever the search pattern is encountered
    // at the provided offset. The method should return
    // true if the offset is a hit.
    virtual bool HandleMatch(ULONG64 offset)=0;

    // Called when the search is finished. The parameter
    // will contain the total number of matches found.
    virtual void ShowResults(int totalHits)=0;
};

This should give us some flexibility later on if we need it. We can also abstract the process of searching through memory. The DbgEng API that is available to us is the IDebugDataSpaces interface. This defines a SearchVirtual function, which we'll use to scan for the ArbitraryType's MethodTable. This is its definition.

HRESULT
  IDebugDataSpaces::SearchVirtual(
    IN ULONG64  Offset
    IN ULONG64  Length
    IN PVOID  Pattern
    IN ULONG  PatternSize
    IN ULONG  PatternGranularity
    OUT PULONG64  MatchOffset
    );

To make our algorithm generic, we'll add a templated function to our extension class.

// ------------------------------------------------------
// dmext.h
//
#pragma once
#include "engextcpp.hpp"
#include "searchcommand.h"

class EXT_CLASS : public ExtExtension
{
protected:

   // Does a range search for the pattern, keeps track of the hits, and
   // calls methods matching the SearchCommand interface on an instance
   // of the search_command parameter.
   //
   template<class search_command>
   inline void Search(ULONG64 pattern, ULONG64 start, ULONG64 end)
   {
      if( start > end )
      {
         Err("The start cannot be after the end.\n");
         return;
      }

      search_command sc;
      Out("Searching %08I64x to %08I64x.\n", start, end);

      HRESULT hr = 0;
      int hits = 0;
      ULONG64 offs = start;
      do
      {
         hr = m_Data->SearchVirtual(offs,
            end - offs,
            &pattern,
            this->m_PtrSize,
            1,
            &offs);
         if( hr == S_OK )
         {
            if( sc.HandleMatch(offs) )
            {
               ++hits;
            }

            // Search again, starting at the the next
            // pointer-sized location.
            offs += m_PtrSize;
         }
       }
       while( hr == S_OK );
       sc.ShowResults(hits);
    }

    // Shortcut for an x86 without the /3GB switch.
    template<class search_command>
    inline void Search(ULONG64 pattern)
    {
       this->Search<search_command>(pattern, 0, 0x7fffffff);
    }

 public:
    EXT_CLASS();
    EXT_COMMAND_METHOD(atstat);
};

I also added a shortcut function that searches all of the virtual memory that is available to user mode. This function assumes that we're debugging on an Intel x86 machine, and that the process is not LARGEADDRESSAWARE–that is to say, it can't make use of more that 2 gigabytes of virtual memory.

(A brief aside: although I'm using ULONG64 addresses and other conventions, I'm making no sincere attempt to ensure that this extension will work properly with a 64-bit debuggee. That much should be obvious from my last shortcut. Where I can, I will try to make life easy for someone writing a port.)

We now have enough framework to implement the skeleton of our extension command. For now, we'll just spit out addresses when we think we have a match.

// ------------------------------------------------------
// atstat.cpp
//
#include "stdafx.h"
#include "dmext.h"

class AtStatCmd : public SearchCommand
{
public:
    virtual void ShowResults(int totalHits);
    virtual bool HandleMatch(ULONG64 offset);
    AtStatCmd();
};

AtStatCmd::AtStatCmd()
{
    g_Ext->Out("Searching for ArbitraryTypes...\n");
    g_Ext->Out("--------------------------------------------\n");
}

bool AtStatCmd::HandleMatch(ULONG64 offset)
{
    g_Ext->Out("%08I64x\n", offset);
    return true;
}

void AtStatCmd::ShowResults(int totalHits)
{
    g_Ext->Out("--------------------------------------------\n");
    g_Ext->Out("Found %d total instances.\n\n", totalHits);
}

EXT_COMMAND(atstat,
   "Displays statistics about ArbitraryType instances in memory.",
   "{;e;The MethodTable for SampleApp.ArbitraryType.}")
{
    ULONG64 mt = this->GetUnnamedArgU64(0);
    this->Search<AtStatCmd>(mt);
}

The !atstat command is defined using the EngExtCpp framework's macro; this will automatically parse any parameters and provide debugger help. Look in the engextcpp.hpp header for the definition of this macro–I'm not even going to try to explain it here. As you can see, I've simplified matters by assuming that we can just retrieve the MethodTable for our type by some other means and provide it to the extension.

Here's some output of what we have finished so far:

0:000> .load C:\src\samples\dmext\dmext\objfre_wnet_x86\i386\dmext ;
0:000> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\SOS
0:000> !name2ee sampleapp!SampleApp.ArbitraryType
Module: 00912c14 (SampleApp.exe)
Token: 0x02000003
MethodTable: 009131b0
EEClass: 00911410
Name: SampleApp.ArbitraryType

0:000> !atstat 00912c14
Searching for ArbitraryTypes...
--------------------------------------------
Searching 00000000 to 7fffffff.
0012f560
0012f72c
00167628
0016832c
001685f4
0016860c
00168624
0016abec
0016ae28
009101c0
009101e8
009111f8
00911268
009112d0
00911414
00911478
00913058
00913118
009131c4
00c316b8
--------------------------------------------
Found 20 total instances.

In the next post, we'll work on trimming down false positives and accomplishing what we set out to do: showing statistics about the "colors" of the ArbitraryTypes.