Posts Tagged ‘Debugging’

Debugger Feature Request

Mike Stall has had a recent string of posts about debugger features, which got me thinking about what I waste time doing in debuggers these days.

Omniscient debuggers would be great, but what I would really like is a memory window that I can annotate. I don't care if this makes it into Visual Studio or WinDbg, but it would save me time that I currently waste scribbling on notepads.

I should be able to stick some comments in the memory window, and they should stay at that address for the extent of the debugger run. This is what I mean:

Visual Studio memory window with annotation

In this case, since my memory window is @ESP, I should be able to see my annotations being pushed down on the stack as I step.

I realize maybe one percent of developers could use this, but dammit, we're the coolest one percent.

Psychic Debugging: Random Crashes

Raymond Chen has had an entry or two about "Psychic Debugging." This skill is a lot like everyday psychic abilities by virtue of being really just intuition and guesswork, but dissimilar in that it produces results that benefit society.

Today one of the developers was working on a weird problem he found in his development environment. He could test his functionality once (it's an ASPX page), but the second time he tried to use it there was usually a "random" exception on the page that didn't make any sense. Sometimes, instead of an exception, his worker process just crashed. He also told me that he was trying to use a new API that I knew mapped to a third-party COM object.

This is where the mystical powers came in handy. I tried running his page once, then running a different page–same weird exceptions and crashes. It was here that I started to suspect that the heap was being corrupted by the COM library.

(Let me quickly explain what heap corruption is: suppose I have two objects, X and Y, next to one another in memory. Now suppose I make a mistake writing my program and accidently write beyond the end of object X. I've now screwed up object Y, but we might not know about it until we try to use object Y. We might not try to use object Y for quite a while, so this makes the real cause of the bug awfully difficult to find. To make matters worse, the next time we run the program our bug writing to object X might scribble all over a completely different object that will crash the program somewhere else.)

On a hunch I attached a debugger after the first load of the page. When the exception occurred, I ran the !verifyheap command in the SOS extension. This reported that there was a corrupt object on the heap. I ran the same test a few times to confirm the theory.

If you're lucky, you might see a debug break like this in one of the Windows Debuggers as soon as the heap corruption occurs:

HEAP[heapcorrupt.exe]: Heap block at 001AD700 modified at 001AD780
  past requested size of 78
...
ntdll!DbgBreakPoint:

This is just a debug break, so it won't tear down the process immediately. Depending on the application you're working on the real problems could start much later, long after the offending code has finished executing.

All I can say is that it's a goddamned miracle that the developer was doing a good job testing his functionality and noticed this. The mayhem that would have ensued if this made it to QA would have made it far more difficult to track down the source of the problem. The happy ending here is that the vendor has a fully-managed version of their component that we can switch to–hopefully that will be more difficult for them to screw up.

Quick Debugger Tips

In my javascript debugging post I showed a stack trace and mentioned only that it seemed to be running javascript code. Here it is:

0:009> ~0k
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
0012f97c 6009db8b js3250!JSLL_MinInt+0x45e2
0012f9c8 6009cdd9 js3250!js_GetSrcNoteOffset+0x5358
0012f9f0 600c7c01 js3250!js_GetSrcNoteOffset+0x45a6
0012fa08 600856b5 js3250!js_GetScriptLineExtent+0x39e6
0012fa28 600b036f js3250!JS_NewStringCopyZ+0x44
0012fa40 600b3e93 js3250!js_FindProperty+0x26c5

This is a good stack to use to point out a few things about debugging without symbols.

If you're used to stepping through debug code in Visual Studio, your first instict may be to trust the stack printed out here. This would be wrong, for a few reasons.

First, there's the line telling us: "Stack unwind information not available. Following frames may be wrong." Stop me if I've covered this before, but it probably means that we're looking at optimized code. Optimized stack frames often have the pointers to each previous stack frame left out (you'll see this called "frame pointer omission," or FPO) which makes walking the stack much more difficult for the debugger.

Second, notice the offsets following the function names - most are very high. For example,

js3250!js_GetSrcNoteOffset+0x5358

The +0×5358 indicates that the return address is to a point more than 20KB after the export function given. 20KB is a lot of machine code, so this almost definitely an internal function call and not a return address into the body of js_GetSrcNoteOffset.

I say "almost definitely" instead of "definitely" with good reason. I have personally seen some VB6 functions several thousands of lines long that would certainly generate more than 20KB of machine code. But I guess if you're dealing with that then you have enough of your own problems without worrying about symbol subtleties in the Windows debuggers.

So the debugger here is showing us the closest symbol to the return address available, which turns out to not be very close at all.

What can we say about this thread's stack, given just this information? Only that it is very likely to be running javascript code. Not much else.

Conventional Javascript Debugging is for Wimps

Recently, Firefox has been maxing out my CPU "when I have more than three tabs open." As is typical with bug reports from users, this one is very poorly worded and essentially useless. It turns out that that was not a good explanation of what was going on. I messed around a little more and figured out that the problem had these characteristics:

  • The CPU was maxed out on any ESPN.com page.
  • Thread #0 was the offending thread.
  • Firefox's working set increased steadily while looking at ESPN.
  • Firefox eventually crashes with an A/V if the browser is left on ESPN.

Firefox 1.5 CPU Spike

I'm not sure if this is some strange interaction of my specific combination of extensions and settings, or if this happens for all Firefox users. This seems like a possible security vulnerability, but I can't say one way or the other.

I suppose I could have downloaded and compiled the source code for Firefox to figure things out, but that was way more effort than I felt like giving. I did managed to debug and fix this issue for myself–without source or symbols–which I think makes for an interesting writeup.

Since the problem was on every page on the website, and continued after the page finished loading, I assumed it had to be a javascript or Flash problem (both of which are abused by ESPN to an irritating degree). I don't even have the Flash plugin for Firefox installed, so that narrowed the scope of my investigation.

The first thing I did was try to debug the Javascript normally; this was hampered by several factors:

  • The Javascript debugger for Firefox, Venkman, hasn't officially come out for version 1.5 yet. I've complained about broken extensions before, but let me just reiterate how stupid and unprofessional I think broken extensions are.
  • Although this kind person has done his own workaround for 1.5, it was either brought to its knees by the ESPN site or was broken by something else.

To summarize the story to this point, I had a (likely) javascript problem on a page I visit very frequently, coupled with a browser bug that maxed the CPU and made the issue difficult to diagnose through normal means.

I attached to Firefox.exe in Windbg and started randomly breaking in and checking on thread zero while the spike was in progress. Sometimes, frames in the js3250 module were on the stack:

0:009> ~0k
ChildEBP RetAddr
WARNING: Stack unwind information not available. Following frames may be wrong.
0012f97c 6009db8b js3250!JSLL_MinInt+0x45e2
0012f9c8 6009cdd9 js3250!js_GetSrcNoteOffset+0x5358
0012f9f0 600c7c01 js3250!js_GetSrcNoteOffset+0x45a6
0012fa08 600856b5 js3250!js_GetScriptLineExtent+0x39e6
0012fa28 600b036f js3250!JS_NewStringCopyZ+0x44
0012fa40 600b3e93 js3250!js_FindProperty+0x26c5

The information for js3250 confirms (if the module name and the names of the export functions weren't enough for you) that it is the Mozilla Javascript implementation.

0:009> lmv m js3250
start    end        module name
60080000 600e9000   js3250     (export symbols)
   C:\Program Files\Mozilla Firefox\js3250.dll
    Loaded symbol image file:
        C:\Program Files\Mozilla Firefox\js3250.dll
    Image path: C:\Program Files\Mozilla Firefox\js3250.dll
    Image name: js3250.dll
    Timestamp:        Fri Nov 11 20:05:34 2005 (43753FDE)
    CheckSum:         00073A1C
    ImageSize:        00069000
    File version:     4.0.0.0
    Product version:  4.0.0.0
    File flags:       0 (Mask 3F)
    File OS:          10004 DOS Win32
    File type:        2.0 Dll
    File date:        00000000.00000000
    Translations:     0409.04e4
    CompanyName:      Netscape Communications Corporation
    ProductName:      NETSCAPE
    InternalName:     JS3240
    OriginalFilename: js3240.dll
    ProductVersion:   4.0
    FileVersion:      4.0
    FileDescription:  Netscape 32-bit JavaScript Module
    LegalCopyright:   Copyright Netscape Communications. 1994-96
    LegalTrademarks:  Netscape, Mozilla

All of this is reasonably good evidence that javascript is the right place to start. I took a look at the exports for js3250 ("x js3250!*" in Windbg) — it appeared as though this module was implemented as a few dozen C-style exports.

I thought that a logical thing to look for was an "execute a javascript function" function of some kind, and there were a few exports named some variation of "Call function."

0:009> x js3250!*Call*Function*
6008541f js3250!JS_CallFunction (<no parameter info>)
60085464 js3250!JS_CallFunctionName (<no parameter info>)
600854ff js3250!JS_CallFunctionValue (<no parameter info>)

I set a breakpoint on all of these.

0:009> bm js3250!*Call*Function*
  1: 6008541f @!"js3250!JS_CallFunction"
  2: 60085464 @!"js3250!JS_CallFunctionName"
  3: 600854ff @!"js3250!JS_CallFunctionValue"

After resuming the program, I started hitting these breakpoints constantly. I did a sanity check and made sure that this didn't occur on a cleaner site (google), and this was the case. I didn't have any immediate success figuring out what script functions were being called. You can dig up the source for these API's if you like, but I'm guessing that the script is already processed into different data structures by the time the javascript engine gets here.

I looked at some of the stacks when these functions were called, and I noticed this pretty far down:

0012fc7c 00534e18 js3250!JS_EvaluateUCScriptForPrincipals+0x70

I hoped this would lead me to some bits of javascript pointed to from nearby locations on the stack, so I set a breakpoint there. When it was hit, I started dumping out strings (using "dda @esp" and and "ddu @esp") and found this:

http://espn-att.starwave.com/motion/fsp/fsp.js

As an only slightly educated guess, I used the Adblock extension to block this script.

After reloading the page, the problem was gone! There don't seem to be any site-breaking problems associated with turning this script off. This is all or part of the "ESPN Motion" business that tries to display sound and video on the site. Hey, ESPN: this is a terrible idea in the first place. Your website shouldn't start yelling at me when I visit it. It's no excuse for Firefox to A/V, but I thought everyone with a three-digit IQ stopped doing this in 1996.

Using LeakDiag to Debug Unmanaged Memory Leaks

I have been getting quite a few google hits for search strings like this:

unmanaged memory leaks windbg

It's the second-most-common combination of search terms, trailing "hank goldberg picks" by a hell of a lot. I don't think the searches are coming from the same demographic. Anyway, I thought I would write up one of the easiest techniques that I'm aware of for debugging a memory leak in unmanaged code. This one doesn't touch WinDbg, but rather uses a few other Microsoft PSS tools specifically built for this purpose.

For this example, I fired up the MFC wizard and created a new scratch application. To that I added some logic to leak roughly 2K of memory every tenth of a second.

#include <vector>

using namespace std;

BEGIN_MESSAGE_MAP(CMainFrame, CFrameWnd)
    ON_WM_CREATE()
    ON_WM_TIMER()
END_MESSAGE_MAP()

// Incredibly stupid memory leak
void CMainFrame::OnTimer(UINT_PTR nIDEvent)
{
    UNREFERENCED_PARAMETER(nIDEvent);

    vector<int>* pvec = new vector<int>();
    for(int i = 0; i < 500; i++)
    {
        pvec->push_back(i);
    }
}

int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct)
{
    // ...
    this->SetTimer(1, 100, NULL);
}

Supposing we had not done this on purpose, it would be clear from looking at the process in Perfmon that we were dealing with a memory leak. The Private Bytes counter for this process grows steadily while the application is doing nothing in particular.

Perfmon output for a program leaking memory

The tools that we'll be using to look at this problem are LeakDiag, LDParser, and LDGrapher. You can download them all from ftp://ftp.microsoft.com/PSS/Tools/ (LeakDiag and LDParser are bundled together).

After opening the problem application, start LeakDiag.exe. In Tools->Options, we want to increase the stack depth to the maximum (32). The reason for this is because in an application written in any medium to high level language you are typically pretty far from the actual call to malloc when you are leaking memory.

Adjusting the stack depth

There are a few options available (on the main dialog) for the specific allocator to monitor. Several may generate hits for the same leak (The CRT malloc will ultimately call the NT APIs, for example), but try to pick the one that most describes your application. Click Start and create a few logs as the leak manifests itself. In the MFC application I wrote, the leak is occurring constantly. Your application may need to run for many hours before you can get any worthwhile data.

LeakDiag

After doing this, you can use the LDParser application to open up one of the log files. You'll see something like this:

LDParser

The upper-right pane is a list of unique stack traces when the specified allocator was invoked. The list should be sorted by the total amount of data allocated by each. The bottom pane shows the stack trace for the active stack ID. In my case, the stack allocating the most memory is my intentional leak (notice that CMainFrame::OnTimer is in frame ten).

If your situation is more complicated than mine, as it almost certainly will be, there is one other tool you should be aware of. LDGrapher can take a set of logs generated by LeakDiag and generate a set of graphs of allocations over time. Here is the output of my application over a few minutes:

LDGrapher

Each stack ID is represented by a line on the graph. Hopefully, this will help some of you debugging unmanaged memory leaks.

Making a DataSet Read-Only

Let's say you're writing a component that makes a DataSet available to a number of clients. The DataSet is expected to persist in your application for a while, and be used in code written by many different developers.

You could carefully craft an email explaining that,

Hey everybody, this DataSet is shared. Please don't change the data in it for the purposes of your own piece of the application. That could create some odd bugs that would be really hard to track down. If you need to do something like that, please make a copy of the DataSet first.

I suppose you could do that, if you wanted to waste your time. You could always create a copy of the DataSet before handing it over to anybody, but that may introduce some unnecessary performance issues if the DataSet is large. It feels a little like punishing everyone for the bad behavior of a few. If you wanted to actually prevent such problems, you could write a clever class like this that acts like a "lock" on the DataSet.

/// <summary>
/// Class that ensures that clients keep their pesky hands
/// off of a particular dataset.
/// </summary>
internal class DataSetLock
{
    [Conditional("DEBUG")]
    public static void Install(DataSet ds)
    {
        if (ds == null)
        {
            throw new ArgumentNullException("ds");
        }
        EventHandler thrower = delegate(object sender, EventArgs e)
        {
            string msg = string.Format(
                "You can't change: {0}.", ds.DataSetName);
            throw new InvalidOperationException(msg);
        };

        foreach (DataTable t in ds.Tables)
        {
            t.RowChanging += new DataRowChangeEventHandler(thrower);
            t.RowDeleted += new DataRowChangeEventHandler(thrower);
            t.ColumnChanging += new DataColumnChangeEventHandler(thrower);
            t.TableClearing += new DataTableClearEventHandler(thrower);
            t.TableNewRow += new DataTableNewRowEventHandler(thrower);
        }
    }
    private DataSetLock() { }
}

The "lock" is really an anonymous method that we attach, willy nilly, to all of the change events in the DataSet. We use lexical closure to add the DataSet name passed in to the exception message dozens of caffeine-addled developer brains will be processing shortly. [Note: if you don't think it's lexical closure, go talk to this guy. He's really into the topic.]

I put the ConditionalAttribute on the Install method as well, so as long as you have reasonably good testing coverage this check will just be compiled away in retail code.

Here's a quick usage example:


static void Main(string[] args)
{
    DataSet ds = new DataSet();
    ds.DataSetName = "My Data Set";
    DataTable t = ds.Tables.Add();
    t.Columns.Add("foo", typeof(string));

    DataSetLock.Install(ds);

    // Pow!
    t.LoadDataRow(new object[] { "asdf" }, true);
}

This prints:

System.InvalidOperationException: You can't change: My Data Set.

Enjoy.

Monad Script to Scrub Perfmon Output

Here's the scenario: I had about eight hours of perfmon output, for several hundred counters sampled once per second. I wanted to put this into a database already containing the parsed IIS logs from the same machine, to try to correlate some of the resource utilization peaks with URL's.

The only snag was that Microsoft's LogParser couldn't handle the CSV files that perfmon generated. The input lines were far too long for it.

The call is from heroism. Will you accept the charges?

The solution I came up with was to hack together an MSH script that processed the files. It's not pretty (because it didn't have to be), but here it is.


#
# trim-perflog.msh
#   This pares down a set of enormous perfmon logs to a
#   size that can be managed by the Microsoft LogParser.
#
#   This finds the column indices that we're interested in,
#   then runs through each line in the input CSV file(s)
#   and pulls them out.
#

# The counters in the perfmon trace that we're interested in.
#
$counters =
"(PDH-CSV 4.0) (Eastern Standard Time)(300)",
"\\machine\.NET CLR Exceptions(w3wp)\# of Exceps Thrown / sec",
"\\machine\.NET CLR LocksAndThreads(w3wp)\Contention Rate / sec",
"\\machine\.NET CLR Memory(w3wp)\% Time in GC",
"\\machine\.NET CLR Remoting(w3wp)\Remote Calls/sec",
"\\machine\Process(w3wp)\Page Faults/sec",
"\\machine\Process(w3wp)\% Processor Time",
"\\machine\Process(w3wp)\% User Time",
"\\machine\Process(w3wp)\% Privileged Time",
"\\machine\Process(w3wp)\Private Bytes";

# Prettier names for the counters.
#
$columnAlias = "time", "exceptionsPerSecond",
"contentionPerSecond", "pctTimeInGC",
"remoteCallsPerSecond", "pageFaultsPerSecond",
"pctProcessorTime", "pctUserTime",
"pctPrivelegedTime", "privateBytes";

# Returns an array that contains the index of each of the
# $counters in the csv.
#
function getPerflogIndices
{
    param([System.String]$columns)

    $tokens = $columns.Split(',');
    for($i = 0; $i -lt $tokens.Length; $i++)
    {
        $t = $tokens[$i];
        $t = $t.Substring(1, $t.Length-2);
        $index = -1 ;
        for($j = 0; $j -lt $counters.Length; $j++)
        {
            if($t -eq $counters[$j])
            {
                $index = $j;
            }
        }
        if($index -ge 0)
        {
            write-object $i;
        }
    }
}

# writes a CSV line using the values in $array.
#
function write-array
{
    param([System.IO.StreamWriter]$writer, $array)
    $j = 0;
    $last = $array.Length;
    foreach($a in $array)
    {
        $writer.Write($a);
        if($j -ne $last)
        {
            $writer.Write(",");
        }
        $j++;
    }
    $writer.WriteLine();
}

# Writes a single line to the result CSV file.
# This requires:
#    $writer  - The output stream.
#    $ln      - A single line read from
#                the input stream.
#    $indices - The column indices of the
#                  subject perfmon counters.
#
function write-csv-line
{
    param([System.IO.StreamWriter]$writer,
       [System.String]$ln, [System.Object[]]$indices)

    $vals = $ln.Split(',');
    $j = 0;
    $last = $indices.Length - 1;
    foreach($i in $indices)
    {
        $writer.Write($vals[$i]);
        if($j -ne $last)
        {
            $writer.Write(",");
        }
        $j++;
    }
    $writer.WriteLine();
}

# Trims down the input CSV file. If $names is True,
# writes the names of the columns as the first line
# in the output.
function trim-perflog
{
    param([System.IO.FileInfo]$in, [System.Boolean]$names)

    $r = $in.OpenText();
    $ln = $r.ReadLine();
    $indices = getPerfLogIndices $ln;

    $wr = [System.IO.File]::AppendText($out);
    if($names)
    {
        write-array $wr $columnAlias;
    }
    while($r.Peek() -ge 0)
    {
        write-csv-line $wr $r.ReadLine() $indices;
    }
    $wr.Close();
}

$out = "c:\perflogs\output.csv";

trim-perflog c:\perflogs\prod_000005.csv True
trim-perflog c:\perflogs\prod_000006.csv False
trim-perflog c:\perflogs\prod_000007.csv False

The Debugger Extension, Part 6 - Scanning Threads

The Debugger Extension

We already have an extension that might be pretty useful in some scenarios, but another common situation is determining what a particular thread is doing. You might want to look at instances of a particular type on the stack of a thread that is maxing out your CPU, or you might want to look at two or more threads that appear to be deadlocked.

We can get something like this out of the extension we have written already if we alter it to search only the stack for the current thread. How do we do that? On an x86 machine, the stack looks something like this:

Thread Stack

Finding one end of the stack region is very easy. The top of the stack (but the "bottom" address, since the stack grows down) should always be in the ESP register. To get the base of the stack we need to be able to read an NT structure called the Thread Environment Block, or TEB.

The TEB is defined as follows in the Platform SDK.

typedef struct _TEB {
    BYTE Reserved1[1952];
    PVOID Reserved2[412];
    PVOID TlsSlots[64];
    BYTE Reserved3[8];
    PVOID Reserved4[26];
    PVOID ReservedForOle;
    PVOID Reserved5[4];
    PVOID TlsExpansionSlots;
} TEB, *PTEB;

We're all ecstatic that the TEB is undocumented when this allows the kernel team to freely implement new features, I guess, but this is no help to us right now. This is closer to what the TEB header really looks like.

typedef struct tagTEB_INTERNAL
{
    DWORD dwExceptionList;
    DWORD dwStackBase;
    DWORD dwStackLimit;
    DWORD lpTIB;
    DWORD lpFiberInfo;
    DWORD lpUserPointer;
    DWORD lpSelf;
    DWORD lpEnvironmentPointer;
    DWORD dwProcessId;
    DWORD dwThreadId;
    DWORD dwActiveRPCHandle;
    DWORD lpPEB;
    DWORD dwLastError;
    // More fields follow but are not included here.
} TEB_INTERNAL, *PTEB_INTERNAL;

I took that from chapter six of Microsoft Windows Internals by Mark Russinovich. He's done many useful and awe-inspiring things besides discovering the Sony Rootkit DRM. The DbgEng SDK exposes a method to us (IDebugSystemObjects::GetCurrentThreadTeb) that makes it trivial to write a function to read in this structure in the debugger (download the source if you want to see it).

We can now write a templated search function (much like those we wrote in part four) to search only the current stack. Since the stack will contain handles/pointers to the objects, we'll also need a function that searches with a level of indirection.

// Performs a range search on the current
// thread's stack.
//
template<class search_command>
inline void SearchStack(ULONG64 pattern)
{
    TEB_INTERNAL teb = {0};
    HRESULT hr = GetCurrentTEB(&teb);
    if( FAILED(hr) )
    {
        Out("Could not retrieve the TEB.\n");
        return;
    }

    ULONG64 esp = 0;
    hr = m_Registers->GetStackOffset(&esp);
    if( FAILED(hr) )
    {
        Out("Could not read the stack pointer.\n");
        return;
    }
    Out("Thread %d:\n", teb.dwThreadId);

    // Thunk the dword to a 64-bit integer.
    // Otherwise we'll take the previous dword
    // field in the TEB structure with us into
    // the Search call.
    ULARGE_INTEGER li = { teb.dwStackBase, 0L };
    this->SearchPointers<search_command>(
         pattern,
         esp,
         li.QuadPart);
}

// Searches for the pattern with a level
// of indirection.
//
template<class search_command>
inline void SearchPointers(ULONG64 pattern,
     ULONG64 start, ULONG64 end)
{
    search_command sc;
    int hits = 0;
    for(ULONG64 offs = start;
        offs <= end;
        offs += m_PtrSize)
    {
        ULONG64 ptr = 0;
        HRESULT hr = m_Data->ReadPointersVirtual(
            1L, offs, &ptr);
        ULONG64 ptrVal = 0;
        hr = m_Data->ReadPointersVirtual(
            1L, ptr, &ptrVal);
        if( hr == S_OK && ptrVal == pattern )
        {
            if( sc.HandleMatch(ptr) )
            {
                ++hits;
            }
        }
    }
    sc.ShowResults(hits);
}

The EngExtCpp framework makes it very easy to add a switch to enable searching with this method:

0:000> !atstat /?
!atstat [/s] <The MethodTable for SampleApp.ArbitraryType.>
  <The MethodTable for SampleApp.ArbitraryType.>
  /s - Searches only the current stack.
Displays statistics about ArbitraryType instances in memory.

That lets us build some cool composite commands in WinDbg like this:

0:000> ~*e!atstat /s 009131b0
Thread 2624:
Searching for ArbitraryTypes...
--------------------------------------------
01272bf8 : Purple
01272bf8 : Purple
01272bdc : Blue
01272bf8 : Purple
--------------------------------------------
Found 4 total instances.
Totals:
  Blue: 1
  Purple: 3

Thread 1924:
Searching for ArbitraryTypes...
--------------------------------------------
--------------------------------------------
Found 0 total instances.
Totals:

Thread 2096:
Searching for ArbitraryTypes...
--------------------------------------------
--------------------------------------------
Found 0 total instances.
Totals:

I think that's where we'll leave it for now. Not bad for a day of work or so, when you consider that we're empowered to crank out similar utilities in no time at all. I hope you've enjoyed the debugger extension series.

The Debugger Extension, Part 5 - Manipulating Managed Types

The Debugger Extension

In the last post in this series, we succeeded in writing a working extension that searched memory for instances of a particular type. Trouble is, we haven't done anything useful yet. We've merely duplicated a very small subset of the functionality offered by SOS's !DumpHeap command, and poorly at that.

In the problem setup, we said we wanted to show statistics about a particular property of these instances–for the purposes of this example, we're calling that their "Color." A sensible step in this direction would be to write some utility C++ code to accompany the Colors enumeration that we wrote earlier in C#.

// Some definitions that correspond to the managed
// SampleApp.Colors enum.
typedef ULONG Color;
const Color COLOR_RED = 0;
const Color COLOR_GREEN = 1;
const Color COLOR_BLUE = 2;
const Color COLOR_PURPLE = 3;
const Color MAX_COLOR = COLOR_PURPLE;

PCSTR g_szColorNames[] = { "Red", "Green",
    "Blue", "Purple" };

bool IsColor(Color c)
{
    if( c <= MAX_COLOR )
    {
        return true;
    }
    return false;
}

PCSTR ColorName(Color c)
{
    if( !IsColor(c) )
    {
        return NULL;
    }
    return g_szColorNames[static_cast<int>(c)];
}

Part of the point of this series has been to develop a framework that deals with instances of .NET objects. To that end, we should try to write a generic base class that loads a managed instance in the debuggee into the debugger's process. This is my implementation of such a class.

// ------------------------------------------------------
// mtypes.h
//        Some base classes for dealing with instances
//        of managed objects.
//
#pragma once

template<class object_fields>
class ManagedInstance
{
protected:
    object_fields m_Fields;
    ULONG64 m_offset;
    bool m_valid;

    // Can be used by derived classes can to refer
    // to this class.
    typedef ManagedInstance<object_fields> base_t;

    // Constructor - pass the offset of the managed
    // object. Check the result of IsValid() before
    // using an instance derived from this class.
    ManagedInstance(ULONG64 offset) : m_Fields()
    {
        // Load the data for the fields from the
        // debugee process / memory dump
        IDebugDataSpaces* pData = g_Ext->m_Data;
        m_offset = offset;
        ULONG read = 0L;
        HRESULT hr = pData->ReadVirtual(
            m_offset,
            reinterpret_cast<void*>(&m_Fields),
            sizeof(object_fields),
            &read);
        m_valid = SUCCEEDED(hr) &&
            (read == sizeof(object_fields));
    }

    virtual ~ManagedInstance() {}

public:
    // Returns true if the object was successfully
    // read from the debugee. Doesn't validate that
    // it actually is a managed object of the desired
    // type, but overridden implementations should
    // do this.
    virtual bool IsValid() { return m_valid; }
};

// This can be used as a base class for classes used
// as the object_fields template parameter.
class ObjectFields
{
public:
    ULONG pMethodTable;
};

The template parameter for the ManagedInstance class takes a POD ("plain old data") type that should just list the fields in the instance. The constructor for ManagedInstance loads the data at the specified offset as those fields. I declared a virtual function that indicates whether or not the managed instance is valid. In this base class, all we can really say about that is whether or not we could read the data at the provided address.

I've also defined a base class for the fields of a managed object, and put the MethodTable address in it. Given these classes, it's not a lot of work to write the implementations for ArbitraryType.

// Represents the fields of a
// SampleApp.ArbitraryType instance.
class ArbitraryTypeFields
     : public ObjectFields
{
public:
    Color col;
    ULONG id;
};

// Represents a single ArbitraryType instance.
class ArbitraryType :
    public ManagedInstance<ArbitraryTypeFields>
{
public:
    ArbitraryType(ULONG64 offset)
        : base_t(offset) {}
    virtual bool IsValid();
    Color GetColor() { return m_Fields.col; }
};

// Returns true if the loaded data is a valid
// ArbitraryType instance.
bool ArbitraryType::IsValid()
{
    if( base_t::IsValid() &&
        IsColor(m_Fields.col) )
    {
        return true;
    }
    return false;
}

In the last post, the only criteria we used for finding ArbitraryType instances was that we had found a INT_PTR containing the address of its MethodTable. That's obviously going to result in false positives, because the CLR's execution engine will certainly have this pointer in several places in its own internal data structures. We can do a little better now by making sure that the Color field is within the range of expected values. While we're changing our HandleMatch function to implement this, we'll add an STL map to the mix to keep track of the colors we find.

class AtStatCmd : public SearchCommand
{
protected:
    typedef map<Color, int>  CountMap_t;
    CountMap_t m_counts;

public:
    virtual void ShowResults(int totalHits);
    virtual bool HandleMatch(ULONG64 offset);
    AtStatCmd();
};

bool AtStatCmd::HandleMatch(ULONG64 offset)
{
    ArbitraryType at(offset);
    if( at.IsValid() )
    {
        Color c = at.GetColor();
        g_Ext->Out("%08I64x : %s\n",
           offset, ColorName(c));
        m_counts[c]++;
        return true;
    }
    return false;
}

The output of the extension in WinDbg now looks like this:

    0:000> !atstat 009131b0
    Searching for ArbitraryTypes...
    --------------------------------------------
    Searching 00000000 to 7fffffff.
    009100cc : Red
    009130e0 : Red
    01271ce4 : Red
    01271d00 : Blue
    01271d1c : Red
    01271d38 : Blue
    01271d54 : Red
    --------------------------------------------
    Found 7 total instances.
    Totals:
      Red: 5
      Blue: 2

Incidentally, this is debugging the same sample code that, in the previous post, was declared to have 20 instances in memory. As you can see, adding some basic validation reduced the number of bogus hits considerably. A more complex object–one you might actually use, I suppose–could have an even smaller incidence of errors.

There's one obvious problem with this that I can think of, and that is the fact these objects are not necessarily alive (rooted) on the GC heap. They could very well be collected and sitting in freed memory. In my case, this is not a concern since I am interested mostly in debugging leaked memory. This may be an issue for other users, however.

Since the extension we set out to build is basically complete, I'll post the code now even though I have one more feature in mind for the next post. You can download the code here.

The Debugger Extension, Part 4 - Searching Memory

The Debugger Extension

Now that we know how to solve our problem conceptually, we can put pen to paper. Metaphorically speaking, I suppose. As I said in the last post, our strategy will be to search memory for INT_PTRs matching the MethodTable for our ArbitraryType. When we find matches, we'll perform some further validation to reduce the likelihood of false positives.

It's not unreasonable to assume that if we're successful in writing an extension to look at this class, we might want to do something like it again in the future. So let's define an interface for commands that search through memory.

// ------------------------------------------------------
// SearchCommand.h
//        Defines the interface for extension commands
//        that search through memory.
//
#pragma once

class SearchCommand
{
public:
    // Called whenever the search pattern is encountered
    // at the provided offset. The method should return
    // true if the offset is a hit.
    virtual bool HandleMatch(ULONG64 offset)=0;

    // Called when the search is finished. The parameter
    // will contain the total number of matches found.
    virtual void ShowResults(int totalHits)=0;
};

This should give us some flexibility later on if we need it. We can also abstract the process of searching through memory. The DbgEng API that is available to us is the IDebugDataSpaces interface. This defines a SearchVirtual function, which we'll use to scan for the ArbitraryType's MethodTable. This is its definition.

HRESULT
  IDebugDataSpaces::SearchVirtual(
    IN ULONG64  Offset
    IN ULONG64  Length
    IN PVOID  Pattern
    IN ULONG  PatternSize
    IN ULONG  PatternGranularity
    OUT PULONG64  MatchOffset
    );

To make our algorithm generic, we'll add a templated function to our extension class.

// ------------------------------------------------------
// dmext.h
//
#pragma once
#include "engextcpp.hpp"
#include "searchcommand.h"

class EXT_CLASS : public ExtExtension
{
protected:

   // Does a range search for the pattern, keeps track of the hits, and
   // calls methods matching the SearchCommand interface on an instance
   // of the search_command parameter.
   //
   template<class search_command>
   inline void Search(ULONG64 pattern, ULONG64 start, ULONG64 end)
   {
      if( start > end )
      {
         Err("The start cannot be after the end.\n");
         return;
      }

      search_command sc;
      Out("Searching %08I64x to %08I64x.\n", start, end);

      HRESULT hr = 0;
      int hits = 0;
      ULONG64 offs = start;
      do
      {
         hr = m_Data->SearchVirtual(offs,
            end - offs,
            &pattern,
            this->m_PtrSize,
            1,
            &offs);
         if( hr == S_OK )
         {
            if( sc.HandleMatch(offs) )
            {
               ++hits;
            }

            // Search again, starting at the the next
            // pointer-sized location.
            offs += m_PtrSize;
         }
       }
       while( hr == S_OK );
       sc.ShowResults(hits);
    }

    // Shortcut for an x86 without the /3GB switch.
    template<class search_command>
    inline void Search(ULONG64 pattern)
    {
       this->Search<search_command>(pattern, 0, 0x7fffffff);
    }

 public:
    EXT_CLASS();
    EXT_COMMAND_METHOD(atstat);
};

I also added a shortcut function that searches all of the virtual memory that is available to user mode. This function assumes that we're debugging on an Intel x86 machine, and that the process is not LARGEADDRESSAWARE–that is to say, it can't make use of more that 2 gigabytes of virtual memory.

(A brief aside: although I'm using ULONG64 addresses and other conventions, I'm making no sincere attempt to ensure that this extension will work properly with a 64-bit debuggee. That much should be obvious from my last shortcut. Where I can, I will try to make life easy for someone writing a port.)

We now have enough framework to implement the skeleton of our extension command. For now, we'll just spit out addresses when we think we have a match.

// ------------------------------------------------------
// atstat.cpp
//
#include "stdafx.h"
#include "dmext.h"

class AtStatCmd : public SearchCommand
{
public:
    virtual void ShowResults(int totalHits);
    virtual bool HandleMatch(ULONG64 offset);
    AtStatCmd();
};

AtStatCmd::AtStatCmd()
{
    g_Ext->Out("Searching for ArbitraryTypes...\n");
    g_Ext->Out("--------------------------------------------\n");
}

bool AtStatCmd::HandleMatch(ULONG64 offset)
{
    g_Ext->Out("%08I64x\n", offset);
    return true;
}

void AtStatCmd::ShowResults(int totalHits)
{
    g_Ext->Out("--------------------------------------------\n");
    g_Ext->Out("Found %d total instances.\n\n", totalHits);
}

EXT_COMMAND(atstat,
   "Displays statistics about ArbitraryType instances in memory.",
   "{;e;The MethodTable for SampleApp.ArbitraryType.}")
{
    ULONG64 mt = this->GetUnnamedArgU64(0);
    this->Search<AtStatCmd>(mt);
}

The !atstat command is defined using the EngExtCpp framework's macro; this will automatically parse any parameters and provide debugger help. Look in the engextcpp.hpp header for the definition of this macro–I'm not even going to try to explain it here. As you can see, I've simplified matters by assuming that we can just retrieve the MethodTable for our type by some other means and provide it to the extension.

Here's some output of what we have finished so far:

0:000> .load C:\src\samples\dmext\dmext\objfre_wnet_x86\i386\dmext ;
0:000> .load C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\SOS
0:000> !name2ee sampleapp!SampleApp.ArbitraryType
Module: 00912c14 (SampleApp.exe)
Token: 0x02000003
MethodTable: 009131b0
EEClass: 00911410
Name: SampleApp.ArbitraryType

0:000> !atstat 00912c14
Searching for ArbitraryTypes...
--------------------------------------------
Searching 00000000 to 7fffffff.
0012f560
0012f72c
00167628
0016832c
001685f4
0016860c
00168624
0016abec
0016ae28
009101c0
009101e8
009111f8
00911268
009112d0
00911414
00911478
00913058
00913118
009131c4
00c316b8
--------------------------------------------
Found 20 total instances.

In the next post, we'll work on trimming down false positives and accomplishing what we set out to do: showing statistics about the "colors" of the ArbitraryTypes.