Posts Tagged ‘.NET’

Why is my application hanging in weird places?

Here's a problem:

  1. You have an application that's hanging permanently or temporarily.
  2. The hang does not occur in any synchronization code that you have written yourself.
  3. You are writing a windows forms application, or any kind of application using COM interop.

Let me present an educated guess as to what is happening. Note to people getting here by googling: there are actually many other reasons this might happen in addition to the one I am going to tell you about. I merely suggest that you spend more time thinking before being led too far down a rabbit hole by what I am about to write. Experience has taught me that people tend to lock on to one internet analysis (say, an unrelated KB article mentioning the same exception text) and waste a lot of time on it. None of the other sites seem to be nice enough to explicitly warn you about your own tendencies–I am, you should thank me.

With those disclaimers, here is my psychic debugging response:

  1. You are inadvertently calling a method on an STA object created on a different thread.
  2. That other thread is busy doing something else.

For people that managed (har har) to avoid COM development, or stayed completely in the insulated VB6 world, I will now explain briefly everything you really need to know about threading models in this, our fantastic futuristic age.

Methods on single threaded apartment (STA) objects are synchronized. The object receives all calls to its methods on the thread that owns it. This is achieved by creating a hidden window that receives messages for any calls made on those objects. Multi-threaded apartment (MTA) objects behave more or less like any .NET object you create: calls are not automatically synchronized, and you are responsible for doing any synchronization necessary yourself.

So in order for calls to STA objects to work correctly, the thread that owns them has to be free to run its message loop a sufficient percentage of the time. Running a computationally-intensive process on an STA thread is a great way to hang the other threads in your application, if you are careless or ignorant of where the STA calls are.

If you attach with Windbg and dump out the native stacks (~*k), you will probably see threads in calls to WaitOne that got there through functions named things like "SendReceive" above some interop marshalling code if this is indeed your problem. If you are lucky you might see a function called "GetToSTA," however, the stacks unfortunately do not always make things that easy.

This is one of those things that I have seen a bunch of times, but seems to be on the radar of an increasingly small number of people. It often takes me longer than it should to figure it out, too. If you are doing any kind of practical Windows development, you still need to understand the basics of STA/MTA. That's because any development that is "practical" will almost necessarily involve interoperating with old code.

Here's a related post that I wrote, about how STA objects can result in blocked finalizer threads.

Crazy Hacker Tricks

Let me be the first to say that this is probably a really bad idea, unless you are very desperate.

On the other hand, this was really fun to write.

Today I wrote some code that patches the first few bytes of user32!MessageBoxExW, in order to keep a pesky third-party library from showing a prompt when nobody was around to click on it. This overwrites the start of that function with a relative jump to one of my own, which then forwards to a managed function that logs what the message box was trying to tell us.

Actually, this was for a friend's program. I would probably shy away from doing this if I weren't only indirectly responsible.

void InstallHook()
{
    HMODULE hUser32 = LoadLibrary(L"user32.dll");
    PROC pMsgBox = GetProcAddress(hUser32, "MessageBoxExW");
    FailIf(pMsgBox == NULL);

    // Need to change the memory protection for the
    // (read/execute) page in user32.dll before we
    //write to it.
    MEMORY_BASIC_INFORMATION mbi = {0};
    VirtualQuery(pMsgBox, &mbi,
        sizeof(MEMORY_BASIC_INFORMATION));
    FailIf(!VirtualProtect(mbi.BaseAddress, mbi.RegionSize,
        PAGE_READWRITE, &mbi.Protect));

    // Copy in the jump instruction... you could of course
    // opt to not write this in assembly, but if we've gone this
    // far, what's the difference?
    __asm
    {
         // edi <- user32!MessageBoxExW
        mov edi, pMsgBox;              

        // write the opcode for JMP rel32
        mov byte ptr [edi], 0xe9;       

        // eax <- _MessageBoxEx, then
        // eax <- _MessageBoxEx - user32!MessageBoxExW
        mov eax, offset _MessageBoxEx;
        sub eax, edi;                   

        // One byte for jmp, four for the address.
        sub eax, 5;
        inc edi;
        stosd;
    };

    // Restore the old protection
    DWORD oldProtect = 0;
    FailIf(!VirtualProtect(mbi.BaseAddress,
        mbi.RegionSize, mbi.Protect, &oldProtect));
}

With that in place it's not hard to erect some managed scaffolding around it, and nobody needs to know your secret. Until they try to run it on an x64, I guess:

static void Main(string[] args)
{
    HookMonkey.Install(delegate(MessageBoxInfo mbi)
    {
        Console.WriteLine(mbi.Text);
        return DialogResult.OK;
    });
    MessageBox.Show("foo bar");
}

You can download some sample code here.

First Impressions of CCR

I got a chance to mess around with the Concurrency and Coordination Runtime (CCR) bits recently. Before I get into that, first check out this real-life code I wrote this week.

public static class NotificationQueue
{
    private static Queue<Notification> _queue;
    private static Semaphore _work;

    static NotificationQueue()
    {
        _queue = new Queue<Notification>();
        _work = new Semaphore(0, int.MaxValue);
    }

    public static void Enqueue(Notification n)
    {
        lock (_queue)
        {
            _queue.Enqueue(n);
        }
        _work.Release();
    }

    public static Notification Dequeue()
    {
        _work.WaitOne();
        lock (_queue)
        {
            return _queue.Dequeue();
        }
    }
}

The idea here is to post notifications to the queue from many threads, and have a single notification thread sending the messages out from something like this:

private static void NotifyThreadProc()
{
    while(!Abort())
        NotificationQueue.Dequeue().Send();
}

The documentation for CCR is pretty sparse at this point, limited to just this paper [pdf], the Channel9 video, and this Wiki. Some of the particulars seem to have changed significantly, but I was able to figure out how to replicate my explicitly-threaded notification queue:

public class NotificationService : CcrServiceBase
{
    private Port<Notification> _port;

    public NotificationService()
        : base(new DispatcherQueue("foo"))
    {
        _port = new Port<Notification>();
        Activate(Arbiter.Receive(true, _port,
        delegate(Notification n)
        {
            n.Send();
        }));
    }

    public void Post(Notification n)
    {
        _port.Post(n);
    }
}

That's pretty awesome: notice that no explicit locks or waits are necessary, and I don't need to write the NotificationThreadProc or the code required to start it. I just need to make a NotificationService and start posting to it.

I can't say it's the most immediately comprehensible API I've ever seen, but hopefully that will change with more documentation and some polish. It's also possible that I am just warped from years of using the lower-level concepts. Overall this is awfully impressive for a library.

Debugging and Symbols FAQ

This is something I typed up internally, to help resolve the confusion that precipitates when Visual Studio begins stepping through comments. Hopefully this will be helpful to someone else.

What is required for source mode debugging?

  • Binaries (.dll, .exe, .ocx, …).
  • Symbols (the .pdb file).
  • Source files.

The symbol files tell your debugger how points in the binary relate to the source files. When you set a breakpoint in a source file, the debugger uses the symbol file to look up the corresponding instruction in the binary.

I have all of these things. Why is my debugger behaving strangely or not working at all?

Well, there are a bunch of different things that can go wrong.

Your symbols might be mismatched.

You can't (as an example) use the symbols for version 2.0 of a DLL to debug version 1.0 of that DLL. Perhaps, in this case, you have copied a new binary into your run directory without copying the corresponding .pdb file. It's also possible that your IDE or compiler did that for you.

Some debuggers, like WinDbg, will NOT load a mismatched PDB unless you force this to happen. Others, like Visual Studio 2003, will do this without so much as a complaint. That can be very confusing in some situations.

The module you are trying to debug is not loaded into the process.

If you are attempting to hit a breakpoint, a good sanity check is to make sure that the module you are trying to debug is being loaded. If it isn't, you are experiencing something other than a PDB problem. For example, your application could be experiencing an exception before your code is called. Make sure you have "break on exceptions" turned on.

In Visual Studio, you can look at the "modules" window to make sure your binary is loaded. The corresponding command in WinDbg is lm.

You are attached with the wrong debugger.

This is another non-symbol issue. There are many different kinds of debuggers:

  • Native debuggers: cdb, ntsd, windbg, Visual Studio.
  • CLR debuggers: cordbg, mdbg, deblector, Visual Studio.
  • Oddballs: the VB6 IDE, VS.NET's script debugger, etc.

You should make sure you are attaching to or starting your process using the right kind of debugger. Visual Studio gives you a dialog to choose the appropriate debugger(s).

The source file is not recognized.

Say I build a .dll from c:\foo\x.cpp, and you take that library from me and try to set a breakpoint in c:\bar\x.cpp. You might need to tell the debugger that your source path is different. Visual Studio is usually pretty good at figuring this out on its own and/or asking you when it is having a problem.

The _NT_SOURCE_PATH environment variable is honored as a search path for source files by most of the Windows debuggers. The .srcpath command in WinDbg/CDB can be used to display and change the source search path in those debuggers.

The symbols are corrupt.

This is rare, but it can happen. The fix is usually to just rebuild the binary and regenerate the PDB.

The binary is not debuggable.

.NET assemblies have the added limitation that they must be built in /debug mode for conventional CLR debugging to work. You can tell if a module is debuggable by opening it up in Reflector or ILDASM. You should see this attribute:

[assembly: Debuggable(true, true)]

Where does the debugger get the symbols?

Most Windows debuggers work like this:

  • The folder containing the binary is searched for a matching PDB.
  • The debugger's symbol search path is then searched. Most Windows debuggers (including Visual Studio) take their default search path from the _NT_SYMBOL_PATH environment variable.

How can I tell where the debugger is getting its symbols?

In WinDbg/CDB, the !sym noisy command can be used to show verbose output when the debugger is trying to resolve symbols. Visual Studio 2005 shows the symbol path as a column in the Modules window.

I am not aware of a simple way to identify where Visual Studio 2003 is loading its symbols from. The easiest way might be to use the handle utility from Sysinternals:

C:\src>handle foo.pdb

Handle v3.01
Copyright (C) 1997-2005 Mark Russinovich
Sysinternals - www.sysinternals.com

devenv.exe      pid: 4832   C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\Temp
orary ASP.NET Files\someapp\dbf846f9\1ab0397c\assembly\dl2\c33a0bcb\eca89e48_3
b78c601\foo.PDB

In this case ASP.NET has copied the binary and the PDB to a temporary location. This is another way a PDB can wind up being mismatched.

How are PDB's matched to binaries?

The short version: the PDB contains a date stamp and a checksum which is matched against the binaries.

How can I tell if a PDB matches a binary?

The symchk utility that comes with the Debugging Tools for Windows can be used to do this. There are many options for this utility, so you should read the output of symchk /?.

Some Twists on Blocked Finalizers

Blocked finalizer threads have gotten some recent publicity on Tess Ferrandez's blog. I recently ran into this myself, although the particulars were slightly different.

Our problem was manifesting itself in a very long-running console application that both connected to a database and made web service calls. Eventually the app would start hitting OutOfMemoryExceptions. (Completely unrelated to these OutOfMemoryExceptions. We have a high incidence of OutOfMemoryExceptions around here.)

Fixing the Wrong Problem and Moving the Goalposts

My initial reaction to this problem was (sensibly, I think) to attempt to reduce the amount of memory that the application was using. I admit that I was also focused on different problems at the time, so any makeshift solution I could get to work here would have been just fine with me.

We threw in some buffer pools, et cetera, and managed to get the app running slightly longer, but the OOM's always resurfaced. Every time they did, the heap looked completely different due to the semi-drastic changes we were making to the app's memory profile. After the third or maybe fourth change it dawned on me that perhaps we were just confusing the real issue.

The Proverbial Lightbulb Over the Head

The turning point came when I noticed that most of the overly-abundant objects on the heap in a particular dump were types that were likely to be associated with finalizers.

0:000> !dumpheap -stat
------------------------------
...
2,431,972    48,639,440 System.Threading.WaitHandle/__WaitHandleHandleProtector
...

This hadn't been the case in the previous dumps. Those were eaten alive by byte arrays and strings. By this point, we had done such a fantastic job of optimizing the application for memory usage that the problem came more clearly into focus.

The Road to Victory

At this point I used !threads in SOS to locate the finalizer thread (some useless info trimmed out here):

0:000> !threads
ID APT   Exception
0  STA System.OutOfMemoryException
4  MTA (Finalizer)

Thread 4 was stuck in a WaitForSingleObject and, not unrelated to the problem here, the native stacks of almost all of the other threads in the process were in some variation of "WaitUntilGCComplete." Probably should have noticed that sooner. So sue me.

Anyway, the top of the finalizer's stack was interesting:

0:004> ~4k
ChildEBP RetAddr
00f0f658 7c822124 ntdll!KiFastSystemCallRet
00f0f65c 77e6baa8 ntdll!NtWaitForSingleObject+0xc
00f0f6cc 77e6ba12 kernel32!WaitForSingleObjectEx+0xac
00f0f6e0 776c54ef kernel32!WaitForSingleObject+0x12
00f0f6fc 77789905 ole32!GetToSTA+0x6f
00f0f71c 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0xcb

The finalizer is waiting to make a call over to an STA thread. Why's it doing that? And wait, what STA thread? Funny you should ask.

Visual Basic Complications

No technical problem at my post-VBScript company would be complete without Visual Basic putting a unique spin on things. This one is no exception. The authors of this app are fairly oblivious to COM apartments and didn't intentionally create any STA threads. They also weren't using COM Interop directly.

So how did thread zero end up being an STA thread (the astute among you may have noticed this in the !threads output above)? The MSDN documentation seems to clearly say that MTA is the default.

The reason was that the app was written in Visual Basic. If the Main() method in a VB program is lacking the STAThreadAttribute and the MTAThreadAttribute, the VB compiler will stick an STAThreadAttribute on it for you when the MSIL is emitted (the C# compiler doesn't do this). Presumably–and I'm guessing here but I think this is a good guess–this is done for compatibility with VB6.

The main thread of this application was sleeping the vast majority of the time. Admittedly, this is not a fantastic example of multithreaded design, but it wouldn't have caused any problems if the thread were MTA. As Raymond has pointed out, sleeping on an STA thread can have some ill effects.

The Fix

This problem was solved by decorating the main method with <MTAThread()>. This is possibly the smallest fix to an apparently massive problem I have ever personally witnessed.

How to Notice This Problem Much Sooner

There are a few ways this could have been less painful. The easiest might have been to check the output of SOS's !finalizequeue command. If I had done that I would have seen the following (again trimmed for clarity):

Heap 0
generation 0 has 0 finalizable objects
generation 1 has 0 finalizable objects
generation 2 has 11,212 finalizable objects
Ready for finalization 1,163,295 objects

Which does not look healthy. Looking more closely at the native stacks would have also probably yielded a solution on day one.

Please Stop Spamming the Debugger Output

When I'm writing a GUI or a multithreaded server application, I make heavy use of OutputDebugString (and various API's that map to it). This is a very useful tool when the act of stepping through an application has side effects that drastically change the experiment. Unfortunately, it looks everyone else likes this API, too.

I try to be courteous and pay my developer taxes here–when my application is deployed to customers, rest assured that it will not be writing debugger output by default. I put my calls to OutputDebugString on a switch in one way or another, or I just compile them out in release mode.

(This is not just being kind to anyone trying to debug something else on a machine running my application. I need to do this to avoid embarrassment. I curse quite a bit in my traces.)

Not everybody is taking the trouble to do this. To wit:

Debugger output spam from Visual Studio

If I open up dbmon or debugview and wave my mouse around a little, i'm inundated with messages from these applications:

  • Visual Studio 2005 [by far the worst].
  • Visual SourceSafe [a close second place].
  • Trillian
  • The remnants of Symantec products that I have not yet succeeded in disabling.

I find it particularly annoying that my development tools are hampering my development in this way.

Pointless Minesweeper Source Code

The source code for my Pointless Minesweeper Clone can be found here. Enjoy it.

The license is BSD (you may know this as the "hey, go nuts" license).

Pointless Minesweeper Clone

Minesweeper++

My roommate, a grizzled veteran of Cornell's CS 211, made an announcement a week or two ago. It was something along the lines of, "gee whiz, I'm glad they didn't make us write minesweeper as a project. It looks impossible." To prove my vast intellectual superiority I made a winforms clone in about six or seven hours, including the time it took to draw the graphics and perform extensive "QA."

You can start the clickonce app from this link.

Screenshot:

Minesweeper++ screenshot

The source will be forthcoming. Bugs and so forth, feel free to contact me. The rendering is a bit on the slow side for my tastes, so you needn't mention that.

Debugger Exercise: Displaying a Function’s Return Value

I found myself needing to automatically manipulate the return value of a managed function in release code today. I thought this would make an interesting little writeup.

For the purposes of the demonstration, let's use this example program:

class Program
{
    static void Main()
    {
        while (!Console.KeyAvailable)
        {
            Console.WriteLine(FooBar(false));
            Console.WriteLine(FooBar2());
        }
    }

    static string FooBar(bool b)
    {
        if (b)
        {
            return "b";
        }
        return "a";
    }

    static string FooBar2()
    {
        return FooBar(true);
    }
}

The function we're interested in will be FooBar. I've made things interesting by calling the function from two places–let's assume we're not sure where the function is called from. We'll make things more interesting still by accepting the limitation that we won't cheat and set a breakpoint on the ret instruction in FooBar. The function I needed to instrument today was a little monolithic, so let's take it as a given that we need to be able to handle a function that might be long and might have more than one exit point.

Compile this program and start it in WinDbg. Start by loading SOS:

0:003> .loadby sos mscorwks

The first thing we'll do is set a breakpoint on FooBar. (I'm going to do this manually so that these steps will work with the CLR v1.1. The new version of SOS makes this slightly easier. I'm also going to cover this quickly, so if you want a clearer explanation of what's going on in this next step see Eran Sandler's writeup here.)

To set the breakpoint, let the program run for a second so that the method is compiled by the JIT. Then we need to find and dump out the MethodTable for the class:

0:003> !name2ee retaddr!retaddr.Program
Module: 00912d5c (retaddr.exe)
Token: 0x02000002
MethodTable: 00913140
EEClass: 009112ec
Name: retaddr.Program

0:003> !dumpmt -md 00913140
EEClass: 009112ec
Module: 00912d5c
Name: retaddr.Program
mdToken: 02000002
BaseSize: 0xc
ComponentSize: 0x0
Number of IFaces in IFaceMap: 0
Slots in VTable: 8
--------------------------------------
MethodDesc Table
   Entry MethodDesc      JIT Name
...
00d00070   00913120      JIT retaddr.Program.Main()
00d000d8   00913128      JIT retaddr.Program.FooBar(Boolean)
00d00130   00913130      JIT retaddr.Program.FooBar2()
...

Now that we have the MethodTable, we need to dump out the MethodDesc for FooBar to get the virtual address of the jitted code.

0:003> !dumpmd 00913128
Method Name: retaddr.Program.FooBar(Boolean)
Class: 009112ec
MethodTable: 00913140
mdToken: 06000002
Module: 00912d5c
IsJitted: yes
m_CodeOrIL: 00d000d8

The compiled method begins at m_CodeOrIL (this is called "Method VA" in SOS 1.1). We can set a breakpoint there with "bp 00d000d8."

Here comes the trick. We know that the return address to the method should be on the top of the stack when the method is called. In other words, the ESP register points at the return address. We can set a dynamic breakpoint on that return address using a command like this:

bp 00d000d8 "bp poi(@esp); g"

If you try this, you'll notice that we break right after FooBar has returned to its caller. As with any function obeying one of the standard calling conventions, the return value is stored in the EAX register. We can give the dynamic breakpoint a command of its own that manipulates this value like this:

bp 00d000d8 "bp poi(@esp) \"!do -nofields @eax; g\"; g"

If we run the sample program now, we'll see this output repeating itself:

breakpoint 1 redefined
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 20(0x14) bytes
String: a

breakpoint 2 redefined
Name: System.String
MethodTable: 790fa3e0
EEClass: 790fa340
Size: 20(0x14) bytes
String: b

Voila.

Beware the OutOfMemoryException Red Herring

Low memory conditions have a way of executing lightly-or-never-tested code paths. Bugs in these paths can sometimes cast a penumbra over the real reason your application is crashing.

Worse, the resulting exceptions or crashes are likely to be more or less nondeterministic. Worse still, the most obvious steps with SOS rarely work once the CLR or the IIS worker process has started to OOM and has tossed you an access violation.

As a concrete example–this is a !clrstack from an application that appears to be crashing with an IOException.

0:026> !clrstack
Thread 26
ESP         EIP
0x10e3ea90  0x77e55dea [FRAME: GCFrame]

This doesn't tell us much, but if we look at the native stack:

0:026> k
ChildEBP RetAddr
10e3e9e8 79216aed kernel32!RaiseException+0x53
WARNING: Stack unwind information not available. Following frames may be wrong.
10e3ea40 79216a70 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3ea68 79216a24 mscorsvr!GetAssemblyMDImport+0x2e157
10e3ea78 79265ec2 mscorsvr!GetAssemblyMDImport+0x2e10b
10e3eab4 7924e9d3 mscorsvr!CoEEShutDownCOM+0x12dc1
10e3eac8 7c82eeb2 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x7e73
10e3eaec 7c82ee84 ntdll!ExecuteHandler2+0x26
10e3eb94 7c82ecc6 ntdll!ExecuteHandler+0x24
10e3eb94 77e55dea ntdll!KiUserExceptionDispatcher+0xe
10e3eee0 79216aed kernel32!RaiseException+0x53
10e3ef38 7924c8d0 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3efe8 791b3208 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x5d70
10e3eff4 791b3ad7 mscorsvr!Ordinal76+0x3208
10e3f108 791d3ef0 mscorsvr!Ordinal76+0x3ad7
10e3f1c4 791d3fa4 mscorsvr!GetCompileInfo+0x13e3
10e3f1ec 7931392f mscorsvr!GetCompileInfo+0x1497
10e3f21c 791cc3c8 mscorsvr!ReleaseFusionInterfaces+0x5f3d7
10e3f264 793139e1 mscorsvr!Ordinal76+0x1c3c8
10e3f2d8 7924a3f2 mscorsvr!ReleaseFusionInterfaces+0x5f489
10e3f2f0 791d4096 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x3892

We can see from the frames in the middle (the lower kernel32!RaiseException through the subsequent ntdll calls) that the current exception is being raised from inside another exception handler (a catch block, in managed terms). The current exception doesn't appear to reflect this:

0:026> !cen
System.IO.IOException (0xbab602fc)

0:026> !do 0xbab602fc
Name: System.IO.IOException
-----------------
Exception bab602fc in MT 79bd1234: System.IO.IOException

But with some wizardry, we can coerce the relevant information. We can hypothesize that the original exception would be passed as a parameter somewhere before the original call to kernel32!RaiseException. Isolating just that frame on the stack, and the frame before it:

0:026> kb
…
10e3eee0 79216aed e0434f4d 00000001 00000000 kernel32!RaiseException+0x53
…
10e3ef38 7924c8d0 bab5ce60 00000000 bab5ce60 mscorsvr!GetAssemblyMDImport+0x2e1d4

The address 0xbab5ce60 (second line) looks like a good candidate, being close to the address of the current exception while not being equal to it.

0:026> !do bab5ce60
Name: System.IO.IOException
-----------------
Exception bab5ce60 in MT 79bd1234: System.IO.IOException
_message: Unable to create a transport connection.
_innerException:
   Exception 06c6003c in MT 79b96cec: System.OutOfMemoryException
   _remoteStackTraceString:

Bingo. (Mind you, we haven't taken any steps yet to actually fix the problem.) This was relatively easy, since all of the relevant information could be reconstructed from the offending stack. In other situations, you may need to !DumpAllExceptions to see what has been happening recently in the application.

Note: a very good hint that you are looking at an OOM situation is that your crash dump is exactly two gigabytes in size.