Posts Tagged ‘Windows’

Why is my application hanging in weird places?

Here's a problem:

  1. You have an application that's hanging permanently or temporarily.
  2. The hang does not occur in any synchronization code that you have written yourself.
  3. You are writing a windows forms application, or any kind of application using COM interop.

Let me present an educated guess as to what is happening. Note to people getting here by googling: there are actually many other reasons this might happen in addition to the one I am going to tell you about. I merely suggest that you spend more time thinking before being led too far down a rabbit hole by what I am about to write. Experience has taught me that people tend to lock on to one internet analysis (say, an unrelated KB article mentioning the same exception text) and waste a lot of time on it. None of the other sites seem to be nice enough to explicitly warn you about your own tendencies–I am, you should thank me.

With those disclaimers, here is my psychic debugging response:

  1. You are inadvertently calling a method on an STA object created on a different thread.
  2. That other thread is busy doing something else.

For people that managed (har har) to avoid COM development, or stayed completely in the insulated VB6 world, I will now explain briefly everything you really need to know about threading models in this, our fantastic futuristic age.

Methods on single threaded apartment (STA) objects are synchronized. The object receives all calls to its methods on the thread that owns it. This is achieved by creating a hidden window that receives messages for any calls made on those objects. Multi-threaded apartment (MTA) objects behave more or less like any .NET object you create: calls are not automatically synchronized, and you are responsible for doing any synchronization necessary yourself.

So in order for calls to STA objects to work correctly, the thread that owns them has to be free to run its message loop a sufficient percentage of the time. Running a computationally-intensive process on an STA thread is a great way to hang the other threads in your application, if you are careless or ignorant of where the STA calls are.

If you attach with Windbg and dump out the native stacks (~*k), you will probably see threads in calls to WaitOne that got there through functions named things like "SendReceive" above some interop marshalling code if this is indeed your problem. If you are lucky you might see a function called "GetToSTA," however, the stacks unfortunately do not always make things that easy.

This is one of those things that I have seen a bunch of times, but seems to be on the radar of an increasingly small number of people. It often takes me longer than it should to figure it out, too. If you are doing any kind of practical Windows development, you still need to understand the basics of STA/MTA. That's because any development that is "practical" will almost necessarily involve interoperating with old code.

Here's a related post that I wrote, about how STA objects can result in blocked finalizer threads.

Is XAML an Elaborate Joke?

Don Box says:

As a WPF user, I spend at least as much time reading and writing XAML as I do reading/writing C# code that does WPF-isms. I do spend a lot of time in C#, but little of it is WPF specific, which arguably is one of the strengths of WPF’s data/content facilities.

In my opinion, the sooner folks get thrown into the XAML pool the sooner they learn to swim.

(Emphasis mine.)

I thought the whole point of selling our souls to the XML devils was that development tools would deal with that slop for us.

Forget swimming. I am having trouble seeing any non-masochistic reason for me to learn this at all. I am just not getting through the Microsoft marketing on this one. I say this as somebody with deep-to-extremely-deep knowledge of most of the Microsoft tools of the last 10 years–I'm not Don Box, but I'm also no slouch.

Give me one good reason why I should spend my spare time learning an API that looks like this. (And it would be my spare time. I am still living here in the real world, where it still seems to be occasionally more practical to write Windows platform code in C++, for god's sake.)

Wikipedia reassures me–in a rather hilarious, pleonastic way–that this is "simply XML."

[A] key aspect of the technology is the reduced complexity needed for tools to process XAML, because it is simply XML. As a result, a variety of products are emerging, particularly in the WPF space, which create XAML-based applications. As XAML is simply based on XML, developers and designers are able to share and edit content freely amongst themselves without requiring compilation.

(Emphasis mine.)

Show me one XML-based API that does not, in one way or another, prove this to be an oxymoron. (See also: "Java programmer writes application which reads 243 XML files on startup then wonders why it takes 30 seconds to start." Thanks, Slava.)

Sometimes trite observations become trite because they are obviously correct. Who created the creator? Why re-invent the s-expressions that have been around since the 50's, but poorly? Et cetera. In these cases the fact that the criticism is well-worn does nothing to detract from it.

Maybe I am just being a grizzled old fogey (at the incredibly advanced age of 27.) It wouldn't be the first time that I have been described as a crotchety septugenarian in the body of a young adult. If I am way off the mark here, please enlighten me.

Update, May 20th: Jon Harrop avoids my bitching and gives solutions. See XAML or F#.

I’ll See You in Hell, ATI

I hate you guys. Seriously, drop dead.

SERVICE_NAME: ati hotkey poller
        TYPE               : 110  WIN32_OWN_PROCESS
                                  (interactive)
        STATE              : 4  RUNNING
                                (NOT_STOPPABLE,
                                 NOT_PAUSABLE,
                                 ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 1132
        FLAGS              :

Debugging and Symbols FAQ

This is something I typed up internally, to help resolve the confusion that precipitates when Visual Studio begins stepping through comments. Hopefully this will be helpful to someone else.

What is required for source mode debugging?

  • Binaries (.dll, .exe, .ocx, …).
  • Symbols (the .pdb file).
  • Source files.

The symbol files tell your debugger how points in the binary relate to the source files. When you set a breakpoint in a source file, the debugger uses the symbol file to look up the corresponding instruction in the binary.

I have all of these things. Why is my debugger behaving strangely or not working at all?

Well, there are a bunch of different things that can go wrong.

Your symbols might be mismatched.

You can't (as an example) use the symbols for version 2.0 of a DLL to debug version 1.0 of that DLL. Perhaps, in this case, you have copied a new binary into your run directory without copying the corresponding .pdb file. It's also possible that your IDE or compiler did that for you.

Some debuggers, like WinDbg, will NOT load a mismatched PDB unless you force this to happen. Others, like Visual Studio 2003, will do this without so much as a complaint. That can be very confusing in some situations.

The module you are trying to debug is not loaded into the process.

If you are attempting to hit a breakpoint, a good sanity check is to make sure that the module you are trying to debug is being loaded. If it isn't, you are experiencing something other than a PDB problem. For example, your application could be experiencing an exception before your code is called. Make sure you have "break on exceptions" turned on.

In Visual Studio, you can look at the "modules" window to make sure your binary is loaded. The corresponding command in WinDbg is lm.

You are attached with the wrong debugger.

This is another non-symbol issue. There are many different kinds of debuggers:

  • Native debuggers: cdb, ntsd, windbg, Visual Studio.
  • CLR debuggers: cordbg, mdbg, deblector, Visual Studio.
  • Oddballs: the VB6 IDE, VS.NET's script debugger, etc.

You should make sure you are attaching to or starting your process using the right kind of debugger. Visual Studio gives you a dialog to choose the appropriate debugger(s).

The source file is not recognized.

Say I build a .dll from c:\foo\x.cpp, and you take that library from me and try to set a breakpoint in c:\bar\x.cpp. You might need to tell the debugger that your source path is different. Visual Studio is usually pretty good at figuring this out on its own and/or asking you when it is having a problem.

The _NT_SOURCE_PATH environment variable is honored as a search path for source files by most of the Windows debuggers. The .srcpath command in WinDbg/CDB can be used to display and change the source search path in those debuggers.

The symbols are corrupt.

This is rare, but it can happen. The fix is usually to just rebuild the binary and regenerate the PDB.

The binary is not debuggable.

.NET assemblies have the added limitation that they must be built in /debug mode for conventional CLR debugging to work. You can tell if a module is debuggable by opening it up in Reflector or ILDASM. You should see this attribute:

[assembly: Debuggable(true, true)]

Where does the debugger get the symbols?

Most Windows debuggers work like this:

  • The folder containing the binary is searched for a matching PDB.
  • The debugger's symbol search path is then searched. Most Windows debuggers (including Visual Studio) take their default search path from the _NT_SYMBOL_PATH environment variable.

How can I tell where the debugger is getting its symbols?

In WinDbg/CDB, the !sym noisy command can be used to show verbose output when the debugger is trying to resolve symbols. Visual Studio 2005 shows the symbol path as a column in the Modules window.

I am not aware of a simple way to identify where Visual Studio 2003 is loading its symbols from. The easiest way might be to use the handle utility from Sysinternals:

C:\src>handle foo.pdb

Handle v3.01
Copyright (C) 1997-2005 Mark Russinovich
Sysinternals - www.sysinternals.com

devenv.exe      pid: 4832   C:\WINDOWS\Microsoft.NET\Framework\v1.1.4322\Temp
orary ASP.NET Files\someapp\dbf846f9\1ab0397c\assembly\dl2\c33a0bcb\eca89e48_3
b78c601\foo.PDB

In this case ASP.NET has copied the binary and the PDB to a temporary location. This is another way a PDB can wind up being mismatched.

How are PDB's matched to binaries?

The short version: the PDB contains a date stamp and a checksum which is matched against the binaries.

How can I tell if a PDB matches a binary?

The symchk utility that comes with the Debugging Tools for Windows can be used to do this. There are many options for this utility, so you should read the output of symchk /?.

Some Twists on Blocked Finalizers

Blocked finalizer threads have gotten some recent publicity on Tess Ferrandez's blog. I recently ran into this myself, although the particulars were slightly different.

Our problem was manifesting itself in a very long-running console application that both connected to a database and made web service calls. Eventually the app would start hitting OutOfMemoryExceptions. (Completely unrelated to these OutOfMemoryExceptions. We have a high incidence of OutOfMemoryExceptions around here.)

Fixing the Wrong Problem and Moving the Goalposts

My initial reaction to this problem was (sensibly, I think) to attempt to reduce the amount of memory that the application was using. I admit that I was also focused on different problems at the time, so any makeshift solution I could get to work here would have been just fine with me.

We threw in some buffer pools, et cetera, and managed to get the app running slightly longer, but the OOM's always resurfaced. Every time they did, the heap looked completely different due to the semi-drastic changes we were making to the app's memory profile. After the third or maybe fourth change it dawned on me that perhaps we were just confusing the real issue.

The Proverbial Lightbulb Over the Head

The turning point came when I noticed that most of the overly-abundant objects on the heap in a particular dump were types that were likely to be associated with finalizers.

0:000> !dumpheap -stat
------------------------------
...
2,431,972    48,639,440 System.Threading.WaitHandle/__WaitHandleHandleProtector
...

This hadn't been the case in the previous dumps. Those were eaten alive by byte arrays and strings. By this point, we had done such a fantastic job of optimizing the application for memory usage that the problem came more clearly into focus.

The Road to Victory

At this point I used !threads in SOS to locate the finalizer thread (some useless info trimmed out here):

0:000> !threads
ID APT   Exception
0  STA System.OutOfMemoryException
4  MTA (Finalizer)

Thread 4 was stuck in a WaitForSingleObject and, not unrelated to the problem here, the native stacks of almost all of the other threads in the process were in some variation of "WaitUntilGCComplete." Probably should have noticed that sooner. So sue me.

Anyway, the top of the finalizer's stack was interesting:

0:004> ~4k
ChildEBP RetAddr
00f0f658 7c822124 ntdll!KiFastSystemCallRet
00f0f65c 77e6baa8 ntdll!NtWaitForSingleObject+0xc
00f0f6cc 77e6ba12 kernel32!WaitForSingleObjectEx+0xac
00f0f6e0 776c54ef kernel32!WaitForSingleObject+0x12
00f0f6fc 77789905 ole32!GetToSTA+0x6f
00f0f71c 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0xcb

The finalizer is waiting to make a call over to an STA thread. Why's it doing that? And wait, what STA thread? Funny you should ask.

Visual Basic Complications

No technical problem at my post-VBScript company would be complete without Visual Basic putting a unique spin on things. This one is no exception. The authors of this app are fairly oblivious to COM apartments and didn't intentionally create any STA threads. They also weren't using COM Interop directly.

So how did thread zero end up being an STA thread (the astute among you may have noticed this in the !threads output above)? The MSDN documentation seems to clearly say that MTA is the default.

The reason was that the app was written in Visual Basic. If the Main() method in a VB program is lacking the STAThreadAttribute and the MTAThreadAttribute, the VB compiler will stick an STAThreadAttribute on it for you when the MSIL is emitted (the C# compiler doesn't do this). Presumably–and I'm guessing here but I think this is a good guess–this is done for compatibility with VB6.

The main thread of this application was sleeping the vast majority of the time. Admittedly, this is not a fantastic example of multithreaded design, but it wouldn't have caused any problems if the thread were MTA. As Raymond has pointed out, sleeping on an STA thread can have some ill effects.

The Fix

This problem was solved by decorating the main method with <MTAThread()>. This is possibly the smallest fix to an apparently massive problem I have ever personally witnessed.

How to Notice This Problem Much Sooner

There are a few ways this could have been less painful. The easiest might have been to check the output of SOS's !finalizequeue command. If I had done that I would have seen the following (again trimmed for clarity):

Heap 0
generation 0 has 0 finalizable objects
generation 1 has 0 finalizable objects
generation 2 has 11,212 finalizable objects
Ready for finalization 1,163,295 objects

Which does not look healthy. Looking more closely at the native stacks would have also probably yielded a solution on day one.

Please Stop Spamming the Debugger Output

When I'm writing a GUI or a multithreaded server application, I make heavy use of OutputDebugString (and various API's that map to it). This is a very useful tool when the act of stepping through an application has side effects that drastically change the experiment. Unfortunately, it looks everyone else likes this API, too.

I try to be courteous and pay my developer taxes here–when my application is deployed to customers, rest assured that it will not be writing debugger output by default. I put my calls to OutputDebugString on a switch in one way or another, or I just compile them out in release mode.

(This is not just being kind to anyone trying to debug something else on a machine running my application. I need to do this to avoid embarrassment. I curse quite a bit in my traces.)

Not everybody is taking the trouble to do this. To wit:

Debugger output spam from Visual Studio

If I open up dbmon or debugview and wave my mouse around a little, i'm inundated with messages from these applications:

  • Visual Studio 2005 [by far the worst].
  • Visual SourceSafe [a close second place].
  • Trillian
  • The remnants of Symantec products that I have not yet succeeded in disabling.

I find it particularly annoying that my development tools are hampering my development in this way.

Pointless Minesweeper Source Code

The source code for my Pointless Minesweeper Clone can be found here. Enjoy it.

The license is BSD (you may know this as the "hey, go nuts" license).

Pointless Minesweeper Clone

Minesweeper++

My roommate, a grizzled veteran of Cornell's CS 211, made an announcement a week or two ago. It was something along the lines of, "gee whiz, I'm glad they didn't make us write minesweeper as a project. It looks impossible." To prove my vast intellectual superiority I made a winforms clone in about six or seven hours, including the time it took to draw the graphics and perform extensive "QA."

You can start the clickonce app from this link.

Screenshot:

Minesweeper++ screenshot

The source will be forthcoming. Bugs and so forth, feel free to contact me. The rendering is a bit on the slow side for my tastes, so you needn't mention that.

Beware the OutOfMemoryException Red Herring

Low memory conditions have a way of executing lightly-or-never-tested code paths. Bugs in these paths can sometimes cast a penumbra over the real reason your application is crashing.

Worse, the resulting exceptions or crashes are likely to be more or less nondeterministic. Worse still, the most obvious steps with SOS rarely work once the CLR or the IIS worker process has started to OOM and has tossed you an access violation.

As a concrete example–this is a !clrstack from an application that appears to be crashing with an IOException.

0:026> !clrstack
Thread 26
ESP         EIP
0x10e3ea90  0x77e55dea [FRAME: GCFrame]

This doesn't tell us much, but if we look at the native stack:

0:026> k
ChildEBP RetAddr
10e3e9e8 79216aed kernel32!RaiseException+0x53
WARNING: Stack unwind information not available. Following frames may be wrong.
10e3ea40 79216a70 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3ea68 79216a24 mscorsvr!GetAssemblyMDImport+0x2e157
10e3ea78 79265ec2 mscorsvr!GetAssemblyMDImport+0x2e10b
10e3eab4 7924e9d3 mscorsvr!CoEEShutDownCOM+0x12dc1
10e3eac8 7c82eeb2 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x7e73
10e3eaec 7c82ee84 ntdll!ExecuteHandler2+0x26
10e3eb94 7c82ecc6 ntdll!ExecuteHandler+0x24
10e3eb94 77e55dea ntdll!KiUserExceptionDispatcher+0xe
10e3eee0 79216aed kernel32!RaiseException+0x53
10e3ef38 7924c8d0 mscorsvr!GetAssemblyMDImport+0x2e1d4
10e3efe8 791b3208 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x5d70
10e3eff4 791b3ad7 mscorsvr!Ordinal76+0x3208
10e3f108 791d3ef0 mscorsvr!Ordinal76+0x3ad7
10e3f1c4 791d3fa4 mscorsvr!GetCompileInfo+0x13e3
10e3f1ec 7931392f mscorsvr!GetCompileInfo+0x1497
10e3f21c 791cc3c8 mscorsvr!ReleaseFusionInterfaces+0x5f3d7
10e3f264 793139e1 mscorsvr!Ordinal76+0x1c3c8
10e3f2d8 7924a3f2 mscorsvr!ReleaseFusionInterfaces+0x5f489
10e3f2f0 791d4096 mscorsvr!GetMetaDataPublicInterfaceFromInternal+0x3892

We can see from the frames in the middle (the lower kernel32!RaiseException through the subsequent ntdll calls) that the current exception is being raised from inside another exception handler (a catch block, in managed terms). The current exception doesn't appear to reflect this:

0:026> !cen
System.IO.IOException (0xbab602fc)

0:026> !do 0xbab602fc
Name: System.IO.IOException
-----------------
Exception bab602fc in MT 79bd1234: System.IO.IOException

But with some wizardry, we can coerce the relevant information. We can hypothesize that the original exception would be passed as a parameter somewhere before the original call to kernel32!RaiseException. Isolating just that frame on the stack, and the frame before it:

0:026> kb
…
10e3eee0 79216aed e0434f4d 00000001 00000000 kernel32!RaiseException+0x53
…
10e3ef38 7924c8d0 bab5ce60 00000000 bab5ce60 mscorsvr!GetAssemblyMDImport+0x2e1d4

The address 0xbab5ce60 (second line) looks like a good candidate, being close to the address of the current exception while not being equal to it.

0:026> !do bab5ce60
Name: System.IO.IOException
-----------------
Exception bab5ce60 in MT 79bd1234: System.IO.IOException
_message: Unable to create a transport connection.
_innerException:
   Exception 06c6003c in MT 79b96cec: System.OutOfMemoryException
   _remoteStackTraceString:

Bingo. (Mind you, we haven't taken any steps yet to actually fix the problem.) This was relatively easy, since all of the relevant information could be reconstructed from the offending stack. In other situations, you may need to !DumpAllExceptions to see what has been happening recently in the application.

Note: a very good hint that you are looking at an OOM situation is that your crash dump is exactly two gigabytes in size.

C++ and VS2005 Deployment PSA

This is a straightforward fix, but I couldn't find anything about it anywhere else. I have a handful of old ATL projects that have been upgraded over the years from Visual C++ 6.0, through VS2003 and are now being built using Visual C++ 8. I was trying to add them to a deployment project today, and kept running into this error:

Two or more objects have the same target location
('[targetdir]\regsvr32.trg').

regsvr32 error building a migrated C++ project

The issue is that the way libraries are registered has changed over the various incarnations of Visual Studio. My old projects were still using a post-build step to perform registration:

VC++ project using a post-build step to register itself

This apparently can create the output file that the deployment package is complaining about. The fix is to delete the post build steps and enable the modern registration option on the Linker's General tab.

VC++ project being registered via the new method.