Posts Tagged ‘ASP.NET’

The Russian Doll Approach to Web Services

Here's an anecdote for the WTF inbox. I assure you this is very real, but I cannot divulge any of the specifics.

Some time ago a friend of mine was talking to a web services vendor, who was explaining his versioning scheme. The vendor's approach was to make all of the web service functions accept a single parameter describing the function being called, and the version of the function requested. Prototypically:

public object Foo(FunctionCallInfo)

And what is FunctionCallInfo, you ask? Why, it is a strict XML document adhering to a schema that he would provide.

My observation, which my friend also arrived at independently, was that this person was basically creating an implementation of web services inside of web services. So nat'ralists observe, a flea / Hath smaller fleas that on him prey.

XML: the cause of, and solution to, all of your development problems.

What would you do in this situation?

Nonblocking Pool Class

This is not an original idea but I thought I would post/explain it anyway. This is a generalized version of a pattern I have been using for a while. I'm not sure where I first picked it up but I've seen it used in several places.

The purpose of this class is to pool instances of a particular type in a server application. The assumptions I am making about the problem are:

  • It is both possible and worthwhile to reuse instances of a certain type. Types that may fit these criteria include large arrays of primitive types and types that hold unmanaged or scarce resources, such as connections. Not all types fit these criteria, obviously.
  • It is more undesirable to have a thread enter a waiting state (fail to acquire a lock, in other words) than it is to create a new instance of the type being reused. That would be the case if the instances are somewhat cheap but the average request or call time to your server is relatively long.

The nice thing about this pool class is that it handles the second case gracefully. It will reuse objects as much as possible, but it won't block a thread in the case that the attempt fails. If it didn't, you might end up introducing massive contention in your attempt to increase throughput with a different, locking pool.

The class provides very lightweight synchronization using atomic operations - there's no use of critical sections (the lock keyword).

  using System;
  using System.Collections.Generic;
  using System.Threading;

  /// <summary>
  /// Provides and reuses objects of type <typeparamref name="T"/>.
  /// </summary>
  /// <typeparam name="T">
  /// The type that is pooled. Must provide a default constructor.
  /// </typeparam>
  public class NonBlockingPool<T>
     where T : new()
  {
     // Contains the pooled items.
     private Stack<T> _stack;

     // The maximum size of _stack.
     private int _max;

     // This reference is used to ensure that only one thread
     // calls methods on _stack at a time.
     private object _lock = new object();

     /// <summary>
     /// Gets or sets the maximum size of the pool.
     /// </summary>
     public int MaximumSize
     {
        get { return _max; }
        set { _max = value; }
     }

     /// <summary>
     /// Gets a pooled instance of type <typeparamref name="T"/>,
     /// or yields a new instance.
     /// </summary>
     public T Get()
     {
        // If two threads enter this method at the same time,
        // only one will acquire _lock (the other will be given
        // null). The caller that fails to acquire _lock will
        // be returned a new instance of T.
        T ret = default(T);
        object obj = Interlocked.Exchange(ref _lock, null);
        try
        {
           if (obj != null && _stack.Count > 0)
           {
              ret = _stack.Pop();
           }
           else
           {
              ret = new T();
           }
        }
        finally
        {
           if (obj != null)
           {
              Interlocked.Exchange(ref _lock, obj);
           }
        }
        return ret;
     }

     /// <summary>
     /// Reuses an instance of type <typeparamref name="T"/> in a
     /// subsequent request or call whenever possible.
     /// </summary>
     public void Reuse(T t)
     {
        // If two threads enter this method at the same time,
        // only one will acquire _lock (the other will be given
        // a null reference). The instance of T provided by
        // the losing thread will just be collected and not
        // reused.
        object obj = Interlocked.Exchange(ref _lock, null);
        try
        {
           if (obj != null && _stack.Count < _max)
           {
              _stack.Push(t);
           }
        }
        finally
        {
           if (obj != null)
           {
              Interlocked.Exchange(ref _lock, obj);
           }
        }
     }

     /// <summary>
     /// Constructor.
     /// </summary>
     /// <param name="max">
     /// The maximum number of instances of
     /// <typeparamref name="T"/> to hold in the pool.
     /// </param>
     public NonBlockingPool(int max)
     {
        if (max < 0)
        {
           throw new ArgumentOutOfRangeException("max");
        }
        _stack = new Stack<T>(max);
        _max = max;
     }
  }

Here's a (contrived) minimal example of a consumer of such a pool. This server class makes a context object available to each thread for the duration of each request. This object is stored in a slot unique to each thread (specified with the ThreadStaticAttribute) while a ProcessRequest function is called. The instance is returned to the pool in a finally block after that call is finished.

  public class SampleServer
  {
     [ThreadStatic]
     private static ServerContext _context;

     private NonBlockingPool<ServerContext> _pool;

     public static ServerContext Context
     {
        get { return _context; }
     }

     internal void ProcessRequest(IServerApp app)
     {
        try
        {
           _context = _pool.Get();
           app.ProcessRequest();
        }
        finally
        {
           if (_context != null)
           {
              _context.Reset();
              _pool.Reuse(_context);

              // We want the context to be collected if it isn't
              // actually reused by the pool.
              _context = null;
           }
        }
     }
  }

A more concrete example might be an IHttpModule or a remoting server channel sink. As I said once already, it's important to consider 1) the type of resource you are pooling and 2) the amount of load your application is expecting before committing yourself to a pattern such as this one.
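For readers outside .NET, the same try-lock idea can be sketched in Python. This is only an illustration under assumptions, not a port of the class above: it swaps Interlocked.Exchange for threading.Lock.acquire(blocking=False), and the factory parameter is a hypothetical stand-in for the new() constraint.

```python
import threading


class NonBlockingPool:
    """Reuses objects when the lock is free; never waits for it."""

    def __init__(self, factory, max_size):
        if max_size < 0:
            raise ValueError("max_size must be non-negative")
        self._factory = factory      # hypothetical: callable that builds a new item
        self._max = max_size
        self._items = []
        self._lock = threading.Lock()

    def get(self):
        # Try-lock: a thread that loses the race builds a fresh item
        # instead of waiting for the pool.
        if self._lock.acquire(blocking=False):
            try:
                if self._items:
                    return self._items.pop()
            finally:
                self._lock.release()
        return self._factory()

    def reuse(self, item):
        # A thread that loses the race just drops the item, which is
        # then left for the garbage collector.
        if self._lock.acquire(blocking=False):
            try:
                if len(self._items) < self._max:
                    self._items.append(item)
            finally:
                self._lock.release()
```

The trade-off is the same as in the C# version: under contention some items are created or thrown away unnecessarily, which is the price of never parking a thread.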

App Server Autopsy

Earlier this week, we had a production issue with application servers that seized up and stopped serving requests. On one impacted server,

  • The private bytes counter for the w3wp process was very high for this type of server. The process is usually under 400MB, but at the time of the problem it was about 1.2GB.
  • The CPU utilization was at 0%.
  • The server immediately returned an OutOfMemoryException to all Remoting callers.
  • The server was down for fifteen minutes with this condition and didn’t recover without an iisreset. We did of course get a hang dump first.

The problem occurred very early in the morning, seemingly on many servers at once. It happened again later in the day on just one server.

The most obvious thing to look for here is a memory leak in the application. Since this is a managed application, that could mean one of a few things:

  • A lot of memory tied up in objects that are accidentally rooted, and therefore correctly not being collected. For example, you could have a static Hashtable somewhere that just kept accumulating references.
  • Not a lot of objects allocated, but many objects pinned. Since the GC can’t move these objects, they could be causing heap fragmentation. In slightly less technical terms, we’ve got plenty of free memory but none of it in large enough blocks to be useful.
  • Memory being leaked in calls into unmanaged code: COM-Interop, P/Invoke, and so forth.

I know the application well enough to say that both (2) and (3) are pretty unlikely. We’ve tried to keep it as purely managed as possible, so there’s not much COM-Interop and few reasons to ever allocate a GCHandle.

Curiously enough, the answer turned out to be none of these possibilities.

The obvious first step was to load SOS and see just what was insisting on having all of this memory. The way we do that with SOS is with the “!dumpheap -stat” command. This was the tail end of the output.

…
0x020126b0     64,951    41,967,420 System.Int32[]
0x0e5e563c  1,059,033    42,361,320 System.Data.DataRow
0x02012970     63,235    47,628,552 System.Collections.Hashtable/bucket[]
0x0201209c    130,914    54,505,988 System.Object[]
0x79b94638  2,441,756   309,847,632 System.String
0x02012c3c     12,703   411,828,216 System.Byte[]

The 300MB of strings is not necessarily out of the ordinary. This is, after all, a web application. The 400MB of byte arrays is very strange indeed, however. Add to that the fact that relatively few arrays are taking up a huge amount of space.

Doing some quick math, I saw that the average byte array size was in the neighborhood of 32KB, and I had absolutely no idea what that could be.
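The quick math, spelled out as a small sketch. The two figures are taken from the System.Byte[] row of the heap statistics; the variable names are mine.

```python
# Figures from the System.Byte[] row of the !dumpheap -stat output.
byte_array_count = 12_703
byte_array_total_bytes = 411_828_216

# Average size of one array, in bytes and in KB (1 KB = 1024 bytes).
average_bytes = byte_array_total_bytes / byte_array_count
average_kb = average_bytes / 1024

print(f"{average_bytes:,.0f} bytes, about {average_kb:.1f} KB per array")
```

That works out to a little under 32KB per array, which is suspiciously regular for a type that normally comes in all sizes.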

I started what could become a grueling process of looking at an awful lot of byte arrays. The !dumpheap -type System.Byte[] command, aside from taking a very long time, printed out many long runs of arrays that looked like this:

…
0x027c8b68 0x02012c3c   31,756    2 System.Byte[]
0x027d0774 0x02012c3c   31,756    2 System.Byte[]
0x027d8380 0x02012c3c   31,756    2 System.Byte[]
0x027dff8c 0x02012c3c   31,756    2 System.Byte[]
0x027e7b98 0x02012c3c   31,756    2 System.Byte[]
0x027ef7a4 0x02012c3c   31,756    2 System.Byte[]
0x027f73b0 0x02012c3c   31,756    2 System.Byte[]
0x027fefbc 0x02012c3c   31,756    2 System.Byte[]
…

These were interspersed with shorter runs of basically random sizes. At this point I was sure I had at least located the objects causing the problem, but I still needed to find out where they were coming from.

The weird thing was that I couldn’t find roots for any of these arrays. I was starting to think something was wrong with !gcroot, because aside from also taking a very long time it wasn’t finding me any roots.

I noticed that almost all of the arrays were in GC generation 2 (gen2), which indicates that relatively speaking, they’d been around a long time. However, the first run of the arrays was still in gen0. Inspired, I tried to find the !gcroot of the address of one of those arrays. This is what I found:

HANDLE(Strong):23811d8:Root:0xa655000(System.Object[])->
0x26de25c(System.Web.UbyteBufferAllocator)->
0x26de274(System.Collections.Stack)->
0x293b12c(System.Object[])->0x2861fdc(System.Byte[])

This did not look at all familiar. This is saying that the byte array is rooted by an instance of a framework type called System.Web.UbyteBufferAllocator, which is itself rooted in a static field somewhere. UbyteBufferAllocator is an internal type, so none of our code could be creating it.

I turned away from WinDbg for a minute and opened up Reflector. I took a look at where this class is used:

UByteBufferAllocator users

Mercifully, there weren’t too many possibilities. The parameters to the UbyteBufferAllocator constructor determine the size of the byte arrays that it will create and maintain.

Checking the size of the allocators being created in each of these cases, I found the match. The type initializer for System.Web.HttpResponseBufferElement owned the UbyteBufferAllocator that was creating these arrays.

HttpResponseBufferElement initializer

I took a look at the implementation of the allocator class. It maintains a pool of byte arrays, and provides them to the HttpResponseBufferElement type. It only pools a small number of buffers, yielding new buffers if the demand for them is very high.

The HttpResponseBufferElement class itself is only used from a few places:

  • System.Web.HttpWriter.BufferData
  • System.Web.HttpWriter.BufferResource
  • System.Web.HttpWriter.FlushCharBuffer

This revelation made my job quite a bit easier.

This machine serves clients that, for the most part, are talking to it using .NET Remoting. The responses to these calls would not be returned using an HttpWriter. There are, however, a handful of web services hosted on the server. These would write back XML data to their clients using a HttpWriter object.

To confirm this, I dumped out the data in the byte arrays using the dc command. I guess if I had been thinking a little more clearly I might have tried that right away. The data was definitely a web service response.

total 4,834 objects
0:000> dc 0x46c5cac8
46c5cac8  02012c3c 00007c00 69766564 30313e64  <,...|..barid>10
46c5cad8  39383335 6b2f3c38 65647965 3e646976  53898</foobarid>
46c5cae8  79656b3c 45766544 746e6576 65707954  <foobarEventType
46c5caf8  656d614e 6f74533e 53206b63 74696c70  Name>Foobar data
46c5cb08  61262073 203b706d 6e676953 63696669  s &amp; Signific
46c5cb18  20746e61 636f7453 6944206b 65646976  ant Foobar Data
46c5cb28  3c73646e 79656b2f 45766544 746e6576  nds</foobarEvent
46c5cb38  65707954 656d614e 656b3c3e 76654479  TypeName><foobar

All of this would indicate that the server tried to push out a VERY big response in the recent past. The 1.1 GC was evidently crippled by this, since it was not collecting the unrooted objects in gen2. It was reproducible enough to occur on several servers at once. This is, without a doubt, a bug in the GC algorithm or the execution engine.

This may be fixed in 2.0. I won’t be able to tell you because I don’t intend to test it.

We did some more investigation to figure out exactly which web service it was and who was using it, with methods that won’t make sense to anyone without an intimate knowledge of the app.

You may recall that at the beginning of the article I said,

The problem occurred very early in the morning, seemingly on many servers at once. It happened again later in the day on just one server.

This turned out to be significant.

In the early morning hours, our India QA team was stress testing the functionality that uses the problem web service, and came across this bug. Evidently thinking that the server going down was unrelated, the tester tried again and again on other servers.

Later in the day, the developer who owns the functionality tried to reproduce the bug that the tester entered, and brought another server down. A few hours after that I had tracked it down in WinDbg.

The fix, on our end anyway, was to fix a single SQL query that resulted in truly epic amounts of unnecessary data being returned.

Boom!

Debugging ASP/Visual Basic Applications, Some Assembly Required

This is a very technical post about debugging an unmanaged memory dump. The tool I am using for this is WinDbg. You can get more general information about WinDbg here.

We had our production release a few days ago, which means some random issues any way you slice it. It’s been much more serious in the past, but I looked at an interesting IIS hang dump today.

Although most of the debugging I’ve gone through here has been managed code, the dump in question is from a classic ASP/COM application. In our production environment, we run a legacy application side-by-side with a new ASP.NET application. To the user, it’s all the same app; to us, it’s a bit of an integration nightmare.

In this particular case, the server was unresponsive and not serving any requests. We took a dump of all of the worker processes before being forced to do an iisreset.

The first thing I did after opening the dump was to take a quick look at the stacks on all of the threads to get an idea of the overall activity. I noticed that there were a ton (ok, about twenty) with a stack more or less like this:

0:022> k
ChildEBP RetAddr
030de954 7c822114 ntdll!KiFastSystemCallRet
030de958 77e6711b ntdll!NtWaitForMultipleObjects+0xc
030dea00 7739cd08 kernel32!WaitForMultipleObjectsEx+0x11a
030dea5c 77697483 user32!RealMsgWaitForMultipleObjectsEx+0x141
030dea84 776974f2 ole32!CCliModalLoop::BlockFn+0x80
030deaac 7778866b ole32!ModalLoop+0x5b
030deac8 77788011 ole32!ThreadSendReceive+0xa0
030deae4 77787ed7 ole32!CRpcChannelBuffer::SwitchAptAndDispatchCall+0x112
030debc4 776975b8 ole32!CRpcChannelBuffer::SendReceive2+0xc1
030debe0 7769756a ole32!CCliModalLoop::SendReceive+0x1e
030dec4c 776c4eee ole32!CAptRpcChnl::SendReceive+0x6f
030deca0 77ce127e ole32!CCtxComChnl::SendReceive+0x91
030decbc 77ce13ca rpcrt4!NdrProxySendReceive+0x43
030df0a4 77d0c947 rpcrt4!NdrClientCall2+0x206
030df0bc 77d0c911 oleaut32!IDispatch_RemoteInvoke_Proxy+0x1c
030df37c 6b61c892 oleaut32!IDispatch_Invoke_Proxy+0xb6
WARNING: Stack unwind information not available. Following frames may be wrong.
030df3e4 6b61fb5c vbscript!DllRegisterServer+0x8285
00000000 00000000 vbscript!DllRegisterServer+0xb54f

This is an automation call, to either a local or a remote COM server. Notice that we’re warned that the bottom two frames on the stack “may be wrong.” This is because there is optimized code running that does not set up a clean stack frame.

That is, code compiled without the optimizer will usually do something like this at the start of each function:

push   ebp           ; Save the old stack base
mov    ebp, esp      ; Stack base becomes the current top of the stack
sub    esp, 0xc      ; Save space for local variables
mov    eax, [ebp+8]  ; example reference to one of the parameters

Optimized functions can omit that, and refer to parameters and local variables relative to the ESP register instead of the EBP register. All of this matters to us now because this makes it significantly harder for the debugger to construct a clean stack trace.
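To make the point concrete, here is a rough sketch of how a debugger can walk a chain of EBP-based frames. The memory dictionary is a toy: its values are read off the ChildEBP and RetAddr columns of the k listing above rather than dumped independently, and walk_frames is a name I made up.

```python
# Toy memory image: address -> dword. Values are inferred from the
# ChildEBP / RetAddr columns of the k listing, not dumped separately.
memory = {
    0x030DF0BC: 0x030DF37C, 0x030DF0C0: 0x77D0C911,
    0x030DF37C: 0x030DF3E4, 0x030DF380: 0x6B61C892,
    0x030DF3E4: 0x00000000, 0x030DF3E8: 0x6B61FB5C,
}


def walk_frames(memory, ebp):
    """Follow saved-EBP links, collecting (frame base, return address)."""
    frames = []
    while ebp in memory:
        # [ebp+4] is the return address pushed by the call instruction.
        frames.append((ebp, memory[ebp + 4]))
        # [ebp] is the caller's EBP, saved by the "push ebp" prologue.
        ebp = memory[ebp]
    return frames


for base, ret in walk_frames(memory, 0x030DF0BC):
    print(f"{base:08x} {ret:08x}")
```

When a function skips the prologue, the value at [ebp] no longer links to the caller's frame, and a walk like this one wanders off into garbage. That is what the debugger's warning is about.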

So anyway, we’re not 100% sure at this point that the call is originating on an .asp page (executing in vbscript.dll), but since that’s pretty much all this process does, it’s more than a safe bet.

How do we start digging into what these calls are? Well, a good starting point is SIEExtPub, which is a debugger extension just for COM stuff. It’s maintained by and for Microsoft’s SIE group, which employs some of the most badass engineers you will ever meet. (You can download a copy of this dll here).

SIEExtPub has a !comcalls function which purports to show the automation calls on all threads. This was the first thing I tried, and in a similar situation it’s what I would recommend.

Unfortunately, this didn’t get me anywhere. The output for all of the threads looked like this:

       Thread 49 - STA
Target Process ID: 13a82170 = 329785712
Target Thread  ID: d48526a2  (STA - Possible junk values)

It’s possible I don’t totally understand the output, but the reason I’m saying this looks bogus is that the Process ID and Thread ID values are outside of the range that I’m used to seeing. Typically (although I’m not sure it’s by rule), PIDs and TIDs are word values (i.e., not more than 0xffff).

It was time for a radical re-evaluation of the whole scene. I decided to dissect one of the function calls, as close to “our code” as I could get. That would be this function:

030df37c 6b61c892 oleaut32!IDispatch_Invoke_Proxy+0xb6

Which is the last one on the stack before we are warned that “the following frames might be wrong.” Below that, it’s likely to get dicey.

This isn’t a documented API function, but I’m going to guess that its parameters will closely follow those to IDispatch::Invoke or possibly IDispatchEx::InvokeEx since this is being called from a script. I’ve done enough C++ COM coding to know what to expect for those.

I googled the name of the function too, and found this definition in the ReactOS source:

HRESULT CALLBACK IDispatch_Invoke_Proxy(
       IDispatch* This,
       DISPID dispIdMember,
       REFIID riid,
       LCID lcid,
       WORD wFlags,
       DISPPARAMS* pDispParams,
       VARIANT* pVarResult,
       EXCEPINFO* pExcepInfo,
       UINT* puArgErr)

Which is basically the same as IDispatch::Invoke. I decided to take a stab at the parameters being passed. I needed to dump the stack frame out to see them. To do that, I grabbed the child EBP pushed onto the stack at that call:

ChildEBP RetAddr
030df0bc 77d0c911 oleaut32!IDispatch_RemoteInvoke_Proxy+0x1c

And I used that with the dds (dump dwords with symbols) command:

0:022> dds 030df0bc
030df0bc  030df37c
030df0c0  77d0c911 oleaut32!IDispatch_Invoke_Proxy+0xb6  ; return address
030df0c4  141004b4                                       ; this
030df0c8  60030005                                       ; dispIdMember
030df0cc  6b655340 vbscript!DllRegisterServer+0x40d33    ; &IID_NULL
030df0d0  00000409                                       ; en-US
030df0d4  00020001                                       ; DISPATCH_METHOD
030df0d8  030df438                                       ; pDispParams
030df0dc  030df360                                       ; pVarResult
030df0e0  030df454                                       ; pExcepInfo
030df0e4  030df428                                       ; puArgErr

I’d like to point out a few things here before moving on. First, notice that it gives us some text next to the third parameter to the function. It does this because the address is within the vbscript.dll module (meaning it’s a constant or global variable). The name it gives us isn’t very helpful, though, because the only symbols we have for vbscript are export symbols. Unless you work for Microsoft, in which case this whole thing should be much, much easier for you.

If you read the documentation for Invoke, you’ll notice that they say the riid parameter is reserved and must be set to IID_NULL. If we’re looking for signs that what we’re looking at is making sense, we should remember that IID_NULL would be compiled into a dll as a constant (bing). We can dump that GUID-sized memory chunk out and see that it is indeed NULL (bing):

0:022> dd 6b655340 l4
6b655340  00000000 00000000 00000000 00000000

And finally, the next parameter should be a locale and it is set to 0x409. If you handle locales a good bit you might recognize that as 1033 decimal, or en-US (bing).

Generally speaking, I think it’s a good idea to stop every now and then when debugging and do sanity checks like that.
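Those sanity checks can be written down as a short sketch. The dwords are copied from the dds output, the parameter names come from the ReactOS prototype, and I'm assuming the usual 32-bit stdcall layout in which the first argument sits at EBP+8. Only the low word of the flags value is compared here; I'm leaving the upper word uninterpreted.

```python
# Dwords from the dds output, starting at the first argument (EBP+8).
stack_args = [0x141004B4, 0x60030005, 0x6B655340, 0x00000409,
              0x00020001, 0x030DF438, 0x030DF360, 0x030DF454, 0x030DF428]
names = ["This", "dispIdMember", "riid", "lcid", "wFlags",
         "pDispParams", "pVarResult", "pExcepInfo", "puArgErr"]
frame_base = 0x030DF0BC

args = dict(zip(names, stack_args))
# Assumption: each argument is one dword, first argument at EBP+8.
offsets = {name: 8 + 4 * i for i, name in enumerate(names)}

DISPATCH_METHOD = 0x1  # value defined in the COM headers

checks = {
    "lcid is en-US (1033)": args["lcid"] == 1033,
    "low word of wFlags is DISPATCH_METHOD":
        args["wFlags"] & 0xFFFF == DISPATCH_METHOD,
    "pDispParams slot is at 0x030df0d8":
        frame_base + offsets["pDispParams"] == 0x030DF0D8,
}

for description, passed in checks.items():
    print("ok  " if passed else "FAIL", description)
```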

So, moving on, if we want to look at the parameters being passed to the function we should dig into the pDispParams parameter (number six). This is a pointer to a DISPPARAMS structure, which can be found in oaidl.h.

typedef struct tagDISPPARAMS
{
    VARIANTARG *rgvarg;
    DISPID *rgdispidNamedArgs;
    UINT cArgs;
    UINT cNamedArgs;
} DISPPARAMS;

The first field in the structure is the array of arguments, and the third is the length of that array. If we dump out that parameter,

0:022> dd 030df438 l4
030df438  057a40f0 00000000 00000002 00000000

The dwords are the fields in the DISPPARAMS structure, from left to right: rgvarg, rgdispidNamedArgs, cArgs, and cNamedArgs. We can see that there are two parameters (cArgs is 2). The first field in the structure points to the array of VARIANTs.
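Here is the same decoding done mechanically, as a sketch. It assumes a 32-bit process, so that both pointers and both UINTs in DISPPARAMS are four bytes each; the Python variable names are mine.

```python
import struct

# The 16 bytes at 0x030df438, rebuilt from the four dwords dd printed.
raw = struct.pack("<4I", 0x057A40F0, 0x00000000, 0x00000002, 0x00000000)

# Assumption: 32-bit layout, every field of DISPPARAMS is 4 bytes wide.
rgvarg, rgdispid_named_args, c_args, c_named_args = struct.unpack("<IIII", raw)

print(f"rgvarg            = {rgvarg:#010x}")
print(f"rgdispidNamedArgs = {rgdispid_named_args:#010x}")
print(f"cArgs             = {c_args}")
print(f"cNamedArgs        = {c_named_args}")
```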

As a quick review, a VARIANT structure is used heavily in COM/VB and looks something like this:

typedef struct tagVARIANT
{
    VARTYPE vt;
    WORD reserved1;
    WORD reserved2;
    WORD reserved3;
    union
    {
        // value (8 bytes; a BSTR pointer uses the first DWORD)
    };
} VARIANT;

The logical thing to do now is to dump out the array:

0:022> dd 057a40f0 L8
057a40f0  006f0008 00790064 06f50524 40e2e04f
057a4100  012d0008 6b60dc2d 06f504c8 0134d1a8

In each line, the low word of the first dword (0x0008) is the variant type (vt) and the variant value is the third dword. You can look it up in the headers, but I know from memory that 8 is the variant type for a BSTR.
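A sketch of the same reading in code, assuming the 16-byte VARIANT layout of a 32-bit process (four 16-bit fields followed by an 8-byte value area). parse_variants is a made-up helper, not part of any debugger API.

```python
import struct

VT_BSTR = 8  # VARENUM value for a BSTR

# The 32 bytes at 0x057a40f0, rebuilt from the eight dwords dd printed.
raw = struct.pack("<8I",
                  0x006F0008, 0x00790064, 0x06F50524, 0x40E2E04F,
                  0x012D0008, 0x6B60DC2D, 0x06F504C8, 0x0134D1A8)


def parse_variants(raw):
    """Return (vt, first dword of the value) for each 16-byte VARIANT."""
    result = []
    for offset in range(0, len(raw), 16):
        # vt, three reserved words, then the 8-byte value area.
        vt, _r1, _r2, _r3, value, _high = struct.unpack_from("<HHHHII", raw, offset)
        result.append((vt, value))
    return result


for vt, value in parse_variants(raw):
    print(f"vt={vt} value={value:#010x}")
```

One thing worth keeping in mind when matching these to a method signature: IDispatch::Invoke takes its arguments in reverse order, so rgvarg[0] is the last argument of the call.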

A BSTR is the type of string used by Visual Basic / ASP internally (and is used widely in COM in general). It is a length-prefixed string, but also null-terminated.
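The layout is easier to see in a toy example. This sketch builds the bytes of a BSTR by hand and reads them back; make_bstr and read_bstr are hypothetical helpers, and the flat bytes object stands in for process memory.

```python
import struct


def make_bstr(text):
    """Build BSTR bytes: 4-byte length prefix, UTF-16 data, terminator."""
    data = text.encode("utf-16-le")
    # The prefix counts bytes of string data, not characters, and
    # does not include the two-byte null terminator.
    return struct.pack("<I", len(data)) + data + b"\x00\x00"


def read_bstr(memory, pointer):
    """Read a BSTR given a pointer to its first character."""
    # A BSTR pointer points past the prefix, so the length is at pointer - 4.
    (byte_len,) = struct.unpack_from("<I", memory, pointer - 4)
    return memory[pointer:pointer + byte_len].decode("utf-16-le")


blob = make_bstr("A URL")
print(len(blob), read_bstr(blob, 4))
```

Because the pointer lands on the first character and the data is null-terminated, the du command can print a BSTR as if it were an ordinary wide string, which is why it works in the dumps that follow.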

I dumped out the string parameters like so:

0:022> du 06f504c8
06f504c8  "A URL"

0:022> du 06f50524
06f50524  "A query string"

(No, these aren’t the real parameters. I’m obfuscating them for security reasons).

This is where I stopped, because the strings were very familiar to me and I knew immediately the function being called.

If I hadn’t narrowed the issue down at this point I might have tried looking in the dumps of other processes for activity, and/or tried to look for familiar strings further down on the stack. The issue isn’t totally resolved at this point, but I at least have somewhere to go for further instrumentation and other measures.

Hopefully, somebody found this slightly interesting. Until next time, amigos. If you’re out on your bike tonight, please, wear white.

Does ANYONE Comprehend ASP.NET Web Projects?

Once every few weeks I find myself wrestling with these foul beasts. And each time, I find the following phrase echoing in the empty space between my ears.

Mit der Dummheit kämpfen Götter selbst vergebens. (Against stupidity the gods themselves contend in vain.)

I have been making every attempt to avoid dealing with them. For the most part, a series of NAnt scripts have insulated me. But now and then I need to get one working for the sake of helping one of the many lost souls who depend on them.

Last time, I took screenshots.

The first step was to try to add the project to a solution.

Creating an ASP.NET web project - Step 1

It wants me to enter the URL of the project, so far so good…

Creating an ASP.NET web project - Step 2

Now we have the first sign of trouble. The URL bit appears to have punted to a standard file open dialog.

Creating an ASP.NET web project - Step 3

I faithfully yet skeptically select the file, but I have the sinking feeling of having been here before.

Creating an ASP.NET web project - Step 4

Ah yes, I remember now.

Creating an ASP.NET web project - Step 5

So there you have it: web projects are an ouroboros; they are a maddening riddle, answered only by another question.

Bizarro TypeInitializationException in System.Runtime.Remoting

We’ve seen this exception on two occasions in three months in a production environment. I’m going to throw it up here since there are no Google hits on it so far, at least that I can find.

System.Reflection.TargetInvocationException:
Exception has been thrown by the target of an invocation. ---> 

System.TypeInitializationException: The type initializer for
"System.Runtime.Remoting.Channels.Http.HttpRemotingHandlerFactory" threw an exception. --->
System.MissingFieldException: Field not found: ?.s_webServicesFactoryType.

   at System.Runtime.Remoting.Channels.Http.HttpRemotingHandlerFactory..cctor()

   --- End of inner exception stack trace ---

   --- End of inner exception stack trace ---

   at System.RuntimeType.CreateInstanceImpl(Boolean publicOnly)

   at System.Activator.CreateInstance(Type type, Boolean nonPublic)

   at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr,
Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)

   at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr,
Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)

   at System.Web.Configuration.HandlerMapping.Create()

   at System.Web.Configuration.HandlerFactoryCache..ctor(HandlerMapping mapping)

   at System.Web.HttpApplication.GetFactory(HandlerMapping mapping)

   at System.Web.HttpApplication.MapHttpHandler(HttpContext context,
String requestType, String path, String pathTranslated, Boolean useAppConfig)

   at System.Web.MapHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()

   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

This is a type initializer (.cctor) in System.Runtime.Remoting throwing up because one of the private fields cannot be found. Here’s the type in Reflector, proof that something is definitely rotten in the state of Denmark.

The s_webServicesFactoryType static field

The type initializer is just trying to set this field to null.

HttpRemotingHandlerFactory static initializer

Here are some more of the details:

  • The servers were different, but running the same application. Their Windows patch levels would have been the same.
  • w3wp is set to recycle on these servers. This exception will happen constantly (thousands of times) until the recycle happens forcibly or on its own, then it goes away.

So, I think this is definitely a JIT or assembly loader bug related to some obscure timing issues. If I can get the native image or a full process dump while this is happening, presumably this could be tracked down.

Things I need

It would be really great if any of these things existed. Some of them might, but I’ve been unsuccessful in finding any of them.

  1. A utility that finds/cleans files that are in Visual SourceSafe, but are not found in Visual Studio projects and/or solutions. When a file is ‘deleted’ in VS2003, it’s not removed from VSS. Repeated on a vast scale, this gets annoying and throws a wrench into some other things that I’m trying to do.
  2. A NAnt task that lets me use csc/vbc for all files in a Visual Studio project. I need the command line arguments (e.g., /debug:pdbonly) for both of those, and the convenience of the solution task. (Of all of these, it seems like this is the most likely to exist and I just haven’t found it.)
  3. DebugEngine extensions intended for ASP.NET apps. They should be able to show me the pages that are running, the HttpContext for each request, the items that are in session, and so forth.
  4. Someone who can take difficult, complex programming tasks from me. Ideally they would be smart enough that I would trust them implicitly, rather than agonize about the projects on a daily basis.
  5. A Visual Studio project type for managing a number of XML files. By that I mean, a bunch of NAnt scripts. Actually, any tool that has a solution-like treeview and a document outline utility will work.
  6. PowerCollections, but for .NET 1.1. (Yeah, yeah, it might not be too hard to port them myself).
  7. An “Add New Item…” item in Visual Studio that has NO default name set (like “Class1.cs”), NO autogenerated comments, and has internal access by default. If I call the item IAnything, it should figure out that the item is an interface and not a class.

I’ve had some success lately eliciting comments from knowledgeable googlers, and I have high hopes for this post.

Getting some grease under my fingernails

I decided I needed some debugger extensions specifically intended for ASP.NET hang dumps. Psscor has a few nice ones but I figured I would learn a lot in the attempt, anyway.

Here’s how the project is breaking down.

Days 1-2: Messed around trying to learn how to use the build utility that comes with the Windows 2003 DDK. This would have been faster if I had looked at a makefile more than once in the previous five years.

Day 3: Toyed with the idea of using LoadLibrary on psscor and filtering the output, but instead decided that I was insane.

Days 4-5: Got pretty familiar with writing an extension on my own, and what the implementation details would be. It was becoming pretty clear that I would need to get some info on the structures if I was to have any hope of getting finished.

Days 5-7: Downloaded the SSCLI and realized that it’s got a (pretty old) version of SOS.dll in it. I have a few moments of euphoria here and radically switch approaches: I will now start with this and extend it to suit my needs.

I pretty quickly found out that this DLL is assuming the SSCLI execution engine and won’t work with the Microsoft production ones (mscorsvr in my case, or mscoree). No problem, I assume that won’t take too long to fix.

The problem is that this thing is filled with code like this:

// Note this is possible to be spoofed, but pretty unlikely
// call XXXXXXXX
if (spot[-5] == 0xE8) {
    move (*whereCalled, retAddr-4);
    *whereCalled += retAddr;
    //*whereCalled = *((int*) (retAddr-4)) + retAddr;
    if (*whereCalled < 0x80000000 && *whereCalled > 0x1000)
        return;
    else
        *whereCalled = 0;
}

(your guess is as good as mine).
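For what it’s worth, here is one guess. On x86, 0xE8 is the opcode for a near relative call: five bytes, the opcode followed by a 32-bit displacement measured from the next instruction, which is exactly where the return address points. So the snippet seems to be walking backwards from a return address to work out which function was called. The sketch below restates that reading in isolation. GuessCallTarget, its parameters and the byte buffer are all made up for illustration; none of it is SOS code, and the final range check is only my interpretation of the magic numbers as "looks like a user-mode address."

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper, not part of SOS. Given a copy of some code bytes,
// the address they were read from, and a return address that falls inside
// them, guess the target of the CALL that pushed that return address.
// Returns 0 when the guess fails, as the snippet above does.
std::uint32_t GuessCallTarget(const std::uint8_t* code, std::size_t size,
                              std::uint32_t baseAddr, std::uint32_t retAddr)
{
    if (retAddr < baseAddr)
        return 0;
    const std::uint32_t offset = retAddr - baseAddr;

    // A near relative CALL is 5 bytes: opcode 0xE8 plus a 32-bit displacement.
    if (offset < 5 || offset > size)
        return 0;

    const std::uint8_t* call = code + offset - 5;
    if (call[0] != 0xE8)
        return 0;  // not "call rel32" (a match can still be a coincidence)

    // Assemble the little-endian displacement byte by byte.
    const std::uint32_t rel = static_cast<std::uint32_t>(call[1])
                            | static_cast<std::uint32_t>(call[2]) << 8
                            | static_cast<std::uint32_t>(call[3]) << 16
                            | static_cast<std::uint32_t>(call[4]) << 24;

    // The displacement is relative to the next instruction, i.e. retAddr.
    // Unsigned wraparound takes care of negative (backward) displacements.
    const std::uint32_t target = retAddr + rel;

    // Same sanity range as the snippet: assumed to mean "plausible
    // user-mode address on 32-bit Windows".
    return (target > 0x1000 && target < 0x80000000u) ? target : 0;
}
```

The "possible to be spoofed" comment then makes sense: any instruction that happens to have 0xE8 five bytes before the return address will pass the test, whether or not it is really a call.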

I’m not sure if they had their best programmers working on this thing. I don’t know. Perhaps they just never expected someone like me to look at it in depth.

So that’s where I’m stuck for the moment. I’m going through it, commenting it and trying to understand it. I will eventually refactor it quite a bit. At the moment there are a lot of very long, very hard-to-follow functions in it. But it’s good to look at some unmanaged code for a change.

Managed Debugging with WinDbg, Part 1 of N

There seems to be relatively little information about WinDbg available, so I will try to post some things as I figure them out myself.

We had a severe-to-extremely-severe production problem last week, and my recent activity with WinDbg was another example of “learning with a gun to your forehead.”

Getting the Right Extensions

If you are doing any kind of managed debugging, you will want the current set of extensions which can be found here.

The simplest thing to do is just drop the DLLs in the install directory for WinDbg. The two DLLs in this set that will be most important to you are psscor.dll and sieextpub.dll. Psscor has a lot of tools for dumping the contents of managed objects, and sieextpub has some powerful functions for showing application and thread state.

There is a third extension, sos.dll, whose functionality mostly overlaps with psscor’s.

Extensions Basics

Load an extension into WinDbg like so:

0:000> .load psscor

At any point, you can see the extensions you have loaded with this command:

0:000> .chain

All of the extensions I’ve mentioned come with help commands. The help for the topmost extension in the chain can be called like this:

0:000> !help

But you can always refer back to extensions further down in this way:

0:000> !sieextpub.help

An Example – Debugging an ASP.NET Hang

This was the scenario we found ourselves in last week. The first thing you will need to do is get a hang dump of the worker process using AdPlus, which I don’t have time to cover here. However, it is relatively straightforward.

Once we’ve got the dump, the most obvious thing to try is to see what the process is doing. To do that, load psscor and use this command:

0:000> ~*e!clrstack

This will dump out the managed stack of all of the threads in the process (the ~*e means that we want to iterate through all of the threads and perform the specified action for each).

That will give us a general idea of what is going on. The stack of one or more particular threads is bound to be interesting, and we can narrow it down to the stack trace of a single thread using:

0:000> ~113e!clrstack

Here I’ve replaced * (all threads) with a single thread, 113. Assuming this is a managed thread, you should see some output like this:

Thread 113
ESP         EIP     

…

0x0dc5f6cc  0x0fa3bb17 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0dc5f710  0x0fa3aedf [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0dc5f74c  0x0fa3a94b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)

…

For this example, I’ll just show how to figure out which page in the application is executing. Since we’re running inside a page class for a lot of the response, the this pointer should be all we need.

Grab two stack pointers (values from the ESP column above) and use the psscor.DumpStackObjects command, or dso for short:

0:000> !dso 0x0dc5f6cc 0x0dc5f74c

Thread 0
ESP/REG    Object     Name
0x0dc5f6cc 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f6d0 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f6d8 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f714 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f720 0x070ca198 System.Globalization.CultureInfo
0x0dc5f724 0x070fb010 System.Threading.Thread
0x0dc5f728 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f73c 0x31f0d208 System.Collections.Specialized.HybridDictionary

Bingo – in this case, _ASP.incomeStatement_aspx is an instance of the page class.
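With a wider address range the listing gets long, and picking the page out by eye gets tedious. Here is a minimal sketch of filtering it, assuming the dso output has been saved as text in the three-column layout shown above and that compiled pages show up under the _ASP namespace as they do here. FindPageClasses is a made-up helper, not part of psscor or WinDbg.

```cpp
#include <set>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: collects the distinct type names that start with
// "_ASP." from saved !dso output. Each data line is assumed to hold three
// whitespace-separated fields: stack address, object address, type name.
std::vector<std::string> FindPageClasses(const std::string& dsoOutput)
{
    std::vector<std::string> pages;
    std::set<std::string> seen;

    std::istringstream lines(dsoOutput);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string stackAddr, objectAddr, typeName;
        if (!(fields >> stackAddr >> objectAddr >> typeName))
            continue;  // blank line, or a line with fewer than three fields

        // Keep each page class once, in order of first appearance.
        if (typeName.compare(0, 5, "_ASP.") == 0 && seen.insert(typeName).second)
            pages.push_back(typeName);
    }
    return pages;
}
```

Fed the listing above, this should come back with the single entry _ASP.incomeStatement_aspx, since the same object appears at three stack slots.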

Note that the first argument to dso is the lower address (the top of the stack, since the stack grows downward), and the second is the higher address. I’ll post some more stuff when I have time.

Release Day 422: A Vignette


McFunley: did you name this page something impossible to remember correctly on purpose?
McFunley: this is a good anti-querystring hacking technique
DarrelHerbst: the twocheckboxcolumn.aspx?
DarrelHerbst: i remember it
DarrelHerbst: your mind is addled with coffee
McFunley: i typed "twocolumncheckbox" and "twocheckcolumnbox" and "boxcolumnchecktwo" before I got it right
DarrelHerbst: addled