Remoting Email Thread
May 6th, 2005

Here is an email thread I had recently with some good information in it. Rather than try to condense it, I’ll just post the thread.

From: Henry Derstine

After 2 days of banging my head against the wall I came across your post on robgruen's WebLog. You said "The code with the KB article comes with a C# version and a VB.NET version. If you compile them both, the C# version is faster by a factor of ten or more. The VB version is slower than standard DataSet serialization. The reason is because of late binding - option strict is off in the projects."

This has been what I have been experiencing. But when I switched to the C# version I still get slower performance than when I don't use the surrogate. I was wondering if using the C# version should take care of my problem or if there is something else I am doing wrong.

Thanks a lot, your post made me think I just might be sane after all.

Henry


From: Dan McKinley

Hey Henry -

There is the issue with Option Strict being off in the VB version of the surrogate, but there are other concerns.

My experience has been that the surrogate can actually be much faster OR much slower based on the type of data that is being sent. If you are sending a lot of numeric data, it’s likely that the surrogate will help you. If you are sending a lot of text, the surrogate could actually end up hurting.

The reason is mostly because serialization tries to preserve object references. For example, if two objects in the graph hold references to each other, binary serialization will preserve that relationship. In the context of a dataset, you don’t care about this that much. Two fields are never going to point at each other.

So if you profile the binary serialization of a DataSetSurrogate, you’ll see that a lot of the time is going into tracking the objects in a hashtable.

With numeric data, the benefits of the saved space (8 bytes for a decimal instead of a long string of text, for example) outweigh the overhead associated with tracking object references. In that case the surrogate is much faster.

With text data, you are really just adding the overhead without reducing space by that much (“ABCD” in the dataset is still going to be “ABCD” in the byte stream).

Oh, and make sure your numeric columns have .DataType set to decimal! Otherwise they’ll be interpreted as strings and there will be no benefit.
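
For reference, here is a rough sketch of what the sending side looks like with the surrogate. It assumes the DataSetSurrogate class from the KB article's sample code is compiled into the project; the constructor shown here matches the sample I used, but check it against your copy:

using System.Data;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

public class SurrogateSender
{
    // Binary-serialize the surrogate wrapper instead of the DataSet itself.
    // Numeric columns should have DataType set to decimal (or another numeric
    // type) before this point, or their values will be written as strings.
    public static byte[] Pack(DataSet ds)
    {
        DataSetSurrogate surrogate = new DataSetSurrogate(ds);
        BinaryFormatter formatter = new BinaryFormatter();
        MemoryStream stream = new MemoryStream();
        formatter.Serialize(stream, surrogate);
        return stream.ToArray();
    }
}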

Let me know if that makes sense, or if I can clarify anything.

-Dan


From: Henry Derstine

Thanks for the quick reply. I think I follow everything you said.

Most of our data being transferred is text, and right now it will be for viewing purposes, so I don't really need any updating ability.

I guess the disheartening thing is that everyone seems to point to that knowledge base article as a "great" fix for remoting DataSet performance, and nobody mentions anything about it not helping in certain scenarios.

Do you have any pointers on remoting data that could improve performance? Is it possible to remote a DataReader? Since we only need to view the data, I thought this would be a lighter-weight approach, but I couldn't see how to remote a DataReader object. And I have found in the past that binding a DataReader to a DataGrid gives a performance loss compared to binding a DataSet to a DataGrid.

Oh yeah, since I have you. Have you seen/used Peter Bromberg’s article on compression and serialization of datasets?

http://www.eggheadcafe.com/articles/20031219.asp

I was wondering if that would give me any gains considering the compression aspect. I haven’t tried it because I have been trying to work out the previous problem.


From: Dan McKinley

If you configured the lifetimes, etc., the right way, you'd be able to send a TransparentProxy for a DataReader to a remote machine. That'd never be a high-performance solution, though. You usually want to make fewer, larger remote calls instead of a lot of smaller ones. Going through the proxy, you'd effectively be making a remote call for every row in the result set.

It makes sense when you think about it, since the datareader has to keep a connection to the database open.
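
If the client only needs to view the data, the usual pattern is to do the read on the server and hand the whole result back in one call. Here is a rough sketch; the query, connection string, and names are just placeholders:

using System;
using System.Data;
using System.Data.SqlClient;

public class ReportService : MarshalByRefObject
{
    // One remote call returns the whole result set, instead of one call per
    // row going through a proxied DataReader.
    public DataSet GetReportData(string connectionString)
    {
        using (SqlConnection conn = new SqlConnection(connectionString))
        {
            SqlDataAdapter adapter = new SqlDataAdapter("SELECT ...", conn);
            DataSet ds = new DataSet();
            adapter.Fill(ds);   // Fill opens and closes the connection itself
            return ds;
        }
    }
}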

As far as compression goes, I think it also depends on the situation. In our primary application, it didn’t pan out because we were sending our data only a few physical feet over gigabit Ethernet. Compression requires more CPU and memory overhead, so it just slowed us down. If you’re sending this data over the internet, it could certainly speed things up and I’ve worked on client/server apps where that is the case.
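
For what it's worth, compressing the serialized bytes only takes a few extra lines. This sketch uses GZipStream, which is a .NET 2.0 class; the article you linked uses a third-party compression library for 1.1:

using System.IO;
using System.IO.Compression;
using System.Runtime.Serialization.Formatters.Binary;

public class CompressedSerializer
{
    // Serialize the object graph through a compressing stream. The extra CPU
    // and memory cost is exactly the trade-off described above.
    public static byte[] Pack(object graph)
    {
        MemoryStream buffer = new MemoryStream();
        using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
        {
            new BinaryFormatter().Serialize(gzip, graph);
        }
        return buffer.ToArray();   // safe to call after the streams are closed
    }
}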

In your situation, it sounds like you might want to look into caching the data where you’re using it. Have you looked into HttpContext.Cache at all? It won’t make the remote call any quicker, but it might let you avoid making it for every page displayed.
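
Something along these lines, assuming an ASP.NET page and a hypothetical GetRemoteData() method that makes the remote call; the cache key and the five-minute expiration are made up:

// Namespaces involved: System.Web and System.Web.Caching.
private DataSet GetReportData()
{
    // Check the ASP.NET cache before going over the wire; only make the
    // remote call when the cached copy is missing or has expired.
    DataSet ds = (DataSet)HttpContext.Current.Cache["ReportData"];
    if (ds == null)
    {
        ds = GetRemoteData();   // hypothetical remoting call
        HttpContext.Current.Cache.Insert(
            "ReportData", ds, null,
            DateTime.Now.AddMinutes(5),
            Cache.NoSlidingExpiration);
    }
    return ds;
}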


From: Henry Derstine

Thanks for all the info again. I just wanted to let you know what I found when I looked a little closer at where I had bottlenecks. I ended up testing 3 scenarios: 1) no wrapper class, 2) binary serializing the DataSet, 3) compression.

The average times to return a DataSet with 11,000 rows and 15 columns to the client were as follows:

1) No wrapper class: 13.547s
2) Binary serialization: 17.584s
3) Compression: 18.462s

Not what I originally expected.

When I looked at the time needed to actually retrieve and modify the datasets (compress/serialize) on the server before returning them to the client, this is what I got:

1) No wrapper class: 10.031s
2) Binary serialization: 17.081s
3) Compression: 18.125s

As you can see, the transfer time for the wrapped dataset is about 2-3% of the total time, while for the unwrapped DataSet the transfer time is about 25% of the total. So basically what you said earlier was dead on: we have a short distance and textual data, so all the mashing of the DataSet outweighs the gain in transfer. I guess that unless I expect a greater travel distance or get a better processor on the server, I won't do anything "fancy" to improve the remoting of my DataSet.

Thanks again.

Henry


Things I Need
March 27th, 2005

It would be really great if any of these things existed. Some of them might, but I’ve been unsuccessful in finding any of them.

  1. A utility that finds/cleans files that are in Visual SourceSafe but are not found in Visual Studio projects and/or solutions. When a file is ‘deleted’ in VS2003, it’s not removed from VSS. Repeated on a vast scale, this gets annoying and throws a wrench into some other things that I’m trying to do.
  2. A NAnt task that lets me use csc/vbc for all files in a Visual Studio project. I need the command-line arguments (i.e., /debug:pdbonly) for both of those, and the convenience of the solution task. (Of all of these, it seems like this is the most likely to exist and I just haven’t found it.)
  3. DebugEngine extensions intended for ASP.NET apps. They should be able to show me the pages that are running, the HttpContext for each request, the items that are in session, and so forth.
  4. Someone who can take difficult, complex programming tasks from me. Ideally they would be smart enough that I would trust them implicitly, rather than agonize about the projects on a daily basis.
  5. A Visual Studio project type for managing a number of XML files. By that I mean a bunch of NAnt scripts. Actually, any tool that has a solution-like treeview and a document outline utility will work.
  6. PowerCollections, but for .NET 1.1. (Yeah, yeah, it might not be too hard to port them myself).
  7. An “Add New Item…” item in Visual Studio that has NO default name set (like “Class1.cs”), NO autogenerated comments, and has internal access by default. If I call the item IAnything, it should figure out that the item is an interface and not a class.

I’ve had some success lately eliciting comments from knowledgeable googlers, and I have high hopes for this post.


Managed Debugging With WinDbg, Part 1 of N
February 5th, 2005

There seems to be relatively little information about WinDbg available, so I will try to post some things as I figure them out myself.

We had a severe-to-extremely-severe production problem last week, and my recent activity with WinDbg was another example of “learning with a gun to your forehead.”

Getting the Right Extensions

If you are doing any kind of managed debugging, you will want the current set of extensions which can be found here.

The simplest thing to do is just drop the .dlls in the install directory for WinDbg. The two DLLs in this set that will be most important to you are psscor.dll and sieextpub.dll. Psscor has a lot of tools for dumping the contents of managed objects, and sieextpub has some powerful functions for showing application and thread state.

There is a third extension, sos.dll, whose functionality mostly overlaps with psscor’s.

Extensions Basics

Load an extension into WinDbg like so:

0:000> .load psscor

At any point, you can see the extensions you have loaded with this command:

0:000> .chain

All of the extensions I’ve mentioned come with help commands. The help for the topmost extension in the chain can be called like this:

0:000> !help

But you can always refer back to extensions further down in this way:

0:000> !sieextpub.help

An Example — Debugging an ASP.NET Hang

This was the scenario we found ourselves in last week. The first thing you will need to do is get a hang dump of the worker process using AdPlus, which I don’t have time to cover here. However, it is relatively straightforward.

Once we’ve got the dump, the most obvious thing to try is to see what the process is doing. To do that, load psscor and use this command:

0:000> ~*e!clrstack

This will dump out the managed stack of all of the threads in the process (the ~*e means that we want to iterate through all of the threads and perform the specified action for each).

That will give us a general idea of what is going on. The stack of one or more particular threads is bound to be interesting, and we can narrow it down to the stack trace of a single thread using:

0:000> ~113e!clrstack

Here I’ve replaced * (all threads) with a single thread, 113. Assuming this is a managed thread, you should see some output like this.

Thread 113
ESP         EIP

...

0x0dc5f6cc  0x0fa3bb17 [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequestMain()
0x0dc5f710  0x0fa3aedf [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest()
0x0dc5f74c  0x0fa3a94b [DEFAULT] [hasThis] Void System.Web.UI.Page.ProcessRequest(Class System.Web.HttpContext)

...

For this example, I’ll just show how to figure out which page in the application is executing. Since we’re running inside a page class for a lot of the response, the this pointer should be all we need.

Grab two stack pointers (the ESP register) and use the psscor.DumpStackObjects command, or dso for short:

0:000> !dso 0x0dc5f6cc  0x0dc5f74c

Thread 0
ESP/REG    Object     Name
0x0dc5f6cc 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f6d0 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f6d8 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f714 0x31f0d208 System.Collections.Specialized.HybridDictionary
0x0dc5f720 0x070ca198 System.Globalization.CultureInfo
0x0dc5f724 0x070fb010 System.Threading.Thread
0x0dc5f728 0x0371c55c _ASP.incomeStatement_aspx
0x0dc5f73c 0x31f0d208 System.Collections.Specialized.HybridDictionary

Bingo — in this case, _ASP.incomeStatement_aspx is an instance of the page class.
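
From there, the page instance itself can be dumped with psscor's DumpObj command, using the object address from the dso output above; that will list the fields of the page object:

0:000> !dumpobj 0x0371c55c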

Note that the first argument to dso is the upper stack pointer, and the second is the lower stack pointer. I’ll post some more stuff when I have time.