Remoting Email Thread
May 6th, 2005

Here is an email thread I had recently with some good information in it. Rather than try to condense it, I’ll just post the thread.

From: Henry Derstine

After 2 days of banging my head against the wall I came across your post on robgruen’s WebLog. You said “The download with the KB article comes with a C# version and a VB.NET version. If you compile them both, the C# version is faster by a factor of ten or more. The VB version is slower than standard DataSet serialization. The reason is because of late binding - option strict is off in the projects.”

This is exactly what I have been experiencing. But when I switched to the C# version I still get slower performance than when I don’t use the surrogate. I was wondering if using the C# version should take care of my problem, or if there is something else I am doing wrong.

Thanks a lot, your post made me think I just might be sane after all.

Henry


From: Dan McKinley

Hey Henry -

There is the issue with Option Strict being off in the VB version of the surrogate, but there are other concerns as well.

My experience has been that the surrogate can actually be much faster OR much slower based on the type of data that is being sent. If you are sending a lot of numeric data, it’s likely that the surrogate will help you. If you are sending a lot of text, the surrogate could actually end up hurting.

The reason is mostly because serialization tries to preserve object references. For example, if two objects in the graph hold references to each other, binary serialization will preserve that relationship. In the context of a dataset, you don’t care about this that much. Two fields are never going to point at each other.

So if you profile the binary serialization of a DataSetSurrogate, you’ll see that a lot of the time is going into tracking the objects in a hashtable.

With numeric data, the benefits of the saved space (8 bytes for a decimal instead of a long string of text, for example) outweigh the overhead associated with tracking object references. In that case the surrogate is much faster.

With text data, you are really just adding the overhead without reducing space by that much (“ABCD” in the dataset is still going to be “ABCD” in the byte stream).
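
To make that concrete, here’s roughly how the surrogate gets used on either side of the remoting boundary. I’m assuming the DataSetSurrogate class from the KB sample here, which takes a DataSet in its constructor and has a ConvertToDataSet method to go back the other way:

    using System.Data;
    using System.IO;
    using System.Runtime.Serialization.Formatters.Binary;

    public class SurrogateExample
    {
        // Server side: wrap the DataSet before serializing it.
        public static byte[] SerializeWithSurrogate(DataSet ds)
        {
            DataSetSurrogate surrogate = new DataSetSurrogate(ds);
            BinaryFormatter formatter = new BinaryFormatter();
            using (MemoryStream stream = new MemoryStream())
            {
                // This is where the formatter spends its time tracking
                // object references in a hashtable.
                formatter.Serialize(stream, surrogate);
                return stream.ToArray();
            }
        }

        // Client side: rebuild the DataSet from the surrogate.
        public static DataSet DeserializeWithSurrogate(byte[] bytes)
        {
            BinaryFormatter formatter = new BinaryFormatter();
            using (MemoryStream stream = new MemoryStream(bytes))
            {
                DataSetSurrogate surrogate =
                    (DataSetSurrogate)formatter.Deserialize(stream);
                return surrogate.ConvertToDataSet();
            }
        }
    }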

Oh, and make sure your numeric columns have .DataType set to decimal! Otherwise they’ll be interpreted as strings and there will be no benefit.
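
In code that’s just the following (the table and column names are made up):

    using System;
    using System.Data;

    public class ColumnTypeExample
    {
        public static DataTable BuildTable()
        {
            DataTable table = new DataTable("Orders");

            // Declare the column as decimal up front; the type can't be
            // changed once rows have been added. Left at the default
            // (string), the values go over the wire as text and the
            // surrogate buys you nothing.
            DataColumn amount = new DataColumn("Amount");
            amount.DataType = typeof(decimal);
            table.Columns.Add(amount);

            return table;
        }
    }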

Let me know if that makes sense, or if I can clarify anything.

-Dan


From: Henry Derstine

Thanks for the quick reply. I think I follow everything you said.

Most of the data we’re transferring is text, and right now it’s only for viewing purposes, so I don’t really need any updating ability.

I guess the disheartening thing is everyone seems to point to that knowledge base article as a “great” fix for remoting DataSet performance, and they don’t mention anything about it not working in certain scenarios.

Do you have any pointers on remoting data that could improve performance? Is it possible to remote a DataReader? Since we only need to view the data, I thought this would be a lighter-weight approach, but I couldn’t see how to remote a DataReader object. And I have found in the past that binding a DataReader to a DataGrid gives you a performance loss compared to binding a DataSet to a DataGrid.

Oh yeah, since I have you. Have you seen/used Peter Bromberg’s article on compression and serialization of datasets?

http://www.eggheadcafe.com/articles/20031219.asp

I was wondering if that would give me any gains considering the compression aspect. I haven’t tried it because I have been trying to work out the previous problem.


From: Dan McKinley

If you configured the lifetimes, etc. the right way, you’d be able to send a TransparentProxy for a DataReader to a remote machine. That’d never be a high performance solution, though. You usually want to make fewer, larger remote calls instead of a lot of smaller ones. Going through the proxy, you’d effectively be making a remote call for every row in the result set.

It makes sense when you think about it, since the datareader has to keep a connection to the database open.
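
The shape you generally want instead is a single remotable object that hands the whole result set back in one call. Something along these lines - the names and the query are just for illustration:

    using System;
    using System.Data;
    using System.Data.SqlClient;

    // One "chunky" call returns everything at once, instead of a proxied
    // DataReader making a round trip per row.
    public class ReportService : MarshalByRefObject
    {
        private const string ConnectionString = "server=...;database=...";

        public DataSet GetOrders()
        {
            using (SqlConnection connection = new SqlConnection(ConnectionString))
            {
                SqlDataAdapter adapter = new SqlDataAdapter(
                    "SELECT OrderID, Amount FROM Orders", connection);
                DataSet ds = new DataSet();
                adapter.Fill(ds);  // the connection opens and closes here, on the server
                return ds;         // the whole DataSet crosses the wire once
            }
        }
    }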

As far as compression goes, I think it also depends on the situation. In our primary application, it didn’t pan out because we were sending our data only a few physical feet over gigabit Ethernet. Compression requires more CPU and memory overhead, so it just slowed us down. If you’re sending this data over the internet, it could certainly speed things up and I’ve worked on client/server apps where that is the case.
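
If you do want to try it, the idea is just to run the serialized bytes through a compression stream before sending them back. A rough sketch - this uses GZipStream, which is a .NET 2.0 thing, and I believe the article you linked uses a separate compression library, but the shape is the same:

    using System.IO;
    using System.IO.Compression;
    using System.Runtime.Serialization.Formatters.Binary;

    public class CompressionExample
    {
        // Serialize the object graph and gzip the result. More CPU and
        // memory on both ends, smaller payload on the wire - only a win
        // when the wire is actually the bottleneck.
        public static byte[] SerializeCompressed(object graph)
        {
            using (MemoryStream buffer = new MemoryStream())
            {
                using (GZipStream gzip = new GZipStream(buffer, CompressionMode.Compress))
                {
                    new BinaryFormatter().Serialize(gzip, graph);
                }
                return buffer.ToArray();  // ToArray still works after the streams close
            }
        }
    }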

In your situation, it sounds like you might want to look into caching the data where you’re using it. Have you looked into HttpContext.Cache at all? It won’t make the remote call any quicker, but it might let you avoid making it for every page displayed.
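
Something along these lines, reusing the hypothetical ReportService from above (the cache key and the twenty-minute window are just examples):

    using System;
    using System.Data;
    using System.Web;
    using System.Web.Caching;

    public class CachedReport
    {
        // Check the ASP.NET cache first; only make the remote call on a miss.
        public static DataSet GetOrders(ReportService remoteService)
        {
            const string key = "orders-report";
            Cache cache = HttpContext.Current.Cache;

            DataSet ds = cache[key] as DataSet;
            if (ds == null)
            {
                ds = remoteService.GetOrders();  // the expensive remote call
                cache.Insert(key, ds, null,
                    DateTime.Now.AddMinutes(20),  // absolute expiration
                    Cache.NoSlidingExpiration);
            }
            return ds;
        }
    }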


From: Henry Derstine

Thanks for all the info again. I just wanted to let you know what I found when I looked a little closer at where my bottlenecks were. I ended up testing 3 scenarios: 1) no wrapper class, 2) binary serializing the DataSet, 3) compression.

The average times to return a DataSet with 11,000 rows and 15 columns to the client are as follows:

1) 13.547s
2) 17.584s
3) 18.462s

Not what I originally expected.

When I looked at the time needed to actually retrieve and modify the DataSets (compress/serialize) before returning them to the client, this is what I got:

1) 10.031s
2) 17.081s
3) 18.125s

As you can see, the transfer time for the wrapped DataSet is about 2-3% of the total time, while for the unwrapped DataSet the transfer time is about 25% of the total. So basically what you said earlier was dead on. We have a short distance and textual data, so all the mashing of the DataSet outweighs the gain in transfer time. So I guess unless I expect a greater travel distance or get a better processor on the server, I won’t do anything “fancy” to improve the remoting of my DataSets.

Thanks again.

Henry
