Posts Tagged ‘Remoting’

DataSetSurrogate Remoting Sink

I have received a trickle of requests for some code that I have written and alluded to here and here: namely, a custom Remoting sink that swaps DataSets for DataSetSurrogates as they pass by.

Unfortunately, it would not be legal for me to release this code. Sorry. However, I think I can give a general outline of it without giving away the farm.

The first thing I should say is that the DataSetSurrogate is probably not the cure for your remoting performance woes in every case. (Review what I have said here and here for more information.)

If it’s at all practical, my recommendation to you is to simply avoid using DataSets in n-tier situations.

Still here? Great.

OK, with the disclaimers out of the way: the code works in two pieces, the remoting sinks and the ISerializationSurrogate. The serialization surrogate is relatively easy to implement, if you’ve gotten this far.

The sinks are a little more clever. I wanted to continue using the framework BinaryFormatter sinks, since they added a lot of things to the mix that I didn’t feel like re-implementing.

However, there’s no way for you to modify how the BinaryFormatterClient/Server sinks do their serialization. More explicitly, there’s nowhere to plug in the ISerializationSurrogate object that you’ve written.
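For reference, this is roughly what plugging a surrogate into a formatter you own looks like. It is a minimal sketch, not the code described in this post. XmlDataSetSurrogate and SurrogateDemo are made-up names, and the payload is plain XML only to keep the example short (a real surrogate would write something compact, along the lines of the DataSetSurrogate from KB829740). It also assumes .NET 2.0 or later, where I believe the object returned from SetObjectData replaces the uninitialized one the formatter passes in.

```csharp
using System.Data;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

// Hypothetical surrogate: stores the whole DataSet as one XML string.
// XML is a placeholder payload here; it is not faster than the default.
internal sealed class XmlDataSetSurrogate : ISerializationSurrogate
{
    public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
    {
        DataSet ds = (DataSet)obj;
        using (StringWriter writer = new StringWriter())
        {
            ds.WriteXml(writer, XmlWriteMode.WriteSchema);
            info.AddValue("xml", writer.ToString());
        }
    }

    public object SetObjectData(object obj, SerializationInfo info,
        StreamingContext context, ISurrogateSelector selector)
    {
        // obj is an uninitialized DataSet; ignore it and return a fresh one.
        DataSet ds = new DataSet();
        using (StringReader reader = new StringReader(info.GetString("xml")))
        {
            ds.ReadXml(reader, XmlReadMode.ReadSchema);
        }
        return ds;
    }
}

internal static class SurrogateDemo
{
    // Round-trips a DataSet through a BinaryFormatter that uses the surrogate.
    public static DataSet RoundTrip(DataSet ds)
    {
        SurrogateSelector selector = new SurrogateSelector();
        selector.AddSurrogate(typeof(DataSet),
            new StreamingContext(StreamingContextStates.All),
            new XmlDataSetSurrogate());

        BinaryFormatter f = new BinaryFormatter();
        f.SurrogateSelector = selector; // the hook the framework sinks do not expose

        using (MemoryStream stream = new MemoryStream())
        {
            f.Serialize(stream, ds);
            stream.Position = 0;
            return (DataSet)f.Deserialize(stream);
        }
    }
}
```

Calling SurrogateDemo.RoundTrip(ds) should hand back a new DataSet with the same tables and rows. Note that the selector is keyed on the exact type, so a typed DataSet subclass would need its own AddSurrogate call.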

My custom client and server sinks get around this by pulling the old switcheroo on the messages coming through the pipe. That’s the clever bit.

So I am leaving the framework sinks there, but altering the messages that go through them. The messages they see have a binary representation of the “real” messages appended as a parameter.

That’s the quickest way to get the sinks written. If you implement complete formatter sinks instead, you can do the same job a little more efficiently at run time.

Hope this helps. Sorry that I can’t be more specific. If you’ve gotten this far without your head exploding, trust me, you’re smart enough to write these classes.

Anyway, the result is significantly faster for many kinds of data. Not for all kinds of data.

Bizarro TypeInitializationException in System.Runtime.Remoting

We’ve seen this exception on two occasions in three months in a production environment. I’m going to throw it up here since there are no Google hits on it so far, at least that I can find.

System.Reflection.TargetInvocationException: Exception has been thrown by the target of an invocation. --->
System.TypeInitializationException: The type initializer for "System.Runtime.Remoting.Channels.Http.HttpRemotingHandlerFactory" threw an exception. --->
System.MissingFieldException: Field not found: ?.s_webServicesFactoryType.
   at System.Runtime.Remoting.Channels.Http.HttpRemotingHandlerFactory..cctor()
   --- End of inner exception stack trace ---
   --- End of inner exception stack trace ---
   at System.RuntimeType.CreateInstanceImpl(Boolean publicOnly)
   at System.Activator.CreateInstance(Type type, Boolean nonPublic)
   at System.RuntimeType.CreateInstanceImpl(BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
   at System.Activator.CreateInstance(Type type, BindingFlags bindingAttr, Binder binder, Object[] args, CultureInfo culture, Object[] activationAttributes)
   at System.Web.Configuration.HandlerMapping.Create()
   at System.Web.Configuration.HandlerFactoryCache..ctor(HandlerMapping mapping)
   at System.Web.HttpApplication.GetFactory(HandlerMapping mapping)
   at System.Web.HttpApplication.MapHttpHandler(HttpContext context, String requestType, String path, String pathTranslated, Boolean useAppConfig)
   at System.Web.MapHandlerExecutionStep.System.Web.HttpApplication+IExecutionStep.Execute()
   at System.Web.HttpApplication.ExecuteStep(IExecutionStep step, Boolean& completedSynchronously)

This is a type initializer (.cctor) in System.Runtime.Remoting throwing up because one of the private fields cannot be found. Here’s the type in Reflector, proof that something is definitely rotten in the state of Denmark.

[Image: The s_webServicesFactoryType static field]

The type initializer is just trying to set this field to null.

[Image: HttpRemotingHandlerFactory static initializer]

Here are some more of the details:

  • The two occurrences were on different servers running the same application. Their Windows patch levels would have been the same.
  • w3wp is set to recycle on these servers. Once the exception starts, it happens constantly (thousands of times) until the worker process recycles, either forcibly or on its own schedule. Then it goes away.

So, I think this is definitely a JIT or assembly loader bug related to some obscure timing issues. If I can get the native image or a full process dump while this is happening, presumably this could be tracked down.

Remoting Email Thread

Here is an email thread I had recently with some good information in it. Rather than try to condense it, I’ll just post the thread.

N-Tier Remoting Frustrations

Link.

People DO want n-layer, because that provides reuse, maintainability and overall lower costs. Logical separation of UI, business logic, data access and data storage is almost always of tremendous benefit.

People MIGHT want n-tier if they need the scalability or security it can offer, and if those benefits outweigh the high cost of building a physically distributed system. But the cost/benefit isn’t there as often as people think, so a lot of people build physical n-tier systems for no good reason. They waste time and money for no real gain. This is sad, and is something we should all fight against.

I think I agree with this in total.

The project I work on has hit basically every problem with Remoting that exists.

  • Unbelievably slow DataSet serialization (see KB829740). Kind of a big issue for us because many of our developers aren’t so comfortable with creating their own data structures / objects, and consequently rely very heavily on ADO.NET objects. I resolved the speed issue by implementing my own formatter sink.
  • SocketExceptions (specifically, WSAENOBUFS) when transporting huge amounts of data. To some extent this is mitigated by better hardware. Although Q322975 refers to this as a “BUG,” we spoke with the Remoting development team through our premier support guys and they refer to this as designed behavior. There isn’t a quick fix (i.e., a registry change) here, so I’m going to have to try putting in a compression sink to get us some headroom.
  • The lack of a default remote call timeout. This killed us in production because of 1) a network problem and 2) some bad code that circumvented the HttpRuntime’s executionTimeout in a very unfortunate way. It took us 4 horrendous days of the servers crashing every 20 minutes with three Microsoft CPR people on-site to finally figure out what the problem was.
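On the last point, the client channel can at least be given an explicit timeout. A minimal sketch, assuming an HTTP client channel with the binary formatter on .NET 2.0 or later and a reference to System.Runtime.Remoting.dll; RemotingClientSetup, RegisterWithTimeout and the channel name are made up for illustration.

```csharp
using System.Collections;
using System.Runtime.Remoting.Channels;
using System.Runtime.Remoting.Channels.Http;

internal static class RemotingClientSetup
{
    // Registers an HTTP client channel whose calls give up after timeoutMs
    // instead of waiting indefinitely.
    public static HttpClientChannel RegisterWithTimeout(int timeoutMs)
    {
        IDictionary props = new Hashtable();
        props["name"] = "http-with-timeout"; // hypothetical channel name
        props["timeout"] = timeoutMs;        // "timeout" channel property, in milliseconds

        HttpClientChannel channel =
            new HttpClientChannel(props, new BinaryClientFormatterSinkProvider());
        ChannelServices.RegisterChannel(channel, false);
        return channel;
    }
}
```

Something like RemotingClientSetup.RegisterWithTimeout(30000) at application startup would cap each remote call at 30 seconds. The same timeout attribute can also be set on the channel element in the remoting configuration file.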

There are probably a few others which I am forgetting now. I am still talking to MSFT reps about the second, but I have no expectations that that will get anywhere.

But getting back to my main point (yes, I had one). The really frustrating part about this for me is that I don’t even consider our middle tier to be necessary. We put it there primarily for security purposes, and I don’t know, maybe we get that. But with respect to the way most of our developers write code, this is no help. It doesn’t necessarily create a logical service boundary, because nobody really gave them the direction to do that. To an extent it just makes the spaghetti worse.

Now for my secondary point. A lot of this is certainly our fault. But what drives me crazy is comments like this one:

This does assume you listened to advice from people like Ingo Rammer, Richard Turner and myself and avoided creating custom sinks, custom formatters or custom channels. If you ignored all this good advice then you’ll get what you deserve I guess…

It’s frustrating as hell to be backed into creating workarounds for lousy performance which you (well, your boss, who mandated this architecture) were never warned about and then to be openly mocked for doing so by the ‘experts.’ If we had known that Remoting would be throwing us this many problems, we probably would not have tried to use it for n-tier, and maybe with a little luck we would have decided against our flavor of n-tier altogether.

Get us some better advice, folks.

The “Other” DataSet Serialization Problem

There are many articles out there covering the well-known DataSet serialization performance issues. Dino Esposito had a pretty good article in MSDN Magazine outlining this, and the ADO.NET 2.0 solution to it.

There is another DataSet serialization problem which I consider to be almost as annoying, and which is unfortunately not addressed in ADO.NET 2.0. This program attempts to highlight it:

using System;
using System.Data;
using System.IO;
using System.Diagnostics;
using System.Runtime.Serialization.Formatters.Binary;

// Simple class that holds a reference into the dataset.
[Serializable]
internal class DataSetDescriptor
{

       private DataSet _data;
       private DataTable _root;

       public DataSetDescriptor(DataSet ds)
       {
              _data = ds; _root = ds.Tables[0];
       }

       // Ensures that the _root table is really the same
       // one that's in the dataset.
       public void Assert()
       {
              DataTable t = _data.Tables[0];
              Debug.Assert(object.ReferenceEquals(t, _root));
       }
}

class Program
{
       static void Main(string[] args)
       {
              DataSet ds = new DataSet();
              ds.Tables.Add();
              ds.RemotingFormat = SerializationFormat.Binary;
              DataSetDescriptor dsd = new DataSetDescriptor(ds);

              // This will (obviously) succeed.
              dsd.Assert();

              using (MemoryStream stream = new MemoryStream())
              {
                     BinaryFormatter f = new BinaryFormatter();
                     f.Serialize(stream, dsd);
                     stream.Position = 0;
                     dsd = (DataSetDescriptor)f.Deserialize(stream);
              }

              // This will (curiously) fail.
              dsd.Assert();
       }
}

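The second assert fails because the DataSet does its own serialization: _data writes the contents of its tables, including _root, into its own serialized payload, so the formatter never sees the table inside the DataSet as the same object our _root field points at. The formatter then serializes _root a second time through that field, and after deserialization we hold a copy rather than the table that is actually in the DataSet. This image might help to understand the issue: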

[Image: DataSet serialization problems]

I have my own not-for-the-faint-of-heart (but transparent to developers!) solution to this, which involves remoting sinks and a bunch of surrogate classes. It doesn’t look like I will be retiring it anytime soon.
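For a single class like DataSetDescriptor there is a much simpler (and not at all transparent) workaround: don’t serialize the table reference, and look the table up again after deserialization. A minimal sketch, assuming .NET 2.0 or later for the [OnDeserialized] callback; RebindingDescriptor, the _rootName field, RootIsShared and RebindingDemo are names introduced here for illustration.

```csharp
using System;
using System.Data;
using System.IO;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

[Serializable]
internal class RebindingDescriptor
{
    private DataSet _data;
    private string _rootName;   // serialized instead of the table itself

    [NonSerialized]
    private DataTable _root;    // re-resolved after deserialization

    public RebindingDescriptor(DataSet ds, DataTable root)
    {
        _data = ds;
        _root = root;
        _rootName = root.TableName;
    }

    // With BinaryFormatter this callback runs after the graph,
    // including _data, has been rebuilt.
    [OnDeserialized]
    private void Rebind(StreamingContext context)
    {
        _root = _data.Tables[_rootName];
    }

    // True when _root is the very table instance held by the DataSet.
    public bool RootIsShared
    {
        get { return object.ReferenceEquals(_data.Tables[_rootName], _root); }
    }
}

internal static class RebindingDemo
{
    // Serializes and deserializes the descriptor in memory.
    public static RebindingDescriptor RoundTrip(RebindingDescriptor d)
    {
        BinaryFormatter f = new BinaryFormatter();
        using (MemoryStream stream = new MemoryStream())
        {
            f.Serialize(stream, d);
            stream.Position = 0;
            return (RebindingDescriptor)f.Deserialize(stream);
        }
    }
}
```

This only helps for classes you control and can annotate, which is why it is no substitute for a fix at the formatter level.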

While I’m on the subject, one thing that I find simultaneously funny and depressing is that in the VB.NET version of the famous DataSetSurrogate solution in KB829740, Option Strict is off. If you actually run the Visual Basic example, you will find that it is considerably slower than the default DataSet serialization as a result of late binding. It’s slower than the C# version of the program by about a factor of ten.