The "Other" DataSet Serialization Problem
October 30th, 2004

There are many articles out there covering the well-known DataSet serialization performance issues. Dino Esposito had a pretty good article in MSDN Magazine outlining this, and the ADO.NET 2.0 solution to it.

There is another DataSet serialization problem, which I consider almost as annoying, that is unfortunately not addressed in ADO.NET 2.0. This program attempts to highlight it:


    using System;
    using System.Data;
    using System.IO;
    using System.Diagnostics;
    using System.Runtime.Serialization.Formatters.Binary;

    // Simple class that holds a reference into the dataset.
    [Serializable]
    internal class DataSetDescriptor
    {
        private DataSet _data;
        private DataTable _root;

        public DataSetDescriptor(DataSet ds)
        {
            _data = ds;
            _root = ds.Tables[0];
        }

        // Ensures that the _root table is really the same
        // one that's in the dataset.
        public void Assert()
        {
            DataTable t = _data.Tables[0];
            Debug.Assert(object.ReferenceEquals(t, _root));
        }
    }

    class Program
    {
        static void Main(string[] args)
        {
            DataSet ds = new DataSet();
            ds.Tables.Add();
            ds.RemotingFormat = SerializationFormat.Binary;
            DataSetDescriptor dsd = new DataSetDescriptor(ds);

            // This will (obviously) succeed.
            dsd.Assert();

            using (MemoryStream stream = new MemoryStream())
            {
                BinaryFormatter f = new BinaryFormatter();
                f.Serialize(stream, dsd);
                stream.Position = 0;
                dsd = (DataSetDescriptor)f.Deserialize(stream);
            }

            // This will (curiously) fail.
            dsd.Assert();
        }
    }

  

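The reason the second assert fails is that _data serializes _root into itself (the DataSet does its own serialization and writes its tables out as part of its own data), thus breaking our reference to it. The formatter’s algorithm then serializes _root a second time as a separate object, so after deserialization we get a copy rather than the table that lives inside the DataSet. This image might help to understand the issue: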

[Image: The Other DataSet Serialization Problem]

I have my own not-for-the-faint-of-heart (but transparent to developers!) solution to this, which involves remoting sinks and a bunch of surrogate classes. It doesn’t look like I will be retiring it anytime soon.
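
For the narrower case where you control the class holding the extra reference, a much smaller workaround is to not serialize the table reference at all and to re-resolve it once deserialization has finished. What follows is only a minimal sketch of that idea, not the sinks-and-surrogates solution mentioned above. It assumes the .NET Framework and BinaryFormatter as in the program earlier; RelinkingDescriptor, _rootIndex and RootIsLive are hypothetical names made up for illustration.

    using System;
    using System.Data;
    using System.IO;
    using System.Runtime.Serialization;
    using System.Runtime.Serialization.Formatters.Binary;

    // Hypothetical variant of DataSetDescriptor that repairs its own reference.
    [Serializable]
    internal class RelinkingDescriptor : IDeserializationCallback
    {
        private DataSet _data;

        // Not serialized: a serialized DataTable reference comes back as a copy.
        [NonSerialized]
        private DataTable _root;

        // Assumption for this sketch: the table's position is enough to find it again.
        private int _rootIndex;

        public RelinkingDescriptor(DataSet ds)
        {
            _data = ds;
            _root = ds.Tables[0];
            _rootIndex = ds.Tables.IndexOf(_root);
        }

        // Called by the formatter once the whole object graph has been rebuilt.
        void IDeserializationCallback.OnDeserialization(object sender)
        {
            _root = _data.Tables[_rootIndex];
        }

        // True when _root is the very table instance held by the DataSet.
        public bool RootIsLive()
        {
            return object.ReferenceEquals(_data.Tables[_rootIndex], _root);
        }
    }

    class RelinkProgram
    {
        static void Main(string[] args)
        {
            DataSet ds = new DataSet();
            ds.Tables.Add();
            ds.RemotingFormat = SerializationFormat.Binary;
            RelinkingDescriptor d = new RelinkingDescriptor(ds);

            using (MemoryStream stream = new MemoryStream())
            {
                BinaryFormatter f = new BinaryFormatter();
                f.Serialize(stream, d);
                stream.Position = 0;
                d = (RelinkingDescriptor)f.Deserialize(stream);
            }

            // Reports whether the reference survived the round trip.
            Console.WriteLine(d.RootIsLive());
        }
    }

The obvious limitation is that every class holding a DataTable, DataRow or similar reference has to do this itself, so it is not transparent to developers in the way a sink-based approach can be.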

While I’m on the subject, one thing that I find simultaneously funny and depressing is that in the VB.NET version of the famous DataSetSurrogate solution in KB 829740, Option Strict is off. If you actually run the Visual Basic example, you will find that it is considerably slower than the default DataSet serialization as a result of late binding. It’s slower than the C# version of the program by about a factor of ten.
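
To give a feel for what late binding means here: with Option Strict off, member access on an Object-typed variable is resolved by name at run time rather than by the compiler. The C# sketch below is only a rough analogy using reflection, not what the VB runtime literally executes, and LateBindingSketch is a hypothetical name. It reads the same property both ways.

    using System;
    using System.Data;
    using System.Reflection;

    class LateBindingSketch
    {
        static void Main(string[] args)
        {
            DataTable table = new DataTable("Example");
            table.Columns.Add("Id", typeof(int));
            table.Rows.Add(new object[] { 1 });

            // Early bound: Rows and Count are resolved at compile time.
            int early = table.Rows.Count;

            // Late bound (analogy): each member is looked up by name at run time.
            BindingFlags flags = BindingFlags.GetProperty
                | BindingFlags.Public | BindingFlags.Instance;
            object untyped = table;
            object rows = untyped.GetType().InvokeMember(
                "Rows", flags, null, untyped, null);
            int late = (int)rows.GetType().InvokeMember(
                "Count", flags, null, rows, null);

            // Both paths read the same value; only the lookup cost differs.
            Console.WriteLine(early == late);
        }
    }

Doing that kind of lookup for every cell of every row is the sort of overhead that adds up quickly in a serialization loop.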