Archive for December, 2005

Most of the Information on the Internet is Wrong

Attention reader: this website may contain terrible advice and fundamentally flawed code samples. Personally, I don't believe this to be the case, but my advice to you is to read it as if that were true. That you should question everything you read is not a principle unique to technical websites, of course. However, I have found that some very smart people are willing to suspend disbelief when they see code written on some idiot's website.

I have complained about the Code Project website before. Although there are some exceptional articles on it, and many useful samples, these are dwarfed by the sheer volume of terrible ideas. There is a rating system, but this is only a half-assed attempt to filter out the crap. Fact is, the names of API functions bring in traffic. Code quality is not part of the pagerank formula.

My objective isn't to single out the Code Project - it is only the most successful of many similar sites. I mention it because I found the following in an article today:

What we are implementing is called a COM class, so it should have a GUID associated with it. There is a tool you can use to generate a GUID called guidgen.exe, or you can take the one I've generated for you:

[Guid("{21F21921-B0FD-4801-862F-4BC417928574}")]

This is slightly paraphrased, and the GUID is replaced (I'm not sure why, but I would feel bad embarrassing the author by linking to him). It's clear to me that he doesn't fully understand what he is trying to teach, but that probably isn't clear to the lion's share of ignorant programmers in the world. It's also obvious to me that using a GUID from a website tutorial in commercial software is a profoundly bad idea, since every reader who copies it ends up identifying a different class with the same supposedly unique ID, but this apparently isn't so for everyone.

I wonder how many senior developers have blown their stack after finally finding this GUID conflict in a subordinate's code? That would be an interesting case study of the law of large numbers.
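Minting your own GUID takes one call in nearly any language, which is rather the point. Here is a minimal sketch in Python rather than the article's C#, where `new_guid_attribute` is a hypothetical helper name and the braced format simply mirrors the attribute quoted above:

```python
import uuid


def new_guid_attribute() -> str:
    """Return a C#-style Guid attribute line carrying a freshly generated GUID."""
    # uuid4() draws a random (version 4) UUID, so each call yields a new value.
    guid = str(uuid.uuid4()).upper()
    return '[Guid("{%s}")]' % guid


if __name__ == "__main__":
    print(new_guid_attribute())
```

On Windows, guidgen.exe does the same job; either way, the value should be generated once per class and never copied from someone else's sample.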

The phenomenon of blindly accepting anything in virtual print goes in both directions. In addition to giving you lots of things you can do (but shouldn't), sites also tell you things you can't do (but actually, you can). Many are the times I've been told, "sorry, this bug can't be fixed" with a link to a Code Project forum post with some anonymous boob saying, "there's no way to do it." Similar incidents over the years have put me on a hair trigger when it comes to samples.

If you can't trust the MSDN documentation all of the time, you certainly can't trust the plebeians in the forums.

Using LeakDiag to Debug Unmanaged Memory Leaks

I have been getting quite a few Google hits for search strings like this:

unmanaged memory leaks windbg

It's the second-most-common combination of search terms, trailing "hank goldberg picks" by a hell of a lot. I don't think the searches are coming from the same demographic. Anyway, I thought I would write up one of the easiest techniques that I'm aware of for debugging a memory leak in unmanaged code. This one doesn't touch WinDbg, but rather uses a few other Microsoft PSS tools specifically built for this purpose.

For this example, I fired up the MFC wizard and created a new scratch application. To that I added some logic to leak roughly 2K of memory every tenth of a second.

#include <vector>

using namespace std;

BEGIN_MESSAGE_MAP(CMainFrame, CFrameWnd)
    ON_WM_CREATE()
    ON_WM_TIMER()
END_MESSAGE_MAP()

// Incredibly stupid memory leak
void CMainFrame::OnTimer(UINT_PTR nIDEvent)
{
    UNREFERENCED_PARAMETER(nIDEvent);

    vector<int>* pvec = new vector<int>();
    for(int i = 0; i < 500; i++)
    {
        pvec->push_back(i);
    }
}

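int CMainFrame::OnCreate(LPCREATESTRUCT lpCreateStruct)
{
    // ...
    // Fire WM_TIMER (timer ID 1) every 100 ms.
    this->SetTimer(1, 100, NULL);
    return 0;  // 0 tells the framework to continue creating the window
}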

Supposing we had not done this on purpose, it would be clear from looking at the process in Perfmon that we were dealing with a memory leak. The Private Bytes counter for this process grows steadily while the application is doing nothing in particular.

[Screenshot: Perfmon output for a program leaking memory]

The tools that we'll be using to look at this problem are LeakDiag, LDParser, and LDGrapher. You can download them all from ftp://ftp.microsoft.com/PSS/Tools/ (LeakDiag and LDParser are bundled together).

After opening the problem application, start LeakDiag.exe. In Tools->Options, we want to increase the stack depth to the maximum (32). In an application written in any medium- to high-level language, the code responsible for a leak is typically many frames away from the actual call to malloc.

[Screenshot: Adjusting the stack depth]

There are a few options available (on the main dialog) for the specific allocator to monitor. Several may generate hits for the same leak (the CRT malloc will ultimately call the NT APIs, for example), but try to pick the one that best describes your application. Click Start and create a few logs as the leak manifests itself. In the MFC application I wrote, the leak is occurring constantly. Your application may need to run for many hours before you can get any worthwhile data.

[Screenshot: LeakDiag]

After doing this, you can use the LDParser application to open up one of the log files. You'll see something like this:

[Screenshot: LDParser]

The upper-right pane lists the unique stack traces from which the specified allocator was invoked. The list should be sorted by the total amount of data allocated by each. The bottom pane shows the stack trace for the active stack ID. In my case, the stack allocating the most memory is my intentional leak (notice that CMainFrame::OnTimer is in frame ten).
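The core idea, grouping live allocations by the call stack that made them and sorting by size, is not specific to LeakDiag. As a rough illustration only, here is the same idea using CPython's `tracemalloc` module. This is a different tool for a different runtime, and `on_timer`, `leaked`, and `top_growth` are hypothetical names standing in for the MFC sample:

```python
import tracemalloc

leaked = []  # module-level list that keeps every allocation alive


def on_timer():
    """Stand-in for CMainFrame::OnTimer: allocate 500 ints and never free them."""
    leaked.append(list(range(500)))


def top_growth(ticks=200, depth=10):
    """Return allocation growth per unique stack, largest growth first."""
    tracemalloc.start(depth)  # record up to `depth` frames per allocation
    before = tracemalloc.take_snapshot()
    for _ in range(ticks):
        on_timer()
    after = tracemalloc.take_snapshot()
    tracemalloc.stop()
    # Group by full traceback, much as LDParser groups by stack ID.
    return after.compare_to(before, "traceback")


if __name__ == "__main__":
    worst = top_growth()[0]
    print("grew by %d bytes" % worst.size_diff)
    print("\n".join(worst.traceback.format()))
```

The first entry should point at the line inside `on_timer`, just as the top stack in LDParser points at the leaking frame.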

If your situation is more complicated than mine, as it almost certainly will be, there is one other tool you should be aware of. LDGrapher can take a set of logs generated by LeakDiag and generate a set of graphs of allocations over time. Here is the output of my application over a few minutes:

[Screenshot: LDGrapher]

Each stack ID is represented by a line on the graph. Hopefully, this will help some of you debugging unmanaged memory leaks.

Concurrency Approaches Contrasted

Here are a few roughly-equivalent class declarations using different languages and libraries. I say "roughly equivalent" because the implementation details and performance characteristics may actually be quite different in each case. However, the end result of each is that a function on a member variable is called in a non-blocking fashion, synchronized (presumably against other operations on the same class) with the use of some resource A.


// Vanilla C# with locks
public class FooBar
{
    private object A = new object();
    private T _member = new T();
    private delegate void FooBarDel();

    public void Foo()
    {
        // The worker can't be named FooBar: C# doesn't allow a member
        // to share the name of its enclosing type.
        FooBarDel del = new FooBarDel(DoFooBar);
        del.BeginInvoke(
          new AsyncCallback(FooBarDone), del);
    }

    private void DoFooBar()
    {
        lock(A)
        {
            _member.DoSomething();
        }
    }

    // You can typically get away without this, but
    // the documentation recommends otherwise.
    private void FooBarDone(IAsyncResult r)
    {
        // AsyncState is typed as object, so cast it back.
        FooBarDel d = (FooBarDel)r.AsyncState;
        d.EndInvoke(r);
    }
}

// Comega using a chord join pattern
public class FooBar
{
    private T _member = new T();
    private async A();
    public async Foo() & A()
    {
        _member.DoSomething();
        A();
    }

    public FooBar()
    {
        A();
    }
}

// C# 2.0+ using CCR
public class FooBar
{
    private T _member = new T();
    private Port<int> _p = new Port<int>();

    public void Foo()
    {
        _p.Post(1);
    }

    public FooBar()
    {
        activate(!_p.with(delegate(int i)
        {
            _member.DoSomething();
        }));
    }
}

Looking at the other two examples, I think it's clear that C# as it stands today is lacking. That's not exactly a groundbreaking statement, I realize, but I haven't seen many examples putting all of these together in one spot.

Comega is very elegant in a situation as simple as this. CCR also seems less mistake-prone than the plain C# version, but the example I picked isn't where CCR really shines.

(One thing I should say is that I'm not sure that the CCR example is correct or even compiles, since the library isn't public yet. I worked this out from reading the whitepaper [pdf]. It's only a few pages, so I recommend giving it a read.)
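For readers who want something runnable to poke at, the two shapes contrasted above (lock around a shared member versus posting messages to a port) can be sketched in Python. Everything here is an assumption for illustration: `Counter` stands in for `T`, the class names are hypothetical, and the single consumer thread is a simplification, not a description of how CCR actually schedules handlers.

```python
import queue
import threading
from concurrent.futures import ThreadPoolExecutor


class Counter:
    """Stand-in for the T member; do_something is deliberately not thread-safe."""

    def __init__(self):
        self.value = 0

    def do_something(self):
        self.value += 1


class LockedFooBar:
    """Lock-based shape: each call runs on a pool thread and takes lock A."""

    def __init__(self):
        self._a = threading.Lock()
        self._member = Counter()
        self._pool = ThreadPoolExecutor(max_workers=4)

    def foo(self):
        # submit() returns immediately; the work happens on a pool thread.
        return self._pool.submit(self._foo_bar)

    def _foo_bar(self):
        with self._a:
            self._member.do_something()

    def close(self):
        self._pool.shutdown(wait=True)


class PortFooBar:
    """Port-based shape: foo() posts a message and one consumer thread
    handles messages in order, so the member needs no lock."""

    def __init__(self):
        self._member = Counter()
        self._port = queue.Queue()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def foo(self):
        self._port.put(1)

    def _run(self):
        while True:
            item = self._port.get()
            if item is None:  # sentinel posted by close()
                break
            self._member.do_something()

    def close(self):
        self._port.put(None)
        self._worker.join()
```

In both classes `foo()` returns without waiting for the work to finish. The difference is where the mistakes can hide: the first version is only correct if every access to the member remembers to take the lock, while the second funnels all access through one place.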

The goal of CCR has been to allow complex constructs such as this (also from the whitepaper) to be made easily, and dynamically if necessary:


    // The finish operation or the update operation
    // execute with exclusive control, whereas the
    // GetState and QueryState operations execute
    // concurrently. Operations of either type are
    // interleaved safely (using the '^' operator).
    activate(exclusive(p.with(DoneHandler),
        !p.with(UpdateState))
        ^
        concurrent(!p.with(GetState),
        !p.with(QueryState))
    );

Comega, on the other hand, provides static language features. The good news is that since CCR is just a library, it would be usable from a future C# (4.0, one would hope) that incorporated some of the ideas in Comega at the language level for the easier cases.

I had thought it was a shame that none of the Comega concurrency constructs had made it into the 3.0 C# specification. However, with CCR I think I can see why this is the case. CCR, or something like it, should offer a lot of power and flexibility well within the 3.0 timeframe without taking the risk of baking the 'wrong' keywords into C# itself.

Wrote a Concurrency Library for Everyone

Herb Sutter says the concurrency revolution is coming. This guy says, “*** you!! the concurrency revolution is coming!”

That is the kind of attitude we need in the industry. We need more in-your-face Greeks and fewer genial mustachioed C++ architects. Call me crazy.

I know this is the type of enthusiasm that is typically found in insane people building anti-gravity machines in their backyards, but I think these guys might be on to something with their Concurrency and Coordination Runtime (CCR).

I highly recommend watching the video, even if you have no idea what these people are talking about. Seriously mind-blowing stuff.

Making a DataSet Read-Only

Let's say you're writing a component that makes a DataSet available to a number of clients. The DataSet is expected to persist in your application for a while, and be used in code written by many different developers.

You could carefully craft an email explaining that,

Hey everybody, this DataSet is shared. Please don't change the data in it for the purposes of your own piece of the application. That could create some odd bugs that would be really hard to track down. If you need to do something like that, please make a copy of the DataSet first.

I suppose you could do that, if you wanted to waste your time. You could always create a copy of the DataSet before handing it over to anybody, but that may introduce some unnecessary performance issues if the DataSet is large. It feels a little like punishing everyone for the bad behavior of a few. If you wanted to actually prevent such problems, you could write a clever class like this that acts like a "lock" on the DataSet.

/// <summary>
/// Class that ensures that clients keep their pesky hands
/// off of a particular dataset.
/// </summary>
internal class DataSetLock
{
    [Conditional("DEBUG")]
    public static void Install(DataSet ds)
    {
        if (ds == null)
        {
            throw new ArgumentNullException("ds");
        }
        EventHandler thrower = delegate(object sender, EventArgs e)
        {
            string msg = string.Format(
                "You can't change: {0}.", ds.DataSetName);
            throw new InvalidOperationException(msg);
        };

        foreach (DataTable t in ds.Tables)
        {
            t.RowChanging += new DataRowChangeEventHandler(thrower);
            t.RowDeleting += new DataRowChangeEventHandler(thrower);
            t.ColumnChanging += new DataColumnChangeEventHandler(thrower);
            t.TableClearing += new DataTableClearEventHandler(thrower);
            t.TableNewRow += new DataTableNewRowEventHandler(thrower);
        }
    }
    private DataSetLock() { }
}

The "lock" is really an anonymous method that we attach, willy-nilly, to all of the change events in the DataSet. We use lexical closure to add the name of the DataSet passed in to the exception message that dozens of caffeine-addled developer brains will be processing shortly. [Note: if you don't think it's lexical closure, go talk to this guy. He's really into the topic.]

I put the ConditionalAttribute on the Install method as well, so calls to it are simply compiled away in retail code. As long as you have reasonably good testing coverage of your debug builds, offending code should be caught before it ships.

Here's a quick usage example:


static void Main(string[] args)
{
    DataSet ds = new DataSet();
    ds.DataSetName = "My Data Set";
    DataTable t = ds.Tables.Add();
    t.Columns.Add("foo", typeof(string));

    DataSetLock.Install(ds);

    // Pow!
    t.LoadDataRow(new object[] { "asdf" }, true);
}

This prints:

System.InvalidOperationException: You can't change: My Data Set.

Enjoy.

A Plea for Change in Sports Commentary

Here is one football statistic that I never want to hear again:

Team X is 40-0 when they have someone rush for 100 yards.

The conclusion you are supposed to draw is that Team X should try to run the football. Here's one way you could rephrase the statistic:

Team X does a pretty good job of winning when they're winning.

It's post hoc reasoning, people! Come on! I'm starting to think Howard Cosell was on to something when he spent years hating on Frank Gifford because he was not a trained sports journalist. I don't have a high opinion of journalists either, but now the Giffords are the norm in the commentary business. The intelligence of the analysis suffers. I'm all for coaches-as-commentators, but I think we should all agree that players-as-commentators is not working out.

Monad Script to Scrub Perfmon Output

Here's the scenario: I had about eight hours of perfmon output, for several hundred counters sampled once per second. I wanted to put this into a database already containing the parsed IIS logs from the same machine, to try to correlate some of the resource utilization peaks with URLs.

The only snag was that Microsoft's LogParser couldn't handle the CSV files that perfmon generated. The input lines were far too long for it.

The call is from heroism. Will you accept the charges?

The solution I came up with was to hack together an MSH script that processed the files. It's not pretty (because it didn't have to be), but here it is.


#
# trim-perflog.msh
#   This pares down a set of enormous perfmon logs to a
#   size that can be managed by the Microsoft LogParser.
#
#   This finds the column indices that we're interested in,
#   then runs through each line in the input CSV file(s)
#   and pulls them out.
#

# The counters in the perfmon trace that we're interested in.
#
$counters =
"(PDH-CSV 4.0) (Eastern Standard Time)(300)",
"\\machine\.NET CLR Exceptions(w3wp)\# of Exceps Thrown / sec",
"\\machine\.NET CLR LocksAndThreads(w3wp)\Contention Rate / sec",
"\\machine\.NET CLR Memory(w3wp)\% Time in GC",
"\\machine\.NET CLR Remoting(w3wp)\Remote Calls/sec",
"\\machine\Process(w3wp)\Page Faults/sec",
"\\machine\Process(w3wp)\% Processor Time",
"\\machine\Process(w3wp)\% User Time",
"\\machine\Process(w3wp)\% Privileged Time",
"\\machine\Process(w3wp)\Private Bytes";

# Prettier names for the counters.
#
$columnAlias = "time", "exceptionsPerSecond",
"contentionPerSecond", "pctTimeInGC",
"remoteCallsPerSecond", "pageFaultsPerSecond",
"pctProcessorTime", "pctUserTime",
"pctPrivilegedTime", "privateBytes";

# Returns an array that contains the index of each of the
# $counters in the csv.
#
function getPerflogIndices
{
    param([System.String]$columns)

    $tokens = $columns.Split(',');
    for($i = 0; $i -lt $tokens.Length; $i++)
    {
        $t = $tokens[$i];
        $t = $t.Substring(1, $t.Length-2);
        $index = -1 ;
        for($j = 0; $j -lt $counters.Length; $j++)
        {
            if($t -eq $counters[$j])
            {
                $index = $j;
            }
        }
        if($index -ge 0)
        {
            write-object $i;
        }
    }
}

# writes a CSV line using the values in $array.
#
function write-array
{
    param([System.IO.StreamWriter]$writer, $array)
    $j = 0;
    # Index of the final element, so no trailing comma is written.
    $last = $array.Length - 1;
    foreach($a in $array)
    {
        $writer.Write($a);
        if($j -ne $last)
        {
            $writer.Write(",");
        }
        $j++;
    }
    $writer.WriteLine();
}

# Writes a single line to the result CSV file.
# This requires:
#    $writer  - The output stream.
#    $ln      - A single line read from
#                the input stream.
#    $indices - The column indices of the
#                  subject perfmon counters.
#
function write-csv-line
{
    param([System.IO.StreamWriter]$writer,
       [System.String]$ln, [System.Object[]]$indices)

    $vals = $ln.Split(',');
    $j = 0;
    $last = $indices.Length - 1;
    foreach($i in $indices)
    {
        $writer.Write($vals[$i]);
        if($j -ne $last)
        {
            $writer.Write(",");
        }
        $j++;
    }
    $writer.WriteLine();
}

# Trims down the input CSV file. If $names is True,
# writes the names of the columns as the first line
# in the output.
function trim-perflog
{
    param([System.IO.FileInfo]$in, [System.Boolean]$names)

    $r = $in.OpenText();
    $ln = $r.ReadLine();
    $indices = getPerfLogIndices $ln;

    $wr = [System.IO.File]::AppendText($out);
    if($names)
    {
        write-array $wr $columnAlias;
    }
    while($r.Peek() -ge 0)
    {
        write-csv-line $wr $r.ReadLine() $indices;
    }
    $wr.Close();
    $r.Close();
}

$out = "c:\perflogs\output.csv";

# Use the $true/$false literals rather than bare words for the boolean flag.
trim-perflog c:\perflogs\prod_000005.csv $true
trim-perflog c:\perflogs\prod_000006.csv $false
trim-perflog c:\perflogs\prod_000007.csv $false
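For comparison, here is a rough sketch of the same column-trimming idea in Python, under a few assumptions of mine: `COUNTERS` is a hypothetical, shortened version of the counter list above, `trim_perflog` takes file-like objects rather than paths, and short rows are skipped. Using the `csv` module instead of splitting on commas means quoted fields that contain commas survive intact, and looking the alias up per column keeps names and values aligned regardless of column order in the file.

```python
import csv
import io

# Hypothetical subset of counters: perfmon column header -> output alias.
COUNTERS = {
    "(PDH-CSV 4.0) (Eastern Standard Time)(300)": "time",
    r"\\machine\Process(w3wp)\% Processor Time": "pctProcessorTime",
    r"\\machine\Process(w3wp)\Private Bytes": "privateBytes",
}


def trim_perflog(reader, writer, write_names=True):
    """Copy only the COUNTERS columns from reader to writer (file-like objects)."""
    rows = csv.reader(reader)
    header = next(rows)
    # Indices are kept in file order, and the alias is looked up per column.
    indices = [i for i, name in enumerate(header) if name in COUNTERS]
    out = csv.writer(writer, lineterminator="\n")
    if write_names:
        out.writerow([COUNTERS[header[i]] for i in indices])
    for row in rows:
        if len(row) < len(header):
            continue  # skip truncated lines rather than fail on them
        out.writerow([row[i] for i in indices])


if __name__ == "__main__":
    sample = io.StringIO(
        '"(PDH-CSV 4.0) (Eastern Standard Time)(300)",'
        + r'"\\machine\Process(w3wp)\Private Bytes",'
        + r'"\\machine\Memory\Pages/sec"'
        + "\n"
        + '"12/01/2005 10:00:00.000","1048576","3"\n'
    )
    result = io.StringIO()
    trim_perflog(sample, result)
    print(result.getvalue())
```

To process several log files into one output, call it once per file with `write_names=False` for all but the first, as the MSH script does.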