Archive for the ‘Angst’ Category

Python PostgreSQL Driver Authors Hate You

DBAPI2 is all well and good. To a point. But if you have the usual website scaling problem, namely the one where you have a master database that worked fine when you were tiny and–dear god–not so well right now, the idea of interchangeable database libraries is basically a crock.

Before I am inundated with hate mail let me dial back my rhetoric a little bit. The existence of an API that works with altogether different databases is a wonderful thing and without it, things like Django or SQLAlchemy would not be possible. So rest assured I am not a complete maniac. I am not even really here to talk about dbapi2. I am just saying that 1) no two libraries are the same, 2) given sufficient scale this will matter to you, 3) the devil is in the details, and 4) the devil likes screwing things with white hot pokers.

Database client drivers intended for the same database can do drastically different things. By Python standards, the Postgres driver situation is completely schizo. There are a lot of them available - there are five dedicated Postgres drivers listed on the wiki, as opposed to just one for MySQL. People might choose different drivers for licensing reasons, for religious reasons, randomly (because they never did any analysis like I am about to do), or for completely inscrutable reasons because they are just plain out of their minds. You really would not believe how much blood I have seen spilled over Postgres client drivers.

Here, let me show you what I am talking about. Examine the following python program, which runs an identical operation on a pyPgSQL connection and a psycopg2 connection.

#! /usr/bin/env python
from __future__ import with_statement
from contextlib import closing
from pyPgSQL import PgSQL as pypgsql
import psycopg2

test_dsn = 'host=127.0.0.1 port=5432 user=dan dbname=postgres'

def test_select(c):
    with closing(c.cursor()) as cr:
        cr.execute('select 1')
        print cr.fetchall()

def test():
    with closing(pypgsql.connect(test_dsn)) as c:
        test_select(c)

    with closing(psycopg2.connect(test_dsn)) as c:
        test_select(c)

if __name__ == '__main__':
    test()

Here's what happens when pyPgSQL runs that select.

select version()
BEGIN WORK
DECLARE "PgSQL_0062AF80" CURSOR FOR select 1
FETCH 1 FROM "PgSQL_0062AF80"
SELECT typname, -1 , typelem FROM pg_type WHERE oid = 23
FETCH ALL FROM "PgSQL_0062AF80"
CLOSE "PgSQL_0062AF80"
ROLLBACK WORK

First note that pyPgSQL issues a SELECT VERSION() command for every new connection. Why's it do that? Well since I've already dug through the source I can tell you that it does this to see if it has to do something wacky for PostgreSQL 7.1 and below. There's no way to disable this without patching the library.

This is not an enormous problem if the connection is pooled and reused, but it immediately becomes one if you want to use an out-of-process pool like PgBouncer. Every pyPgSQL connection that you make to pgbouncer will run this query, and in that scenario you are probably making zillions.

We get a transaction, even though I don't remember having asked for one. Since we never commit, it's rolled back. As default behavior this bites–more on this in a bit.

Finally, get a load of this:

SELECT typname, -1 , typelem FROM pg_type WHERE oid = 23;

That is pyPgSQL asking postgres what the name of the type associated with OID 23 is. I can tell you what it is without looking, it's an int4. The OID's of built-in types are hardcoded (see catalog/pg_types.h in the postgres source), so this is worse than pointless.

Well that was a shitshow. Now, what does psycopg2 do?

SET DATESTYLE TO 'ISO'
SHOW client_encoding
SHOW default_transaction_isolation
BEGIN; SET TRANSACTION ISOLATION LEVEL READ COMMITTED
select 1
ROLLBACK

This is marginally more acceptable. Which, to my point, means it's radically different. Right off the bat it is obvious that psycopg2 does not really support cursors. Apparently you can argue either side of this, but regardless it's a significant difference between the two libraries.

It would be better if there were a way to tell psycopg2 what the client encoding and default transaction isolation levels are, rather than have it query this with each connection. Again, this is fine unless you want to use an external connection pool. Note that READ COMMITTED is the default, which makes the SET unecessary, but it is issued anyway. (And since Postgres only really supports two isolation levels, it would be likewise pointless if the server setting were READ UNCOMMITTED).

As with pyPgSQL, the transaction-as-default-behavior thing is thoroughly brutal. As far as I can tell, this is not something the DBAPI2 PEP demands. There are many reasons why I think this is a bad idea, but they all boil down to the fact that transactions are not free. If you're executing a stored procedure, or just reading data, this boilerplate is superfluous. And if you want to get the most out of your database, you will have to turn this off. The syntax for disabling it is, of course, totally different and underdocumented in both libraries.

Note that PDO does nothing like this. PDO expects you to know what a transaction is. It gives you a prepared statement that you might not want, but that's a different problem.

If you've read this far, maybe you'd like to know what my advice is. Well, first of all, do not take the choice of driver lightly, and understand what you're getting into. You almost definitely want to use psycopg2, because on balance it is the least evil option*. If it comes to it, you can always patch out the unnecessary SHOW statements (although honestly, I'm not sure this will ever become an issue like pyPgSQL's stupid selects). However, you should take care to manage your use of transactions from the start of your project. That is the kind of thing that will be really painful to change after you have hundreds of queries implicitly relying on the default behavior.

* Also note that the pyPgSQL source mixes tabs and spaces. FML.

From the Annals of Dubious Achievement

It is I who wrote RichardIsAFuckingIdiotControl, voted the "best comment in source code … ever encountered" in this StackOverflow question. I didn't submit it, nor do I know the person that did. (I came across it on reddit, or something. I don't have the will or personal bandwidth to participate in something like StackOverflow.) This happened at a former employer.

I am not sure that I am proud of this. Actually I'm pretty sure that I'm not proud of it. But I'll explain what the deal was for the record anyway. I am also not sure that this deserves the "accolades" that it received. After all it's not technically a comment (although it contains some doozies) and I would never have chosen pure vitriol over, say, something from Ritchie's Odd Comments and Strange Doings in Unix.

Obviously what sets this apart from other snippets in the genre is the over-the-top hatred of a very specific colleague. So let me tell you a little about Richard.

First, his name wasn't Richard. Whoever submitted the sample was wise enough to change that (thanks, guy). I didn't even use his given name in the original source, although it was a moniker that most people would have understood. "Richard" was no longer employed by the company when I wrote this. He had recently been fired for repeatedly showing up in the early to mid-afternoon drunk and coked out of his mind (I guess nobody told him that we real programmers show up on time and drink at our desks).

Richard was a recent college grad with roughly a 1.8 GPA from a decent-but-not-prestigious CS program. Miraculously (if you're Richard), someone decided he was a "cultural fit" and therefore deserved $70K per year. For readers who have only ever been exposed to polite society, it is important to note that this is standard practice in the financial industry and is considered "normal." Around this time, I had gotten a little bent out of shape about hiring for various reasons that I won't go into, and I was the asshole. Go figure.

Personally, there is only one word that describes the kid. That word is broseph. If I were to call him a violent, drug-addled menace, it would not be hyperbole. Let me say it again, "cultural fit."

After Richard's exit, I had to take over his code. I wound up rewriting almost all of it from scratch, and this class was serving as a blast shield around the volatile remains of what I could salvage. Honestly, at this point I was bored to tears, and a lot of what I did was probably total crap. If there's any justice, right now someone's writing a profanity-laden class with my name all over it.

But anyway, let’s go over the code. I’ll add some color, where I can remember what was going on.


// The main problem is the BindCompany() method,
// which he hoped would be able to do everything. I hope he dies.

I am firmly opposed to the death penalty, but if a piano fell on his head I would be sincerely happy. (And again, for some reason I'm the asshole.)


public void BindCompany(int companyId) { }

// snip

private void MakeSureNobodyAccidentallyGetsBittenByRichardsStupidity()
{
    // Make sure nobody is actually using that fucking bindcompany method
    MethodInfo m = this.GetType().GetMethod("BindCompany", BindingFlags.DeclaredOnly |
        BindingFlags.Instance | BindingFlags.Public | BindingFlags.NonPublic);
    if (m != null)
    {
        throw new RichardIsAFuckingIdiotException("No!! Don't use the fucking BindCompany method!!!");
    }
    // P.S. this method is a joke ... the rest of the class is fucking serious
}

There was some deeply-entrenched reason why I could not change the definition of the BindCompany method. It had to be there, but the code paths that called it were all fundamentally flawed, or something like that. I decided to be funny and use reflection to raise an exception if anyone happened to redefine with an implementation of it in a derived class.


/// <summary>
/// This returns true if this control is supposed to be doing anything
/// at all for this request. Richard thought it was a good idea to load
/// the entire website during every request and have things turn themselves
/// off. He also thought bandanas and aviator sunglasses were "fuckin'
/// gnarly, dude."
/// </summary>
protected bool IsThisTheRightPageImNotSureBecauseRichardIsDumb()
{
    return Request.QueryString["Section"] == this.MenuItemKey;
}

One of the really crazy things about this application was that for every web request, it would actually load several hundred control classes and call methods on them. Maybe two or three of these would actually be necessary. They would all determine (based on the URL, I think) if they were supposed to be drawing anything. It would have been much easier to, god I don't know, just call the methods that were necessary to draw each page. I still have nightmares about this ridiculous contraption every now and then.

I guess the StackOverflow snippet doesn't capture this, but the best thing about Richard's code was that he loved property getters and setters. No, wait, that's not quite right. Lots of people love getters and setters, but Richard seemed to be in love with getters and setters. So much so that about 70% of his logic took place in them. More than once I deleted code that looked like this:

foo.x = foo.x;

Only to break entire pages, because the side effects of that assignment were doing everything. Anyway, I hope you can all see where I was coming from now.

PHP FAIL

The explanation for maybe 70% of the pages on the internet that say,

Well I don't know what the hell just happened but maybe you should delete your cookies.

can be found here. Specifically,

The setcookie function will silently change the '' cookie value to 'deleted' and override your expiry time to time() - 1 year.

Now in percentage terms, the number of people that have a local date that is incorrect by more than one year is probably very small. However the law of enormous numbers implies that at least of few such people will be using your site if you are successful, more or less depending on your usage demographics. What is the result? Well, those people are sending you "deleted" as a cookie value and your naively-written script probably has no idea what to do with it. Then your script tries to delete the cookie if it doesn't know how to handle it, but hey we've been here already, and everybody frowns and scratches their heads for a bit before giving up and writing the "what the hell" error page.

The preventative measure I would suggest in situations like these is writing relatively low-level tests to hit your pages and confirm that your application is being a good HTTP citizen. Even if you have to hack around PHP's hacks to accomplish this.

Solitaire et Solidaire

I am Sean Bell

Some Photos of Brooklyn Renaissance

In front of City Hall, a man plays the steel drums in front of the ZipCar tent. Increased access to the birthright of every citizen (the automobile) was advocated. Free watermelon was served.

ZipCar event

Nearby a person reminds us (via bumper sticker) to drive safely, while parked in a painted bike lane.

Cop in bike lane

The placard in his window lets us know that this person is a hero, rather than an ordinary disgrace.

Placard of cop in bike lane

Back at Etsy Labs, time for the deliveries (that's the sidewalk).

UPS parked on the sidewalk

Also near the Labs, signs that Brooklyn's repressed SUV-owning minority are joining forces and taking back the sidewalks.

Brooklyn Hood Ryderz

Meanwhile, on Brooklyn Bridge Boulevard, a woman is brutally killed.

Woman killed on Brooklyn Bridge Boulevard

If it occurs to anyone that traffic should not be hurtling through Brooklyn at speed, they don't speak up.

I’ll See You in Hell, ATI

I hate you guys. Seriously, drop dead.

SERVICE_NAME: ati hotkey poller
        TYPE               : 110  WIN32_OWN_PROCESS
                                  (interactive)
        STATE              : 4  RUNNING
                                (NOT_STOPPABLE,
                                 NOT_PAUSABLE,
                                 ACCEPTS_SHUTDOWN)
        WIN32_EXIT_CODE    : 0  (0x0)
        SERVICE_EXIT_CODE  : 0  (0x0)
        CHECKPOINT         : 0x0
        WAIT_HINT          : 0x0
        PID                : 1132
        FLAGS              :

The Worst Possible Way to Handle Exceptions

The worst possible way to "handle" exceptions is to show the user a message box with __FILE__ as the text. This is extremely poor form.

Intel VTune error dialog

Apparently, Intel did not get the memo. Worse, this happens as many as ten times when I open Visual Studio. Why do I feel like I was mistakenly given a debug build of VTune?

Thus Ends the Great Adsense Experiment

I install Adblock on all my machines, so I wasn't seeing any of the ads on my own site. I had thought they were all going to be hilariously off-topic (one that stood out was some guy named "Dan Poynter" plugging his small business), but apparently not. It didn't occur to me that if I wrote articles decrying pet psychics that Google would plaster ads for palm readings all over my site, but in retrospect that was obviously the only possible result. It turns out that the money isn't worth looking like an idiot.

Exploit Natural Mappings in Interface Design

The on-screen guide for my Direct TV box looks roughly like this:

DirectTV blows

The control on my remote that causes the current channel to move up and down looks like this:

DirectTV blows

If the problem here is obvious to you, you too may have a future in user interface design. If you thought to yourself, "hey, they screwed up an obvious natural mapping," then you are probably already involved on some level.

Just to make it perfectly clear what I am talking about, the orientation of the controls are out of whack. From my perspective, the channels increase going down on the screen but going up on the remote. The fact that the two don't match causes me to change the channel up when I meant to change it down, and vice versa, pretty much constantly.

You might call me a moron, and most days you might be right, but not here. This design is broken. It should do what I want without requiring me to stop to think about it.

It's amazing that decades after the publication of The Design of Everyday Things that it is still possible for a well-funded company to make this mistake. This company has enough money to buy the NFL through 2010, but apparently it isn't able to find a decent interface designer.

Most of the Information on the Internet is Wrong

Attention reader: this website may contain terrible advice and fundamentally flawed code samples. Personally, I don't believe this to be the case, but my advice to you is to read it as if that were true. That you should question everything you read is not a principle unique to technical websites, of course. However, I have found that some very smart people are willing to suspend disbelief when they see code written on some idiot's website.

I have complained about the Code Project website before. Although there are some exceptional articles on it, and many useful samples, these are dwarfed by the sheer volume of terrible ideas. There is a rating system, but this is only a halfassed attempt to filter out the crap. Fact is, the names of API functions bring in traffic. Code quality is not part of the pagerank formula.

My objective isn't to single out the Code Project - it is only the most successful of many similar sites. The reason I am mentioning it is because I found the following in an article today:

What we are implementing is called a COM class, so it should have a GUID associated with it. There is a tool you can use to generate a GUID called guidgen.exe, or you can take the one I've generated for you:

[Guid("{21F21921-B0FD-4801-862F-4BC417928574}")]

This is slightly paraphrased, and the GUID is replaced (I'm not sure why, but I would feel bad embarrasing the author by linking to him). It's clear to me that he doesn't fully understand what he is trying to teach, but it probably isn't to the lion's share of ignorant programmers in the world. It's also obvious to me that using a GUID from a website tutorial in commercial software is a profoundly bad idea, but this apparently isn't so for everyone.

I wonder how many senior developers have blown their stack after finally finding this GUID conflict in a subordinate's code? That would be an interesting case study of the law of large numbers.

The phenomenon of blindly accepting anything in virtual print goes in both directions. In addition to giving you lots of things you can do (but shouldn't), sites also tell you things you can't do (but actually, you can). Many are the times I've been told, "sorry, this bug can't be fixed" with a link to a Code Project forum post with some anonymous boob saying, "there's no way to do it." Similar incidents over the years have put me on a hair trigger when it comes to samples.

If you can't trust the MSDN documentation all of the time, you certainly can't trust the plebeians in the forums.