Dan McKinley
Math, Programming, and Minority Reports

@mcfunley.com

PGProxy: A Testing Proxy for Postgres
December 6th, 2009

I’ve released a package that I’ve been using for well over a year now for the purpose of writing functional test suites against applications using Postgres. The executive summary is that PGProxy allows you to write tests that are transactionally isolated from one another, without doing anything special in your application code.

This project is aimed more at functional tests of a website (using something like Selenium) than it is for unit tests of a single class or module. In those cases, using mock objects or other strategies is more viable. But, you could certainly use PGProxy in those scenarios as well.

You can get it on github here.

PGProxy is written in Python using Twisted, and has its own extensive set of unit and functional tests.

Why You Would Want This

If you have a sufficiently large set of functional tests written by a sufficiently large team, eventually naively-written tests will begin to interfere with one another. A really simple example of two tests that would interfere with one another would be:

Clearly, if we don’t take any special precautions in our test suite, Test A will never succeed if it is run after Test B. And if we require that Test A always runs before Test B, then we’re forced to recreate our fixture database in between test runs. And this is just two tests—when you’re talking about thousands, the potential interactions can be huge. Not to mention that depending on big your fixture databases are, creation at the start of every run can be a pain.

So eventually you will probably decide it would be better to restore the fixture data to a known state in between test cases. There are a few different ways to try to accomplish this:

  1. Restore the fixture database between test cases using something like CREATE DATABASE .. WITH TEMPLATE.
  2. Have each test be responsible for undoing whatever it does.
  3. Have each test be responsible for creating any data that it’s going to use.
  4. Don’t write tests for user “steve,” but rather locate a user in the fixture data that meets the criteria that you need.
  5. Make each test work inside of a transaction, and roll the transaction back when the test is completed.

There are problems with all of these approaches. 1) is very slow. 2) is very tedious for developers and error prone. 3) is similarly tedious and can be slow, depending on how much logic creating your entities entails. 4) is really just moving the goalposts, because tests are still going to interfere with each other. 5) works to the extent that you can have all of your database access code share a connection and to the extent that your code does not try to use its own transactions.

But! I am writing this to tell you about an exciting new option, namely, “do something crazy.” PGProxy is that crazy thing, and it works pretty well.

How it Works

As mentioned above, your test case can only work inside of a transaction if you are able to use a single connection per database per test case. If your test case is making calls to multiple processes that all want to use your fixture database, that is a pretty difficult thing to do.

PGProxy solves the problem by, you guessed, it, proxying all of your database connections. So if you have a PHP site running in Apache and a scala service running in Jetty, they can now share database connections in your tests. And consequently, they can share a transaction.

PGProxy diagram

The other issue that comes up in using transactions to make test cases is that it’s pretty common for the code that you’re testing to want to use its own transactions. PGProxy solves this by rewriting transaction usage within the test case into SAVEPOINT usage. In other words, if you have a test that runs this SQL:

BEGIN;
update users set username='chuck' where username='steve';
COMMIT;
BEGIN;
update users set password='foo' where username='chuck';
ROLLBACK;

PGProxy will rewrite that to this:

BEGIN; -- my test case
SAVEPOINT x;
update users set username='chuck' where username='steve';
RELEASE SAVEPOINT x;
SAVEPOINT y;
update users set password='foo' where username='chuck';
ROLLBACK TO SAVEPOINT y;
ROLLBACK; -- my test case
Running the Proxy

There are a few ways to run the proxy. If you are writing your test suite using python (ie, using unittest), you can set up your test runner like this:

from __future__ import with_statement
import pgproxy

this_dir = os.path.realpath(os.path.dirname(__file__))
pidfile = os.path.join(this_dir, 'pgproxy.pid')
logfile = os.path.join(this_dir, 'pgproxy.log')

def run():
    # this will shut down the proxy when the tests complete.
    with pgproxy.run(pidfile=pidfile, logfile=logfile):
        run_test_suite()

def run_test_suite():
    # this should actually run your tests
    pass

if __name__ == '__main__':
    run()

Or you can use something like this script to start a standalone pgproxy process:

#! /usr/bin/env python
import pgproxy
import os

this_dir = os.path.realpath(os.path.dirname(__file__))
pidfile = os.path.join(this_dir, 'pgproxy.pid')
logfile = os.path.join(this_dir, 'pgproxy.log')

pgproxy.run(listenPort=5433, serverAddr=('localhost', 5432),
            pidfile=pidfile, logfile=logfile)

In both of these cases, PGProxy is configured to accept connections on port 5433, and to connect to the Postgres server running on port 5432. In these examples you would tell your application to connect to port 5433.

In order to run PGProxy, you need Twisted version 8.1.0 or later.

Test Harness Integration

PGProxy accepts two special queries that signal the start and the end of tests. Your test suite will need to invoke these in setUp and tearDown (or whatever the equivalents are in the language / framework that you’re using). Here’s a unittest example:

class TestCase(unittest.TestCase):
    def setUp(self):
        self.query("BEGIN TEST '%s'" % self._testMethodName)

    def tearDown(self):
        self.query("ROLLBACK TEST '%s'" % self._testMethodName)

    def test_something(self):
        # now this has a transaction and can't do any serious damage
	# to the fixture data.
	pass

As you may have noticed above, the BEGIN/ROLLBACKS are sent to postgres with comments stating which test is running, which can be pretty handy if you find yourself needing to look at the postgres logs to debug something. Here you would see:

BEGIN; -- test_something
ROLLBACK; -- test_something

Since setUp and tearDown are frequently overridden by developers for other purposes, I generally like to use a metaclass to wrap test cases in transactions instead. This way if a developer forgets to call the base test case’s setUp method, it’s no big deal for the rest of the suite. I’ll leave that as an exercise. You get the idea.

A Word About Twisted

Lord knows I have a complicated opinion of all things Twisted, and maybe someday I will write something about that. And by “someday” I mean I am almost certainly never going to, because I have enough trouble staying out of nerd fights on the internet.

But I have to say that for this project, with the precise set of requirements that it had, and taking as a given my pre-existing wealth of experience with Twisted, things worked out great. This was a from-scratch rewrite of my first version, which was written using asyncore. The asyncore version was riddled with obscure race conditions, and it turned out to be much easier to just rewrite the damned thing using an event-driven framework than it ever was to debug the original. There was not a better choice of Python framework for this project, though I did toy with the idea of using scala. Anyway, I hope this praise for Twisted doesn’t come across as excessively faint.

The End

Enjoy! Don’t hesitate to drop me a line if you find this useful or have bug reports.

Back home