March 10th, 2014
The person on build rotation, or the nightly schlimazel I suppose, went into a hot 5’x8’ closet containing an ancient computer. This happened after everyone else had left, so around 8:30PM. Although in crunch time that was more like 11:30PM. And we were in crunch time at one point for a stretch of a year and a half. “That release left a mark,” my friend Matt used to say. In a halfhearted attempt at fairness to those who will take this post as a grave insult, I’ll concede that my remembrance of these details is the work of The Mark.
Anyway, the build happened after quitting time. This guaranteed that if anything went wrong, you were on your own. Failure in giving birth to the test build implied that the 20 people in Gurgaon comprising the QA department would show up for work in a matter of hours having nothing to do.
You used a tool called “VBBuild.” This was a GUI tool, rumored to be written by Russians:
VBBuild did mysterious COM stuff to create the DLLs that nobody at the time understood properly. It presented you with dozens of popups even when it was working perfectly, and you had to be present to dismiss each of them. The production of executable binary code was all smoke and lasers. And, apparently, popups.
Developers wrote code using the more familiar VB6 IDE. The IDE could run interpreted code as an interactive debugger, but it could not produce finished libraries in a particularly repeatable or practical way. So the release compilation was different in many respects from what programmers were doing at their desks. Were there problems that existed in one of these environments but not the other? Yes, sometimes. I recall that we had a single function that weighed in at around 70,000 lines. The IDE would give up and execute this function even if it contained clear syntax errors. That was the kind of discovery which, while exciting, was wasted in solitude somewhere past midnight as you attempted to lex and parse the code for keeps.
Developers weren’t really in the habit of doing complete pulls from source control. And who could blame them, since doing this whitescreened your machine for half an hour. They were also never in any particular hurry to commit, at least until it was time to do the test build. As there was no continuous integration at the time, this was the first time that all of the code was compiled in several days.
Often [ed: always] there were compilation errors to be resolved. We were using Visual Sourcesafe, so people could be holding an exclusive lock on files containing the errors. Typically, this problem was addressed by walking around the office an hour before build time and reminding everyone to check their files in. In the event that someone forgot [ed: every time], there was an administrative process for unlocking locked files. Not everyone had the necessary rights to do this, but happily, I did.
By design, the build tried to assume an exclusive lock on all of the code. As a result, nobody could work while the build was in progress. Sometimes, the person performing the build would check all of the files out and not check them back in. So your first act the morning after a build might be to walk over to the build closet and release the source files from their chains.
Deployment required dozens of manual steps that I will never be able to remember. When the build was done, you copied DLLs over to the test machines and registered them there. By “copied” I mean that you selected them in an explorer window, pressed “Ctrl-C,” and then pressed “Ctrl-V” to paste them into another. There was no batch script worked out to do this more efficiently. Ok, this is a slight lie. There had been a script, but was put out to pasture on account of a history of hideous malfunction. And popups. On remote machines sometimes, where they could only be dismissed by wind and ghosts.
Registration involved connecting to each machine with Remote Desktop and right clicking all the DLLs. You could skip a machine or just one library, and things would be very screwy indeed.
The production release, which happened roughly twice a year under ideal conditions, was identical to this but with the added complexity of about eight more servers receiving the build. And we might take the opportunity to add completely new machines, which would not necessarily have the same patch levels for, oh, like 700,000 windows components that were relied upon.
Given eight or ten machines, the probability of a mistake on at least one of the servers approached unity. So the days and weeks following a production release were generally spent sussing out all of the minute differences and misconfigurations on the production machines. There would be catastrophic bugs that affected a tiny sliver of requests, under highly specific server conditions, and only if executed on one server out of eight. I was an expert at debugging in disassembly at the time. Upon leaving the job, I thought that this was pretty badass. But in the seven years since–do you know what? It’s never come up.
At one point I wrote a new script to perform the deployment. It was an abomination of XML to be sure, but it got the job done without all of the popups. I started doing the test build with this with some success and suggested that we use it for the production release. This was out of the question, I was told by one of my closer allies in the place. The production release was “too important to use a script.”
The operating systems and supporting libraries on the machines were also set up by hand, by a separate team, working from printed notes. The results were similar. This is kind of another story.
This all happened in 2003.