Dan McKinley urn:uuid:0f5320a2-eb2c-2b09-f70e-ca9845402e07 https://mcfunley.com/assets/images/favicon.png https://mcfunley.com/assets/images/favicon.png 2019-03-18T16:18:08+00:00 Some Recent Work 2017-05-08T00:00:00+00:00 2017-05-08T00:00:00+00:00 urn:uuid:6d6cb15a-f879-141b-247f-b7971b5bd79f <p>Here are some links to recent work I’ve done elsewhere.</p> <ul> <li><a href="https://blog.skyliner.io/ship-small-diffs-741308bec0d1">Ship Small Diffs</a> - I tried to transmute the anguish I feel looking at huge changesets into words.</li> <li><a href="https://hackernoon.com/mistakes-you-apparently-just-have-to-make-yourself-cc2dd2bfc25c">Mistakes You Apparently Just Have to Make Yourself</a> - Getting youngfolk to listen to you is harder than I realized.</li> <li><a href="https://blog.skyliner.io/fourteen-months-with-clojure-beb8b3e4bf00">Fourteen Months with Clojure</a> - Going back to my Lisp roots here.</li> <li><a href="http://pushtrain.club/">The Push Train</a> - Trying to frantically document some of the human element of making engineering function at a high level, which for whatever reason didn’t strike me as vital at the time.</li> <li><a href="https://speakerdeck.com/mcfunley/deploying-often-is-a-very-good-idea">Deploying Often is a Very Good Idea</a> - Conditional probability is extremely good.</li> <li><a href="https://blog.skyliner.io/you-cant-have-a-rollback-button-83e914f420d9">You Can’t Have a Rollback Button</a> - Please engrave “but what if you didn’t?” on my tombstone I guess.</li> <li><a href="https://blog.skyliner.io/a-simple-pattern-for-jobs-and-crons-on-aws-2f965e43932f">A Simple Pattern for Jobs and Crons on AWS</a> - Not only did I stoop to writing a practical post for once, I also wrote <a href="https://medium.com/@mcfunley/at-most-once-vs-at-least-once-f215dafd27e2">a followup</a>.</li> <li><a href="https://blog.skyliner.io/no-way-out-but-through-1db41c648697">No Way Out but Through</a> - More ranting and raving about deploying more than once a 
year.</li> </ul> Dan McKinley https://mcfunley.com/ The Unreasonable Effectiveness of Mathematics in Planning 2016-02-03T00:00:00+00:00 2016-02-03T00:00:00+00:00 urn:uuid:0db0d98a-fb9b-cc6a-94a8-c4506f128ce4 <p>I was speaking on a panel the other day that was handed the topic, “the challenges of balancing data-light product bets vs purely data driven incremental improvements.” <a href="https://twitter.com/skamille">Camille Fournier</a> was also a panelist and wrote up her thoughts <a href="http://whilefalse.blogspot.com/2016/01/qualitative-or-quantitative-but-always.html">here</a>. Camille’s take (which I think is right) is that even if you don’t have data to work from, you can still approach projects analytically.</p> <p>For me, the process of behaving analytically incorporates mathematical reasoning but not necessarily <em>data</em>. And I think this kind of spitballing is a useful activity, even if the numbers are made up. The reason for this is that human brains were forged on the African savanna where nothing is very fast, very large, or very small, cosmically speaking, and we are laughably ill-equipped for coping with orders of magnitude.</p> <figure> <img src="http://i.imgur.com/bIMKyZl.jpg" /> <figcaption>That is also why you think this looks awesome, but don't let that spoil it for you.</figcaption> </figure> <p>The kind of thinking I’m describing works like this: <em>“ok that’s a thing measured in thousands multiplied by a thing measured in tens of thousands, and then filtered through a rate of a few percent, are we even close?”</em> When permitted to skip this check on deficient intuition, most humans will sense their way to the wrong answers.</p> <p>But on the panel and in subsequent discussions, it’s been easy to run with the dichotomy that you’ve either got data to work from, or you’ve got nothing at all. 
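That kind of spot check is nothing more than back-of-the-envelope arithmetic. A minimal sketch, with every input invented for illustration:

```python
import math

def fermi(*factors):
    """Multiply rough funnel factors and report the order of magnitude."""
    value = math.prod(factors)
    return value, math.floor(math.log10(value))

# "a thing measured in thousands multiplied by a thing measured in tens
# of thousands, and then filtered through a rate of a few percent"
value, magnitude = fermi(5_000, 20_000, 0.03)
print(f"{value:,.0f} (~10^{magnitude})")  # 3,000,000 (~10^6)
```

The point is not the precision of the inputs; it's that writing the multiplication down at all keeps your intuition honest about the order of magnitude.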
The temptation is to jump into philosophical takes given examples of products or entire markets that could not have been calculated with foresight before they existed. While that’s valid, I think it doesn’t describe most of the situations that you encounter in the wild.</p> <h3 id="data-exists-and-we-dont-want-to-look">Data Exists, and We Don’t Want to Look</h3> <p>The daily grind at a company consists of building in proximity to a thing that’s satisfying some definition of “working.” Yes, there’s always the innovator’s dilemma to worry about and the prospect of weird new platforms that will enable use cases you don’t understand yet. But the degree to which we’re striking out into the <em>undiscovered country</em> is overstated.</p> <p>Companies release products that you’d figure shouldn’t have survived opportunity analysis all the time. They just don’t pitch them that way:</p> <blockquote> <p>This feature notifies pairs of individuals that have arranged an unlikely relationship on the internet beforehand. The notifications are delivered two or three times a year, and only if the parties are in close geographic proximity. And they both have an optional iOS app installed. And in this scenario one of the people is known to be in a cohort that tends to not have that iOS app installed. And then at the end of this funnel we’re hoping that some small percentage of these folks will wind up showing up online and buying a thing. <em>Later.</em></p> </blockquote> <p>I have a real launch in mind with that, but I’ve rendered it unrecognizable and absurd by describing it accurately. This isn’t a situation where the volume couldn’t be estimated. If it were, I’d have a harder time lampooning it. This is the neglected scenario: we have all the data we need, but instead of deploying it we shipped something doomed.</p> <p>When you hear people speak in defense of such things, they act out the same misdirection and head straight for the words we use when we’re discussing the iPod. 
<em>You can’t, like, quantify vision, man.</em> What they’re really espousing is the idea that product success obeys an uncertainty principle. If we look at things too closely, the magic disappears. And of course the good vibes would sublimate in this case, because the magic is nonsense.</p> <h3 id="the-hazards-of-narrative-arc">The Hazards of Narrative Arc</h3> <p>Of course, this is not what anyone is actually thinking. Nobody sets out to ignore data on purpose, hoping to improve their chances of failing. You just watched me retcon an ethos onto feral behavior. And in doing so, I am part of the problem.</p> <p>Everyone’s the hero of the novel they’re writing in their heads. That is the human condition. And having <a href="https://en.wikipedia.org/wiki/List_of_artistic_depictions_of_Steve_Jobs">saved a company by inventing a new market</a> is a great narrative arc, which is why we reach for it when we’re actually engaged in something mundane. <a href="/effective-web-experimentation-as-a-homo-narrans">We just systematically find stories too compelling</a>.</p> <p>It is rarely the case that vision can’t be at least sketched using arithmetic. Mathematics is the language we use to describe reality, and vision is generally assumed to have effects <em>in reality.</em> That’s what makes numeric methods more powerful than they should reasonably be. We’re constantly engaged in the art of self-deception, and they force you to snap out of it.</p> Dan McKinley https://mcfunley.com/ Do You Work at Amazon? 2016-01-26T00:00:00+00:00 2016-01-26T00:00:00+00:00 urn:uuid:95ebbab7-0de7-4d46-f064-940561c3ec29 <p><span class="coauthor">Please note that <a href="http://twitter.com/paradosso">Roberto Medri</a> is a coauthor on this post.</span></p> <p><a href="http://continuations.com">Albert Wenger</a> has long been one of the VCs I most admire. 
He was very present in the early days at Etsy, and sat in giving counsel on some, uh, <em>significantly astray</em> engineering team meetings. Albert is a smart, data-driven guy whose values roughly align with my own.</p> <p>That said, I have an axe to grind with his latest post, <a href="http://continuations.com/post/138017572565/dont-mind-the-share-price-hint-it-fluctuates">Don’t Mind the Share Price</a>. In it, Albert deploys the story of Amazon as a warning against focusing too much on how the market values a company. This is the story of Amazon:</p> <p><img src="http://i.imgur.com/PdcjSCu.png" alt="Amazon's historical stock price" /></p> <p>Amazon was riding high in the late 90s, then felt the DotCom burst roughly along with the rest of the tech sector. Albert points out that history has shamed anyone that might’ve judged Amazon on its share price fifteen years ago, since it’s returned north of 2000% in the years since.</p> <blockquote> <p>So whether you are running a tech company, working for one, or investing in one I highly recommend not reading too much into changes in share price. Focus instead on whether your company is making real progress.</p> </blockquote> <p>Albert is careful to stress that you should focus on fundamentals over fluctuations in the price, which is generally good advice. But I think the subtext is clear: <em>don’t be discouraged by even large declines in price, because you might be working at the next Amazon.</em></p> <p>This is a premise that we can investigate quantitatively.</p> <h3 id="the-odds-of-being-an-amazon">The Odds of Being an Amazon</h3> <p>Suppose that we’re working at a public company that’s experienced a decline in its share price of at least 50%, relative to a recent high price. 
We’d like to approximate the odds that this company is going to recover <a ref="#f1" href="#f1" class="footnote">[1]</a>.</p> <p>It turns out that since 2002, there have been <a href="https://github.com/mcfunley/shaken-stocks/blob/master/shaken-stocks.csv">2,132 companies traded on the NASDAQ</a> that fit this description. One of these is indeed Amazon. But how many others are like it?</p> <p>We can take this set of companies and categorize them. Let’s identify companies that wound up being completely wiped out—losing 90% of their remaining value or more—and then all other companies that declined in value. For companies that increased in value, we’ll differentiate those that beat the market (defined as the S&amp;P 500 Index) from those that didn’t. The idea being that you would have been better off just buying an index fund with your cash surplus from working for Google in a parallel universe. And finally, we’ll identify the <a href="https://www.youtube.com/watch?v=zbQTXFJL8lo">miraculous</a>: those companies that return 1000% or more, of which Amazon is one example.</p> <p>If we do that, it looks like this:</p> <table class="table table-striped"> <tr> <th>Category</th> <th>Count</th> <th>Percent</th> <th>Cumulative Percent</th> </tr> <tr> <td class="negative attention">Wiped Out</td> <td class="negative attention">239</td> <td class="negative attention">11.21%</td> <td class="negative attention">11.21%</td> </tr> <tr> <td class="negative">Declined</td> <td class="negative">794</td> <td class="negative">37.24%</td> <td class="negative">48.45%</td> </tr> <tr> <td class="negative">Beaten by Market</td> <td class="negative">344</td> <td class="negative">16.14%</td> <td class="negative">64.59%</td> </tr> <tr> <td class="positive">Beat Market</td> <td class="positive">661</td> <td class="positive">31.00%</td> <td class="positive">95.59%</td> </tr> <tr> <td class="positive attention">Miracle</td> <td class="positive attention">94</td> <td class="positive 
4.41%</td>">
attention">4.41%</td> <td class="positive attention">100.00%</td> </tr> </table> <p>Here we can see that about 65% of public companies that find themselves in this situation don’t recover. But 35% of companies do. These are tough odds, but definitely not impossible odds, right?</p> <h3 id="recovery-is-not-good-enough">Recovery is not Good Enough</h3> <p>Albert asks us to consider investors, officers, and employees of the company as having roughly identical situations. This is a mistake. Things are significantly worse in the case of employees <a ref="#f2" href="#f2" class="footnote">[2]</a> at a public company who have been issued options. In these cases, the company may very well recover, but we have to contemplate several other horrifying possibilities.</p> <ul> <li>Employees may have already exercised options at a strike price higher than the current market price. If so, they’re screwed if the company never recovers above that price. Even if the company beats the market from here out.</li> <li>The strike price may be below the current market price, meaning that the options are worth something. But employees may owe taxes (or AMT), forcing them to sell before the recovery.</li> <li>Options may be underwater and worthless. At least in this scenario, there is clarity.</li> </ul> <p>From these situations we can see that as an employee <a ref="#f3" href="#f3" class="footnote">[3]</a>, it makes sense to consider the odds that the company will not just recover, but will ultimately get back to where it was. 
That looks like this:</p> <table class="table table-striped"> <tr> <th>Category</th> <th>Count</th> <th>Percent</th> <th>Cumulative Percent</th> </tr> <tr> <td class="negative attention">Wiped Out</td> <td class="negative attention">239</td> <td class="negative attention">11.21%</td> <td class="negative attention">11.21%</td> </tr> <tr> <td class="negative">Declined</td> <td class="negative">794</td> <td class="negative">37.24%</td> <td class="negative">48.45%</td> </tr> <tr> <td class="negative">Beaten by Market</td> <td class="negative">344</td> <td class="negative">16.14%</td> <td class="negative">64.59%</td> </tr> <tr> <td class="negative">Recovered Below High Price</td> <td class="negative">210</td> <td class="negative">9.84%</td> <td class="negative">75.04%</td> </tr> <tr> <td class="positive">Beat Market</td> <td class="positive">441</td> <td class="positive">20.66%</td> <td class="positive">95.69%</td> </tr> <tr> <td class="positive attention">Miracle</td> <td class="positive attention">92</td> <td class="positive attention">4.31%</td> <td class="positive attention">100.00%</td> </tr> </table> <p>This makes it worse: <strong>75% of companies won’t recover using this definition</strong>. And only about 4% will make miraculous comebacks of Amazon’s order of magnitude.</p> <h3 id="are-you-making-progress">Are You Making Progress?</h3> <p>Remember that Albert provides us with an important caveat: we should “[f]ocus … on whether the company is making real progress.” But this can be tricky to surmise as an employee, for several reasons:</p> <ul> <li>You are in unavoidably close proximity to a coordinated propaganda campaign. It’s called <em>the company’s internal communications and morale efforts.</em> You may find yourself thinking unreasonably positively about these things.</li> <li>You are putting in hours at this company, and human nature compels us to confuse effort with progress.</li> <li>Remember that we’re talking about a public company. 
So unless you’re an officer, you’ll have a difficult time getting detailed information about how much progress the company is really making. And of course timing trades on such information would be <em>illegal</em>.</li> </ul> <p>We should agree that the outlook here is going to be hazy at best, and self-deception is a hazard.</p> <h3>The Base Rate Fallacy's Perverse Tyranny Over the American <a ref="#f4" href="#f4" class="footnote">[4]</a> Mind</h3> <p>If there is any line of reasoning that really drives me crazy, it’s the following:</p> <ul> <li>A series of cosmically unlikely events has unfolded.</li> <li>This is submitted as evidence that <em>it can happen to anyone.</em></li> </ul> <p>Examples of this are everywhere. Someone is going to win Powerball, therefore it makes sense to buy tickets. Barack Obama was elected president, therefore systematic racism is toothless. Mark Zuckerberg struck it rich, so you’ve just gotta have faith.</p> <figure> <img src="http://i.imgur.com/0Jb88Db.png" alt="By the way this guy also thinks that picking your own numbers gives you a higher chance of winning." /> <figcaption>By the way this guy also thinks that picking your own numbers gives you a higher chance of winning.</figcaption> </figure> <p>In looking to Amazon (or Google, Facebook, Netflix, or dear god <em>Apple</em>) as consolation in the event that a company has experienced a decline in share price, we make the following mistake. <strong>The probability that successful companies have stumbled in their past is not the probability that a company will succeed, having stumbled.</strong></p> <p>This isn’t a call for nihilism if you find yourself in such a situation. Far from it—it’s a call to realize that the odds are now against you, and to behave proactively.</p> <hr /> <p><em>The code and data for this article are available <a href="https://github.com/mcfunley/shaken-stocks">here, on GitHub</a>. It’s a bit sloppy and hastily written, sorry. 
We started from a dataset of companies traded on the NASDAQ that experienced a decline of 50% or more off of a previous high. Our dataset started around the year 2000.</em></p> <hr /> <ol class="foot-note-list"> <li> <a name="f1"></a> You may notice that I've switched questions, from "are you working at Amazon" to "is the company Amazon." Calculating the odds that you are working at Amazon would of course require a richer dataset that includes company headcounts, and I am a lazy man. </li> <li> <a name="f2"></a> Investors can more easily scale their commitment to the company by having a diverse portfolio. Employees and officers, however, give 100% of their labor to the company. And in the event that things go well, a large percentage of their net worth derives from the value of the company. Officers have a high floor on their returns, via guaranteed bonuses, parachute provisions, accelerated vesting schedules in the event of termination, and so on. Employees on the other hand are screwed. </li> <li> <a name="f3"></a> This refinement doesn't apply to all employees. Early employees probably have strike prices that are very low, and can make money despite a large drop in the share price. But at a newly-minted public company, <em>most</em> employees are probably new, and <em>most</em> employees are therefore affected. </li> <li><a name="f4"></a>I know that Albert Wenger is German.</li> </ol> Dan McKinley https://mcfunley.com/ Are My Push Notifications Driving Users Away? 
2015-11-24T00:00:00+00:00 2015-11-24T00:00:00+00:00 urn:uuid:a59ff9ac-0706-8ec9-179c-c94d942094ad <p>In response to <a href="https://twitter.com/kellan">Kellan’s</a> musing about push notifications on twitter, <a href="http://twitter.com/mccue">Adam McCue</a> asked an interesting question:</p> <blockquote align="center" class="twitter-tweet" lang="en"><p lang="en" dir="ltr"><a href="https://twitter.com/kellan">@kellan</a> <a href="https://twitter.com/mcfunley">@mcfunley</a> what's the best way to do this?</p>&mdash; Adam McCue (@mccue) <a href="https://twitter.com/mccue/status/669386580059099136">November 25, 2015</a></blockquote> <p>I quickly realized that fitting an answer into tweets was hopeless, so here’s a stab at it in longform.</p> <h3 id="how-would-we-do-this">How would we do this?</h3> <p>Let’s come up with a really simple way to figure this out for the case of a single irritating notification. This is limited, but the procedure described ought to be possible for anyone with a web-enabled mobile app. We need:</p> <ol> <li>A way to divide the user population into two groups: a treatment group that will see the ad notification, and a control group that won’t.</li> <li>A way to decide if users have disappeared or not.</li> </ol> <p>To make the stats as simple as possible, we need (1) to be random and we need (2) to be a <a href="http://homepages.wmich.edu/~bwagner/StatReview/Binomial/binomial%20probabilities.htm">binomial measure</a> (i.e. “yes or no,” “true or false,” “heads or tails,” etc).</p> <p>To do valid (simple) stats, we also want our trials to be <em>independent</em> of each other. If we send the same users the notifications over and over, we can’t consider each of those to be independent trials. It’s easy to intuit why that might be: I’m more likely to uninstall your app after the fifth time you’ve bugged me <a href="#f0" ref="#f0" class="footnote">[1]</a>. 
So we need to consider disjoint sets of users on every day of the experiment.</p> <figure> <img src="http://i.imgur.com/Dy6loZn.png" /> <figcaption>Does this hurt us or help us? <a href="http://store-xkcd-com.myshopify.com/products/try-science">Let's try science.</a></figcaption> </figure> <p>How to randomly select users to receive the treatment under these conditions is up to you, but one simple way that should be broadly applicable is just hashing the user ID. Say we need 100 groups of users: both a treatment and control group for 50 days. We can hash the space of all user IDs down to 100 buckets <a ref="#f1" href="#f1" class="footnote">[2]</a>.</p> <p>So how do we decide if users have disappeared? Well, most mobile apps make HTTP requests to a server somewhere. Let’s say that we’ll consider a user to be “bounced” if they don’t make a request to us again within some interval.</p> <p>Some people will probably look at the notification we sent (resulting in a request or two), but be annoyed and subsequently uninstall. We wouldn’t want to count such a user as happy. So let’s say we’ll look for usage between one day after the notification and six days after the notification. Users that send us a request during that interval will be considered “retained.”</p> <figure> <img src="http://i.imgur.com/b7Nl6Ve.png" /> <figcaption>Some examples of our binomial model. We'll call a user retained if they request data from us on any of days two through seven counting from the time of the notification. User 4 in this example is not retained because (s)he only requests data on the day the notification was sent.</figcaption> </figure> <p>To run the experiment properly you need to know how long to run it. That depends a lot on your personal details: how many people use your app, how often they use it, how valuable the ad notification is, and how severe uninstalls are for you. 
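The hashing scheme described above can be sketched in a few lines. This is a minimal, illustrative version: the hash function, bucket layout, and experiment name are all assumptions, not a prescription.

```python
import hashlib

def bucket(user_id, experiment, n_buckets=100):
    """Deterministically map a user to one of n_buckets.

    Mixing the experiment name into the hash keeps a user's assignment
    in this experiment independent of their assignments in others.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.md5(key).hexdigest(), 16) % n_buckets

# With 100 buckets, one possible layout: buckets 0-49 are the 50 daily
# treatment groups and buckets 50-99 are the matching control groups.
b = bucket(12345, "ad-notification")
in_treatment = b < 50
```

Because the assignment is a pure function of the user ID and experiment name, the same user lands in the same group every time, with no assignment state to store.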
For the sake of argument, let’s say:</p> <ul> <li>We can find disjoint sets of 10,000 users making requests to us on any given day, daily, for a long time.</li> <li>(As discussed) we’ll put 50% of them in the treatment group.</li> <li>Currently, 60% of people active on a given day will also be active between one and six days after that.</li> <li>We want to be 80% sure that if we move that figure by plus or minus 1%, we’ll know about it.</li> <li>We want to be 95% sure that if we measure a deviation of plus or minus 1%, it’s for real.</li> </ul> <p><a href="http://www.experimentcalculator.com/#lift=1&amp;conversion=60&amp;visits=10000&amp;percentage=50">If you plug all of that into the experiment calculator</a> <a href="#f2" ref="#f2" class="footnote">[3]</a> it will tell you that you need 21 days of data to satisfy those conditions. But since we use a trailing time interval in our measurement, we need to wait 28 days.</p> <h3 id="an-example-result">An example result</h3> <p>Ok, so let’s say we’ve run that experiment and we have some results. And suppose that they look like this:</p> <table class="table table-striped"> <tr> <th>Group</th> <th>Users</th> <th>Retained users</th> <th>Bounced users</th> </tr> <tr> <td>Treatment</td> <td>210,000</td> <td>110,144</td> <td>99,856</td> </tr> <tr> <td>Control</td> <td>210,000</td> <td>126,033</td> <td>83,967</td> </tr> </table> <p>Using these figures we can see that we’ve apparently decreased retention by 12.6%, and a <a href="https://gist.github.com/mcfunley/b7b9320e7f0bafcbaab2">test of proportions</a> confirms that this difference is statistically significant. Oops!</p> <h3 id="ive-run-the-experiment-now-what">I’ve run the experiment, now what?</h3> <p>You most likely have created the ad notification because you had some positive goal in mind. Maybe the intent was to get people to buy something. 
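For reference, a test of proportions like the one linked above can be sketched as a two-proportion z-test using only the standard library. In practice you'd reach for a stats package, but the arithmetic is simple enough to show here with the example counts from the table:

```python
import math

def two_proportion_z(success_a, n_a, success_b, n_b):
    """z statistic for the difference between two proportions."""
    p_a, p_b = success_a / n_a, success_b / n_b
    p_pool = (success_a + success_b) / (n_a + n_b)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    return (p_a - p_b) / se

# Retained counts from the example results: treatment vs. control.
z = two_proportion_z(110_144, 210_000, 126_033, 210_000)
print(abs(z) > 1.96)  # True: significant at the 95% confidence level
```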
If that’s the case, then you should do an additional computation to see if what you gained in positive engagement outweighs what you’ve lost in users.</p> <h3 id="i-dont-think-i-have-enough-data">I don’t think I have enough data.</h3> <p>You might not have 420,000 users to play with, but that doesn’t mean that the experiment is necessarily pointless. In our example we were trying to detect changes of <em>plus or minus one percent.</em> You can detect more dramatic changes in behavior with smaller sets of users. Good luck!</p> <h3 id="im-sending-reactivation-notifications-to-inactive-users-can-i-still-measure-uninstalls">I’m sending reactivation notifications to inactive users. Can I still measure uninstalls?</h3> <p>In our thought experiment, we took it as a given that users were likely to use your app. Then we considered the effect of push notifications on that behavior. But one reason you might be contemplating sending the notifications is that they’re <em>not</em> using it, and you are trying to reactivate them.</p> <p>If that’s the case, you might want to just measure reactivations instead. After all, the difference between a user who has your app installed but never opens it and a user that has uninstalled your app is mostly philosophical. But you may also be able to design an experiment to detect uninstalls. And that might be sensible if very, very infrequent use of your app can still be valuable.</p> <p>A procedure that might work for you here is to send two notifications. You could then use delivery failures of secondary notifications as a proxy metric for uninstalls.</p> <h3 id="i-want-to-learn-more-about-this-stuff">I want to learn more about this stuff.</h3> <p>As it happens, I recorded <a href="http://shop.oreilly.com/product/0636920040149.do">a video with O’Reilly</a> that covers things like this in more detail. 
You might also like <a href="http://www.evanmiller.org/">Evan Miller’s blog</a> and <a href="http://ai.stanford.edu/~ronnyk/ronnyk-bib.html">Ron Kohavi’s publications</a>.</p> <hr /> <ol class="footnote-list"> <li><a name="f0"></a><em>"How many notifications are too many?"</em> is a separate question, not considered here.</li> <li><a name="f1"></a>If you do many experiments, you want to avoid using the <em>same</em> sets of people as control and treatment, so include something based on the name of the experiment in the hash. If user 12345 is in the treatment for 50/50 experiment X, she should be only 50% likely (not 100% likely) to be in the treatment for some other 50/50 experiment Y.</li> <li><a name="f2"></a>The labeling on the tool is for experiments on a website. The math is the same though.</li> </ol> Dan McKinley https://mcfunley.com/ Choose Boring Technology (Expanded, Slide-Based Edition) 2015-07-27T00:00:00+00:00 2015-07-27T00:00:00+00:00 urn:uuid:1432e359-01f1-977f-dd5a-0da6a2c55d5c <p>I gave a spoken word version of <a href="/choose-boring-technology">Choose Boring Technology</a> at OSCON in Portland last week. Here are the slides:</p> <div class="speakerdeck-container"> <div class="speakerdeck-loading"></div> <script id="choose-boring-technology-deck" async="" class="speakerdeck-embed" data-id="454e3843ac184d3f8bcb0e4a50d3811a" data-ratio="1.31113956466069" src="//speakerdeck.com/assets/embed.js"></script> <script>$('#choose-boring-technology-deck').speakerdeck();</script> </div> Dan McKinley https://mcfunley.com/ Choose Boring Technology 2015-03-30T00:00:00+00:00 2015-03-30T00:00:00+00:00 urn:uuid:d62993ee-047e-c4a5-1b11-e986b22566b8 <p>Probably the single best thing to happen to me in my career was having had <a href="http://laughingmeme.org/">Kellan</a> placed in charge of me. I stuck around long enough to see Kellan’s technical decisionmaking start to bear fruit. 
I learned a great deal <em>from</em> this, but I also learned a great deal as a <em>result</em> of this. I would not have been free to become the engineer that wrote <a href="/data-driven-products-lean-startup-2014">Data Driven Products Now!</a> if Kellan had not been there to so thoroughly stick the landing on technology choices.</p> <figure> <img src="http://i.imgur.com/FRQKLCy.jpg" /> <figcaption>Being inspirational as always.</figcaption> </figure> <p>In the year since leaving Etsy, I’ve resurrected my ability to care about technology. And my thoughts have crystallized to the point where I can write them down coherently. What follows is a distillation of the Kellan gestalt, which will hopefully serve to horrify him only slightly.</p> <h3 id="embrace-boredom">Embrace Boredom.</h3> <p>Let’s say every company gets about three innovation tokens. You can spend these however you want, but the supply is fixed for a long while. You might get a few more <em>after</em> you achieve a <a href="http://rc3.org/2015/03/24/the-pleasure-of-building-big-things/">certain level of stability and maturity</a>, but the general tendency is to overestimate the contents of your wallet. Clearly this model is approximate, but I think it helps.</p> <p>If you choose to write your website in NodeJS, you just spent one of your innovation tokens. If you choose to use <a href="/why-mongodb-never-worked-out-at-etsy">MongoDB</a>, you just spent one of your innovation tokens. If you choose to use <a href="https://consul.io/">service discovery tech that’s existed for a year or less</a>, you just spent one of your innovation tokens. If you choose to write your own database, oh god, you’re in trouble.</p> <p>Any of those choices might be sensible if you’re a javascript consultancy, or a database company. But you’re probably not. 
You’re probably working for a company that is at least ostensibly <a href="https://www.etsy.com">rethinking global commerce</a> or <a href="https://stripe.com">reinventing payments on the web</a> or pursuing some other suitably epic mission. In that context, devoting any of your limited attention to innovating ssh is an excellent way to fail. Or at best, delay success <a ref="#f1" href="#f1" class="footnote">[1]</a>.</p> <p>What counts as boring? That’s a little tricky. “Boring” should not be conflated with “bad.” There is technology out there that is both boring and bad <a ref="#f2" href="#f2" class="footnote">[2]</a>. You should not use any of that. But there are many choices of technology that are boring and good, or at least good enough. MySQL is boring. Postgres is boring. PHP is boring. Python is boring. Memcached is boring. Squid is boring. Cron is boring.</p> <p>The nice thing about boringness (so constrained) is that the capabilities of these things are well understood. But more importantly, their failure modes are well understood. Anyone who knows me well will understand that it’s only with an overwhelming sense of malaise that I now invoke the spectre of Don Rumsfeld, but I must.</p> <figure> <img src="http://i.imgur.com/n8ElWr3.jpg" /> <figcaption>To be clear, fuck this guy.</figcaption> </figure> <p>When choosing technology, you have both known unknowns and unknown unknowns <a ref="#f3" href="#f3" class="footnote">[3]</a>.</p> <ul> <li>A known unknown is something like: <em>we don’t know what happens when this database hits 100% CPU.</em></li> <li>An unknown unknown is something like: <em>geez it didn’t even occur to us that <a href="http://www.evanjones.ca/jvm-mmap-pause.html">writing stats would cause GC pauses</a>.</em></li> </ul> <p>Both sets are typically non-empty, even for tech that’s existed for decades. 
But for shiny new technology the magnitude of unknown unknowns is significantly larger, and this is important.</p> <h3 id="optimize-globally">Optimize Globally.</h3> <p>I unapologetically think a bias in favor of boring technology is a good thing, but it’s not the only factor that needs to be considered. Technology choices don’t happen in isolation. They have a scope that touches your entire team, organization, and the system that emerges from the sum total of your choices.</p> <p>Adding technology to your company comes with a cost. As an abstract statement this is obvious: if we’re already using Ruby, adding Python to the mix doesn’t feel sensible because the resulting complexity would outweigh Python’s marginal utility. But somehow when we’re talking about Python and Scala or MySQL and Redis people <a href="http://martinfowler.com/bliki/PolyglotPersistence.html">lose their minds</a>, discard all constraints, and start raving about using the best tool for the job.</p> <p><a href="https://twitter.com/coda/status/580531932393504768">Your function in a nutshell</a> is to map business problems onto a solution space that involves choices of software. 
If the choices of software were truly without baggage, you could indeed pick a whole mess of locally-the-best tools for your assortment of problems.</p> <figure> <svg width="423px" height="420px" viewBox="0 0 423 420" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:sketch="http://www.bohemiancoding.com/sketch/ns"> <!-- Generator: Sketch 3.2.2 (9983) - http://www.bohemiancoding.com/sketch --> <title>Crazy</title> <desc>Created with Sketch.</desc> <defs></defs> <g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd" sketch:type="MSPage"> <g id="Crazy" sketch:type="MSLayerGroup" transform="translate(1.000000, -4.000000)"> <ellipse id="Solutions" stroke="#979797" sketch:type="MSShapeGroup" cx="341.5" cy="229.5" rx="79.5" ry="193.5"></ellipse> <ellipse id="Problems" stroke="#979797" sketch:type="MSShapeGroup" cx="79.5" cy="229.5" rx="79.5" ry="193.5"></ellipse> <g id="arrows" transform="translate(45.000000, 77.000000)" stroke="#D0011B" stroke-width="3" fill="#D0011B" stroke-linecap="square"> <path d="M19.5,26.5 L255.502121,26.5" id="Line" sketch:type="MSShapeGroup"></path> <path id="Line-decoration-1" d="M255.5,26.5 C251.72,25.45 248.48,24.55 244.7,23.5 C244.7,25.6 244.7,27.4 244.7,29.5 C248.48,28.45 251.72,27.55 255.5,26.5 C255.5,26.5 255.5,26.5 255.5,26.5 Z"></path> <path d="M19.5,26.5 L245.5,84.5" id="Line-2" sketch:type="MSShapeGroup"></path> <path id="Line-2-decoration-1" d="M245.186355,84.419507 C241.786016,82.4628271 238.87144,80.7856729 235.471101,78.8289931 C234.94908,80.8630761 234.501633,82.6065758 233.979612,84.6406589 C237.901972,84.5632557 241.263995,84.4969101 245.186355,84.419507 C245.186355,84.419507 245.186355,84.419507 245.186355,84.419507 Z"></path> <path d="M19.5,26.5 L299.5,0.5" id="Line-3" sketch:type="MSShapeGroup"></path> <path id="Line-3-decoration-1" d="M299.296324,0.518912741 C295.435434,-0.177093062 292.126099,-0.773669465 288.265208,-1.46967527 C288.459373,0.621329291 
288.6258,2.41361891 288.819965,4.50462347 C292.486691,3.10962472 295.629598,1.9139115 299.296324,0.518912741 C299.296324,0.518912741 299.296324,0.518912741 299.296324,0.518912741 Z"></path> <path d="M19.5,26.5 L255.502121,26.5" id="Line-4" sketch:type="MSShapeGroup"></path> <path id="Line-4-decoration-1" d="M255.5,26.5 C251.72,25.45 248.48,24.55 244.7,23.5 C244.7,25.6 244.7,27.4 244.7,29.5 C248.48,28.45 251.72,27.55 255.5,26.5 C255.5,26.5 255.5,26.5 255.5,26.5 Z"></path> <path d="M63.5,79.5 L256.5,34.5" id="Line-5" sketch:type="MSShapeGroup"></path> <path id="Line-5-decoration-1" d="M256.327927,34.5401208 C252.408243,34.3758734 249.048513,34.2350899 245.128829,34.0708426 C245.605677,36.1159872 246.014403,37.8689684 246.49125,39.9141131 C249.934087,38.0332157 252.88509,36.4210181 256.327927,34.5401208 C256.327927,34.5401208 256.327927,34.5401208 256.327927,34.5401208 Z"></path> <path d="M63.5,79.5 L301.5,116.5" id="Line-6" sketch:type="MSShapeGroup"></path> <path id="Line-6-decoration-1" d="M300.651315,116.368062 C297.077479,114.749853 294.014192,113.362816 290.440356,111.744607 C290.117761,113.819681 289.84125,115.598316 289.518655,117.67339 C293.415086,117.216525 296.754884,116.824927 300.651315,116.368062 C300.651315,116.368062 300.651315,116.368062 300.651315,116.368062 Z"></path> <path d="M63.5,79.5 L254.5,209.5" id="Line-7" sketch:type="MSShapeGroup"></path> <path id="Line-7-decoration-1" d="M254.464216,209.475644 C251.930146,206.480751 249.758085,203.9137 247.224014,200.918806 C246.042418,202.654845 245.02962,204.142878 243.848024,205.878916 C247.563691,207.137771 250.748549,208.216789 254.464216,209.475644 C254.464216,209.475644 254.464216,209.475644 254.464216,209.475644 Z"></path> <path d="M0.5,115.5 L251.5,216.5" id="Line-8" sketch:type="MSShapeGroup"></path> <path id="Line-8-decoration-1" d="M250.981706,216.291443 C247.866929,213.906268 245.19712,211.861831 242.082342,209.476656 C241.298409,211.424847 240.626466,213.094725 239.842533,215.042916 
C243.741243,215.4799 247.082995,215.854459 250.981706,216.291443 C250.981706,216.291443 250.981706,216.291443 250.981706,216.291443 Z"></path> <path d="M54.5,176.5 L300.5,193.5" id="Line-10" sketch:type="MSShapeGroup"></path> <path id="Line-10-decoration-1" d="M299.914697,193.459552 C296.216079,192.151452 293.045835,191.030224 289.347217,189.722124 C289.202441,191.817128 289.078346,193.612845 288.93357,195.707849 C292.776964,194.920945 296.071303,194.246456 299.914697,193.459552 C299.914697,193.459552 299.914697,193.459552 299.914697,193.459552 Z"></path> <path d="M54.5,176.5 L288.5,273.5" id="Line-11" sketch:type="MSShapeGroup"></path> <path id="Line-11-decoration-1" d="M288.215373,273.382013 C285.125578,270.964562 282.477183,268.892461 279.387389,266.47501 C278.58323,268.41494 277.89395,270.077737 277.089791,272.017667 C280.983745,272.495188 284.321419,272.904492 288.215373,273.382013 C288.215373,273.382013 288.215373,273.382013 288.215373,273.382013 Z"></path> <path d="M11.5,231.5 L287.5,283.5" id="Line-12" sketch:type="MSShapeGroup"></path> <path id="Line-12-decoration-1" d="M286.658962,283.341544 C283.138722,281.609837 280.121373,280.125516 276.601133,278.393809 C276.212321,280.457502 275.879054,282.226381 275.490243,284.290073 C279.399294,283.958088 282.74991,283.673529 286.658962,283.341544 C286.658962,283.341544 286.658962,283.341544 286.658962,283.341544 Z"></path> <path d="M11.5,231.5 L249.5,223.5" id="Line-13" sketch:type="MSShapeGroup"></path> <path id="Line-13-decoration-1" d="M249.36566,223.504516 C245.552519,222.582095 242.284113,221.79145 238.470973,220.869029 C238.541521,222.967844 238.601991,224.766828 238.67254,226.865643 C242.415132,225.689248 245.623068,224.68091 249.36566,223.504516 C249.36566,223.504516 249.36566,223.504516 249.36566,223.504516 Z"></path> <path d="M0.5,115.5 L248.5,156.5" id="Line-9" sketch:type="MSShapeGroup"></path> <path id="Line-9-decoration-1" d="M248.138638,156.440259 C244.580524,154.78777 241.530711,153.371351 
237.972596,151.718862 C237.630068,153.790739 237.336473,155.566633 236.993945,157.63851 C240.894588,157.219122 244.237996,156.859647 248.138638,156.440259 C248.138638,156.440259 248.138638,156.440259 248.138638,156.440259 Z"></path> </g> <g id="problems" transform="translate(33.000000, 91.000000)" stroke="#979797" fill="#4990E2" sketch:type="MSShapeGroup"> <circle id="Oval-3" cx="30" cy="14" r="14"></circle> <circle id="Oval-4" cx="74" cy="66" r="14"></circle> <circle id="Oval-5" cx="14" cy="103" r="14"></circle> <circle id="Oval-6" cx="64" cy="163" r="14"></circle> <circle id="Oval-7" cx="23" cy="219" r="14"></circle> </g> <g id="Solutions" transform="translate(293.000000, 68.000000)" stroke="#979797" fill="#7ED321" sketch:type="MSShapeGroup"> <circle id="Oval-8" cx="26" cy="37" r="14"></circle> <circle id="Oval-9" cx="74" cy="69" r="14"></circle> <circle id="Oval-10" cx="14" cy="99" r="14"></circle> <circle id="Oval-11" cx="71" cy="129" r="14"></circle> <circle id="Oval-12" cx="18" cy="168" r="14"></circle> <circle id="Oval-13" cx="71" cy="205" r="14"></circle> <circle id="Oval-14" cx="22" cy="229" r="14"></circle> <circle id="Oval-15" cx="66" cy="14" r="14"></circle> <circle id="Oval-16" cx="58" cy="289" r="14"></circle> </g> <text id="Problems" sketch:type="MSTextLayer" font-family="Lato" font-size="18" font-weight="normal" fill="#000000"> <tspan x="43" y="18">Problems</tspan> </text> <text id="Technical-Solutions" sketch:type="MSTextLayer" font-family="Lato" font-size="18" font-weight="normal" fill="#000000"> <tspan x="262" y="18">Technical Solutions</tspan> </text> </g> </g> </svg> <figcaption>The way you might choose technology in a world where choices are cheap: "pick the right tool for the job."</figcaption> </figure> <p>But of course, the baggage exists. We call the baggage “operations” and to a lesser extent “cognitive overhead.” You have to monitor the thing. You have to figure out unit tests. You need to know the first thing about it to hack on it. 
You need an init script. I could go on for days here, and all of this adds up fast.</p> <figure> <svg width="423px" height="420px" viewBox="0 0 423 420" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xmlns:sketch="http://www.bohemiancoding.com/sketch/ns"> <!-- Generator: Sketch 3.2.2 (9983) - http://www.bohemiancoding.com/sketch --> <title>Sane</title> <desc>Created with Sketch.</desc> <defs></defs> <g id="Page-1" stroke="none" stroke-width="1" fill="none" fill-rule="evenodd" sketch:type="MSPage"> <g id="Sane" sketch:type="MSLayerGroup" transform="translate(1.000000, -4.000000)"> <ellipse id="Solutions-3" stroke="#979797" sketch:type="MSShapeGroup" cx="341.5" cy="229.5" rx="79.5" ry="193.5"></ellipse> <ellipse id="Problems-2" stroke="#979797" sketch:type="MSShapeGroup" cx="79.5" cy="229.5" rx="79.5" ry="193.5"></ellipse> <g id="arrows" transform="translate(51.000000, 102.000000)" stroke="#D0011B" stroke-width="3" fill="#D0011B" stroke-linecap="square"> <path d="M13.5,1.5 L249.5,1.5" id="Line-14" sketch:type="MSShapeGroup"></path> <path id="Line-14-decoration-1" d="M249.5,1.5 C245.72,0.45 242.48,-0.45 238.7,-1.5 C238.7,0.6 238.7,2.4 238.7,4.5 C242.48,3.45 245.72,2.55 249.5,1.5 C249.5,1.5 249.5,1.5 249.5,1.5 Z"></path> <path d="M13.5,1.5 L248.5,120.5" id="Line-15" sketch:type="MSShapeGroup"></path> <path id="Line-15-decoration-1" d="M248.132239,120.313772 C245.23431,117.669362 242.75037,115.402724 239.852441,112.758314 C238.903738,114.631803 238.090564,116.237651 237.141861,118.111141 C240.988493,118.882062 244.285607,119.542851 248.132239,120.313772 C248.132239,120.313772 248.132239,120.313772 248.132239,120.313772 Z"></path> <path d="M57.5,54.5 L249.5,8.5" id="Line-17" sketch:type="MSShapeGroup"></path> <path id="Line-17-decoration-1" d="M249.078398,8.6010088 C245.157787,8.46060711 241.797264,8.34026282 237.876654,8.19986114 C238.365932,10.2420674 238.785314,11.9925299 239.274592,14.0347362 C242.705924,12.1329316 
245.647066,10.5028134 249.078398,8.6010088 C249.078398,8.6010088 249.078398,8.6010088 249.078398,8.6010088 Z"></path> <path d="M0.5,92.5 L240.5,137.5" id="Line-20" sketch:type="MSShapeGroup"></path> <path id="Line-20-decoration-1" d="M240.320814,137.466403 C236.79906,135.737776 233.780414,134.256096 230.25866,132.52747 C229.871654,134.591501 229.539934,136.360671 229.152928,138.424703 C233.061688,138.089298 236.412054,137.801808 240.320814,137.466403 C240.320814,137.466403 240.320814,137.466403 240.320814,137.466403 Z"></path> <path d="M57.5,52.5 L242.5,129.5" id="Line-18" sketch:type="MSShapeGroup"></path> <path id="Line-18-decoration-1" d="M242.1449,129.352202 C239.058585,126.930309 236.413173,124.854402 233.326858,122.432509 C232.51991,124.371281 231.828241,126.033085 231.021292,127.971856 C234.914555,128.454977 238.251637,128.869081 242.1449,129.352202 C242.1449,129.352202 242.1449,129.352202 242.1449,129.352202 Z"></path> <path d="M13.5,1.5 L248.5,183.5" id="Line-16" sketch:type="MSShapeGroup"></path> <path id="Line-16-decoration-1" d="M248.313733,183.355742 C245.968119,180.211065 243.957592,177.515627 241.611978,174.37095 C240.32613,176.031249 239.223974,177.454363 237.938125,179.114662 C241.569588,180.59904 244.68227,181.871364 248.313733,183.355742 C248.313733,183.355742 248.313733,183.355742 248.313733,183.355742 Z"></path> <path d="M0.5,92.5 L253.5,15.5" id="Line-19" sketch:type="MSShapeGroup"></path> <path id="Line-19-decoration-1" d="M253.061904,15.6333334 C249.139957,15.7294168 245.778289,15.8117739 241.856342,15.9078572 C242.467781,17.9168724 242.991872,19.6388854 243.603311,21.6479005 C246.913819,19.542802 249.751397,17.7384319 253.061904,15.6333334 C253.061904,15.6333334 253.061904,15.6333334 253.061904,15.6333334 Z"></path> <path d="M0.5,92.5 L244.5,191.5" id="Line-21" sketch:type="MSShapeGroup"></path> <path id="Line-21-decoration-1" d="M244.204221,191.379991 C241.09632,188.985863 238.432405,186.933753 235.324504,184.539624 C234.534968,186.485551 
233.858223,188.153489 233.068687,190.099416 C236.966124,190.547618 240.306784,190.93179 244.204221,191.379991 C244.204221,191.379991 244.204221,191.379991 244.204221,191.379991 Z"></path> <path d="M49.5,150.5 L258.5,19.5" id="Line-22" sketch:type="MSShapeGroup"></path> <path id="Line-22-decoration-1" d="M257.939322,19.8514296 C254.178828,20.9692764 250.955547,21.9274308 247.195052,23.0452775 C248.310345,24.8246376 249.26631,26.3498034 250.381603,28.1291635 C253.026805,25.2319566 255.29412,22.7486365 257.939322,19.8514296 C257.939322,19.8514296 257.939322,19.8514296 257.939322,19.8514296 Z"></path> <path d="M3.5,207.5 L265.5,22.5" id="Line-23" sketch:type="MSShapeGroup"></path> <path id="Line-23-decoration-1" d="M264.902063,22.9222075 C261.208605,24.2448071 258.042784,25.378464 254.349327,26.7010636 C255.560618,28.4165147 256.598868,29.8869013 257.81016,31.6023523 C260.292326,28.5643016 262.419897,25.9602582 264.902063,22.9222075 C264.902063,22.9222075 264.902063,22.9222075 264.902063,22.9222075 Z"></path> <path d="M3.5,207.5 L243.5,147.5" id="Line-24" sketch:type="MSShapeGroup"></path> <path id="Line-24-decoration-1" d="M243.125198,147.593701 C239.203396,147.491836 235.841853,147.404523 231.920052,147.302658 C232.429376,149.339957 232.865941,151.086214 233.375265,153.123513 C236.787742,151.188079 239.712721,149.529135 243.125198,147.593701 C243.125198,147.593701 243.125198,147.593701 243.125198,147.593701 Z"></path> <path d="M3.5,207.5 L244.5,201.5" id="Line-25" sketch:type="MSShapeGroup"></path> <path id="Line-25-decoration-1" d="M244.425346,201.501859 C240.620384,200.546263 237.358988,199.72718 233.554026,198.771584 C233.606292,200.870934 233.651091,202.670376 233.703357,204.769726 C237.456053,203.625972 240.67265,202.645612 244.425346,201.501859 C244.425346,201.501859 244.425346,201.501859 244.425346,201.501859 Z"></path> </g> <g id="problems-2" transform="translate(33.000000, 91.000000)" stroke="#979797" fill="#4990E2" sketch:type="MSShapeGroup"> <circle 
id="Oval-3" cx="30" cy="14" r="14"></circle> <circle id="Oval-4" cx="74" cy="66" r="14"></circle> <circle id="Oval-5" cx="14" cy="103" r="14"></circle> <circle id="Oval-6" cx="64" cy="163" r="14"></circle> <circle id="Oval-7" cx="23" cy="219" r="14"></circle> </g> <g id="Solutions-2" transform="translate(293.000000, 68.000000)" stroke="#979797" fill="#7ED321" sketch:type="MSShapeGroup"> <circle id="Oval-8" cx="26" cy="37" r="14"></circle> <circle id="Oval-9" cx="74" cy="69" r="14"></circle> <circle id="Oval-10" cx="14" cy="99" r="14"></circle> <circle id="Oval-11" cx="71" cy="129" r="14"></circle> <circle id="Oval-12" cx="18" cy="168" r="14"></circle> <circle id="Oval-13" cx="71" cy="205" r="14"></circle> <circle id="Oval-14" cx="22" cy="229" r="14"></circle> <circle id="Oval-15" cx="66" cy="14" r="14"></circle> <circle id="Oval-16" cx="58" cy="289" r="14"></circle> </g> <text id="Problems-3" sketch:type="MSTextLayer" font-family="Lato" font-size="18" font-weight="normal" fill="#000000"> <tspan x="43" y="18">Problems</tspan> </text> <text id="Technical-Solutions-2" sketch:type="MSTextLayer" font-family="Lato" font-size="18" font-weight="normal" fill="#000000"> <tspan x="262" y="18">Technical Solutions</tspan> </text> </g> </g> </svg> <figcaption>The way you choose technology in the world where operations are a serious concern (i.e., "reality"). </figcaption> </figure> <p>The problem with “best tool for the job” thinking is that it takes a myopic view of the words “best” and “job.” Your job is keeping the company in business, god damn it. And the “best” tool is the one that occupies the “least worst” position for as many of your problems as possible.</p> <p>It is basically always the case that the long-term costs of keeping a system working reliably vastly exceed any inconveniences you encounter while building it. 
Mature and productive developers understand this.</p> <h3 id="choose-new-technology-sometimes">Choose New Technology, Sometimes.</h3> <p>Taking this reasoning to its <em>reductio ad absurdum</em> would mean picking Java, and then trying to implement a website without using anything else at all. And that would be crazy. You need some means to add things to your toolbox.</p> <p>An important first step is to acknowledge that this is a process, and a conversation. New tech eventually has company-wide effects, so adding tech is a decision that requires company-wide visibility. Your organizational specifics may force the conversation, or <a href="https://twitter.com/mcfunley/status/578603932949164032">they may facilitate developers adding new databases and queues without talking to anyone</a>. One way or another you have to set cultural expectations that <strong>this is something we all talk about</strong>.</p> <p>One of the most worthwhile exercises I recommend here is to <strong>consider how you would solve your immediate problem without adding anything new</strong>. First, posing this question should detect the situation where the “problem” is that someone really wants to use the technology. If that is the case, you should immediately abort.</p> <figure> <img src="http://i.imgur.com/rmdSx.gif" /> <figcaption>I just watched a webinar about this graph database, we should try it out.</figcaption> </figure> <p>It can be amazing how far a small set of technology choices can go. The answer to this question in practice is almost never “we can’t do it,” it’s usually just somewhere on the spectrum of “well, we could do it, but it would be too hard” <a ref="#f4" href="#f4" class="footnote">[4]</a>. 
If you think you can’t accomplish your goals with what you’ve got now, you are probably just not thinking creatively enough.</p> <p>It’s helpful to <strong>write down exactly what it is about the current stack that makes solving the problem prohibitively expensive and difficult.</strong> This is related to the previous exercise, but it’s subtly different.</p> <p>New technology choices might be purely additive (for example: “we don’t have caching yet, so let’s add memcached”). But they might also overlap or replace things you are already using. If that’s the case, you should <strong>set clear expectations about migrating old functionality to the new system.</strong> The policy should typically be “we’re committed to migrating,” with a proposed timeline. The intention of this step is to keep wreckage at manageable levels, and to avoid proliferating locally-optimal solutions.</p> <p>This process is not daunting, and it’s not much of a hassle. It’s a handful of questions to fill out as homework, followed by a meeting to talk about it. I think that if a new technology (or a new service to be created on your infrastructure) can pass through this gauntlet unscathed, adding it is fine.</p> <h3 id="just-ship">Just Ship.</h3> <p>Polyglot programming is sold with the promise that letting developers choose their own tools with complete freedom will make them more effective at solving problems. This is a naive definition of the problems at best, and motivated reasoning at worst. The weight of day-to-day operational <a href="https://twitter.com/handler">toil</a> this creates crushes you to death.</p> <p>Mindful choice of technology gives engineering minds real freedom: the freedom to <a href="/effective-web-experimentation-as-a-homo-narrans">contemplate bigger questions</a>. Technology for its own sake is snake oil.</p> <p><em>Update, July 27th 2015: I’ve produced a talk based on this article. 
You can see it <a href="/choose-boring-technology-slides">here</a>.</em></p> <hr /> <ol class="footnote-list"> <li><a name="f1"></a>Etsy in its early years suffered from this pretty badly. We hired a bunch of Python programmers and decided that we needed to find something for them to do in Python, and the only thing that came to mind was creating a pointless middle layer that <a href="https://www.youtube.com/watch?v=eenrfm50mXw">required years of effort to amputate</a>. Meanwhile, the 90th percentile search latency was about two minutes. <a href="http://www.sec.gov/Archives/edgar/data/1370637/000119312515077045/d806992ds1.htm">Etsy didn't fail</a>, but it went several years without shipping anything at all. So it took longer to succeed than it needed to. </li> <li><a name="f2"></a>We often casually refer to the boring/bad intersection of doom as "enterprise software," but that terminology may be imprecise. </li> <li><a name="f3"></a>In saying this Rumsfeld was either intentionally or unintentionally alluding to <a href="http://en.wikipedia.org/wiki/I_know_that_I_know_nothing">the Socratic Paradox</a>. Socrates was by all accounts a thoughtful individual in a number of ways that Rumsfeld is not. </li> <li><a name="f4"></a><p>A good example of this from my experience is <a href="https://speakerdeck.com/mcfunley/etsy-activity-feed-architecture">Etsy's activity feeds</a>. When we built this feature, we were working pretty hard to consolidate most of Etsy onto PHP, MySQL, Memcached, and Gearman (a PHP job server). It was much more complicated to implement the feature on that stack than it might have been with something like Redis (or <a href="https://aphyr.com/posts/283-call-me-maybe-redis">maybe not</a>). But it is absolutely possible to build activity feeds on that stack.</p> <p>An amazing thing happened with that project: our attention turned elsewhere for several years. 
During that time, activity feeds scaled up 20x while <em>nobody was watching it at all.</em> We made no changes whatsoever specifically targeted at activity feeds, but everything worked out fine as usage exploded because we were using a shared platform. This is the long-term benefit of restraint in technology choices in a nutshell.</p> <p>This isn't an absolutist position: while storing activity feeds in memcached was judged to be practical, implementing full text search with faceting in raw PHP wasn't. So Etsy used Solr. </p> </li> </ol> Dan McKinley https://mcfunley.com/ Data Driven Products: Lean Startup 2014 2015-01-27T00:00:00+00:00 2015-01-27T00:00:00+00:00 urn:uuid:f99e4b0b-e3c6-2adc-4d38-eccef199f91a <p>Here’s a video of me doing a slightly-amended version of my <a href="/data-driven-products-now">Data Driven Products</a> talk at the <a href="http://leanstartup.co/">Lean Startup Conference</a> back in December.</p> <iframe class="video" src="//www.youtube.com/embed/SZOeV-S-2co?list=PL1M9pu1POlelJcmYWGv_Oq5FPr0J1XKa5" frameborder="0" allowfullscreen=""></iframe> <p>I am told I <a href="http://en.wikipedia.org/wiki/High_rising_terminal">upspeak</a>? You be the judge.</p> Dan McKinley https://mcfunley.com/ Thoughts on the Technical Track 2014-12-09T00:00:00+00:00 2014-12-09T00:00:00+00:00 urn:uuid:4468bf6c-533e-e431-97ad-16ad3a6bad8b <p>I saw <a href="http://lizthedeveloper.com/how-to-reward-skilled-coders-with-something-other-than-people-management">lizTheDeveloper’s post</a> about technical leadership at Simple and I realized that I’ve been meaning to write about this for a while. I hope to persuade you that there are a number of systemic biases working against a healthy technical career path. I don’t think that they’re insurmountable, and I don’t disagree with Liz’s post. 
But I’ve never heard of a company clearing all of these hurdles at once.</p> <p>I was the first person at Etsy with the title of “Principal Engineer,” which was the technical equivalent to a directorship (i.e., one level below CTO). I’m not saying this to toot my own horn, but rather so that it’s understood that the following comes from someone who was the beneficiary of an existing system.</p> <p>(Incidentally, I think Etsy is an example of a company whose heart is in the right place, and it’s not my intention to single them out.)</p> <h3 id="to-review-management-is-a-job">To Review, Management is a Job</h3> <p>My views on the merits of having a technical track align with those of many people in our industry. Management is a different job, with different skills. They’re not necessarily more <em>difficult</em> skills, they’re just <em>different</em>. By and large they’re unrelated to the day-to-day labor of the people who build technology products.</p> <p>It doesn’t make any sense to divert your technical talent into a discipline where they will need to stop doing technical work. (That’s in the event that they intend to be effective managers, which I concede might be an unrealistic expectation.)</p> <p>Other people have made this case, so I’ll just proceed as if we agree that there must be a way forward for people who are great programmers other than to simply graduate into not programming at all.</p> <p>Having that way forward is an ideal. There is always a gap between our ideals and reality, and we cannot act as though we’ve solved a problem simply by articulating it.</p> <p><a href="#asymmetry" name="asymmetry" class="major-section">Fundamental Asymmetries</a></p> <h3 id="management-just-happens">Management Just Happens</h3> <p>I have had management responsibility thrust upon me at least four times over the course of my career, and at no point has that been my goal. It just happens. Do you want to be a manager? 
I will now tell you the secret to becoming a manager in a growing company: <em>just wait.</em></p> <p>You have a manager. Eventually, your manager will accrue too many responsibilities, and they will freak out. They will need somebody to take over some of their reports, and that lucky warm body is you.</p> <figure> <img src="/assets/images/homer-manager.png" /> <figcaption>Good hair: also helpful.</figcaption> </figure> <p>It is entirely plausible to become a manager accidentally. It might even be the norm.</p> <h3 id="technical-track-promotions-are-post-hoc">Technical Track Promotions are Post-Hoc</h3> <p>The process for minting a new manager is: <em>crap, we need another manager</em>. There’s no symmetrical forcing function pushing people into the upper ranks of technical leadership.</p> <p>Mentorship and technical feedback are things everyone does on a functioning engineering team. A technical track “promotion” is merely additional recognition given to someone who is already performing that role notably well.</p> <p>If the job is already getting done, then filling the job is clearly not a pressing need. Technical promotions are something that happen when it’s convenient, which is generally never.</p> <h3 id="stumping">Stumping</h3> <p>Between the founding of the United States and the end of the 19th century, it was considered tacky for presidential candidates to personally campaign for the job. Instead, they staged an elaborate farce in which they reluctantly answered the call of the nation to serve. Trying to intentionally get a promotion into the technical track is pretty much just like this.</p> <figure> <img src="/assets/images/garfield.jpg" /> <figcaption>Getting promoted in the technical track is kind of like being James Garfield.</figcaption> </figure> <p>Your work must be recognized, and this is the rub. 
Let me rephrase: “someone with the power to bestow promotions has to be your fan.” To be promoted you have to be a good mentor, but you also have to worry about playing to an audience. That may be executives, or it may be your peers (and potential competitors). Regardless, you’re running a weird campaign in which actually saying anything directly about wanting the job would be gauche.</p> <p>The most qualified individual contributors may become <em>known</em> without ever really doing this on purpose, but that doesn’t say much for this as a tenable career goal of the sort that can be counted on.</p> <p><a href="#credibility" name="credibility" class="major-section">The Problem of Credibility</a></p> <h3 id="society-applies-to-idealistic-tech-companies-too">Society Applies to Idealistic Tech Companies, Too</h3> <p>American society is not a classless oasis. That’s a lie we tell ourselves. And the person who knows what everyone else gets paid and can fire you is not in your class.</p> <p>A technical job does not have equivalent prestige to a management position with an equivalent salary just because you say it does. Even if you conquer this within your own company, it’s not true in the rest of the industry, and it’s not true in the world at large. In the world our parents live in, it’s a big deal to be somebody else’s boss.</p> <p>You’re hiring people from the world at large all the time. Without continuous effort a technical track decays to its ground state, where the jobs are second class.</p> <h3 id="halfhearted-managers-are-the-worst">Halfhearted Managers are The Worst</h3> <p>The natural result of a system in which technical promotions can’t be counted on and are viewed as suspiciously-maybe-second-class anyway is that people who don’t really give a shit about management wind up going into management. 
Given the choice of waiting for a technical promotion that may never arrive and taking an offer to manage others, almost everyone is going to take the bird in the hand.</p> <figure> <img src="/assets/images/lumberg.jpg" /> <figcaption>Once you let the soulless suspendered lizard in the building, you are screwed.</figcaption> </figure> <p>Managers that have no passion for management are a blight on society. I can say this because I have been one of them. I was never a good manager, and for that I apologize to anyone that ever had to report to me.</p> <p>I am not an isolated case. Many people in management are frankly terrible at it. And they would rather have technical track jobs anyway, but they have no idea how to make the switch. A credible technical track is a great way to ensure a higher level of satisfaction and competency among the <em>managers</em>.</p> <h3 id="ratios-observed-in-the-wild-make-no-sense">Ratios Observed in the Wild Make No Sense</h3> <p>You don’t need to take my reasoning about the intrinsic pressure favoring management bloat at face value. You can actually look at the ratio of managers to technical employees at your company.</p> <p>At one point, I was alone at my level. There were five theoretically-equivalent directors at the time. The ratio was at least that bad on the lower rungs. (I have no idea if this is still true at that company, and it might not be.)</p> <p>For that to make sense, we’d have to believe a few things that don’t stand up to scrutiny. First, we’d have to believe in a very high proclivity among engineers to manage, and I think that betrays our expectations. 
Not very many of us got into this business with the hope of not actually building things.</p> <p>Second, we’d have to believe that although it took five directors to effectively manage the organization, only one technical leader was required to advise the same group on the details of the work they do every day.</p> <p><a name="improvements" href="#improvements" class="major-section">What Might Help?</a></p> <h3 id="promotions-should-not-be-miraculous-and-rare">Promotions Should Not Be Miraculous and Rare</h3> <p>Of course, it wouldn’t make logical sense to say that the ratio of individual contributors to managers at a given level must be 1:1. I honestly don’t know if 1:2 or 2:1 is closer to correct. The answer is probably contingent, and the relationship might not be linear.</p> <p>But I think it’s important for any company that takes the ideal of having a tenable technical track seriously to put a stake in the ground on this question. It’s hard to build a credible technical track, and we need a baseline to grade ourselves against.</p> <p>I don’t think that proceeding with the assumption that leaders will just naturally emerge produces the best results. Adding a self-imposed quota achieves accountability. It acknowledges the possibility that problems can lie in the system of recognition, and not only in the talents of the people in the pool for promotions.</p> <p><em>“Do we think that we hire smart people here? Yes? Then we should be able to find N of them worthy of promotion for every manager. If we can’t then the problem is most likely to be found in how we’re recognizing people for their work.”</em></p> <p>I know that the word “quota” is <em>verboten</em> for many, and I gleefully await your flames.</p> <h3 id="address-prestige-with-superpowers">Address Prestige with Superpowers</h3> <p>If we think about why managers and technical employees on even salary footing may be perceived to not truly be equals, it comes down to superpowers. 
The managers have special capabilities that the technical employees don’t: hiring, firing, compensation, and the like. Is it possible to give technical employees a different set of superpowers, to address the prestige problem?</p> <p>Maybe. I don’t think that I have seen this done correctly yet. If I had superpowers, they were:</p> <ul> <li>The ability to work on whatever I wanted.</li> <li>The ability to talk to anyone I wanted.</li> </ul> <p>These were indeed powerful, but using them to create positive action was difficult. It would have been easy for me to opt out of projects that I didn’t believe in and to do my own thing. I did often do my own thing. But I also worked on projects that I didn’t believe in, because I knew that opting out was a selfish act. One of my friends would just be forced to work on it in my place, and sometimes leadership is about jumping on grenades.</p> <figure> <img src="/assets/images/dark-knight.jpg" /> <figcaption>I guess there are worse superpowers. For example, the ability to allow oneself to be framed for the good of the city.</figcaption> </figure> <p>Talking to other teams made it possible for me to point out places where resources weren’t intelligently allocated. But this also begat mostly negative actions. “Hey, this isn’t the best way to use these folks,” I’d find myself saying all the time. It was draining, and a bummer.</p> <p>Giving the technical leadership deeper involvement in the planning process could address this. Of course that would involve dragging the technical leadership to meetings, which I admit is tricky.</p> <p><a name="closing" href="#closing" class="major-section">In Closing</a></p> <p>I hope I’ve demonstrated that creating a career path outside of management for technical employees is only the beginning of your problems. It’s a good and necessary step, but it’s not an achievement by itself.</p> <p>I’d love to hear from anyone with better ideas. 
These issues are difficult and I don’t claim to have all of the right answers.</p> Dan McKinley https://mcfunley.com/ Data Driven Products Now! 2014-09-18T00:00:00+00:00 2014-09-18T00:00:00+00:00 urn:uuid:e89c5588-5740-8e4f-715e-0cc2377e0fa9 <p>Back when I was at Etsy, I did a presentation internally about the craft of sizing opportunities. I finally got around to writing a public incarnation of that talk. Here it is:</p> <div class="speakerdeck-container"> <div class="speakerdeck-loading"></div> <script id="data-driven-products-now-deck" async="" class="speakerdeck-embed" data-id="13b6d210211a01327085562b5da4981b" data-ratio="1.0" src="//speakerdeck.com/assets/embed.js"></script> <script>$('#data-driven-products-now-deck').speakerdeck();</script> </div> Dan McKinley https://mcfunley.com/ Manual Delivery 2014-03-10T00:00:00+00:00 2014-03-10T00:00:00+00:00 urn:uuid:c153dff4-755b-8a55-4e30-3150a8fba544 <p>The person on build rotation, or the nightly <em>schlimazel</em> I suppose, went into a hot 5’x8’ closet containing an ancient computer. This happened after everyone else had left, so around 8:30PM. Although in crunch time that was more like 11:30PM. And we were in crunch time at one point for a stretch of a year and a half. “That release left a mark,” my friend Matt used to say. In a halfhearted attempt at fairness to those who will take this post as a grave insult, I’ll concede that my remembrance of these details is the work of The Mark.</p> <p>Anyway, the build happened after quitting time. This guaranteed that if anything went wrong, you were on your own. 
Failure in giving birth to the test build implied that the 20 people in Gurgaon comprising the QA department would show up for work in a matter of hours having nothing to do.</p> <p>You used a tool called “VBBuild.” This was a GUI tool, rumored to be written by Russians:</p> <p><img src="/assets/images/vbbuild.gif" alt="VBBuild" /></p> <p>VBBuild did mysterious COM stuff to create the DLLs that nobody at the time understood properly. It presented you with dozens of popups even when it was working perfectly, and you had to be present to dismiss each of them. The production of executable binary code was all smoke and lasers. And, apparently, popups.</p> <p>Developers wrote code using the more familiar VB6 IDE. The IDE could run interpreted code as an interactive debugger, but it could not produce finished libraries in a particularly repeatable or practical way. So the release compilation was different in many respects from what programmers were doing at their desks. Were there problems that existed in one of these environments but not the other? Yes, sometimes. I recall that we had a single function that weighed in at around 70,000 lines. The IDE would give up and execute this function even if it contained clear syntax errors. That was the kind of discovery which, while exciting, was wasted in solitude somewhere past midnight as you attempted to lex and parse the code for keeps.</p> <figure> <img src="/assets/images/vb6.jpg" alt="VB6" /> <figcaption>Isaiah 2:4: "And he shall displace VB6 in search engine results with a book written by vegans."</figcaption> </figure> <p>Developers weren’t really in the habit of doing complete pulls from source control. And who could blame them, since doing this whitescreened your machine for half an hour. They were also never in any particular hurry to commit, at least until it was time to do the test build. 
As there was no continuous integration at the time, this was the first time in several days that all of the code had been compiled.</p> <p>Often <em>[ed: always]</em> there were compilation errors to be resolved. We were using Visual Sourcesafe, so people could be holding an exclusive lock on files containing the errors. Typically, this problem was addressed by walking around the office an hour before build time and reminding everyone to check their files in. In the event that someone forgot <em>[ed: every time]</em>, there was an administrative process for unlocking locked files. Not everyone had the necessary rights to do this, but happily, I did.</p> <p>By design, the build tried to assume an exclusive lock on all of the code. As a result, nobody could work while the build was in progress. Sometimes, the person performing the build would check all of the files out and not check them back in. So your first act the morning after a build might be to walk over to the build closet and release the source files from their chains.</p> <figure> <img src="/assets/images/vss.gif" alt="Visual Sourcesafe" /> <figcaption>The Visual Sourcesafe documentation strongly advised against its use on a team of more than four programmers, and apparently this was not a joke.</figcaption> </figure> <p>Deployment required dozens of manual steps that I will never be able to remember. When the build was done, you copied DLLs over to the test machines and registered them there. By “copied” I mean that you selected them in an explorer window, pressed “Ctrl-C,” and then pressed “Ctrl-V” to paste them into another. There was no batch script worked out to do this more efficiently. Ok, this is a slight lie. There had <em>been</em> a script, but it was put out to pasture on account of a history of hideous malfunction. And popups. 
On remote machines sometimes, where they could only be dismissed by wind and ghosts.</p> <p>Registration involved connecting to each machine with Remote Desktop and right clicking all the DLLs. You could skip a machine or just one library, and things would be very screwy indeed.</p> <p>The production release, which happened roughly twice a year under ideal conditions, was identical to this but with the added complexity of about eight more servers receiving the build. And we might take the opportunity to add completely new machines, which would not necessarily have the same patch levels for, oh, like 700,000 windows components that were relied upon.</p> <p>Given eight or ten machines, the probability of a mistake on at least one of the servers approached unity. So the days and weeks following a production release were generally spent sussing out all of the minute differences and misconfigurations on the production machines. There would be catastrophic bugs that affected a tiny sliver of requests, under highly specific server conditions, and <em>only if executed on one server out of eight</em>. I was an expert at debugging in disassembly at the time. Upon leaving the job, I thought that this was pretty badass. But in the seven years since–do you know what? It’s never come up.</p> <figure> <img src="/assets/images/sandp.jpg" alt="Nonstandard &amp; poorly reproducible builds is more like it am I right" /> <figcaption>"The code could be <a href="http://www.bloomberg.com/news/2013-02-05/s-p-analyst-joked-of-bringing-down-the-house-ahead-of-collapse.html">structured by cows</a> and we would build it by hand."</figcaption> </figure> <p>At one point I wrote a new script to perform the deployment. It was an abomination of XML to be sure, but it got the job done without all of the popups. I started doing the test build with this with some success and suggested that we use it for the production release. 
This was out of the question, I was told by one of my closer allies in the place. The production release was “too important to use a script.”</p> <p>The operating systems and supporting libraries on the machines were also set up by hand, by a separate team, working from printed notes. The results were similar. This is kind of another story.</p> <p>This all happened in 2003.</p> Dan McKinley https://mcfunley.com/ Scalding at Etsy 2014-03-02T00:00:00+00:00 2014-03-02T00:00:00+00:00 urn:uuid:b2555cf4-db74-983b-cde4-1da747c34460 <p>Here’s a presentation I gave about how Etsy wound up using <a href="https://github.com/twitter/scalding">Scalding</a> for analysis. Given at the <a href="http://www.meetup.com/cascading/">San Francisco Cascading Meetup</a>.</p> <div class="speakerdeck-container"> <div class="speakerdeck-loading"></div> <script id="scalding-at-etsy-deck" async="" class="speakerdeck-embed" data-id="309f7f7083c90131707926064ba69595" data-ratio="1.0" src="//speakerdeck.com/assets/embed.js"></script> <script>$('#scalding-at-etsy-deck').speakerdeck();</script> </div> Dan McKinley https://mcfunley.com/ The Case for Secrecy in Web Experiments 2014-01-16T00:00:00+00:00 2014-01-16T00:00:00+00:00 urn:uuid:b055f44c-2b32-1e5b-a566-4e79beea5e83 <p>For four months ending in early 2011, I worked on a team of six to redesign Etsy’s homepage. I don’t want to overstate the weight of this in the grand scheme of things, but hopes flew high. The new version was to look something like this:</p> <figure> <a href="/assets/images/nhp2010-big.png"><img src="/assets/images/nhp2010-big.png" /></a> </figure> <p>There were a number of methodological problems with this, one of our very first web experiments. Our statistics muscles were out of practice, and we had a very difficult time <a href="/whom-the-gods-would-destroy-they-first-give-real-time-analytics">fighting the forces of darkness who wanted to enact radical redesigns after five minutes of real-time data</a>. 
We had no toolchain for running experiments to speak of. The nascent analytics pipeline jobs failed every single night.</p> <p>But perhaps worst of all, we publicized the experiment. Well, “publicized” does not accurately convey the magnitude of what we did. We allowed visitors to join the treatment group using a magic URL. We proactively told our most engaged users about this. We tweeted the magic URL from the <a href="http://www.twitter.com/etsy">@Etsy account</a>, which at that point had well over a million followers.</p> <figure> <a href="http://www.etsy.com/teams/7718/questions/discuss/6848711/page/3?post_id=60817018"><img src="/assets/images/nhp-forum-post.png" alt="The magic URL was chosen to celebrate the CEO&apos;s 31st birthday." /></a> <figcaption>The magic URL was chosen to celebrate the CEO's 31st birthday. None of this was Juliet's fault.</figcaption> </figure> <p>This project was a disaster for many reasons. Nearly all of the core hypotheses turned out to be completely wrong. The work was thrown out as a total loss. Everyone involved learned valuable life lessons. I am here today to elaborate on one of these: <em>telling users about the experiment as it was running was a big mistake.</em></p> <h3 id="the-diamond-forging-pressure-to-disclose-experiments">The Diamond-Forging Pressure to Disclose Experiments</h3> <p>If you operate a website with an active community, and you do A/B testing, you might feel some pressure to disclose your work. And this seems like a proper thing to do, if your users are invested in your site in any serious way. 
They may notice anyway, and the <a href="http://instagram.com/p/f3HLODBQdH/">most common reaction to change on a beloved site</a> tends to be varying degrees of panic.</p> <figure> <a href="http://www.businessinsider.com/mark-zuckerberg-joins-facebook-group-i-automatically-hate-the-new-facebook-home-page-2009-10"><img alt="If you can&apos;t beat &apos;em, join &apos;em" class="thinborder" src="/assets/images/mz-story.png" /></a> <figcaption>"If you can't beat 'em, join 'em."</figcaption> </figure> <p>As an honest administrator, your wish is to reassure your community that you have their best interest at heart. Transparency is the best policy!</p> <p>Except in this case. I think there’s a strong argument to be made against announcing the details of active experiments. It turns out to be easier for motivated users to overturn your experiment than you may believe. And disclosing experiments is work, and work that comes before real data should be minimized.</p> <h3 id="online-protests-not-necessarily-a-waste-of-time">Online Protests: Not Necessarily A Waste of Time</h3> <p>A fundamental reason that you should not publicize your A/B tests is that this can introduce <a href="http://en.wikipedia.org/wiki/Bias_(statistics)">bias</a> that can affect your measurements. This can even overturn your results. There are many different ways for this to play out.</p> <p>Most directly, motivated users can just perform positive actions on the site if they believe that they are in their preferred experiment bucket. Even if the control and treatment groups are very large, the number of people completing a goal metric (such as purchasing) may be just a fraction of that. And the anticipated difference between any two treatments might be slight. 
It’s not hard to imagine how a small group of people could determine an outcome if they knew exactly what to do.</p> <figure> <table class="table table-striped desktop"> <tr> <th>Group</th> <th>Visits</th> <th>Conversions (organic)</th> <th>Conversions (gamed)</th> <th>Proportion</th> </tr> <tr> <td>Control</td> <td>10000</td> <td>50</td> <td class="negative attention">10</td> <td class="positive">0.0060</td> </tr> <tr> <td>New</td> <td>10000</td> <td>55</td> <td>0</td> <td class="negative">0.0055</td> </tr> </table><table class="table table-striped mobile"> <tr> <th>Control</th> <th>New</th> </tr> <tr> <td>10000 visits</td> <td>10000 visits</td> </tr> <tr> <td>50 organic conversions</td> <td>50 organic conversions</td> </tr> <tr> <td class="negative attention">10 gamed conversions</td> <td>0 gamed conversions</td> </tr> <tr> <td class="positive">0.60% converted</td> <td class="negative">0.55% converted</td> </tr> </table> <figcaption>Figure 1: In some cases a small group of motivated users can change an outcome, even if the sample sizes are large.</figcaption> </figure> <p>As the scope and details of an experiment become more fully understood, this gets easier to accomplish. But intentional, organized action is not the only possible source of bias.</p> <p>Even if users have no preference as to which version of a feature wins, some will still be curious. If you announce an experiment, visitors will engage with the feature immediately who otherwise would have stayed away. This well-intentioned interest could ironically make a winning feature appear to be a loss. 
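The arithmetic behind Figure 1 is worth checking by hand. A few lines of Python, using only the numbers from the table above, show how ten gamed conversions flip the result:

```python
# Figure 1's numbers: two arms of 10,000 visits each. A handful of
# gamed conversions in the control group reverses the outcome.

def rate(conversions, visits):
    return conversions / visits

visits = 10_000

# Organic behavior alone: the new treatment wins.
control_organic = rate(50, visits)   # 0.005
new_organic = rate(55, visits)       # 0.0055
assert new_organic > control_organic

# Add just 10 gamed conversions to the control group, and the
# measured winner changes, despite the large sample sizes.
control_measured = rate(50 + 10, visits)  # 0.006
assert control_measured > new_organic
```

The point of the sketch is that the margin between treatments (5 conversions in 10,000 visits) is smaller than the effort a tiny motivated group can supply.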
Here’s an illustration of what that looks like.</p> <figure> <table class="table table-striped desktop"> <tr> <th>Group</th> <th>Visits (oblivious)</th> <th>Visits (rubbernecking)</th> <th>Visits (total)</th> <th>Conversions</th> <th>Proportion</th> </tr> <tr> <td>Control</td> <td>500</td> <td>50</td> <td>550</td> <td>30</td> <td class="positive">0.055</td> </tr> <tr> <td>New</td> <td>500</td> <td class="negative attention">250</td> <td>750</td> <td>35</td> <td class="negative">0.047</td> </tr> </table><table class="table table-striped mobile"> <tr> <th>Control</th> <th>New</th> </tr> <tr> <td>500 oblivious visits</td> <td>500 oblivious visits</td> </tr> <tr> <td>50 rubbernecking visits</td> <td class="negative attention">250 rubbernecking visits</td> </tr> <tr> <td>550 total visits</td> <td>750 total visits</td> </tr> <tr> <td>30 conversions</td> <td>35 conversions</td> </tr> <tr> <td class="positive">5.5% converted</td> <td class="negative">4.7% converted</td> </tr> </table> <figcaption>Figure 2: An example in which 100 engaged users are told about a new experiment. They are all curious and seek out the feature. Those seeing the new treatment visit the new feature more often just to look at it, skewing measurement.</figcaption> </figure> <p>These examples both involve the distortion of numbers on one side of an experiment, but <a href="http://en.wikipedia.org/wiki/Novelty_effect">many other scenarios</a> are possible. Users may change their behavior in either group for <a href="http://en.wikipedia.org/wiki/Hawthorne_effect">no reason other than that they believe they are being measured</a>.</p> <p>Good experimental practice requires that you isolate the intended change as the sole variable being tested. To accomplish this, you randomly assign visitors the new treatment or the old, controlling for all other factors. 
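Figure 2's dilution effect is also plain arithmetic. Here is a short sketch using the table's numbers:

```python
# Figure 2's numbers: 100 curious users are told about the test.
# Their rubbernecking visits pile disproportionately onto the new
# treatment, diluting its measured conversion rate.

control_visits = 500 + 50   # oblivious + rubbernecking
new_visits = 500 + 250

control_rate = 30 / control_visits   # about 5.5%
new_rate = 35 / new_visits           # about 4.7%

# The new treatment converted more visitors in absolute terms
# (35 vs 30), but the curiosity traffic makes it look like a loss.
assert new_rate < control_rate
```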
Informing visitors that they’re part of an experiment places this central assumption in considerable jeopardy.</p> <h3 id="predicting-bias-is-hard">Predicting Bias is Hard</h3> <p>“But,” you might say, “most users aren’t paying attention to our communiqués.” You may think that you can announce experiments, and only a small group of the most engaged people will notice. This is very likely true. But as I have already shown, the behavior of a small group cannot be dismissed out of hand.</p> <p>Obviously, this varies. There <em>are</em> experiments in which a vocal minority cannot possibly bias results. But determining if this is true for any given experiment in advance is a difficult task. There is roughly one way for an experiment to be conducted correctly, and there are an infinite number of ways for it to be screwed.</p> <p>A/B tests are already complicated: bucketing, data collection, experimental design, <a href="http://www.experimentcalculator.com">experimental power</a>, and analysis are all vulnerable to mistakes. From this point of view, <em>“is it safe to talk about this?”</em> is just another brittle moving part.</p> <h3 id="communication-plans-are-real-work">Communication Plans are Real Work</h3> <p>Something I have come to appreciate over the years is the role of product marketing. I have been involved in many releases for which the act of explaining and gaining acceptance for a new feature constituted the <em>majority</em> of the effort. Launches involve a lot more than pressing a deploy button. This is a big deal.</p> <figure> <iframe class="video" src="//player.vimeo.com/video/27836540?title=0&amp;byline=0&amp;portrait=0&amp;color=ffffff" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe> <figcaption>Product marketing: this is serious business.</figcaption> </figure> <p>It also seems to be true that <a href="https://twitter.com/Nat_S">people who are skilled at this kind of work</a> are hard to come by. 
You will be lucky to have a few of them, and this imposes limits on the number of major changes that you can make in any given year.</p> <p>It makes excellent sense to avoid wasting this resource on quite-possibly-fleeting experiments. It will delay their deployment, steal cycles from launches for finished features, and it will do these things in the service of work that may never see the light of day!</p> <p>Users will tend to view any experiment as presaging an imminent release, regardless of your intentions. Therefore, you will need to put together a relatively complete narrative explaining why the changes are positive at the outset. A “minimum viable announcement” probably won’t do. And you will need to execute this without the benefit of quantitative results to bolster your case.</p> <h3 id="your-daily-reminder-that-experiments-fail">Your Daily Reminder that Experiments Fail</h3> <p>Doing data-driven product work really does imply that you will not release changes that don’t meet some quantitative standard. In such an event you might tweak things and start over, or you might give up altogether. Announcing your running experiments is problematic given this reality.</p> <p>Obviously, product costs will be compounded by communication costs. Every time you retool an experiment, you will have to bear the additional weight of updating your community. Adding marginal effort makes it more difficult for humans to behave rationally and objectively. We have a name for this well-known pathology: <a href="http://en.wikipedia.org/wiki/Sunk_costs">the sunk cost fallacy</a>. <em>We’ve put so much into this feature, we can’t just give up on it now.</em></p> <figure> <img src="/assets/images/pillory.jpg" /> <figcaption>The fear of admitting mistakes in public can be motivating.</figcaption> </figure> <p>Announcing experiments also has a way of raising the stakes. 
The prospect of backtracking with your users (and being perceived as admitting a mistake) only makes killing a bad feature less palatable. The last thing you need is additional temptation to delude yourself. You have plenty of this already. The danger of living in public is that it will turn a bad release that should be discarded into an inevitability.</p> <h3 id="consistency-and-expectations">Consistency and Expectations</h3> <p>Let’s say you’ve figured out workarounds for every issue I’ve raised so far. You are still going to want to run experiments that are not publicly declared.</p> <p>Some experiments are inherently controversial or exploratory. It may be perfectly legitimate to try changes that you would never release to learn more about your site. Removing a dearly beloved feature temporarily for half of new registrations is a good example of this. By doing so, you can measure the effect of that feature on lifetime value, and make better decisions with your marketing budget.</p> <p>Other experiments work only when they’re difficult to detect. Search ranking is a high-stakes arms race, and complete transparency can just make it easier for malicious users to gain unfair advantages. It’s likely you’re going to want to run experiments on search ranking without disclosing them.</p> <p>It would be malpractice to give users the expectation that they will always know the state of running experiments. They will not have the complete picture. Leading them to believe otherwise can do more harm to your relationship than just having a consistent policy of remaining silent until features are ready for release.</p> <h3 id="what-can-you-share">What Can You Share?</h3> <p>Sharing too much too soon can doom your A/B tests. But this doesn’t mean that you are doomed to be locked in a steel cage match with your user base over them.</p> <figure> <img src="/assets/images/cagematch.jpg" alt="Forum moderators of the world: good luck." 
/> <figcaption>Forum moderators of the world: good luck.</figcaption> </figure> <p>You can do rigorous, well-controlled experiments and also announce features in advance of their release. You can give people time to acclimate to them. You can let users preview new functionality, and enable them at a slower pace. These practices all relate to <em>how</em> a feature is released, and they are not necessarily in conflict with how you decide <em>which</em> features should be released. It is important to decouple these concerns.</p> <p>You can and should share information about completed experiments. “What happened in the A/B test” should be a regular feature of your release notes. If you really have determined that your new functionality performs better than what it replaces, your users should have this data.</p> <figure> <a href="https://www.etsy.com/teams/7716/announcements/discuss/12732278/page/1"><img src="/assets/images/nlp-announce.png" /></a> <figcaption>Plain-language A/B test results can ease user anxiety in launches.</figcaption> </figure> <p>Counterintuitively, perhaps, trust is also improved by sharing the details of failed experiments. If you only tell users about your victories, they have no reason to believe that you are behaving objectively. Who’s to say that you aren’t just making up your numbers? Showing your scars (as I tried to do with my homepage story above) can serve as a powerful declaration against interest.</p> <h3 id="successful-testing-is-good-stewardship">Successful Testing is Good Stewardship</h3> <p>Your job in product development, very broadly, is to make progress while striking a balance between short and long term concerns.</p> <ul> <li>Users should be as happy as possible in the short term.</li> <li>Your site should continue to exist in the long term.</li> </ul> <p>The best interest of your users is ultimately served by making the correct changes to your product. 
Talking about experiments can break them, leading to both quantitative errors and mistakes of judgment.</p> <p>I firmly believe that A/B tests in any organization should be as free, easy, and cheap as humanly possible. After all, <a href="/testing-to-cull-the-living-flower">running A/B tests is perhaps the only way to know that you’re making the right changes</a>. Disclosing experiments as they are running is a policy that can alleviate some discontent in the short term. But the price of this is making experiments harder to run in the long term, and ultimately making it less likely that measurement will be done at all.</p> <p class="acknowledgements"> Thanks to <a href="http://twitter.com/nellwyn">Nell Thomas</a>, <a href="http://twitter.com/stevemardenfeld">Steve Mardenfeld</a>, and <a href="http://hilaryparker.com/">Dr. Parker</a> for their help on this. </p> Dan McKinley https://mcfunley.com/ Growth Hacker TV Interview 2013-07-01T00:00:00+00:00 2013-07-01T00:00:00+00:00 urn:uuid:39551beb-2559-b828-c7fb-d1b3275391b8 <p><a href="https://www.growthhacker.tv/?v=88&amp;sp=cd76f0ddad7563229d12">Here’s an interview</a> I did with <a href="http://growthhacker.tv/">Growth Hacker TV</a> last week. 
We covered many topics:</p> <ul> <li>What exactly were you smoking when you made <a href="http://www.experimentcalculator.com">experimentcalculator.com</a>?</li> <li>How did Etsy get started running experiments?</li> <li>Do you really hate bandit testing?</li> <li>Do you really think performance never matters?</li> <li>Do you really hate real time analytics?</li> </ul> <p><a href="https://www.growthhacker.tv/?v=88&amp;sp=cd76f0ddad7563229d12">Check it out</a>, for the answer to these and many other thrilling questions.</p> Dan McKinley https://mcfunley.com/ Belated Network World Story 2013-06-12T00:00:00+00:00 2013-06-12T00:00:00+00:00 urn:uuid:5c33b5c5-53ce-362c-b93b-8bff6ab9077f <p>It’s old news by now, but a few months back I was on the cover of Network World along with my colleagues <a href="https://twitter.com/nellwyn">Nellwyn</a>, <a href="https://twitter.com/stevemardenfeld">Steve</a>, and <a href="https://www.facebook.com/dottie.matrix">Dottie</a>.</p> <figure> <a href="http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2013/022513-etsy-big-data-266841.html&amp;pagename=/news/2013/022513-etsy-big-data-266841.html&amp;pageurl=http://www.networkworld.com/news/2013/022513-etsy-big-data-266841.html&amp;site=printpage&amp;nsdr=n" alt="Network World / Etsy cover story"><img src="/assets/images/network-world.png" /></a> </figure> <p>If you missed it when it came out, you can read the article <a href="http://www.networkworld.com/cgi-bin/mailto/x.cgi?pagetosend=/news/2013/022513-etsy-big-data-266841.html&amp;pagename=/news/2013/022513-etsy-big-data-266841.html&amp;pageurl=http://www.networkworld.com/news/2013/022513-etsy-big-data-266841.html&amp;site=printpage&amp;nsdr=n">here</a>.</p> Dan McKinley https://mcfunley.com/ My Magnum Opus (Reconsidered) 2013-06-10T00:00:00+00:00 2013-06-10T00:00:00+00:00 urn:uuid:5d336b49-36a3-bbb9-5b5a-90ee595b5b50 <p>I have been very fortunate in a number of respects. 
I have access to the twisted and talented mind of <a href="https://twitter.com/ericbeug">Eric Beug</a>. And not only do I have a broad mandate to behave like a lunatic, but I also have dozens of like-minded coworkers. I got paid to make this. That makes me a professional actor.</p> <iframe class="video" src="http://player.vimeo.com/video/63440604?title=0&amp;byline=0&amp;portrait=0&amp;color=ffffff" frameborder="0" webkitallowfullscreen="" mozallowfullscreen="" allowfullscreen=""></iframe> <p><br /> <a href="/my-magnum-opus">You may remember my slightly-lower-budget debut</a>.</p> Dan McKinley https://mcfunley.com/ How Long Should You Run Experiments? 2013-05-13T00:00:00+00:00 2013-05-13T00:00:00+00:00 urn:uuid:800e7e89-a3e4-c8f5-660e-c09915585a2f <p>The question of how long an A/B test needs to run comes up all the time. And the answer is that it really depends. It depends on how much traffic you have, on how you divide it up, on the base rates of the metrics you’re trying to change, and on how much you manage to change them. It also depends on what you deem to be acceptable rates for Type I and Type II errors.</p> <p>In the face of this complexity, community concerns (“we don’t want too many people to see this until we’re sure about it”) and scheduling concerns (“we’d like to release this week”) can dominate. But this can be setting yourself up for failure, by embarking on experiments that have little chance of detecting positive or negative changes. Sometimes adjustments can be made to avoid this. 
And sometimes adjustments aren’t possible.</p> <figure> <img src="/assets/images/played-yourself.jpg" alt="&quot;You ran an A/B test at one percent for a week&quot; - the seldom-heard, missing verse of &quot;You Played Yourself&quot;" /> <figcaption>"You ran an A/B test at one percent for a week" - the seldom-heard, missing verse of <em>You Played Yourself</em>.</figcaption> </figure> <p>To help with this, I built a tool that will let you play around with all of the inputs. You can find it here:</p> <h3 class="prominent-url"><a href="http://www.experimentcalculator.com">http://www.experimentcalculator.com</a></h3> <p>Here’s an example of what you might see using this tool:</p> <figure> <a href="http://www.experimentcalculator.com/#lift=2&amp;conversion=6.5&amp;visits=380"> <img src="/assets/images/excalc-example.png" alt="You can probably go ahead and not test this one. Or hey maybe this isn't worth the time." /> </a> </figure> <p>The source code <a href="https://github.com/mcfunley/experiment-model">is available on github here</a>. The sample size estimate in use is the one described by <a href="http://www.bios.unc.edu/~mhudgens/bios/662/2008fall/casagrande.pdf">Casagrande, Pike and Smith</a>.</p> <p>The following people were all great resources to me in building this: <a href="https://twitter.com/stevemardenfeld">Steve Mardenfeld</a>, <a href="https://twitter.com/jimmybot">James Lee</a>, <a href="https://twitter.com/kimbost">Kim Bost</a>, <a href="https://twitter.com/wzchen">William Chen</a>, <a href="https://twitter.com/paradosso">Roberto Medri</a>, and <a href="https://twitter.com/hirefrank">Frank Harris</a>. <a href="https://twitter.com/peterseibel">Peter Seibel</a> wrote an internal tool a while back that got me thinking about this.</p> Dan McKinley https://mcfunley.com/ The Case Against Bandit Testing 2013-01-24T00:00:00+00:00 2013-01-24T00:00:00+00:00 urn:uuid:27b7890a-6c35-776f-a696-2ee43b4acc20 <p>Many have asked me if Etsy does bandit testing. 
The short answer is that we don’t, and as far as I know nobody is seriously considering changing that anytime soon. This has come up often enough that I should write down my reasoning.</p> <p>First, let me be explicit about terminology. When we do tests at Etsy, they work like this:</p> <ul> <li>We have a fixed number of treatments that might be shown.</li> <li>We assign the weights of the treatments at the outset of the test, and we don’t change them.</li> <li>We pick a sample size ahead of time that makes us likely to notice differences of consequential magnitude.</li> </ul> <p>In addressing “bandit testing,” I’m referring to any strategy that might involve adaptively re-weighting an ongoing test or keeping experiments running for indefinitely long periods of time.</p> <p><a href="http://noelwelsh.com/">Noel Welsh</a> at Untyped has written a high-level overview of bandit testing, <a href="http://untyped.com/untyping/2011/02/11/stop-ab-testing-and-make-out-like-a-bandit/">here</a>. It’s a reasonable introduction to the concept and the problems it addresses, although I view the benefits as more elusive than it suggests. “It is well known in the academic community that A/B testing is significantly sub-optimal,” it says, and I have no reason to doubt that this is true. But as I hope to explain, the domain in which this definition of “sub-optimal” applies is narrowly constrained.</p> <h3 id="gremlins-the-ancient-enemy">Gremlins: The Ancient Enemy</h3> <p>At Etsy, we practice continuous deployment. We don’t do releases in the classical sense of the word. Instead, we push code live a few dozen lines at a time. When we build a replacement for something, it lives beside its antecedent in production code until it’s finished. And when the replacement is ready, we flip a switch to make it live. Cutting and pasting an entire class, making some small modifications, and then ramping it up is not unheard of at all.
Actually, it’s standard practice, the aesthetics of the situation be damned.</p> <p>This methodology is occasionally attacked for its tendency to leave bits of dead code lying around. I think that this criticism is unfair. We <em>do</em> eventually excise dead code, <em>thank you</em>. And all other methods of operating a consumer website are inferior. That said, if you twist my arm and promise to quote me anonymously I will concede that yes, we do have a pretty epic pile of dead code at this point. I’m fine with this, but it’s there.</p> <figure> <img src="/assets/images/cadillac-graveyard.jpg" alt="Continuous delivery: artist's conception" /> <figcaption>Continuous delivery: artist's conception.<span class="photo-credit">Photo credit: <a href="http://www.flickr.com/photos/ben_pollard/">Ben Pollard</a></span></figcaption> </figure> <p>My experience here has revealed what I take to be a fundamental law of nature. Given time, the code in the “off” branch no longer works. Errors in a feature ramped up to small percentages of traffic also have a way of passing unnoticed. For practitioners of continuous deployment, production traffic is the lifeblood of working code. Its denial is quickly mortal.</p> <p><img src="/assets/images/hacker-news-users.png" alt="The Various Species of &quot;Hacker News&quot; Readers" /></p> <p>This relates to the discussion at hand in that bandit testing will ramp the losers of experiments down on its own, and keep them around at low volume indefinitely. The end result is a philosophical conundrum, of sorts. 
<em>Are the losers of experiments losing because they are broken, or are they broken because they are losing?</em></p> <h3 id="accounting-irregularities">Accounting Irregularities</h3> <p>The beauty of Etsy’s A/B testing infrastructure lies in its simplicity.</p> <ul> <li>Experiments are initiated with minimal modifications of our config file.</li> <li>Visitors are bucketed based on a single, fixed-width value in a persistent cookie.</li> </ul> <p>One of the advantages of this parsimony is that new tests are “free,” at least in the engineering sense of the word. They’re not free if we are measuring the mass of their combined cognitive overhead. But they are free in that there are no capacity implications of running even hundreds of experiments at once. This is an ideal setup for those of us who maintain that the measurement of our releases ought to be the norm.</p> <p>Bandit testing upsets this situation in an insidious way. As I explained above, once we weight our tests we don’t tweak the proportions later. The reason for this is to maintain the consistency of what visitors are seeing.</p> <p>Imagine the flow of traffic immediately before and after the initiation of an experiment on Etsy’s header. For visitors destined for the new treatment, at first the header looks as it has for several years. Then in their next request, it changes without warning. Should we attribute the behavior of that visitor to the old header or to the new one? Reconciling this is difficult, and in our case we dodge it by throwing out visits that have switched buckets. (We are not even this precise. We just throw out data for the entire <em>day</em> if it’s the start of the experiment.)</p> <p>Bandit testing, in adjusting weights much more aggressively, exacerbates this issue. 
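</p> <p>To make the mechanics concrete, here is a sketch of this style of deterministic, cookie-based bucketing (the hashing scheme and names are illustrative, not Etsy’s actual code):</p>

```python
import hashlib

def bucket(visitor_id, test_name, weights):
    """Assign a visitor to a variant by hashing a persistent cookie value.
    Assignment is a pure function of the cookie and the weights, so no
    server-side state is needed."""
    digest = hashlib.sha256(f"{test_name}:{visitor_id}".encode()).hexdigest()
    point = int(digest, 16) % 100
    cumulative = 0
    for variant, weight in weights:
        cumulative += weight
        if point < cumulative:
            return variant
    return weights[-1][0]

before = [("control", 90), ("new_header", 10)]
after = [("control", 50), ("new_header", 50)]  # re-weighted, as a bandit would

switched = sum(
    bucket(f"v{i}", "header", before) != bucket(f"v{i}", "header", after)
    for i in range(1000)
)
print(f"{switched} of 1000 visitors changed buckets after re-weighting")
```

<p>With fixed weights, every return visit hashes to the same treatment. The moment the weights move, a large slice of returning visitors silently lands in a different variant, which is exactly the inconsistency at issue.</p> <p>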
We would be forced to deal with it in one way or another.</p> <figure> <img src="/assets/images/icarus.jpg" alt="Balancing the minimization of regret with the maximization of test adoption in your organization: artist's conception." /> <figcaption>Balancing the minimization of regret with the maximization of test adoption in your organization: artist's conception.</figcaption> </figure> <p>We could try to establish a rule for what to do with visits that see inconsistent behavior. A universally-applicable heuristic for this is not straightforward. And even if feasible, this approach would necessitate making the analysis more complicated. Increasing complexity in analysis increases the likelihood of it being incorrect.</p> <p>We could continue to ignore visits that see inconsistent behavior. Depending on specifics, this could discard a large amount of data. This decreases the power of the experiment, and undermines its ability to reach a correct conclusion.</p> <p>We could attempt to ensure that visits only ever see one treatment, while re-weighting the test for fresh visitors. This sounds like a great idea, but ruins the notion of tests as an operational free lunch. Test variant membership, for Etsy, is independent across web requests. Introducing dependence brings tradeoffs that developers should be familiar with. We could keep test membership in a larger cookie, but if the cookie gets <em>too</em> large it will increase the number of packets necessary for user requests. We could record test membership on the server, but we would have to build, maintain, and scale that infrastructure. And every time we added an experiment, we would have to ask ourselves if it was really worth the overhead.</p> <h3 id="on-the-ridiculous-expectations-of-runaway-victory">On the Ridiculous Expectations of Runaway Victory</h3> <p>When we release any new feature, it is our hope that it will be a gigantic and undeniable success. 
Sadly (and as I have discussed at length <a href="/testing-to-cull-the-living-flower">before</a>), this is practically never what happens. Successful launches are almost always characterized by an incremental improvement in some metric other than purchases or registrations.</p> <p>Wins in terms of purchases do happen occasionally, and they make life considerably more bearable when they do. But they’re exceedingly rare. What is not rare is the experience of releasing something that makes purchase conversion <em>worse.</em> This turns out to be very easy in an annoyingly asymmetrical way.</p> <p>What we are usually aiming for with our releases is tactical progress on our longer-term strategic goals. Modest gains or even just “not extremely broken” is what we can rationally hope for. Given this background, bandit testing would be wildly inappropriate.</p> <h3 id="regret-approaches-zero">Regret Approaches Zero</h3> <p>Let me point out something that may not be obvious: when we test features on Etsy, we are not typically testing the equivalent of banner advertisements with a limited shelf life. Not that I am suggesting that there is anything wrong with doing so. Nor do I think this is the only scenario in which bandit testing is called for.</p> <p>But new features and the redesign of existing features are different in several important ways. The unlikelihood of purchase or registration conversion wins means that “regret” in the vernacular sense is minimal to begin with, obviating the need for an algorithm that minimizes regret in the technical sense. And the fact that we are building features for the longer term implies that any regret accumulated during the course of an experiment is minor from the perspective of all history. From this vantage point, the elegant simplicity of <em>not bandit testing</em> wins out.</p> <h3 id="in-closing">In Closing</h3> <p>Is bandit testing right for you? I believe it is a question worth asking.
It may be the case that you should (to borrow Noel’s imagery) “join their merry band.” And if so, <em>master, be one of them; it is an honourable kind of thievery.</em></p> <p>In the absence of practical constraints, I have no argument against this. But reality is never lacking in practical constraints.</p> Dan McKinley https://mcfunley.com/ Testing to Cull the Living Flower 2013-01-16T00:00:00+00:00 2013-01-16T00:00:00+00:00 urn:uuid:e94b38d7-bdce-f9b8-be95-26b03cc0c89c <p>I was once oblivious to A/B testing.</p> <p>My first several years out of college were spent building a financial data website. The product, and the company, were run by salesmen. Subscribers paid tens of thousands of dollars per seat to use our software. That entitled them to on-site training and, in some cases, direct input on product decisions. We did giant releases that often required years to complete, and one by one we were ground to bits by long stretches of hundred-hour weeks.</p> <p>Whatever I might think of this as a worthwhile human endeavor generally, as a business model it was on solid footing. And experimental rigor belonged nowhere near it. For one thing, design was completely beside the point: in most cases the users and those making the purchasing decisions weren’t the same people. Purchases were determined by a comparison of our feature set to that of a competitor. The price point implied that training in person would smooth over any usability issues. Eventually, I freaked out and moved to Brooklyn.</p> <p>When I got to Etsy in 2007, experimentation wasn’t something that was done. Although I had some awareness that the consumer web is a different animal, the degree to which this is true was lost on me at the time. So when I found the development model to be the same, I wasn’t appropriately surprised.
In retrospect, I still wouldn’t rank waterfall methodology (with its inherent lack of iteration and measurement) in the top twenty strangest things happening at Etsy in the early days. So it would be really out of place to fault anyone for it.</p> <figure> <iframe class="video" src="http://www.youtube.com/embed/LwHN3lOMCR0" frameborder="0" allowfullscreen=""></iframe> <figcaption>Here is an official video produced by Etsy a few months after your author started work.</figcaption> </figure> <p>So anyway, in my first few years at Etsy the releases went as follows. We would plan something ambitious. We’d spend a lot of time (generally way too long, but that’s another story) building that thing (or some random other thing; again, another story). Eventually it’d be released. We’d talk about the release in our all-hands meeting, at which point there would be applause. We’d move on to other things. Etsy would do generally well, more than doubling in sales year over year. And then after about two years or so we would <em>turn off that feature</em>. And <em>nothing bad would happen</em>.</p> <p>Some discussion about why this was possible is warranted. The short answer is that this could happen because Etsy’s growth was an externality. This is still true today, in 2013. We have somewhere north of 800,000 sellers, thousands of whom are probably attending craft fairs as we speak and promoting themselves. And also, our site. We’re lucky, but any site experiencing growth is probably in a similar situation: there’s a core feature set that is working for you. Cool. This subsidizes anything else you wish to do, and if you aren’t thinking about things very hard you will attribute the growth to whatever you did most recently. It’s easy to declare yourself to be a genius in this situation and call it a day. 
The status quo in our working lives is to confuse effort with progress.</p> <figure> <img src="/assets/images/external-growth.png" alt="Growth at Etsy as an externality" /> <figcaption>An illustration of the problem of product development at Etsy, or generally when growth is an externality. Success and failure both superficially resemble success.</figcaption> </figure> <p>But I had stuck around at Etsy long enough to see behind the curtain. Eventually, the support tickets for celebrated features would reach critical mass, and someone would try to figure out if they were even worth the time. For a shockingly large percentage, the answer to this was “no.” And usually, I had something to do with those features.</p> <p>I had cut my teeth at one job that I considered to be meaningless. And although I viewed Etsy’s work as extremely meaningful, as I still do, I couldn’t suppress the idea that I wasn’t making the most of my labor. Even if the situation allowed for it, I did not want to be deluded about the importance and the effectiveness of my life’s work.</p> <p>Measurement is the way out of this. When growth is an externality, controlled experiments are the only way to distinguish a good release from a bad one. But to measure is to risk overturning the apple cart: it introduces the possibility of work being acknowledged as a regrettable waste of time. (Some personalities you may encounter will not want to test purely for this reason. But not, in my experience, the kind of personalities that wind up being engineers.)</p> <p>Through my own experimentation, I have uncovered a secret that makes this confrontation palatable. Here it is: <em>nearly everything fails</em>. As I have measured the features I’ve built, it’s been humbling to realize how rare it is for them to succeed on the first attempt.
I strongly suspect that this experience is universal, but it is not universally recognized or acknowledged.</p> <p>If someone claims success without measurement from an experiment, odds are pretty good that they are mistaken. Experimentation is the only way to separate reality from the noise, and to learn. And the only way to make progress is to incorporate the presumption of failure into the process.</p> <p>Don’t spend six months building something if you can divide it into smaller, measurable pieces. The six-month version will probably fail. Because <em>everything</em> fails. When it does, you will have six months of changes to untangle if you want to determine which parts work and which parts don’t. Small steps that are validated not to fail and that build on one another are the best way, short of luck, to actually accomplish our highest ambitions.</p> <p>To paraphrase Marx: <em>the demand to give up illusions is the demand to give up the conditions that require illusions</em>. I don’t ask people to test because I want them to see how badly they are failing. I ask them to test so that they can stop failing.</p> Dan McKinley https://mcfunley.com/ Yes! The Deploy Dashboard Graphs “Screwed Users.” 2013-01-11T00:00:00+00:00 2013-01-11T00:00:00+00:00 urn:uuid:71b8191e-6bb0-031d-527e-af7d1d71b1b8 <p>In my <a href="/whom-the-gods-would-destroy-they-first-give-real-time-analytics">post about real-time analysis</a> I shared a screenshot of part of Etsy’s deployment dashboard. This is the dashboard that every engineer watches as he or she pushes code to production. A bunch of alert readers noticed some odd things about it:</p> <p><img src="/assets/images/deploy-dash-wtf.png" alt="Strange doings on Etsy's deployment dashboard" /></p> <p>The screenshot is not doctored, so yes we do graph “Three-Armed Sweaters” and “Screwed Users.” I can explain. In fact, I can give you excruciating detail about it, if you’re interested!
Here goes.</p> <p>“Three-Armed Sweaters” refers to our error pages, which feature one of my favorite drawings in the world. It was done by <a href="http://www.etsy.com/shop/boosterseat">Anda Corrie</a>:</p> <figure> <img src="/assets/images/three-armed-sweater.png" /> <figcaption>Although purely theoretical at first, real versions of the sweater have since been commissioned. These are handed out yearly to the Etsy engineer that <a href="http://bits.blogs.nytimes.com/2012/07/18/one-on-one-chad-dickerson-ceo-of-etsy/">brings the site down in the most spectacular fashion</a>.</figcaption> </figure> <p>So the graph on the dashboard is just counting the number of times this page is shown. But in order to reduce the frequency of false alarms, the graph is <em>actually</em> based on requests to an image beacon hidden on the page. This excludes most crawlers and vulnerability scanners. Those constituencies have a habit of generating thousands of errors when nothing is malfunctioning. But lucky for us, they almost never waste bandwidth on images.</p> <p>Now, there are many reasons why Etsy might not be working, and they don’t all result in our machines serving a sweater page. If our CDN provider can’t reach our production network, it will show an error page of its own instead. In these cases, our infrastructure may not even be seeing the requests. But we can still graph these errors by situating their image beacon on a wholly separate set of web machines.</p> <p>The “screwed users” graph is the union of all of these conditions. So-called, presumably, because all of this nuance is relatively meaningless to outsiders. “Screwed users” also attempts to only count unique visitors over a trailing interval. This has the nice property of causing the screwed users and sweaters graphs to diverge in the event that a single person is generating a lot of errors.
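</p> <p>The trailing-interval uniqueness is the load-bearing trick, and it’s simple enough to sketch. This is a toy model (the real pipeline works off image beacon requests and StatsD, not an in-process structure like this):</p>

```python
from collections import deque

class ErrorCounters:
    """Count every error page hit ("sweaters"), but only unique
    visitors over a trailing window ("screwed users")."""
    def __init__(self, window=60):
        self.window = window
        self.events = deque()  # (timestamp, visitor_id) pairs

    def record(self, ts, visitor_id):
        self.events.append((ts, visitor_id))
        while self.events and self.events[0][0] <= ts - self.window:
            self.events.popleft()  # expire hits outside the window

    def sweaters(self):
        return len(self.events)

    def screwed_users(self):
        return len({visitor for _, visitor in self.events})

c = ErrorCounters(window=60)
for t in range(50):
    c.record(t, "weird-script")  # one visitor erroring constantly
c.record(50, "real-user")
print(c.sweaters(), c.screwed_users())  # 51 hits, but only 2 people
```

<p>One noisy script inflates the first number and barely moves the second, so the two graphs diverge exactly when they should.</p> <p>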
The internet, after all, is full of weird people who occasionally do weird things with scripts and browsers.</p> <figure> <img class="extra-vspace" src="/assets/images/search-issue.png" /> <figcaption>This is what it looks like when many real users are seeing error pages. Both graphs spike in concert. In this case, the dark green vertical line shows a search deploy resolving the issue.</figcaption> </figure> <p>You now know exactly as much as I do about the graphing of web errors in real time. I assume that this is a tiny fraction of the world’s total knowledge pertaining to the graphing of web errors in real time. So you would be ill-advised to claim expert status on the basis of grasping everything I have explained here.</p> <p>By the way, most of the software Etsy uses to produce these graphs is freely available. Here’s <a href="https://github.com/etsy/statsd">StatsD</a> and <a href="https://github.com/etsy/logster">Logster</a>.</p> Dan McKinley https://mcfunley.com/ Whom the Gods Would Destroy, They First Give Real-time Analytics 2013-01-09T00:00:00+00:00 2013-01-09T00:00:00+00:00 urn:uuid:a7bc92d8-d224-d947-5450-2554d23d476c <blockquote class="quotation"> <p>Homer: There's three ways to do things. The right way, the wrong way, and the <em>Max Power</em> way!</p> <p>Bart: Isn't that the wrong way?</p> <p>Homer: Yeah. But faster!</p> <p class="attribution">- "Homer to the Max"</p> </blockquote> <p>Every few months, I try to talk someone down from building a real-time product analytics system. When I’m lucky, I can get to them early.</p> <p>The turnaround time for most of the web analysis done at Etsy is at least 24 hours. This is a ranking source of grousing. Decreasing this interval is periodically raised as a priority, either by engineers itching for a challenge or by others hoping to make decisions more rapidly.
There are companies out there selling instant usage numbers, so why can’t we have them?</p> <p>Here’s an excerpt from a manifesto demanding the construction of such a system. This was written several years ago by an otherwise brilliant individual, whom I respect. I have made a few omissions for brevity.</p> <blockquote> <p><strong><em>We believe in…</em></strong></p> <ol> <li><strong>Timeliness</strong>. I want the data to be at most 5 minutes old. So this is a near-real-time system.</li> <li><strong>Comprehensiveness</strong>. No sampling. Complete data sets.</li> <li><strong>Accuracy</strong> (how precise the data is). Everything should be accurate.</li> <li><strong>Accessibility</strong>. Getting to meaningful data in Google Analytics is awful. To start with it’s all 12 - 24 hours old, and this is a huge impediment to insight &amp; action.</li> <li><strong>Performance</strong>. Most reports / dashboards should render in under 5 seconds.</li> <li><strong>Durability</strong>. Keep all stats for all time. I know this can get rather tough, but it’s just text.</li> </ol> </blockquote> <p>The 23-year-old programmer inside of me is salivating at the idea of building this. The burned out 27-year-old programmer inside of me is busy writing an email about how all of these demands, taken together, probably violate the <a href="http://en.wikipedia.org/wiki/CAP_theorem">CAP theorem</a> somehow and also, hey, did you know that accuracy and precision are different?</p> <p>But the 33-year-old programmer (who has long since beaten those demons into a bloody submission) sees the difficulty as irrelevant at best. Real-time analytics are <em>undesirable</em>. While there are many things wrong with our infrastructure, I would argue that the waiting is not one of those things.</p> <p>Engineers might find this assertion more puzzling than most. 
I am sympathetic to this mindset, and I can understand why engineers are predisposed to see instantaneous A/B statistics as self-evidently positive. We monitor everything about our site in real time. Real-time metrics and graphing are the key to deploying 40 times daily with relative impunity. <a href="http://codeascraft.etsy.com/2011/02/15/measure-anything-measure-everything/">Measure anything, measure everything</a>!</p> <figure> <img src="/assets/images/deploy-dash.png" alt="Part of the deploy dashboard at Etsy. We love up-to-the-minute graphs." /> <figcaption>Part of the deploy dashboard at Etsy. We love up-to-the-minute graphs.</figcaption> </figure> <p>This line of thinking is a trap. It’s important to divorce the concepts of operational metrics and product analytics. Confusing <em>how we do things</em> with <em>how we decide which things to do</em> is a fatal mistake.</p> <p>So what is it that makes product analysis different? Well, there are many ways to screw yourself with real-time analytics. I will endeavor to list a few.</p> <p>The first and most fundamental way is to disregard statistical significance testing entirely. This is a rookie mistake, but it’s one that’s made all of the time. Let’s say you’re testing a text change for a link on your website. Being an impatient person, you decide to do this over the course of an hour. You observe that 20 people in bucket A clicked, but 30 in bucket B clicked. Satisfied, and eager to move on, you choose bucket B. There are probably thousands of people doing this right now, <em>and they’re getting away with it.</em></p> <p>This is a mistake because there’s no measurement of how likely it is that the observation (20 clicks vs. 30 clicks) was due to chance. Suppose that we weren’t measuring text on hyperlinks, but instead we were measuring two quarters to see if there was any difference between the two when flipped. As we flip, we could see a large gap between the number of heads received with either quarter. 
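</p> <p>The check being skipped in the link example is cheap. A sketch, with bucket sizes of 1,000 visitors assumed purely for illustration:</p>

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_p_value(clicks_a, n_a, clicks_b, n_b):
    """Two-sided p-value for a difference in click rates, using the
    pooled normal approximation."""
    p_a, p_b = clicks_a / n_a, clicks_b / n_b
    pooled = (clicks_a + clicks_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))

# 20 clicks vs. 30 clicks, 1,000 visitors in each bucket:
print(two_proportion_p_value(20, 1000, 30, 1000))  # ~0.15, not significant
```

<p>A difference at least that large shows up around 15% of the time even when the two links perform identically. That is noise, not evidence.</p> <p>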
But since we’re talking about quarters, it’s more natural to suspect that that difference might be due to chance. Significance testing lets us ascertain how likely it is that this is the case.</p> <p>A subtler error is to do significance testing, but to halt the experiment as soon as significance is measured. This is <a href="http://www.evanmiller.org/how-not-to-run-an-ab-test.html">always a bad idea</a>, and the problem is exacerbated by trying to make decisions far too quickly. Funny business with timeframes can coerce most A/B tests into statistical significance.</p> <figure> <img src="/assets/images/real-time-screwed.png" alt="A simulation of flipping two fair coins. In the green regions, the difference in the number of heads is measured to be significant. If we stopped flipping in those regions, we would (incorrectly) decide the coins were different." /> <figcaption>A simulation of flipping two fair coins. In the green regions, the difference in the number of heads is measured to be significant. If we stopped flipping in those regions, we would (incorrectly) decide the coins were different.</figcaption> </figure> <p>Depending on the change that’s being made, making <em>any</em> decision based on a single day of data could be ill-conceived. Even if you think you have plenty of data, it’s not farfetched to imagine that user behavior has its own rhythms. A conspicuous (if basic) example of this is that Etsy sees 30% more orders on Tuesdays than it does on Sundays.</p> <figure> <img class="extra-vspace" src="/assets/images/dod-sales.png" alt="Gratuitous infographic courtesy Brendan Sudol" /> <figcaption>Gratuitous infographic courtesy <a href="http://www.thenitpickster.com/">Brendan Sudol</a>.</figcaption> </figure> <p>While the sale count itself might not skew a random test, user demographics could be different day over day. 
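</p> <p>The coin-flip figure above comes from exactly this kind of exercise. A minimal re-creation (my own sketch, not the original simulation):</p>

```python
import random

def peeking_false_positive_rate(trials=1000, flips=500, threshold=1.96):
    """Flip two fair coins, run a significance test after every flip,
    and stop at the first 'significant' difference. Returns how often
    identical coins get declared different."""
    random.seed(0)
    false_positives = 0
    for _ in range(trials):
        heads_a = heads_b = 0
        for n in range(1, flips + 1):
            heads_a += random.random() < 0.5
            heads_b += random.random() < 0.5
            pooled = (heads_a + heads_b) / (2 * n)
            se = (2 * pooled * (1 - pooled) / n) ** 0.5  # pooled z-test
            if se > 0 and abs(heads_a - heads_b) / n > threshold * se:
                false_positives += 1  # stopped in a "green region"
                break
    return false_positives / trials

print(peeking_false_positive_rate())  # far above the nominal 0.05
```

<p>Testing after every flip gives randomness hundreds of opportunities to clear the bar, so the 5% error rate the threshold is supposed to guarantee gets inflated several times over.</p> <p>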
Or very likely, you could see a major difference in user behavior immediately upon releasing a change, only to watch it evaporate as users learn to use new functionality. Given all of these concerns, the conservative and reasonable stance is to only consider tests that last a few days or more.</p> <p>One could certainly have a real-time analytics system without making any of these mistakes. (To be clear, I find this unlikely. Idle hands stoked by a stream of numbers are the devil’s playthings.) But unless the intention is to make decisions with this data, one might wonder what the purpose of such a system could possibly be. Wasting the effort to erect complexity for which there is no use case is perhaps the worst of all of these possible pitfalls.</p> <p>For all of these reasons I’ve come to view delayed analytics as positive. The turnaround time also imposes a welcome pressure on experimental design. People are more likely to think carefully about how their controls work and how they set up their measurements when there’s no promise of immediate feedback.</p> <p>Real-time web analytics is a seductive concept. It appeals to our desire for instant gratification. But the truth is that there are very few product decisions that can be made in real time, if there are any at all. Analysis is difficult enough already, without attempting to do it at speed.</p> Dan McKinley https://mcfunley.com/