How Long Should You Run Experiments?
May 13th, 2013

The question of how long an A/B test needs to run comes up all the time. And the answer is that it really depends. It depends on how much traffic you have, on how you divide it up, on the base rates of the metrics you're trying to change, and on how much you manage to change them. It also depends on what you deem to be acceptable rates for Type I errors (false positives) and Type II errors (false negatives).

In the face of this complexity, community concerns ("we don't want too many people to see this until we're sure about it") and scheduling concerns ("we'd like to release this week") can dominate. But letting them dominate can set you up for failure: you embark on experiments that have little chance of detecting a change, positive or negative. Sometimes adjustments can be made to avoid this. And sometimes adjustments aren't possible.
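To make "little chance of detecting a change" concrete, here's a rough sketch of how you might estimate the power of an experiment whose size has already been fixed by the calendar. It uses the plain normal approximation for a two-sided comparison of two proportions with equal-sized groups, which is an assumption of this sketch and not necessarily what any particular tool does. The function name and every number in the example (50,000 visitors a day, a 2% base rate, a 10% relative lift) are made up for illustration.

```python
from math import sqrt
from statistics import NormalDist


def approximate_power(base_rate, relative_change, visitors_per_group, alpha=0.05):
    """Approximate power of a two-sided, two-proportion z-test.

    Assumes equal-sized control and treatment groups and ignores the
    (tiny) chance of a significant result in the wrong direction.
    """
    p1 = base_rate
    p2 = base_rate * (1 + relative_change)
    p_bar = (p1 + p2) / 2

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    # Standard error of the difference under the null and the alternative.
    se_null = sqrt(2 * p_bar * (1 - p_bar) / visitors_per_group)
    se_alt = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / visitors_per_group)

    return NormalDist().cdf((abs(p2 - p1) - z_alpha * se_null) / se_alt)


# Hypothetical scenario: 50,000 visitors a day, 1% of them in the treatment
# and an equal-sized 1% slice as the control, run for 7 days.
visitors_per_group = int(50_000 * 0.01 * 7)
power = approximate_power(0.02, 0.10, visitors_per_group)
print(f"{visitors_per_group} visitors per group, power is about {power:.2f}")
```

With inputs like these the power comes out far below the 80% people usually aim for, so a flat result from such a test tells you very little either way.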

"You ran an A/B test at one percent for a week" - the seldom-heard, missing verse of "You Played Yourself"

To help with this, I built a tool that will let you play around with all of the inputs. You can find it here:

Here's an example of what you might see using this tool:

You can probably go ahead and not test this one. Or hey maybe this isn't worth the time.

The source code is available on GitHub here. The sample size estimate in use is the one described by Casagrande, Pike and Smith.
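For readers who want to see the shape of that estimate, here's a minimal sketch of the formula as it is commonly attributed to Casagrande, Pike and Smith: the usual normal-approximation sample size for comparing two proportions, inflated by a continuity correction. This is not the tool's source code. The function names, the two-sided alpha, the equal group sizes, and the traffic numbers are all assumptions of this sketch, so check it against the paper before relying on it.

```python
from math import ceil, sqrt
from statistics import NormalDist


def cps_sample_size(p1, p2, alpha=0.05, power=0.8):
    """Per-group sample size with the continuity-corrected approximation.

    p1 and p2 are the base and changed conversion rates. Assumes two
    equal-sized groups and a two-sided test at significance level alpha.
    """
    delta = abs(p2 - p1)
    if delta == 0:
        raise ValueError("p1 and p2 must differ")

    p_bar = (p1 + p2) / 2
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)

    # A / delta**2 alone would be the uncorrected sample size.
    a = (z_alpha * sqrt(2 * p_bar * (1 - p_bar))
         + z_beta * sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2

    # The bracketed factor is the continuity correction.
    n = a * (1 + sqrt(1 + 4 * delta / a)) ** 2 / (4 * delta ** 2)
    return ceil(n)


def days_to_run(n_per_group, daily_visitors, fraction_per_group):
    """Days needed to collect n_per_group visitors in each group."""
    return ceil(n_per_group / (daily_visitors * fraction_per_group))


# Hypothetical inputs: 2% base rate, hoping for 2.2%, 50,000 visitors a day.
n = cps_sample_size(0.02, 0.022)
print(n, "visitors per group")
print(days_to_run(n, 50_000, 0.01), "days with 1% of traffic per group")
print(days_to_run(n, 50_000, 0.50), "days with a 50/50 split")
```

With these made-up inputs the 1% allocation needs months while the 50/50 split needs a few days, which is the trade-off between exposure and duration that the inputs above control.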

The following people were all great resources to me in building this: Steve Mardenfeld, James Lee, Kim Bost, William Chen, Roberto Medri, and Frank Harris. Peter Seibel wrote an internal tool a while back that got me thinking about this.

Copyright © 2004-2017 Dan McKinley. At no point has the writing here constituted the opinions of my employer(s).