The question of how long an A/B test needs to run comes up all the time. And the answer is that it really depends. It depends on how much traffic you have, on how you divide it up, on the base rates of the metrics you’re trying to change, and on how much you manage to change them. It also depends on what you deem to be acceptable rates for Type I and Type II errors.
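To make the dependencies concrete, here is a minimal sketch of the arithmetic, using the standard normal-approximation sample size formula for comparing two proportions. The function name and all of the traffic, base rate, and lift numbers below are hypothetical placeholders, not figures from the tool or from any real experiment.

```python
# A rough duration estimate for a two-variant A/B test on a conversion rate.
# Assumes a 50/50 split of the traffic that enters the experiment.
from scipy.stats import norm

def required_days(daily_visits, pct_in_test, base_rate, lift,
                  alpha=0.05, power=0.80):
    """Approximate days needed to detect a relative `lift` in `base_rate`,
    with `pct_in_test` of `daily_visits` split evenly between two variants."""
    p1 = base_rate
    p2 = base_rate * (1 + lift)
    z_alpha = norm.ppf(1 - alpha / 2)   # threshold for the Type I error rate
    z_beta = norm.ppf(power)            # threshold for the Type II error rate
    # Visitors needed per variant for the chosen alpha and power.
    n_per_variant = ((z_alpha + z_beta) ** 2 *
                     (p1 * (1 - p1) + p2 * (1 - p2)) / (p2 - p1) ** 2)
    visitors_per_day_per_variant = daily_visits * pct_in_test / 2
    return n_per_variant / visitors_per_day_per_variant

# e.g. 50,000 daily visits, 10% in the test, 2% base conversion, +5% relative lift
print(round(required_days(50_000, 0.10, 0.02, 0.05)))  # roughly 126 days
```

With inputs like these the test runs for months, not a week; doubling the fraction of traffic in the experiment halves the duration, and settling for detecting only a larger lift shrinks it much faster than that.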
In the face of this complexity, community concerns (“we don’t want too many people to see this until we’re sure about it”) and scheduling concerns (“we’d like to release this week”) can dominate. But letting them win can set you up for failure, committing you to experiments that have little chance of detecting positive or negative changes. Sometimes adjustments can be made to avoid this. And sometimes they can’t.
"You ran an A/B test at one percent for a week" - the seldom-heard, missing verse of You Played Yourself.
To help with this, I built a tool that will let you play around with all of the inputs. You can find it here: