Math, Programming, and Minority Reports

The Unreasonable Effectiveness of Mathematics in Planning
February 3rd, 2016

I was speaking on a panel the other day that was handed the topic, “the challenges of balancing data-light product bets vs purely data driven incremental improvements.” Camille Fournier was also a panelist and wrote up her thoughts here. Camille’s take (which I think is right) is that even if you don’t have data to work from, you can still approach projects analytically.

For me, the process of behaving analytically incorporates mathematical reasoning but not necessarily data. And I think this kind of spitballing is a useful activity, even if the numbers are made up. The reason for this is that human brains were forged on the African savanna where nothing is very fast, very large, or very small, cosmically speaking, and we are laughably equipped for coping with orders of magnitude.

That is also why you think this looks awesome, but don't let that spoil it for you.

The kind of thinking I’m describing works like this: “ok that’s a thing measured in thousands multiplied by a thing measured in tens of thousands, and then filtered through a rate of a few percent, are we even close?” When permitted to skip this check on deficient intuition, most humans will sense their way to the wrong answers.
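As a rough sketch of that check, here it is in a few lines of Python. Every number and name below is invented for illustration; the point is only the shape of the arithmetic, not the values.

```python
import math

# All three inputs are made-up, order-of-magnitude guesses.
audience = 5_000        # "a thing measured in thousands"
actions_each = 20_000   # "a thing measured in tens of thousands"
rate = 0.03             # "a rate of a few percent"

estimate = audience * actions_each * rate


def order_of_magnitude(x: float) -> int:
    """Return the power of ten that x falls in, e.g. 6 for 3,000,000."""
    return math.floor(math.log10(x))


# Hypothetical target the idea would have to reach to be worth doing.
needed = 50_000_000

gap = order_of_magnitude(needed) - order_of_magnitude(estimate)
print(f"estimate is about 10^{order_of_magnitude(estimate)}; "
      f"short of the target by {gap} order(s) of magnitude")
```

With these invented inputs the estimate lands an order of magnitude short, which is the kind of answer that is hard to sense and easy to multiply.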

But on the panel and in subsequent discussions, it’s been easy to run with the dichotomy that you’ve either got data to work from, or you’ve got nothing at all. The temptation is to jump into philosophical takes given examples of products or entire markets that could not have been calculated with foresight before they existed. While that’s valid, I think it doesn’t describe most of the situations that you encounter in the wild.

Data Exists, and We Don’t Want to Look

The daily grind at a company consists of building in proximity to a thing that’s satisfying some definition of “working.” Yes, there’s always the innovator’s dilemma to worry about and the prospect of weird new platforms that will enable use cases you don’t understand yet. But the degree to which we’re striking out into the undiscovered country is overstated.

Companies release products that you’d figure shouldn’t have survived opportunity analysis all the time. They just don’t pitch them that way:

This feature notifies pairs of individuals that have arranged an unlikely relationship on the internet beforehand. The notifications are delivered two or three times a year, and only if the parties are in close geographic proximity. And they both have an optional iOS app installed. And in this scenario one of the people is known to be in a cohort that tends to not have that iOS app installed. And then at the end of this funnel we’re hoping that some small percentage of these folks will wind up showing up online and buying a thing. Later.

I have a real launch in mind with that, but I’ve rendered it unrecognizable and absurd by describing it accurately. This isn’t a situation where the volume couldn’t be estimated. If it were, I’d have a harder time lampooning it. This is the neglected scenario: we have all the data we need, but instead of deploying it we shipped something doomed.

When you hear people speak in defense of such things, they act out the same misdirection and head straight for the words we use when we’re discussing the iPod. You can’t, like, quantify vision, man. What they’re really espousing is the idea that product success obeys an uncertainty principle. If we look at things too closely, the magic disappears. And of course the good vibes would sublimate in this case, because the magic is nonsense.

The Hazards of Narrative Arc

Of course, this is not what anyone is actually thinking. Nobody sets out to ignore data on purpose, hoping to improve their chances of failing. You just watched me retcon an ethos onto feral behavior. And in doing so, I am part of the problem.

Everyone’s the hero of the novel they’re writing in their heads. That is the human condition. And having saved a company by inventing a new market is a great narrative arc, which is why we reach for it when we’re actually engaged in something mundane. We just systematically find stories too compelling.

It is rarely the case that vision can’t be at least sketched using arithmetic. Mathematics is the language we use to describe reality, and vision is generally assumed to have effects in reality. That’s what makes numeric methods more powerful than they should reasonably be. We’re constantly engaged in the art of self-deception, and they force you to snap out of it.

Do You Work at Amazon?
January 26th, 2016

Please note that Roberto Medri is a coauthor on this post.

Albert Wenger has been one of the VCs I most admire for a long time. He was very present in the early days at Etsy, and sat in giving counsel on some, uh, significantly astray engineering team meetings. Albert is a smart, data-driven guy whose values roughly align with my own.

That said, I have an axe to grind with his latest post, Don’t Mind the Share Price. In it, Albert deploys the story of Amazon as a warning against focusing too much on how the market values a company. This is the story of Amazon:

Amazon's historical stock price

Amazon was riding high in the late 90s, then felt the DotCom burst roughly along with the rest of the tech sector. Albert points out that history has shamed anyone that might’ve judged Amazon on its share price fifteen years ago, since it’s returned north of 2000% in the years since.

So whether you are running a tech company, working for one, or investing in one I highly recommend not reading too much into changes in share price. Focus instead on whether your company is making real progress.

Albert is careful to stress that you should focus on fundamentals over fluctuations in the price, which is generally good advice. But I think the subtext is clear: don’t be discouraged by even large declines in price, because you might be working at the next Amazon.

This is a premise that we can investigate quantitatively.

The Odds of Being an Amazon

Suppose that we’re working at a public company that’s experienced a decline in its share price of at least 50%, relative to a recent high price. We’d like to approximate the odds that this company is going to recover [1].

It turns out that since 2002, there have been 2,132 companies traded on the NASDAQ that fit this description. One of these is indeed Amazon. But how many others are like it?

We can take this set of companies and categorize them. Let’s identify companies that wound up being completely wiped out—losing 90% of their remaining value or more—and then all other companies that declined in value. For companies that increased in value, we’ll differentiate those that beat the market (defined as the S&P 500 Index) from those that didn’t. The idea being that you would have been better off just buying an index fund with your cash surplus from working for Google in a parallel universe. And finally, we’ll identify the miraculous: those companies that return 1000% or more, of which Amazon is one example.
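One way to write those rules down is sketched below. This is not the code behind the article; the function name, the use of fractional returns, and the handling of edge cases (such as a return of exactly zero) are assumptions made for illustration.

```python
def categorize(company_return: float, market_return: float) -> str:
    """Bucket a company by its return since the 50% decline.

    Returns are fractions measured from the same starting date:
    -0.90 means the company lost 90% of its remaining value,
    10.0 means it returned 1000%.
    """
    if company_return <= -0.90:
        return "Wiped Out"
    if company_return < 0:
        return "Declined"
    if company_return >= 10.0:
        return "Miracle"
    if company_return > market_return:
        return "Beat Market"
    return "Beaten by Market"


# Hypothetical examples, assuming the index returned 80% over the period.
for r in (-0.95, -0.30, 0.25, 1.50, 20.0):
    print(f"{r:+.2f} -> {categorize(r, market_return=0.80)}")
```

The order of the checks matters: the two loss categories are tested first, and "Miracle" is tested before the comparison against the index so that a 1000% return is always counted as miraculous.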

If we do that, it looks like this:

Category	Count	Percent	Cumulative Percent
Wiped Out	239	11.21%	11.21%
Declined	794	37.24%	48.45%
Beaten by Market	344	16.14%	64.59%
Beat Market	661	31.00%	95.59%
Miracle	94	4.41%	100.00%

Here we can see that about 65% of public companies that find themselves in this situation don’t recover. But 35% of companies do. These are tough odds, but definitely not impossible odds, right?
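The 65% and 35% figures follow directly from the counts in the table. A quick check, using only those counts (the variable names are mine):

```python
# Counts copied from the table above.
counts = {
    "Wiped Out": 239,
    "Declined": 794,
    "Beaten by Market": 344,
    "Beat Market": 661,
    "Miracle": 94,
}

total = sum(counts.values())

# "Didn't recover" here means everything that failed to beat the market.
did_not_recover = (
    counts["Wiped Out"] + counts["Declined"] + counts["Beaten by Market"]
)
share_not_recovered = did_not_recover / total
share_recovered = 1 - share_not_recovered

print(f"{total} companies")
print(f"did not recover: {share_not_recovered:.2%}")
print(f"recovered:       {share_recovered:.2%}")
```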

Recovery is not Good Enough

Albert asks us to consider investors, officers, and employees of the company as having roughly identical situations. This is a mistake. Things are significantly worse in the case of employees [2] at a public company that have been issued options. In these cases, the company may very well recover, but we have to contemplate several other horrifying possibilities.

Employees may have already exercised options at a strike price higher than the current market price. If so, they’re screwed if the company never recovers above that price. Even if the company beats the market from here out.
The strike price may be below the current market price, meaning that the options are worth something. But employees may owe taxes (or AMT), forcing them to sell before the recovery.
Options may be underwater and worthless. At least in this scenario, there is clarity.

From these situations we can see that as an employee [3], it makes sense to consider the odds that the company will not just recover, but will ultimately get back to where it was. That looks like this:

Category	Count	Percent	Cumulative Percent
Wiped Out	239	11.21%	11.21%
Declined	794	37.24%	48.45%
Beaten by Market	344	16.14%	64.59%
Recovered Below High Price	210	9.84%	75.04%
Beat Market	441	20.66%	95.69%
Miracle	92	4.31%	100.00%

This makes it worse: 75% of companies won’t recover using this definition. And only about 4% will make miraculous comebacks of Amazon’s order of magnitude.

Are You Making Progress?

Remember that Albert provides us with an important caveat: we should “[f]ocus … on whether the company is making real progress.” But this can be tricky to surmise as an employee, for several reasons:

You are in unavoidably close proximity to a coordinated propaganda campaign. It’s called the company’s internal communications and morale efforts. You may find yourself thinking unreasonably positively about these things.
You are putting in hours at this company, and human nature compels us to confuse effort with progress.
Remember that we’re talking about a public company. So unless you’re an officer, you’ll have a difficult time getting detailed information about how much progress the company is really making. And of course timing trades on such information would be illegal.

We should agree that the outlook here is going to be hazy at best, and self-deception is a hazard.

The Base Rate Fallacy’s Perverse Tyranny Over the American [4] Mind

If there is any line of reasoning that really drives me crazy, it’s the following:

A series of cosmically unlikely events has unfolded.
This is submitted as evidence that it can happen to anyone.

Examples of this are everywhere. Someone is going to win Powerball, therefore it makes sense to buy tickets. Barack Obama was elected president, therefore systematic racism is toothless. Mark Zuckerberg struck it rich, so you’ve just gotta have faith.

By the way this guy also thinks that picking your own numbers gives you a higher chance of winning.

In looking to Amazon (or Google, Facebook, Netflix, or dear god Apple) as consolation in the event that a company has experienced a decline in share price, we make the following mistake. The probability that successful companies have stumbled in their past is not the probability that a company will succeed, having stumbled.
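To see how far apart those two probabilities can be, here is a toy example. The counts are entirely invented and are not drawn from the dataset in this article; they only illustrate the direction of the mistake.

```python
# Hypothetical universe of 1,000 companies. Every count here is invented.
companies = 1_000
stumbled = 400               # suffered a 50%+ decline at some point
big_successes = 20           # went on to become household names
stumbled_and_succeeded = 18  # big successes that had stumbled on the way

# "Most successful companies stumbled in their past."
p_stumbled_given_success = stumbled_and_succeeded / big_successes

# "Most companies that stumble go on to succeed"? Not even close.
p_success_given_stumbled = stumbled_and_succeeded / stumbled

# The base rate that the inspirational story leaves out.
p_success = big_successes / companies

print(f"P(stumbled | success) = {p_stumbled_given_success:.1%}")
print(f"P(success | stumbled) = {p_success_given_stumbled:.1%}")
print(f"P(success)            = {p_success:.1%}")
```

In this made-up world nine in ten successes stumbled, yet fewer than one in twenty companies that stumbled went on to succeed.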

This isn’t a call for nihilism if you find yourself in such a situation. Far from it—it’s a call to realize that the odds are now against you, and to behave proactively.

The code and data for this article are available here, on GitHub. It’s a bit sloppy and hastily written, sorry. We started from a dataset of companies traded on the NASDAQ that experienced a decline of 50% or more off of a previous high. Our dataset started around the year 2000.

[1] You may notice that I've switched questions, from "are you working at Amazon" to "is the company Amazon." Calculating the odds that you are working at Amazon would of course require a richer dataset that includes company headcounts, and I am a lazy man.
[2] Investors can more easily scale their commitment to the company by having a diverse portfolio. Employees and officers, however, give 100% of their labor to the company. And in the event that things go well, a large percentage of their net worth derives from the value of the company. Officers have a high floor on their returns, via guaranteed bonuses, parachute provisions, accelerated vesting schedules in the event of termination, and so on. Employees on the other hand are screwed.
[3] This refinement doesn't apply to all employees. Early employees probably have strike prices that are very low, and can make money despite a large drop in the share price. But at a newly-minted public company, most employees are probably new, and most employees are therefore affected.
[4] I know that Albert Wenger is German.

Are My Push Notifications Driving Users Away?
November 24th, 2015

In response to Kellan’s musing about push notifications on Twitter, Adam McCue asked an interesting question:

@kellan @mcfunley what's the best way to do this?
— Adam McCue (@mccue) November 25, 2015

I quickly realized that fitting an answer into tweets was hopeless, so here’s a stab at it in longform.

How would we do this?

Let’s come up with a really simple way to figure this out for the case of a single irritating notification. This is limited, but the procedure described ought to be possible for anyone with a web-enabled mobile app. We need:

A way to divide the user population into two groups: a treatment group that will see the ad notification, and a control group that won’t.
A way to decide if users have disappeared or not.

To make the stats as simple as possible, we need (1) to be random and we need (2) to be a binomial measure (i.e. “yes or no,” “true or false,” “heads or tails,” etc).

To do valid (simple) stats, we also want our trials to be independent of each other. If we send the same users the notifications over and over, we can’t consider each of those to be independent trials. It’s easy to intuit why that might be: I’m more likely to uninstall your app after the fifth time you’ve bugged me [1]. So we need to consider disjoint sets of users on every day of the experiment.

Does this hurt us or help us? Let's try science.

How to randomly select users to receive the treatment under these conditions is up to you, but one simple way that should be broadly applicable is just hashing the user ID. Say we need 100 groups of users: both a treatment and control group for 50 days. We can hash the space of all user ID’s down to 100 buckets [2].
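A minimal sketch of that bucketing follows. The choice of SHA-256, the function names, and the mapping of bucket numbers onto days and arms are all assumptions made for illustration. Salting the hash with an experiment name follows the advice in footnote [2].

```python
import hashlib


def bucket_for(user_id: int, experiment: str, buckets: int = 100) -> int:
    """Deterministically map a user to one of `buckets` buckets.

    The experiment name is mixed into the hash so that the same user
    lands in unrelated buckets for unrelated experiments.
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).digest()
    return int.from_bytes(digest[:8], "big") % buckets


def assignment(user_id: int, experiment: str, days: int = 50):
    """Return (day_index, arm): two buckets per day, one for each arm."""
    bucket = bucket_for(user_id, experiment, buckets=2 * days)
    day_index, arm = divmod(bucket, 2)
    return day_index, ("treatment" if arm == 0 else "control")


# Hypothetical experiment name.
print(assignment(12345, "ad-notification"))
```

Because the assignment is a pure function of the user ID and the experiment name, no assignment table has to be stored, and each user belongs to exactly one day and one arm.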

So how do we decide if users have disappeared? Well, most mobile apps make http requests to a server somewhere. Let’s say that we’ll consider a user to be “bounced” if they don’t make a request to us again within some interval.

Some people will probably look at the notification we sent (resulting in a request or two), but be annoyed and subsequently uninstall. We wouldn’t want to count such a user as happy. So let’s say we’ll look for usage between one day after the notification and six days after the notification. Users that send us a request during that interval will be considered “retained.”

Some examples of our binomial model. We'll call a user retained if they request data from us on any of days two through seven counting from the time of the notification. User 4 in this example is not retained because (s)he only requests data on the day the notification was sent.
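That rule is small enough to write down directly. The sketch below assumes calendar dates rather than exact timestamps, and counts the day of the notification as day one; the function name is hypothetical.

```python
from datetime import date
from typing import Iterable


def is_retained(notified_on: date, request_dates: Iterable[date]) -> bool:
    """True if the user made a request one to six days after the notification.

    Requests on the day of the notification itself do not count, since an
    annoyed user may open the app once and then uninstall it.
    """
    return any(1 <= (d - notified_on).days <= 6 for d in request_dates)


sent = date(2015, 11, 1)
print(is_retained(sent, [date(2015, 11, 1)]))                     # same day only
print(is_retained(sent, [date(2015, 11, 1), date(2015, 11, 4)]))  # came back
```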

To run the experiment properly you need to know how long to run it. That depends a lot on your personal details: how many people use your app, how often they use it, how valuable the ad notification is, and how severe uninstalls are for you. For the sake of argument, let’s say:

We can find disjoint sets of 10,000 users making requests to us on any given day, daily, for a long time.
(As discussed) we’ll put 50% of them in the treatment group.
60% of people active on a given day currently will be active between one and six days after that.
We want to be 80% sure that if we move that figure by plus or minus 1%, we’ll know about it.
We want to be 95% sure that if we measure a deviation in plus or minus 1% that it’s for real.

If you plug all of that into an experiment calculator [3] it will tell you that you need 21 days of data to satisfy those conditions. But since we use a trailing time interval in our measurement, we need to wait 28 days.
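For the curious, the 21-day figure can be approximated with the textbook sample-size formula for comparing two proportions. The sketch below makes two assumptions that are mine: that "plus or minus 1%" means a relative change (60% moving to 60.6%), and that the normal approximation with unpooled variances is close enough. The calculator itself may use a somewhat different formula.

```python
import math
from statistics import NormalDist


def users_per_group(baseline: float, relative_change: float,
                    power: float = 0.80, confidence: float = 0.95) -> int:
    """Approximate users needed in EACH group for a two-sided test."""
    p1 = baseline
    p2 = baseline * (1 + relative_change)
    z_alpha = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


n = users_per_group(baseline=0.60, relative_change=0.01)

# 10,000 users per day split 50/50 gives 5,000 per group per day.
days = math.ceil(n / 5_000)

print(f"{n:,} users per group, or about {days} days of data")
```

Under those assumptions the formula asks for a little over 100,000 users per group, which at 5,000 per group per day works out to 21 days. A larger effect needs far fewer users: the same function called with a 5% relative change returns only a few thousand per group.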

An example result

Ok, so let’s say we’ve run that experiment and we have some results. And suppose that they look like this:

Group	Users	Retained users	Bounced users
Treatment	210,000	110,144	99,856
Control	210,000	126,033	83,967

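Using these figures we can see that we’ve apparently decreased retention by 12.6% in relative terms (from about 60.0% of users retained in the control group to about 52.4% in the treatment group), and a test of proportions confirms that this difference is statistically significant. Oops!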
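Here is one way to run that test using only the Python standard library. It is a sketch of a pooled two-proportion z-test on the numbers in the table; the variable names are mine, and a statistics package would do the same job with less typing.

```python
import math
from statistics import NormalDist

# Figures copied from the table above.
treatment_users, treatment_retained = 210_000, 110_144
control_users, control_retained = 210_000, 126_033

p_treatment = treatment_retained / treatment_users
p_control = control_retained / control_users

# Change in retention relative to the control group.
relative_change = (p_treatment - p_control) / p_control

# Pooled two-proportion z-test; null hypothesis: both rates are equal.
pooled = (treatment_retained + control_retained) / (treatment_users + control_users)
std_err = math.sqrt(
    pooled * (1 - pooled) * (1 / treatment_users + 1 / control_users)
)
z = (p_treatment - p_control) / std_err
p_value = 2 * NormalDist().cdf(-abs(z))  # two-sided

print(f"retention: {p_control:.1%} (control) vs {p_treatment:.1%} (treatment)")
print(f"relative change: {relative_change:.1%}")
print(f"z = {z:.1f}, significant at the 5% level: {p_value < 0.05}")
```

The z statistic here is enormous, so the p-value is indistinguishable from zero. With differences this large the test is a formality, but for a change closer to the 1% the experiment was sized for, it is what separates signal from noise.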

I’ve run the experiment, now what?

You most likely have created the ad notification because you had some positive goal in mind. Maybe the intent was to get people to buy something. If that’s the case, then you should do an additional computation to see if what you gained in positive engagement outweighs what you’ve lost in users.

I don’t think I have enough data.

You might not have 420,000 users to play with, but that doesn’t mean that the experiment is necessarily pointless. In our example we were trying to detect changes of plus or minus one percent. You can detect more dramatic changes in behavior with smaller sets of users. Good luck!

I’m sending reactivation notifications to inactive users. Can I still measure uninstalls?

In our thought experiment, we took it as a given that users were likely to use your app. Then we considered the effect of push notifications on that behavior. But one reason you might be contemplating sending the notifications is that they’re not using it, and you are trying to reactivate them.

If that’s the case, you might want to just measure reactivations instead. After all, the difference between a user who has your app installed but never opens it and a user that has uninstalled your app is mostly philosophical. But you may also be able to design an experiment to detect uninstalls. And that might be sensible if very, very infrequent use of your app can still be valuable.

A procedure that might work for you here is to send two notifications. You could then use delivery failures of secondary notifications as a proxy metric for uninstalls.

I want to learn more about this stuff.

As it happens, I recorded a video with O’Reilly that covers things like this in more detail. You might also like Evan Miller’s blog and Ron Kohavi’s publications.

[1] "How many notifications are too many?" is a separate question, not considered here.
[2] If you do many experiments, you want to avoid using the _same_ sets of people as control and treatment. So include something based on the name of the experiment in the hash. So if user 12345 is in the treatment for 50/50 experiment X, she should be only 50% likely (not 100% likely) to be in the treatment for some other 50/50 experiment Y.
[3] The labeling on the tool is for experiments on a website. The math is the same though.
