Why MongoDB Never Worked Out at Etsy
December 26th, 2012
In 2010ish, we tried to roll out a feature (Treasury) using MongoDB. It was an interesting experience. I learned quite a bit in the process. I wrote about what I was thinking at the time here. But for the most part it was an abject failure and Ryan Young wound up porting the entire thing to the MySQL shards which had come to maturity in the meantime.
Before you get too excited, the reason for the failure is probably not any of the ones you’re imagining. Mainly it’s this: adding another kind of production database was a huge waste of time.
If you want to make Mongo your only database, it might work out well for you. I can’t personally say it will definitely work out. I know there’s plenty of talk on the internet about Mongo’s running-with-scissors defaults, its lack of single-server durability, and rumors of data loss or what have you, but none of that ever affected us. Those concerns may or may not have merit; I personally have no experience with them.
But what I can say is that if you are considering Mongo plus another database like MySQL, then in all likelihood you shouldn’t do it. The benefits of being schemaless are negated by the pain you will feel sorting out:
- Slow query optimization.
- Init scripts.
- Sharding strategy.
- Rebalancing strategy.
- Probably like 50 other things Allspaw knows about that we developers don’t have to care about.
For two databases. In practice, you will do this for one of your databases but not the other. The other one will be a neglected second-class citizen.
Replace “sorting out” with “deploying” if you want, since I’m sure people today aren’t starting from scratch on these points the way we were in 2010. We were the first people in the world to attempt several of these bullet points, and that certainly didn’t help. But regardless, deployment takes real effort. The mere fact that Ganglia integration for Mongo might already exist doesn’t mean you will be handed a tested, working Mongo+Ganglia setup on a silver platter. Everything is significantly more settled in the MySQL world, but it still didn’t take us zero time or energy to get our MySQL sharding to where it is today.
Mongo tries to make certain things easier, and sometimes it succeeds, but in my experience these abstractions mostly leak. There is no panacea for your scaling problem. You still have to think about how to store your data so that you can get it out of the database. You still have to think about how to denormalize and how to index.
Whether or not you use “auto-sharding” (which I have no direct experience with), you have to choose your shard key correctly or you are screwed. Does your shard setup cluster the newest users onto a single shard? Congratulations, you just figured out how to send the majority of your queries to a single box.
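To make the shard-key trap concrete, here is a minimal sketch, not Etsy’s or MongoDB’s actual scheme, that compares two hypothetical placement functions for sequentially assigned user IDs. Range placement puts contiguous blocks of IDs on the same shard; modulo placement spreads consecutive IDs across every shard. The shard count, range size, and workload are all made-up assumptions for illustration.

```python
from collections import Counter

NUM_SHARDS = 4            # hypothetical cluster size
USERS_PER_RANGE = 1_000   # hypothetical size of each contiguous ID range


def range_shard(user_id: int) -> int:
    """Range placement: contiguous blocks of IDs share a shard.

    IDs are assumed to be handed out sequentially, so the newest users all
    land in the highest range. IDs past the last range are capped onto the
    last shard, a simplification that mimics "all new signups go to one box".
    """
    return min(user_id // USERS_PER_RANGE, NUM_SHARDS - 1)


def modulo_shard(user_id: int) -> int:
    """Modulo placement: consecutive IDs are spread across all shards."""
    return user_id % NUM_SHARDS


def load_per_shard(shard_fn, active_user_ids):
    """Count how many queries each shard receives, one query per active user."""
    return Counter(shard_fn(uid) for uid in active_user_ids)


if __name__ == "__main__":
    # Hypothetical workload: IDs 0..3999 exist, but only the 500 newest
    # users (IDs 3500..3999) are issuing queries right now.
    newest_users = range(3_500, 4_000)

    print("range :", dict(load_per_shard(range_shard, newest_users)))
    print("modulo:", dict(load_per_shard(modulo_shard, newest_users)))
```

With this workload, range placement sends all 500 queries to shard 3, while modulo placement gives each of the four shards 125. Modulo placement isn’t free either: changing `NUM_SHARDS` remaps most existing keys, which is exactly the rebalancing problem from the list above.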
Keep in mind that almost none of this is specific to MongoDB. I wouldn’t discourage anyone from trying Mongo out if they’re starting a new site, or if they’re using it for some offline purpose where these kinds of concerns can be glossed over. But if you’re trying to mix Mongo (or almost anything else) into an established site where it won’t be your only database and won’t accomplish anything really novel, you’re probably wasting your time.
Does it ever make sense to add another kind of database? Sure, if the work you would save by using it outweighs all of the work I just described. For most purposes, it’s pretty hard to argue that MySQL and Mongo are different enough for that to be true.