My Story with TokuDB [ Bohu's Blog ]

I spend a lot of time on Zhihu pretending to be an expert, giving advice to beginners about learning database technology. Today, let me tell you how I actually got into the database world — my story with TokuDB.

What is TokuDB anyway?

TokuDB is an “antique.” It was wildly popular back in the HDD era. It had great compression and solid read/write performance, which made it useful for solving capacity and performance problems at scale.

This story really starts with LevelDB.

In 2011, LevelDB went open source. I was immediately drawn to this clean key-value store. I quickly dove into the source code and understood its internal mechanics. That project officially kicked off my database journey.

While building prototypes, I kept up with the latest database developments. I followed almost every database-related tag on Twitter. Every morning, the first thing I’d do was read through those tweets. Database technology felt so fascinating and prestigious. I was constantly thinking about how to get good at it quickly!

Things got interesting in 2012. I started implementing Skip Lists, then figuring out how to optimize them. Once I got bored with Skip Lists, I moved on to B-Trees. Eventually, I could write a B-Tree while waiting for a table at a restaurant. I thought: “Is that all there is to databases?” So I decided to implement a complete key-value store. Project codename: nessDB.

While working on nessDB, I stumbled upon the “Cache-Oblivious B-Trees” series of papers from MIT CSAIL. The authors had made a constant-factor optimization to B-Trees that dramatically reduced overall I/O. Based on this idea, they founded TokuTek and went all-in on developing TokuDB. The team was mostly the professors’ star students: 3 graduate students on development, plus Rik, a veteran from Bell Labs, as architect. (Rik is incredible, a true embodiment of the hardcore Bell Labs spirit. He’s retired now. I tried to recruit him to our startup a while back, but family commitments prevented him from working full-time.)

TokuDB was closed source at the time. Most of the information came from the TokuTek website and a few papers from the professors. In late 2012, I finally got in touch with Leif, the lead developer on TokuDB. We’d chat every week via GTalk. With Leif’s expert guidance, I implemented a simplified version of TokuDB’s Fractal-Tree Index in 2013. I felt like I’d reached the peak of my career. I even walked with a swagger. I’d become someone who had mastered core database technology!

On April 22, 2013, TokuDB went open source. I started studying the source code and found it mostly matched my expectations. But to integrate TokuDB into MySQL as a storage engine, they’d done a ton of work. For example, tokudb-engine was the layer that interfaced with MySQL’s plugin system, while the real core was ft-index, a key-value store built on the Fractal-Tree index.
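To give a feel for what a key-value store built on a Fractal-Tree-style index does differently from a plain B-Tree, here is a toy Python sketch of the write-buffering idea commonly associated with these indexes: writes are parked in a buffer at an internal node and pushed down to the leaves in batches, so one leaf write can absorb several updates. This is not code from ft-index. The class names, the fixed two-level shape, the tiny buffer size, and the `writes` counter standing in for disk I/O are all assumptions made up for illustration.

```python
import bisect


class Leaf:
    """A leaf holding rows; `writes` counts simulated leaf I/Os (hypothetical)."""

    def __init__(self):
        self.rows = {}
        self.writes = 0


class BufferedRoot:
    """Toy two-level tree: one root with a message buffer above several leaves."""

    def __init__(self, pivots, buffer_capacity=4):
        self.pivots = sorted(pivots)
        self.leaves = [Leaf() for _ in range(len(self.pivots) + 1)]
        self.buffer = []  # pending (key, value) messages, oldest first
        self.buffer_capacity = buffer_capacity

    def _child_index(self, key):
        # Keys below the first pivot go to leaf 0, and so on.
        return bisect.bisect_right(self.pivots, key)

    def put(self, key, value):
        # A write only touches the in-memory buffer until it fills up.
        self.buffer.append((key, value))
        if len(self.buffer) >= self.buffer_capacity:
            self.flush()

    def flush(self):
        # Group pending messages by destination leaf, then apply each
        # group with a single simulated leaf write.
        batches = {}
        for key, value in self.buffer:
            batches.setdefault(self._child_index(key), []).append((key, value))
        for index, batch in batches.items():
            leaf = self.leaves[index]
            for key, value in batch:  # oldest first, so newer values win
                leaf.rows[key] = value
            leaf.writes += 1
        self.buffer.clear()

    def get(self, key):
        # Newest buffered message wins over whatever the leaf holds.
        for buffered_key, value in reversed(self.buffer):
            if buffered_key == key:
                return value
        return self.leaves[self._child_index(key)].rows.get(key)


tree = BufferedRoot(pivots=[100], buffer_capacity=4)
for k in (1, 2, 3, 150):
    tree.put(k, f"v{k}")

# Four puts were applied with two leaf writes (one per touched leaf).
print(sum(leaf.writes for leaf in tree.leaves))
```

A real implementation has buffers at every internal level, handles deletes and node splits, and persists everything to disk; the sketch only shows why batching can cut the number of leaf writes compared with updating a leaf on every put.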

Soon after, my work required me to dive deep into TokuDB development. First, I implemented hot backup for TokuDB so that XtraBackup could hot-backup TokuDB data as well. This required some changes to TokuDB’s core. I shared my approach with TokuTek, and their VP Tim sent me an email:
Hey, send me your name/address, we owe you something as the "first TokuDB contributor"!

They sent me a TokuMX t-shirt from the US, along with a small blue TokuDB sticker that simply said “First TokuDB Contributor.” (TokuMX was a MongoDB variant built on TokuDB. Back when MongoDB was still struggling with MMAP, TokuMX had quite a few users.)

I was thrilled. I still remember that sunny afternoon at the bustling WFC, receiving a beat-up woven bag delivered by FedEx. It was a gesture of kindness from across the ocean!

For a while after that, TokuTek kept working on open-source TokuDB, but there was barely a community. It was hard for outsiders to contribute. Plus, they were extremely capable — they could refactor a major module in just a few weeks. I maintained our company’s internal TokuDB fork in isolation, fixing bugs and adding features.

In April 2015, I enthusiastically invited Leif to China. I took him to some famous tourist spots. The day before he left, Leif found a bar in Houhai. He said boldly that since he’d be back in the US and couldn’t use the RMB anyway, we should drink until we dropped.

Leif ordered drinks while describing their flavors and popularity in the US. With each new drink, he’d taste it first, then rotate the glass 180 degrees for me to try. After a while, Leif revealed some news: TokuDB had been sold to Percona, and none of the original team would be joining them. The reason was… well, lots of reasons. Google had made an offer to the team, but Leif swore he’d never touch the database industry again. He said it was too brutal. Clearly, the TokuDB experience had left some scars.

After Percona acquired TokuDB, it quickly became a mess and slowly fell into disrepair. Their CTO invited me to join several times, but I politely declined.

In 2016, Rik and I founded the XeLabs organization to maintain the TokuDB fork. This version was used by several companies and reliably ran tens of thousands of TokuDB instances. Eventually, due to limited bandwidth, the project went dormant. Maintaining a MySQL fork isn’t easy: even a small change requires tons of testing. Fortunately, TokuDB’s engineering quality was excellent, so bugs were rare. Even now, it’s pretty stable. But these days, it’s all about cloud databases. Few people deploy MySQL themselves anymore.

That’s my story with TokuDB.

I also have stories with distributed databases, and with data warehouses. So many stories, it’s almost overwhelming.