Hacker News: leif

New comment by leif in "Siberite: A Simple LevelDB-Backed Message Queue in Go"

leif — Fri, 23 Oct 2015 06:36:29 +0000

> In that case all you've got is an in-memory queue that evaporates on a system crash.

https://www.cs.berkeley.edu/~brewer/cs262/Aries.pdf

> Remember that merge operations are O(N). Then remember that there are N of them to do. O(N^2) is a horrible algorithmic complexity.

No. Mountains of actual math refute this. LSM-tree merges are O(N log N). This is an Actual Fact.
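
To make the bound concrete, here is a minimal sketch that counts the element writes done by merges in a toy LSM-like structure. It assumes a simple doubling scheme (level i holds a sorted run of 2^i keys, and two equal-sized runs merge into the next level). Real LSM-trees use larger growth factors and other compaction policies, and `insert_all` is a made-up name for illustration, not any library's API.

```python
import heapq
import math


def insert_all(keys):
    """Insert keys one at a time; return (levels, total_merge_writes).

    levels[i] is either None or a sorted run of exactly 2**i keys.
    """
    levels = []
    merge_writes = 0
    for key in keys:
        run = [key]
        i = 0
        # Carry the run upward, like incrementing a binary counter.
        while i < len(levels) and levels[i] is not None:
            run = list(heapq.merge(levels[i], run))
            merge_writes += len(run)  # each merged key is written once
            levels[i] = None
            i += 1
        if i == len(levels):
            levels.append(run)
        else:
            levels[i] = run
    return levels, merge_writes


n = 1 << 12
levels, writes = insert_all(range(n, 0, -1))
print(writes, n * math.log2(n))
```

Each key is rewritten at most once per level and there are about log2(N) levels, so under this scheme the total merge work is bounded by N log2(N) rather than N^2.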

New comment by leif in "MariaDB 10.1 can do 1M queries per second"

leif — Tue, 20 Oct 2015 05:17:51 +0000

The TokuDB implementation of Fractal Trees uses a single-process, multi-threaded model, which is incompatible with PostgreSQL's multi-process model. In theory, one could patch TokuDB to be suitable, but this is a ton of work.

New comment by leif in "MariaDB 10.1 can do 1M queries per second"

leif — Tue, 20 Oct 2015 05:15:54 +0000

That's not really online: it still rebuilds the entire table, it just does so in the background and shows the results once it's done. This is pretty similar to how TokuDB hot indexing works, but TokuDB's hot column add/rename/delete is truly online: the results are visible immediately and the actual disk work is done lazily.

New comment by leif in "MariaDB 10.1 can do 1M queries per second"

leif — Tue, 20 Oct 2015 05:13:50 +0000

Cracking is kind of similar in that it delays the "sorting work" done in the indexing structure. However, cracking is fairly heuristic and therefore hard to analyze without an intimate understanding of the workload. Fractal Trees do pretty much the same thing under all workloads (modulo optimizations for things like sequential and Zipfian workloads), so they're easier to analyze and prove bounds for.

An interesting new development is http://getkudu.io/ which applies some ideas common with Fractal Trees and other write-optimized data structures (like LSM-trees) to column stores.

At Tokutek, we had some designs for how to implement column store-like data models on Fractal Trees (we called them "matrix stores" because they had some of the benefits of both row stores and column stores), but we didn't get around to implementing them.

New comment by leif in "MariaDB 10.1 can do 1M queries per second"

leif — Tue, 20 Oct 2015 05:09:48 +0000

This is the critical insight. TokuDB uses a write-ahead log which is synced according to the configuration, and can be made as immediate as full fsync on commit. This provides the strongest durability available on a single machine.

Where TokuDB gets its speed boost is by delaying the random reads associated with updating the indexing structure (the Fractal Tree). The buffers are written to disk on checkpoint, but because they're buffers, the potentially random writes are localized to a smaller number of nodes high in the tree, which minimizes the number of disk seeks required. Since sequential I/O is cheaper than random, the sequential writes to the write-ahead log are very fast, so even in very strict durability configurations, TokuDB can easily outperform databases which use random writes to update the indexing structures, such as the B-trees used by InnoDB and most other RDBMSes.
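
A rough sketch of why buffering helps, under loud assumptions: this is a toy two-level tree, not TokuDB's actual Fractal Tree, and `BufferedRoot`, `num_leaves`, and `buffer_capacity` are hypothetical names. Inserts only touch an in-memory root buffer, and when the buffer fills, the largest pending batch is pushed to its leaf in one write. The write-ahead log is not modelled.

```python
import random
from collections import defaultdict


class BufferedRoot:
    """Toy two-level tree: one root buffer in front of `num_leaves` leaves."""

    def __init__(self, num_leaves, buffer_capacity):
        self.num_leaves = num_leaves
        self.buffer_capacity = buffer_capacity
        self.pending = defaultdict(list)   # leaf index -> buffered keys
        self.pending_count = 0
        self.leaves = defaultdict(list)    # leaf index -> applied keys
        self.leaf_writes = 0               # stand-in for random disk writes

    def insert(self, key):
        # The insert only touches the root buffer (durability would come
        # from a sequential write-ahead log, which is not modelled here).
        self.pending[key % self.num_leaves].append(key)
        self.pending_count += 1
        if self.pending_count >= self.buffer_capacity:
            self._flush_fullest_child()

    def _flush_fullest_child(self):
        # Move the largest pending batch down in a single leaf write.
        leaf = max(self.pending, key=lambda i: len(self.pending[i]))
        batch = self.pending.pop(leaf)
        self.leaves[leaf].extend(batch)
        self.pending_count -= len(batch)
        self.leaf_writes += 1


random.seed(0)
keys = [random.randrange(10**6) for _ in range(10_000)]
tree = BufferedRoot(num_leaves=16, buffer_capacity=256)
for k in keys:
    tree.insert(k)
print(tree.leaf_writes, "leaf writes for", len(keys), "inserts")
```

With 16 leaves and a 256-entry buffer, every flush moves at least 16 keys, so the toy does at most one leaf write per 16 inserts. A structure that updates the owning leaf on every insert would do one per insert.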

More details here: https://www.percona.com/blog/2011/09/22/write-optimization-m...

New comment by leif in "MariaDB 10.1 can do 1M queries per second"

leif — Tue, 20 Oct 2015 05:05:38 +0000

TokuDB is built for a single-process model with threads for connections. This is incompatible with PostgreSQL's multi-process model, and patching TokuDB to support such a model would be a large effort. Not impossible, but a lot of work.

New comment by leif in "Drink Cheap Wine"

leif — Sat, 30 May 2015 05:04:43 +0000

A bottle is usually about 4 glasses (which is consistent with my experience, not sure how you made your calculation), so when shared, that's one to drink with the meal and one to share over conversation after. I'm far from a teetotaler, but it doesn't seem terribly excessive when phrased that way.

It may depend on your upbringing. In the modern US, particularly the more puritan regions, any alcohol at all is usually considered a luxury reserved for celebration, but in some parts of the world, wine or beer is just the drink you drink with a meal, and there's less of a stigma about it. If it's not your thing, it's not your thing, but applying the label "excess" is a pretty specifically cultural decision.

New comment by leif in "A female computer science major at Stanford: “Floored” by the sexism"

leif — Tue, 17 Feb 2015 21:50:34 +0000

you seem fun at parties

New comment by leif in "A female computer science major at Stanford: “Floored” by the sexism"

leif — Tue, 17 Feb 2015 21:37:43 +0000

>You even seem certain that a stranger on the Internet is part of the problem

The stranger on the internet said they felt like part of the problem, so I don't really know what to tell you.

If you read between the lines a bit, I never said I had any of the answers. All I said was listen to the people that are affected, try to learn about their problems, and be empathetic, and maybe you'll find the answers.

I'm not trying to be revolutionary here, I just think one of the nice features of these types of issues is that the people being affected are actual people that you can listen to and have conversations with, and I think that's a good place to start.

New comment by leif in "A female computer science major at Stanford: “Floored” by the sexism"

leif — Tue, 17 Feb 2015 21:14:48 +0000

You could try talking to them about this. Try "hey, I understand being in this industry can be difficult as a woman, I am not very familiar with the problem but I don't want to be part of it, and I'm concerned about how much eye contact I'm giving you during meetings. I don't want to make you uncomfortable with my eye contact or this question, but is it a problem for you?"

Most likely they'll tell you they have a lot of worse things to deal with than how much eye contact you're giving them, but it'll be a learning experience and at least you'll open up a dialogue.

New comment by leif in "A female computer science major at Stanford: “Floored” by the sexism"

leif — Tue, 17 Feb 2015 21:09:39 +0000

You've successfully completed steps 1 and 2:

1. Acknowledge there is a real problem.

2. Acknowledge you are part of the problem.

Now, on to steps 3-n!

3. Don't be discouraged, accept that there are things you can do to help! This is good, you're about to become a more valuable member of society.

4. Accept that you need to put in some effort. Not much, don't worry. Most of it is shutting up.

5. Read what women write about this problem. They're all experts because they've been studying it literally their entire lives.

6. Learn (from step 5 and some self-reflection) how to recognize in real time when you're being a jerk. Then stop.

Now it gets a little harder. But remember, not nearly as hard as being a woman in tech, so buck up, kid!

7. Learn (from step 5) how to recognize in real time when other people are being jerks, and good techniques for how to advocate on behalf of women. This takes practice and courage, keep at it.

8. You'll be tempted to brag about how helpful you are to women (hi @wadhwa). Resist this temptation. Whenever you feel yourself about to tell someone how great and helpful you are, instead, show them something an actual woman has said about their problems, and try to point them in the right direction.

Note that you can replace "woman/women" in the above with any minority/disenfranchised/disadvantaged group you want to help, and the same formula pretty much works verbatim.

New comment by leif in "The 8-Byte Two-Step"

leif — Sun, 21 Sep 2014 03:00:22 +0000

I agree, the mask/not/and is standard within systems contexts. It need not be cognitive overhead. Not to be elitist, but if you don't have a copy of Hacker's Delight or the ability to understand constructs like this, you have no business writing systems-level code.
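
For readers who haven't met it, a minimal sketch of the idiom, assuming the construct under discussion is the usual power-of-two alignment trick. The helper names `align_down` and `align_up` are mine for illustration.

```python
def align_down(n, alignment):
    """Round n down to a multiple of alignment (a power of two)."""
    assert alignment > 0 and alignment & (alignment - 1) == 0
    # alignment - 1 is the mask of low bits; NOT it, then AND to clear them.
    return n & ~(alignment - 1)


def align_up(n, alignment):
    """Round n up to a multiple of alignment (a power of two)."""
    return align_down(n + alignment - 1, alignment)


print(align_down(13, 8), align_up(13, 8), align_up(16, 8))  # prints: 8 16 16
```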

New comment by leif in "Ask HN: Who is hiring? (August 2014)"

leif — Fri, 01 Aug 2014 17:34:48 +0000

Tokutek, Boston, MA and New York, NY http://www.tokutek.com/careers

We build and sell the high-performance databases TokuDB and TokuMX.

We are hiring:

- Inside Sales Exec

- QA Engineer

- Tech Support Engineer

- Product Manager

- Technical Writers (contract)

We are currently growing our customer base almost too fast with TokuMX, so we are expanding on sales, support, and documentation. For the engineering positions, MongoDB experience is preferred but any database experience is good.

For engineering, email me (leif@tokutek.com) or Tim Callaghan (tim@tokutek.com) with a resume and say hi. For sales and marketing, email sales@ or marketing@. Referral bonuses available!

New comment by leif in "Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB"

leif — Sat, 19 Jul 2014 01:51:10 +0000

It won't be patented. We hope others find it helpful and consider implementing it in similar systems.

New comment by leif in "Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB"

leif — Sat, 19 Jul 2014 01:50:17 +0000

We are working on several ways to prove to ourselves and the community that what we have is correct. The most important is obviously testing. We have tests that demonstrate the problems we found, and Ark passes those tests. Publishing this explanation and soliciting feedback is another.

We have run Jepsen and have not been able to get it to show data loss in TokuMX. The problems it found in MongoDB were already fixed in other ways in earlier versions of TokuMX, but we're trying to get Jepsen to demonstrate the other problems we've found.

Model checking may be another way we can prove correctness, but since Ark is so similar to Raft, I think the Raft model in TLA+[1] is probably sufficient. Anyway, we'd also need a proof that the model is equivalent to the implementation, and I don't know of a way to do that, so I think functional tests are more important.

In any case, we'll look into using a model checker, and any help would be greatly appreciated. If you're interested, feel free to email me.

[1]: https://ramcloud.stanford.edu/~ongaro/raft.tla

New comment by leif in "Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB"

leif — Fri, 18 Jul 2014 19:56:10 +0000

No, none of us know how to use those. :(

If you're interested in building one and you have experience with them, get in touch and we can work through it together. I think the biggest challenge would be modeling the semantics of write concern, but I'm not that familiar with proof assistants, maybe that isn't too hard.

New comment by leif in "Introducing Ark: A Consensus Algorithm For TokuMX and MongoDB"

leif — Fri, 18 Jul 2014 19:53:16 +0000

Multi-document and multi-collection transactions are already a part of TokuMX[1]. Since commit of the oplog insert is atomic together with the actual operation's changes to documents, atomicity is also guaranteed in replication. Atomicity and MVCC in a sharded system is something we're working on, but it's unrelated to Raft/Ark.

Ark is just about making replication as a whole trustworthy. The Jepsen post on MongoDB[2] shows MongoDB losing data even with majority write concern, which, if used properly, is supposed to make MongoDB a CP system. But because of the design flaws in the election algorithm, you can't fully rely on it. The changes we made in Ark fix the election algorithm so that majority write concern can actually guarantee data safety, so you can treat it as a fully CP system.

[1]: http://docs.tokutek.com/tokumx/tokumx-transactions.html

[2]: http://aphyr.com/posts/284-call-me-maybe-mongodb

New comment by leif in "Call me maybe: MongoDB (2013)"

leif — Sat, 05 Jul 2014 05:27:42 +0000

My other reply didn't address some of your specific questions about 2.8:

> We're seeing document-level locking

We've had it from the beginning. Their implementation so far doesn't handle index updates or replication. I assume they'll handle these issues before a GA release, but the interesting question is which workloads will still demonstrate good concurrency after they solve these problems.

> possible B-Tree improvements (I presume Toku's R-Tree/Fractals [can't remember which they use] will still be superior)

I haven't seen any actual improvements they've got planned. Besides, B-trees will never compete with our fractal trees on insertions or compression, or for that matter, with LSM trees either.

> possible transactions (although what's on JIRA hasn't convinced me so far)

They aren't going to do transactions in 2.8. They may provide something like some transactional semantics we provide in TokuMX after 2.8 (I've heard mentions of single-shard atomicity), but by this point we have even bigger and better things than just single-shard transactions planned.

> and a few other improvements and Performance Boosting Things.

Not sure what you mean. The coolest things I've heard are not storage related, e.g. filtered replication. They're definitely exciting, but unrelated enough that we should be able to just merge them wholesale.

> So to what scale with Toku remain relevant if they don't keep up to date with Mongo

We'll keep up, don't worry. Here's hoping we maintain---and gain---relevance. ;-)

New comment by leif in "Call me maybe: MongoDB (2013)"

leif — Sat, 05 Jul 2014 05:10:35 +0000

I can't add much to what Chris and Zardosht already said, but let me reiterate a few things regarding our fork:

1. You're a bit out of date. We merged changes to catch up to 2.4 in about a month (once we decided 2.4.x was stable). The current plan is the same for 2.6. We're currently working on it. If you need the latest and greatest Mongo features, stick with basic MongoDB. If you're willing to suffer a bit of lag (on the order of months) to receive our benefits, we're here if we can help.

2. Geo is a known issue. At the moment it doesn't seem like it's that widely used, so it's not a very high priority. However, we know some people want it and we will eventually get to it. Hopefully with a better implementation.

3. MongoDB's full-text search capabilities are, as far as I can tell, far behind what's provided by state-of-the-art text search systems, and serious users currently use MongoDB/TokuMX in concert with more focused solutions like Solr/Lucene/Elasticsearch. I haven't spoken to anyone invested in text search who actually used MongoDB's text indexes, even if they use MongoDB elsewhere in their application. If you do, I'd love to buy you lunch and talk about it. Please email me (my username here at tokutek.com).

4. Here's the big takeaway I got from last week's conference: MongoDB has been convinced that many of the problems we solve with TokuMX (performance, compression, concurrency, transactions) are important to their biggest users. Their most hyped announcements and plans for 2.8---document-level locking and the storage engine API---are aimed straight at us. We see this as a resounding validation of our technology, and a wonderful challenge to continue improving TokuMX. While it's tantalizing to implement a fractal tree storage engine according to their API (and there's no doubt that we will implement one), our innovations in TokuMX proper run deeper, into extra collection types and replication and sharding internals, and we have further plans for TokuMX that are beyond the scope of a storage engine API. The availability of the API is an opportunity for us to create a product with some of our improvements (mainly insertion performance and compression) with better compatibility (esp. w.r.t. replication and geo/full-text) and a simpler upgrade path. However, TokuMX as its own product (with better replication, sharding, and advanced features like clustering indexes and partitioned collections) is not going away. It will continue to see aggressive innovation, and it will always lead a product built from MongoDB's storage engine API in advanced features such as shard-aware transactions.

New comment by leif in "Call me maybe: MongoDB (2013)"

leif — Sat, 05 Jul 2014 00:02:33 +0000

Yes, there are still problems with the election protocol, e.g. [1]. The right kind of network partitions can cause multiple primaries to stay up indefinitely, accepting writes on both sides of the partition, and some of those writes will eventually be rolled back. There is another problem with the election protocol that allows writes acknowledged by a majority of machines to be rolled back after an election.

Both of these problems can be fixed by using something like Raft[2] or Paxos for elections, rather than the ad hoc mechanisms used today.
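
As an illustration of how a Raft-style election protects majority-acknowledged writes, here is a minimal sketch of Raft's "at least as up-to-date" voting rule. It is not MongoDB's or TokuMX's code. `LogPosition` and `wins_election` are hypothetical names, and the sketch ignores election terms and the one-vote-per-term rule that a real implementation also needs.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class LogPosition:
    term: int    # term of the node's last log entry
    index: int   # index of the node's last log entry


def candidate_is_up_to_date(candidate: LogPosition, voter: LogPosition) -> bool:
    """Raft's election restriction: compare last terms, then last indexes."""
    if candidate.term != voter.term:
        return candidate.term > voter.term
    return candidate.index >= voter.index


def wins_election(candidate: LogPosition, voters: List[LogPosition]) -> bool:
    # The candidate votes for itself and needs a strict majority of the
    # whole cluster (itself plus the other voters).
    votes = 1 + sum(candidate_is_up_to_date(candidate, v) for v in voters)
    return votes > (len(voters) + 1) // 2


# Three-node cluster: a write at index 5 was acknowledged by two nodes.
has_write = LogPosition(term=2, index=5)
stale = LogPosition(term=2, index=4)
print(wins_election(stale, [has_write, has_write]))   # prints: False
print(wins_election(has_write, [has_write, stale]))   # prints: True
```

Because any majority overlaps the majority that acknowledged the write, a node missing that write cannot collect enough votes to become primary, so the write is not rolled back by the election.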

In TokuMX[3], we're currently working on replacing the election algorithm with something similar to Raft that will eliminate these sources of data loss. We've heard that MongoDB is also working on fixing replication, but we don't know what their exact plans are (they have a bigger challenge since they need to stay compatible with their existing replication algorithms, which use timestamps as transaction identifiers) or whether these fixes will end up in 2.8 or in a later version.

[1]: https://jira.mongodb.org/browse/SERVER-9848

[2]: https://ramcloud.stanford.edu/wiki/download/attachments/1137...

[3]: http://docs.tokutek.com/tokumx