Hacker News: fulmicoton

New comment by fulmicoton in "Show HN: Improving search ranking with chess Elo scores"

fulmicoton — Thu, 17 Jul 2025 02:04:51 +0000

One trouble I could see with your approach is that you treat the information "Doc at pos i" beats "Doc at pos j" independently from i and j. Intuitively, a bad doc sitting at rank 9 instead of rank 10 is not as critical as a bad doc landing at rank 1 instead of rank 10.

LambdaMART's approach seems better in that respect.

https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwi...
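
To make the position argument concrete, here is a minimal sketch of the kind of pair weight LambdaMART relies on: the change in DCG if two docs traded places. The function names and the 0/1 gains are made up for illustration, and the ideal-DCG normalization is left out for brevity.

```python
import math

def dcg_discount(rank: int) -> float:
    """Standard DCG discount for a 1-based rank."""
    return 1.0 / math.log2(rank + 1)

def swap_weight(gain_i: float, gain_j: float, rank_i: int, rank_j: int) -> float:
    """Absolute change in DCG if the docs at rank_i and rank_j traded places."""
    return abs((gain_i - gain_j) * (dcg_discount(rank_i) - dcg_discount(rank_j)))

# Hypothetical gains: a bad doc (0.0) versus a good doc (1.0).
near_bottom = swap_weight(0.0, 1.0, 9, 10)    # bad doc at rank 9 instead of 10
top_vs_bottom = swap_weight(0.0, 1.0, 1, 10)  # bad doc at rank 1 instead of 10

print(f"rank 9 vs 10: {near_bottom:.4f}")
print(f"rank 1 vs 10: {top_vs_bottom:.4f}")
```

The same "i beats j" observation gets a much larger weight when one of the two positions is near the top, which is exactly what a position-independent Elo update misses.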

New comment by fulmicoton in "AWS S3 SDK breaks its compatible services"

fulmicoton — Thu, 20 Feb 2025 22:46:44 +0000

This bug hit us, and yes, I hadn't thought of just switching to opendal. That's indeed a great reminder.

New comment by fulmicoton in "Datadog acquires Quickwit"

fulmicoton — Sun, 12 Jan 2025 00:40:04 +0000

No. Quickwit was founded well before Warpstream and it did not inspire us.

The Husky blog post was released after we released a few versions of quickwit if I recall correctly. It was not an inspiration either.

As far as I know, the similarities are fortuitous.

New comment by fulmicoton in "Datadog acquires Quickwit"

fulmicoton — Sun, 12 Jan 2025 00:34:04 +0000

Our seed round was 100% made of SAFEs, so VCs did not have the power to force us to do anything.

The sentence in the blog post is a tad misleading. I suspect François is not really talking about VCs that had already invested in quickwit, but about the usual flow of other VCs who contacted us, to know about the company and be part of our eventual series A.

It just generally felt like we were "at a crossing".

No one twisted our arm.

New comment by fulmicoton in "Show HN: SeekStorm – open-source sub-millisecond search in Rust"

fulmicoton — Tue, 03 Dec 2024 08:13:49 +0000

Developer of tantivy chiming in! (I hope that's ok) Database performance is a space where there are a lot of lies and bullshit, so you are 100% right to be suspicious.

I don't know SeekStorm's team and I did not dig much into the details, but my impression so far is that their benchmark's results are fair. At least I see no reason not to trust them.

New comment by fulmicoton in "Nixiesearch: Running Lucene over S3, and why we're building a new search engine"

fulmicoton — Fri, 11 Oct 2024 00:07:46 +0000

Yes. We should shut down this demo. We reduced the hardware to cut down our costs. Right now it runs on a ludicrously small amount of hardware.

New comment by fulmicoton in "Turbopuffer: Fast search on object storage"

fulmicoton — Fri, 12 Jul 2024 08:33:56 +0000

Quickwit is targeting logs:

    - it does not do vector search. It can rank docs using BM25, but usually people just want to sort by timestamp.
    - it does not use an SSD cache. Quickwit reads directly from object storage.
    - it is append-only (you can't modify documents)
    - it scales really well and typically shines in the 1TB .. 100PB range
    - it has an Elasticsearch-compatible API.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Fri, 12 Jul 2024 06:55:40 +0000

This is NOT about transaction logs. These are application logs. The thing you generate via Log4j for instance.

Also 100PB is measured as the input format (JSON). Internally Quickwit will have more efficient representations.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Fri, 12 Jul 2024 06:52:58 +0000

Security and customer support are the two main reasons why people want a super long retention.

Medium retention (1 or 2 months) is still very valuable if some issues in your bug tracker stay stale for that amount of time.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Fri, 12 Jul 2024 06:50:39 +0000

It is pretty much the same as Lucene. The compression ratio is very specific to logs and depends on the logs themselves. (Often it is not that good.)

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Fri, 12 Jul 2024 06:49:22 +0000

Quickwit (like Elasticsearch/OpenSearch) stores your data compressed with ZSTD in a row store, builds a full-text search index, and stores some of your fields in a columnar format. The "compressed size" includes all of this.

The high compression rate is VERY specific to logs.
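
A quick way to see why logs are a special case, as a rough sketch only: it uses Python's `zlib` as a stand-in for ZSTD (which is an assumption for convenience, not what Quickwit runs), and the log template below is made up.

```python
import os
import zlib

def compression_ratio(raw: bytes) -> float:
    """Uncompressed size divided by compressed size."""
    return len(raw) / len(zlib.compress(raw))

# Synthetic, hypothetical log lines: one template, only a few fields vary.
log_lines = "\n".join(
    f"2024-07-12T06:49:{i % 60:02d}Z INFO service=checkout request_id={i} status=200"
    for i in range(5000)
).encode()

# High-entropy bytes of the same length, as a worst case for any compressor.
random_bytes = os.urandom(len(log_lines))

log_ratio = compression_ratio(log_lines)
random_ratio = compression_ratio(random_bytes)
print(f"logs: {log_ratio:.1f}x, random: {random_ratio:.2f}x")
```

Templated lines compress well because most of each line repeats. Data without that repetition does not, so the ratio should not be extrapolated to other workloads.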

- What happens when you alter an index configuration? Or add or remove an index?

Changing an index mapping was not available in 0.8. It is available in main and will be added in 0.9. The change only impacts new data.

- Or add or remove an index?

This has been supported since the beginning.

- What about cold storage?

What makes Quickwit special is that we read everything from S3. We adapted our inverted index to make it possible to read straight from S3. You might think this is crazy slow, but we typically search through TBs of data in less than a second. We have some in-RAM caches too, but they are entirely optional.

> 2. Sampled data, generally for debugging. I would generally try to keep this at 10TB or less;

Sometimes, sampling is not possible. For instance, some Quickwit users (including Binance) use their logs for user support too. A user might come asking for details about something fishy that happened 2 months ago.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Fri, 12 Jul 2024 00:01:08 +0000

Again, this is application logs. The stuff you would log in your program with log4j for instance.

With a microservices architecture in particular that can pile up rapidly.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 14:33:12 +0000

Thank you for the kind words @ZeroCool2u ! :)

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 14:29:18 +0000

Building an inverted index is actually very CPU-intensive. I think we are the fastest at that (if someone knows something faster than tantivy at indexing, I am interested).

I'd be really surprised if you can make a 10x improvement here.
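
For readers who have never built one, here is a toy inverted index. It is only an illustration of the data structure, not how tantivy does it: the whitespace tokenizer, the function name and the sample logs are all made up.

```python
from collections import defaultdict

def build_inverted_index(docs: list[str]) -> dict[str, list[int]]:
    """Map each lowercased term to the sorted list of doc ids containing it."""
    postings: dict[str, list[int]] = defaultdict(list)
    for doc_id, text in enumerate(docs):
        # Every token of every document has to be split out and hashed.
        # At scale this per-token work is where the CPU time goes.
        for term in set(text.lower().split()):
            postings[term].append(doc_id)
    return dict(postings)

# Hypothetical log lines.
logs = [
    "ERROR timeout contacting payment service",
    "INFO request served",
    "ERROR payment declined",
]

index = build_inverted_index(logs)
print(index["error"])
print(index["info"])
```

A real engine adds position data, compression of the posting lists and segment merging on top of this, all of which costs more CPU.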

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 14:25:46 +0000

If you can limit your search to GBs of logs, I kind of agree with you. It's OK if a log search request takes 2s instead of 100ms, and the "grep" approach is more flexible.

Usually our users search into > 1TB.

Let's imagine you have to search into 10TB (even after time/tag pruning). Spreading that over 10k cores for 2 seconds is not practical and does not always make economic sense.
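
The back-of-envelope math behind that core count, as a sketch: the 500 MB/s per-core scan rate is an assumed figure chosen for illustration, not a measurement, and the function name is made up.

```python
def cores_needed(data_bytes: float, deadline_s: float, per_core_bytes_per_s: float) -> float:
    """Cores required to brute-force scan data_bytes within deadline_s."""
    return data_bytes / (deadline_s * per_core_bytes_per_s)

TB = 10**12
MB = 10**6

# Assumption: a single core can "grep" through roughly 500 MB/s.
estimate = cores_needed(10 * TB, 2.0, 500 * MB)
print(round(estimate))
```

With a slower per-core scan rate, or with decompression on the critical path, the number only goes up.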

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 14:15:01 +0000

The data is just Binance's application logs for observability. Typically what a smaller business would simply send to Datadog.

This log search infra is handled by two engineers who do that for the entire company.

They have some standardized log format that all teams are required to observe, but they have little control on how much data is logged by each service.

(I'm quickwit CTO by the way)

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 13:51:35 +0000

Quickwit is designed to do full-text search efficiently with an index stored on object storage.

There is no equivalent technology, apart maybe from:

- Chaossearch, but it is hard to tell because they are not open source and do not share their internals. (If someone from Chaossearch wants to comment?)

- Elasticsearch makes it possible to search into an index archived on S3. This is still a super useful feature as a way to occasionally search your archived data, but it would be too slow and too expensive (it generates a lot of GET requests) to use as your everyday "main" log search index.

New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"

fulmicoton — Thu, 11 Jul 2024 13:23:24 +0000

These are their application logs. They need to search through them comfortably. They went for a search engine, with Elasticsearch at first and Quickwit after that, because even after restricting the search to a tag and a time window, "grepping" was not a viable option.

New comment by fulmicoton in "Tantivy – full-text search engine library inspired by Apache Lucene"

fulmicoton — Tue, 28 May 2024 04:24:04 +0000

Thank you @tyler!!!

New comment by fulmicoton in "Tantivy – full-text search engine library inspired by Apache Lucene"

fulmicoton — Mon, 27 May 2024 18:58:50 +0000

Thank you so much for sharing!!!