<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: fulmicoton</title><link>https://news.ycombinator.com/user?id=fulmicoton</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Thu, 23 Apr 2026 15:25:12 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=fulmicoton" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by fulmicoton in "Show HN: Improving search ranking with chess Elo scores"]]></title><description><![CDATA[
<p>One trouble I could see with your approach is that you treat the information "Doc at pos i" beats "Doc at pos j" independently of i and j. Intuitively, a bad doc at rank 9 instead of rank 10 is far less critical than a bad doc landing at rank 1 instead of rank 10.<p>LambdaMART's approach seems better in that respect.<p><a href="https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwise-learning-to-rank-80a8fe8fadfd" rel="nofollow">https://medium.com/@nikhilbd/pointwise-vs-pairwise-vs-listwi...</a></p>
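<p>To make the point about positions concrete, here is a minimal sketch, not LambdaMART itself. It weights a pair of docs by how much DCG would change if they swapped ranks, which is the general idea behind LambdaMART's pair weighting. The function names and the 0/1 gains are assumptions made up for illustration.</p>

```python
import math


def discount(rank):
    # Standard DCG discount for a 1-based rank.
    return 1.0 / math.log2(rank + 1)


def swap_weight(rank_i, rank_j, gain_i, gain_j):
    # Absolute change in DCG if the docs at rank_i and rank_j swap places.
    return abs(gain_i - gain_j) * abs(discount(rank_i) - discount(rank_j))


# A bad doc (gain 0) sitting above a good doc (gain 1) that is at rank 10.
near_bottom = swap_weight(9, 10, gain_i=0.0, gain_j=1.0)
top_vs_bottom = swap_weight(1, 10, gain_i=0.0, gain_j=1.0)

print(near_bottom, top_vs_bottom)
```

<p>With this weighting, the rank 1 vs rank 10 mistake counts for much more than the rank 9 vs rank 10 one, whereas a plain "i beats j" comparison would count both the same.</p>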
]]></description><pubDate>Thu, 17 Jul 2025 02:04:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=44588983</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=44588983</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44588983</guid></item><item><title><![CDATA[New comment by fulmicoton in "AWS S3 SDK breaks its compatible services"]]></title><description><![CDATA[
<p>This bug hit us, and yes, I hadn't thought of just switching to opendal. That's indeed a great reminder.</p>
]]></description><pubDate>Thu, 20 Feb 2025 22:46:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=43121419</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=43121419</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43121419</guid></item><item><title><![CDATA[New comment by fulmicoton in "Datadog acquires Quickwit"]]></title><description><![CDATA[
<p>No. Quickwit was founded well before Warpstream and it did not inspire us.<p>The Husky blog post was released after we released a few versions of quickwit if I recall correctly. It was not an inspiration either.<p>As far as I know, the similarities are fortuitous.</p>
]]></description><pubDate>Sun, 12 Jan 2025 00:40:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=42670169</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=42670169</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42670169</guid></item><item><title><![CDATA[New comment by fulmicoton in "Datadog acquires Quickwit"]]></title><description><![CDATA[
<p>Our seed round was 100% made of SAFEs, so VCs did not have the power to force us to do anything.<p>The sentence in the blog post is a tad misleading. I suspect François is not really talking about VCs that had already invested in quickwit, but about the usual flow of other VCs who contacted us to learn about the company and be part of our eventual series A.<p>It just generally felt like we were "at a crossing".<p>No one twisted our arm.</p>
]]></description><pubDate>Sun, 12 Jan 2025 00:34:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=42670135</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=42670135</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42670135</guid></item><item><title><![CDATA[New comment by fulmicoton in "Show HN: SeekStorm – open-source sub-millisecond search in Rust"]]></title><description><![CDATA[
<p>Developer of tantivy chiming in! (I hope that's ok) Database performance is a space where there are a lot of lies and bullshit, so you are 100% right to be suspicious.<p>I don't know SeekStorm's team and I did not dig much into the details, but my impression so far is that their benchmark's results are fair. At least I see no reason not to trust them.</p>
]]></description><pubDate>Tue, 03 Dec 2024 08:13:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=42304001</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=42304001</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42304001</guid></item><item><title><![CDATA[New comment by fulmicoton in "Nixiesearch: Running Lucene over S3, and why we're building a new search engine"]]></title><description><![CDATA[
<p>Yes. We should shut down this demo. We reduced the hardware to cut down our costs. Right now it runs on a ludicrously small amount of hardware.</p>
]]></description><pubDate>Fri, 11 Oct 2024 00:07:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=41804840</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=41804840</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41804840</guid></item><item><title><![CDATA[New comment by fulmicoton in "Turbopuffer: Fast search on object storage"]]></title><description><![CDATA[
<p>Quickwit is targeting logs:<p><pre><code>    - it does not do vector search. It can rank docs using BM25, but usually people just want to sort by timestamp.
    - it does not use an SSD cache. Quickwit reads directly from the object storage.
    - it is append-only (you can't modify documents)
    - it scales really well and typically shines in the 1TB to 100PB range
    - it has an Elasticsearch-compatible API.</code></pre></p>
]]></description><pubDate>Fri, 12 Jul 2024 08:33:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=40943710</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40943710</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40943710</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>This is NOT about transaction logs. These are application logs: the thing you generate via Log4j, for instance.<p>Also, 100PB is measured in the input format (JSON). Internally, Quickwit will have more efficient representations.</p>
]]></description><pubDate>Fri, 12 Jul 2024 06:55:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=40943261</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40943261</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40943261</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Security and customer support are the two main reasons why people want a super long retention.<p>Medium retention (1 or 2 months) is still very valuable if some issues in your bug tracker stay stale for that amount of time.</p>
]]></description><pubDate>Fri, 12 Jul 2024 06:52:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=40943251</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40943251</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40943251</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>It is pretty much the same as Lucene. The compression ratio is very specific to logs and depends on the logs themselves. (Often it is not that good)</p>
]]></description><pubDate>Fri, 12 Jul 2024 06:50:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=40943237</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40943237</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40943237</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Quickwit (like Elasticsearch/Opensearch) stores your data compressed with ZSTD in a row store, builds a full-text search index, and stores some of your fields in a columnar format. The "compressed size" includes all of this.<p>The high compression rate is VERY specific to logs.<p>- What happens when you alter an index configuration? Or add or remove an index?<p>Changing an index mapping was not available in 0.8. It is available in main and will be added in 0.9. The change only impacts new data.<p>- Or add or remove an index?<p>This has been handled since the beginning.<p>- What about cold storage?<p>What makes Quickwit special is that everything is on S3 and we read it from there. We adapted our inverted index to make it possible to read straight from S3.
You might think this is crazy slow, but we typically search into TBs of data in less than a second. We have some in-RAM caches too, but they are entirely optional.<p>> 2. Sampled data, generally for debugging. I would generally try to keep this at 10TB or less;<p>Sometimes, sampling is not possible. For instance, some Quickwit users (including Binance) use their logs for user support too. A user might come asking for details about something fishy that happened 2 months ago.</p>
]]></description><pubDate>Fri, 12 Jul 2024 06:49:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=40943226</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40943226</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40943226</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Again, these are application logs. The stuff you would log in your program with log4j, for instance.<p>With a microservices architecture in particular, that can pile up rapidly.</p>
]]></description><pubDate>Fri, 12 Jul 2024 00:01:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=40941684</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40941684</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40941684</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Thank you for the kind word @ZeroCool2u ! :)</p>
]]></description><pubDate>Thu, 11 Jul 2024 14:33:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=40937254</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40937254</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40937254</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Building an inverted index is actually very CPU intensive. I think we are the fastest on that (if someone knows something faster than tantivy at indexing, I am interested).<p>I'd be really surprised if you can make a 10x improvement here.</p>
]]></description><pubDate>Thu, 11 Jul 2024 14:29:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=40937210</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40937210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40937210</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>If you can limit your search to GBs of logs, I kind of agree with you.
It's ok if a log search request takes 2s instead of 100ms,
and the "grep" approach is more flexible.<p>Usually our users search into > 1TB.<p>Let's imagine you have to search into 10TB (even after time/tag pruning).
Distributing the work over 10k cores for 2 seconds is not practical and does not always make economic sense.</p>
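<p>The back-of-the-envelope arithmetic behind that 10k figure can be sketched as below. The 500 MB/s per-core scan rate is only the number implied by 10TB, 2 seconds and 10k cores. It is an assumption, not a measured grep throughput, and the function name is made up for illustration.</p>

```python
TB = 10**12
MB = 10**6


def cores_needed(data_bytes, deadline_s, per_core_bytes_per_s):
    # Each core can scan deadline_s * per_core_bytes_per_s bytes in time.
    # Ceiling division gives the number of cores needed to cover the data.
    per_core_budget = deadline_s * per_core_bytes_per_s
    return -(-data_bytes // per_core_budget)


# 10TB to scan within 2 seconds, assuming 500 MB/s per core.
cores = cores_needed(10 * TB, 2, 500 * MB)
print(cores)
```

<p>Change the assumed per-core rate and the core count scales inversely, but for a brute-force scan over 10TB it stays in the thousands either way.</p>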
]]></description><pubDate>Thu, 11 Jul 2024 14:25:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=40937171</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40937171</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40937171</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>The data is just Binance's application logs for observability.
Typically what a smaller business would simply send to Datadog.<p>This log search infra is handled by two engineers who do that for the entire company.<p>They have some standardized log format that all teams are required to observe, but they have little control over how much data is logged by each service.<p>(I'm Quickwit's CTO, by the way)</p>
]]></description><pubDate>Thu, 11 Jul 2024 14:15:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=40937064</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40937064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40937064</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>Quickwit is designed to do full-text search efficiently with an index stored on an object storage.<p>There is no equivalent technology, except maybe:<p>- Chaossearch, but it is hard to tell because they are not open source and do not share their internals. (if someone from chaossearch wants to comment?)<p>- Elasticsearch makes it possible to search into an index archived on S3. This is still a super useful feature as a way to occasionally search into your archived data, but it would be too slow and too expensive (it generates a lot of GET requests) to use as your everyday "main" log search index.</p>
]]></description><pubDate>Thu, 11 Jul 2024 13:51:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=40936834</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40936834</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40936834</guid></item><item><title><![CDATA[New comment by fulmicoton in "Binance built a 100PB log service with Quickwit"]]></title><description><![CDATA[
<p>These are their application logs. They need to search into them in a comfortable manner. They went for a search engine, with Elasticsearch at first and Quickwit after that, because even after restricting the search to a tag and a time window, "grepping" was not a viable option.</p>
]]></description><pubDate>Thu, 11 Jul 2024 13:23:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=40936515</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40936515</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40936515</guid></item><item><title><![CDATA[New comment by fulmicoton in "Tantivy – full-text search engine library inspired by Apache Lucene"]]></title><description><![CDATA[
<p>Thank you @tyler!!!</p>
]]></description><pubDate>Tue, 28 May 2024 04:24:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=40497297</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40497297</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40497297</guid></item><item><title><![CDATA[New comment by fulmicoton in "Tantivy – full-text search engine library inspired by Apache Lucene"]]></title><description><![CDATA[
<p>Thank you so much for sharing!!!</p>
]]></description><pubDate>Mon, 27 May 2024 18:58:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=40493563</link><dc:creator>fulmicoton</dc:creator><comments>https://news.ycombinator.com/item?id=40493563</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40493563</guid></item></channel></rss>