<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: tadkar</title><link>https://news.ycombinator.com/user?id=tadkar</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 26 Jun 2026 00:15:51 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=tadkar" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by tadkar in "Show HN: Nimic – Pure Python as a systems language with AOT compilation"]]></title><description><![CDATA[
<p>ahead of time</p>
]]></description><pubDate>Thu, 25 Jun 2026 11:10:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=48671754</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=48671754</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48671754</guid></item><item><title><![CDATA[New comment by tadkar in "Supercharge vector search with ColBERT rerank in PostgreSQL"]]></title><description><![CDATA[
<p>I think the important thing is that the first approach to converting complete <i>sentences</i> to an embedding was to average the embeddings of all the tokens in the sentence. What ColBERT does is store the embeddings of all the tokens and then use dot products to identify the tokens most relevant to the query. Another comment in this thread says the same thing in a different way. Feels funny to post a Stack Overflow reference, but this is a great answer [1]!<p>[1] <a href="https://stackoverflow.com/questions/57960995/how-are-the-tokenembeddings-in-bert-created" rel="nofollow">https://stackoverflow.com/questions/57960995/how-are-the-tok...</a></p>
]]></description><pubDate>Fri, 24 Jan 2025 15:49:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=42814259</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=42814259</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42814259</guid></item><item><title><![CDATA[New comment by tadkar in "Supercharge vector search with ColBERT rerank in PostgreSQL"]]></title><description><![CDATA[
<p>Here’s my understanding. It is intimidating to write a response to you, because you have an exceptionally clear writing style. I hope the more knowledgeable HN crowd will correct any errors in fact or presentation style below.<p>Old school word embedding models (like Word2Vec) learn embeddings by predicting a word from the words around it (or the surrounding words from the word). You can embed a whole sentence by taking the average of all the word embeddings in the sentence.<p>There are many scenarios where this average fails to distinguish between multiple meanings of a word. For example “fine weather” and “fine hair” both contain “fine” but mean different things.<p>Transformers are great at producing better embeddings by considering context, using the words in the rest of the sentence to produce a better representation of each word. BERT is a great model for this.<p>The problem is that if you want to use BERT by itself to compute relevance, you need to perform a lot of compute per query, because you have to concatenate the query and the document to produce a long sequence that can then be “embedded” by BERT. See Figure 2c in the ColBERT paper [1].<p>What ColBERT does is use the fact that BERT can draw on context from the entire sentence, via its attention heads, to produce a more nuanced representation of any token in its input. It does this once for all documents in its index. So for example (assuming “fine” was a token) it would embed the “fine” in the sentence “we’re having fine weather today” to a different vector than the “fine” in “Sarah has fine blond hair”. In ColBERT the output embeddings are usually much smaller (128 dimensions in the paper) than the 768 or 1024 dimensions of BERT’s own hidden states.<p>Now, if you have a query, you can do the same and produce token-level embeddings for all the tokens in the query.<p>Once you have these two sets of contextualised embeddings, you can check for the presence of the particular meaning of a word in the document using the dot product. 
For example the query “which children have fine hair” matches the document “Sarah has fine blond hair” because the token “fine” is used in the same context in both the query and the document, and that match should be picked up by the MaxSim operation.<p>[1] <a href="https://arxiv.org/pdf/2004.12832" rel="nofollow">https://arxiv.org/pdf/2004.12832</a></p>
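<p>To make the MaxSim step concrete, here is a minimal sketch in plain Python. The function names and the toy 2-d vectors are made up for illustration; real ColBERT embeddings come from BERT and are computed in batches on a GPU. For each query token it takes the best dot product against any document token, then sums those maxima.</p>

```python
def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def maxsim_score(query_embs, doc_embs):
    """Late-interaction score: for every query token embedding, keep the
    largest dot product against any document token embedding, then sum."""
    return sum(max(dot(q, d) for d in doc_embs) for q in query_embs)


# Hypothetical token embeddings (one short vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
doc_b = [[0.1, 0.1], [0.3, 0.2]]

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

<p>With these toy numbers doc_a scores higher than doc_b, because each query token has a close match somewhere in doc_a. The document embeddings can be computed once at indexing time; only the query has to be embedded per search.</p>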
]]></description><pubDate>Fri, 24 Jan 2025 06:44:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=42811052</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=42811052</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42811052</guid></item><item><title><![CDATA[New comment by tadkar in "Advantages of incompetent management"]]></title><description><![CDATA[
<p>I have a theory that organizations that grow fast and scale well all have this “cellular model” at their core.<p>Investment bank trading desks in the pre-2008 era, partnerships at the big strategy consulting firms and even “multi-strategy hedge funds” now are actually all collections of highly incentive-aligned businesses. They share the Creo quality of making lots of millionaires and of people looking back on their time there as one of great freedom and achievement.<p>In all these places, employees are paid according to the revenue they generate, with seemingly no ceiling to what you can take home. 
It is true that the size of any one cell doesn’t scale beyond a small number of people. But all the organisations I mentioned above scale by having units tackle small pieces of vast markets.<p>The main lesson I took away from reading “Barbarians at the Gate” is that big companies suffer hugely from the principal-agent problem, where management is mostly out to enrich itself at the expense of shareholders and sometimes employees. This looting is however only possible at a company that was established by a founder with a deep vision and passion for the product, who set up systems and a culture that generate enough cash for the professional management to leech off.<p>What I have not read yet is a systematic study of these “cellular organizations” and the common features that make them successful. My guess is that the key is that each “unit” or “cell” has measurable economics, which makes it possible to share the economic value over a sustained period of time. A bit like why salespeople get paid a lot.</p>
]]></description><pubDate>Sun, 07 Jul 2024 07:02:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=40895792</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=40895792</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40895792</guid></item><item><title><![CDATA[Architecting ML Pipelines on Snowflake]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.modelbit.com/blog/a-technical-guide-to-architecting-large-ml-pipelines-on-snowflake-warehouses">https://www.modelbit.com/blog/a-technical-guide-to-architecting-large-ml-pipelines-on-snowflake-warehouses</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=40188640">https://news.ycombinator.com/item?id=40188640</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Sun, 28 Apr 2024 13:58:43 +0000</pubDate><link>https://www.modelbit.com/blog/a-technical-guide-to-architecting-large-ml-pipelines-on-snowflake-warehouses</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=40188640</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40188640</guid></item><item><title><![CDATA[New comment by tadkar in "Ask HN: How to Learn Performance Engineering?"]]></title><description><![CDATA[
<p>This is a great blog to get you started: <a href="https://easyperf.net/" rel="nofollow noreferrer">https://easyperf.net/</a><p>As with all things, practice is an essential part of improving!<p>Then there's learning from some real achievements, such as fast inverse square root or the 55GB/s FizzBuzz example: <a href="https://codegolf.stackexchange.com/questions/215216/high-throughput-fizz-buzz/236630#236630" rel="nofollow noreferrer">https://codegolf.stackexchange.com/questions/215216/high-thr...</a></p>
]]></description><pubDate>Thu, 30 Nov 2023 16:32:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=38475512</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=38475512</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=38475512</guid></item><item><title><![CDATA[New comment by tadkar in "Implementing Interactive Languages"]]></title><description><![CDATA[
<p>I came here to say exactly the same thing. There are also a few other options: the MIR project from Red Hat [1], libjit [2], GNU lightning [3] and DynASM [4].
1. <a href="https://github.com/vnmakarov/mir">https://github.com/vnmakarov/mir</a>
2. <a href="https://www.gnu.org/software/libjit/" rel="nofollow noreferrer">https://www.gnu.org/software/libjit/</a>
3. <a href="https://www.gnu.org/software/lightning/manual/lightning.html" rel="nofollow noreferrer">https://www.gnu.org/software/lightning/manual/lightning.html</a>
4. <a href="https://corsix.github.io/dynasm-doc/tutorial.html" rel="nofollow noreferrer">https://corsix.github.io/dynasm-doc/tutorial.html</a><p>But in general it seems to be very hard to beat the bang for buck from generating C and compiling that -  even with something simple like tcc</p>
]]></description><pubDate>Sat, 26 Aug 2023 07:46:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=37270671</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=37270671</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=37270671</guid></item><item><title><![CDATA[New comment by tadkar in "Invertible Bloom Lookup Tables with Less Randomness and Memory"]]></title><description><![CDATA[
<p>I suspect that for most Bloom filters, the most commonly used hash functions are “good enough”. There’s also some literature to suggest that using just 2 hash functions and recombining the results is plenty. See Kirsch-Mitzenmacher [1] and [2].<p>[1] <a href="https://www.eecs.harvard.edu/%7Emichaelm/postscripts/tr-02-05.pdf" rel="nofollow noreferrer">https://www.eecs.harvard.edu/%7Emichaelm/postscripts/tr-02-0...</a>
[2] <a href="https://stackoverflow.com/questions/70963247/bloom-filters-with-the-kirsch-mitzenmacher-optimization" rel="nofollow noreferrer">https://stackoverflow.com/questions/70963247/bloom-filters-w...</a></p>
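<p>As a rough sketch of the two-hash idea (my own illustration, not code from the paper): derive two base hashes h1 and h2, then use (h1 + i * h2) mod m as the i-th index. Splitting one SHA-256 digest into the two base hashes, forcing h2 to be odd, and the names km_indexes and BloomFilter are all assumptions made for this example.</p>

```python
import hashlib


def km_indexes(item, k, m):
    """Return k bit positions in a table of size m using double hashing:
    index_i = (h1 + i * h2) mod m."""
    digest = hashlib.sha256(item.encode("utf-8")).digest()
    h1 = int.from_bytes(digest[:8], "big")
    # Assumption: force h2 odd so the stride is never zero.
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]


class BloomFilter:
    """Toy Bloom filter; one byte per bit to keep the code readable."""

    def __init__(self, m, k):
        self.m = m
        self.k = k
        self.bits = bytearray(m)

    def add(self, item):
        for idx in km_indexes(item, self.k, self.m):
            self.bits[idx] = 1

    def might_contain(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all(self.bits[idx] for idx in km_indexes(item, self.k, self.m))


bf = BloomFilter(m=1024, k=7)
bf.add("apple")
found = bf.might_contain("apple")
```

<p>Only one digest is computed per item regardless of k, which is the practical appeal of the trick.</p>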
]]></description><pubDate>Thu, 13 Jul 2023 06:19:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=36705605</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=36705605</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36705605</guid></item><item><title><![CDATA[Quack Pipe – DuckDB as a ClickHouse UDF]]></title><description><![CDATA[
<p>Article URL: <a href="https://blog.qryn.dev/clickhouse-duckdb">https://blog.qryn.dev/clickhouse-duckdb</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=36569674">https://news.ycombinator.com/item?id=36569674</a></p>
<p>Points: 4</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 03 Jul 2023 06:56:29 +0000</pubDate><link>https://blog.qryn.dev/clickhouse-duckdb</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=36569674</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=36569674</guid></item><item><title><![CDATA[Comparing SQL based streaming analytics approaches]]></title><description><![CDATA[
<p>Article URL: <a href="https://georgheiler.com/2022/04/01/comparing-sql-based-streaming-approaches/">https://georgheiler.com/2022/04/01/comparing-sql-based-streaming-approaches/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=33644091">https://news.ycombinator.com/item?id=33644091</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 17 Nov 2022 19:14:50 +0000</pubDate><link>https://georgheiler.com/2022/04/01/comparing-sql-based-streaming-approaches/</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=33644091</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=33644091</guid></item><item><title><![CDATA[New comment by tadkar in "Ask HN: More “experimental” UIs for editing/writing code?"]]></title><description><![CDATA[
<p>I think JetBrains MPS lets you do something like what you’re looking for: 
<a href="https://www.jetbrains.com/mps/" rel="nofollow">https://www.jetbrains.com/mps/</a></p>
]]></description><pubDate>Sun, 07 Aug 2022 11:26:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=32375646</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=32375646</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32375646</guid></item><item><title><![CDATA[New comment by tadkar in "Excel: Getting rid of everything except numbers"]]></title><description><![CDATA[
<p>I think this is the paper
<a href="https://arxiv.org/abs/1204.6079" rel="nofollow">https://arxiv.org/abs/1204.6079</a></p>
]]></description><pubDate>Sun, 17 Jul 2022 21:58:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=32131805</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=32131805</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=32131805</guid></item><item><title><![CDATA[Bootstrapping Data Labels]]></title><description><![CDATA[
<p>Article URL: <a href="https://eugeneyan.com/writing/bootstrapping-data-labels/">https://eugeneyan.com/writing/bootstrapping-data-labels/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=28360484">https://news.ycombinator.com/item?id=28360484</a></p>
<p>Points: 12</p>
<p># Comments: 0</p>
]]></description><pubDate>Mon, 30 Aug 2021 20:25:01 +0000</pubDate><link>https://eugeneyan.com/writing/bootstrapping-data-labels/</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=28360484</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28360484</guid></item><item><title><![CDATA[New comment by tadkar in "Drag and Drop and No Coding Testing Tool That Saves Time"]]></title><description><![CDATA[
<p>How does preflight deal with changes to the selectors used to identify elements? This is the critical piece to solve before systems like this are useful, as identified in the paper discussed here:
<a href="https://blog.acolyer.org/2016/05/30/why-do-recordreplay-tests-of-web-applications-break/" rel="nofollow">https://blog.acolyer.org/2016/05/30/why-do-recordreplay-test...</a></p>
]]></description><pubDate>Mon, 23 Aug 2021 17:25:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=28279167</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=28279167</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=28279167</guid></item><item><title><![CDATA[On Medici and Thiel]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.strangeloopcanon.com/p/on-medici-and-thiel">https://www.strangeloopcanon.com/p/on-medici-and-thiel</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=27975930">https://news.ycombinator.com/item?id=27975930</a></p>
<p>Points: 26</p>
<p># Comments: 5</p>
]]></description><pubDate>Tue, 27 Jul 2021 18:40:34 +0000</pubDate><link>https://www.strangeloopcanon.com/p/on-medici-and-thiel</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=27975930</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27975930</guid></item><item><title><![CDATA[New comment by tadkar in "Ask HN: Did anybody apply for UK Global Talent visa? How did it go?"]]></title><description><![CDATA[
<p>From an employer’s perspective, I’d be super interested in hearing about the experience of applying for this visa too!
Does the UK government make it easy to apply? What are the interviews like? Anything your employer did that made the whole process easier for you?</p>
]]></description><pubDate>Sun, 18 Jul 2021 16:27:12 +0000</pubDate><link>https://news.ycombinator.com/item?id=27874197</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=27874197</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27874197</guid></item><item><title><![CDATA[New comment by tadkar in "Ribbon filter: Practically smaller than Bloom and Xor"]]></title><description><![CDATA[
<p>The paper has a great figure illustrating the regions of the space-overhead vs false-positive-rate trade-off where each filter type performs best. Cuckoo filters make an appearance there.</p>
]]></description><pubDate>Sun, 11 Jul 2021 10:46:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=27799830</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=27799830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27799830</guid></item><item><title><![CDATA[New comment by tadkar in "Ribbon filter: Practically smaller than Bloom and Xor"]]></title><description><![CDATA[
<p>HyperLogLog is for counting distinct elements. Ribbon filters, like Bloom filters, are about checking whether an element has been seen before, which is a very different use case.</p>
]]></description><pubDate>Sun, 11 Jul 2021 08:58:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=27799402</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=27799402</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27799402</guid></item><item><title><![CDATA[No more DSLs: Implement and deploy a distributed system with a single program]]></title><description><![CDATA[
<p>Article URL: <a href="http://catern.com/caternetes.html">http://catern.com/caternetes.html</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=27787314">https://news.ycombinator.com/item?id=27787314</a></p>
<p>Points: 64</p>
<p># Comments: 15</p>
]]></description><pubDate>Fri, 09 Jul 2021 19:05:43 +0000</pubDate><link>http://catern.com/caternetes.html</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=27787314</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=27787314</guid></item><item><title><![CDATA[Online spreadsheet with first class functions and record types]]></title><description><![CDATA[
<p>Article URL: <a href="https://inflex.io/">https://inflex.io/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=26803819">https://news.ycombinator.com/item?id=26803819</a></p>
<p>Points: 6</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 14 Apr 2021 06:54:28 +0000</pubDate><link>https://inflex.io/</link><dc:creator>tadkar</dc:creator><comments>https://news.ycombinator.com/item?id=26803819</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=26803819</guid></item></channel></rss>