<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: qeternity</title><link>https://news.ycombinator.com/user?id=qeternity</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sat, 13 Jun 2026 03:57:22 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=qeternity" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by qeternity in "KVarN: Native vLLM backend for KV-cache quantization by Huawei"]]></title><description><![CDATA[
<p>I think it very much is worth it!<p>But the point was that quality didn't magically increase.</p>
]]></description><pubDate>Sat, 06 Jun 2026 11:07:00 +0000</pubDate><link>https://news.ycombinator.com/item?id=48423717</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=48423717</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48423717</guid></item><item><title><![CDATA[New comment by qeternity in "KVarN: Native vLLM backend for KV-cache quantization by Huawei"]]></title><description><![CDATA[
<p>It's not better quality: 59.3% vs 59.4% fp16 on AIME 25</p>
]]></description><pubDate>Thu, 04 Jun 2026 17:04:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48401479</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=48401479</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48401479</guid></item><item><title><![CDATA[New comment by qeternity in "Anthropic raises $65B in Series H funding at $965B post-money valuation"]]></title><description><![CDATA[
<p>> Venture capitalists & private investors are sucking all of the possible growth and future upside from these companies and then dumping them on retail investors when there's nothing left.<p>A lot of the money that is deployed by VCs comes from pension funds and asset managers that ultimately manage money for the average Joe.</p>
]]></description><pubDate>Thu, 28 May 2026 19:26:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=48314167</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=48314167</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48314167</guid></item><item><title><![CDATA[New comment by qeternity in "Introspective Diffusion Language Models"]]></title><description><![CDATA[
<p>I haven't read TFA yet but a common technique is speculative decoding where a fast draft model will generate X tokens, which are then verified by the larger target model. The target model may accept some Y <= X tokens but the speedup comes from the fact that this can be done in parallel as a prefill operation due to the nature of transformers.<p>So let's say a draft model generates 5 tokens, all 5 of these can be verified in parallel with a single forward pass of the target model. The target model may only accept the first 4 tokens (or whatever) but as long as the 5 forward passes of the draft model + 1 prefill of the target model is faster than 4 forward passes of the target, you will have a speedup while maintaining the exact output distribution as the target.</p>
]]></description><pubDate>Tue, 14 Apr 2026 11:44:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47764342</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47764342</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47764342</guid></item><item><title><![CDATA[New comment by qeternity in "Anthropic downgraded cache TTL on March 6th"]]></title><description><![CDATA[
<p>> They paid a billion dollars for a vibe coded mess just for the opportunity to associate themselves with the hype.<p>Lol no they didn't. It wasn't even an acquihire. They just hired Peter.<p>Maybe they are paying him incredibly well, but not a billion dollars well.</p>
]]></description><pubDate>Sun, 12 Apr 2026 18:10:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47742586</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47742586</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47742586</guid></item><item><title><![CDATA[New comment by qeternity in "Meta removes ads for social media addiction litigation"]]></title><description><![CDATA[
<p>> It's not any company, its Meta and the channels they administrate come with a set of responsibilities and principles<p>Sorry, which laws stipulate these special responsibilities and principles?</p>
]]></description><pubDate>Sun, 12 Apr 2026 17:10:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47742050</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47742050</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47742050</guid></item><item><title><![CDATA[New comment by qeternity in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>> or if the model might actually have emitted the formatting tokens that indicate a user message.<p>These tokens are almost universally used as stop tokens which causes generation to stop and return control to the user.<p>If you didn't do this, the model would happily continue generating user + assistant pairs w/o any human input.</p>
]]></description><pubDate>Thu, 09 Apr 2026 12:54:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=47703099</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47703099</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47703099</guid></item><item><title><![CDATA[New comment by qeternity in "Claude mixes up who said what"]]></title><description><![CDATA[
<p>This does not solve the problem at all, it's just another bandaid that hopefully reduces the likelihood.</p>
]]></description><pubDate>Thu, 09 Apr 2026 12:48:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47703034</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47703034</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47703034</guid></item><item><title><![CDATA[New comment by qeternity in "Mamba-3"]]></title><description><![CDATA[
<p>Yes, it is written for a specific audience.<p>That is not a reason for snark.<p>As other commenters have noted, it’s well written.</p>
]]></description><pubDate>Sat, 21 Mar 2026 08:19:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=47465105</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47465105</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47465105</guid></item><item><title><![CDATA[New comment by qeternity in "Better JIT for Postgres"]]></title><description><![CDATA[
<p>Yes, lots of things can create indeterminism. But nothing is inherent.</p>
]]></description><pubDate>Wed, 04 Mar 2026 18:15:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47251535</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47251535</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47251535</guid></item><item><title><![CDATA[New comment by qeternity in "Better JIT for Postgres"]]></title><description><![CDATA[
<p>Yes, lots of things can create indeterminism. But nothing is inherent.</p>
]]></description><pubDate>Wed, 04 Mar 2026 18:15:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=47251533</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47251533</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47251533</guid></item><item><title><![CDATA[New comment by qeternity in "Better JIT for Postgres"]]></title><description><![CDATA[
<p>> LLMs are inherently non-deterministic.<p>This isn't true, and certainly not inherently so.<p>Changes to input leading to changes in output does not violate determinism.</p>
]]></description><pubDate>Wed, 04 Mar 2026 09:49:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=47245256</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47245256</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47245256</guid></item><item><title><![CDATA[New comment by qeternity in "MCP server that reduces Claude Code context consumption by 98%"]]></title><description><![CDATA[
<p>> With prompt caching, verbose context that gets reused is basically free.<p>But it's not. It might be discounted cost-wise, however it will still degrade attention and make generation slower/more computationally expensive even if you have a long prefix you can reuse during prefill.</p>
]]></description><pubDate>Sun, 01 Mar 2026 12:44:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47206210</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47206210</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47206210</guid></item><item><title><![CDATA[New comment by qeternity in "Google restricting Google AI Pro/Ultra subscribers for using OpenClaw"]]></title><description><![CDATA[
<p>> Tradition warrants a negotiation phase when one party wishes to change the terms of an agreement, or becomes cognizant that the counterparty may wish to do the same.<p>They didn't change the agreement. One party violated it, and the other party withdrew as a result.<p>This is so vanilla. But people will moan because they want subsidized tokens.</p>
]]></description><pubDate>Mon, 23 Feb 2026 00:49:59 +0000</pubDate><link>https://news.ycombinator.com/item?id=47116614</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47116614</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47116614</guid></item><item><title><![CDATA[New comment by qeternity in "Step 3.5 Flash – Open-source foundation model, supports deep reasoning at speed"]]></title><description><![CDATA[
<p>Number of parameters is at least a proxy for model capability.<p>You can achieve incredible tok/dollar or tok/sec with Qwen3 0.6b.<p>It just won't be very good for most use cases.</p>
]]></description><pubDate>Thu, 19 Feb 2026 13:54:52 +0000</pubDate><link>https://news.ycombinator.com/item?id=47073728</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47073728</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47073728</guid></item><item><title><![CDATA[New comment by qeternity in "Two different tricks for fast LLM inference"]]></title><description><![CDATA[
<p>I would guess you haven't done this in practice. Yes, of course inference is memory bound at low batch sizes. This is why we run larger batch sizes!<p>Also there does not exist any batch size > 1 where per-request throughput is equal to bs=1. Doing any batching at all will slow all intra-batch requests down.</p>
]]></description><pubDate>Mon, 16 Feb 2026 01:17:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=47029736</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47029736</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47029736</guid></item><item><title><![CDATA[New comment by qeternity in "Two different tricks for fast LLM inference"]]></title><description><![CDATA[
<p>Yes this article is full of misunderstanding. The main explanation of bottleneck is wrong: it’s the model weights which dominate memory bandwidth (and hence why batching multiple requests in a single pass increases total throughput). If copying user tokens was the bottle neck, batching would not achieve any speed up.<p>When an author is confused about something so elementary, I can’t trust anything else they write.</p>
]]></description><pubDate>Sun, 15 Feb 2026 10:35:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=47022682</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47022682</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47022682</guid></item><item><title><![CDATA[New comment by qeternity in "Ireland rolls out basic income scheme for artists"]]></title><description><![CDATA[
<p>Sure, and again, she should do something else then.<p>She isn't entitled to have a large family and work whatever job she finds fulfilling.</p>
]]></description><pubDate>Fri, 13 Feb 2026 17:24:43 +0000</pubDate><link>https://news.ycombinator.com/item?id=47005227</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47005227</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47005227</guid></item><item><title><![CDATA[New comment by qeternity in "Ireland rolls out basic income scheme for artists"]]></title><description><![CDATA[
<p>If you have any understanding of history, it's naive not to.</p>
]]></description><pubDate>Fri, 13 Feb 2026 17:23:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=47005212</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47005212</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47005212</guid></item><item><title><![CDATA[New comment by qeternity in "Ireland rolls out basic income scheme for artists"]]></title><description><![CDATA[
<p>Yes, this was empirically true at the time. Things change. And that does not invalidate my comment in the least.</p>
]]></description><pubDate>Fri, 13 Feb 2026 17:22:35 +0000</pubDate><link>https://news.ycombinator.com/item?id=47005198</link><dc:creator>qeternity</dc:creator><comments>https://news.ycombinator.com/item?id=47005198</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47005198</guid></item></channel></rss>