<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: halflings</title><link>https://news.ycombinator.com/user?id=halflings</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 21:35:40 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=halflings" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[Sampling from LLMs: Art and Science]]></title><description><![CDATA[
<p>Article URL: <a href="https://kachkach.com/blog/llm-sampling">https://kachkach.com/blog/llm-sampling</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47672828">https://news.ycombinator.com/item?id=47672828</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 07 Apr 2026 09:59:24 +0000</pubDate><link>https://kachkach.com/blog/llm-sampling</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=47672828</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47672828</guid></item><item><title><![CDATA[New comment by halflings in "Men are ditching TV for YouTube as AI usage and social media fatigue grow"]]></title><description><![CDATA[
<p>> Youtube charges $10 per month and doesn't produce a single video<p>It is different from Netflix (that pays upfront for production costs), but there's of course a revenue share + the bulk of the revenue for creators is actually from sponsorships (which YT doesn't take a share of).</p>
]]></description><pubDate>Thu, 02 Apr 2026 11:18:54 +0000</pubDate><link>https://news.ycombinator.com/item?id=47612871</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=47612871</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47612871</guid></item><item><title><![CDATA[New comment by halflings in "Generating one token at a time is a blessing in disguise"]]></title><description><![CDATA[
<p>LLMs generate their output one token at a time. The first thought when you learn this is that this is a huge performance bottleneck, as we are used to highly parallelized systems.<p>However, a large part of what makes LLMs feel so magical comes from this bottleneck.</p>
]]></description><pubDate>Sun, 29 Mar 2026 20:49:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=47567161</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=47567161</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47567161</guid></item><item><title><![CDATA[Generating one token at a time is a blessing in disguise]]></title><description><![CDATA[
<p>Article URL: <a href="https://kachkach.com/blog/generating-one-token-at-a-time-is-a-blessing-in-disguise">https://kachkach.com/blog/generating-one-token-at-a-time-is-a-blessing-in-disguise</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47567160">https://news.ycombinator.com/item?id=47567160</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Sun, 29 Mar 2026 20:49:55 +0000</pubDate><link>https://kachkach.com/blog/generating-one-token-at-a-time-is-a-blessing-in-disguise</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=47567160</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47567160</guid></item><item><title><![CDATA[New comment by halflings in "The Codex App"]]></title><description><![CDATA[
<p>The main thing I noticed in the video is that they have <i>heavily</i> sped up all the code generation sections... seems to be on 5x speed or more. (because people got used to how fast <i>and</i> good Sonnet, and especially Gemini 3.0 Flash, are)</p>
]]></description><pubDate>Mon, 02 Feb 2026 20:50:38 +0000</pubDate><link>https://news.ycombinator.com/item?id=46861333</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=46861333</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46861333</guid></item><item><title><![CDATA[New comment by halflings in "The Codex App"]]></title><description><![CDATA[
<p>Deploying from Antigravity is as easy as, say, connecting the Firebase MCP [1] and asking it "deploy my app to firebase".<p>[1] <a href="https://firebase.google.com/docs/ai-assistance/mcp-server" rel="nofollow">https://firebase.google.com/docs/ai-assistance/mcp-server</a></p>
]]></description><pubDate>Mon, 02 Feb 2026 20:48:04 +0000</pubDate><link>https://news.ycombinator.com/item?id=46861303</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=46861303</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46861303</guid></item><item><title><![CDATA[New comment by halflings in "Python 3.15’s interpreter for Windows x86-64 should hopefully be 15% faster"]]></title><description><![CDATA[
<p>+1, reading through the post, the PR updating the documentation... thanks for being transparent, but also don't be so hard on yourself!<p>That was a very niche error that you promptly corrected; no need to be so apologetic about it! And thanks for all the hard work making Python faster!</p>
]]></description><pubDate>Thu, 25 Dec 2025 18:39:02 +0000</pubDate><link>https://news.ycombinator.com/item?id=46386205</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=46386205</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46386205</guid></item><item><title><![CDATA[New comment by halflings in "Getting a Gemini API key is an exercise in frustration"]]></title><description><![CDATA[
<p>"The models perform differently when called via the API vs in the Gemini UI."<p>This shouldn't be surprising: the model != the product.
The same way GPT4o behaves differently than the ChatGPT product when using GPT4o.</p>
]]></description><pubDate>Thu, 11 Dec 2025 10:14:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=46229595</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=46229595</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=46229595</guid></item><item><title><![CDATA[New comment by halflings in "US vs. Google amicus curiae brief of Y Combinator in support of plaintiffs [pdf]"]]></title><description><![CDATA[
<p>I would also add that search has <i>already</i> moved elsewhere.<p>Fewer and fewer people are using search engines to shop, e.g. Amazon makes >$57B a year from search ads, but also look at Temu and Shein, which are mostly glorified product search platforms.<p>No one is searching for "funny videos" when they can just open Instagram and Tiktok.<p>The only truly unique thing that search engines can do is serve queries that are not directly commercial (e.g. education, information seeking, etc.), and competition there is insanely intense (w/ ChatGPT, Perplexity, etc.).</p>
]]></description><pubDate>Sun, 11 May 2025 12:01:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=43953204</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=43953204</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43953204</guid></item><item><title><![CDATA[New comment by halflings in "Gemma 3 QAT Models: Bringing AI to Consumer GPUs"]]></title><description><![CDATA[
<p>That's what the chart says, yes. 14.1GB VRAM usage for the 27B model.</p>
]]></description><pubDate>Sun, 20 Apr 2025 13:28:23 +0000</pubDate><link>https://news.ycombinator.com/item?id=43743634</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=43743634</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43743634</guid></item><item><title><![CDATA[New comment by halflings in "Genie 2: A large-scale foundation world model"]]></title><description><![CDATA[
<p>> I cannot be the first person to think about such possibilities<p>Differentiable Rendering [1] is the closest thing to what you are describing.
And yes, people have been working on this for the same reason you outline, it is more data/compute efficient and hence should generalize better.<p>[1] <a href="https://blog.qarnot.com/article/an-overview-of-differentiable-rendering" rel="nofollow">https://blog.qarnot.com/article/an-overview-of-differentiabl...</a><p>But also:
> While cool, this also seems utterly wasteful. Video games offer known "analytical" solutions for the interactions that the model provides as a "statistical approximation", so to say.<p>A bit of the same debate as people calling LLMs a "blurry JPEG of the web" and hence useless.<p>Yes, this is a statistical approximation to an analytical problem... but that's a very reductive framing of what is going on.
Finding the symbolic/analytical solution here would require constraining the problem greatly: not <i>all</i> things on the screen have a differentiable representation; for example, complex simulations might involve some kind of custom internal loop/simulation.<p>You waste compute to get a solution that can just be trained on billions of unlabeled (synthetic) examples, and then generalize to previously unseen prompts/environments.</p>
]]></description><pubDate>Thu, 05 Dec 2024 10:23:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=42326746</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=42326746</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=42326746</guid></item><item><title><![CDATA[New comment by halflings in "Open source AI is the path forward"]]></title><description><![CDATA[
<p>Training code is only useful to people in academia, and the closest thing to "code you can modify" are open weights.<p>People are framing this as if it was an open-source hierarchy, with "actual" open-source requiring all training code to be shared. This is not obvious to me, as I'm not asking people that share open-source libraries to also share the tools they used to develop them. I'm also not asking them to share all the design documents/architecture discussion behind this software. It's sufficient that I can take the end result and reshape it in any way I desire.<p>This is coming from an LLM practitioner that finetunes models for a living; and this constant debate about open-source vs open-weights seems like a huge distraction vs the impact open-sourcing something like Llama has... this is truly a Linux-like moment. (at a much smaller scale of course, for now at least)</p>
]]></description><pubDate>Tue, 23 Jul 2024 20:50:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=41050673</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=41050673</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=41050673</guid></item><item><title><![CDATA[New comment by halflings in "Suicide is on the rise for young Americans, with no clear answers"]]></title><description><![CDATA[
<p>> The world is teetering on the edge of world war<p>The world has probably never been as peaceful as in the last 50 years or so.<p>Same goes for access to drinkable water, food, decent shelter, gender equality, freedoms, technology, etc.<p>But I suppose your comment is a good illustration of the problem at hand (that so many people deeply believe that things are fucked)</p>
]]></description><pubDate>Fri, 12 Apr 2024 13:25:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=40012483</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=40012483</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=40012483</guid></item><item><title><![CDATA[New comment by halflings in "Groq CEO: 'We No Longer Sell Hardware'"]]></title><description><![CDATA[
<p>Thanks for putting this together! Will give it a watch now</p>
]]></description><pubDate>Tue, 09 Apr 2024 21:13:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=39984313</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39984313</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39984313</guid></item><item><title><![CDATA[New comment by halflings in "Groq CEO: 'We No Longer Sell Hardware'"]]></title><description><![CDATA[
<p>The # of chips is not the most important metric.<p>Most important, even ignoring latency, is throughput (tokens) per $$$. And according to their own benchmark [1] (famous last words :)), they're quite cost efficient.<p>[1] <a href="https://www.semianalysis.com/p/groq-inference-tokenomics-speed-but" rel="nofollow">https://www.semianalysis.com/p/groq-inference-tokenomics-spe...</a></p>
]]></description><pubDate>Tue, 09 Apr 2024 21:09:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=39984278</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39984278</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39984278</guid></item><item><title><![CDATA[New comment by halflings in "Groq CEO: 'We No Longer Sell Hardware'"]]></title><description><![CDATA[
<p>No HBM because they use tons of fast SRAM instead. Isn't that the main driver for performance here?<p>(the way I understood it => it's still cost effective at scale due to throughput increase this brings)</p>
]]></description><pubDate>Sun, 07 Apr 2024 23:51:16 +0000</pubDate><link>https://news.ycombinator.com/item?id=39964961</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39964961</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39964961</guid></item><item><title><![CDATA[New comment by halflings in "Google's First Tensor Processing Unit: Architecture"]]></title><description><![CDATA[
<p>Agree re:hallucinations/safety issues, that was likely one of the main blockers.<p>And here's the sad part: they had this back in 2019... see this paper released in Jan 2020:
<a href="https://blog.research.google/2020/01/towards-conversational-agent-that-can.html" rel="nofollow">https://blog.research.google/2020/01/towards-conversational-...</a></p>
]]></description><pubDate>Tue, 26 Mar 2024 03:50:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=39824113</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39824113</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39824113</guid></item><item><title><![CDATA[New comment by halflings in "Google's First Tensor Processing Unit: Architecture"]]></title><description><![CDATA[
<p>This (innovator's dilemma / too afraid of disrupting your own ads business model) is the most common explanation folks are giving for this, but it seems to be some sort of post-rationalization of why such a large company full of competent researchers/engineers would drop the ball this hard.<p>My read (having seen some of this on the inside) is that it was a mix of being too worried about safety issues (OMG, the chatbot occasionally says something offensive!) and being too complacent (too comfortable with incremental changes in Search, no appetite for launching an entirely new type of product / doing something really out there). There are many ways to monetize a chatbot; OpenAI, for example, is raking in billions in subscription fees.</p>
]]></description><pubDate>Tue, 26 Mar 2024 03:48:15 +0000</pubDate><link>https://news.ycombinator.com/item?id=39824102</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39824102</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39824102</guid></item><item><title><![CDATA[New comment by halflings in "Pyenv – lets you easily switch between multiple versions of Python"]]></title><description><![CDATA[
<p>uv has been really awesome as a replacement for pip:
<a href="https://github.com/astral-sh/uv">https://github.com/astral-sh/uv</a><p>So fast it finally made virtual environments usable for me.
But it's not (yet) a full replacement for conda, e.g. it won't install things outside of Python packages</p>
]]></description><pubDate>Mon, 25 Mar 2024 09:48:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=39814287</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39814287</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39814287</guid></item><item><title><![CDATA[New comment by halflings in "Show HN: Matrix Multiplication with Half the Multiplications"]]></title><description><![CDATA[
<p>This looks pretty cool! What's the catch? E.g. why isn't this already implemented in accelerators? Is it really just a forgotten algorithm, or does it have implications for the cost of building the accelerator, or something else?</p>
]]></description><pubDate>Fri, 15 Mar 2024 11:38:49 +0000</pubDate><link>https://news.ycombinator.com/item?id=39714396</link><dc:creator>halflings</dc:creator><comments>https://news.ycombinator.com/item?id=39714396</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=39714396</guid></item></channel></rss>