<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: GreyOcten</title><link>https://news.ycombinator.com/user?id=GreyOcten</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 30 Jun 2026 22:24:14 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=GreyOcten" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by GreyOcten in "Show HN: A GPU/VRAM filter for finding LLMs that will run on your hardware"]]></title><description><![CDATA[
<p>handy, but the gap most of these filters have is that "fits in VRAM" doesn't mean usable.
the KV cache grows linearly with context length, so a 7B that fits at 2k tokens can OOM at 32k.
factoring context length + quant into the estimate is where it'd actually save people from getting burned.</p>
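<p>rough sketch of the arithmetic, not a real estimator: it assumes a Llama-2-7B-like shape (32 layers, 32 KV heads, head dim 128), an fp16 KV cache and 4-bit weights, and it ignores activations and runtime overhead. the function names are made up for illustration.</p>

```python
GIB = 1024 ** 3


def kv_cache_bytes(n_layers, n_kv_heads, head_dim, context_len,
                   bytes_per_elem=2, batch=1):
    """KV cache size: keys and values for every layer, KV head and token."""
    # Factor of 2 covers the separate key and value tensors.
    return 2 * n_layers * n_kv_heads * head_dim * context_len * bytes_per_elem * batch


def weight_bytes(n_params, bits_per_weight):
    """Weight size for a given quantization width (ignores quant metadata)."""
    return n_params * bits_per_weight / 8


def estimate_gib(n_params, bits_per_weight, n_layers, n_kv_heads, head_dim,
                 context_len):
    """Weights plus KV cache in GiB; activations and overhead are not counted."""
    total = weight_bytes(n_params, bits_per_weight) + kv_cache_bytes(
        n_layers, n_kv_heads, head_dim, context_len
    )
    return total / GIB


# Assumed model shape: 7e9 params, 4-bit weights, 32 layers, 32 KV heads, head dim 128.
for ctx in (2048, 32768):
    kv = kv_cache_bytes(32, 32, 128, ctx) / GIB
    total = estimate_gib(7e9, 4, 32, 32, 128, ctx)
    print(f"context={ctx}: kv_cache={kv:.2f} GiB, weights+kv={total:.2f} GiB")
```

<p>under those assumptions the KV cache alone goes from 1 GiB at 2k tokens to 16 GiB at 32k, while the weights stay the same size. models using grouped-query attention have fewer KV heads, so their cache is proportionally smaller.</p>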
]]></description><pubDate>Fri, 26 Jun 2026 20:30:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=48691596</link><dc:creator>GreyOcten</dc:creator><comments>https://news.ycombinator.com/item?id=48691596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48691596</guid></item></channel></rss>