<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: cold_harbor</title><link>https://news.ycombinator.com/user?id=cold_harbor</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Fri, 22 May 2026 21:39:47 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=cold_harbor" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by cold_harbor in "DeepSeek makes the V4 Pro price discount permanent"]]></title><description><![CDATA[
<p>their MLA architecture cuts KV cache by ~5-13x vs standard attention. that's why inference is actually cheaper to run, not just a price war to gain market share.</p>
]]></description><pubDate>Fri, 22 May 2026 17:35:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=48238935</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48238935</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48238935</guid></item><item><title><![CDATA[New comment by cold_harbor in "CODA: Rewriting Transformer Blocks as GEMM-Epilogue Programs"]]></title><description><![CDATA[
<p>synthesis-only is the hard part. with execution feedback — run, profile, patch — the gap closes fast. it's basically an RL problem in disguise</p>
]]></description><pubDate>Fri, 22 May 2026 14:32:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=48236432</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48236432</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48236432</guid></item><item><title><![CDATA[New comment by cold_harbor in "Was my $48K GPU server worth it?"]]></title><description><![CDATA[
<p>missing from most of these cost discussions: privacy. for some workloads the entire value of local is zero data leaving the network, and cloud cost is irrelevant</p>
]]></description><pubDate>Fri, 22 May 2026 14:32:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48236426</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48236426</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48236426</guid></item><item><title><![CDATA[New comment by cold_harbor in "Learnings from 100K lines of Rust with AI (2025)"]]></title><description><![CDATA[
<p>with Rust the failure mode isn't wrong code, it's unidiomatic code. .clone() everywhere will compile fine but you'll feel it later</p>
]]></description><pubDate>Thu, 21 May 2026 17:39:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=48226388</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48226388</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48226388</guid></item><item><title><![CDATA[New comment by cold_harbor in "Indexing a year of video locally on a 2021 MacBook with Gemma4-31B (50GB swap)"]]></title><description><![CDATA[
<p>the reason 50GB swap is even viable here is Apple Silicon's memory bandwidth. on x86 that much swap would make inference unusably slow</p>
]]></description><pubDate>Thu, 21 May 2026 17:37:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48226361</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48226361</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48226361</guid></item><item><title><![CDATA[New comment by cold_harbor in "PyTorch Landscape"]]></title><description><![CDATA[
<p>JAX is brilliant for research but the debugging story is still rough compared to PyTorch. eager mode + native Python exceptions win for most people.</p>
]]></description><pubDate>Tue, 19 May 2026 11:39:27 +0000</pubDate><link>https://news.ycombinator.com/item?id=48192036</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48192036</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48192036</guid></item><item><title><![CDATA[New comment by cold_harbor in "The last six months in LLMs in five minutes"]]></title><description><![CDATA[
<p>for non-coders: local AI. a couple years ago you needed a dedicated GPU rig. now a 30B model fits on a laptop and runs offline.</p>
]]></description><pubDate>Tue, 19 May 2026 11:38:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=48192025</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48192025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48192025</guid></item><item><title><![CDATA[New comment by cold_harbor in "Where Are the Vibecoded Photoshops?"]]></title><description><![CDATA[
<p>the bottleneck is precise control. diffusion models are great at generation but bad at 'change only this region, preserve everything else exactly' — that constraint keeps Photoshop alive.</p>
]]></description><pubDate>Mon, 18 May 2026 11:23:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=48178103</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48178103</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48178103</guid></item><item><title><![CDATA[New comment by cold_harbor in "CUDA Books"]]></title><description><![CDATA[
<p>for LLM work, reading the Flash Attention and vLLM kernel source taught me more than any book. real code makes memory hierarchy concrete — books stay too abstract.</p>
]]></description><pubDate>Mon, 18 May 2026 11:23:25 +0000</pubDate><link>https://news.ycombinator.com/item?id=48178095</link><dc:creator>cold_harbor</dc:creator><comments>https://news.ycombinator.com/item?id=48178095</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48178095</guid></item></channel></rss>