<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: ehtbanton</title><link>https://news.ycombinator.com/user?id=ehtbanton</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 13 Apr 2026 08:16:23 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=ehtbanton" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by ehtbanton in "Claude Opus 4.6 accuracy on BridgeBench hallucination test drops from 83% to 68%"]]></title><description><![CDATA[
<p>Benchmarks like this one are designed to thoroughly test the model across several iterations. A drop of 15 percentage points (83% to 68%) is a MASSIVE discrepancy.<p>Come on, Anthropic, admit what you're doing already and let us access your best models unhindered, even if it costs us more. At the moment we all just feel short-changed.</p>
]]></description><pubDate>Mon, 13 Apr 2026 03:34:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=47747270</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47747270</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47747270</guid></item><item><title><![CDATA[New comment by ehtbanton in "I ran Gemma 4 as a local model in Codex CLI"]]></title><description><![CDATA[
<p>This is genuinely very helpful. I'm planning a MacBook Pro purchase with local inference in mind, and I now see I'll have to aim for a slightly higher memory option because the Gemma A4 26B MoE is not all that!</p>
]]></description><pubDate>Mon, 13 Apr 2026 01:46:30 +0000</pubDate><link>https://news.ycombinator.com/item?id=47746596</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47746596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47746596</guid></item><item><title><![CDATA[New comment by ehtbanton in "Doom, Played over Curl"]]></title><description><![CDATA[
<p>This is very impressive; I have tried it out.<p>If only everyone were as good at making performant terminal applications (cough cough Anthropic)</p>
]]></description><pubDate>Mon, 13 Apr 2026 01:22:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=47746434</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47746434</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47746434</guid></item><item><title><![CDATA[New comment by ehtbanton in "Surelock: Deadlock-Free Mutexes for Rust"]]></title><description><![CDATA[
<p>I've had this thought myself too. Going off on a slight tangent: I think there's also loads of useful stuff in domains like these that maps amazingly well to AI agent system design, but there's such a huge gap between the knowledge bases of the fields that no benefit ever really surfaces.<p>(Speaking from the perspective of someone who simultaneously loves high-performance compute and agentic AI haha)</p>
]]></description><pubDate>Sun, 12 Apr 2026 23:59:47 +0000</pubDate><link>https://news.ycombinator.com/item?id=47745887</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47745887</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47745887</guid></item><item><title><![CDATA[New comment by ehtbanton in "Exploiting the most prominent AI agent benchmarks"]]></title><description><![CDATA[
<p>I will always maintain that the best benchmark is just trying it out for yourself.
The most practical parallel for me is all the people posting about how some open-source model has "achieved X on Y benchmark - beating out Opus 4.6!"
It's all show and everyone cheats.</p>
]]></description><pubDate>Sun, 12 Apr 2026 23:52:08 +0000</pubDate><link>https://news.ycombinator.com/item?id=47745813</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47745813</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47745813</guid></item><item><title><![CDATA[New comment by ehtbanton in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>Wake me up when Anthropic does something right again...</p>
]]></description><pubDate>Sun, 12 Apr 2026 23:46:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=47745760</link><dc:creator>ehtbanton</dc:creator><comments>https://news.ycombinator.com/item?id=47745760</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47745760</guid></item></channel></rss>