<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: fmaccomber</title><link>https://news.ycombinator.com/user?id=fmaccomber</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 09 Jun 2026 00:12:57 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=fmaccomber" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by fmaccomber in "Benchmark accuracy retention is the wrong metric"]]></title><description><![CDATA[
<p>Whether model routing works is an empirical question. Existing empirical efforts rely on benchmark accuracy retention, i.e. how well a model routing system scores compared to a sophisticated model like Opus 4.7 on a complex task benchmark like Terminal-Bench 2.0.</p><p>However, that metric is completely divorced from what we care about. The better metric is utility retention, which takes task importance into account.</p>
]]></description><pubDate>Mon, 08 Jun 2026 15:57:05 +0000</pubDate><link>https://news.ycombinator.com/item?id=48447033</link><dc:creator>fmaccomber</dc:creator><comments>https://news.ycombinator.com/item?id=48447033</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48447033</guid></item><item><title><![CDATA[Benchmark accuracy retention is the wrong metric]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.joshuahedtke.com/writing/benchmark-retention-is-not-utility-retention">https://www.joshuahedtke.com/writing/benchmark-retention-is-not-utility-retention</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48447032">https://news.ycombinator.com/item?id=48447032</a></p>
<p>Points: 1</p>
<p># Comments: 1</p>
]]></description><pubDate>Mon, 08 Jun 2026 15:57:05 +0000</pubDate><link>https://www.joshuahedtke.com/writing/benchmark-retention-is-not-utility-retention</link><dc:creator>fmaccomber</dc:creator><comments>https://news.ycombinator.com/item?id=48447032</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48447032</guid></item></channel></rss>