<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: submarius</title><link>https://news.ycombinator.com/user?id=submarius</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 18 May 2026 09:11:11 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=submarius" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by submarius in "Show HN: A new benchmark for testing LLMs for deterministic outputs"]]></title><description><![CDATA[
<p>Cool work. Quick question: how should readers think about the fact that Interfaze-Beta is on the leaderboard you built? I'm not saying anything's wrong with the methodology; I'm just curious how you'd recommend a third party verify that the ranking is neutral to the choices you made (datasets, difficulty weights, reasoning-off default, etc.).</p>
]]></description><pubDate>Mon, 04 May 2026 16:00:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=48010387</link><dc:creator>submarius</dc:creator><comments>https://news.ycombinator.com/item?id=48010387</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48010387</guid></item></channel></rss>