<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: mrifaki</title><link>https://news.ycombinator.com/user?id=mrifaki</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Tue, 14 Apr 2026 11:58:43 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=mrifaki" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by mrifaki in "Exploiting the most prominent AI agent benchmarks"]]></title><description><![CDATA[
<p>this is actually the reward hacking problem from RL showing up in evaluation infra, which is not surprising but worth naming clearly. an interesting question raised here is whether agents will start doing this on their own, and from an RL perspective the answer is that they inevitably will once benchmark performance feeds back into the training signal in any form: RL finds the path of least resistance to maximizing reward, and if hacking the test harness is easier than solving the problem, that is where gradient descent takes us. the fix is the same one the RL community has been working on for years, which is to make the verifier harder to game than the task is to solve. this paper shows that right now, for most of these benchmarks, the opposite is true</p>
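a toy sketch of the dynamic, with made-up numbers (not the paper's setup): a two-armed bandit where "solve" models genuinely solving a hard task (success rate 0.2) and "hack" models gaming a weak verifier (success rate 0.9). a plain epsilon-greedy learner converges on the hack because reward is all it sees.

```python
import random

def pull(action, rng):
    # hypothetical success rates: solving is hard, gaming the verifier is easy
    p = {"solve": 0.2, "hack": 0.9}[action]
    return 1.0 if rng.random() < p else 0.0

def train(steps=5000, eps=0.1, seed=0):
    rng = random.Random(seed)
    q = {"solve": 0.0, "hack": 0.0}  # running value estimate per action
    n = {"solve": 0, "hack": 0}      # pull counts per action
    for _ in range(steps):
        if rng.random() < eps:
            a = rng.choice(["solve", "hack"])  # explore
        else:
            a = max(q, key=q.get)              # exploit current estimates
        r = pull(a, rng)
        n[a] += 1
        q[a] += (r - q[a]) / n[a]  # incremental mean update
    return q, n

q, n = train()
# the learned values track the underlying success rates, so the greedy
# policy ends up pulling "hack" almost exclusively
```

nothing about the learner knows or cares that one arm is cheating; making the verifier arm pay worse than honest work is the only lever.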
]]></description><pubDate>Mon, 13 Apr 2026 03:37:09 +0000</pubDate><link>https://news.ycombinator.com/item?id=47747285</link><dc:creator>mrifaki</dc:creator><comments>https://news.ycombinator.com/item?id=47747285</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47747285</guid></item><item><title><![CDATA[New comment by mrifaki in "Small models also found the vulnerabilities that Mythos found"]]></title><description><![CDATA[
<p>finding vulns in a large codebase is a search problem with a huge negative space, and what aisle measured is classification accuracy on ground-truth positives. those are different tasks, so a model that correctly labels a pre-isolated vulnerable function tells me almost nothing about that model's ability to surface the same function out of a million lines of unrelated code under a realistic triage budget<p>the experiment i'd want to see is running each of the small models as an unsupervised scanner across all of freebsd, returning the top-k suspicious functions per model, and computing precision at recall levels that correspond to real analyst triage budgets. if mythos's findings show up in a small model's top 100, i'd call that meaningful, but if they only surface buried under 10k false positives then the cost advantage collapses, because analyst triage time is more expensive than frontier model compute to begin with<p>the second thing i keep coming back to is that the $20k mythos number is a search budget, not a model cost. small models at one hundredth the per-token price don't give us one hundredth the total budget when the search process is the same shape: i still run thousands of iterations, and the real issue for autonomous vuln research is how fast the reward signal converges, which the aisle post doesn't touch at all</p>
]]></description><pubDate>Sat, 11 Apr 2026 17:52:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=47732568</link><dc:creator>mrifaki</dc:creator><comments>https://news.ycombinator.com/item?id=47732568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47732568</guid></item></channel></rss>