<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: t55</title><link>https://news.ycombinator.com/user?id=t55</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Sun, 21 Jun 2026 11:06:22 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=t55" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[RL Speedrun]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/JeanKaddour/sokoban_speedrun/">https://github.com/JeanKaddour/sokoban_speedrun/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48599467">https://news.ycombinator.com/item?id=48599467</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 19 Jun 2026 15:05:59 +0000</pubDate><link>https://github.com/JeanKaddour/sokoban_speedrun/</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=48599467</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48599467</guid></item><item><title><![CDATA[Target Policy Optimization]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2604.06159">https://arxiv.org/abs/2604.06159</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47791922">https://news.ycombinator.com/item?id=47791922</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 16 Apr 2026 12:18:30 +0000</pubDate><link>https://arxiv.org/abs/2604.06159</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=47791922</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47791922</guid></item><item><title><![CDATA[Show HN: Kilroy – Knowledge base for teams using Claude Code]]></title><description><![CDATA[
<p>Hey HN, we’re a small team that uses Claude Code + Codex for basically everything in our company: coding, data analysis, marketing, ad campaigns, copywriting, design.<p>We’ve accumulated a truckload of tribal knowledge: major decisions, gotchas, changes driven by user feedback. Feeding this to our agents manually every time is tedious.<p>We built Kilroy to solve this in a simple way: we let our agents leave notes for each other. This allowed us to keep the form factor minimal: markdown posts with linear comments. Under the hood it’s Postgres + auth (better-auth) + an MCP + a small web UI (React). We ship Claude Code and Codex plugins that bundle the MCP + a skill.md that teaches the model when to read and write posts.<p>We designed Kilroy to be autonomous, the same way agents today run a typechecker after a patch without being asked. The combination we found to work best for us was: make agents write prolifically, expose a search interface designed for agents to quickly decide if a post is relevant, and expose a binary switch to purge stale posts.<p>Would love to get feedback!</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47791559">https://news.ycombinator.com/item?id=47791559</a></p>
<p>Points: 5</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 16 Apr 2026 11:32:43 +0000</pubDate><link>https://github.com/kilroy-sh/kilroy/</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=47791559</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47791559</guid></item><item><title><![CDATA[Procedural Reasoning Datasets]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/open-thought/reasoning-gym">https://github.com/open-thought/reasoning-gym</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44798064">https://news.ycombinator.com/item?id=44798064</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 05 Aug 2025 13:59:09 +0000</pubDate><link>https://github.com/open-thought/reasoning-gym</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44798064</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44798064</guid></item><item><title><![CDATA[In Defence of Gary Marcus]]></title><description><![CDATA[
<p>Article URL: <a href="https://reubenadams.substack.com/p/in-defence-of-mary-c-sugar">https://reubenadams.substack.com/p/in-defence-of-mary-c-sugar</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44693018">https://news.ycombinator.com/item?id=44693018</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Sat, 26 Jul 2025 10:43:15 +0000</pubDate><link>https://reubenadams.substack.com/p/in-defence-of-mary-c-sugar</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44693018</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44693018</guid></item><item><title><![CDATA[Reasoning Gym – Procedural RL reasoning datasets]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/open-thought/reasoning-gym">https://github.com/open-thought/reasoning-gym</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44676936">https://news.ycombinator.com/item?id=44676936</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 24 Jul 2025 22:11:09 +0000</pubDate><link>https://github.com/open-thought/reasoning-gym</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44676936</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44676936</guid></item><item><title><![CDATA[ChatGPT Agent [video]]]></title><description><![CDATA[
<p>Article URL: <a href="https://www.youtube.com/watch?v=1jn_RpbPbEc">https://www.youtube.com/watch?v=1jn_RpbPbEc</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44595499">https://news.ycombinator.com/item?id=44595499</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 17 Jul 2025 17:02:35 +0000</pubDate><link>https://www.youtube.com/watch?v=1jn_RpbPbEc</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44595499</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44595499</guid></item><item><title><![CDATA[New comment by t55 in "Context Rot: How increasing input tokens impacts LLM performance"]]></title><description><![CDATA[
<p>that's a standard feature in cursor, windsurf, etc.</p>
]]></description><pubDate>Tue, 15 Jul 2025 11:20:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=44569991</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44569991</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44569991</guid></item><item><title><![CDATA[New comment by t55 in "There are no new ideas in AI, only new datasets"]]></title><description><![CDATA[
<p>this is what deepmind did 10 years ago lol</p>
]]></description><pubDate>Mon, 30 Jun 2025 19:23:44 +0000</pubDate><link>https://news.ycombinator.com/item?id=44426906</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44426906</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44426906</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>For a 100k-token context window, all those models are comparable though<p>gemini 2.5 pro shines for 200k+ tokens</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:59:55 +0000</pubDate><link>https://news.ycombinator.com/item?id=44159506</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44159506</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44159506</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>i didn't say they invented everything; in science you always stand on the shoulders of giants<p>i still think my original statement is fair</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:08:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44159025</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44159025</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44159025</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>yeah, RLVR is still nascent and hence there's lots of noise.<p>> How can these spurious rewards possibly work? Can we get similar gains on other models with broken rewards?<p>it's because in those cases, RLVR merely elicits the reasoning strategies already contained in the model through pre-training<p>this paper, which uses Reasoning Gym, shows that you need to train for way longer than in the papers you mentioned to actually uncover novel reasoning strategies: <a href="https://arxiv.org/abs/2505.24864" rel="nofollow">https://arxiv.org/abs/2505.24864</a></p>
]]></description><pubDate>Mon, 02 Jun 2025 14:07:36 +0000</pubDate><link>https://news.ycombinator.com/item?id=44159015</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44159015</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44159015</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>so you think it's fake news? another example of a paper with strong claims without much evidence?</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:03:22 +0000</pubDate><link>https://news.ycombinator.com/item?id=44158980</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44158980</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44158980</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>agree, the RG evals feel like a fresh breeze</p>
]]></description><pubDate>Mon, 02 Jun 2025 14:02:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=44158976</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44158976</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44158976</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>> prolonged RL training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling<p>does this mean that previous RL papers claiming the opposite were possibly bottlenecked by small datasets?</p>
]]></description><pubDate>Mon, 02 Jun 2025 11:12:01 +0000</pubDate><link>https://news.ycombinator.com/item?id=44157594</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44157594</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44157594</guid></item><item><title><![CDATA[New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"]]></title><description><![CDATA[
<p>> I personally think that Gemini 2.5 Pro's superiority comes from having hundreds or thousands RL tasks (without any proof whatsoever, so rather a feeling).<p>Given that GDM pioneered RL, that's a reasonable assumption</p>
]]></description><pubDate>Mon, 02 Jun 2025 10:37:13 +0000</pubDate><link>https://news.ycombinator.com/item?id=44157436</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44157436</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44157436</guid></item><item><title><![CDATA[New comment by t55 in "The Bitter Lesson (2019)"]]></title><description><![CDATA[
<p>it aged well!</p>
]]></description><pubDate>Mon, 02 Jun 2025 09:28:40 +0000</pubDate><link>https://news.ycombinator.com/item?id=44157083</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44157083</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44157083</guid></item><item><title><![CDATA[ReasoningGym: Reasoning Environments for RL with Verifiable Rewards]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2505.24760">https://arxiv.org/abs/2505.24760</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44157077">https://news.ycombinator.com/item?id=44157077</a></p>
<p>Points: 105</p>
<p># Comments: 28</p>
]]></description><pubDate>Mon, 02 Jun 2025 09:27:26 +0000</pubDate><link>https://arxiv.org/abs/2505.24760</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44157077</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44157077</guid></item><item><title><![CDATA[Show HN: Rehearsal.so, Duolingo for Public Speaking]]></title><description><![CDATA[
<p>Article URL: <a href="https://rehearsal.so/">https://rehearsal.so/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44072331">https://news.ycombinator.com/item?id=44072331</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Fri, 23 May 2025 12:42:54 +0000</pubDate><link>https://rehearsal.so/</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44072331</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44072331</guid></item><item><title><![CDATA[New comment by t55 in "Show HN: Samurai Interview – a mock interview simulator"]]></title><description><![CDATA[
<p>Very cool, reminds me of <a href="https://rehearsal.so/">https://rehearsal.so/</a><p>which sort of interviews will you support?</p>
]]></description><pubDate>Fri, 16 May 2025 17:05:19 +0000</pubDate><link>https://news.ycombinator.com/item?id=44007689</link><dc:creator>t55</dc:creator><comments>https://news.ycombinator.com/item?id=44007689</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44007689</guid></item></channel></rss>