Hacker News: t55

RL Speedrun

t55 — Fri, 19 Jun 2026 15:05:59 +0000

Article URL: https://github.com/JeanKaddour/sokoban_speedrun/

Comments URL: https://news.ycombinator.com/item?id=48599467

Points: 2

# Comments: 0

Target Policy Optimization

t55 — Thu, 16 Apr 2026 12:18:30 +0000

Article URL: https://arxiv.org/abs/2604.06159

Comments URL: https://news.ycombinator.com/item?id=47791922

Points: 1

# Comments: 0

Show HN: Kilroy – Knowledge base for teams using Claude Code

t55 — Thu, 16 Apr 2026 11:32:43 +0000

Hey HN — we’re a small team that uses Claude Code + Codex for basically everything in our company: coding, data analysis, marketing, ad campaigns, copywriting, design.

There’s a truckload of tribal knowledge we’ve accumulated: major decisions, gotchas, and changes driven by user feedback. Providing this to our agents manually every time is tedious.

We built Kilroy to solve this in a simple way: we let our agents leave notes for each other. This allowed us to keep the form factor minimal: markdown posts with linear comments. Under the hood it’s Postgres + auth (better-auth) + an MCP server + a small web UI (React). We ship Claude Code and Codex plugins that bundle the MCP server + a skill.md that teaches the model when to read and write posts.

We designed Kilroy to run autonomously, the same way agents today run a typechecker after a patch without being asked. The combination that worked best for us: make agents write prolifically, expose a search interface designed for agents to quickly decide whether a post is relevant, and expose a binary switch to purge stale posts.
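To make the shape of that concrete, here is a minimal in-memory sketch of the idea: posts with linear comments, a search that returns short snippets so an agent can judge relevance quickly, and a binary stale flag that drives purging. This is not Kilroy's actual code or schema. The names `Post`, `NoteStore`, `search`, `mark_stale`, and `purge_stale` are hypothetical, a plain dict stands in for Postgres, and the keyword match is only a placeholder for whatever search the real system uses.

```python
from dataclasses import dataclass, field


@dataclass
class Post:
    """A markdown post with a flat (linear) list of comments. Hypothetical model."""
    post_id: int
    title: str
    body_md: str
    comments: list[str] = field(default_factory=list)
    stale: bool = False  # assumed form of the "binary switch" for purging


class NoteStore:
    """In-memory stand-in for the real store (assumption: a dict instead of Postgres)."""

    def __init__(self) -> None:
        self._posts: dict[int, Post] = {}
        self._next_id = 1

    def write_post(self, title: str, body_md: str) -> int:
        """Store a new post and return its id."""
        post = Post(self._next_id, title, body_md)
        self._posts[post.post_id] = post
        self._next_id += 1
        return post.post_id

    def add_comment(self, post_id: int, text: str) -> None:
        """Append a comment; comments are linear, so there is no threading."""
        self._posts[post_id].comments.append(text)

    def mark_stale(self, post_id: int) -> None:
        """Flip the binary stale switch on a post."""
        self._posts[post_id].stale = True

    def purge_stale(self) -> int:
        """Delete every post marked stale and return how many were removed."""
        stale_ids = [pid for pid, post in self._posts.items() if post.stale]
        for pid in stale_ids:
            del self._posts[pid]
        return len(stale_ids)

    def search(self, query: str, snippet_len: int = 60) -> list[tuple[int, str, str]]:
        """Return (id, title, snippet) for non-stale posts containing every query term."""
        terms = query.lower().split()
        hits = []
        for post in self._posts.values():
            if post.stale:
                continue
            haystack = f"{post.title}\n{post.body_md}".lower()
            if all(term in haystack for term in terms):
                hits.append((post.post_id, post.title, post.body_md[:snippet_len]))
        return hits


if __name__ == "__main__":
    # Made-up example notes, purely for illustration.
    store = NoteStore()
    note = store.write_post("Billing gotcha", "Webhooks retry, so handlers must be idempotent.")
    store.add_comment(note, "Confirmed while debugging a duplicate charge.")
    print(store.search("webhooks idempotent"))
    store.mark_stale(note)
    print(store.purge_stale())
```

Returning only a title and a short snippet from `search` is one way to keep the relevance check cheap for an agent; the full post would be fetched only after it decides the hit is worth reading.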

Would love to get feedback!

Comments URL: https://news.ycombinator.com/item?id=47791559

Points: 5

# Comments: 0

Procedural Reasoning Datasets

t55 — Tue, 05 Aug 2025 13:59:09 +0000

Article URL: https://github.com/open-thought/reasoning-gym

Comments URL: https://news.ycombinator.com/item?id=44798064

Points: 1

# Comments: 0

In Defence of Gary Marcus

t55 — Sat, 26 Jul 2025 10:43:15 +0000

Article URL: https://reubenadams.substack.com/p/in-defence-of-mary-c-sugar

Comments URL: https://news.ycombinator.com/item?id=44693018

Points: 3

# Comments: 0

Reasoning Gym – Procedural RL reasoning datasets

t55 — Thu, 24 Jul 2025 22:11:09 +0000

Article URL: https://github.com/open-thought/reasoning-gym

Comments URL: https://news.ycombinator.com/item?id=44676936

Points: 1

# Comments: 0

ChatGPT Agent [video]

t55 — Thu, 17 Jul 2025 17:02:35 +0000

Article URL: https://www.youtube.com/watch?v=1jn_RpbPbEc

Comments URL: https://news.ycombinator.com/item?id=44595499

Points: 3

# Comments: 0

New comment by t55 in "Context Rot: How increasing input tokens impacts LLM performance"

t55 — Tue, 15 Jul 2025 11:20:39 +0000

that's a standard feature in cursor, windsurf, etc.

New comment by t55 in "There are no new ideas in AI, only new datasets"

t55 — Mon, 30 Jun 2025 19:23:44 +0000

this is what deepmind did 10 years ago lol

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 14:59:55 +0000

For a 100k token context window, all those models are comparable though

gemini 2.5 pro shines for 200k+ tokens

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 14:08:40 +0000

i didn't say they invented everything; in science you always stand on the shoulders of giants

i still think my original statement is fair

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 14:07:36 +0000

yeah, RLVR is still nascent and hence there's lots of noise.

> How can these spurious rewards possibly work? Can we get similar gains on other models with broken rewards?

it's because in those cases, RLVR merely elicits the reasoning strategies already contained in the model through pre-training

this paper, which uses Reasoning gym, shows that you need to train for way longer than those papers you mentioned to actually uncover novel reasoning strategies: https://arxiv.org/abs/2505.24864

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 14:03:22 +0000

so you think it's fake news? another example of a paper with strong claims without much evidence?

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 14:02:50 +0000

agree, the RG evals feel like a fresh breeze

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 11:12:01 +0000

> prolonged RL training can uncover novel reasoning strategies that are inaccessible to base models, even under extensive sampling

does this mean that previous RL papers claiming the opposite were possibly bottlenecked by small datasets?

New comment by t55 in "ReasoningGym: Reasoning Environments for RL with Verifiable Rewards"

t55 — Mon, 02 Jun 2025 10:37:13 +0000

> I personally think that Gemini 2.5 Pro's superiority comes from having hundreds or thousands RL tasks (without any proof whatsoever, so rather a feeling).

Given that GDM pioneered RL, that's a reasonable assumption