Hacker News: milkkarten

New comment by milkkarten in "Claude Fable 5"

milkkarten — Tue, 09 Jun 2026 18:39:06 +0000

no reasoning shown. no explanation of any training information. Using vision-only should be an easier version of the task (given training).

there are many standardized evals to do this correctly and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?

yeah I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real-time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked.

New comment by milkkarten in "Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces"

milkkarten — Tue, 19 May 2026 16:03:39 +0000

Author here. LLM agents are getting good enough to run individual businesses. What happens when everyone's business is run by agents? Turns out, without targeted training for economic alignment, markets collapse. We study concrete failure modes in B2C ("The Crash": firms undercut each other below unit cost in a flash-crash-style spiral) and C2C ("The Lemon Market": a single agent runs many seller identities to flood the market with fraudulent listings).

We were surprised that no model was able to successfully solve both tasks, and frontier models can be just as bad as open source models in these scenarios. The good news: this is easily addressable by adding varied marketplace decision-making to the finetuning set.
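To make "The Crash" concrete, here is a toy sketch of an undercutting spiral. It is not the paper's environment: the function `simulate_undercut`, its parameters, and the alternating-move rule are all assumptions made for illustration. Two firms take turns pricing just below their rival, and the only thing that stops the spiral is an explicit floor at unit cost.

```python
def simulate_undercut(unit_cost, start_price, undercut, rounds, floor_at_cost=False):
    """Toy two-firm price war (hypothetical rule, not the paper's setup).

    Each round, one firm sets its price to the rival's price minus `undercut`.
    If `floor_at_cost` is True, a firm never prices below its unit cost.
    Returns the list of (price_firm0, price_firm1) after each round.
    """
    prices = [start_price, start_price]
    history = [tuple(prices)]
    for r in range(rounds):
        mover = r % 2          # firms alternate moves
        rival = 1 - mover
        new_price = prices[rival] - undercut
        if floor_at_cost:
            new_price = max(new_price, unit_cost)
        prices[mover] = new_price
        history.append(tuple(prices))
    return history


if __name__ == "__main__":
    naive = simulate_undercut(unit_cost=5, start_price=10, undercut=1, rounds=8)
    floored = simulate_undercut(unit_cost=5, start_price=10, undercut=1, rounds=8,
                                floor_at_cost=True)
    print("no floor:  ", naive[-1])
    print("with floor:", floored[-1])
```

With these toy numbers the unfloored firms end at prices (3, 2), both below the unit cost of 5, while the floored firms settle at (5, 5).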

Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces

milkkarten — Tue, 19 May 2026 15:55:39 +0000

Article URL: https://arxiv.org/abs/2605.17698

Comments URL: https://news.ycombinator.com/item?id=48195043

Points: 3

# Comments: 1

New comment by milkkarten in "Continual Harness: Online Adaptation for Self-Improving Foundation Agents"

milkkarten — Wed, 13 May 2026 19:11:10 +0000

Author here. TL;DR:

Long-horizon embodied agency is a harness problem, not a model-scale problem. Coding agents like Claude Code work because of scaffolding (prompt, skills, memory, sub-agents) around the model. Embodied agents haven't had an equivalent.

Gemini Plays Pokémon (GPP) became the first AI to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle via iterative harness refinement. Early on, a human edited the harness. By Crystal, the model was doing it itself: naming its own strategies, writing truth tables for puzzles, and wrapping loopholes into reusable primitives.

Continual Harness automates this fully. Starting from a raw interface with no curated knowledge, every F steps a Refiner reads the recent trajectory and applies edits to the prompt, sub-agents, skills, and memory -- no resets. Starting from scratch, it closes most of the gap to a hand-engineered expert harness.
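A minimal sketch of that loop, under stated assumptions: the `Harness` dataclass, `run_continual`, and the three callables (`act`, `env_step`, `refine`) are hypothetical names for illustration, not the paper's API. The point it shows is the control flow: the agent keeps acting in the same episode, and every F steps the refiner edits the harness in place using only the last F steps of trajectory.

```python
from dataclasses import dataclass, field


@dataclass
class Harness:
    """Hypothetical container for the editable scaffolding around the model."""
    prompt: str = ""
    skills: dict = field(default_factory=dict)
    memory: list = field(default_factory=list)
    sub_agents: dict = field(default_factory=dict)


def run_continual(act, env_step, refine, harness, total_steps, F):
    """Run one continuous episode, refining the harness every F steps.

    act(harness, obs) -> action       (stands in for the model call)
    env_step(action) -> obs           (stands in for the environment)
    refine(harness, recent) -> None   (stands in for the Refiner; edits in place)
    """
    trajectory = []
    obs = None
    for t in range(1, total_steps + 1):
        action = act(harness, obs)
        obs = env_step(action)
        trajectory.append((action, obs))
        if t % F == 0:
            # Refine on the most recent window only; the environment is not reset.
            refine(harness, trajectory[-F:])
    return harness, trajectory


# Toy stand-ins so the sketch runs end to end.
def toy_act(harness, obs):
    return f"action#{len(harness.memory)}"


def toy_env_step(action):
    return f"obs after {action}"


def toy_refine(harness, recent):
    harness.memory.append(f"note from {len(recent)} steps")
    harness.prompt += " (refined)"


if __name__ == "__main__":
    h, traj = run_continual(toy_act, toy_env_step, toy_refine, Harness(), 10, 5)
    print(len(traj), "steps,", len(h.memory), "refinements")
```

In the toy run, the action label changes after the first refinement, which is a stand-in for the harness edits changing the agent's later behavior within the same episode.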

Our key findings: (1) Iterative harness refinement closes most of the gap to a hand-engineered version. (2) Long-horizon agency requires self-refinement, and self-refinement requires a useful model. (3) The future of agents is model-harness co-learning.

Demos: https://sethkarten.ai/continual-harness

Continual Harness: Online Adaptation for Self-Improving Foundation Agents

milkkarten — Wed, 13 May 2026 19:11:10 +0000

Article URL: https://arxiv.org/abs/2605.09998

Comments URL: https://news.ycombinator.com/item?id=48126112

Points: 8

# Comments: 1

We Ran the Largest AI Pokemon Tournament Ever. Now It's an Open Benchmark

milkkarten — Tue, 17 Mar 2026 17:04:56 +0000

Article URL: https://arxiv.org/abs/2603.15563

Comments URL: https://news.ycombinator.com/item?id=47415416

Points: 1

# Comments: 0

We Automated RL Environment Engineering for $10

milkkarten — Fri, 13 Mar 2026 02:19:10 +0000

Article URL: https://arxiv.org/abs/2603.12145

Comments URL: https://news.ycombinator.com/item?id=47359954

Points: 2

# Comments: 0

Artificial Pokemon Intelligence in the PokeAgent Challenge

milkkarten — Thu, 16 Oct 2025 05:59:54 +0000

Article URL: https://pokeagent.github.io/

Comments URL: https://news.ycombinator.com/item?id=45601907

Points: 3

# Comments: 0

New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"

milkkarten — Wed, 06 Aug 2025 03:41:37 +0000

We ran each method in under 24 hours on a single H100. I understand your point, and I think we will include this in future iterations of our work, since it is very interesting from the user perspective. In the paper, though, we focus more on algorithmic concerns.

New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"

milkkarten — Mon, 04 Aug 2025 23:28:33 +0000

Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier though: by using smaller, faster, cheaper agents, the number of steps required to converge increases. We touch upon this briefly in the paper.

New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"

milkkarten — Mon, 04 Aug 2025 02:18:41 +0000

These are the marginal tax rates, not the effective tax rate (e.g. 80% of the first $10k, 30% of $10k-20k). We do not model tax credits here. We try to keep the system as simple as possible so that we can effectively evaluate changes. As is, the economic theory becomes intractable once we move from purely rational agents to bounded rationality. In future work we could potentially impose some smoothness on the overall tax schedule, but for now we let the LLM planner try what it thinks is best, which helps test its in-context optimization capabilities.
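To illustrate the marginal vs. effective distinction with the example brackets above, here is a small sketch. The function names and the bracket representation are assumptions for illustration, not code from the LLM Economist repository, and income above the last listed bracket is simply left untaxed because the example does not specify a rate for it.

```python
def marginal_tax(income, brackets):
    """Total tax owed under marginal brackets.

    brackets: list of (upper_bound, rate) pairs in ascending order.
    Each rate applies only to the slice of income inside its bracket.
    """
    tax = 0.0
    lower = 0.0
    for upper, rate in brackets:
        if income <= lower:
            break
        taxed_slice = min(income, upper) - lower
        tax += taxed_slice * rate
        lower = upper
    return tax


def effective_rate(income, brackets):
    """Total tax divided by income (0.0 for zero income)."""
    return marginal_tax(income, brackets) / income if income > 0 else 0.0


# Example brackets from the comment: 80% on the first $10k, 30% on $10k-20k.
BRACKETS = [(10_000, 0.80), (20_000, 0.30)]

if __name__ == "__main__":
    for income in (5_000, 10_000, 20_000):
        print(income, marginal_tax(income, BRACKETS), effective_rate(income, BRACKETS))
```

For an income of $20k, the tax is 80% of the first $10k plus 30% of the next $10k, i.e. $8,000 + $3,000 = $11,000, so the effective rate is 55% even though the top marginal rate reached is only 30%.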

Also, while there is a complicated tax code in the US, in our simulation there is no way for agents to avoid paying taxes :)

The Saez tax rates are perturbed from the LLM Economist's tax rates to find the theoretically optimal values according to the economic theory.

Thanks for the interest and I hope that this helps clarify some of the details.

New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"

milkkarten — Mon, 04 Aug 2025 00:02:07 +0000

We simulate large-scale agent societies where heterogeneous personas work, adapt, and vote—governed by an in-context planner optimizing social welfare.

The system models decentralized governance, dynamic tax policy, and institutional evolution—entirely via in-context reinforcement learning, no fine-tuning required.

Full paper (arXiv): https://arxiv.org/abs/2507.15815

LLM Economist – Mechanism Design for Simulated Agent Societies

milkkarten — Mon, 04 Aug 2025 00:02:07 +0000

Article URL: https://github.com/sethkarten/LLM-Economist

Comments URL: https://news.ycombinator.com/item?id=44780918

Points: 2

# Comments: 9