<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: milkkarten</title><link>https://news.ycombinator.com/user?id=milkkarten</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Mon, 15 Jun 2026 07:48:16 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=milkkarten" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by milkkarten in "Claude Fable 5"]]></title><description><![CDATA[
<p>No reasoning shown. No explanation of any training information. Using vision-only should be an easier version of the task (given training).<p>There are many standardized evals to do this correctly, and Anthropic ignored them to provide an 18-second sped-up video of a 50-hour run?<p>Yeah, I don't trust this until they provide a live run by a 3rd party with full reasoning traces in real time. The reason we all liked the Gemini Plays Pokemon style runs was that they were live and couldn't be faked.</p>
]]></description><pubDate>Tue, 09 Jun 2026 18:39:06 +0000</pubDate><link>https://news.ycombinator.com/item?id=48465568</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=48465568</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48465568</guid></item><item><title><![CDATA[New comment by milkkarten in "Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces"]]></title><description><![CDATA[
<p>Author here. LLM agents are getting good enough to run individual businesses. What happens when everyone's business is run by agents? Turns out, without targeted training for economic alignment, markets collapse. We study concrete failure modes in B2C ("The Crash": firms undercut each other below unit cost in a flash-crash-style spiral) and C2C ("The Lemon Market": a single agent runs many seller identities to flood the market with fraudulent listings).<p>We were surprised that no model solved both tasks, and that frontier models can be just as bad as open-source models in these scenarios. The good news: this is easily addressable by adding varied marketplace decision-making to the finetuning set.</p>
]]></description><pubDate>Tue, 19 May 2026 16:03:39 +0000</pubDate><link>https://news.ycombinator.com/item?id=48195157</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=48195157</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48195157</guid></item><item><title><![CDATA[Agent Bazaar: Enabling Economic Alignment in Multi-Agent Marketplaces]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2605.17698">https://arxiv.org/abs/2605.17698</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48195043">https://news.ycombinator.com/item?id=48195043</a></p>
<p>Points: 3</p>
<p># Comments: 1</p>
]]></description><pubDate>Tue, 19 May 2026 15:55:39 +0000</pubDate><link>https://arxiv.org/abs/2605.17698</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=48195043</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48195043</guid></item><item><title><![CDATA[New comment by milkkarten in "Continual Harness: Online Adaptation for Self-Improving Foundation Agents"]]></title><description><![CDATA[
<p>Author here. TL;DR:<p>Long-horizon embodied agency is a harness problem, not a model-scale problem. Coding agents like Claude Code work because of the scaffolding (prompt, skills, memory, sub-agents) around the model. Embodied agents haven't had an equivalent.<p>Through iterative harness refinement, Gemini Plays Pokémon (GPP) became the first AI to complete Pokémon Blue, Yellow Legacy on hard mode, and Crystal without a lost battle. Early on, a human edited the harness. By Crystal, the model was doing it itself: naming its own strategies, writing truth tables for puzzles, and wrapping loopholes into reusable primitives.<p>Continual Harness automates this fully. Starting from a raw interface with no curated knowledge, every F steps a Refiner reads the recent trajectory and applies edits to the prompt, sub-agents, skills, and memory, with no resets. It closes most of the gap to a hand-engineered expert harness while starting from scratch.<p>Our key findings:
(1) Iterative harness refinement closes most of the gap to a hand-engineered version. 
(2) Long-horizon agency requires self-refinement, and self-refinement requires a useful model. 
(3) The future of agents is model-harness co-learning.<p>Demos: <a href="https://sethkarten.ai/continual-harness" rel="nofollow">https://sethkarten.ai/continual-harness</a></p>
]]></description><pubDate>Wed, 13 May 2026 19:11:10 +0000</pubDate><link>https://news.ycombinator.com/item?id=48126113</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=48126113</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48126113</guid></item><item><title><![CDATA[Continual Harness: Online Adaptation for Self-Improving Foundation Agents]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2605.09998">https://arxiv.org/abs/2605.09998</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=48126112">https://news.ycombinator.com/item?id=48126112</a></p>
<p>Points: 8</p>
<p># Comments: 1</p>
]]></description><pubDate>Wed, 13 May 2026 19:11:10 +0000</pubDate><link>https://arxiv.org/abs/2605.09998</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=48126112</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48126112</guid></item><item><title><![CDATA[We Ran the Largest AI Pokemon Tournament Ever. Now It's an Open Benchmark]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2603.15563">https://arxiv.org/abs/2603.15563</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47415416">https://news.ycombinator.com/item?id=47415416</a></p>
<p>Points: 1</p>
<p># Comments: 0</p>
]]></description><pubDate>Tue, 17 Mar 2026 17:04:56 +0000</pubDate><link>https://arxiv.org/abs/2603.15563</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=47415416</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47415416</guid></item><item><title><![CDATA[We Automated RL Environment Engineering for $10]]></title><description><![CDATA[
<p>Article URL: <a href="https://arxiv.org/abs/2603.12145">https://arxiv.org/abs/2603.12145</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=47359954">https://news.ycombinator.com/item?id=47359954</a></p>
<p>Points: 2</p>
<p># Comments: 0</p>
]]></description><pubDate>Fri, 13 Mar 2026 02:19:10 +0000</pubDate><link>https://arxiv.org/abs/2603.12145</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=47359954</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=47359954</guid></item><item><title><![CDATA[Artificial Pokemon Intelligence in the PokeAgent Challenge]]></title><description><![CDATA[
<p>Article URL: <a href="https://pokeagent.github.io/">https://pokeagent.github.io/</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45601907">https://news.ycombinator.com/item?id=45601907</a></p>
<p>Points: 3</p>
<p># Comments: 0</p>
]]></description><pubDate>Thu, 16 Oct 2025 05:59:54 +0000</pubDate><link>https://pokeagent.github.io/</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=45601907</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45601907</guid></item><item><title><![CDATA[New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"]]></title><description><![CDATA[
<p>We ran each method in under 24 hours on a single H100. I understand your point, and I think we will include this in future iterations of our work, since it is very interesting from the user perspective. In the paper, though, we focus more on algorithmic concerns.</p>
]]></description><pubDate>Wed, 06 Aug 2025 03:41:37 +0000</pubDate><link>https://news.ycombinator.com/item?id=44807362</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=44807362</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44807362</guid></item><item><title><![CDATA[New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"]]></title><description><![CDATA[
<p>Using smaller, cheaper agents is one of the goals of the work. There is a Pareto frontier, though: with smaller, faster, cheaper agents, the number of steps required to converge increases. We touch upon this briefly in the paper.</p>
]]></description><pubDate>Mon, 04 Aug 2025 23:28:33 +0000</pubDate><link>https://news.ycombinator.com/item?id=44792530</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=44792530</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44792530</guid></item><item><title><![CDATA[New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"]]></title><description><![CDATA[
<p>These are marginal tax rates, not effective tax rates (e.g. 80% on the first $10k, 30% on $10k-20k). We do not model tax credits here. We try to keep the system as simple as possible so that we can effectively evaluate changes. As is, the economic theory becomes intractable once we move from purely rational agents to bounded rationality. We do think future work could potentially work out some smoothness in the overall tax rate, but for now we let the LLM planner try what it thinks is best, which helps test its in-context optimization capabilities.<p>Also, while there is a complicated tax code in the US, in our simulation there is no way for agents to avoid paying taxes :)<p>The Saez tax rates are obtained by perturbing the LLM Economist's tax rates to find the theoretically optimal values according to the economic theory.<p>Thanks for the interest; I hope this helps clarify some of the details.</p>
]]></description><pubDate>Mon, 04 Aug 2025 02:18:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=44781553</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=44781553</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44781553</guid></item><item><title><![CDATA[New comment by milkkarten in "LLM Economist – Mechanism Design for Simulated Agent Societies"]]></title><description><![CDATA[
<p>We simulate large-scale agent societies where heterogeneous personas work, adapt, and vote—governed by an in-context planner optimizing social welfare.<p>The system models decentralized governance, dynamic tax policy, and institutional evolution—entirely via in-context reinforcement learning, no fine-tuning required.<p>Full paper (arXiv):
<a href="https://arxiv.org/abs/2507.15815" rel="nofollow">https://arxiv.org/abs/2507.15815</a></p>
]]></description><pubDate>Mon, 04 Aug 2025 00:02:07 +0000</pubDate><link>https://news.ycombinator.com/item?id=44780919</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=44780919</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780919</guid></item><item><title><![CDATA[LLM Economist – Mechanism Design for Simulated Agent Societies]]></title><description><![CDATA[
<p>Article URL: <a href="https://github.com/sethkarten/LLM-Economist">https://github.com/sethkarten/LLM-Economist</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=44780918">https://news.ycombinator.com/item?id=44780918</a></p>
<p>Points: 2</p>
<p># Comments: 9</p>
]]></description><pubDate>Mon, 04 Aug 2025 00:02:07 +0000</pubDate><link>https://github.com/sethkarten/LLM-Economist</link><dc:creator>milkkarten</dc:creator><comments>https://news.ycombinator.com/item?id=44780918</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44780918</guid></item></channel></rss>