<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:atom="http://www.w3.org/2005/Atom"><channel><title>Hacker News: noddybear</title><link>https://news.ycombinator.com/user?id=noddybear</link><description>Hacker News RSS</description><docs>https://hnrss.org/</docs><generator>hnrss v2.1.1</generator><lastBuildDate>Wed, 13 May 2026 15:39:19 +0000</lastBuildDate><atom:link href="https://hnrss.org/user?id=noddybear" rel="self" type="application/rss+xml"></atom:link><item><title><![CDATA[New comment by noddybear in "Starship V3"]]></title><description><![CDATA[
<p>An EMP from a high altitude nuclear detonation would do the trick.</p>
]]></description><pubDate>Wed, 13 May 2026 08:58:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=48119451</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=48119451</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=48119451</guid></item><item><title><![CDATA[New comment by noddybear in "Yann LeCun to depart Meta and launch AI startup focused on 'world models'"]]></title><description><![CDATA[
<p>Nah, it's all pattern matching. This is how automated theorem provers like Isabelle are built, applying operations to lemmas/expressions to reach proofs.</p>
]]></description><pubDate>Wed, 12 Nov 2025 13:40:41 +0000</pubDate><link>https://news.ycombinator.com/item?id=45900035</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45900035</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45900035</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"]]></title><description><![CDATA[
<p>I am really keen on plugging into Age of Empires 2 - although practically I think we need a couple of years of improvements before LLMs would be smart/fast enough to react to the game in realtime. Currently they can't react fast enough - although specially trained networks could be viable.</p>
]]></description><pubDate>Fri, 03 Oct 2025 23:34:53 +0000</pubDate><link>https://news.ycombinator.com/item?id=45469008</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45469008</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45469008</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"]]></title><description><![CDATA[
<p>Thank you!</p>
]]></description><pubDate>Fri, 03 Oct 2025 23:27:31 +0000</pubDate><link>https://news.ycombinator.com/item?id=45468968</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45468968</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45468968</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"]]></title><description><![CDATA[
<p>Biters are disabled, but cliffs are not.</p>
]]></description><pubDate>Fri, 03 Oct 2025 22:30:56 +0000</pubDate><link>https://news.ycombinator.com/item?id=45468598</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45468598</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45468598</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"]]></title><description><![CDATA[
<p>This is our earlier work. Since May we've made it really easy for the community to build their own agents to play the game: you can now hook up your terminal to get Claude Code to play the game.</p>
]]></description><pubDate>Fri, 03 Oct 2025 22:30:42 +0000</pubDate><link>https://news.ycombinator.com/item?id=45468596</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45468596</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45468596</guid></item><item><title><![CDATA[Show HN: FLE v0.3 – Claude Code Plays Factorio]]></title><description><![CDATA[
<p>We're excited to release v0.3.0 of the Factorio Learning Environment (FLE), an open-source environment for evaluating AI agents on long-horizon planning, spatial reasoning, and automation tasks.<p>== What is FLE? ==<p>FLE uses the game Factorio to test whether AI can handle complex, open-ended engineering challenges. Agents write Python code to build automated factories, progressing from simple resource extraction (~30 units/min) to sophisticated production chains (millions of units/sec).<p>== What's new in 0.3.0 ==<p>- Headless scaling: No longer needs the game client, enabling massive parallelization!<p>- OpenAI Gym compatibility: Standard interface for RL research<p>- Claude Code integration: We're livestreaming Claude playing Factorio on Twitch: <a href="http://twitch.tv/playsfactorio" rel="nofollow">http://twitch.tv/playsfactorio</a><p>- Better tooling and SDK: 1-line CLI commands to run evaluations (with W&B logging)<p>== Key findings ==<p>We evaluated frontier models (Claude Opus 4.1, GPT-5, Gemini 2.5 Pro, Grok 4) on 24 production automation tasks of increasing complexity.<p>Even the best models struggle:<p>- Most models still rely on semi-manual strategies rather than true automation<p>- Agents rarely define helper functions or abstractions, limiting their ability to scale<p>- Error recovery remains difficult – agents often get stuck in repetitive failure loops<p>The performance gap between models on FLE correlates more closely with real-world task benchmarks (like GDPVal) than with traditional coding/reasoning evals.<p>== Why this matters ==<p>Unlike benchmarks based on exams that saturate quickly, Factorio's exponential complexity scaling means there's effectively no performance ceiling. 
The skills needed - system debugging, constraint satisfaction, logistics optimization - transfer directly to real challenges.<p>== Try it yourself ==<p>>>> uv add factorio-learning-environment<p>>>> uv add "factorio-learning-environment[eval]"<p>>>> fle cluster start<p>>>> fle eval --config configs/gym_run_config.json<p>We're looking for researchers, engineers, and modders interested in pushing the boundaries of agent capabilities. Join our Discord if you want to contribute. We look forward to meeting you and seeing what you can build!<p>-- FLE Team</p>
<hr>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=45466865">https://news.ycombinator.com/item?id=45466865</a></p>
<p>Points: 75</p>
<p># Comments: 17</p>
]]></description><pubDate>Fri, 03 Oct 2025 19:32:37 +0000</pubDate><link>https://jackhopkins.github.io/factorio-learning-environment/versions/0.3.0.html</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45466865</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45466865</guid></item><item><title><![CDATA[New comment by noddybear in "Why is choral music harder to appreciate?"]]></title><description><![CDATA[
<p>"Spem in Alium" is the most beautiful piece of music in existence for me: <a href="https://www.youtube.com/watch?v=iT-ZAAi4UQQ" rel="nofollow">https://www.youtube.com/watch?v=iT-ZAAi4UQQ</a><p>At 40 distinct melodies, it is certainly the 'grandest' piece in early English church music.</p>
]]></description><pubDate>Mon, 25 Aug 2025 05:13:11 +0000</pubDate><link>https://news.ycombinator.com/item?id=45010436</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=45010436</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=45010436</guid></item><item><title><![CDATA[New comment by noddybear in "The Economist's global rip off"]]></title><description><![CDATA[
<p>The onus is on you to present evidence to justify your claim. Without actual data beyond anecdotes, your claim can and should be dismissed.</p>
]]></description><pubDate>Mon, 19 May 2025 22:18:57 +0000</pubDate><link>https://news.ycombinator.com/item?id=44035553</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=44035553</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=44035553</guid></item><item><title><![CDATA[New comment by noddybear in "Multi-Agent Coordination in Factorio: FLE v0.2.0"]]></title><description><![CDATA[
<p>1. These are additions to our existing Factorio Learning Environment, which is an extensive agent environment for evaluating pre-trained LLM agents in an unbounded/open-ended setting in the game of Factorio. I don't agree that it is trivial, as there is significant infrastructure in place to support Factorio as an LLM eval.<p>2. Factorio is an unsolved game in multi-agent research.<p>3. This is a research environment. You can read our paper on arXiv if you're interested! Nobody will make any money off this.</p>
]]></description><pubDate>Thu, 08 May 2025 15:36:34 +0000</pubDate><link>https://news.ycombinator.com/item?id=43927170</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43927170</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43927170</guid></item><item><title><![CDATA[New comment by noddybear in "Multi-Agent Coordination in Factorio: FLE v0.2.0"]]></title><description><![CDATA[
<p>Hey everyone,<p>It's Mart, Neel and Jack from the Factorio Learning Environment team.<p>Since our initial release, we have been working hard to expand the environment to support multi-agent scenarios, reasoning models and MCP for human-in-the-loop evals.<p>We have also spent time experimenting with different ways to elicit more performance out of agents in the game, namely tools for vision and reflection.<p>Today, we are proud to release v0.2.0, which includes several exciting new features and improvements.<p>Thanks for checking this out.</p>
]]></description><pubDate>Thu, 08 May 2025 15:09:29 +0000</pubDate><link>https://news.ycombinator.com/item?id=43926830</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43926830</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43926830</guid></item><item><title><![CDATA[Multi-Agent Coordination in Factorio: FLE v0.2.0]]></title><description><![CDATA[
<p>Article URL: <a href="https://jackhopkins.github.io/factorio-learning-environment/release.0.2.0">https://jackhopkins.github.io/factorio-learning-environment/release.0.2.0</a></p>
<p>Comments URL: <a href="https://news.ycombinator.com/item?id=43926829">https://news.ycombinator.com/item?id=43926829</a></p>
<p>Points: 13</p>
<p># Comments: 5</p>
]]></description><pubDate>Thu, 08 May 2025 15:09:29 +0000</pubDate><link>https://jackhopkins.github.io/factorio-learning-environment/release.0.2.0</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43926829</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43926829</guid></item><item><title><![CDATA[New comment by noddybear in "smartfunc: Turn Docstrings into LLM-Functions"]]></title><description><![CDATA[
<p>Cool! Looks a lot like Tanuki: <a href="https://github.com/Tanuki/tanuki.py">https://github.com/Tanuki/tanuki.py</a></p>
]]></description><pubDate>Thu, 10 Apr 2025 16:53:24 +0000</pubDate><link>https://news.ycombinator.com/item?id=43645814</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43645814</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43645814</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>This specific approach relied on: a) availability of multiplayer servers, and b) a remotely accessible console.<p>I know Minecraft works in the same way - but I’m not sure about RPGs like NMS.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:40:20 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343157</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343157</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343157</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>If I understand you correctly, this approach is sort of supported in FLE - the agents can create functions that encapsulate more complex logic. However, interaction is still synchronous/turn-based. I think to do what you propose, you will need to create event listeners that can trigger the agent's program whenever appropriate.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:38:51 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343146</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343146</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343146</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>The idea is for us to track all frontier models using the basic agent (goal, tooling info), and then offer another leaderboard for different agent architectures (with retrieval etc).</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:34:18 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343089</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343089</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343089</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>Oh, super interesting! Create 10 scenarios containing working factories, and ‘drop out’ entities to break the factory in different ways. Great idea.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:32:17 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343072</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343072</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343072</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>There was a Black Mirror episode about this too, I seem to remember! Soldiers imagining they were fighting monsters - while actually committing war crimes.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:29:46 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343042</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343042</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343042</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>This is true - there are simpler benchmarks that can saturate planning for these models. We were motivated to create a broader spectrum eval, to test multiple capabilities at once and remain viable into the future.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:28:50 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343033</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343033</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343033</guid></item><item><title><![CDATA[New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"]]></title><description><![CDATA[
<p>One of us works at Anthropic - but we had no insider access to any models or weights. All of our evals were on public models.</p>
]]></description><pubDate>Wed, 12 Mar 2025 13:26:58 +0000</pubDate><link>https://news.ycombinator.com/item?id=43343019</link><dc:creator>noddybear</dc:creator><comments>https://news.ycombinator.com/item?id=43343019</comments><guid isPermaLink="false">https://news.ycombinator.com/item?id=43343019</guid></item></channel></rss>