Hacker News: noddybear

New comment by noddybear in "Starship V3"

noddybear — Wed, 13 May 2026 08:58:46 +0000

An EMP from a high altitude nuclear detonation would do the trick.

New comment by noddybear in "Yann LeCun to depart Meta and launch AI startup focused on 'world models'"

noddybear — Wed, 12 Nov 2025 13:40:41 +0000

Nah, it's all pattern matching. This is how automated theorem provers like Isabelle are built: they apply operations to lemmas/expressions to reach proofs.
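To make the pattern-matching point concrete, here is a toy sketch of rewriting an expression with a few algebraic rules. It is only an illustration of the general idea: the tuple encoding, the rule set, and the name `simplify` are all made up for this example, and it is not how Isabelle itself is implemented.

```python
from typing import Tuple, Union

# Hypothetical encoding: an expression is a variable name, an integer,
# or a tuple (operator, left, right).
Expr = Union[str, int, Tuple]


def simplify(expr: Expr) -> Expr:
    """Rewrite bottom-up using a few identities matched by pattern."""
    if not isinstance(expr, tuple):
        return expr  # variables and constants are already in normal form
    op, left, right = expr
    left, right = simplify(left), simplify(right)
    if op == "+" and right == 0:  # x + 0 -> x
        return left
    if op == "+" and left == 0:  # 0 + x -> x
        return right
    if op == "*" and (left == 0 or right == 0):  # x * 0 -> 0
        return 0
    if op == "*" and right == 1:  # x * 1 -> x
        return left
    if op == "*" and left == 1:  # 1 * x -> x
        return right
    return (op, left, right)  # no rule matched at this node


# (x * 1) + (y * 0) rewrites to x
result = simplify(("+", ("*", "x", 1), ("*", "y", 0)))
```

Each rule is just a pattern on the shape of the expression plus a replacement, which is the sense of "pattern matching" meant here.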

New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"

noddybear — Fri, 03 Oct 2025 23:34:53 +0000

I am really keen on plugging into Age of Empires 2, although practically I think we need a couple of years of improvements before LLMs are smart and fast enough to react to the game in real time. Specially trained networks could be viable, though.

New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"

noddybear — Fri, 03 Oct 2025 23:27:31 +0000

Thank you!

New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"

noddybear — Fri, 03 Oct 2025 22:30:56 +0000

Biters are disabled, but cliffs are not

New comment by noddybear in "Show HN: FLE v0.3 – Claude Code Plays Factorio"

noddybear — Fri, 03 Oct 2025 22:30:42 +0000

This is our earlier work. Since May we've made it really easy for the community to build their own agents to play the game: you can now hook up your terminal to get Claude Code to play the game.

Show HN: FLE v0.3 – Claude Code Plays Factorio

noddybear — Fri, 03 Oct 2025 19:32:37 +0000

We're excited to release v0.3.0 of the Factorio Learning Environment (FLE), an open-source environment for evaluating AI agents on long-horizon planning, spatial reasoning, and automation tasks.

== What is FLE? ==

FLE uses the game Factorio to test whether AI can handle complex, open-ended engineering challenges. Agents write Python code to build automated factories, progressing from simple resource extraction (~30 units/min) to sophisticated production chains (millions of units/sec).

== What's new in 0.3.0 ==

- Headless scaling: No longer needs the game client, enabling massive parallelization!

- OpenAI Gym compatibility: Standard interface for RL research

- Claude Code integration: We're livestreaming Claude playing Factorio [on Twitch](http://twitch.tv/playsfactorio)

- Better tooling and SDK: 1-line CLI commands to run evaluations (with W&B logging)

== Key findings ==

We evaluated frontier models (Claude Opus 4.1, GPT-5, Gemini 2.5 Pro, Grok 4) on 24 production automation tasks of increasing complexity.

Even the best models struggle:

- Most models still rely on semi-manual strategies rather than true automation

- Agents rarely define helper functions or abstractions, limiting their ability to scale

- Error recovery remains difficult – agents often get stuck in repetitive failure loops

The performance gap between models on FLE correlates more closely with real-world task benchmarks (like GDPVal) than with traditional coding/reasoning evals.

== Why this matters ==

Unlike benchmarks based on exams that saturate quickly, Factorio's exponential complexity scaling means there's effectively no performance ceiling. The skills needed - system debugging, constraint satisfaction, logistics optimization - transfer directly to real challenges.

== Try it yourself ==

>>> uv add factorio-learning-environment

>>> uv add "factorio-learning-environment[eval]"

>>> fle cluster start

>>> fle eval --config configs/gym_run_config.json

We're looking for researchers, engineers, and modders interested in pushing the boundaries of agent capabilities. Join our Discord if you want to contribute. We look forward to meeting you and seeing what you can build!

-- FLE Team

Comments URL: https://news.ycombinator.com/item?id=45466865

Points: 75

# Comments: 17

New comment by noddybear in "Why is choral music harder to appreciate?"

noddybear — Mon, 25 Aug 2025 05:13:11 +0000

"Spem in Alium" is the most beautiful piece of music in existence for me: https://www.youtube.com/watch?v=iT-ZAAi4UQQ

At 40 distinct melodies, it is certainly the 'grandest' piece in early English church music.

New comment by noddybear in "The Economist's global rip off"

noddybear — Mon, 19 May 2025 22:18:57 +0000

The onus is on you to present evidence to justify your claim. Without actual data beyond anecdotes, your claim can and should be dismissed.

New comment by noddybear in "Multi-Agent Coordination in Factorio: FLE v0.2.0"

noddybear — Thu, 08 May 2025 15:36:34 +0000

1. These are additions to our existing Factorio Learning Environment, which is an extensive agent environment for evaluating pre-trained LLM agents in an unbounded/open-ended setting in the game of Factorio. I don't agree that it is trivial, as there is significant infrastructure in place to support Factorio as an LLM eval.

2. Factorio is an unsolved game in multi-agent research.

3. This is a research environment. You can read our paper on arXiv if you're interested! Nobody will make any money off this.

New comment by noddybear in "Multi-Agent Coordination in Factorio: FLE v0.2.0"

noddybear — Thu, 08 May 2025 15:09:29 +0000

Hey everyone,

It's Mart, Neel and Jack from the Factorio Learning Environment team.

Since our initial release, we have been working hard to expand the environment to support multi-agent scenarios, reasoning models and MCP for human-in-the-loop evals.

We have also spent time experimenting with different ways to elicit more performance out of agents in the game, namely tools for vision and reflection.

Today, we are proud to release v0.2.0, which includes several exciting new features and improvements.

Thanks for checking this out.

Multi-Agent Coordination in Factorio: FLE v0.2.0

noddybear — Thu, 08 May 2025 15:09:29 +0000

Article URL: https://jackhopkins.github.io/factorio-learning-environment/release.0.2.0

Comments URL: https://news.ycombinator.com/item?id=43926829

Points: 13

# Comments: 5

New comment by noddybear in "smartfunc: Turn Docstrings into LLM-Functions"

noddybear — Thu, 10 Apr 2025 16:53:24 +0000

Cool! Looks a lot like Tanuki: https://github.com/Tanuki/tanuki.py

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:40:20 +0000

This specific approach relied on: a) availability of multiplayer servers, and b) a remotely accessible console.

I know Minecraft works in the same way - but I’m not sure about RPGs like NMS.

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:38:51 +0000

If I understand you correctly, this approach is sort of supported in FLE: the agents can create functions that encapsulate more complex logic. However, interaction is still synchronous/turn-based. I think to do what you propose, you would need to create event listeners that can trigger the agent's program whenever appropriate.
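A rough sketch of what such event listeners could look like, purely as an illustration. None of this is FLE's API: `EventBus`, the event names, and the handler are hypothetical, and a real version would need the game to emit these events.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Handler = Callable[[dict], None]


class EventBus:
    """Hypothetical dispatcher that runs agent code when an event fires."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Handler]] = defaultdict(list)

    def on(self, event_name: str, handler: Handler) -> None:
        """Register a handler for a named event."""
        self._handlers[event_name].append(handler)

    def emit(self, event_name: str, payload: dict) -> int:
        """Call every handler for the event; return how many ran."""
        handlers = self._handlers.get(event_name, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


actions: List[str] = []  # stands in for commands the agent would issue


def refuel_when_empty(payload: dict) -> None:
    # Agent-defined reaction, triggered by the event rather than by a turn.
    actions.append(f"refuel {payload['entity']}")


bus = EventBus()
bus.on("fuel_empty", refuel_when_empty)
bus.emit("fuel_empty", {"entity": "burner-mining-drill"})
bus.emit("ore_depleted", {"entity": "iron-ore"})  # no listener, nothing runs
```

The difference from the current turn-based loop is that the agent registers its reactions once and the environment decides when they run.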

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:34:18 +0000

The idea is for us to track all frontier models using the basic agent (goal, tooling info), and then offer another leaderboard for different agent architectures (with retrieval etc).

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:32:17 +0000

Oh super interesting! Create 10 scenarios containing working factories, and ‘drop out’ entities to break each factory in different ways. Great idea.
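As a minimal sketch of that idea, assuming a factory can be represented as a flat list of entity names (a big simplification; the function name and the representation are hypothetical, not anything in FLE today):

```python
import random
from typing import List, Tuple

Factory = List[str]


def make_broken_variants(
    factory: Factory, n_variants: int, n_removed: int, seed: int = 0
) -> List[Tuple[Factory, Factory]]:
    """Return (broken_factory, removed_entities) pairs for repair tasks."""
    rng = random.Random(seed)  # seeded so the scenarios are reproducible
    variants = []
    for _ in range(n_variants):
        # Choose which positions to drop out of the working factory.
        removed_idx = set(rng.sample(range(len(factory)), n_removed))
        broken = [e for i, e in enumerate(factory) if i not in removed_idx]
        removed = [e for i, e in enumerate(factory) if i in removed_idx]
        variants.append((broken, removed))
    return variants


# Illustrative layout only; a real scenario would also carry positions.
working_factory = [
    "burner-mining-drill",
    "transport-belt",
    "inserter",
    "stone-furnace",
    "wooden-chest",
]
variants = make_broken_variants(working_factory, n_variants=10, n_removed=1)
```

The removed entities double as the ground truth, so an agent's repair can be checked against what was taken out.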

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:29:46 +0000

There was a Black Mirror episode about this too, I seem to remember! Soldiers imagining they were fighting monsters while actually committing war crimes.

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:28:50 +0000

This is true - there are simpler benchmarks that can saturate planning for these models. We were motivated to create a broader spectrum eval, to test multiple capabilities at once and remain viable into the future.

New comment by noddybear in "Show HN: Factorio Learning Environment – Agents Build Factories"

noddybear — Wed, 12 Mar 2025 13:26:58 +0000

One of us works at Anthropic - but we had no insider access to any models or weights. All of our evals were on public models.