
75 points | noddybear | 1 comment

We're excited to release v0.3.0 of the Factorio Learning Environment (FLE), an open-source environment for evaluating AI agents on long-horizon planning, spatial reasoning, and automation tasks.

== What is FLE? ==

FLE uses the game Factorio to test whether AI can handle complex, open-ended engineering challenges. Agents write Python code to build automated factories, progressing from simple resource extraction (~30 units/min) to sophisticated production chains (millions of units/sec).
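To make "agents write Python code" concrete, here is a toy sketch of the kind of program an agent might emit. The API names (`nearest`, `place_entity`, the prototype strings) are stand-ins rather than FLE's actual tool signatures, and they are stubbed below so the snippet runs on its own:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Position:
    x: int
    y: int


# --- stubs standing in for the environment's tool API (hypothetical names) ---
def nearest(resource: str) -> Position:
    """Return the closest patch of the given resource (stubbed)."""
    return Position(10, 4)


def place_entity(prototype: str, position: Position) -> dict:
    """Place an entity at a position and return its state (stubbed)."""
    return {"prototype": prototype, "position": position}


# --- the agent's "policy" is just a Python program over those tools ---
coal = nearest("coal")
drill = place_entity("burner-mining-drill", coal)
```

The real environment executes code like this against a running Factorio server and returns the resulting game state, so the agent's next program can react to what actually got built.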

== What's new in 0.3.0 ==

- Headless scaling: No longer needs the game client, enabling massive parallelization!

- OpenAI Gym compatibility: Standard interface for RL research

- Claude Code integration: We're livestreaming Claude playing Factorio [on Twitch](http://twitch.tv/playsfactorio)

- Better tooling and SDK: 1-line CLI commands to run evaluations (with W&B logging)
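For the Gym compatibility point, the contract RL code expects is the standard reset/step loop. The mock below is self-contained and shows only that interface shape; FLE's actual observation and action spaces (and its environment id) are not reproduced here:

```python
class MockFactoryEnv:
    """Minimal stand-in with the Gymnasium-style reset/step contract.
    Observations and actions here are invented for illustration only."""

    def reset(self, seed=None):
        # Gymnasium's reset returns (observation, info).
        obs = {"tick": 0}
        info = {}
        return obs, info

    def step(self, action):
        # Gymnasium's step returns a five-tuple:
        # (observation, reward, terminated, truncated, info).
        obs = {"tick": 1}
        reward = 1.0 if action == "build" else 0.0
        terminated = False
        truncated = False
        info = {}
        return obs, reward, terminated, truncated, info


env = MockFactoryEnv()
obs, info = env.reset(seed=0)
obs, reward, terminated, truncated, info = env.step("build")
```

Because FLE speaks this interface, existing RL training loops and evaluation harnesses can drive it without custom glue code.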

== Key findings ==

We evaluated frontier models (Claude Opus 4.1, GPT-5, Gemini 2.5 Pro, Grok 4) on 24 production automation tasks of increasing complexity.

Even the best models struggle:

- Most models still rely on semi-manual strategies rather than true automation

- Agents rarely define helper functions or abstractions, limiting their ability to scale

- Error recovery remains difficult – agents often get stuck in repetitive failure loops

The performance gap between models on FLE correlates more closely with real-world task benchmarks (like GDPVal) than with traditional coding/reasoning evals.
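The helper-function finding is concrete: a stuck agent emits one literal call per entity, while an agent that abstracts can scale a build with a parameterised loop. A toy illustration, with the placement call stubbed (the API name is hypothetical):

```python
def place(prototype, x, y):
    """Stub standing in for an environment placement call."""
    return (prototype, x, y)


# Without abstraction: what weaker agents tend to emit -- one literal
# call per machine, which stops scaling past a handful of entities.
row_manual = [
    place("burner-mining-drill", 0, 0),
    place("burner-mining-drill", 3, 0),
    place("burner-mining-drill", 6, 0),
]


# With abstraction: a reusable helper parameterised by count and spacing,
# which an agent could call again for the next resource patch.
def place_row(prototype, count, spacing, y=0):
    return [place(prototype, i * spacing, y) for i in range(count)]


assert place_row("burner-mining-drill", 3, 3) == row_manual
```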

== Why this matters ==

Unlike benchmarks based on exams that saturate quickly, Factorio's exponential complexity scaling means there's effectively no performance ceiling. The skills needed - system debugging, constraint satisfaction, logistics optimization - transfer directly to real challenges.

== Try it yourself ==

$ uv add factorio-learning-environment

$ uv add "factorio-learning-environment[eval]"

$ fle cluster start

$ fle eval --config configs/gym_run_config.json

We're looking for researchers, engineers, and modders interested in pushing the boundaries of agent capabilities. Join our Discord if you want to contribute. We look forward to meeting you and seeing what you can build!

-- FLE Team

dang No.45468372
Related. Others?

Multi-Agent Coordination in Factorio: FLE v0.2.0 - https://news.ycombinator.com/item?id=43926829 - May 2025 (5 comments)

Show HN: Factorio Learning Environment – Agents Build Factories - https://news.ycombinator.com/item?id=43331582 - March 2025 (209 comments)

replies(1): >>45468596 #
noddybear No.45468596
This is our earlier work. Since May we've made it really easy for the community to build their own agents to play the game: you can now hook up your terminal to get Claude Code to play the game.
replies(2): >>45468700 #>>45468923 #
typpilol No.45468923
Is there going to be some kind of plugin support for other games?

I'd love to see Claude play Age of Empires.

Claude plays Command & Conquer.

I already know there's a huge AI StarCraft 2 scene, but I don't think those are LLM AIs.

replies(2): >>45469008 #>>45469598 #
noddybear No.45469008
I am really keen on plugging into Age of Empires 2 - although practically I think we need a couple of years of improvements before LLMs would be smart/fast enough to react to the game in real time. Currently they can't keep up - although specially trained networks could be viable.