
749 points | noddybear | 5 comments

I'm Jack, and I'm excited to share a project that has channeled my Factorio addiction recently: the Factorio Learning Environment (FLE).

FLE is an open-source framework for developing and evaluating LLM agents in Factorio. It provides a controlled environment where AI models can attempt complex automation, resource management, and optimisation tasks in a grounded world with meaningful constraints.

A critical advantage of Factorio as a benchmark is its unbounded nature. Unlike many evals that are quickly saturated by newer models, Factorio's geometric complexity scaling means it won't be "solved" in the next 6 months (or possibly even years). This allows us to meaningfully compare models by the order-of-magnitude of resources they can produce - creating a benchmark with longevity.

The project began 18 months ago after years of playing Factorio, recognising its potential as an AI research testbed. A few months ago, our team (myself, Akbir, and Mart) came together to create a benchmark that tests agent capabilities in spatial reasoning and long-term planning.

Two technical innovations drove this project forward: First, we discovered that piping Lua into the Factorio console over TCP enables running (almost) arbitrary code without directly modding the game. Second, we developed a first-class Python API that wraps these Lua programs to provide a clean, type-hinted interface for AI agents to interact with Factorio through familiar programming paradigms.
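Factorio exposes its console over RCON, a simple TCP protocol, which is presumably the channel being described here. As a rough illustration (FLE's actual transport code may differ), a Source-RCON packet wrapping a Lua command can be built like this — `encode_rcon_packet` is a hypothetical helper name, not part of FLE's API:

```python
import struct

def encode_rcon_packet(request_id: int, packet_type: int, body: str) -> bytes:
    """Encode a Source-RCON packet: little-endian length, id, type,
    null-terminated body, and an empty trailing string."""
    payload = struct.pack("<ii", request_id, packet_type) + body.encode() + b"\x00\x00"
    # The length field counts everything after itself: 8 bytes of header + body + 2 nulls.
    return struct.pack("<i", len(payload)) + payload

# A Lua command as it might be sent to the Factorio console;
# /silent-command avoids echoing to in-game chat, rcon.print returns output.
lua = "/silent-command rcon.print(game.tick)"
packet = encode_rcon_packet(1, 2, lua)  # type 2 = SERVERDATA_EXECCOMMAND
```

Sending that packet to the server's RCON port (and reading the framed response the same way) is enough to run arbitrary Lua without modding the game.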

Agents interact with FLE through a REPL pattern:
1. Observe the world (seeing the output of their last action)
2. Generate Python code to perform their next action
3. Receive detailed feedback (including exceptions and stdout)
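The execute/feedback half of that loop can be sketched as follows — this is a toy illustration with hypothetical names (`run_step`), not FLE's real API, and a production system would sandbox the generated code rather than `exec` it directly:

```python
import contextlib
import io
import traceback

def run_step(code: str, namespace: dict) -> str:
    """Execute agent-generated Python, capturing stdout and any exception
    so it can be fed back to the model as the next observation."""
    buf = io.StringIO()
    try:
        with contextlib.redirect_stdout(buf):
            exec(code, namespace)  # unsafe outside a sandbox; illustration only
    except Exception:
        buf.write(traceback.format_exc())
    return buf.getvalue()

# One iteration: state persists between steps, feedback becomes the next prompt.
state: dict = {}
feedback = run_step("x = 2 + 2\nprint('placed', x, 'drills')", state)
```

Because exceptions come back as text rather than crashing the loop, the agent can read a traceback and retry — the same way a human iterates in a REPL.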

We provide two main evaluation settings:
- Lab-play: 24 structured tasks with fixed resources
- Open-play: an unbounded task of building the largest possible factory on a procedurally generated map

We found that while LLMs show promising short-horizon skills, they struggle with spatial reasoning in constrained environments. They can discover basic automation strategies (like electric-powered drilling) but fail to achieve more complex automation (like electronic circuit manufacturing). Claude 3.5 Sonnet is currently the best model (by a significant margin).

The code is available at https://github.com/JackHopkins/factorio-learning-environment.

You'll need:
- Factorio (version 1.1.110)
- Docker
- Python 3.10+

The README contains detailed installation instructions and examples of how to run evaluations with different LLM agents.

We would love to hear your thoughts and see what others can do with this framework!

p10jkle:
Wow, fascinating. I wonder if in a few years every in-game opponent will just be an LLM with access to a game-controlling API like the one you've created.

Did you find there are particular types of tasks that the models struggle with? Or does difficulty mostly just scale with the number of items they need to place?

noirscape:
It's very unlikely that you'll see mass use of LLMs as opponents. Enemy AI in most games doesn't need the level of sophistication that machine learning offers (ignoring computational costs for a moment).

The main goal of an enemy AI isn't to be the hardest thing in the world; it's to provide an interesting challenge for the player to overcome. Making a hypercompetent AI isn't necessarily difficult in most games, but it also wouldn't be very interesting to play against. Most games have a finite space of logical states, just large enough that a human has trouble finding every solution (although humans tend to be very good at probing the edges of that space to find ways around it).

Even in games where the amount of state is much higher than usual, you rarely want a super AI; nobody likes playing against an aimbot in an FPS for example.

Factorio is an outlier because, unlike regular games, the true condition for "victory" is almost entirely up to the player. You can launch a rocket in non-DLC Factorio (the game's victory condition) without building any factory at all beyond the most basic structures for items you can't handcraft. It would be extremely slow, but it's an option. That's why the benchmark for this sort of thing is more about efficiency than about "can this work at all".

PetitPrince:
As an opponent that would indeed be unfun, but as a sparring partner or coach in a competitive game (fighting game? RTS? MOBA? puzzle game?) it could be genuinely useful.
fragmede:
Civilization (VII just released) is famous for making its harder difficulties harder by letting the AI cheat. If the game were harder because the AI was smarter rather than because it cheats, that alone would make upgrading worth it to players!