Show HN: Factorio Learning Environment – Agents Build Factories

Very cool and also pretty expected results tbh. Some thoughts:

Factorio is a game that requires SIGNIFICANT amounts of thinking ahead, often requiring investments into things that won't pay off until much later and which might even significantly hamper initial development. Building a main bus vs spaghetti belts is one of the obvious examples here.

Humans with a little bit of experience playing Factorio know that while building 1 item/s of some new resource is good, the game is about eventually building thousands of the new item. Until the LLM learns not to be short-term minded, it will probably build itself into a corner very quickly.

It is kind of amazing that these models manage to figure out a strategy at all, considering the game is not in their training set. That said, the current research goals are not very good IMO. Building the largest possible base has the predictable result of the AI building a humongous belt loop covering much of the map. A much better target would be the "standard" goal of SPM (science per minute).

I think 99% of Factorio could be "solved" with GOFAI algorithms from the 80s and enough processing power. Set up a goal like 10k SPM and then work backwards to how many of each resource you need, then recursively figure out the fastest way to set up the production for each subresource using standard optimization algorithms from OR. No LLMs needed.
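To make the "work backwards from the goal" step concrete, here is a minimal sketch of the recursive rate calculation. The recipe table, the item names, and the helpers `required_rates` and `machines_needed` are all made up for illustration and are not taken from the game's data or from FLE. It only covers the bookkeeping part, not the layout or OR-style optimization.

```python
from collections import defaultdict
from fractions import Fraction

# Hypothetical, simplified recipes (illustrative numbers, not real game data):
# item -> (crafting time in seconds, items produced per craft, {ingredient: count})
RECIPES = {
    "science_pack": (5, 1, {"gear": 1, "copper_plate": 1}),
    "gear": (1, 1, {"iron_plate": 2}),
}


def required_rates(item, rate, totals=None):
    """Accumulate the items/second needed of every item to sustain `rate` of `item`."""
    if totals is None:
        totals = defaultdict(Fraction)
    totals[item] += rate
    if item in RECIPES:  # items without a recipe are treated as raw inputs
        _, out_count, ingredients = RECIPES[item]
        for ingredient, count in ingredients.items():
            required_rates(ingredient, rate * Fraction(count, out_count), totals)
    return totals


def machines_needed(item, rate, crafting_speed=1):
    """Number of machines (possibly fractional) needed to craft `item` at `rate`."""
    craft_time, out_count, _ = RECIPES[item]
    return rate * craft_time / (out_count * crafting_speed)


# 60 science per minute is 1 pack per second.
totals = required_rates("science_pack", Fraction(60, 60))
```

With this toy table, `totals` says how many plates per second the mines and smelters have to deliver, and `machines_needed` turns each intermediate rate into an assembler count. A real solver would also have to handle recipes with multiple outputs, modules, and belt throughput.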

What would be very interesting is using LLMs to generate GOFAI methods. It's often not at all obvious how to do so. That being said, it's still hard to express goals and resources to LLMs in natural language. I've been trying different things, and none has worked well enough for me to call it a step improvement. It's also hard to come up with a dataset for these use cases.

FLE agents can technically implement their own Python libraries and leverage GOFAI to do the heavy lifting. No one has actually attempted this yet, though. It would be interesting to see if this can be achieved just by modifying the manual given to the agents to bias them in favour of this approach.

That does sound interesting. I might attempt it. Thanks for this benchmark, I could totally use it for my PhD (I started with GOFAI but have hit a dead end. My advisor is suggesting pivoting to using LLMs to call my GOFAI framework).

Feel free to create an issue in the repo - am totally happy to help however I am able! I think that the only change you'll have to make is to expose your GoFAI framework in the 'Namespace' object which the agents have access to (for them to call it directly). Alternatively you could design a new tool which takes in game objects and generates a solution / typed object output.
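A rough sketch of what "exposing a framework to the agent" could look like in general. Everything here is a stand-in: `AgentNamespace`, `expose`, and `plan_furnaces` are hypothetical names, and this is not FLE's actual `Namespace` class or API, so check the repo for the real interface.

```python
import math
from typing import Callable, Dict


class AgentNamespace:
    """Hypothetical stand-in for the object agent code runs against (not FLE's real class)."""

    def __init__(self) -> None:
        self.tools: Dict[str, Callable] = {}

    def expose(self, name: str, fn: Callable) -> None:
        # Register the callable and make it reachable as an attribute,
        # so agent-written code can call `namespace.<name>(...)` directly.
        self.tools[name] = fn
        setattr(self, name, fn)


def plan_furnaces(ore_per_second: float, furnace_rate: float) -> int:
    """Toy 'solver': how many furnaces are needed to smelt a given ore flow.

    `furnace_rate` is plates per second per furnace, supplied by the caller.
    """
    return math.ceil(ore_per_second / furnace_rate)


namespace = AgentNamespace()
namespace.expose("plan_furnaces", plan_furnaces)

# Agent-side call: the LLM only has to pick the tool and its arguments.
furnaces = namespace.plan_furnaces(ore_per_second=2.0, furnace_rate=0.5)
```

The idea is that the solver stays ordinary Python, and the agent's job shrinks to choosing which planner to call and with what inputs.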