
204 points by warrenm | 7 comments
AnotherGoodName
I’ve been working on board game AI lately.

Fwiw nothing beats ‘implement the game logic in full (a huge amount of work), then look 50 moves ahead with pruning on some heuristics’. This is how chess engines work and how all good turn-based game AI works.
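For concreteness, a minimal sketch of that kind of lookahead (plain alpha-beta pruning; GameState, its methods, and evaluate() are hypothetical stand-ins for the hand-written game logic):

    # Minimal alpha-beta lookahead sketch. GameState and evaluate() are
    # hypothetical stand-ins for the fully implemented game rules.
    def alphabeta(state, depth, alpha, beta, maximizing):
        if depth == 0 or state.is_terminal():
            return state.evaluate()   # heuristic score from the maximizing player's view
        if maximizing:
            best = float("-inf")
            for move in state.legal_moves():
                best = max(best, alphabeta(state.apply(move), depth - 1, alpha, beta, False))
                alpha = max(alpha, best)
                if beta <= alpha:     # prune: the opponent already has a better option
                    break
            return best
        else:
            best = float("inf")
            for move in state.legal_moves():
                best = min(best, alphabeta(state.apply(move), depth - 1, alpha, beta, True))
                beta = min(beta, best)
                if beta <= alpha:
                    break
            return best

Everything here leans on the hand-implemented rules and on the quality of the evaluate() heuristic.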

I’ve tried throwing masses of game-state data at the latest models in PyTorch. Unusable. It makes really dumb moves. One big issue is that it often suggests invalid moves, and the best way to avoid that is to implement the board game logic in full to validate them. At which point, why don’t I just do the scan-ahead-X-moves approach above, since I have to do the hard part of manually building the world model anyway?
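In practice that validation is just a legality filter around whatever the model proposes (a sketch; game and proposed_moves are hypothetical, with game being the hand-written rules implementation):

    def pick_valid_move(game, state, proposed_moves):
        # Keep the first suggestion that the hand-written rules accept.
        legal = set(game.legal_moves(state))
        for move in proposed_moves:
            if move in legal:
                return move
        return next(iter(legal))   # fall back to any legal move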

One area where current AI does help is the heuristics themselves for evaluating the best moves when scanning ahead. You can feed in various game states, along with whether the player eventually won, to train the values of those heuristics. You still need to implement the world model and look ahead to use the heuristics, though! When you hear of neural networks being used for Go or chess, this is where they are used. You still need to build the world model and brute-force scan ahead.
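A rough sketch of that heuristic-training step in PyTorch, assuming you already have (encoded state, eventual win) pairs from played-out games; the encoding size, network shape, and names are illustrative, not from the comment:

    import torch
    import torch.nn as nn

    # Illustrative value network: encoded board state -> win probability.
    # The 512-feature encoding and the layer sizes are placeholders.
    value_net = nn.Sequential(
        nn.Linear(512, 256), nn.ReLU(),
        nn.Linear(256, 1), nn.Sigmoid(),
    )
    optimizer = torch.optim.Adam(value_net.parameters(), lr=1e-3)
    loss_fn = nn.BCELoss()

    def train_step(states, outcomes):
        """states: (N, 512) float tensor; outcomes: (N, 1) tensor of 0/1 wins."""
        optimizer.zero_grad()
        loss = loss_fn(value_net(states), outcomes)
        loss.backward()
        optimizer.step()
        return loss.item()

The trained value_net then serves as the evaluation heuristic inside the hand-built lookahead search; it does not replace the world model.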

One path I do want to try more: in theory, coding assistants should be able to read rulebooks and dynamically generate code to represent those rules. If you can do that part, the rest should be easy. I.e. it could be possible to throw a rulebook at an AI and have it play the game. It would generate a world model from the rulebook via a coding assistant, then scan ahead more moves than humanly possible using that world model, evaluating against heuristics trained through trial and error.
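My own speculation on how that could be wired up: fix an interface that the rulebook-generated code has to implement, so the search and heuristic layers stay generic. All names below are hypothetical:

    from typing import Any, Protocol, Sequence

    class Game(Protocol):
        """Hypothetical surface a rulebook-derived implementation would target.
        The search and the heuristics only ever talk to the rules through this."""
        def initial_state(self) -> Any: ...
        def legal_moves(self, state: Any) -> Sequence[Any]: ...
        def apply(self, state: Any, move: Any) -> Any: ...
        def is_terminal(self, state: Any) -> bool: ...
        def winner(self, state: Any) -> Any: ...  # None while the game is in progress

The hard part is producing a correct implementation of something like this from a rulebook, not the search that sits on top of it.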

Of course coding assistants aren’t at a point where you can throw rulebooks at them to generate an internal representation of game states. I should know. I just spent weeks building the game model even with a coding assistant.

smokel
You probably know this, but things heavily depend on the type of board game you are trying to solve.

In Go, for instance, it does not help much to look 50 moves ahead. The branching factor is far too high for that to be feasible, and determining who’s ahead is far from trivial. It’s in these situations that modern AI (reinforcement learning, deep neural networks) helps tremendously.

Also note that nobody said that using AI is easy.

AnotherGoodName
AlphaGo (and Stockfish, which another commenter mentioned) still has to search ahead using a world model. The AI training just helps with the heuristics for pruning and evaluating that search.
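To illustrate that division of labor, a heavily simplified AlphaGo-style PUCT sketch; game is the hand-built world model, net(state) is assumed to return a dict of move priors plus a value, and terminal_value() is a hypothetical helper:

    import math

    class Node:
        def __init__(self, prior):
            self.prior = prior       # move prior from the learned policy
            self.visits = 0
            self.value_sum = 0.0     # values from the viewpoint of the player to move here
            self.children = {}       # move -> Node

        def q(self):
            return self.value_sum / self.visits if self.visits else 0.0

    def select(node, c_puct=1.5):
        # PUCT: the learned priors bias ("prune") the search, Q tracks results so far.
        # A child's Q is from the opponent's viewpoint, hence the negation.
        sqrt_total = math.sqrt(sum(c.visits for c in node.children.values()) + 1)
        return max(node.children.items(),
                   key=lambda mc: -mc[1].q()
                   + c_puct * mc[1].prior * sqrt_total / (1 + mc[1].visits))

    def simulate(game, net, state, node):
        """One simulation; returns a value for the player to move in `state`."""
        if game.is_terminal(state):
            return game.terminal_value(state)  # e.g. -1 loss, 0 draw, +1 win
        if not node.children:                  # leaf: expand with the network, no rollout
            priors, value = net(state)
            for move in game.legal_moves(state):
                node.children[move] = Node(priors[move])
            return value
        move, child = select(node)
        child_value = simulate(game, net, game.apply(state, move), child)
        child.visits += 1
        child.value_sum += child_value         # stored from the child player's viewpoint
        return -child_value                    # flip sign back to the current player

Run simulate() a few hundred times from the root and play the most-visited child. The point relevant to this thread: game, the human-written world model, is still queried at every step of the search; the network only shapes where the search spends its time and how leaves are scored.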

The big fundamental blocker to a generic ‘can play any game’ AI is the manual implementation of the world model. If you read the AlphaGo paper you’ll see ‘we started with nothing but an implementation of the game rules’. That’s the part we’re missing; it’s done by humans.

moyix
Note that MuZero did better than AlphaGo, without access to preprogrammed rules: https://en.wikipedia.org/wiki/MuZero
smokel
Minor nitpick: it does not use preprogrammed rules for scanning through the search tree, but it does use preprogrammed rules to enforce that no illegal moves are made during play.
hulium
During play, yes, obviously you need an implementation of the game to play it. But in its planning tree, no:

> MuZero only masks legal actions at the root of the search tree where the environment can be queried, but does not perform any masking within the search tree. This is possible because the network rapidly learns not to predict actions that never occur in the trajectories it is trained on.

https://arxiv.org/pdf/1911.08265
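Illustratively, the root-only masking described there looks roughly like this (a sketch, not MuZero's actual code; env, net, and their methods are hypothetical):

    def run_search(env, net, root_state, num_simulations):
        priors, _value = net(root_state)              # learned policy/value for the root
        legal = set(env.legal_moves(root_state))      # the real rules, queried only here
        # Root only: mask out moves the real environment says are illegal.
        root_priors = {move: p for move, p in priors.items() if move in legal}

        for _ in range(num_simulations):
            # Deeper in the tree there is no masking: states and candidate moves come
            # from the learned dynamics/policy model alone, which has learned not to
            # put probability on moves it never saw in its training trajectories.
            pass
        return root_priors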

skywhopper
That is exactly what the commenter was saying.
gnfargbl
The more detailed clarification of what "preprogrammed rules" actually means in this case made the entire discussion significantly clearer to me. I think it was helpful.
Zacharias030
It is consistent with what the commenter was saying.

In any case, for Go - with a mild amount of expert knowledge - this limitation is most likely irrelevant except in very rare endgame situations or special superko setups, where a lack of moves or solutions pushes some probability onto moves that look like wishful thinking.

I think this is not a significant limitation of the work (not that any parent claimed otherwise). MuZero is acting in an environment with prescribed actions; it’s just "planning with a learned model", without access to the simulation environment.

—-

What I am less convinced by is the claim that MuZero reaches higher performance than previous AlphaZero variants. What is the comparison based on? Iso-FLOPs, iso-search-depth, iso-self-play games, iso-wallclock time? What would make sense here?

The model in each AlphaGo paper was trained on some sort of embarrassingly parallel compute cluster, but every paper included the punchline for general audiences that "in just 30 hours" some performance level was reached.

CGamesPlay
This is true, and MuZero’s paper notes that it did better with less computation than AlphaZero. But it still used about 10x more computation to get there than AlphaGo, which was "bootstrapped" with human expert moves. I think this is very important context for anyone trying to implement an AI for their own game.