Currently there's a very interesting war between small neural networks on the CPU with high-depth alpha-beta search (Stockfish NNUE) and big neural networks on a GPU with Monte Carlo tree search at lower depth (Lc0).
So, while machines beating humans is "solved", chess is very far from solved (just ask the people who have actually solved chess endgames with 8 or fewer pieces).
Even in human chess, people sometimes mistake draw frequency for evidence that both sides are playing optimally, but there are many games where a winning advantage slips away into a draw.
No computer now or in the foreseeable future will be capable of solving chess: it has an average branching factor over 30, and games can run over 100 moves.
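For a rough sense of scale, a back-of-the-envelope bound from those two numbers (my own arithmetic, not from the thread):

    30^{100} = 10^{100 \cdot \log_{10} 30} \approx 10^{148}

That's far more than the estimated ~10^80 atoms in the observable universe, before even considering games longer than 100 moves.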
NNUE already tries to distill a subtree eval into a neural net, but it’s optimized for CPU rather than GPU.
Grandmaster-Level Chess Without Search
It’s still very cool that they could learn a very good eval function that doesn’t require search. I would’ve liked the authors to throw out the games where the Stockfish fallback kicked in though. Even for a human, mate in 2 vs mate in 10 is the difference between a win and a draw/loss on time.
I also would’ve liked to see a head to head with limited search depth Stockfish. That would tell us approximately how much of the search tree their eval function distilled.
As for the limited-search comparison, I like the idea! I think it's tough to measure, since the time it takes to search to various depths varies wildly with the complexity of the position. I feel like you would have to compile a dataset of specific positions identified as requiring significant search depth to find a "good" move.
And limited depth games would not have been difficult to run. You can run a limited search Stockfish on a laptop using the UCI protocol: https://github.com/official-stockfish/Stockfish/wiki/UCI-%26...
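For example, here's a minimal sketch using the python-chess bindings (assuming a Stockfish binary on your PATH; the depth value is arbitrary):

    import chess
    import chess.engine

    # Talk UCI to a local Stockfish binary via python-chess.
    engine = chess.engine.SimpleEngine.popen_uci("stockfish")
    board = chess.Board()
    while not board.is_game_over():
        # Cap the search at a fixed depth rather than a time budget.
        result = engine.play(board, chess.engine.Limit(depth=5))
        board.push(result.move)
    print(board.result())
    engine.quit()

Playing two such engines against each other at different depths (or one against the paper's model) is a few more lines of the same.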
Say I want to play chess with an opponent that is at about the same skill level as me, or perhaps I want to play with an opponent about 100 rating points above me for training.
Most engines let you dumb them down by cutting search depth, but that usually doesn't work well. Sure, you end up beating them about half the time if you cut the search down enough, but it generally feels like they were still outplaying you for much of the game and you won only because they made one or two blunders.
What I want is a computer opponent that plays at a level of my choosing but plays a game that feels like that of a typical human player of that level.
Are there such engines?
Basic computer opponents, on the other hand, can make moves all over the place; they look at the board state holistically. This can be very frustrating to play against as a human who has enough problems just thinking their way through some subset of the board, and is thrown off by the computer again and again.
It's not that bad in chess at least (compared to Go), but it's still something worth keeping in mind if you're trying to make an AI that is fun to play against as an amateur.
The authors of the best neural-network chess engine wrote about this DeepMind publication.
"What would Stockfish Do?"
A more appropriate title, because Stockfish is a search-based system and DeepMind's approach wouldn't work without it.
Oh, btw, this is (yet another) neurosymbolic system of the "compiling system 2 into system 1" type.
This implies the model is around 2500 blitz vs. humans. As blitz Elo ratings are often much higher than those in classical time controls, 2500 on chess.com places it firmly at the "good but not great" level.
I am very curious to know whether the model suffers from the same eval problems against the well-known "anti-bot" openings that Stockfish is susceptible to at limited search depths.
Yeah, no. They are two different rating systems (not Elo, incidentally) with different curves; there isn't a fixed difference you can apply. At the high end of the scale, Lichess ratings are below, not above, chess.com ratings. E.g. Magnus Carlsen is 3131 blitz on Lichess [0] and 3294 blitz on chess.com [1].
This website [2] tries to translate between the sites, and figures that a 2925 Lichess blitz rating (the closest on the website to the paper's reported 2895) translates to about 3000 on chess.com.
[0] Multiple accounts but this is the one I found with the most blitz games: https://lichess.org/@/DrNykterstein/perf/blitz
[1] https://www.chess.com/member/magnuscarlsen
[2] https://chessgoals.com/rating-comparison/#lichesschesscom
What you’re discussing sounds like intuition with checking, which is pretty close to how humans with a moderate degree of skill behave. I haven’t known enough Chess or Go masters to have any claim on how they think. But most of us don’t want an opponent at that level and if we did, we would certainly find a human, or just play against ourselves.
It uses a similar approach to Maia but with a different neural network, so it gets slightly better move-matching performance. On top of that, it runs an expectation-maximization step so that the bot will try to exploit your mistakes.
On Lichess puzzles, GPT-4o with the compiled prompt scores around 70%; I think the 270M transformer is around 95%.
It wouldn't be competitive against top tier players and AI, but I wouldn't be surprised if it could beat me. 'Instantly' knowing the next move would be a cool trick.
They have managed to create one for 7 pieces. Last update on trying to get to an 8-piece database: https://www.chess.com/blog/Rocky64/eight-piece-tablebases-a-...
It uses paradigmatic PyTorch with easy-to-read code, and the architecture is similar to the current best-performing chess neural nets.
> From May to August 2018 Bojun Guo generated 7-piece tables. The 7-piece tablebase contains 423,836,835,667,331 unique legal positions in about 18 Terabytes.
Chess configurations ≈ 4.8 × 10^44; atoms in the universe > 10^70.

https://tromp.github.io/chess/chess.html
https://physics.stackexchange.com/questions/47941/dumbed-dow...
You might be able to pull off a low-resolution lookup table. Take some big but manageable number N (e.g. 10^10), calculate the maximally even distribution of that many points over the total space of chessboard configurations, and make a lookup table for those configs. In play, for configs not in the table, interpolate between the nearest points in the table.
If you want a computer that plays like a human, you will probably need to imitate the way that a human thinks about the game. This means for example thinking about the interactions between pieces and the flow of the game rather than stateless evaluations.
It’s supposedly good up to about 1300, but aside from that, the ability to prompt can make the style of play somewhat tunable: e.g. aggressive, defensive, etc.
The resolution isn't great, and adding search on top can be used to develop an implicit measure of how accurate the function is (i.e., the probability that the move suggested in a position remains unchanged after searching the move tree for better alternatives).
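A minimal sketch of that metric, using a shallow Stockfish search as a stand-in for the search-free eval (python-chess; the function name and depths are mine):

    import chess
    import chess.engine

    def agreement_rate(fens, shallow=1, deep=16):
        # Fraction of positions where the shallow move survives deeper search.
        engine = chess.engine.SimpleEngine.popen_uci("stockfish")
        agree = 0
        for fen in fens:
            board = chess.Board(fen)
            fast = engine.play(board, chess.engine.Limit(depth=shallow)).move
            slow = engine.play(board, chess.engine.Limit(depth=deep)).move
            agree += fast == slow
        engine.quit()
        return agree / len(fens)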
I'm also curious whether it would be possible to mimic certain playing styles. Two beginners can have the same rating, but one might lose because they have a weak opening and the other because they mess up the endgame, for example.
Random mistakes don't mimic human play very well.
As far as I can tell, they got rid of this feature. It was the only computer opponent that felt real. Like it made a human mistake when put under pressure, rather than just playing like a computer and randomly deciding to play stupid.
I'm curious how you combined Stockfish with your own model - but no worries if you're keeping the secret sauce a secret. All the best to you in building out this app!
Since the whole thing is executed in the browser (including the model), there aren't a ton of secrets for me to keep. Essentially it is expectation maximization: the bot tries to find the move with the highest value. What is "value"? It is the dot product between the probability distribution coming out of the model and the centipawn evaluations from Stockfish.
In other words if the model thinks you will blunder with high probability, it will try to steer you towards making that mistake.
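A toy version of that computation as I understand it (numpy; all names are mine, since the app's actual code isn't shown):

    import numpy as np

    def expected_value(reply_probs: np.ndarray, reply_cp: np.ndarray) -> float:
        # reply_probs: model's predicted distribution over the human's replies.
        # reply_cp: Stockfish centipawn eval of each reply, from the bot's side.
        return float(np.dot(reply_probs, reply_cp))

    # The bot plays the candidate move whose predicted reply distribution
    # yields the highest expected value, steering the human toward
    # positions where a blunder is likely.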
AlphaZero was the successor to AlphaGo. AZ was notable because unlike AG, it used zero human games to learn to play: it just played games against itself. Typically in “supervised” machine learning you take human data and train a model to imitate it. AZ used zero human data to learn.
Leela Chess Zero started out as an open source copy of AZ but it’s probably better than AZ now.
Java source code here: https://github.com/theronic/chessmate
That is what winning in chess is. Minimising blunders.
That is the portal to go to. From there, you can dig deeper into many relevant themes.
That's why I don't like winning in multiplayer games. Usually when you win, you either feel like the opponent just played comically badly on a sufficient number of occasions, or that they played well but in a few instances you got unduly lucky and it could have gone either way. Very rarely do you get the desired feeling that the opponent played well but you played a little better overall, so your win is deserved. It almost always seems like it's not that you are winning but that the opponent is losing.

And none of that is about AI. Making an AI that lets you win symmetrical games satisfyingly, and teaches you with your losses in a satisfying manner, would be a billion-dollar business. I don't think it can be done without some serious psychology research.
Even 10 thousand such games may already contain far more tactics than a player at the targeted level can detect and apply. If so, a learning algorithm that detects and remembers all of them will already be better than the target level.
My current chess engine already hangs its queen sometimes and walks into forks. I'm still experimenting with how to improve personalization.
This result seems to tell us less about the power of the training approach (in absolute terms) and more about how amenable the chess game tree is to those two approaches (in relative terms). What I would take away is that a reasonable approximation of that tree can be encoded in a 270M-parameter model.
There's just too much wordplay going on with "heuristic".
You can simulate a better/worse player by increasing/decreasing the factor: 1 plays as well as the chosen engine can; 0 is typing random (yet valid) moves on the keyboard.
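One simple reading of that factor, sketched with python-chess (the blending scheme is my assumption; there are other ways to do it):

    import random
    import chess
    import chess.engine

    def pick_move(board, engine, strength):
        # strength=1.0: always the engine's move; strength=0.0: a uniformly
        # random legal move; values in between mix the two per-move.
        if random.random() < strength:
            return engine.play(board, chess.engine.Limit(depth=12)).move
        return random.choice(list(board.legal_moves))

Note this is exactly the "random mistakes" style of play that other commenters point out doesn't feel human.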
The average rating of tournament chess players in the US is around USCF 1550. I'm not sure what their FIDE rating would be. FIDE ratings are usually 50-100 points lower than USCF ratings but that's based on comparing people that have both ratings which for the most part are strong masters and above.
A human with a USCF 1550 rating will typically be mostly making moves that are suboptimal in a variety of ways: piece coordination, king safety, planning, pawn structure, development, search, and more. Basically they are around 1550 at nearly everything. There will be variations of course. A particular player might be worse at king safety and better at tactics for instance, but it will be something like they handle king safety like a 1400 and tactics like a 1700.
With a GM-level engine turned down to 1550 you tend to see aspects of GM-level play still in its game. If you are a 1550 playing against it, it doesn't feel like you're playing the kind of opponent you would face if you entered a local chess tournament and got paired with another 1450-1650 player.
It feels like you are playing someone with a totally different approach to chess than you who just happens to lose or draw to you about the same amount as a 1450-1650 human.
Source?
Otherwise you wouldn't really be learning anything useful. You would end up with an opening vocabulary that good players would easily punish. If you play crappy gambits leading to positions you know well, the better players will think highly of you.
Best way to learn is to play the hardest possible engines and just take back moves when it becomes evident you've screwed up.
> Generally considered to be the strongest GPU engine, it continues to provide open data which is essential for training our NNUE networks. They released version 0.31.1 of their engine a few weeks ago, check it out!
The main difference is that Stockfish targets the CPU while Leela targets the GPU. That Stockfish is able to stay competitive with Leela is of course impressive.
https://lichess.org/@/StockfishNews/blog/stockfish-17-is-her...
> Board states s are encoded as FEN strings which we convert to fixed-length strings of 77 characters where the ASCII-code of each character is one token. A FEN string is a description of all pieces on the board, whose turn it is, the castling availability for both players, a potential en passant target, a half-move clock and a full-move counter. We essentially take any variable-length field in the FEN string, and convert it into a fixed-length sub-string by padding with ‘.’ if needed. We never flip the board; the FEN string always starts at rank 1, even when it is the black’s turn. We store the actions in UCI notation (e.g., ‘e2e4’ for the well-known white opening move). To tokenize them we determine all possible legal actions across games, which is 1968, sort them alphanumerically (case-sensitive), and take the action’s index as the token, meaning actions are always described by a single token (all details in Section A.1).
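A rough sketch of the board-state side of that scheme (my simplification: the paper pads each FEN field separately, while this pads the whole string; the 77 and 1968 figures are from the quote above):

    import chess

    def tokenize_fen(fen, width=77):
        # One token per character (its ASCII code), '.'-padded to fixed width.
        assert len(fen) <= width
        return [ord(c) for c in fen.ljust(width, ".")]

    tokens = tokenize_fen(chess.Board().fen())
    # Actions are separate: sort all 1968 possible UCI move strings
    # alphanumerically and use the index, so each action is one token.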
I am starting to notice a pattern in these papers: writing hyper-specific tokenizers for the target problem.
How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer? If we find we need to rewrite the tokenizer for every problem variant, then I argue this is just ordinary programming in a very expensive disguise.
How is this the top comment?
> I am starting to notice a pattern in these papers: writing hyper-specific tokenizers for the target problem.
This is merely expressing what they consider as part of a game state, which is entirely needed for what they set out to do.
> I argue this is just ordinary programming
"Ordinary programming" (what does that mean?) for such a task implies extraordinary chess intuition, capable of conjuring rules and heuristics for the task of comparing two game states and saying which one is "better" (what does better mean?).
> How would this model perform if we made a small change to the rules of chess and continued using the same tokenizer?
If by "small change" you are implying i.e. removing the ability to castle, then sure, the tokenizer would need to be rewritten. At the same time, the entire training dataset would need to be changed, such that the games are valid under your new ruleset. How is this controversial or unexpected?
It feels like you are expecting that state-of-the-art technology allows us to input an arbitrary ruleset and have the mighty computer immediately play an arbitrary game optimally. Unfortunately, this is not the case, but that does not take anything away from this paper.
So increasing the number of parameters to the model would allow it to encode more of the search tree and give better performance, which doesn't seem all that interesting.
Since there is no training data for that game, I don't know how you would get this kind of AI to do anything.
You can see in the release notes a few screenshot examples where a particular move changes likelihood as you get to higher-level play: https://github.com/lightvector/KataGo/releases/tag/v1.15.0
Now use a transformer to "compress" that information into its model. It sounds like that is approximately what is going on here. Certainly, the model is likely to generalize some aspects of the data (just like LLMs do). But for the most part, the model encodes the information from the Stockfish evaluation.
(This is just my guess of what we are seeing.)
The recent breakthrough successes in machine learning are mainly attributed to scale: namely large-scale attention-based architectures and datasets of unprecedented scale. This paper investigates the impact of training at scale for chess. Unlike traditional chess engines that rely on complex heuristics, explicit search, or a combination of both, we train a 270M parameter transformer model with supervised learning on a dataset of 10 million chess games. We annotate each board in the dataset with action-values provided by the powerful Stockfish 16 engine, leading to roughly 15 billion data points. Our largest model reaches a Lichess blitz Elo of 2895 against humans, and successfully solves a series of challenging chess puzzles, without any domain-specific tweaks or explicit search algorithms. We also show that our model outperforms AlphaZero's policy and value networks (without MCTS) and GPT-3.5-turbo-instruct. A systematic investigation of model and dataset size shows that strong chess performance only arises at sufficient scale. To validate our results, we perform an extensive series of ablations of design choices and hyperparameters.
I can also make a note of it privately and check back in with you in the future. I found it pretty remarkable that it played a human-like response to some niche openings - I actually ended up checking against Stockfish and it played different moves, which is pretty neat.
A 6-piece tablebase is 150GB. A 7 piece is 18TB. An 8 piece is thought to be 2PB, but we don't have one yet. How big do you think a 32-piece tablebase will be?
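A crude extrapolation from those three data points (my own arithmetic, assuming roughly constant growth per added piece, which almost certainly understates the real growth):

    # Published/estimated sizes: 6-piece ~150 GB, 7-piece ~18 TB, 8-piece ~2 PB.
    sizes = {6: 150e9, 7: 18e12, 8: 2e15}   # bytes
    growth = (sizes[8] / sizes[6]) ** 0.5   # ~115x per additional piece
    print(f"{sizes[8] * growth ** (32 - 8):.0e} bytes")  # ~6e64 bytes

That would be some fourteen orders of magnitude more bytes than there are atoms in the Earth (~10^50).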
https://arstechnica.com/information-technology/2023/02/man-b...
- The Leela open-source community had already used a transformer architecture to train Lc0 long before the paper (and published it, too!) and got a much better result than DeepMind's massive new model.
- The top engines with search (Stockfish NNUE, Lc0) beat DeepMind's model by clear margins under normal competition conditions.
- Speaking of efficiency, Stockfish NNUE can run on a commodity PC with only slightly lower Elo. AlphaZero or DeepMind's new model cannot even run on such hardware to begin with.
It is a bit roundabout, since it involves converting Maia models to ONNX before loading into PyTorch, and some outdated versions of libraries (Maia/lc0 are a little old). We were using this for transfer learning for a competition, so we needed some flexibility that we didn't know how to get quickly/easily in TF.
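Roughly along these lines, assuming the third-party onnx2torch package (the filename is made up; the exact script isn't reproduced here):

    import onnx
    from onnx2torch import convert

    # A Maia/lc0 network previously exported to ONNX (hypothetical filename).
    onnx_model = onnx.load("maia-1500.onnx")
    torch_model = convert(onnx_model)  # a torch.nn.Module, ready for fine-tuning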
Hope this helps.
------------------
Personal note: given your interest in chess ai and your starcraft username, I think we would have a lot of shared interests. Feel free to reach out (info is in my profile).