We trained over the course of 5-6 days, and the final generation was capable of beating an intermediate player but got completely destroyed by experts. It was a fun project!
Of course, another interesting competition could be to develop _all_ of a chess game and engine and see how little code people can get away with. But that would be an entirely different competition.
In the past I’ve tried distilling not just the eval function but the outcome of the search (something like depth 20) into a neural net. It kind of works, but not very well until the net is pretty big. DeepMind did something similar.
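For anyone curious what the supervised side of that looks like, here is a minimal sketch, assuming you already have (FEN, depth-20 centipawn score) pairs; the 768-feature encoding and the tiny MLP are illustrative placeholders, not what DeepMind used:

```python
# Sketch: distill deep-search scores into a small net.
# Assumes a pre-built dataset of (FEN, centipawn score) pairs.
import chess
import torch
import torch.nn as nn

def encode(board: chess.Board) -> torch.Tensor:
    """One-hot piece-square planes: 6 piece types x 2 colors x 64 squares."""
    x = torch.zeros(768)
    for sq, piece in board.piece_map().items():
        idx = (piece.piece_type - 1) * 2 + int(piece.color)
        x[idx * 64 + sq] = 1.0
    return x

# Placeholder architecture; the comment above suggests it needs to be
# much bigger before distilling search output works well.
net = nn.Sequential(nn.Linear(768, 512), nn.ReLU(),
                    nn.Linear(512, 512), nn.ReLU(),
                    nn.Linear(512, 1))
opt = torch.optim.Adam(net.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

def train_step(fens, scores):
    xs = torch.stack([encode(chess.Board(f)) for f in fens])
    ys = torch.tensor(scores, dtype=torch.float32).unsqueeze(1)
    opt.zero_grad()
    loss = loss_fn(net(xs), ys)
    loss.backward()
    opt.step()
    return loss.item()
```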
I'd be quite tempted to try linear genetic programming with a variant of a Brainfuck-style UTM for this kind of constraint. Assuming 1024 tokens = 1024 bytes = 1024 instructions, I think there is some degree of performance that is feasible.
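To make that concrete, here is a toy sketch of the idea: a Brainfuck-style interpreter plus the simplest possible (1+1) hill climber standing in for real linear GP. The `fitness` function is the hard part and is left as a placeholder; it would have to feed the board in through `,` and score the move emitted through `.`:

```python
# Toy LGP-over-Brainfuck sketch. fitness() is a placeholder: in practice it
# would play the evolved program against a reference opponent.
import random

OPS = "+-<>[],."

def run_bf(code: str, inp: bytes, max_steps: int = 10_000) -> bytes:
    tape, ptr, pc, out, steps = [0] * 4096, 0, 0, [], 0
    in_it = iter(inp)
    jump, stack = {}, []
    for i, c in enumerate(code):          # pre-match brackets
        if c == "[":
            stack.append(i)
        elif c == "]" and stack:
            j = stack.pop()
            jump[i], jump[j] = j, i
    while pc < len(code) and steps < max_steps:
        c = code[pc]
        if c == "+": tape[ptr] = (tape[ptr] + 1) % 256
        elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
        elif c == ">": ptr = (ptr + 1) % len(tape)
        elif c == "<": ptr = (ptr - 1) % len(tape)
        elif c == ".": out.append(tape[ptr])
        elif c == ",": tape[ptr] = next(in_it, 0)
        elif c == "[" and tape[ptr] == 0: pc = jump.get(pc, pc)
        elif c == "]" and tape[ptr] != 0: pc = jump.get(pc, pc)
        pc += 1
        steps += 1
    return bytes(out)

def mutate(code: str) -> str:
    i = random.randrange(len(code))
    return code[:i] + random.choice(OPS) + code[i + 1:]

def evolve(fitness, generations=1000, length=1024):
    best = "".join(random.choice(OPS) for _ in range(length))
    best_fit = fitness(best)
    for _ in range(generations):
        child = mutate(child := mutate(best)) if False else mutate(best)
        child_fit = fitness(child)
        if child_fit >= best_fit:
            best, best_fit = child, child_fit
    return best
```

A real LGP setup would use a population, crossover, and a much smarter instruction set, but the 1024-instruction budget maps naturally onto a fixed-length genome like this.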
This is really interesting to me because hand-coding BF to do something like chess is absolutely beyond me. I wouldn't even know where to begin. An LGP-coded BF program would almost certainly outperform anything I could design by hand.
(I put "mistake" in quotes here because DeepMind showed that with a server farm of TPUs it is possible to distill the search as well. So it’s not impossible per se.)
But that’s ok! You’ve created a super fast evaluation function. Instead of trying to distill the depth 18 output, it will be much more realistic to distill the depth zero output, the NNUE. If you rerun those FENs in Stockfish you can pretty quickly create an easier dataset.
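A sketch of that re-scoring step with python-chess, assuming a local Stockfish binary on the PATH; depth 1 is used here as a cheap stand-in for the raw NNUE output:

```python
# Re-score existing FENs with a very shallow Stockfish search to build an
# "easier" distillation target. Path and depth are assumptions.
import chess
import chess.engine

def rescore(fens, stockfish_path="stockfish", depth=1):
    with chess.engine.SimpleEngine.popen_uci(stockfish_path) as engine:
        for fen in fens:
            board = chess.Board(fen)
            info = engine.analyse(board, chess.engine.Limit(depth=depth))
            # Convert to an integer from White's point of view; mates get
            # a large sentinel value.
            score = info["score"].pov(chess.WHITE).score(mate_score=100_000)
            yield fen, score
```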
I've implemented 2048, Minesweeper, Sudoku, and chess. The first three are single-player games, and I've made good progress on them.
I still don't understand the UCI interface and haven't thought through chess solving. I hope to take another try this weekend.
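In case it helps: UCI is just a line-based protocol over stdin/stdout. A minimal stub that speaks enough of it to be picked up by a GUI might look like this (the "search" is a placeholder that plays the first legal move):

```python
# Minimal UCI stub: handshake plus the position/go loop.
import sys
import chess

board = chess.Board()
for line in sys.stdin:
    cmd = line.split()
    if not cmd:
        continue
    if cmd[0] == "uci":
        print("id name toy-engine")
        print("uciok", flush=True)
    elif cmd[0] == "isready":
        print("readyok", flush=True)
    elif cmd[0] == "ucinewgame":
        board = chess.Board()
    elif cmd[0] == "position":
        if cmd[1] == "startpos":
            board = chess.Board()
        elif cmd[1] == "fen":
            board = chess.Board(" ".join(cmd[2:8]))
        if "moves" in cmd:
            for mv in cmd[cmd.index("moves") + 1:]:
                board.push_uci(mv)
    elif cmd[0] == "go":
        move = next(iter(board.legal_moves))  # placeholder "search"
        print(f"bestmove {move.uci()}", flush=True)
    elif cmd[0] == "quit":
        break
```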
I feel like if you can brute-force every possible board configuration for the next 3 turns and then make the move that leads to the most desirable outcomes, that ought to be enough to thrash most amateurs. Probably not good enough to beat somebody who actually understands chess tactics on a deeper level, but I expect most non-masters are just "winging it" and making things up as they go along, so a machine that can think farther ahead than you would win more often than not.
But like I said, this is all just me fantasizing about a project I haven't even started yet so my hypothesis could easily be wrong.
It's how real chess engines like Stockfish work, but they are highly optimized so they can search extreme depths.
I'd recommend familiarizing yourself with the minimax algorithm, since that's exactly what you're describing: the "desirable outcome" is essentially the tunable evaluation part of the equation.
And that's pivotal.
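For reference, a minimal minimax sketch with python-chess, using a plain material count as that tunable part; the depth and piece values are illustrative, not tuned:

```python
# Minimax in negamax form: each side maximizes its own score.
import chess

# Illustrative piece values; a real engine tunes these.
VALUES = {chess.PAWN: 1, chess.KNIGHT: 3, chess.BISHOP: 3,
          chess.ROOK: 5, chess.QUEEN: 9, chess.KING: 0}

def material(board: chess.Board) -> int:
    """Material balance from the side-to-move's point of view."""
    score = 0
    for piece in board.piece_map().values():
        value = VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score

def negamax(board: chess.Board, depth: int) -> int:
    if board.is_checkmate():
        return -1000                 # side to move is mated: worst outcome
    if board.is_stalemate():
        return 0                     # draw
    if depth == 0:
        return material(board)
    best = -10**9
    for move in board.legal_moves:
        board.push(move)
        best = max(best, -negamax(board, depth - 1))
        board.pop()
    return best

def best_move(board: chess.Board, depth: int = 3) -> chess.Move:
    best, best_score = None, -10**9
    for move in board.legal_moves:
        board.push(move)
        score = -negamax(board, depth - 1)
        board.pop()
        if score > best_score:
            best, best_score = move, score
    return best
```

Real engines add alpha-beta pruning on top of this so the same answer comes back after visiting far fewer nodes, which is how they reach extreme depths.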
If you have a program which always makes valid moves and resigns when it has lost, you've written a proper chess-playing program. It may play badly, but it plays.
Examples of heuristic engines are Hans Berliner’s CAPS-II and PARADISE by David Wilkins. PARADISE used a database of 200 rules.
There are also engines that try to retroactively apply rules to Stockfish evals but they’re fairly confusing in my experience. It’s like being told the answer and having to come up with a reason after the fact.
Imagine if LLMs were the only way we had to play chess. You’d need a centralized server, and peak performance wouldn't even best a grandmaster. You'd spend $1 trillion building a supercluster because that’s all you know.
<-- This is where AI is today.
And then some startup creates Stockfish, a chess engine that beats any LLM or grandmaster and runs on a smartphone.
This is kind of begging the question, right? How do you know what a “desirable” outcome is, other than mate-in-N positions? (In your case, N<=3.) The point of an evaluation function is to take a position and return a value which describes how desirable it is.
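To illustrate the point: the only positions you can score exactly are terminal ones, and everything else is a heuristic guess. A bare-bones evaluation function, with conventional (untuned) centipawn values, might look like this:

```python
# Sketch of an evaluation function: exact values for terminal positions,
# a plain material heuristic for everything else.
import chess

PIECE_VALUES = {chess.PAWN: 100, chess.KNIGHT: 300, chess.BISHOP: 300,
                chess.ROOK: 500, chess.QUEEN: 900, chess.KING: 0}

def evaluate(board: chess.Board) -> int:
    """Centipawn score from the side-to-move's perspective."""
    if board.is_checkmate():
        return -1_000_000                 # we are mated: maximally undesirable
    if board.is_stalemate() or board.is_insufficient_material():
        return 0                          # draw
    score = 0
    for piece in board.piece_map().values():
        value = PIECE_VALUES[piece.piece_type]
        score += value if piece.color == board.turn else -value
    return score
```

Everything interesting in engine design lives in replacing that material loop with something smarter (piece-square tables, mobility, king safety, or a learned net like NNUE).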
LLMs aren’t searching; they’re memorizing. This explains their extremely poor performance on out-of-domain positions, which Stockfish handles easily.
First you'd have a heuristic that ranks outcomes at turn 3, which would just mean weighing the pieces the computer lost against the pieces the opponent lost. Then you'd rank each turn-2 node based on how desirable the turn-3 leaf nodes it leads to are, and how likely the opponent is to actually make the moves that lead to them. Then you'd do the same for turn 1 and pick the move that scored highest.
I think what I'm imagining is a PGN linter & renderer.
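Sketching that idea: python-chess already does the heavy lifting for both halves, since `read_game` collects parse and legality errors and a `Board` renders as ASCII. A hypothetical `lint_pgn` wrapper might be little more than:

```python
# PGN linter & renderer sketch on top of python-chess.
import io
import chess.pgn

def lint_pgn(pgn_text: str) -> list[str]:
    """Return a list of problems found while parsing the PGN."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    if game is None:
        return ["no game found"]
    return [str(err) for err in game.errors]

def render_final_position(pgn_text: str) -> str:
    """ASCII board of the position after the mainline's last move."""
    game = chess.pgn.read_game(io.StringIO(pgn_text))
    return str(game.end().board())
```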