JCO003 Using Liquid Reasoning to Build a Transformer Chess Engine
By LiquidChess
Summary
## Key takeaways

- **Distill Stockfish Data via ChessBench**: We distill our neural network model by generating data from Stockfish and training our model on this data. For this part, we use an existing Stockfish-labeled data set called ChessBench, which contains a huge number of positions that help our model generalize better to unseen positions. [01:23], [01:35]
- **Action Value Beats Behavioral Cloning**: Action-value engines play the move with the highest value in a position, that is, the best move, while behavioral-cloning engines rank the moves Stockfish is most likely to make in a position. Our action-value model performs much better than our behavioral-cloning model, as seen in the graphs. [01:53], [02:04]
- **77 Tokens Encode a Chess Position**: The model converts a chess position into input tokens for the neural network: 64 tokens for the 64 squares, one token for the side to move, four tokens for castling rights, two tokens for the en passant file, and another six tokens for the move counters. This totals 77 tokens. [02:13], [02:27]
- **Liquid Reasoning Refines a Single Token**: The model thinks by refining a single reasoning token over multiple steps of reasoning. Each step first goes through the underlying transformer model, which proposes an update (a possible move), then through a discard gate, which decides whether to keep the proposal, and a stop gate, which decides whether reasoning should stop. [02:45], [02:55]
- **Adaptive Stopping by Position Complexity**: Reasoning is described as liquid because it stops adaptively based on how complex the input position is. For example, it will stop sooner on an easier puzzle and take more steps on a difficult one. [03:01], [03:12]
Topics Covered
- Distill Stockfish to Build Superior Engines
- Action Value Outperforms Behavioral Cloning
- Liquid Reasoning Adapts to Position Complexity
Full Transcript
Chess is a two-player game where the goal is to checkmate the opponent's king. A chess engine is a program that analyzes a position to find the best possible moves, the ones which maximize the winning chances. There are two main types of chess engines: traditional engines and neural engines. Traditional engines input the position into an evaluation function, which outputs how much white or black is winning, then search the possible continuations to find the move that maximizes this evaluation. The second type is neural engines. These learn patterns from data, similar to human intuition: the position is fed into a neural network, which gives an evaluation directly, and that evaluation defines the move.
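The traditional-engine recipe mentioned here (a static evaluation function plus a search that maximizes it) can be sketched with a toy negamax. This is an illustrative sketch only: the tiny game tree, leaf scores, and depth below are invented stand-ins, not anything from the video.

```python
# Hypothetical sketch: a static evaluation function plus a search
# that maximizes it, the classic traditional-engine design.

def negamax(node, depth, evaluate, children):
    """Return the best achievable evaluation from `node`, searching
    `depth` plies. Scores are from the side-to-move's perspective."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    # Each child is scored from the opponent's view, so we negate it.
    return max(-negamax(kid, depth - 1, evaluate, children) for kid in kids)

# A tiny hand-built tree standing in for chess positions.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
LEAF_SCORES = {"a1": 3, "a2": -2, "b1": 1}   # made-up static evaluations

best = negamax("root", 2,
               evaluate=lambda n: LEAF_SCORES.get(n, 0),
               children=lambda n: TREE.get(n, []))
```

Branch "a" looks tempting (a leaf worth 3) but the opponent can steer to -2, so the search correctly prefers branch "b".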
So how does our AI model work? We distill our neural network model by generating data from Stockfish and train our model on this data. For this part, we use an existing Stockfish-labeled data set called ChessBench, which contains a huge number of positions that help our model generalize better to unseen positions.
Training loss shows how far the model is from matching the training data. Lower loss means the model is learning. As we trained our model, the loss, i.e. the difference between the target move and the resulting move, decreased over time.
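The transcript does not name the loss function, but for a fixed move vocabulary the standard choice for "difference between the target move and the resulting move" is cross-entropy. A minimal sketch under that assumption, with made-up logits:

```python
import math

# Hedged sketch: cross-entropy over a fixed move vocabulary, assuming
# that is the training loss meant here. All numbers are illustrative.

def cross_entropy(logits, target_index):
    """Negative log-probability the model assigns to the target move."""
    m = max(logits)                                   # for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]

# A model that is confident about the target move has lower loss than a
# uniform one; "loss decreasing" tracks exactly this during training.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target_index=2)
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0], target_index=2)
```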
The engine we train is composed of two parts: action-value learning and behavioral cloning. Action-value engines play the move with the highest value in a position, that is, the best move, while behavioral-cloning engines rank the moves Stockfish is most likely to make in a position. Note that our action-value model performs much better than our behavioral-cloning model, as seen in the graphs.
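The two selection rules just described can be sketched side by side. The moves and numbers below are invented for illustration; in the real engine both heads would come from the transformer over the position tokens:

```python
# Hedged sketch of the two move-selection rules. Values and
# probabilities are made up to show the rules can disagree.

def pick_action_value(q_values):
    """Action value: play the move whose predicted value is highest."""
    return max(q_values, key=q_values.get)

def pick_behavioral_cloning(move_probs):
    """Behavioral cloning: play the move Stockfish is most likely to make."""
    return max(move_probs, key=move_probs.get)

q = {"e2e4": 0.55, "d2d4": 0.61, "g1f3": 0.52}   # predicted win chances
p = {"e2e4": 0.40, "d2d4": 0.35, "g1f3": 0.25}   # imitation probabilities

best_by_value = pick_action_value(q)       # d2d4: highest predicted value
best_by_cloning = pick_behavioral_cloning(p)  # e2e4: most-imitated move
```

The example deliberately makes the two rules pick different moves: the most-imitated move need not be the highest-value one, which is one intuition for why the action-value head can outperform cloning.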
Our underlying model is based on searchless chess by Google DeepMind. It converts a chess position into input tokens for the neural network like this: 64 tokens for the 64 board squares, one token for the side to move, four tokens for castling rights, two tokens for the en passant file, and another six tokens for the number of moves played in the position. This totals 77 tokens. The output of the model lies in a fixed space of all 1,968 possible legal moves on a chessboard. This allows our model to output a fixed-size vector for ease of computation.
On top of this underlying model, we've implemented a liquid reasoning transformer. The model does its thinking by refining a single reasoning token over multiple steps of reasoning. The process is roughly as follows: first, the token goes through the underlying transformer model, which proposes an update, a possible move. The proposal then goes through a discard gate, which decides whether to keep the proposal, and then a stop gate, which decides whether the reasoning should stop. If so, the engine outputs an action-value or behavioral-cloning result; if not, the process is repeated and another step of reasoning is done. Note that reasoning is described as liquid because it stops
adaptively based on how complex the input position is. For example, it will stop sooner on an easier puzzle and take more steps on a difficult one. We aim to test whether this reasoning and architectural improvement to the baseline model can improve engine performance. Due to time and hardware constraints, the model could not be trained to full performance. Despite that, we have evaluated the engine on puzzles and in tournament matchups against Stockfish at different levels. Our results are as follows. For the puzzles, we found that liquid reasoning made a slight improvement.
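The liquid reasoning loop described earlier, a single token refined through a discard gate and a stop gate, can be sketched as follows. The gates and the update rule here are toy stand-in functions (refining a number toward a target), not the trained model:

```python
# Hedged sketch of the liquid reasoning loop: refine one token per
# step; a discard gate keeps or rejects each proposal, a stop gate
# halts adaptively. All concrete functions below are illustrative.

def liquid_reasoning(token, propose, keep_gate, stop_gate, max_steps=16):
    """Refine `token` until the stop gate fires (or max_steps is hit).
    Returns the final token and the number of steps actually used."""
    for step in range(max_steps):
        proposal = propose(token)
        if keep_gate(token, proposal):        # discard gate: keep update?
            token = proposal
        if stop_gate(token):                  # stop gate: halt reasoning?
            return token, step + 1            # step count adapts to input
    return token, max_steps

# Toy stand-ins: move a number toward a target; easy inputs stop sooner.
target = 10.0
propose = lambda t: t + 0.5 * (target - t)
keep = lambda t, p: abs(target - p) < abs(target - t)
stop = lambda t: abs(target - t) < 0.1

final, steps_easy = liquid_reasoning(9.0, propose, keep, stop)  # "easy"
_, steps_hard = liquid_reasoning(0.0, propose, keep, stop)      # "hard"
```

The easy start (already close to the target) triggers the stop gate after fewer steps than the hard one, mirroring how the engine spends fewer reasoning steps on simple puzzles than on difficult ones.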