JCO003 Using Liquid Reasoning to Build a Transformer Chess Engine
By LiquidChess
Summary
## Key takeaways

- **Distill Stockfish Data via ChessBench**: We distill our neural network model by generating data from Stockfish and training our model on this data. For this part, we use an existing Stockfish-labeled data set called ChessBench, which contains a huge number of positions that help our model generalize better to unseen positions. [01:23], [01:35]
- **Action Value Beats Behavioral Cloning**: Action-value engines play the move with the highest value in a position, that is, the best move, while behavioral-cloning engines rank the moves Stockfish is most likely to make in a position. Our action-value model performs much better than our behavioral-cloning model, as seen in the graphs. [01:53], [02:04]
- **77 Tokens Encode a Chess Position**: The model converts a chess position into input tokens for the neural network: 64 tokens for the 64 squares, one token for the side to move, four tokens for castling rights, two tokens for the en passant file, and another six tokens for the move counters. This totals 77 tokens. [02:13], [02:27]
- **Liquid Reasoning Refines a Single Token**: The model thinks by refining a single reasoning token over multiple steps of reasoning. Each step first goes through the underlying transformer model, which proposes an update (a possible move), then through a discard gate, which decides whether to keep the proposal, and a stop gate, which decides whether reasoning should stop. [02:45], [02:55]
- **Adaptive Stopping by Position Complexity**: Reasoning is described as liquid because it stops adaptively based on how complex the input position is. For example, it will stop sooner on an easier puzzle and take more steps on a difficult one. [03:01], [03:12]
Topics Covered
- Distill Stockfish to Build Superior Engines
- Action Value Outperforms Behavioral Cloning
- Liquid Reasoning Adapts to Position Complexity
Full Transcript
Chess is a two-player game where the goal is to checkmate the opponent's king. A chess engine is a program that analyzes a position to find the best possible moves, the ones which maximize the winning chances. There are two main types of chess engines: traditional engines and neural engines. Traditional engines input the position into an evaluation function, which outputs how much white or black is winning, then search the possible continuations to find the move that maximizes this evaluation. The second type is neural engines. These learn patterns from data, similar to human intuition: the position is fed into a neural network, which gives an evaluation directly, and that evaluation defines the move.
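The traditional-engine recipe mentioned here (a static evaluation function plus a search that maximizes it) can be sketched with a toy negamax. This is an illustrative sketch only: the tiny game tree, leaf scores, and depth below are invented stand-ins, not anything from the video.

```python
# Hypothetical sketch: a static evaluation function plus a search
# that maximizes it, the classic traditional-engine design.

def negamax(node, depth, evaluate, children):
    """Return the best achievable evaluation from `node`, searching
    `depth` plies. Scores are from the side-to-move's perspective."""
    kids = children(node)
    if depth == 0 or not kids:
        return evaluate(node)
    # Each child is scored from the opponent's view, so we negate it.
    return max(-negamax(kid, depth - 1, evaluate, children) for kid in kids)

# A tiny hand-built tree standing in for chess positions.
TREE = {"root": ["a", "b"], "a": ["a1", "a2"], "b": ["b1"]}
LEAF_SCORES = {"a1": 3, "a2": -2, "b1": 1}   # made-up static evaluations

best = negamax("root", 2,
               evaluate=lambda n: LEAF_SCORES.get(n, 0),
               children=lambda n: TREE.get(n, []))
```

Branch "a" looks tempting (a leaf worth 3) but the opponent can steer to -2, so the search correctly prefers branch "b".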
So how does our AI model work? We distill our neural network model by generating data from Stockfish and train our model on this data. For this part, we use an existing Stockfish-labeled data set called ChessBench, which contains a huge number of positions that help our model generalize better to unseen positions.
Training loss shows how far the model is from matching the training data. Lower loss means the model is learning. As we trained our model, the loss, i.e. the difference between the target move and the resulting move, decreased over time.
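The transcript does not name the loss function, but for a fixed move vocabulary the standard choice for "difference between the target move and the resulting move" is cross-entropy. A minimal sketch under that assumption, with made-up logits:

```python
import math

# Hedged sketch: cross-entropy over a fixed move vocabulary, assuming
# that is the training loss meant here. All numbers are illustrative.

def cross_entropy(logits, target_index):
    """Negative log-probability the model assigns to the target move."""
    m = max(logits)                                   # for numerical stability
    log_z = m + math.log(sum(math.exp(x - m) for x in logits))
    return log_z - logits[target_index]

# A model that is confident about the target move has lower loss than a
# uniform one; "loss decreasing" tracks exactly this during training.
uniform_loss = cross_entropy([0.0, 0.0, 0.0, 0.0], target_index=2)
confident_loss = cross_entropy([0.0, 0.0, 5.0, 0.0], target_index=2)
```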
The engine we train is composed of two parts: action-value learning and behavioral cloning. Action-value engines play the move with the highest value in a position, that is, the best move, while behavioral-cloning engines rank the moves Stockfish is most likely to make in a position. Note that our action-value model performs much better than our behavioral-cloning model, as seen in the graphs.
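The two selection rules just described can be sketched side by side. The moves and numbers below are invented for illustration; in the real engine both heads would come from the transformer over the position tokens:

```python
# Hedged sketch of the two move-selection rules. Values and
# probabilities are made up to show the rules can disagree.

def pick_action_value(q_values):
    """Action value: play the move whose predicted value is highest."""
    return max(q_values, key=q_values.get)

def pick_behavioral_cloning(move_probs):
    """Behavioral cloning: play the move Stockfish is most likely to make."""
    return max(move_probs, key=move_probs.get)

q = {"e2e4": 0.55, "d2d4": 0.61, "g1f3": 0.52}   # predicted win chances
p = {"e2e4": 0.40, "d2d4": 0.35, "g1f3": 0.25}   # imitation probabilities

best_by_value = pick_action_value(q)       # d2d4: highest predicted value
best_by_cloning = pick_behavioral_cloning(p)  # e2e4: most-imitated move
```

The example deliberately makes the two rules pick different moves: the most-imitated move need not be the highest-value one, which is one intuition for why the action-value head can outperform cloning.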
Our underlying model is based on searchless chess by Google DeepMind. It converts a chess position into input tokens for the neural network like this: 64 tokens for the 64 board squares, one token for the side to move, four tokens for castling rights, two tokens for the en passant file, and another six tokens for the number of moves played in the position. This totals 77 tokens. The output of the model lies in a fixed space of all 1,968 possible legal moves on a chessboard. This allows our model to output a fixed-size vector for ease of computation.
On top of this underlying model, we've implemented a liquid reasoning transformer. The model does its thinking by refining a single reasoning token over multiple steps of reasoning. The process is roughly as follows: first, the token goes through the underlying transformer model, which proposes an update, a possible move. The proposal then goes through a discard gate, which decides whether to keep the proposal, and then a stop gate, which decides whether the reasoning should stop. If so, the engine outputs an action-value or behavioral-cloning result; if not, the process is repeated and another step of reasoning is done. Note that reasoning is described as liquid because it stops
adaptively based on how complex the input position is. For example, it will stop sooner on an easier puzzle and take more steps on a difficult one. We aim to test whether this reasoning and architectural improvement to the baseline model can improve engine performance. Due to time and hardware constraints, the model could not be trained to full performance. Despite that, we have evaluated the engine on puzzles and in tournament matchups against Stockfish at different levels. Our results are as follows. For the puzzles, we found that liquid reasoning made a slight improvement.
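The liquid reasoning loop described earlier, a single token refined through a discard gate and a stop gate, can be sketched as follows. The gates and the update rule here are toy stand-in functions (refining a number toward a target), not the trained model:

```python
# Hedged sketch of the liquid reasoning loop: refine one token per
# step; a discard gate keeps or rejects each proposal, a stop gate
# halts adaptively. All concrete functions below are illustrative.

def liquid_reasoning(token, propose, keep_gate, stop_gate, max_steps=16):
    """Refine `token` until the stop gate fires (or max_steps is hit).
    Returns the final token and the number of steps actually used."""
    for step in range(max_steps):
        proposal = propose(token)
        if keep_gate(token, proposal):        # discard gate: keep update?
            token = proposal
        if stop_gate(token):                  # stop gate: halt reasoning?
            return token, step + 1            # step count adapts to input
    return token, max_steps

# Toy stand-ins: move a number toward a target; easy inputs stop sooner.
target = 10.0
propose = lambda t: t + 0.5 * (target - t)
keep = lambda t, p: abs(target - p) < abs(target - t)
stop = lambda t: abs(target - t) < 0.1

final, steps_easy = liquid_reasoning(9.0, propose, keep, stop)  # "easy"
_, steps_hard = liquid_reasoning(0.0, propose, keep, stop)      # "hard"
```

The easy start (already close to the target) triggers the stop gate after fewer steps than the hard one, mirroring how the engine spends fewer reasoning steps on simple puzzles than on difficult ones.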