Introduction
Recently I came across M. Gardner's 'New mathematical diversions from scientific American' (NY, Simon and Schuster, 1966) and read the article on self-learning tic-tac-toe machine made from matchboxes. I found it interesting and worth implementing in C++.
One of the first self-learning machines was IBM 704. Designed by A. Samuel in 1959, the program allowed a machine to play draughts, improve game strategy using experience from previous games it played. At first Samuel easily won the games, but as the machine learned how to play, it started beating him at every game played.
For experiments with self-learning machines, it is not necessary even to have a computer, just take a lot of empty matchboxes and colored beads. The happy idea of designing the simple and robust self-learning machine belongs to D. Michie. In the article 'Method of trials and errors' (Penguin Science Survey, 2, 1961), he explains self-learning machine to play tic-tac-toe, which could be assembled from 300 matchboxes. It is called MENACE (Matchbox Educable Noughts And Crosses Engine). Its working is very simple. On every matchbox, the position that could be encountered during the game is pictured. The first move is made by the machine, thus only odd moves' positions could be painted on the matchboxes. Inside every matchbox, the colored beads are placed with every color denoting possible moves. Also the matchboxes contain cardboard corners glued inside and as it is being shaken, one of the beads rolls under it. For the first move matchbox, there are 4 beads of every color, for the third move there are 3 beads of every color and so on until in the matchboxes from the 7th move, there are beads of one color for every move. To determine the next move, just shake the matchbox to get the color of the bead rolled under the cardboard corner. The opened matchboxes remain opened until the end of the game. If the machine wins the game, it is praised by adding 3 beads of the same color that was the bead rolled under the cardboard corner. In case of a draw, only one corresponding color bead is added, and in case the machine loses the game, it is punished by taking out the bead from every opened matchbox's cardboard corner. The more games the machine plays, the more it follows moves leading to victory and avoids the ones leading to defeat. Despite being compared to IBM 704, this machine cannot analyze played games and designs novel strategies from experience. The first tournament between Michie and his MENACE composed of 220 games. At first Michie always punished his machine for lost games, but after the 17th game, the machine started making the first move to the corner square and after the 20th, finished all games in a draw. The tournament ended in Michies' resounding defeat, he lost 8 games out of 10. MENACE became the tic-tac-toe Grand Master.
Background
You should be familiar with the tic-tac-toe game.
Using the Code
To save you from matchbox hurdles, I've written the same program in C++. To start playing with the computer, I supplied with it the player1.dat file which is a trained player playing first always. Run play.bat to play with it. To make your move, type y
and x
coordinates of the empty square. To make a move to [0, 0]
type 00
and enter, to make a move to [1, 0]
type 10
and enter and so on. Just type a string containing the y
coordinate as the first letter and x
coordinate as the second letter.
The tic-tac-toe play could look like this:
012
0 ---
1 ---
2 ---
0.000000 0.000000 0.000000 0.000000 1.000000 0.000000 0.000000 0.000000 0.000000
012
0 ---
1 -x-
2 ---
human move o enter (y,x) coord or '--' to quit: 20
012
0 ---
1 -x-
2 -o-
0.000000 0.000000 0.000000 0.000000 0.000000 1.000000 0.000000
012
0 ---
1 -x-
2 xo-
human move o enter (y,x) coord or '--' to quit: 00
012
0 o--
1 -x-
2 xo-
0.000000 1.000000 0.000000 0.000000 0.000000
012
0 o-X
1 -X-
2 Xo-
player No 1 won
wins: 5868
loses: 214
draws: 10934
finished.
The string of numbers 0.0
and 1.0
are for debugging purposes and are the probabilities of making a particular move for a computer. As you can see, it learned to always make the first move to the central square, which is a win strategy.
The help for the command line arguments is shown below:
argv[1] = 'uc', 'cu', 'cc', 'uu'
'uc' - user vs. computer
'cu' - computer vs. user
'cc' - computer vs. computer
'uu' - user vs. user
argv[2] = number of plays
argv[3] = 'player1.dat'
argv[4] = 'player2.dat'
You may play either as the computer goes first cu
or uc
to make a first move yourself. In cc
regime, the computer plays with itself. That helps you to save time training it yourself to play the game. Just run it this way:
>ccross.exe cc 1000 player1_name player2_name
It will play 1000s of times with itself, creating player1_name
and player2_name
if they do not exist, otherwise reading them from the files. To increase performance, try to redirect console output to the file.
As the computer player encounters novel board configuration, it denotes equal probabilities for performing a move to empty squares. In case it loses the game, it decreases the probabilities of the moves for the board configurations it played, otherwise it increases the probability. In the case of a draw, the probabilities do not change. It keeps every played board configuration and probablities for making a move in a file along with its wins, loses and draw counts.