Design and Implementation of EinStein Würfelt Nicht Program Monte_Alpha

Chen, Chih-Hung; Chiu, Sin-Yi; Lin, Shun-Shii

doi:10.3390/electronics12132936


by Chih-Hung Chen ¹,*, Sin-Yi Chiu ² and Shun-Shii Lin ²

¹ Department of Accounting Information, National Taipei University of Business, Taipei 10051, Taiwan

² Department of Computer Science and Information Engineering, National Taiwan Normal University, Taipei 11677, Taiwan

* Author to whom correspondence should be addressed.

Electronics 2023, 12(13), 2936; https://doi.org/10.3390/electronics12132936

Submission received: 26 May 2023 / Revised: 30 June 2023 / Accepted: 30 June 2023 / Published: 4 July 2023

(This article belongs to the Special Issue Recent Advances in Data Science and Information Technology)


Abstract: The game of EinStein würfelt nicht involves an element of uncertainty due to die rolling, which poses a big challenge in the development of computer game programs. However, the intriguing nature of probabilistic elements has made this game popular in computer game competitions. This study aimed to develop a high-strength EinStein würfelt nicht program that utilizes an efficient bitboard representation for the game board as well as pre-established probability distribution tables and extensively uses bitwise operations to improve the efficiency of game tree expansion. Additionally, this study attempted to replace random simulation with an evaluation function to enhance the accuracy of the Upper Confidence bounds applied to Trees algorithm. Through this design, we improved the strength of our program, and we hope that this program will be able to achieve additional excellent results in future computer game tournaments.

Keywords: EinStein würfelt nicht; bitboard; table lookup; Upper Confidence bounds applied to Trees; Monte Carlo Tree Search

1. Introduction

EinStein würfelt nicht [1], designed by Ingo Althöfer in 2004, is a two-player board game in which each player has six pieces. The players take turns rolling a die to determine which piece to move. It is a game of complete information with a probabilistic element, and this uncertainty makes designing a strong EinStein würfelt nicht AI difficult. Nevertheless, the game’s combination of challenge and entertainment has made it popular in computer game tournaments such as those held by the International Computer Game Association (ICGA) [2], the Taiwanese Association for Artificial Intelligence (TAAI) [3], the Taiwan Computer Game Association (TCGA) [4], and the Chinese Association for Artificial Intelligence (CAAI) [5].

EinStein würfelt nicht is played on a 5 × 5 board, with both red and blue players each having six pieces numbered from one to six. At the start of the game, the pieces can be placed at random or in a particular order within their respective triangular positions, as shown in Figure 1a. The movement rules are as follows: red pieces can only move one square to the right, bottom-right, or down, while blue pieces can only move one square to the left, top-left, or up. Then, the two players take turns rolling the die to determine which piece they can move. If there is already a piece in the destination square, it must be removed from the board, regardless of whether it belongs to the opponent or the player, as shown in Figure 1b. If the die lands on a number corresponding to a piece that is no longer on the board, the player can choose to move a piece whose number is next-higher or next-lower to the rolled number, as shown in Figure 1c. The goals for both red and blue players are situated in opposite corners, as indicated by the flags in Figure 1d. The winner is the player who reaches the goal first with one piece or who captures all of their opponent’s pieces. Therefore, the outcome of an EinStein würfelt nicht game can only be a win or loss, and there are no draws.

In most games, having more pieces typically confers an advantage, but this is not the case in EinStein würfelt nicht. When there are more pieces on the board, the probability of moving any particular piece decreases. For example, if a player has all 6 pieces on the board, the probability of moving any given piece each turn is $\frac{1}{6}$. This makes it difficult to move a specific piece to the goal quickly. However, if a player has only one piece left on the board, the probability of moving that piece each turn is one, thereby allowing this piece to reach the goal quickly. Therefore, strategically reducing one’s own pieces on the board can be a smart tactic. However, if a player has too few pieces on the board, there is also a relative increase in risk because the player may not have a backup force to eliminate enemy forces that are on the verge of victory.

The rules of EinStein würfelt nicht might seem straightforward, but the gameplay is rich in strategy, and striking a balance between offense and defense presents a significant challenge. Therefore, this study aimed to develop a high-strength EinStein würfelt nicht program and validate its game-playing abilities through participation in computer game tournaments. This paper extends our previous work presented in [7] by correcting errors in the probability distribution table and by replacing random simulation with an evaluation function, which significantly improved the strength of our EinStein würfelt nicht program. We also hope that these improvements will enable our program to achieve additional excellent results in future competitions.

The remainder of this paper is organized as follows: Section 2 describes the background of the Upper Confidence bounds applied to Trees (UCT) algorithm [8,9]. Section 3 provides a brief overview of related works on the topics under investigation. Section 4 introduces the design and implementation of our EinStein würfelt nicht program, Monte_Alpha. Section 5 presents the experimental results, and Section 6 describes the tournament results. Section 7 introduces further enhancements, and the conclusions are reported in Section 8.

2. Background

The Monte Carlo method [10] is a statistical technique that enables numerical solutions to intricate problems via random sampling. In 2006, Coulom introduced Monte Carlo Tree Search (MCTS) [11], which applies the Monte Carlo method to game tree search. Later that same year, Kocsis and Szepesvári formulated the Upper Confidence bounds applied to Trees (UCT) algorithm [8] to address MCTS’s shortcomings.

The UCT algorithm uses Monte Carlo methods to sample the search space and utilizes the multi-armed bandit strategy [12] to determine the subsequent exploration path. UCT allocates more computational resources toward the most promising areas of the game tree, as determined by the outcomes of the Monte Carlo simulations. Given unlimited computational resources, UCT is theoretically capable of converging toward the optimal solution. The process of the UCT algorithm, depicted in Figure 2, consists of the following four stages [9]:

1. Selection: The process begins at the root node and descends through the tree until reaching a leaf node that has unvisited child nodes. In this study, we employed the UCB’ Formula (1) [9] to maintain a balance between exploitation and exploration within the search tree.

$$UCB'(child_i) = \begin{cases} \dfrac{w_i}{n_i} + \sqrt{\dfrac{2 \ln n_p}{n_i}}, & \text{current node is the 1st player} \\[2ex] \left(1 - \dfrac{w_i}{n_i}\right) + \sqrt{\dfrac{2 \ln n_p}{n_i}}, & \text{current node is the 2nd player} \end{cases} \tag{1}$$

where $n_i$ stands for the total number of simulations that the i-th child node ($child_i$) has undergone, $w_i$ stands for the total number of wins within $n_i$, and $n_p$ stands for the total number of simulations that the current (parent) node has undergone. Therefore, the term $\frac{w_i}{n_i}$ represents the win rate of the i-th child from the perspective of the first player, and the term $\left(1 - \frac{w_i}{n_i}\right)$ represents the win rate of the i-th child from the perspective of the second player. The first component (win rate) of the UCB’ formula promotes the exploitation of nodes with higher rewards. The other component ($\sqrt{\frac{2 \ln n_p}{n_i}}$) encourages the exploration of nodes that have been visited less frequently. In each layer of the game tree, the child node with the highest UCB’ value is selected as the most promising move.

2. Expansion: One or more child nodes are added to the selected leaf node unless that node represents the end of the game. In this work, we restricted the expansion to a single child node for every simulated game.
3. Simulation: One or more random playouts are executed from the newly expanded node until the game concludes. In this study, we limited ourselves to a single random playout for each game simulation.
4. Backpropagation: The result of each playout is accumulated along the path from the newly expanded node up to the root node.
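
The selection rule in Formula (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the Monte_Alpha source: the function names `ucb_prime` and `select_child`, the `(wins, visits)` tuple layout, and the choice to give unvisited children an infinite score are all hypothetical.

```python
import math


def ucb_prime(wins, visits, parent_visits, first_player_to_move):
    """UCB' value of one child, following Formula (1).

    wins and visits are counted from the first player's perspective.
    """
    if visits == 0:
        # Assumption: an unvisited child is always tried first.
        return math.inf
    win_rate = wins / visits
    if not first_player_to_move:
        # The second player maximizes the first player's loss rate.
        win_rate = 1.0 - win_rate
    exploration = math.sqrt(2.0 * math.log(parent_visits) / visits)
    return win_rate + exploration


def select_child(children, parent_visits, first_player_to_move):
    """Return the index of the child (wins, visits) with the highest UCB'."""
    return max(
        range(len(children)),
        key=lambda i: ucb_prime(
            children[i][0], children[i][1], parent_visits, first_player_to_move
        ),
    )
```

With two children that have been visited equally often, the exploration terms cancel, so the first player picks the child with the higher win rate and the second player picks the one with the lower win rate.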

3. Related Works

After the release of EinStein würfelt nicht in August 2004, Althöfer guided Schäfer in the development of an EinStein würfelt nicht program over the following year [13]. This was the first publicly available EinStein würfelt nicht program that utilized the Minimax algorithm [14], the Monte Carlo method [10], and heuristics as its computational techniques. Since then, many stronger EinStein würfelt nicht programs have used the Minimax algorithm. In 2011, Lorentz [15] used Monte Carlo Tree Search to develop an EinStein würfelt nicht program that could reach the same level as the Minimax algorithm. Subsequently, most programs have been implemented using the Monte Carlo Tree Search (MCTS) algorithm [15,16,17,18,19]. Li et al. [17] optimized the Upper Confidence bounds applied to Trees (UCT) algorithm as a node selection strategy [20], significantly enhancing the strength of EinStein würfelt nicht programs. Wang et al. [19] proposed an optimized UCT algorithm with an evaluation function that provided a more precise evaluation of the game situation.

Many studies have also designed EinStein würfelt nicht programs based on the Monte Carlo method and have implemented fine-tuned evaluations [18,19,21,22,23,24]. Zhang et al. [24] designed a set of evaluation functions targeting the distances of pieces and the probability of pieces being rolled, achieving a good balance between offense and defense in the program. Additionally, some EinStein würfelt nicht programs continue to use conventional search algorithms from the Minimax series. For example, Li et al. [25] proposed an Offensive and Defensive Expect Minimax Algorithm to find optimal moves and achieved a high win rate using the traditional Expect Minimax Algorithm. Bonnet and Viennot [26] used a *-Minimax algorithm [27,28] that included both α-cuts and β-cuts to prune branches while executing expectiminimax.

In addition, regarding the theoretical research on EinStein würfelt nicht, Turner [29] established an endgame database in 2012 to analyze the game records of the ICGA 2011 EinStein würfelt nicht competition. Lu et al. [22] discussed the optimal placement of pieces on the initial board to increase the chances of winning. Furthermore, Bonnet and Viennot [30] not only analyzed the optimal solutions when both players have only one piece remaining but also calculated the optimal winning probability on a 4 × 4 board size using the Nash equilibrium [26].

In recent years, the prevalent approach for implementing programs in EinStein würfelt nicht has been the utilization of the Monte Carlo Tree Search algorithm. Advanced EinStein würfelt nicht programs have adopted algorithms from the AlphaZero series for training purposes [31,32]. Hsueh et al. [33] employed the AlphaZero algorithm to learn the theoretical values and optimal strategies for stochastic games. This is due to the fact that the theoretical values for stochastic games are anticipated win rates rather than simple win, loss, or draw outcomes. Furthermore, some programs have incorporated techniques that have combined the MCTS algorithm with n-tuple networks [34,35].

4. Design and Implementation of Monte_Alpha

This section introduces the design and implementation of our EinStein würfelt nicht program, Monte_Alpha. We elaborate on three aspects: the representation of the game board, the search algorithm, and the design of the die roll. Each is explained in detail in the following subsections.

4.1. Bitboard Representation

Monte_Alpha uses a one-dimensional array to record the state of the board. To make program execution more efficient, many bitwise operations are used in the program, so board positions are written in hexadecimal. The array Board, which comprises 256 elements (16 × 16), records the state of the board, as shown in Figure 3. The value of a board position, Board[position], ranges from −1 to 12, where −1 denotes that the position is outside the range of the game board, 0 indicates that there is no piece in the position, 1 to 6 represent the blue pieces numbered 1 to 6, respectively, and 7 to 12 represent the red pieces numbered 1 to 6, respectively.

Additionally, the used fields are the 25 squares in the orange area (0x11~0x55) in Figure 3. Although this design scheme wastes a large amount of storage space and is not as efficient as the pure bitboard scheme [35,36], it has the following advantages:

Fast move generation: Taking Figure 3 as an example, the red piece with the number 3 (0x44) can move to the right (0x45), down (0x54), or diagonally down to the right (0x55). The move can be generated quickly by adding an increment (INC) to the current position. The operation is as follows:

$$Red(3) = \mathtt{0x44} + INC \tag{2}$$

where INC ∈ {0x01, 0x10, 0x11} (right, down, and diagonally down to the right, respectively).

Fast validation of move legality: The performance of the bitboard is much faster than using a two-dimensional array data structure [37]. It can be quickly determined whether the movement of a piece is within the valid range of the board. Continuing the example in Figure 3, suppose the right move (0x45) for the red piece with number 3 is expanded. The validation operation is as follows:

$$Board[\mathtt{0x45}] \neq -1 \tag{3}$$

Since there is no piece in position 0x45, Board[0x45] = 0 and the value of Formula (3) is true, which means the right move (0x45) for the red piece with the number 3 is legal. In another example, suppose the diagonally down move (0x36) for the red piece with the number 6 is expanded. The validation operation is as follows:

$$Board[\mathtt{0x36}] \neq -1 \tag{4}$$

Since the position 0x36 is outside of the range of the game board, as shown in the gray area in Figure 3, Board[0x36] = −1 and the value of Formula (4) is false, which means the diagonally down move (0x36) for the red piece with number 6 is illegal.

Quick access to board coordinates: Continuing the example in Figure 3, suppose the right move (0x45) is selected. The board coordinates can be obtained quickly through the following bitwise operations:

$$Row = \mathtt{0x45} \gg 4 \tag{5}$$

$$Col = \mathtt{0x45} \mathbin{\&} \mathtt{0x0F} \tag{6}$$

where the corresponding coordinates are the fourth row (high-order hexadecimal digit) and the fifth column (low-order hexadecimal digit).

Convenient for developers to debug: Using a two-dimensional array or tuple has the advantage of high readability while using a bitboard has the advantage of better performance. Although the overall execution performance is a little lower than when using a pure bitboard, the representation of our bitboard is as simple and clear as using a two-dimensional array.
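
The four properties above can be illustrated with a small Python sketch of the 16 × 16 array. It is a minimal illustration, not the authors' code: the names `new_board`, `legal_targets`, `coords`, `RED_INC`, and `BLUE_INC` are hypothetical, and the blue increments are an assumption derived from the rule that blue pieces move left, up, or diagonally up to the left.

```python
OFF_BOARD = -1  # value stored in squares outside the 5 x 5 playing area
EMPTY = 0       # value stored in an unoccupied playing square

RED_INC = (0x01, 0x10, 0x11)      # right, down, diagonally down-right
BLUE_INC = (-0x01, -0x10, -0x11)  # assumed mirror image: left, up, up-left


def new_board():
    """Create the 256-element array with squares 0x11..0x55 marked empty."""
    board = [OFF_BOARD] * 256
    for row in range(1, 6):
        for col in range(1, 6):
            board[(row << 4) | col] = EMPTY
    return board


def legal_targets(board, pos, increments):
    """Formulas (2)-(4): add each increment and keep on-board squares."""
    return [pos + inc for inc in increments if board[pos + inc] != OFF_BOARD]


def coords(pos):
    """Formulas (5)-(6): split a position into (row, column)."""
    return pos >> 4, pos & 0x0F
```

Because the playing area is surrounded by at least one ring of −1 entries, every `pos + inc` stays inside the array, so no separate bounds check is needed. For example, `legal_targets(new_board(), 0x44, RED_INC)` yields the three squares 0x45, 0x54, and 0x55.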

4.2. Search Algorithm

Due to the probabilistic nature of the die-rolling phase in EinStein würfelt nicht, the Monte Carlo method is a good fit. Therefore, the search algorithm of Monte_Alpha adopts the Upper Confidence bounds applied to Trees (UCT) algorithm introduced in Section 2, which uses the UCB’ formula in Monte Carlo Tree Search (MCTS). Because of the die-rolling steps involved in EinStein würfelt nicht, the expansion of the game tree becomes more complex, and the original UCT cannot be applied directly. Figure 4 illustrates the search algorithm architecture of Monte_Alpha. The root node is special because its board state includes the number rolled on the die, which determines which pieces (two at most) can be moved, and each piece has at most three possible moves. In the example in Figure 4, the given board comes with a die roll of 4, but red piece number 4 is no longer on the board. Therefore, the red player can choose either piece number 3 or 6, where red piece number 3 has three possible moves, and red piece number 6 can only move downward to capture blue piece number 6.

Furthermore, the nodes expanded below the root node are different from the root node. The expansion process requires simulating the rolling of the die to determine which pieces can be moved. Therefore, all possible die rolls from one to six must be considered. In the example in Figure 4, the only blue piece that remains on the board is piece number 6, so no matter what number the die rolls, the blue player can only move piece number 6. In addition, the expansion process of the game tree follows the four steps of UCT. Finally, the move with the highest number of simulations is selected as the best decision for the given game state.

4.3. Design of Die Rolling

When designing the die-rolling action, the most intuitive method would be to first randomly generate a number between one and six and then check whether the piece with this number is still on the board. If the piece with this number is no longer on the board, the program would then look to both the left and right to find the piece with the closest number that is still on the board. For instance, as illustrated in Figure 3, suppose the red player rolls a three, and the red piece labeled number 3 is still on the board. This is the best case and requires searching once. Alternatively, suppose the blue player rolls a one, and the blue piece labeled number 1 is no longer on the board. The program would then need to look to the right to find the adjacent piece that is still on the board, which is piece number 6. This is the worst case and requires searching six times. In general, if the rolled piece is no longer on the board, the program needs to search for the adjacent piece, which requires searching approximately three times on average, but the program also needs to check the pieces on the left and the right, so the overall number of searches is approximately six.
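
The intuitive method described above can be sketched as follows. This is a hypothetical illustration of the neighbor search (the function name `movable_pieces_naive` and the use of a Python set are assumptions), intended only to show where the repeated searching comes from.

```python
def movable_pieces_naive(alive, roll):
    """Return the piece numbers that may move for a die roll of 1..6.

    alive is the set of piece numbers still on the board.
    """
    if roll in alive:
        return [roll]  # best case: the rolled piece is still on the board
    # Otherwise scan downward and upward for the nearest surviving piece.
    lower = next((n for n in range(roll - 1, 0, -1) if n in alive), None)
    higher = next((n for n in range(roll + 1, 7) if n in alive), None)
    return [n for n in (lower, higher) if n is not None]
```

For the red pieces 1, 3, and 6 of Figure 3, a roll of 4 returns pieces 3 and 6, which matches the choice offered to the red player at the root node in Figure 4.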

During implementation, Monte_Alpha uses bitwise operations and adds two variables to record the state of the blue and red pieces. For instance, with reference to Figure 3, the binary value of the state of the blue pieces is as follows:

$$blue\_alive = \mathtt{0b100000} \tag{7}$$

where the least significant bit represents piece number 1, and the most significant bit represents piece number 6. Similarly, the binary value of the state of the red pieces is as follows:

$$red\_alive = \mathtt{0b100101} \tag{8}$$

To check whether the blue player still has pieces on the board, it is only necessary to determine whether blue_alive is not equal to zero. For example, in the right branch of Figure 4, red piece number 6 will capture blue piece number 6. Afterward, blue_alive will become zero, indicating that the red player has captured all enemies and won the game. Similarly, the red player can also quickly determine whether there are still pieces in play using this command. Since there are only 2⁶ combinations of a single player’s piece status, a probability distribution table can be pre-established for all piece combinations to speed up the program’s processing time.
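
A minimal sketch of these status variables in Python is given below. The variable names follow Formulas (7) and (8); the helper functions `is_alive` and `capture` are hypothetical names introduced only for illustration.

```python
blue_alive = 0b100000  # only blue piece 6 remains (Formula (7))
red_alive = 0b100101   # red pieces 1, 3, and 6 remain (Formula (8))


def is_alive(mask, piece):
    """Test the bit for a piece; bit 0 is piece 1 and bit 5 is piece 6."""
    return bool((mask >> (piece - 1)) & 1)


def capture(mask, piece):
    """Clear the bit of a captured piece and return the new mask."""
    return mask & ~(1 << (piece - 1))
```

After red captures blue piece 6, `capture(blue_alive, 6)` evaluates to zero, so a single comparison with zero detects that blue has no pieces left.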

To speed up die rolling, the probability of each piece being selected is converted into a number of occurrences and stored in a probability table, so that the piece to move can be obtained quickly. Taking the red pieces in Figure 3 as an example, the binary representation of the red piece status is 100101, which means that the pieces numbered 1, 3, and 6 are still on the board. The probabilities that these three pieces are moved are $\frac{1.5}{6}$ (the probability of rolling a 1 is $\frac{1}{6}$, and there is a half chance of moving piece 1 when rolling a 2, with a probability of $\frac{0.5}{6}$), $\frac{2.5}{6}$ (the probability of rolling a 3 is $\frac{1}{6}$, and there is a half chance of moving piece 3 when rolling a 2, 4, or 5, with a probability of $\frac{0.5}{6} + \frac{0.5}{6} + \frac{0.5}{6}$), and $\frac{2}{6}$ (the probability of rolling a 6 is $\frac{1}{6}$, and there is a half chance of moving piece 6 when rolling a 4 or 5, with a probability of $\frac{0.5}{6} + \frac{0.5}{6}$), respectively.
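
These probabilities can be reproduced exactly with rational arithmetic. The sketch below is an illustration under the assumption stated in the text, namely that when the rolled piece is missing, each of the two neighboring candidates is moved with a half chance; the function name `move_probabilities` is hypothetical.

```python
from fractions import Fraction


def move_probabilities(alive_mask):
    """Map each surviving piece to its probability of being moved."""
    alive = [p for p in range(1, 7) if (alive_mask >> (p - 1)) & 1]
    probs = {p: Fraction(0) for p in alive}
    for roll in range(1, 7):
        if roll in probs:
            candidates = [roll]
        else:
            lower = [p for p in alive if p < roll][-1:]  # nearest below
            higher = [p for p in alive if p > roll][:1]  # nearest above
            candidates = lower + higher
        for p in candidates:
            # Each roll has probability 1/6, shared equally by candidates.
            probs[p] += Fraction(1, 6) / len(candidates)
    return probs
```

For the status 0b100101, the result is 1/4, 5/12, and 1/3 for pieces 1, 3, and 6, which equal the values 1.5/6, 2.5/6, and 2/6 given above.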

Position 100101 in the probability table is a 10,000-element array with a value of one for elements 0 to 2499, a value of three for elements 2500 to 6666, and a value of six for elements 6667 to 9999. This indicates that when rolling the die 10,000 times, there are 2500 chances to move piece 1, 4167 chances to move piece 3, and 3333 chances to move piece 6. The contents of the array are as follows:

$$probability[100101] = \{1, \dots, 1, 3, \dots, 3, 6, \dots, 6\} \tag{9}$$

In this way, the piece to be moved is determined with a single table lookup. The operation is as follows:

$$move = probability[100101][rand() \bmod 10000] \tag{10}$$
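
The lookup can be sketched in Python as follows. The element counts are taken from the text for status 100101; the names `row_100101` and `roll_piece` are hypothetical, and `random.randrange` stands in for the `rand()` call of the original formula.

```python
import random

# 2500 + 4167 + 3333 = 10,000 entries for pieces 1, 3, and 6.
row_100101 = [1] * 2500 + [3] * 4167 + [6] * 3333


def roll_piece(row, rng=random):
    """Pick the piece to move with a single indexed read."""
    return row[rng.randrange(len(row))]
```

Each call replaces the roll-then-search procedure with one random index and one array access, which is the source of the speedup discussed in this section.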

During the process of game tree expansion, each simulation needs to go through a series of moves to reach the end state of the game. Assuming the depth of the simulation is $d$, each simulation with Monte_Alpha requires only $d$ table lookups, whereas the conventional method requires approximately $6 \times d$ searches. Therefore, the saving grows with the number of simulations, and this is where the advantage of Monte_Alpha lies.

5. Experimental Results

This study conducted experiments on a personal computer equipped with an AMD R5-3600 CPU (6 cores; 12 threads) and 16 GB of Micron Ballistix DDR4-3200 RAM (8 GB × 2). Four EinStein würfelt nicht programs based on different search algorithms were compared:

Monte_Alpha: This is the EinStein würfelt nicht program developed in this paper. It can simulate approximately 200,000 times per second.
Greedy: This strategy uses the shortest path algorithm to move a piece toward the goal.
UCT_cut: This strategy uses the UCT algorithm with pruning and can simulate approximately 50,000 times per second.
ABS_eval: This strategy uses alpha–beta pruning combined with a transposition table and a game evaluation function based on the level of threat [38].

At the beginning of the experiment, we conducted 400 games (200 as the first player and 200 as the second player) to compare Greedy and Monte_Alpha with different numbers of simulations. Table 1 shows the win rates of different versions of Monte_Alpha against Greedy. The version of Monte_Alpha with 10,000 simulations cannot outperform the Greedy strategy, with a win rate of only 40.75%, whereas all versions with more than 50,000 simulations significantly outperform Greedy. Due to the probabilistic nature of the game, the win rate fluctuates slightly, but increasing the number of simulations further does not continue to raise it. It can be inferred that the statistical results obtained from more than 50,000 simulations have converged to a locally optimal solution, so the win rate of Monte_Alpha against Greedy’s shortest-path strategy is approximately 70% (±5%).
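
As a rough plausibility check of the ±5% fluctuation, one can treat the 400 games as independent trials with a fixed win probability of about 0.70. This binomial model is an assumption made here for illustration and is not part of the original analysis. The standard error and the half-width of an approximate 95% interval are then:

```latex
\[
\mathrm{SE} = \sqrt{\frac{p(1-p)}{n}}
            = \sqrt{\frac{0.70 \times 0.30}{400}}
            \approx 0.0229,
\qquad
1.96 \times \mathrm{SE} \approx 0.045
\]
```

Under this assumption, sampling noise alone accounts for a spread of roughly 4.5 percentage points around 70%, which is consistent with the observed range.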

Next, we conducted extensive matches between Monte_Alpha and the three other strategies, with each program allotted the same execution time (approximately 1.2 s per move). The results of these matches are shown in Table 2. The experimental results demonstrate that Monte_Alpha does indeed have a win rate of approximately 70% against Greedy. Additionally, the strengths of Monte_Alpha, UCT_cut, and ABS_eval are comparable, which confirms that simply increasing the number of simulations does not significantly improve the strength of the program.

6. Computer Game Tournaments

Our EinStein würfelt nicht program, Monte_Alpha, participated in the ICGA 2022 and TAAI 2022 computer game tournaments. The results of each competition are described in the following subsections.

6.1. ICGA 2022

The ICGA 2022 competition was held from July 23 to 29 and was conducted online. Although there were only three teams participating in the EinStein würfelt nicht competition this year, the opponents were all very strong. The techniques used by Monte_Alpha have been introduced in Section 4 of this paper. Meanwhile, Reinstein utilized the bitboard, MCTS, and n-tuple networks techniques. Additionally, YJ_EinStein employed the bitboard and alpha–beta pruning techniques.

During the regular competition, each pair of competitors played 12 games, with each playing 6 games as the first player and 6 games as the second player. At the end of the regular competition, both leading teams, Monte_Alpha and Reinstein, scored 15 points, leading them into the overtime playoffs. The overtime match consisted of an additional six games, with each team having an equal number of turns to play first and second. However, the two teams remained neck and neck during the overtime match, resulting in another tie. As a result, they proceeded to a second overtime match. During the second overtime match, Monte_Alpha maintained a lead throughout, ultimately securing a 5 to 1 victory and winning the gold medal for the EinStein würfelt nicht competition at ICGA 2022. Table 3 presents the results of the competition. The related competition rules and competition platform can be found on the official website of the ICGA.

6.2. TAAI 2022

The TAAI 2022 competition was held in Tainan on December 3rd. This year, there were four teams participating, including Monte_Alpha and Reinstein, as well as two newcomers, Deku and Ssunoo. Since Section 4 of this paper was published on December 1st, some participants knew about our design and implemented our method in their program. Deku used UCT coupled with pruning techniques (UCT_cut, as mentioned in Section 5) and the method mentioned in Section 4.3; Ssunoo used alpha–beta search paired with a fine-tuned evaluation function (ABS_eval, as mentioned in Section 5) and the method mentioned in Section 4.3.

The competition made use of the automated competition platform developed by National Taipei University. This platform significantly reduced the time required for human operation. Furthermore, to minimize the influence of luck on the rankings, each pair of competitors played 100 games during the regular competition, with each player going first in 50 games. At the end of the regular competition, Deku led all competitors, while Monte_Alpha and Ssunoo tied for second place, and Reinstein trailed behind all of the other competitors. As Monte_Alpha and Ssunoo had the same score in the regular competition, they went into overtime. The overtime competition consisted of 100 games, and Monte_Alpha and Ssunoo tied again with 50 games each, leading to a second overtime match. Due to time constraints, the second overtime competition consisted of only 10 games, and ultimately, Ssunoo defeated Monte_Alpha 6 to 4. Unfortunately, Monte_Alpha only achieved third place in this competition. Table 4 shows the results of the EinStein würfelt nicht competition at TAAI 2022, with the competition rules following those of ICGA 2022.

7. Further Enhancement

Given that Deku and Ssunoo leveraged the method outlined in Section 4.3 to enhance the efficiency of their programs, ultimately surpassing Monte_Alpha, this study has made several improvements to Monte_Alpha, which are detailed in the sections below.

7.1. Enhancement of Die Rolling

In the design of the method described in Section 4.3, the probability table is an array of 10,000 elements, which introduces a slight rounding error. Let us take the red pieces in Figure 3 as an example. For position 100101 in the probability table, there should be 2500 ($10{,}000 \times \frac{1.5}{6}$) values of 1, 4166.67 ($10{,}000 \times \frac{2.5}{6}$) values of 3, and 3333.33 ($10{,}000 \times \frac{2}{6}$) values of 6, but an array can only hold a whole number of each. Herein, we observe that if the denominator of the probability of a piece appearing is expanded to 12, the numerator will also be an integer. This implies that we only need 12 elements in the probability table and that the number of times each piece appears will be an integer. Therefore, for the board in Figure 3, position 100101 in the probability table changes to having 3 ($12 \times \frac{1.5}{6}$) values of 1, 5 ($12 \times \frac{2.5}{6}$) values of 3, and 4 ($12 \times \frac{2}{6}$) values of 6. Consequently, the probability of each piece appearing is represented exactly, without rounding error.
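
One way to construct such a 12-element table for all 2⁶ piece combinations is sketched below. This is an illustrative construction, not the authors' implementation: the function name `build_table` is hypothetical, and the idea of giving every die face two slots (both to the rolled piece, or one to each neighbor) is an assumption that reproduces the counts stated above.

```python
def build_table():
    """Return 64 rows; row[mask] lists the piece moved for each of 12 slots."""
    table = []
    for mask in range(64):
        alive = [p for p in range(1, 7) if (mask >> (p - 1)) & 1]
        row = []
        for roll in range(1, 7):
            if roll in alive:
                row += [roll, roll]  # both slots go to the rolled piece
            else:
                lower = [p for p in alive if p < roll][-1:]
                higher = [p for p in alive if p > roll][:1]
                candidates = lower + higher
                if len(candidates) == 2:
                    row += candidates  # one slot for each neighbor
                elif len(candidates) == 1:
                    row += candidates * 2  # only one neighbor exists
        table.append(sorted(row))
    return table
```

Row 0b100101 of the resulting table contains three 1s, five 3s, and four 6s, and every row for a non-empty status has exactly 12 entries, so a lookup needs only a random index from 0 to 11.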

In addition, the revised probability is calculated using an integer data type, which also enhances the performance of reading from the probability table. Table 5 lists the time taken to access the probability table using the old and new designs, with the new design achieving a speedup ratio of 1.2 times.

7.2. Enhancement of Simulation

The result of a single Upper Confidence bounds applied to Trees (UCT) simulation is not reliable. A larger number of simulations is required to gradually converge toward the optimal solution based on the statistical results. To make the simulation results more reliable, this study replaced the simulation phase of UCT with an evaluation function [9,39]. As a result, the simulation process no longer needs to unfold until the end of the game, which also avoids excessively long simulations whose results are less reliable.

Herein, our evaluation function was designed based on a goal-oriented concept and was specifically designed using the reciprocal of the shortest distance from the pieces to the goal. Taking Figure 3 as an example, if the red player rolls a three, corresponding to a specific piece, and the shortest distance from that piece to the goal is one, it will receive an update of one point. Similarly, if the red player rolls a six and the shortest distance from that piece to the goal is three, it will receive an update of $\frac{1}{3}$ of a point. This design aligns with the idea that the closer a piece is to the goal, the higher its winning probability.
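
The reciprocal-distance idea can be sketched as follows for a red piece. Several points are assumptions made for illustration: the red goal is taken to be the bottom-right corner 0x55, the shortest distance is computed as the larger of the remaining rows and columns (since a diagonal step covers both), a piece already on the goal is scored 1.0, and the names `red_distance` and `red_score` are hypothetical. How the score is combined into the UCT statistics is not shown.

```python
def red_distance(pos):
    """Fewest moves for a red piece at pos to reach the assumed goal 0x55."""
    row, col = pos >> 4, pos & 0x0F
    return max(5 - row, 5 - col)


def red_score(pos):
    """Reciprocal of the shortest distance to the goal."""
    distance = red_distance(pos)
    if distance == 0:
        return 1.0  # assumption: a piece on the goal gets the full point
    return 1.0 / distance
```

For the red piece at 0x44 in Figure 3, the distance is one and the score is one point, matching the first example above; a piece three moves from the goal scores one third of a point.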

As the improvements mentioned in Section 7.1 only corrected probability errors and improved the speed of table lookup without significantly impacting the strength of the program, this study did not conduct independent testing specifically for the method in Section 7.1. Instead, a new version, Monte_Alpha v2, was developed that incorporates the improvements from both Sections 7.1 and 7.2. We conducted 5000 matches between Monte_Alpha v2 and each of two opponents: the previous version of Monte_Alpha and Deku. The results in Table 6 show that Monte_Alpha v2 won 3300 matches against the previous version of Monte_Alpha, achieving a win rate of 66%. Additionally, in the 5000 matches against Deku, Monte_Alpha v2 won 2684 matches, achieving a win rate of 53.68%. These experimental results demonstrate the significant effect of the improvements discussed in Section 7.

8. Discussion

The EinStein würfelt nicht program developed in this study, Monte_Alpha, utilizes an efficient bitboard representation for the game board. It is complemented with an efficient die-rolling design scheme and extensively employs bitwise operations, enabling the program to quickly expand nodes during Upper Confidence bounds applied to Trees (UCT) search. The experimental results in Section 5 demonstrate that although our program executes faster than its competitors, continuously increasing the number of simulations beyond a certain threshold does not effectively enhance the program’s playing strength. In addition, this study attempted to replace random simulation with an evaluation function, aiming to enhance the quality of UCT and make the search process both faster and more accurate. The experimental results of the new version, Monte_Alpha v2, in Table 6 demonstrate that using an evaluation function instead of random simulation significantly improves the program’s playing strength. We hope that this program will achieve commendable results in future computer game competitions.

Currently, our EinStein würfelt nicht program has achieved a high level of efficiency, so improving its execution speed is no longer our primary focus for future enhancements. Instead, we plan to explore developing more advanced evaluation functions. For example, we will consider the opponent’s pieces as they appear on the path of movement or strategically reduce the player’s own number of pieces to increase the chances of movement for other pieces. Additionally, integrating the utilization of opening [13] and endgame databases [29,30] and incorporating calculations based on the Nash equilibrium [26,40] are also worth considering. These ideas may enhance the accuracy of the simulation results and further elevate the program’s performance.
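The idea of reducing one's own piece count to increase the chances of moving the remaining pieces can be made concrete with a small calculation. The Python sketch below applies the standard die rule of the game (move the rolled piece, or else the next-lower or next-higher surviving piece) and counts the die outcomes on which a given piece may move. The function names and the example piece sets are illustrative assumptions and are not taken from Monte_Alpha.

```python
from fractions import Fraction


def movable(alive, die):
    """Return the set of piece numbers that may move for a die roll.

    alive is the set of surviving piece numbers (1..6) of one player.
    """
    if die in alive:
        return {die}
    lower = max((n for n in alive if n < die), default=None)
    higher = min((n for n in alive if n > die), default=None)
    return {n for n in (lower, higher) if n is not None}


def move_probability(alive, piece):
    """Probability that `piece` is among the movable pieces on a fair die."""
    hits = sum(1 for die in range(1, 7) if piece in movable(alive, die))
    return Fraction(hits, 6)


# Piece 3 with companions 1 and 6, then after piece 1 has been given up.
print(move_probability({1, 3, 6}, 3))  # 2/3
print(move_probability({3, 6}, 3))     # 5/6
```

In this example, giving up piece 1 raises the chance that piece 3 may move from 4 of 6 die outcomes to 5 of 6, which is the kind of trade-off a more advanced evaluation function would have to weigh against the loss of material.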

Author Contributions

Conceptualization, C.-H.C. and S.-S.L.; formal analysis, C.-H.C.; investigation, C.-H.C.; methodology, C.-H.C., S.-Y.C. and S.-S.L.; project administration, S.-S.L.; software, S.-Y.C.; supervision, S.-S.L.; validation, C.-H.C. and S.-Y.C.; writing—original draft, C.-H.C.; writing—review and editing, C.-H.C. and S.-S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the National Science and Technology Council (formerly known as the Ministry of Science and Technology, R.O.C.) under grants MOST 110-2221-E-003-006-MY3 and MOST 111-2221-E-003-017-MY2.

Data Availability Statement

Not applicable.

Acknowledgments

The authors thank Bo-Dong Huang, Chung-Wen Wu, and Yun-Jie Ho from the Department of Computer Science and Information Engineering at National Taiwan Normal University for providing programs with different strategies. Their contributions greatly enhanced the completeness of the experimental results of this research.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. On the Origins of EinStein Würfelt Nicht! Available online: https://althofer.de/origins-of-ewn.html (accessed on 14 May 2023).
2. International Computer Game Association (ICGA). Available online: https://icga.org/ (accessed on 14 May 2023).
3. Taiwanese Association for Artificial Intelligence (TAAI). Available online: http://www.taai.org.tw/ (accessed on 14 May 2023).
4. Taiwan Computer Game Association (TCGA). Available online: https://www.tcga.tw/ (accessed on 14 May 2023).
5. Chinese Association for Artificial Intelligence (CAAI). Available online: http://computergames.caai.cn/ (accessed on 14 May 2023).
6. Sie, C.L.; Lin, S.S. Design and Implementation of EinStein Würfelt Nicht Platform. In Proceedings of the 2016 Computer Games Workshop (TCGA 2016), Taichung, Taiwan, 4–5 June 2016.
7. Chiu, S.Y.; Chen, C.H.; Lin, S.S. Design and Implementation of EinStein Würfelt Nicht Program-Monte_Alpha. In Proceedings of the 27th International Conference on Technologies and Applications of Artificial Intelligence (TAAI 2022), Tainan, Taiwan, 1–3 December 2022.
8. Kocsis, L.; Szepesvári, C. Bandit Based Monte-Carlo Planning. In Proceedings of the 17th European Conference on Machine Learning (ECML’06), Berlin, Germany, 18–22 September 2006.
9. Chen, C.H. Improving the AlphaZero Algorithm in the Playing and Training Phases. Ph.D. Thesis, National Taiwan Normal University, Taipei, Taiwan, 2021.
10. Fishman, G.S. Monte Carlo: Concepts, Algorithms, and Applications; Springer: New York, NY, USA, 1995.
11. Coulom, R. Efficient Selectivity and Backup Operators in Monte-Carlo Tree Search. In Proceedings of the 5th International Conference on Computers and Games (CG’06), Turin, Italy, 29–31 May 2006.
12. Auer, P.; Cesa-Bianchi, N.; Fischer, P. Finite-time Analysis of the Multiarmed Bandit Problem. Mach. Learn. 2002, 47, 235–256.
13. Schäfer, A. Rock’n’Roll, A Cross-Platform Engine for the Board Game EinStein Würfelt Nicht! Student Research Project Report. Friedrich Schiller University Jena: Jena, Germany, 14 December 2005.
14. Knuth, D.E.; Moore, R.W. An analysis of alpha-beta pruning. Artif. Intell. 1975, 6, 293–326.
15. Lorentz, R.J. An MCTS program to play EinStein Würfelt Nicht! In Advances in Computer Games; van den Herik, H.J., Plaat, A., Eds.; Springer: Heidelberg, Germany, 2012; Volume 7168, pp. 52–59.
16. Liu, Y.Y.; Meng, K.; Miao, H. Phased Algorithm Strategy Analysis of WTN-EinStein Würfelt Nicht! In Proceedings of the 2019 Chinese Control and Decision Conference (CCDC), Nanchang, China, 3–5 June 2019.
17. Li, X.; Cai, Y.; Yu, L.; Wu, L.; Bi, X.; Zhao, Y.; Liu, B. A Modification of UCT Algorithm for WTN-EinStein Würfelt Nicht! Game. In Proceedings of the 2020 IEEE/CIC International Conference on Communications in China (ICCC), Chongqing, China, 9–11 August 2020; pp. 640–644.
18. Wang, D.; Wang, X.; Wang, Y.; Li, J. Research and Implementation of EinStein Würfelt Nicht! Algorithm. In Proceedings of the 29th Chinese Control and Decision Conference (CCDC), Chongqing, China, 28–30 May 2017.
19. Wang, T.; Yang, Z.; Xie, Z. Application and optimization of the UCT algorithm in Einstein würfelt nicht! In Proceedings of the 2022 34th Chinese Control and Decision Conference (CCDC), Hefei, China, 21–23 May 2022.
20. Browne, C.; Powley, E.; Whitehouse, D.; Lucas, S.M.; Cowling, P.I.; Rohlfshagen, P.; Tavener, S.; Perez, D.; Samothrakis, S.; Colton, S. A Survey of Monte Carlo Tree Search Methods. IEEE Trans. Comput. Intell. AI Games 2012, 4, 1–43.
21. Wang, X.; Zhang, H.; Wang, Y.; Qi, J.; Zhou, B. Research of EinStein Würfelt Nicht! Monte Carlo Algorithm Based on Optimized Evaluation. In Proceedings of the 26th Chinese Control and Decision Conference (2014 CCDC), Changsha, China, 31 May–2 June 2014.
22. Lu, J.; Wang, X.; Li, Y.; Wang, Y.; Yu, T. EinStein Würfelt Nicht! Strategies Research and Algorithm Optimization. In Proceedings of the 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, China, 23–25 May 2015.
23. Yu, T.; Wang, X.; Yang, J.; Li, C.; Lu, J. Research of the Value of Pieces in EWN Based on Monte Carlo Algorithm. In Proceedings of the 28th Chinese Control and Decision Conference (CCDC), Yinchuan, China, 28–30 May 2016.
24. Zhang, X.C.; Li, Q.; Wang, W.W.; Huang, L.C.; Sun, Y.J.; Peng, L.R.; Li, Y. Study of Valuation Function and Search Strategy in Random Game. In Proceedings of the 2018 Chinese Control and Decision Conference (CCDC), Shenyang, China, 9–11 June 2018.
25. Li, X.; Guang, Y.; Wu, L.; Zhang, Y. An Offensive and Defensive Expect Minimax Algorithm in EinStein Würfelt Nicht! In Proceedings of the 27th Chinese Control and Decision Conference (2015 CCDC), Qingdao, China, 23–25 May 2015.
26. Bonnet, F.; Viennot, S. Toward Solving EinStein Würfelt Nicht! In Proceedings of the Advances in Computer Games (ACG 2017), Leiden, The Netherlands, 3–5 July 2017.
27. Ballard, B.W. The *-minimax search procedure for trees containing chance nodes. Artif. Intell. 1983, 21, 327–350.
28. Hauk, T.; Buro, M.; Schaeffer, J. Rediscovering *-Minimax search. In Computers and Games; van den Herik, H.J., Björnsson, Y., Netanyahu, N.S., Eds.; Springer: Heidelberg, Germany, 2006; Volume 3846, pp. 35–50.
29. Turner, W. EinStein Würfelt Nicht—An Analysis of Endgame Play. ICGA J. 2012, 35, 94–102.
30. Bonnet, F.; Viennot, S. Analytical Solution for EinStein Würfelt Nicht! with One Stone. In Proceedings of the Advances in Computer Games (ACG 2017), Leiden, The Netherlands, 3–5 July 2017.
31. Tsao, S.G.; Lin, S.S. The Initial Research of EinStein würfelt nicht! with Deep Learning. In Proceedings of the 2017 Computer Games Workshop (TCGA 2017), Penghu, Taiwan, 5–8 May 2017.
32. Lin, P.R. Comparison between Reinforcement Learning Structure and Expectiminimax Implementation of EinStein würfelt nicht! Master’s Thesis, National Taiwan Normal University, Taipei, Taiwan, 2019.
33. Hsueh, C.H.; Ikeda, K.; Wu, I.C.; Chen, J.C.; Hsu, T.S. Analyses of Tabular AlphaZero on Strongly-Solved Stochastic Games. IEEE Access 2023, 11, 18157–18182.
34. Chu, Y.J.R.; Chen, Y.H.; Hsueh, C.H.; Wu, I.C. An Agent for EinStein Würfelt Nicht! Using N-Tuple Networks. In Proceedings of the 2017 Conference on Technologies and Applications of Artificial Intelligence (TAAI), Taipei, Taiwan, 1–3 December 2017.
35. Hsu, T.Y. Applying Bitboard and N-Tuple Networks to EinStein Würfelt Nicht! Master’s Thesis, National Taipei University, Taipei, Taiwan, 20 July.
36. Browne, C. Bitboard Methods for Games. ICGA J. 2014, 37, 67–84.
37. Sturtevant, N.R. An Efficient Chinese Checkers Implementation: Ranking, Bitboards, and BMI2 pext and pdep Instructions. In Proceedings of the International Conference on Computers and Games (CG’22), Virtual Conference, 22–24 November 2022.
38. Hsu, T.S.; Hsu, S.C.; Chen, J.C.; Chiang, Y.T.; Chen, B.N.; Liu, Y.C.; Chang, H.J.; Tsai, S.C.; Lin, T.Y.; Fan, G.Y. Computers and Classical Board Games: An Introduction; National Taiwan University Press: Taipei, Taiwan, 2017.
39. Hsueh, C.H.; Wu, I.C.; Tseng, W.J.; Yen, S.J.; Chen, J.C. An Analysis for Strength Improvement of an MCTS-based Program Playing Chinese Dark Chess. Theor. Comput. Sci. 2016, 644, 63–75.
40. Porter, R.; Nudelman, E.; Shoham, Y. Simple Search Methods for Finding a Nash Equilibrium. Games Econ. Behav. 2008, 63, 642–662.

Figure 1. Key points in EinStein würfelt nicht are demonstrated by ECUI [6], which serves as an EinStein würfelt nicht platform: (a) A particular initialization setting of the ECUI. Red places its pieces in the six squares in the top-left corner, while blue uses the area in the bottom-right corner. Their goal spots are located in the opposite corners. Red pieces can only move one square to the right, bottom-right, or down, as indicated by the red arrows. Blue pieces, on the other hand, can only move one square to the left, top-left, or up, as indicated by the blue arrows. Each player needs to move toward its respective target direction. (b) Red has rolled a 4 and may now either capture its 5, capture its 6, or move to the free square diagonally below, as indicated by the red arrows. (c) The die shows a 3, which red has already lost. The adjacent numbers on the board are 2 and 4. Legal moves are indicated by the arrows. (d) Red wins by capturing all of its opponent’s pieces.

Figure 2. Step-by-step illustration of the UCT algorithm (adapted from [9]). (a) The 1st round of UCT: The root node, representing the given board, is selected as a leaf node during the selection phase because it has unvisited child nodes. An unvisited child node, A, is then expanded from the root node during the expansion phase. All nodes, including the root node, are initialized to 0/0 upon their creation (expansion). During the simulation phase, a random playout is performed from node A until the game is over. Supposing the playout ends in a loss, the result is accumulated from the newly expanded node A back to the root node during the backpropagation phase. (b) The 2nd round of UCT: During the selection phase, the root node is selected as a leaf node because it still has an unvisited child node. The child node B is then expanded in the expansion phase. Supposing the simulation phase ends in a win, the result is accumulated from node B back to the root node during the backpropagation phase. (c) The 3rd round of UCT: Since the root node has no unvisited child nodes left, the UCB’ formula is evaluated during the selection phase to identify a leaf node. In this case, since UCB’(A) is lower than UCB’(B), node B is selected as the leaf node, and the subsequent steps mirror those of the previous rounds. The 4th round of UCT is the same as round 3. (d) The 5th round of UCT: During the selection phase, the root node chooses node B, which is the most promising move with the highest UCB’ value from the first player’s viewpoint. Node B then chooses node C, which is the most promising move with the highest UCB’ value from the second player’s viewpoint.
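The first three rounds described in Figure 2 can be traced with a short Python sketch. It uses the standard UCB1 formula as a stand-in, because the exact form of the UCB’ formula is not repeated in this part of the article; the class `Node`, the exploration constant, and the convention of storing all results from the root player's viewpoint are assumptions made for illustration only.

```python
import math


class Node:
    """A search-tree node holding a win/visit record such as 0/0 or 1/1."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.wins = 0.0
        self.visits = 0
        if parent is not None:
            parent.children.append(self)


def ucb1(node, c=math.sqrt(2)):
    """Standard UCB1 score (assumed here in place of the paper's UCB')."""
    if node.visits == 0:
        return math.inf  # unvisited children are tried first
    exploit = node.wins / node.visits
    explore = c * math.sqrt(math.log(node.parent.visits) / node.visits)
    return exploit + explore


def select_child(node):
    """Selection phase: pick the child with the highest score."""
    return max(node.children, key=ucb1)


def backpropagate(node, result):
    """Backpropagation phase: accumulate a result up to the root."""
    while node is not None:
        node.visits += 1
        node.wins += result
        node = node.parent


root = Node("root")
a = Node("A", root)   # round 1: expand A, playout ends in a loss
backpropagate(a, 0)
b = Node("B", root)   # round 2: expand B, playout ends in a win
backpropagate(b, 1)

# Round 3: no unvisited children remain, so the scores decide.
print(select_child(root).name)  # B
```

With records of 0/1 for A and 1/1 for B, both nodes receive the same exploration term, so the higher win ratio of B decides the selection, matching panel (c).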

Figure 3. Board representation of Monte_Alpha. In this example, red has pieces numbered 1, 3, and 6 on the board, while blue has only piece number 6.

Figure 4. An illustration of Monte_Alpha’s search algorithm. Let’s assume that red has pieces numbered 1, 3, and 6 on the board, while blue only has piece number 6. The die is rolled, resulting in a 4. However, the red piece numbered 4 is no longer on the board. Therefore, the red player can choose to move a piece whose number is next-lower (number 3) or next-higher (number 6) to the rolled number 4. The piece numbered 3 can move in three different directions: right, diagonally downward (which leads to the goal), or straight down. Meanwhile, the red piece numbered 6 has only one option: to move downwards to capture the only piece of blue, which is number 6. Now, the game moves to blue’s turn. Given that the only surviving piece for the blue player is numbered 6, irrespective of the die roll outcome, blue’s only option is to move piece number 6.
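The die rule illustrated in Figure 4 can be sketched in Python as follows. The sketch stores a player's surviving pieces in a 6-bit mask and precomputes the movable pieces for every mask and die value; this table layout is a hypothetical illustration of the table-lookup idea and is not the actual data structure used by Monte_Alpha.

```python
def movable_pieces(alive_mask, die):
    """Return the piece numbers that may move for a given die roll.

    alive_mask: bit (n - 1) is set if piece n is still on the board.
    die: the rolled number, 1..6.
    """
    if alive_mask & (1 << (die - 1)):
        return [die]
    # Nearest surviving piece with a lower number, if any.
    lower = next((n for n in range(die - 1, 0, -1)
                  if alive_mask & (1 << (n - 1))), None)
    # Nearest surviving piece with a higher number, if any.
    higher = next((n for n in range(die + 1, 7)
                   if alive_mask & (1 << (n - 1))), None)
    return [n for n in (lower, higher) if n is not None]


# Hypothetical lookup table: MOVABLE[mask][die - 1], 64 masks x 6 die values.
MOVABLE = [[movable_pieces(mask, die) for die in range(1, 7)]
           for mask in range(64)]

red = 0b100101   # pieces 1, 3, and 6 survive
blue = 0b100000  # only piece 6 survives

print(MOVABLE[red][4 - 1])                          # [3, 6]
print(all(MOVABLE[blue][d] == [6] for d in range(6)))  # True
```

For the position in Figure 4, a roll of 4 lets red move piece 3 or piece 6, while blue may only move piece 6 regardless of the roll.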

Table 1. Win rates of Monte_Alpha against Greedy with different simulation numbers. Data obtained from [7].

Simulations	Monte_Alpha Wins	Monte_Alpha Losses	Monte_Alpha Win Rate
10,000	163	237	40.75%
50,000	292	108	73.00%
100,000	285	115	71.25%
500,000	264	136	66.00%
1,000,000	262	138	65.50%
1,500,000	289	111	72.25%
2,000,000	300	100	75.00%
2,500,000	271	129	67.75%
3,000,000	271	129	67.75%
3,500,000	277	123	69.25%
4,000,000	283	117	70.75%
4,500,000	273	127	68.25%
5,000,000	291	109	72.75%

Table 2. Win rates of Monte_Alpha against other search algorithms. Data obtained from [7].

Opponent	Monte_Alpha Wins	Monte_Alpha Losses	Monte_Alpha Win Rate
Greedy	10,597	4403	70.65%
UCT_cut	7699	7301	51.33%
ABS_eval	7925	7075	52.83%

Table 3. Results of the EinStein würfelt nicht competition at ICGA 2022. Data obtained from [7].

Program	Monte_Alpha	Reinstein	YJ_EinStein
Monte_Alpha	-	6	3
Reinstein	6	-	3
YJ_EinStein	9	9	-
Total	15	15	6
Playoff 1	3	3	-
Playoff 2	5	1	-
Final	8	4	-
Rank	1	2	3

Table 4. Results of the EinStein würfelt nicht competition at TAAI 2022.

Program	Deku	Ssunoo	Monte_Alpha	Reinstein
Deku	-	45	47	35
Ssunoo	55	-	41	60
Monte_Alpha	53	59	-	44
Reinstein	65	40	56	-
Total	173	144	144	139
Playoff 1	-	50	50	-
Playoff 2	-	6	4	-
Final	-	56	54	-
Rank	1	2	3	4

Table 5. Performance of different table sizes.

Table Size	Time for 10⁸ Accesses	Accesses per ms	Speedup
10,000 elements	3342 ms	29,922	1
12 elements	2750 ms	36,364	1.2

Table 6. Win rates of Monte_Alpha v2 against Monte_Alpha and Deku.

Opponent	Monte_Alpha v2 Wins	Monte_Alpha v2 Losses	Monte_Alpha v2 Win Rate
Monte_Alpha	3300	1700	66%
Deku	2684	2316	53.68%


© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
