CSC384 -- Introduction to Artificial Intelligence
Summer 2020


Othello Competition Results

Thank you so much to all of you who chose to submit an agent to the CSC384 Othello Competition this year.

Initially, each agent competed with every other agent on a board of dimension 6 with a timeout of 10 seconds. Agents played each opponent twice, once as the "light" player and once as the "dark" player. Games in which an agent timed out were discarded, as were games that ended in draws.

Results from Round One (Regular Season)

Then, the top 5 ranked agents played one another on a board of dimension 8 (the standard Othello board size) with a timeout of 10 seconds. Agents again played each opponent twice, once as the "light" player and once as the "dark" player.

Results from Round Two (Finals!)

Finally, the top 3 ranked agents played one another on a board of dimension 8 (the standard Othello board size) with a timeout of 10 seconds. The results are on the CSC384 YouTube Channel!!

The final round crowned Othello AI (agent_wangboh2) as the winning AI; Othello AI Set 2 (agent_zhanzhir) and SQS's BetaGO (agent_shiquish) came in second and third, respectively.

Some "tips" from those with strong performances this year (and last term!) are posted below. Thanks again to everyone who chose to play with us!!!


agent_wangboh2

"The game agent I developed is simply a heuristic alpha-beta search algorithm. I also applied iterative deepening search in order to give a time constraint to my agent and used ordering heuristics based on the result from previous search depth. Generally, the more features the agents have, the higher potential the agent has, but the harder it is to tune the heuristic weights. In my agent, specifically, I used features such as corners, (potential and current) stability, (potential and current) mobility, frontiers, edges, coin parity, position heuristics and the weights to these heuristics are tuned using temporal difference learning. In addition, I divided the game into 19 stages, each stage has a slightly different set of weights. For example, the weights for mobility and coin parity are a lot smaller in the beginning stages. The above approach is able to generate a very strong Othello agent if sufficient time and efforts are spent on the training process."


agent_zhanzhir

"My heuristic is using the dynamic score map to weight each move with existing combination. score on each situation is different as following(these strategies are inspired by the article in the A3 handout here.

agent_kammulaa

"To fix the problem that my minimax + alpha-beta implementation had with not looking at all possible moves, I decided to implement a Monte Carlo Tree Search (MCTS) algorithm because not only did it allow me to run several simulations of every possible move allowed each turn, but it also predicted whether if each move would lead to a win or a loss. To enhance the vanilla version of MCTS, I decided to take a Simulated Annealing approach when it came to deciding between moves, added a heuristic to help make the moves less randomized, and used random exploration (Reinforcement Learning) to help adjust the Upper Confidence Bound (UCB) a bit. If a move appeared as a losing move, I would just lower its UCB instead of completely pruning it or making an algorithm that prunes specific simulations because that move may lead us to a win in the future - who knows? All in all, I wanted to avoid making my algorithm too biased, but not too randomized (Bias - Variance tradeoff).

For my heuristic, I used it to bias the UCBs of each move a bit more. My heuristic was a weighted combination of the following: (1) Utility, (2) Mobility, (3) Corner Checking, (4) Trap Checking and (5) Edge Checking (board edges). From playing multiple games, I observed that it was best to put the most emphasis on (1) instead of (3) (as many people would do), because there were cases where securing 3 or fewer corners could still lead to a loss. I wanted to avoid the moves that appeared good at first but led to a loss in the end.
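A hedged sketch of the selection rule described is below. The node fields, the heuristic function, and the constants C and BIAS are hypothetical placeholders, not kammulaa's actual values.

```python
import math

C, BIAS = 1.4, 0.5  # exploration constant and heuristic weight (placeholders)

def ucb_score(node, parent_visits):
    """UCB1 plus a heuristic prior that fades as real simulations accumulate."""
    if node.visits == 0:
        return float('inf')  # visit every child at least once
    exploit = node.wins / node.visits
    explore = C * math.sqrt(math.log(parent_visits) / node.visits)
    # heuristic() is a hypothetical weighted combination of utility,
    # mobility, corner, trap, and edge terms, as described above.
    prior = BIAS * heuristic(node.board) / (1 + node.visits)
    return exploit + explore + prior

def select_child(node):
    # Losing moves keep a (lowered) UCB rather than being pruned outright,
    # so the search can still revisit them later.
    return max(node.children, key=lambda c: ucb_score(c, node.visits))
```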

Some suggestions for future improvements include: (1) using shallow alpha-beta simulations instead of randomized simulations (link: https://openreview.net/forum?id=HyN7UVfOZB), (2) improving my algorithm on 10x10, 11x11, ... 14x14 and higher board dimensions to make it less biased, (3) using a neural network or another algorithm to tune the weights in my heuristic, or to learn from previous games, and (4) using a better exploration/exploitation policy, or a more specific UCB + reward formula, to make the UCB calculations more accurate."

agent_dangbran

"I decided to concentrate purely on improving the utility function because I felt that it would have a greater impact on performance then, for example, implementing an iterative deepening system (I wanted to concentrate on one thing only). The main thing that I did to improve the utility function was to add "bonus points" whenever a move resulted in my AI having stable pieces that the opponent can not flip. The more stable pieces that result from a move, the more bonus points are awarded. However ... it was very [computationally] expensive to check whether or not a piece was able to be flipped. As a result, the timer sometimes timed out, which meant that this method had to be abandoned.

"... I thought that it was possible to get a rough estimate using a much cheaper algorithm. I played multiple games against the AI and I realized that edge and corner spaces are preferable. I then added bonus points, one for each friendly piece on any of the four edges. This is because although having a piece on the edge is not always stable, it is often much more likely to be stable and harder for the opponent to flip, then, say, a piece somewhere in the middle of the board. Furthermore, I noticed that having a piece on the edges gives many opportunities to flip the opponent pieces in the middle, making edge pieces even more valuable. A corner is even more valuable real estate then an edge. Corner pieces can not be flipped, and they sometimes give some of the rare opportunities to flip edge pieces. I realized that whenever I won a game, I always had at least 3 of the corners under my control. Therefore, I gave 3 bonus points to each corner piece the player owns. Finally, since corner pieces are so valuable, it stands to reason that one should not give an opportunity for the opponent to put a piece on the corners. Each corner directly touches two squares on the board. Having a piece of your colour at either of those squares is the only way for the opponent to acquire the corner, so one should avoid putting their pieces at those locations. I therefore gave penalty points if there is a friendly piece bordering any of the corners. All of this was only in O(n), much cheaper than my initial idea. "

agent_weiszben

"In Othello there are certain positions which are more stable than others; in fact there are some that when captured can't be recaptured by an opponent. The four corners of the Othello board are such positions. When someone places a tile in on of the corners, that position cannot be recaptured because the next player cannot place stones outside of the board. Once a player has captured a corner, they can abuse the stone to keep recapturing any enemy stones between the corner and their newly place stone. This gives the agent an advantage. On top of this, the agent should avoid placing stones in the three positions surrounding each of the corners. If the agent where to place a stone here, the next agent can likely abuse the opportunity and place the stone in the corner. The reason why the idea of capturing corners is a good idea is because the agent will typically only look until a certain depth, but in general the corners might lead to some advantageous state for our agent beyond the depth it can look. So just knowing that these positions are good helps the agent plan better for the future.

Beyond this, the agent's heuristic is a carefully weighted sum of the two heuristics above and the one provided. The weighting was done through trial and error, although I can say that I weighted taking the corner positions as more advantageous than avoiding the nearby squares. Note: the corners add to the value of the position, while the nearby squares subtract value from the position."
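A hedged sketch of such a weighted sum is below; compute_utility is assumed to be the heuristic provided with A3, and the weights are placeholders (in the spirit of "trial and error", not the values actually used).

```python
W_CORNER, W_NEAR = 10.0, 4.0  # placeholder weights; corner bonus > nearby penalty

def combined_heuristic(board, color):
    """Provided utility plus corner bonuses, minus penalties for the three
    squares surrounding each corner."""
    n = len(board)
    value = compute_utility(board, color)  # assumed A3 starter-code heuristic
    for ci in (0, n - 1):
        for cj in (0, n - 1):
            si = 1 if ci == 0 else -1  # step toward the board interior
            sj = 1 if cj == 0 else -1
            if board[ci][cj] == color:
                value += W_CORNER      # corners can never be flipped
            # The three squares surrounding this corner.
            for (i, j) in ((ci + si, cj), (ci, cj + sj), (ci + si, cj + sj)):
                if board[i][j] == color:
                    value -= W_NEAR    # invites the opponent into the corner
    return value
```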

agent_niyihan

"I don't think I have any secret recipe except that I implemented an extraction strategy, that the agent keeps track of a global timer and immediately return to its root once the timer passes 9.8 seconds. Then the agent decides the best move based on current information. This strategy essentially exploits nearly full time every round to decide for a best move."