We have now covered the necessary background material for AI problem solving techniques, and we can move on to looking at particular types of problems which have been addressed using AI techniques. The first type of problem we'll look at is getting an agent to compete, either against a human or against another artificial agent. This area has been extremely well researched over the last 50 years. Indeed, some of the first chess programs were written by Alan Turing, Claude Shannon and other forefathers of modern computing. We only have one lecture to look at this topic, so we'll restrict ourselves to two-person games, such as chess, played by software agents. If you are interested in games involving more teamwork and/or robotics, then a good place to start would be the RoboCup project, as described below.
The RoboCup project is very ambitious. By 2050, its organisers aim to:
develop a team of fully autonomous humanoid robots that can win against the human world soccer champion team
There are various categories, ranging from large robots which look like dustbins, to the AIBO dog category, to simulated soccer where the teams only exist on screen. Each team which enters the national competition must also submit papers to a conference, so that progress is made scientifically rather than in an ad-hoc manner. Anyone who has seen the large robots playing in the RoboCup competition will realise how unbelievably ambitious the RoboCup aim is: the current state of the art is pretty bad!
Legend has it that the first winning team had no intelligence programmed in: the robots simply moved around randomly. On the other hand, the simulated soccer games are strikingly good, and in any case, the impact on the popularity of AI and robots has been very constructive. In some countries, most notably Japan, RoboCup is very big: the winning goal of RoboCup 2001 was reported on prime-time Japanese TV. And apparently the Queen has already taken an interest (a RoboCup Junior team gave her a demonstration in Adelaide!).
The official Robocup website is here.
Parents often get two children to share a cake fairly by asking one to cut the cake and the other to choose which piece they want to eat. In this two-player cake-scoffing game, player one has only one move (cutting the cake), and he soon learns that if he wants to maximise the amount of cake he gets, he had better cut the cake into equal halves, because his opponent is going to try to minimise the cake that player one gets by choosing the bigger piece for herself.
Suppose we have a two-player game where the winner scores a positive number at the end, and the loser scores nothing. In board games such as chess, the score is usually just 1 for a win and 0 for a loss. In other games such as poker, however, one player wins the (cash) amount that the other player loses. These are called zero-sum games, because when you add one player's winnings to the other player's losses (counted as a negative score), the sum is zero.
The minimax algorithm is so called because it assumes that you and your opponent are going to act rationally: you will choose moves to try to maximise your final score, and your opponent will choose moves to try to minimise it. To demonstrate the minimax algorithm, it is helpful to have a game where the search tree is fairly small. For this reason, we will invent the following very trivial game:
Take a pack of cards and deal out four cards face up. Two players take it in turns to choose a card each until they have two each. The object is to choose two cards so that they add up to an even number. The winner is the one with the largest even number n (picture cards all count as 10); the winner scores n and the loser scores -n. If both players get the same even number, it is a draw, and they both score zero.
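The cutter's reasoning is a tiny max-min calculation. The sketch below assumes, purely for illustration, that the cake can only be cut at whole-percent positions; the chooser always takes the bigger piece, so the cutter keeps the smaller one and picks the cut that maximises it.

```python
from typing import Tuple


def cutter_share(cut_percent: int) -> int:
    """Cake (in percent) the cutter keeps: the chooser takes the bigger piece."""
    return min(cut_percent, 100 - cut_percent)


def best_cut() -> Tuple[int, int]:
    """Maximise, over all possible cuts, the minimum the chooser leaves the cutter."""
    best = max(range(0, 101), key=cutter_share)
    return best, cutter_share(best)


print(best_cut())  # (50, 50)
```

Any cut away from the middle only gives the chooser a bigger piece, which is exactly why the maximum of the minimum sits at 50.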
Suppose the cards dealt are 3, 5, 7 and 8. We are interested in which card player one should choose first, and the minimax algorithm can be used to decide this for us. To demonstrate this, we will draw the entire search tree and put the scores below the final nodes on paths which represent particular games.
Our aim is to write on each of the top branches of the tree the best score that player one can guarantee if he chooses that move. To do this, starting at the bottom, we will write the final scores on successively higher branches of the search tree until we reach the top. Whenever there is a choice of scores to write on a particular branch, we will assume that player two will choose the card which minimises player one's final score, and player one will choose the card which maximises his/her score. Moving the scores all the way up the tree will enable player one to choose the card which leads to the best guaranteed score for the overall game. We will first write the scores on the edges of the tree in the bottom two branches:
Now we want to move the scores up to the next level of branches in the tree. However, there is a choice. For example, for the first branch on the second row, we could write either 10 or -12. This is where our assumption about rationality comes into play. We should write 10 there because, supposing that player two has actually chosen the 5, player one can choose either the 7 or the 8. Choosing the 7 would result in a score of 10 for player one; choosing the 8 would result in a score of -12. Clearly, player one would choose the 7, so the score we write on this branch is 10. Hence, we should choose the maximum of the scores to write on the edges in the row above. Doing the same for all the other branches, we get the following:
Finally, we want to put the scores on the top edges of the tree. Again, there is a choice. However, in this case, we have to remember that player two is making the choices, and will act in order to minimise the score that player one gets. Hence, in the case when player one chooses the 3, player two will choose the 7, which limits player one to a score of 8. So we write the minimum of the three possibilities on each edge at the top of the tree as follows:
To choose the correct first card, player one simply looks at the topmost edges of the final tree and chooses the one with the highest score. In this case, choosing the 7 guarantees that player one scores at least 10 in this game (assuming that player one chooses according to the minimax strategy for move 2, but, importantly, making no assumptions about how player two will choose).
Note that the process above was only for player one's first move. The whole process would need to be repeated for player two's first move, player one's second move, and so on. In general, agents playing games using minimax have to calculate the best move at each stage using a new minimax search. Don't forget that an agent's assumption that its opponent will act rationally does not mean the opponent actually will, so the agent cannot assume a player will make a particular move until that move has actually been made.
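The whole search for the 3, 5, 7, 8 game is small enough to write out as a recursive function. This is a minimal sketch, assuming (as the negative scores in the tree imply) that the loser scores -n, and that a hand with an odd total has no even number and so cannot win. The names `final_score`, `minimax` and `first_move_values` are illustrative, not from any library.

```python
from typing import Dict, Sequence, Tuple


def final_score(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Player one's score at the end (assumed zero-sum: the loser scores -n)."""
    n1 = sum(p1) if sum(p1) % 2 == 0 else 0  # odd total: no even number
    n2 = sum(p2) if sum(p2) % 2 == 0 else 0
    if n1 > n2:
        return n1
    if n2 > n1:
        return -n2
    return 0  # same even number (or neither has one): a draw


def minimax(remaining: Tuple[int, ...], p1: Tuple[int, ...],
            p2: Tuple[int, ...], p1_to_move: bool) -> int:
    """Best score player one can guarantee from this state (full search)."""
    if not remaining:
        return final_score(p1, p2)
    values = []
    for i, card in enumerate(remaining):
        rest = remaining[:i] + remaining[i + 1:]
        if p1_to_move:
            values.append(minimax(rest, p1 + (card,), p2, False))
        else:
            values.append(minimax(rest, p1, p2 + (card,), True))
    # Player one maximises; player two minimises player one's score.
    return max(values) if p1_to_move else min(values)


def first_move_values(cards: Tuple[int, ...]) -> Dict[int, int]:
    """Backed-up value on each top edge, keyed by player one's first card."""
    return {
        card: minimax(cards[:i] + cards[i + 1:], (card,), (), False)
        for i, card in enumerate(cards)
    }


values = first_move_values((3, 5, 7, 8))
print(values)                       # {3: 8, 5: 8, 7: 10, 8: -10}
print(max(values, key=values.get))  # 7
```

The values for the 3 (8) and the 7 (10) agree with the hand-drawn tree, and calling `minimax` afresh from each new game state is exactly the "new minimax search at each stage" described above.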
To use a minimax search in a game-playing situation, all we have to do is program our agent to look at the entire search tree from the current state of the game, and choose the minimax solution before making a move. Unfortunately, only in very trivial games such as the one above is it possible to calculate the minimax answer all the way from the end states of a game. So, for games of higher complexity, we are forced to estimate the minimax choice for world states using an evaluation function. This is, of course, a heuristic function such as those we discussed in the lecture on search.
If you thought there were a lot of choices in chess, then you should try to play the game of Go. This game is at least 3000 years old, and is the most popular board game in Japan. As they say on the British Go Association web site:
The rules are very simple, yet attempts to program computers to play Go have met with little success.
Computers are indeed very bad at playing Go. This is because the branching rate is huge: around 350. This means that normal search routines, such as those described in the previous lecture, barely make a dent in the possibilities, and programs often make fundamental mistakes. If you're not inspired to write a Go-playing agent, then maybe the prize of $2M on offer for a program which beats a top-level player will change your mind.
Some Go programs are described on the British Go Association web site HERE.
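To put a branching rate of 350 in perspective, we can count the positions at the bottom of a full search only four moves deep (two for each player). Here we take 35 as the branching rate for chess, the figure used for chess in the discussion of alpha-beta pruning in this lecture:

```latex
\text{chess: } 35^{4} = 1\,500\,625 \approx 1.5 \times 10^{6}
\qquad
\text{Go: } 350^{4} = 15\,006\,250\,000 \approx 1.5 \times 10^{10}
```

So reaching the same depth in Go means examining 10,000 times as many positions, and every further move multiplies the gap by another factor of 10.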
In a normal minimax search, we write down the whole search space and then propagate the scores from the goal states to the top of the tree so that we can choose the best move for a player. In a cutoff search, however, we write down the search space only up to a specific depth, and then write down the value of the evaluation function for each of the states at the bottom of the tree. We then propagate these values from the bottom to the top in exactly the same way as in minimax.
The depth is chosen in advance to ensure that the agent doesn't take too long to choose a move: if it has longer, then we allow it to go deeper. If our agent has a given time limit for each move, then it makes sense to let it carry on searching until the time runs out. There are many ways to organise the search so that a game-playing agent searches as far as possible in the time available. As an exercise, what possible ways can you think of to perform this search? It is important to bear in mind that the point of the search is not to find a node in the above graph, but to determine which move the agent should make.
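Here is a minimal sketch of a cutoff search, plus one common answer to the exercise: iterative deepening, where the agent searches to depth 1, then 2, then 3, and so on, keeping the move from the deepest search it has completed. The tree `TREE` and the evaluation values in `EVALUATE` are entirely hypothetical, made up to show a deeper search changing the decision; all function names are illustrative.

```python
import time
from typing import Dict, List, Tuple

# Hypothetical game tree: "R" is player one's (MAX) move and the next level is
# player two's (MIN) reply. All names and numbers are made up for illustration.
TREE: Dict[str, List[str]] = {"R": ["A", "B"], "A": ["A1", "A2"], "B": ["B1", "B2"]}
# Hypothetical evaluation-function values for every state below the root.
EVALUATE: Dict[str, int] = {"A": 6, "B": 3, "A1": 2, "A2": 7, "B1": 5, "B2": 9}


def cutoff_minimax(node: str, depth: int, maximising: bool) -> int:
    """Minimax that stops `depth` levels down and scores that frontier with EVALUATE."""
    children = TREE.get(node, [])
    if depth == 0 or not children:
        return EVALUATE[node]
    values = [cutoff_minimax(c, depth - 1, not maximising) for c in children]
    return max(values) if maximising else min(values)


def choose_move(root: str, depth: int) -> Tuple[str, int]:
    """Player one's best move from `root`, looking `depth` (>= 1) moves ahead."""
    scores = {c: cutoff_minimax(c, depth - 1, False) for c in TREE[root]}
    best = max(scores, key=scores.get)
    return best, scores[best]


def iterative_deepening(root: str, time_limit: float, max_depth: int) -> Tuple[str, int]:
    """Search to depth 1, 2, 3, ... while time remains; keep the deepest answer."""
    deadline = time.monotonic() + time_limit
    best = choose_move(root, 1)  # always have some move ready to play
    depth = 2
    while depth <= max_depth and time.monotonic() < deadline:
        best = choose_move(root, depth)  # a depth, once started, is not interrupted
        depth += 1
    return best


print(choose_move("R", 1))               # ('A', 6)
print(choose_move("R", 2))               # ('B', 5)
print(iterative_deepening("R", 1.0, 2))  # ('B', 5)
```

Looking one move ahead, A seems better; one level deeper, player two's reply A1 shows that B is the safer choice. A real agent would also abandon a depth that is still running when the time runs out, rather than finishing it as this sketch does.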
Evaluation functions estimate the score that can be guaranteed if a particular world state is reached. In chess, such evaluation functions were known long before computers came along. One such function simply counts the number of pieces on the board for a particular player. A more sophisticated function scores more for the more influential pieces such as rooks and queens: each pawn is worth 1, knights and bishops score 3, rooks score 5 and queens score 9. These scores are used in a weighted linear function, where the number of pieces of a certain type is multiplied by a weight, and all the products are added up. For instance, if in a particular board state, player one has 6 pawns, 1 bishop, 1 knight, 2 rooks and 1 queen, then the evaluation function, f, for that board state, B, would be calculated as follows:
f(B) = 1*6 + 3*1 + 3*1 + 5*2 + 9*1 = 31
In each product, the first number (1, 3, 3, 5 and 9) is a weight in this evaluation function (i.e., the score assigned to that type of piece), and the second is the number of such pieces on the board.
Ideally, evaluation functions should be quick to calculate: if they take a long time to calculate, then less of the space will be searched in a given time limit. Ideally, evaluation functions should also match the actual score in goal states. Of course, this isn't true for our weighted linear function in chess, because goal states only score 1 for a win and 0 for a loss. In fact, we don't need the match to be exact: we can use any values for an evaluation function, as long as it scores more for better board states.
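A weighted linear evaluation function is just a sum of weight times count. This sketch uses the weights from the text; representing a board as a dictionary of piece counts is an assumption made here for simplicity.

```python
from typing import Dict

# Weights from the text: the classic material values of chess pieces.
WEIGHTS: Dict[str, int] = {"pawn": 1, "knight": 3, "bishop": 3, "rook": 5, "queen": 9}


def weighted_linear(counts: Dict[str, int]) -> int:
    """f(B): sum over piece types of weight * number of that piece on board B."""
    return sum(WEIGHTS[piece] * n for piece, n in counts.items())


board = {"pawn": 6, "bishop": 1, "knight": 1, "rook": 2, "queen": 1}
print(weighted_linear(board))  # 31
```

Because it is only a handful of multiplications, this function is very cheap to compute, which matters when it is called at every state on the cutoff frontier.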
A bad evaluation function can be disastrous for a game-playing agent. There are two main problems with evaluation functions. Firstly, certain evaluation functions only make sense for game states which are quiescent. A board state is quiescent for an evaluation function, f, if the value of f is unlikely to exhibit wild swings in the near future. For example, in chess, board states in which one piece can take another without a piece of similar value being taken back on the next move (such as a queen being threatened by a pawn) are not quiescent for evaluation functions such as the weighted linear evaluation function mentioned above. To get around this problem, we can make an agent's search more sophisticated by implementing a quiescence search: given a non-quiescent state for which we want to evaluate the function, we expand that game state until a quiescent state is reached, and we take the value of the function for that state. If quiescent positions are much more likely to occur than non-quiescent positions in a search, then such an extension to the search will not slow things down too much. In chess, a search strategy may choose to delve further into the space whenever a queen is threatened, to try to avoid the quiescence problem.
Secondly, there is the horizon problem, where a game-playing agent cannot see far enough into the search space. An example of the horizon problem given in Russell and Norvig is the case of promoting a pawn to a queen in chess. In the board state they present, this can be forestalled for a certain number of moves, but is inevitable. However, with a cutoff search at a certain depth, this inevitability cannot be noticed until too late. It is likely that the agent trying to forestall the move would have been better off doing something else with the moves it had available.
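Here is a minimal sketch of a quiescence extension to a cutoff search. The position is entirely hypothetical: after move "Y" player one looks a queen up on paper, but player two can immediately recapture, so "Y" is marked as not quiescent. The `quiet` flag stands in for whatever test a real program would use; in practice, chess programs usually only expand capturing moves at this stage.

```python
from typing import Dict, List, NamedTuple


class State(NamedTuple):
    estimate: int        # hypothetical evaluation-function value
    quiet: bool          # is the value unlikely to swing wildly soon?
    children: List[str]  # states reachable in one move


# Hypothetical position; all names and numbers are made up.
GAME: Dict[str, State] = {
    "root": State(0, True, ["X", "Y"]),  # root estimate is never used
    "X": State(2, True, []),
    "Y": State(8, False, ["Y1", "Y2"]),
    "Y1": State(-1, True, []),           # player two recaptures
    "Y2": State(8, True, []),            # player two blunders
}


def search(name: str, depth: int, maximising: bool, quiescence: bool) -> int:
    """Cutoff minimax; with quiescence on, noisy frontier states are expanded further."""
    state = GAME[name]
    if not state.children:
        return state.estimate
    # At the cutoff, stop only if the state is quiet (or quiescence is off).
    if depth <= 0 and (state.quiet or not quiescence):
        return state.estimate
    values = [search(c, depth - 1, not maximising, quiescence) for c in state.children]
    return max(values) if maximising else min(values)


def choose_move(depth: int, quiescence: bool) -> str:
    """Player one's move from the root, looking `depth` (>= 1) moves ahead."""
    scores = {c: search(c, depth - 1, False, quiescence) for c in GAME["root"].children}
    return max(scores, key=scores.get)


print(choose_move(1, quiescence=False))  # Y
print(choose_move(1, quiescence=True))   # X
```

Without the extension, the agent trusts the inflated value of "Y" and walks into the recapture; with it, "Y" is looked at one move further and correctly scored at -1.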
In the card game example above, game states are collections of cards, and a possible evaluation function would be to add up the values of player one's cards and take that if it was an even number, but score it as zero if the sum is an odd number. This evaluation function matches the actual score in any goal state that player one wins, but is perhaps not such a good idea. Suppose the cards dealt were: 10, 3, 7 and 9. If player one was forced to cut off the search after only the first card choice, then the cards would score 10, 0, 0 and 0 respectively. So player one would choose the 10, which would be disastrous: player one's two cards would then always add up to an odd number, whereas any two of the remaining 3, 7 and 9 add up to an even number. Player one is therefore certain to lose, by at least ten points whatever happens, and by twelve or more if player two plays rationally. If we scale the game up to choosing cards from 40 rather than 4, we can see that a more sophisticated heuristic involving the cards left unchosen might be a better idea.
Recall that pruning a search space means deciding that certain branches should not be explored. If an agent knows for sure that exploring a certain branch will not affect its choice for a particular move, then that branch can be pruned with no concern at all (i.e., no effect on the outcome of the search for a move), and the speed-up in search may mean that extra depths can be searched.
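We can check the 10, 3, 7, 9 example by comparing what the evaluation function says after player one's first card with what a full minimax search says. This sketch assumes the same scoring as before (the loser scores -n, and an odd total counts as no even number); the function names are illustrative.

```python
from typing import Sequence, Tuple


def even_total(hand: Sequence[int]) -> int:
    """The evaluation function from the text: the hand's total if even, else 0."""
    total = sum(hand)
    return total if total % 2 == 0 else 0


def final_score(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Player one's actual score (assumed zero-sum: the loser scores -n)."""
    n1, n2 = even_total(p1), even_total(p2)
    return n1 if n1 > n2 else -n2 if n2 > n1 else 0


def minimax(remaining: Tuple[int, ...], p1: Tuple[int, ...],
            p2: Tuple[int, ...], p1_to_move: bool) -> int:
    """Full minimax value for player one (players alternate, player one first)."""
    if not remaining:
        return final_score(p1, p2)
    values = [
        minimax(remaining[:i] + remaining[i + 1:],
                p1 + (c,) if p1_to_move else p1,
                p2 if p1_to_move else p2 + (c,),
                not p1_to_move)
        for i, c in enumerate(remaining)
    ]
    return max(values) if p1_to_move else min(values)


cards = (10, 3, 7, 9)
# Cutoff after player one's first card: score each card with the evaluation function.
cutoff = {c: even_total((c,)) for c in cards}
# Full minimax: what each first card actually guarantees.
full = {c: minimax(cards[:i] + cards[i + 1:], (c,), (), False)
        for i, c in enumerate(cards)}
print(cutoff)  # {10: 10, 3: 0, 7: 0, 9: 0}
print(full)    # {10: -12, 3: 10, 7: 10, 9: 12}
```

The cutoff search ranks the 10 first, while the full search shows it is the only losing first card; the 9 is actually the best choice.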
When using a minimax approach, either for an entire search tree or in a cutoff search, there are often many branches which can be pruned because we find out fairly quickly that the best value down a whole branch is not as good as the best value from a branch we have already explored. Such pruning is called alpha-beta pruning.
As an example, suppose that there are four choices for player one, called moves M1, M2, M3 and M4, and we are looking only two moves ahead (one for player one and one for player two). If we do a depth-first search for player one's move, we can work out the score they are guaranteed for M1 before even considering move M2. Suppose that it turns out that player one is guaranteed to score 10 with move M1. We can use this information to reject move M2 without checking all the possibilities for player two's move. For instance, suppose that player two's first possible reply to M2 means that player one will score only 5 overall. In this case, we know that player one can score at most 5 with M2. Of course, player one won't choose M2, because M1 will score 10 for them. We see that there's no point checking all the other possibilities for M2. This can be seen in the following diagram (ignore the X's and N's for the time being):
We see that we could reject M2 straight away, thus saving ourselves 3 nodes in the search space. We could reject M3 after we came across the 9, and in the end M4 turns out to be better than M1 for player one. In total, using alpha-beta pruning, we avoided looking at 5 end nodes out of 16, around 30%. If the calculation to assess the scores at end-game states (or to estimate them with an evaluation function) is computationally expensive, then this saving could enable a much larger search. Moreover, this kind of pruning can occur anywhere in the tree. The general principles are that:
As an exercise: which of these principles did we use in the M1 - M4 pruning example above? (To make it easy, I've written on the N's and X's).
Because we can prune using the alpha-beta method, it makes sense to perform a depth-first search using the minimax principle. Compared to a breadth-first search, a depth-first search will get to goal states quicker, and this information can be used to determine the scores guaranteed for a player at particular board states, which in turn is used to perform alpha-beta pruning. If a game-playing agent used a breadth-first search instead, then only right at the end of the search would it reach the goal states and begin to perform minimax calculations. Hence, the agent would miss much of the potential to perform pruning.
Depth-first search with alpha-beta pruning is fairly sensitive to the order in which we try operators in our search. For example, if in the example above we had chosen to look at move M4 first, then we would have been able to do more pruning, due to the higher minimum value (11) from that branch. Often, it is worth spending some time working out how best to order a set of operators, as this can greatly increase the amount of pruning that can occur.
A depth-first minimax search with alpha-beta pruning dominates minimax search alone: it chooses the same move, but never looks at more end nodes and usually looks at far fewer. In fact, if the effective branching rate of a normal minimax search is b, then alpha-beta pruning can, in the best case (when moves are tried in the ideal order), reduce this rate to √b. In chess, this means that the effective branching rate reduces from 35 to around 6, so that, in the same time, an alpha-beta search can look roughly twice as many moves ahead as a normal minimax search with cutoff.
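A minimal sketch of depth-first minimax with alpha-beta pruning, run on a version of the M1 to M4 example. The diagram's full set of leaf scores is not reproduced here, so the values in `MOVES` are hypothetical, chosen only to fit the description (M1 is worth 10, M2's first leaf is 5, M3 contains a 9 and M4 is worth 11). The leaf counter lets us compare the two move orderings discussed above.

```python
import math
from typing import Dict, List, Optional, Tuple, Union

Tree = Union[int, List["Tree"]]  # a leaf score, or a list of subtrees

# Hypothetical leaf scores for player one, chosen to fit the M1-M4 example.
MOVES: Dict[str, Tree] = {
    "M1": [10, 12, 14, 11],
    "M2": [5, 8, 13, 7],
    "M3": [15, 9, 6, 12],
    "M4": [11, 13, 12, 15],
}


def alpha_beta(node: Tree, maximising: bool, alpha: float, beta: float,
               leaves_seen: List[int]) -> float:
    """Depth-first minimax with alpha-beta pruning; counts leaves examined."""
    if isinstance(node, int):
        leaves_seen[0] += 1
        return node
    if maximising:
        value = -math.inf
        for child in node:
            value = max(value, alpha_beta(child, False, alpha, beta, leaves_seen))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # player two already has a better option elsewhere
        return value
    value = math.inf
    for child in node:
        value = min(value, alpha_beta(child, True, alpha, beta, leaves_seen))
        beta = min(beta, value)
        if alpha >= beta:
            break  # player one already has something at least this good
    return value


def choose(order: List[str]) -> Tuple[Optional[str], float, int]:
    """Try player one's moves in `order`; return (move, guaranteed score, leaves)."""
    leaves_seen = [0]
    alpha, best = -math.inf, None
    for move in order:
        value = alpha_beta(MOVES[move], False, alpha, math.inf, leaves_seen)
        if value > alpha:
            alpha, best = value, move
    return best, alpha, leaves_seen[0]


print(choose(["M1", "M2", "M3", "M4"]))  # ('M4', 11, 11)
print(choose(["M4", "M1", "M2", "M3"]))  # ('M4', 11, 8)
```

In the original order, 11 of the 16 leaves are examined (5 pruned, as in the example); trying M4 first raises the bar to 11 straight away, and only 8 leaves are needed to reach the same answer.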
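The √b figure comes from the best-case analysis of alpha-beta (due to Knuth and Moore): for a uniform tree with branching rate b searched to depth d, plain minimax looks at every end node, while alpha-beta with perfect move ordering looks at only

```latex
\underbrace{b^{d}}_{\text{minimax}}
\quad\longrightarrow\quad
\underbrace{b^{\lceil d/2 \rceil} + b^{\lfloor d/2 \rfloor} - 1}_{\text{alpha-beta, best ordering}}
\;\approx\; \left(\sqrt{b}\right)^{d},
\qquad \sqrt{35} \approx 5.92
```

For the M1 to M4 example (b = 4, d = 2), the best possible ordering would examine 4 + 4 - 1 = 7 of the 16 end nodes.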
Games with a certain element of chance are often more interesting than those without chance, and many games involve rolling dice, tossing a coin or something similar.
Backgammon is a game involving strategy and chance, and has received more than its fair share of attention in AI. In particular, Gerry Tesauro at IBM Research has developed the TD-Gammon program, which often beats some of the best players in the world. TD-Gammon is basically a neural network (see later lectures) which has been trained using reinforcement learning by playing against itself. In this way, it learns a sophisticated evaluation function for board states in backgammon.
Reinforcement learning is a good idea here, because game-playing programs can simply play against themselves and make adjustments in the light of won or lost games until they come up with good strategies. However, reinforcement learning has not found many real-world applications.
Read a technical report about TD-Gammon HERE.
Suppose we spice up our card game by using a die and introducing the following rule: if, after player one's first choice of card, player two throws a 6, then player two chooses a card and must swap it with the one previously picked by player one. So, if player one chooses a 10, and then player two throws a 6, player two must choose another card, say a 5, and swap it with player one's 10.
In games with chance such as this, we can introduce probabilities to our search diagrams and calculate minimax solutions in much the same way. When drawing a diagram for the above card game with chance, we can simplify matters by adding two chance nodes to the graph, representing the chance that player two throws a six and the chance that they don't. In the case that they do throw a six, they can choose which card to swap with player one's.
We can now move the scores from the bottom of the tree to the top as before. However, we do one thing differently: when passing scores through a chance node, we must multiply each score by the probability of the corresponding outcome occurring. If we then add up all the products, we get the score the player can expect. In our card game example, we have to remember that the chance of player two throwing a six is 1/6, whereas the chance of them throwing something else is 5/6. So, we must multiply the score coming up from the left by 1/6, and the score coming up from the right by 5/6. This is shown in the following diagram:
With the introduction of chance into a game, we can no longer calculate the best score that player one can guarantee by choosing a particular card to start with (assuming both players act rationally). Instead, we can only calculate an expected value (called an expectimax value) based on the probabilities of the chance nodes. In this case, whereas before we could guarantee that player one would score at least 8 by choosing 3 as the first card, we can now only state that player one can expect to score 5.33 by choosing the 3 to start with (1/6 × -8 + 5/6 × 8 = 32/6 ≈ 5.33). (Of course, in reality, player one will score either 8 or -8, depending on the throw of the die.)
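A sketch of the expectimax calculation for the card game with the die, using exact fractions so the 5.33 comes out precisely. It assumes, as before, that the loser scores -n, and additionally assumes that the swap takes the place of player two's ordinary first pick: after a six, player one holds the card player two chose, player two holds player one's original card, and play continues with player one. The function names are illustrative.

```python
from fractions import Fraction
from typing import Dict, Sequence, Tuple


def final_score(p1: Sequence[int], p2: Sequence[int]) -> int:
    """Player one's score (assumed zero-sum: the loser scores -n)."""
    n1 = sum(p1) if sum(p1) % 2 == 0 else 0
    n2 = sum(p2) if sum(p2) % 2 == 0 else 0
    return n1 if n1 > n2 else -n2 if n2 > n1 else 0


def without(cards: Tuple[int, ...], i: int) -> Tuple[int, ...]:
    return cards[:i] + cards[i + 1:]


def minimax(remaining: Tuple[int, ...], p1: Tuple[int, ...],
            p2: Tuple[int, ...], p1_to_move: bool) -> int:
    """Ordinary minimax for the rest of the game, once the die has been thrown."""
    if not remaining:
        return final_score(p1, p2)
    values = [
        minimax(without(remaining, i),
                p1 + (c,) if p1_to_move else p1,
                p2 if p1_to_move else p2 + (c,),
                not p1_to_move)
        for i, c in enumerate(remaining)
    ]
    return max(values) if p1_to_move else min(values)


def expectimax_first_card(cards: Tuple[int, ...]) -> Dict[int, Fraction]:
    values = {}
    for i, first in enumerate(cards):
        rest = without(cards, i)
        # No six (probability 5/6): player two picks a card as normal.
        no_six = minimax(rest, (first,), (), False)
        # A six (probability 1/6): player two picks card c, which goes to
        # player one in exchange for `first`; then player one moves.
        six = min(minimax(without(rest, j), (c,), (first,), True)
                  for j, c in enumerate(rest))
        # Chance node: weight each outcome by its probability and add them up.
        values[first] = Fraction(1, 6) * six + Fraction(5, 6) * no_six
    return values


values = expectimax_first_card((3, 5, 7, 8))
print(values[3])  # 16/3
```

The value for the 3 is 16/3 ≈ 5.33, matching the diagram, and the same run gives 20/3 ≈ 6.67 for the 7 (while the 8 drops to -20/3).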
We have to be more careful with cutoff search when there is an element of chance involved in a game. In particular, we have to design our evaluation function carefully. In games without chance, our evaluation function only had to reflect the relative values of board states. So, for boards B1, B2 and B3 we could design our evaluation function to return 1, 100 and 10000 for these states, or just 1, 10 and 100, because both reflect the relative value of each. When we are using an expectimax search, however, these values are going to be multiplied by probabilities and added to other expected scores, and so on. This means that a poor choice of the actual values of the function can lead to greatly exaggerated expectimax values, which will ultimately mean the agent makes bad choices.
The term "AI engine" is common in game-playing areas. This broadly stands for the Artificial Intelligence techniques behind opponents in games, usually when the opponent is meant to mimic human behaviour. Such engines usually combine many AI techniques (much more than minimax!) in order to give characters in games some vestige of intelligence and so make the games more compelling. A fairly new development in AI engines is the notion of an extensible AI, whereby the human player can alter various aspects of the AI routines in the program. A good site to read about AI engines is HERE.
Many engines are just giant hacks, based on years of experience, for getting an agent to act rationally in a given context (scary dungeon, battlefield, etc.). However, by consistently hiring good AI graduates, games companies are increasingly producing software based on well-known AI techniques, including machine learning techniques. Moreover, games are increasingly being studied within AI research. For example, John E. Laird and Michael van Lent from the University of Michigan delivered an invited talk at AAAI-2000 entitled "Human-level AI's Killer Applications: Interactive Computer Games". John E. Laird's web pages on computer games are HERE.
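To see how the choice of values can change a decision, here is a small sketch using the two sets of values from the text (1, 10, 100 and 1, 100, 10000), both of which rank B1 < B2 < B3. The position itself is hypothetical: move "A" reaches B3 with probability 1/20 and B1 otherwise, while move "B" reaches B2 for certain.

```python
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical chance outcomes for two moves; all of this is made up.
OUTCOMES: Dict[str, List[Tuple[Fraction, str]]] = {
    "A": [(Fraction(19, 20), "B1"), (Fraction(1, 20), "B3")],
    "B": [(Fraction(1), "B2")],
}


def expected_value(move: str, evaluate: Dict[str, int]) -> Fraction:
    """Chance node: probability-weighted sum of the evaluation function."""
    return sum((p * evaluate[board] for p, board in OUTCOMES[move]), Fraction(0))


def best_move(evaluate: Dict[str, int]) -> str:
    return max(OUTCOMES, key=lambda move: expected_value(move, evaluate))


# Both functions order the boards the same way: B1 < B2 < B3.
small = {"B1": 1, "B2": 10, "B3": 100}
large = {"B1": 1, "B2": 100, "B3": 10000}
print(best_move(small))  # B  (A is worth 119/20 = 5.95 < 10)
print(best_move(large))  # A  (A is worth 10019/20 = 500.95 > 100)
```

Without the chance node, both functions would lead to the same choice; once values are averaged, the larger spread of the second function makes the long shot look worthwhile.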