## Session 7

#### Artificial Intelligence


### Recap: Search

Uninformed techniques use problem-independent information to determine the order in which the agent considers states.

• breadth-first and depth-first search
• iterative deepening search
• random search

Informed techniques use problem-dependent information to determine the order in which the agent considers states. They typically use domain knowledge in two ways:

• to order the list of states yet to consider, so that we consider sooner states that are likely to be better
• to evaluate states, so that we don't have to go to the leaves of the state space before knowing if the path is a good one

We considered three informed search methods:

• UCS ............. g(n)
• greedy search ... h(n)
• A* ................ f(n) = g(n) + h(n)

### Recap: The A* Algorithm

A* sorts the list of nodes in ascending order on f(n) = g(n) + h(n).

In effect, A* combines UCS (which is optimal and complete, but inefficient) with GS (neither optimal nor complete, but efficient). The result is an algorithm that finds optimal solutions efficiently!

Optimistic h(n) functions underestimate the distance to the goal and are denoted h'(n). When we use f(n) = g(n) + h'(n), we discover that A* is both optimal (like UCS) and as quick as possible (like GS)! So, we only "admit" as possible h(n) functions the h'(n) functions -- we call them admissible heuristics.

If h'(n) = 0 for all n, then A* == UCS. If h'(n) = h(n) for all n, then A* is perfect -- go straight to the goal!
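To make the relationship concrete, here is a minimal A* sketch on an invented toy graph (the graph, step costs, and heuristic values are all made up for illustration). Passing the all-zero heuristic makes the same code behave exactly like UCS:

```python
# A* on a toy graph. With h'(n) = 0 for all n, this is exactly UCS.
import heapq

def a_star(graph, h, start, goal):
    """graph: {node: [(neighbor, step_cost), ...]}; h: {node: estimate}."""
    frontier = [(h[start], 0, start, [start])]   # entries are (f, g, node, path)
    best_g = {start: 0}                          # cheapest g found so far
    while frontier:
        f, g, node, path = heapq.heappop(frontier)
        if node == goal:
            return g, path
        for nbr, cost in graph.get(node, []):
            g2 = g + cost
            if g2 < best_g.get(nbr, float("inf")):
                best_g[nbr] = g2
                heapq.heappush(frontier, (g2 + h[nbr], g2, nbr, path + [nbr]))
    return None

graph = {"S": [("A", 1), ("B", 4)], "A": [("B", 2), ("G", 6)], "B": [("G", 2)]}
h     = {"S": 4, "A": 3, "B": 2, "G": 0}   # admissible: never overestimates
zero  = {n: 0 for n in h}                  # h'(n) = 0  ->  UCS

print(a_star(graph, h, "S", "G"))     # -> (5, ['S', 'A', 'B', 'G'])
print(a_star(graph, zero, "S", "G"))  # UCS finds the same optimal path
```

Note that the zero heuristic expands more of the frontier before committing; the informed heuristic reaches the same optimal answer while ordering the list more cleverly.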

### An Example with Heuristic Functions

So, if we take A* as our search algorithm of first choice, then the key to doing good search is finding good h'(n) functions. This is where the real domain knowledge comes in: What do we know about the problem domain that will allow us to evaluate the closeness of a state to a goal?

We would like h' to be as accurate as possible -- "Better is better" -- without overestimating future cost. (But, if you err, err on the side of badly underestimating...)

Consider the 8-puzzle. The typical solution has depth d = 20 and branching factor b = 3. An exhaustive search expands 3^20 ≈ 3.5 * 10^9 states -- but there are only 9! = 362,880 unique states in the puzzle!
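The numbers are easy to verify:

```python
# An exhaustive search to depth 20 with branching factor 3 expands about
# 3.5 billion states, yet the 8-puzzle has only 9! distinct arrangements.
import math

print(3 ** 20)            # 3486784401  (~3.5 * 10^9)
print(math.factorial(9))  # 362880
```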

Two possible h'(n) are:

• h1 = number of tiles out of place. This is admissible, since each out-of-place tile must move >= 1 time.
• h2 = sum of the Manhattan distances of each tile from its home slot. This, too, is admissible, since it assumes that every move we make with a tile takes it one step closer to its home.

Both are admissible, because they both underestimate the number of moves required in a given position.

Actually, h2 >= h1 in every state, so h2 must always be at least as good. In such a case, we say that h2 dominates h1.
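A quick sketch comparing the two heuristics on a sample state (the goal layout and the sample state are my own illustrative choices, with 0 marking the blank):

```python
# h1 (misplaced tiles) and h2 (Manhattan distance) on a sample 8-puzzle
# state, stored row-major as a 9-tuple. 0 marks the blank.
GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)

def h1(state):
    """Number of tiles out of place (the blank doesn't count)."""
    return sum(1 for i, t in enumerate(state) if t != 0 and t != GOAL[i])

def h2(state):
    """Sum of Manhattan distances of each tile from its home slot."""
    total = 0
    for i, t in enumerate(state):
        if t == 0:
            continue
        gi = GOAL.index(t)
        total += abs(i // 3 - gi // 3) + abs(i % 3 - gi % 3)
    return total

state = (1, 2, 3, 4, 5, 6, 0, 7, 8)   # tiles 7 and 8 shifted one slot right
print(h1(state), h2(state))           # -> 2 2
assert h1(state) <= h2(state)         # h2 >= h1 in every state
```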

But how do we invent candidate heuristics in the first place? One useful technique is to relax the constraints on operators and then imagine a solution. For the 8-puzzle, there is only one operator: A tile can move from slot a to slot b if adjacent(a, b) and b is empty. This results in at least three possible relaxations:

• a tile can move from slot a to slot b if adjacent(a, b) -- h2
• a tile can move from slot a to slot b if b is empty -- "h3"
• a tile can move from slot a to slot b -- h1

Does "h3" make any sense?

### Problems with Search in the Real World

How can these search techniques "break"?

1. Even as fast as it is, A* is not fast enough. The search tree for chess contains approximately 10^160 nodes, and chess is simple compared to most tasks that an intelligent agent faces.

2. An agent cannot evaluate every node in a search tree from its own perspective. Sometimes, other agents affect the state of the world.

How do humans address these problems?

1. We don't consider the whole state space.

When Emanuel Lasker (world chess champion from 1894 to 1920) was asked how many moves he considered when analyzing a chess position, he replied, "Only one. But it's always the best move."

Humans filter the search space, eliminating most of the bad moves -- and, occasionally, some of the good ones! This is called pruning.

2. We take into account our opponent's different perspective.

"If I go here, then she'll go there. But then I can do this, but she'll do that. ..."

Humans evaluate positions from the perspective of the player who is "on the move".

Let's look at a way to solve the second problem first, and then at a way to solve the first problem second.

### Search in an Adversarial World

We'll use simple games as our examples, but being in an "adversarial world" and competing with other agents is a natural part of many complex environments. Ford is competing with General Motors, and Microsoft is competing with, well, everybody. Games give us a nice way to consider some of the important issues without all of the complexities of real-world competition.

Our tree of possible futures now has a distinctly new flavor: in a game, players take turns. So we need to evaluate levels of the tree differently, depending on whose turn it is.

Consider how we might think about playing tic-tac-toe as search:

Let's call the player who goes first "max". max moves with an "x". Let's call the player who goes second "min". min moves with an "o".

Notice how I've used knowledge of tic-tac-toe to reduce the search space. I've taken advantage of rotational and reflective symmetries to cut the branching factor by 67%, 75%, and 29%, respectively.

Let's suppose that I am max. Clearly, in order to choose a move in this search tree, I need to take into account that min gets to choose the second move. I can't decide that I like the position in the lower righthand corner best and so will make moves to force it there, because min will make the choice among the five possibilities after my first move into a side slot. My heuristic function needs to take into account what I expect min to do in that position--and I surely shouldn't count on her to make a move that will help me out!

So, in adversarial search, we need to evaluate positions on alternating levels of the search tree from alternating perspectives.

How can I choose the move I think is "best"? I could search all the way to the leaves of the tree...

Consider this generic search tree:

If I search to the "leaves" of the tree, then I will find states that are either wins for max (labeled 1), wins for min (labeled 0), or perhaps ties (labeled 1/2).

Then we can apply the simple idea of alternating evaluation from the perspectives of max and min to determine the value of any state:

1. Label the leaves of the tree with their values: 1 for a max win, 0 for a min win, 1/2 for a tie.

2. Pick an unlabeled state whose children are all labeled.

3. If the state is on a max level, label it with the maximum of its children's values; otherwise, label it with the minimum of its children's values.

4. Go to 2.

This algorithm is called "minimax". Now you can see why we call the players what we do: the names tell us the players' roles in the algorithm.
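The four steps above collapse naturally into a short recursive sketch (the tiny game tree here is invented for illustration; interior nodes are lists of children, leaves hold 0, 1/2, or 1):

```python
# Recursive minimax on a hand-made game tree. Leaves hold 0 (min wins),
# 1/2 (tie), or 1 (max wins); interior nodes are lists of children.
from fractions import Fraction

TIE = Fraction(1, 2)

def minimax(node, max_to_move=True):
    if not isinstance(node, list):      # a leaf: its value is known
        return node
    values = [minimax(child, not max_to_move) for child in node]
    return max(values) if max_to_move else min(values)

# max chooses between two moves; min replies at the next level down.
tree = [[1, 0], [TIE, 1]]
print(minimax(tree))   # -> 1/2: min holds max to a tie on the better branch
```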

### Improving Minimax 1.0

In this simplest form, our algorithm has some problems:

• Minimax stores the entire search tree while it works. This requires an amount of space that is exponential in the size of the problem.

• Minimax looks at every state in the search tree. This requires an amount of time that is exponential in the size of the problem.

We know that depth-first search generally reduces time and space costs, so... A more efficient minimax algorithm:

```
1. Set L to be a list containing the initial state.

2. Forever:

a. Let x = first item on L.
If it is the initial state and has a value, stop.

b. If x has a value, then
Retrieve x's parent, p.
If p is a max state, then set p's value to max(p's value, x's value).
If p is a min state, then set p's value to min(p's value, x's value).
Remove x from L.

Else if x is a leaf, then
Set x's value to the value of the position.

Else if x is an interior state,
Set x's value either to -infinity, if it's a max state,
or to +infinity, if it's a min state.
Add x's children to the front of the list.
```

This does a depth-first search and drastically cuts the amount of space required by Minimax.
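The pseudocode above translates fairly directly into Python. The dict-based node representation here is an assumption of this sketch, not part of the algorithm; leaves arrive with their position's value already stored:

```python
# Iterative depth-first minimax, following the pseudocode: L is the list
# of states, children go on the front, and values propagate to parents.
import math

def minimax_dfs(root):
    """Each node is a dict with 'max' (whose turn it is) and either
    'value' (a leaf) or 'children' (an interior state)."""
    L = [root]
    while True:
        x = L[0]                                  # a. first item on L
        if "value" in x:
            if x is root:                         # initial state has a value: stop
                return x["value"]
            p = x["parent"]                       # b. propagate up to the parent
            combine = max if p["max"] else min
            p["value"] = combine(p["value"], x["value"])
            L.pop(0)
        else:                                     # an interior state
            x["value"] = -math.inf if x["max"] else math.inf
            for c in x["children"]:
                c["parent"] = x
            L[:0] = x["children"]                 # children to the front of L

leaf = lambda v: {"value": v, "max": None}
tree = {"max": True, "children": [
    {"max": False, "children": [leaf(1), leaf(0)]},
    {"max": False, "children": [leaf(0.5), leaf(1)]},
]}
print(minimax_dfs(tree))   # -> 0.5
```

Only the current path and its siblings ever sit on L, which is exactly the space saving the depth-first ordering buys us.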

What are some of the costs of the new algorithm?

### Improving Minimax 2.0

Our new algorithm uses much less space, by doing a depth-first search of the state space. But it still has a major problem:

• Minimax still uses an amount of time exponential in the size of the problem, because it goes all the way to leaf states.

We simply can't afford to consider every state in a large search space... Let's borrow a solution from A*:

• Devise a static evaluation function called e(n) that assigns a value between -1 (loss) and 1 (win) to each state as a prediction of its value.

• Cut search at some pre-determined depth d.

• Use e(n) to evaluate non-leaf states at level d.

[ Could we use a heuristic function from A* as a static evaluation function? ]

Here is a new Minimax algorithm that uses the idea of static evaluation:

```
1. Set L to be a list containing the initial state.

2. Forever:

a. Let x = first item on L.
If it is the initial state and has a value, stop.

b. If x has a value, then
Retrieve x's parent, p.
If p is a max state, then set p's value to max(p's value, x's value).
If p is a min state, then set p's value to min(p's value, x's value).
Remove x from L.

Else if x is a leaf or is at level d, then
Set x's value to the value of the position, computed as e(x).

Else if x is an interior state,
Set x's value either to -infinity, if it's a max state,
or to +infinity, if it's a min state.
Add x's children to the front of the list.
```
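Recursively, the same cutoff idea looks like this. The toy tree, the depth bound, and the stored e(n) estimates in [-1, 1] are all invented for illustration; a real e(n) would compute its estimate from the position:

```python
# Depth-limited minimax with a static evaluator: at depth 0 (or at a true
# leaf) we trust e(n) instead of searching further.
def minimax_cutoff(node, depth, max_to_move=True):
    if depth == 0 or "children" not in node:
        return node["e"]                  # static evaluation at the cutoff
    values = [minimax_cutoff(c, depth - 1, not max_to_move)
              for c in node["children"]]
    return max(values) if max_to_move else min(values)

tree = {"e": 0.0, "children": [
    {"e": 0.3, "children": [{"e": -1.0}, {"e": 0.2}]},
    {"e": -0.4, "children": [{"e": 1.0}, {"e": 0.9}]},
]}
print(minimax_cutoff(tree, 1))   # -> 0.3: shallow search trusts e(n) at d = 1
print(minimax_cutoff(tree, 2))   # -> 0.9: deeper search sees past the estimates
```

The two calls disagree about the best move, which previews the problem discussed next: values can fluctuate wildly just past the cutoff depth.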

### Evaluating Minimax 3.0

This version of Minimax allows the agent to drastically cut the amount of time spent on search. More to the point, it gives the agent complete control over how much time it spends, by adjusting the value of d.

Setting d arbitrarily has disadvantages, too, though:

• In some positions, the values of states and their children may be fluctuating wildly, that is,

abs( e(x) - e(x's parent) ) >> 0

• For other reasons, one move at the end of a path may be vastly better than its competitors.

• If I know that my opponent is searching only to level d, then I can try to make the critical move on a search path fall at level d + 1!!