810:161 Session 15

Session 15

An Introduction to Learning

810:161

Artificial Intelligence

What Counts as Learning?

Here are some ways that a program can update its own database of facts. Which of these, if any, counts as "learning"?

From P and P -> Q, learn that Q.
From Q and P -> Q, learn that P.
From P(x1) and P(x2) and P(x3) ..., learn that for all x, P(x).

The first form is called deduction. Is that really learning? The fact you add to your database is already logically entailed by your database, and you could re-derive the fact any time you need it later.

If the rules contain variables, then the situation is a bit more complex. By deriving specific facts from general rules, a program can seem to "know" something much more useful than what is entailed by the knowledge base. But is that learning?

The second form is sometimes called abduction. This rule isn't logically "sound"--that is, using it, we can infer facts that are not necessarily true. So if P isn't even necessarily true, how can coming to know it in this way be considered learning?

Maybe you know a lot about the world, and the only way that Q can be true is if P caused it. Does that change your answer? Suppose that we loosen up the "only way" restriction. Does that change your answer? Maybe P is the best explanation for Q's being true...

The third form is called induction. Students always seem to be in agreement that this is learning, almost quintessential learning. But can't I be wrong via induction? If so, then how is it any better as a form of learning than abduction?
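To make the three forms concrete, here is a minimal sketch in Python. The representation (facts as strings, a rule as a pair `(p, q)` meaning "p -> q") and the function names `deduce`, `abduce`, and `induce` are assumptions made for illustration only. Notice that only the first function is guaranteed to add true facts.

```python
# Facts are strings; a rule (p, q) stands for "p -> q".
# All names here are hypothetical, chosen for illustration.

def deduce(facts, rules):
    """From P and P -> Q, conclude Q. Sound: the conclusions are entailed."""
    return {q for (p, q) in rules if p in facts}

def abduce(facts, rules):
    """From Q and P -> Q, guess P. Not sound: Q may have another cause."""
    return {p for (p, q) in rules if q in facts}

def induce(observations):
    """From P(x1), P(x2), ..., guess 'for all x, P(x)'.

    `observations` maps each individual seen so far to whether P held.
    Not sound: the next individual may be a counterexample.
    """
    return bool(observations) and all(observations.values())

rules = [("rain", "wet_grass"), ("sprinkler", "wet_grass")]

print(deduce({"rain"}, rules))        # the one entailed fact
print(abduce({"wet_grass"}, rules))   # two competing explanations
print(induce({"swan1": True, "swan2": True, "swan3": True}))
```

With two rules that share a conclusion, `abduce` returns two candidate explanations, which is exactly the "best explanation" problem raised above.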

Turning the Question Around...

What does learning "count as"? That is, what sorts of things does learning accomplish for the agent? Consider this list:

information gathering ... [ abduction, induction, deduction? ]
hypothesis formation ... [ abduction, deduction? ]
generalization ... [ induction, abduction? ]
speedup ... [ deduction!, abduction, induction ]

Just defining the learning problem is difficult. But doing so can constrain the implementation task quite a bit!

Defining the Learning Task

I have a graduate student who wants to build a program that learns to play checkers better. Help him out by suggesting answers to the following questions:

What kind of knowledge should his program try to learn?
What should it do to learn this knowledge?
How will we know if the program has learned anything?

Your answers need not deal with details of the program, but rather with the game of checkers and the task of software engineering. As a thinking aid, you might ask yourself the same questions about how you learn to play a game (well)!

{ The theme song from "Jeopardy!" plays softly in the background... }

One of the most general answers we can give is that learning is improving with experience at some task.

Improve at task T...
... based on some experience E ...
... with respect to performance standard P.

For the task of learning checkers, my graduate student might fill in the variables with:

T = playing checkers
E = playing human opponents when they are available; playing itself all other times
P = its performance rating in match play and tournament play against players whose strength is known

Learning in Specific Contexts

Here are three specific "implementations" of how to learn. Each is appropriate in a particular set of situations.

An agent is given a sequence of problems from a particular class of problems. As a result of solving a problem and getting feedback, the agent should:
- solve the same problem better
- solve other problems from the class
An agent is given a sequence of attribute vectors to classify into n different categories. Each vector is accompanied by its correct category. The agent should have more success categorizing a vector as the number of examples that it sees increases.
An agent is given a sequence of percepts (in the form of attribute vectors), and for each it must suggest an action. The more vectors it sees, the better the action it suggests.
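The classification setting can be sketched with a very simple learner. The code below assumes a nearest-neighbor rule, which is only one of many possible learners and is not prescribed by these notes; the data and the names `nearest_neighbor` and `accuracy` are hypothetical. It measures success on a fixed test set as the agent sees more labeled examples.

```python
def nearest_neighbor(examples, vector):
    """Return the category of the stored example closest to `vector`."""
    def sq_dist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    closest = min(examples, key=lambda ex: sq_dist(ex[0], vector))
    return closest[1]

def accuracy(examples, test_set):
    """Fraction of test vectors classified correctly."""
    correct = sum(nearest_neighbor(examples, v) == c for v, c in test_set)
    return correct / len(test_set)

# Hypothetical toy data: (attribute vector, correct category).
training = [((1.0, 1.0), "A"), ((5.0, 5.0), "B"),
            ((1.0, 5.0), "B"), ((2.0, 1.5), "A")]
test_set = [((1.5, 1.0), "A"), ((4.5, 5.0), "B"),
            ((1.0, 4.0), "B"), ((2.5, 2.0), "A")]

# Show the agent one more labeled example at a time.
for n in range(1, len(training) + 1):
    print(n, accuracy(training[:n], test_set))
```

On this toy data the accuracy rises as examples accumulate. In general that improvement is a goal, not a guarantee: a noisy or unrepresentative example can make a learner temporarily worse.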

The Learning Agent

Here is a picture of the "architecture" of an agent that can learn.

The "performance element" subsumes everything we have considered up to today: reflex, deliberation, search, inference, goals, uncertainty, ....

The "performance standard" provides correct answers from outside the agent, from the environment. (What about learning of creative activities?)

The "critic" embodies an internal evaluation process--the ability to reflect on problem solving and what is known.

The environment is, well, where the agent lives. In contexts that have "teachers", the teacher is an important part of the environment--maybe the only important part.

Learning and Simplicity

Here is a flashback to the ACT. What is the right answer in each case?

In the third problem, I see two correct answers. But I am allowed to select only one. But which? Implicit in this problem is an accepted standard for whether reflections or rotations are "simpler".

A program would have to learn this standard. It might eventually build up a scale that relates the simplicity of operations. Notice that this scale has to take into account combinations of operators, too...

Why are we talking about simplicity? Because for any of these problems, there is an infinite number of right answers, depending on the environment in which the agent lives. That environment includes the community of other agents and the set of standards that they share.

(This is why exams such as the ACT are often charged with being "culturally biased", because the "right answer" requires that you share the same set of standards with the folks who wrote and validate the test, and with the other folks who take the test. Is this a fair charge?)

The mathematically inclined among you have no trouble accepting this problem with infinity, because you have seen it in another context: there are infinitely many functions that pass through any finite set of data points. Consider this set of possibilities.
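As a small sketch of this fact, take three hypothetical data points that lie on f(x) = x^2. Adding any multiple of x(x - 1)(x - 2) to f gives another function that passes through the same three points, because the added term is zero at x = 0, 1, and 2. The points and the name `make_alternative` are assumptions for illustration.

```python
# Three hypothetical data points, all on f(x) = x * x.
points = [(0, 0), (1, 1), (2, 4)]

def make_alternative(c):
    """Return g(x) = x^2 + c * x(x - 1)(x - 2).

    The extra term vanishes at x = 0, 1, 2, so g fits the data for every c,
    yet different values of c disagree everywhere else.
    """
    def g(x):
        return x * x + c * x * (x - 1) * (x - 2)
    return g

for c in (0, 1, -3, 10):
    g = make_alternative(c)
    fits = all(g(x) == y for x, y in points)
    print(c, fits, g(3))  # every g fits, but each predicts a different g(3)
```

Each choice of c is a different "right answer" for the value at x = 3, and the data alone cannot choose among them.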

Are the outliers noise in the data? (Remember, our sensors and effectors aren't perfect, and the world is complex...) Deciding that some data points are noise can affect the answer you generate greatly. Consider another sort of problem that shows up on aptitude tests:

What is the next value in this sequence: 1, 1, 2, 3, 5, 8, ??

This looks like the good ol' Fibonacci sequence, so the answer is 5 + 8 = 13. An examiner might expect you to know this and use the "toy world" given in the problem exactly as-is.

But in the real world, we encounter noise. What if the 1s are noise? Then you might decide that 3 - 2 = 1, and 5 - 3 = 2, and 8 - 5 = 3, so ?? - 8 must = 4, so ?? = 12. If we have to address the problem of noise, the learning task becomes harder.
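The two readings of the sequence can be written as two competing hypotheses. This is a minimal sketch; the function names are hypothetical, and "treating the 1s as noise" is modeled simply by dropping the first two terms.

```python
def fibonacci_next(seq):
    """Hypothesis 1: each term is the sum of the previous two."""
    return seq[-2] + seq[-1]

def fits_growing_gap(seq):
    """True if successive differences increase by exactly 1 each step."""
    gaps = [b - a for a, b in zip(seq, seq[1:])]
    return all(later - earlier == 1 for earlier, later in zip(gaps, gaps[1:]))

def growing_gap_next(seq):
    """Hypothesis 2: the next difference is one more than the last one."""
    last_gap = seq[-1] - seq[-2]
    return seq[-1] + last_gap + 1

observed = [1, 1, 2, 3, 5, 8]
denoised = observed[2:]  # assume the two leading 1s are noise

print(fibonacci_next(observed))                               # 13
print(fits_growing_gap(observed), fits_growing_gap(denoised))
print(growing_gap_next(denoised))                             # 12
```

The second hypothesis fits only after two data points are discarded, so the decision about what counts as noise determines which answer the agent gives.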

Why does this matter to us? Well, first of all, a learning agent will have to eventually learn the "community standard" that governs the performance standard. This usually involves seeking the simplest explanation, an idea to which we will return later in this unit. It will also have to make choices about what is and isn't noise.

Equally important is a second issue: Many of the best techniques for implementing learning agents come from mathematical theory, which is not too surprising. A learning agent encounters many data points when working on a problem (say, playing checkers). In effect, it is learning an "action function", like a reflex agent, even if it is learning a process for computing the answer. When we cast the agent's task as "learning a function", we gain a lot of understanding about the problem facing the agent, but we also encounter some hard realities from the domain of functions.

Learning Checkers as Finding a Function

Back to our earlier exercise...

What should a checker player learn?

At the base level, it needs to know:
- how to search a game tree
- how to evaluate states in the tree
At the meta level, it needs to know:
- how to extend its search at the right states
- how to manage its time

In order to learn its base knowledge, the agent must commit to a knowledge representation: how will it record what it learns?

In order to do search, the program must assign values to states, say:

     100 = I know I will win
       0 = I expect to draw
    -100 = I know I will lose

It might try to learn a function such as:

    V(b) = w0 + w1*bp(b) + w2*rp(b)
              + w3*bk(b) + w4*rk(b)
              + w5*bt(b) + w6*rt(b)

where:

    bp(b) is the number of black pieces on board b
    rp(b) is the number of red pieces on board b
    bk(b) is the number of black kings on board b
    rk(b) is the number of red kings on board b
    bt(b) is the number of black threats on board b
    rt(b) is the number of red threats on board b

The agent wants to learn the weights w0 through w6, which determine how much each feature contributes to the "value" of a board.
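Here is a minimal sketch of such a linear evaluation function, together with one possible way to adjust its weights. The update shown is the least-mean-squares (LMS) rule, a standard technique that these notes do not prescribe; it is swapped in here as an assumption. The board is reduced to a dictionary of feature counts, with the second feature of each pair counting the opponent's pieces, kings, and threats. The training value of 100, the learning rate, and all names are hypothetical.

```python
# Feature names follow the formula above; each board is summarized
# as a dict of feature counts (a hypothetical representation).
FEATURES = ("bp", "rp", "bk", "rk", "bt", "rt")

def evaluate(weights, board_features):
    """V(b) = w0 + w1*x1(b) + ... + w6*x6(b)."""
    return weights[0] + sum(
        w * board_features[name] for w, name in zip(weights[1:], FEATURES)
    )

def lms_update(weights, board_features, target, rate=0.01):
    """One LMS step: move each weight in proportion to the error
    (target - V(b)) and to the size of its feature."""
    error = target - evaluate(weights, board_features)
    xs = [1] + [board_features[name] for name in FEATURES]  # 1 pairs with w0
    return [w + rate * error * x for w, x in zip(weights, xs)]

# Hypothetical board that we assume was later found to be a sure win (100).
board = {"bp": 5, "rp": 5, "bk": 2, "rk": 1, "bt": 1, "rt": 0}

weights = [0.0] * 7
for _ in range(200):
    weights = lms_update(weights, board, target=100)

print(round(evaluate(weights, board), 3))
```

Training on a single board only shows the mechanics. A real learner would need training values for many boards, and deciding where those values come from (game outcomes, a teacher, deeper search) is part of defining experience E.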

This is just one formulation of the problem; you can certainly design better and more interesting ones with a little work!

Wrap Up

Homework -- None for now. We will have a new assignment after a couple of weeks discussing learning.
Lab Section -- Lab Exercise 10 will become available tomorrow. For this exercise, you will design a lab project to do during the rest of the semester. You will determine what you have to submit each week by 4:00 PM on Wednesday.

Eugene Wallingford ==== wallingf@cs.uni.edu ==== October 30, 2001