Here are some ways that a program can update its own database of facts. Which of these, if any, counts as "learning"?

- From P and P -> Q, learn that Q.
- From Q and P -> Q, learn that P.
- From P(x1) and P(x2) and P(x3) ..., learn that for all x, P(x).

The first form is called *deduction*. Is that really learning? The
fact you add to your database is already logically entailed by your
database, and you could re-derive the fact any time you need it later.

If the rules contain variables, then the situation is a bit more complex. By deriving specific facts from general rules, a program can seem to "know" something much more useful than what is entailed by the knowledge base. But is that learning?
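For concreteness, here is a tiny sketch of deduction as forward chaining over a fact base. The particular facts and rules are invented for illustration; the point is only that everything added was already entailed.

```python
# A minimal sketch of deduction by forward chaining.
# The facts and rules are made up for illustration.

facts = {"P"}                      # what the agent already believes
rules = [("P", "Q"), ("Q", "R")]   # (antecedent, consequent) pairs: P -> Q, Q -> R

changed = True
while changed:                     # keep applying modus ponens until nothing new appears
    changed = False
    for antecedent, consequent in rules:
        if antecedent in facts and consequent not in facts:
            facts.add(consequent)  # "learn" the entailed fact
            changed = True

print(facts)                       # {'P', 'Q', 'R'} -- all already entailed by the database
```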

The second form is sometimes called *abduction*. This rule isn't
logically "sound"--that is, using it, we can infer facts that are not
necessarily true. So if P isn't even necessarily true, how can coming
to know it in this way be considered learning?

Maybe you know a lot about the world, and the *only way* that Q
can be true is if P caused it. Does that change your answer? Suppose
that we loosen up the "only way" restriction. Does that change your
answer? Maybe P is the **best explanation** for Q's being true...

The third form is called *induction*. Students always seem to be in
agreement that this is learning, almost quintessential learning. But can't
I be wrong via induction? If so, then how is it any better as a form of
learning than abduction?

What does learning "count as"? That is, what sorts of things does learning accomplish for the agent? Consider this list:

- information gathering ... [ abduction, induction, deduction? ]
- hypothesis formation ... [ abduction, deduction? ]
- generalization ... [ induction, abduction? ]
- speedup ... [ deduction!, abduction, induction ]

Just defining the learning problem is difficult. But doing so can constrain the implementation task quite a bit!

I have a graduate student who wants to build a program that learns to play checkers better. Help him out by suggesting answers to the following questions:

- What kind of knowledge should his program try to learn?
- What should it do to learn this knowledge?
- How will we know if the program has learned anything?

Your answers need not deal with details of the program, but rather with
the game of checkers and the task of software engineering. As a thinking
aid, you might ask yourself the same questions about how *you* learn
to play a game (well)!

{ The theme song from "Jeopardy!" plays softly in the background... }

One of the most general answers we can give is that learning is
**improving with experience at some task**.

- Improve at task T...
- ... based on some experience E ...
- ... with respect to performance standard P.

For the task of learning checkers, my graduate student might fill in the variables with:

- T = playing checkers
- E = playing human opponents when they are available; playing against itself at all other times
- P = its performance rating in match play and tournament play against players whose strength is known

Here are three specific "implementations" of how to learn. Each is appropriate in a particular set of situations.

- An agent is given a sequence of problems from a particular class of
problems. As a result of solving a problem and getting feedback,
the agent should:
- solve the same problem better
- solve other problems from the class

- An agent is given a sequence of attribute vectors to classify into
`n` different categories. Each vector is accompanied by its correct
category. The agent should have more success categorizing a vector as
the number of examples it sees increases. (A minimal sketch of this
setting follows the list.)

- An agent is given a sequence of percepts (in the form of attribute
vectors), and for each it must suggest an action. The more vectors it
sees, the better the actions it suggests.
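Here is a minimal sketch of the second setting. The attribute vectors, the categories, and the nearest-neighbor rule are just one possible choice, made up for illustration.

```python
# Classify attribute vectors, receiving each vector's correct category
# as feedback afterward. The data and the 1-nearest-neighbor rule are
# assumptions for illustration.

def distance(u, v):
    return sum((a - b) ** 2 for a, b in zip(u, v))

def classify(vector, examples, default="unknown"):
    if not examples:
        return default
    nearest_vector, label = min(examples, key=lambda ex: distance(vector, ex[0]))
    return label

examples = []                              # (vector, correct category) pairs seen so far
stream = [((0, 0), "A"), ((5, 5), "B"), ((1, 0), "A"), ((4, 5), "B")]

for vector, correct in stream:
    guess = classify(vector, examples)     # predict before seeing the answer
    print(vector, "->", guess, "(correct:", correct + ")")
    examples.append((vector, correct))     # learn from the feedback
```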

Here is a picture of the "architecture" of an agent that can learn.

The "performance element" subsumes everything we have considered up to today: reflex, deliberation, search, inference, goals, uncertainty, ....

The "performance standard" provides correct answers from outside the agent, from the environment. (What about learning of creative activities?)

The "critic" embodies an internal evaluation process--the ability to reflect on problem solving and what is known.

The environment is, well, where the agent lives. In contexts that have "teachers", the teacher is an important part of the environment--maybe the only important part.
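One way to make the picture concrete is a skeleton like the one below. The class and component names are assumptions about how you might organize the code, not a prescribed design.

```python
class LearningAgent:
    """A rough skeleton of the learning-agent architecture sketched above."""

    def __init__(self, performance_element, critic, learning_element):
        self.performance_element = performance_element  # reflex, search, inference, ...
        self.critic = critic                            # internal evaluation of what happened
        self.learning_element = learning_element        # revises the performance element

    def step(self, percept, standard):
        action = self.performance_element(percept)        # act in the environment
        judgment = self.critic(action, standard)           # compare against the performance standard
        self.learning_element(percept, action, judgment)   # learn from the judgment
        return action
```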

Here is a flashback to the ACT. What is the right answer in each case?

In the third problem, I see two correct answers, but I am allowed to select only one. Which? Implicit in this problem is an accepted standard for whether reflections or rotations are "simpler".

A program would have to learn this standard. It might eventually build up a scale that relates the simplicity of operations. Notice that this scale has to take into account combinations of operators, too...

Why are we talking about simplicity? Because for any of these problems, there is an infinite number of right answers, depending on the environment in which the agent lives. That environment includes the community of other agents and the set of standards that they share.

(This is why exams such as the ACT are often charged with being "culturally biased", because the "right answer" requires that you share the same set of standards with the folks who wrote and validate the test, and with the other folks who take the test. Is this a fair charge?)

The mathematically-inclined among you have no trouble accepting this problem with infinity, because you have seen it in another context: There are an infinite number of functions that contain any finite set of data points. Consider this set of possibilities.
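For instance, here is a made-up example of two different functions that fit the same four points exactly and yet disagree everywhere else:

```python
# Infinitely many functions pass through the same finite set of points.
# The points below are invented for illustration.
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.0, 2.0, 3.0, 4.0]                   # these happen to lie on the line y = x + 1

def f(x):
    return x + 1.0                           # one function through all four points

def g(x):
    # another: add a term that is zero at every observed x but not elsewhere
    return x + 1.0 + 0.5 * x * (x - 1) * (x - 2) * (x - 3)

print([f(x) - y for x, y in zip(xs, ys)])    # [0.0, 0.0, 0.0, 0.0] -- fits the data exactly
print([g(x) - y for x, y in zip(xs, ys)])    # [0.0, 0.0, 0.0, 0.0] -- so does this one
print(f(5.0), g(5.0))                        # 6.0 vs 66.0 -- they disagree off the data
```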

Are the outliers noise in the data? (Remember, our sensors and effectors aren't perfect, and the world is complex...) Deciding that some data points are noise can affect the answer you generate greatly. Consider another sort of problem that shows up on aptitude tests:

What is the next value in this sequence: 1, 1, 2, 3, 5, 8, ??

This looks like the good, ol' Fibonacci sequence, so the answer is 5 + 8 = 13. An examiner might expect you to know this and use the "toy world" given in the problem exactly as-is.

But in the real world, we encounter noise. What if the 1s are noise?
Then you might decide that `3 - 2 = 1`, `5 - 3 = 2`,
and `8 - 5 = 3`, so `?? - 8` should be `4`, which gives `?? = 12`.
If we have to address the problem of noise, the learning task becomes
harder.
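A tiny sketch of the two readings:

```python
# Two readings of the sequence 1, 1, 2, 3, 5, 8, ??
seq = [1, 1, 2, 3, 5, 8]

# Reading 1: Fibonacci -- each term is the sum of the previous two.
fib_answer = seq[-1] + seq[-2]      # 8 + 5 = 13

# Reading 2: treat the leading 1s as noise; the differences 1, 2, 3
# suggest that the next difference is 4.
diff_answer = seq[-1] + 4           # 8 + 4 = 12

print(fib_answer, diff_answer)      # 13 12
```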

Why does this matter to us? Well, first of all, a learning agent will
have to eventually learn the "community standard" that governs the
performance standard. This usually involves seeking the *simplest
explanation*, an idea to which we will return later in this unit.
It will also have to make choices about what is and isn't noise.

Equally important is a second issue: Many of the best techniques for implementing learning agents come from mathematical theory, which is not too surprising. A learning agent encounters a lot of data when working on a problem (say, playing checkers). In effect, it is learning an "action function", like a reflex agent, even if it is learning a process for computing the answer. When we cast the learning agent as "learning a function", we gain a lot of understanding about the problem facing the agent, but we also encounter some hard realities from the domain of functions.

Back to our earlier exercise...

What should a checker player learn?

- At the base level, it needs to know:
- how to search a game tree
- how to evaluate states in the tree

- At the meta level, it needs to know:
- how to extend its search at the right states
- how to manage its time

In order to learn its base knowledge, the agent must commit to a
*knowledge representation*: how will it record what it learns?

In order to do search, the program must assign values to states, say:

-  100 = I know I will win
-    0 = I expect to draw
- -100 = I know I will lose

It might try to learn a function such as:

V(b) = w0 + w1·bp(b) + w2·rp(b) + w3·bk(b) + w4·rk(b) + w5·bt(b) + w6·rt(b)

where:

- bp(b) is the number of black pieces on board b
- rp(b) is the number of red pieces on board b
- bk(b) is the number of black kings on board b
- rk(b) is the number of red kings on board b
- bt(b) is the number of black threats on board b
- rt(b) is the number of red threats on board b

The agent wants to learn the w_{i}s, the weights that give each
feature its "value".
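As a sketch, here is that evaluation function in code, along with one common way to nudge the weights toward training values (a least-mean-squares style update). The board representation and the update rule are assumptions made for illustration, not the only reasonable choices.

```python
# A minimal sketch of the linear evaluation function V(b) given above.
# The board is represented simply as a dict of feature counts.

FEATURES = ["bp", "rp", "bk", "rk", "bt", "rt"]

def V(board, w):
    """Evaluate a board: w0 plus the weighted feature counts."""
    return w[0] + sum(w[i + 1] * board[f] for i, f in enumerate(FEATURES))

def lms_update(board, w, v_train, rate=0.001):
    """Nudge the weights toward a training value for this board."""
    error = v_train - V(board, w)
    w[0] += rate * error
    for i, f in enumerate(FEATURES):
        w[i + 1] += rate * error * board[f]

w = [0.0] * 7                                            # w0 .. w6, initially indifferent
board = {"bp": 12, "rp": 11, "bk": 0, "rk": 0, "bt": 0, "rt": 1}

for _ in range(50):                                      # repeated feedback on one position
    lms_update(board, w, v_train=25.0)                   # say: being up a piece is worth ~25

print(round(V(board, w), 2))                             # the estimate converges toward 25.0
```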

This is just one formulation of the problem; you can certainly design better and more interesting ones with a little work!

- Homework -- None for now. We will have a new assignment after a couple
of weeks discussing learning.
- Lab Section -- Lab Exercise 10 will become available tomorrow. For this exercise, you will design a lab project to do during the rest of the semester. You will determine what you have to submit each week by 4:00 PM on Wednesday.