Learning is about understanding the world.
Search and inference are about acting in the world. They assume that the agent already has a reasonable model of the world.
Learning is about finding patterns in data.
Search and inference are about choosing actions based on known patterns.
Consider the problem faced by an infant learning to speak and understand language. Explain how this process fits into the general architecture of a learning agent, and identify the form and content of each of the components of the model for this task.
(Don't try to solve the problem. Just express it in the terms of our model of a learning agent.)
The first step you face is to appreciate the variety of state and behavior implicit in the task:
These issues are still active areas of research into how humans understand language, and they raise many open questions.
Not too surprisingly, implementing a program that can do any of them is still quite difficult.
In the simplest model, the infant is learning how to exchange "speech sounds". But physical context is critical. When someone says "the watermelon", the child must make a connection between the word and the item to which it refers. But first, it must learn to make the connection!
Children tend to learn language in an environment with little or no negative feedback. How can they learn rules more complex than just "any sequence of words is a sentence"?
Because many potential functions fi satisfy this definition, we will settle for any function f' that closely predicts the correct output values for the input vectors.
For our study of induction, let's make some assumptions:
These need not be true, but choosing a single context will allow us to focus on a single algorithm.
Some questions that we would have to answer in order to implement any algorithm to do this in a program include:
A decision tree is a program that takes as input a vector of values for n problem attributes and generates as output the value of the target attribute.
Here is a sample decision tree (not available yet) for deciding whether to wait at a restaurant that can't seat us yet.
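Since the figure isn't available, here is a minimal Python sketch of how such a tree might be represented and evaluated. The attribute names, values, and branch structure below are hypothetical stand-ins, not the actual tree from the notes.

```python
def classify(tree, example):
    """Walk the tree: an internal node is (attribute, {value: subtree});
    a leaf is just the target value ("yes" or "no")."""
    while isinstance(tree, tuple):
        attribute, branches = tree
        tree = branches[example[attribute]]
    return tree

# A tiny hypothetical "Will I wait?" tree.
wait_tree = (
    "patrons",
    {
        "none": "no",
        "some": "yes",
        "full": ("wait_estimate",
                 {"short": "yes", "long": "no"}),
    },
)

print(classify(wait_tree, {"patrons": "some"}))                           # yes
print(classify(wait_tree, {"patrons": "full", "wait_estimate": "long"}))  # no
```

Each internal node tests one attribute of the input vector, and each path from root to leaf spells out one rule for the target attribute.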
As an example, here is a set of feature vectors (not available yet), from which a learning agent might try to build the "Will I wait?" decision tree above.
The features in our vector are:
Decision trees are essentially propositional. They make decisions based on the values of atomic propositions such as, "The wait will be less than 5 minutes." This means that decision trees work best when many combinations of proposition values mean the same thing, because otherwise their size grows very quickly. Because of this, decision trees cannot represent many functions efficiently. For example, a decision tree to map a string of ones and zeros onto the value "have seen more ones than zeros" is exponential in the length of the strings you want to handle!
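To see where the exponential growth comes from, we can enumerate the "more ones than zeros" function for small string lengths. The truth table, which a decision tree over individual bit attributes must in the worst case distinguish leaf by leaf, doubles in size with every added bit. (This is an illustration of the growth, not a proof about minimal tree size.)

```python
from itertools import product

def more_ones_than_zeros(bits):
    """The target function: True iff the string has more 1s than 0s."""
    ones = sum(bits)
    return ones > len(bits) - ones

# The table of required answers doubles with each extra bit.
for n in range(1, 6):
    table = {bits: more_ones_than_zeros(bits)
             for bits in product((0, 1), repeat=n)}
    print(n, len(table))  # 1 2, 2 4, 3 8, 4 16, 5 32
```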
Construct a decision tree that can decide whether or not to drive forward at an intersection with a stoplight.
Here is a sample solution (not available yet). This problem is deceptively simple--unless we get caught up in all of the possible exceptions that can occur. Real-world situations can be full of exceptions. Unfortunately, decision trees handle exceptions badly. They result in trees that are unbalanced and very deep on the main branches. You can probably imagine many more exceptions than this tree handles, but would we want to include them all in the tree?
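Since the sample solution isn't available, here is one hypothetical version of the stoplight tree written as nested if-tests. The attributes chosen (an emergency vehicle, a blocked intersection) are made up for illustration; notice how each exception we handle deepens one branch of the tree.

```python
def drive_forward(light, intersection_clear=True, emergency_vehicle_near=False):
    """A hypothetical decision tree for 'drive forward at the light?'."""
    # Exception: always yield to an emergency vehicle, whatever the light.
    if emergency_vehicle_near:
        return False
    if light == "green":
        # Exception: don't enter an intersection that is blocked.
        return intersection_clear
    # Red or yellow: stop. Every further exception (turn arrows, broken
    # signals, directing officers, ...) would deepen some branch here.
    return False

print(drive_forward("green"))  # True
print(drive_forward("red"))    # False
```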
This is another form of the qualification problem, which also plagues logic-based representations. How can we be sure that any if-then rule we have written accounts for all the preconditions of the rule? Don't most rules make some assumptions about the world?
In order to learn a decision tree, our program will need to have some information to learn from: a training set of examples, each of which gives its particular value for the problem attributes in a specific situation.
In the restaurant example, our problem attributes are "What is the estimated time?", "What kind of food do they serve?", and the like. The target attribute is "Will we wait?" It is a boolean attribute: its value is either yes or no.
Problems with boolean target attributes are called classification problems. The learning agent is learning to recognize whether a situation is a positive example of some concept or a negative example.
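As a concrete sketch, a training set for this classification problem might be represented as a list of attribute-value records, each carrying the boolean target. The attribute names and values here are made up for illustration:

```python
# Hypothetical training examples for "Will we wait?": each example gives
# values for some problem attributes plus the boolean target attribute.
training_set = [
    {"patrons": "some", "hungry": True,  "will_wait": True},   # positive example
    {"patrons": "full", "hungry": False, "will_wait": False},  # negative example
    {"patrons": "none", "hungry": True,  "will_wait": False},  # negative example
]

positives = [e for e in training_set if e["will_wait"]]
negatives = [e for e in training_set if not e["will_wait"]]
print(len(positives), len(negatives))  # 1 2
```

The learner's job is then to find a tree that separates the positive examples from the negative ones.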
Because there are, in general, an infinite number of functions fi that satisfy the learner's goal, we need to give our agent a bias toward a particular class of functions.
Even if we think that we aren't giving our agent a bias, it will have one by default, by virtue of how we implement the program.
In the absence of any domain knowledge that suggests a good bias, we can give our agent a simple yet surprisingly good domain-independent bias, Occam's Razor: "The most likely solution is the simplest one that is consistent with all observations."
This principle biases our agent toward simplicity.