Homework 7

Techniques for "Learning"


Due: Friday, December 1st


The table below contains 14 pieces of data, each described by three attributes (Color, Class, and Degree). You will use this data to conduct a "paper-and-pencil" walk-through of two different learning techniques. NOTE: The ID#s are there to help you keep track of where you are in the process. They are NOT part of the classification.

ID#   Color    Class      Degree   Classification
 1    Gold     Senior     B.S.     Yes
 2    Gold     Freshman   B.S.     No
 3    Gold     Senior     B.A.     Yes
 4    Purple   Senior     B.S.     Yes
 5    Purple   Freshman   B.A.     Yes
 6    Purple   Senior     B.A.     Yes
 7    Purple   Junior     B.S.     No
 8    Gold     Sophomore  B.S.     No
 9    Purple   Junior     B.A.     No
10    Purple   Sophomore  B.S.     No
11    Gold     Junior     B.A.     Yes
12    Gold     Junior     B.S.     Yes
13    Gold     Freshman   B.A.     Yes
14    Purple   Freshman   B.S.     No

Activity #1 - Current Best Hypothesis

Use the "current best hypothesis" technique (discussed in session 33) to develop a model for the data as each training example is added to your knowledge domain.  At each step in the process your model should be consistent for that training example AS WELL, as all prior examples.  When confronted with a choice between several new conjunctions or disjunctions of the same simplicity level you should always pick the "leftmost" option in the table.
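If you would like to sanity-check the second column of the table below, here is a minimal Python sketch of the consistency bookkeeping. It assumes a purely conjunctive hypothesis represented as a dict of attribute constraints; the names predict and check are illustrative, and disjunctive models (which some steps of the walk-through may require) are not represented.

# Minimal sketch: consistency bookkeeping for the current-best-hypothesis
# walk-through. Assumes a purely conjunctive hypothesis; disjunctions,
# which some steps may require, are not modeled here.

def predict(hypothesis, example):
    """True if the example satisfies every constraint in the hypothesis.
    hypothesis maps attribute -> required value; an attribute missing
    from the dict is unconstrained ("don't care")."""
    return all(example[attr] == value for attr, value in hypothesis.items())

def check(hypothesis, example, label):
    """Report whether a new training example is consistent with the model."""
    predicted = predict(hypothesis, example)
    if predicted == label:
        return "consistent"
    # Model says Yes but the example is No  -> false positive (specialize).
    # Model says No  but the example is Yes -> false negative (generalize).
    return "false positive" if predicted else "false negative"

# Example: after training example 1 (Gold, Senior, B.S. -> Yes), a maximally
# specific model would be:
model = {"Color": "Gold", "Class": "Senior", "Degree": "B.S."}
print(check(model, {"Color": "Gold", "Class": "Freshman", "Degree": "B.S."}, False))
# -> consistent (this model already predicts No for training example 2)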

Training example #  |  Is this example consistent with the current model? If not, false positive or false negative?  |  Current Consistent Hypothesis (Model)
 1  |  |
 2  |  |
 3  |  |
 4  |  |
 5  |  |
 6  |  |
 7  |  |
 8  |  |
 9  |  |
10  |  |
11  |  |
12  |  |
13  |  |
14  |  |

Activity #2 - Decision Tree generated using Entropy/ID3

Calculate the initial entropy of this problem.

Initial Entropy:  ____________________
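To check your arithmetic, here is a minimal Python sketch of the entropy computation. The class counts (8 Yes, 6 No) are read directly off the fourteen rows of the data table; the function name entropy is just illustrative.

import math

def entropy(counts):
    """Shannon entropy, in bits, of a class-count distribution.
    counts is a list of class counts, e.g. [number_yes, number_no]."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

# Initial distribution from the table: 8 Yes and 6 No out of 14 examples.
print(entropy([8, 6]))  # roughly 0.985 bits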

We want to use this data, the ID3 algorithm, and the concept of change in entropy to construct an accurate yet compact decision tree for this domain. To determine the optimal first attribute, calculate the entropy after independently dividing the data using each of the three attributes as the first choice. Complete the table below; a computational sketch for checking these values appears further below.

Attribute  |  Information gain if splitting on the attribute
Color      |
Class      |
Degree     |

As we know from our study of entropy and the ID3 algorithm, the attribute whose split leaves the lowest weighted entropy provides the most information gain. Thus, using your results from the table above, split the fourteen pieces of training data on the best attribute and begin to construct the tree resulting from this split. Notice that some of the resulting categories will be perfectly classified and thus become leaves of the decision tree. Label each of those leaves with the correct classification. Label each node that is not yet a leaf with the number of training examples in each classification.
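If you want to verify the information-gain column above and your choice of first split, here is a minimal Python sketch. The DATA list simply transcribes the fourteen rows of the data table; the names (DATA, ATTRS, class_counts, information_gain) are illustrative, not part of the assignment.

import math
from collections import Counter, defaultdict

# The fourteen rows of the data table, transcribed as
# (Color, Class, Degree, Classification).
DATA = [
    ("Gold",   "Senior",    "B.S.", "Yes"),
    ("Gold",   "Freshman",  "B.S.", "No"),
    ("Gold",   "Senior",    "B.A.", "Yes"),
    ("Purple", "Senior",    "B.S.", "Yes"),
    ("Purple", "Freshman",  "B.A.", "Yes"),
    ("Purple", "Senior",    "B.A.", "Yes"),
    ("Purple", "Junior",    "B.S.", "No"),
    ("Gold",   "Sophomore", "B.S.", "No"),
    ("Purple", "Junior",    "B.A.", "No"),
    ("Purple", "Sophomore", "B.S.", "No"),
    ("Gold",   "Junior",    "B.A.", "Yes"),
    ("Gold",   "Junior",    "B.S.", "Yes"),
    ("Gold",   "Freshman",  "B.A.", "Yes"),
    ("Purple", "Freshman",  "B.S.", "No"),
]
ATTRS = {"Color": 0, "Class": 1, "Degree": 2}   # attribute -> column index

def entropy(counts):
    """Shannon entropy, in bits, of a class-count distribution."""
    total = sum(counts)
    return -sum((c / total) * math.log2(c / total) for c in counts if c > 0)

def class_counts(rows):
    """Counts of Yes/No labels among a list of rows."""
    return list(Counter(row[-1] for row in rows).values())

def information_gain(rows, attr):
    """H(parent) minus the size-weighted entropy of each child partition."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[ATTRS[attr]]].append(row)
    remainder = sum(len(g) / len(rows) * entropy(class_counts(g))
                    for g in groups.values())
    return entropy(class_counts(rows)) - remainder

for attr in ATTRS:
    print(attr, round(information_gain(DATA, attr), 3))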

For each node that is not yet a leaf, independently calculate which of the remaining two attributes is the appropriate second choice by computing the weighted entropy of that portion of the tree under each attribute's split. Continue these calculations recursively until you can complete the tree.
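Finally, to check your completed tree, here is a minimal recursive ID3 sketch. It is meant to be appended to the previous snippet (it reuses DATA, ATTRS, Counter, and information_gain from there); ties between attributes are broken arbitrarily by max rather than by any "leftmost" rule, and all names remain illustrative.

def id3(rows, attrs):
    """Recursively build a decision tree as nested dicts keyed by
    (attribute, value) pairs; leaves are class labels."""
    labels = [row[-1] for row in rows]
    if len(set(labels)) == 1:      # perfectly classified: a leaf
        return labels[0]
    if not attrs:                  # out of attributes: majority label
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a))
    tree = {}
    for value in sorted({row[ATTRS[best]] for row in rows}):
        subset = [row for row in rows if row[ATTRS[best]] == value]
        tree[(best, value)] = id3(subset, [a for a in attrs if a != best])
    return tree

print(id3(DATA, list(ATTRS)))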