
ID3 Decision Tree Algorithm - Part 1

ID3 Decision Tree Algorithm - Part 1 (Attribute Selection Basic Information)

Introduction

Iterative Dichotomiser 3, or ID3, is an algorithm used to generate a decision tree; details about the ID3 algorithm can be found here. The ID3 algorithm has many uses, especially in the machine learning field. In this article, we will look at the attribute selection procedure used in the ID3 algorithm. The attribute selection section covers basic information related to the data set, discusses entropy and information gain, and uses a few examples to show how to calculate entropy and information gain from example data.

Attribute selection

Attribute selection is the fundamental step in constructing a decision tree. Two terms, entropy and information gain, are used in the attribute selection process; through attribute selection the ID3 algorithm decides which attribute becomes a node of the decision tree, and so on. Before diving in deeper, we need to introduce a few terms used in the attribute selection process.

Outlook    Temperature  Humidity  Wind    Play ball
Sunny      Hot          High      Weak    No
Sunny      Hot          High      Strong  No
Overcast   Hot          High      Weak    Yes
Rain       Mild         High      Weak    Yes
Rain       Cool         Normal    Weak    Yes
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         High      Weak    No
Sunny      Cool         Normal    Weak    Yes
Rain       Mild         Normal    Weak    Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Overcast   Hot          Normal    Weak    Yes
Rain       Mild         High      Strong  No
Total: 14 records
Fig 1: Data set used to calculate entropy and information gain with the ID3 algorithm

Attributes

In the above table, Outlook, Temperature, Humidity, Wind, and Play ball are denoted as attributes.

Class(C) or Classifier

Among these attributes, Play ball is referred to as the class (C) or classifier, because based on Outlook, Temperature, Humidity, and Wind we need to decide whether we can play ball or not; that is why Play ball is the classifier used to make the decision.

Collection (S)

All the records in the table together are referred to as the collection (S).
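
For readers who want to follow along in code, the Fig 1 collection can be written down as a small data structure. The following is only a minimal sketch in Python; the list-of-dictionaries layout and the names DATA and CLASS are my own choices, not something prescribed by the ID3 algorithm itself.

```python
# One possible representation of the Fig 1 collection S:
# each record maps attribute names to values, and "Play ball" is the class.
DATA = [
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play ball": "No"},
    {"Outlook": "Sunny",    "Temperature": "Hot",  "Humidity": "High",   "Wind": "Strong", "Play ball": "No"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "High",   "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play ball": "No"},
    {"Outlook": "Overcast", "Temperature": "Cool", "Humidity": "Normal", "Wind": "Strong", "Play ball": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "High",   "Wind": "Weak",   "Play ball": "No"},
    {"Outlook": "Sunny",    "Temperature": "Cool", "Humidity": "Normal", "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "Normal", "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Sunny",    "Temperature": "Mild", "Humidity": "Normal", "Wind": "Strong", "Play ball": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "Play ball": "Yes"},
    {"Outlook": "Overcast", "Temperature": "Hot",  "Humidity": "Normal", "Wind": "Weak",   "Play ball": "Yes"},
    {"Outlook": "Rain",     "Temperature": "Mild", "Humidity": "High",   "Wind": "Strong", "Play ball": "No"},
]

CLASS = "Play ball"   # the classifier C
print(len(DATA))      # |S| = 14
```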

Entropy Calculation

Entropy is a measure used in information theory; details about entropy can be found here. Here, we will see how to calculate the entropy of a given set of data. The test data used here is the Fig 1 data.

Entropy(S) = ∑ -p(I)log2p(I)

p(I) refers to the proportion of S belonging to class I. In the above table we have two classes, {No, Yes}, with counts {5, 9} (here 5 is the total number of No records and 9 is the total number of Yes records), and the collection size is |S| = 14. So p(I) over C for the entire collection is 5/14 for No and 9/14 for Yes.

log2p(I) refers to log2(5/14) and log2(9/14) over C.

∑ is over C, i.e. the summation over all classifier values. In this case that is the sum of -p(No/S)log2p(No/S) and -p(Yes/S)log2p(Yes/S), i.e. -p(No/S)log2p(No/S) + -p(Yes/S)log2p(Yes/S).

So,

Entropy(S) = ∑ -p(I)log2p(I)

=> Entropy(S) = -p(No/S)log2p(No/S) + -p(Yes/S)log2p(Yes/S)

=> Entropy(S) = ( -(5/14)log2(5/14) ) + ( -(9/14)log2(9/14) )

=> Entropy(S) = ( -0.35714 x -1.48543 ) + ( -0.64286 x -0.63743 )

=> Entropy(S) = 0.53051 + 0.40978 = 0.940

So the entropy of S is 0.940.
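
To double-check the arithmetic above, here is a minimal Python sketch of the entropy formula; the function name entropy and the way the class labels are passed in as a plain list are assumptions of this sketch rather than anything prescribed by ID3.

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy(S) = sum over classes I of -p(I) * log2(p(I))."""
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

# The Fig 1 collection has 5 "No" and 9 "Yes" records.
print(entropy(["No"] * 5 + ["Yes"] * 9))   # ≈ 0.940
```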

Information Gain G(S,A)

Information gain is the measure used to select a particular attribute to become a decision node of the decision tree. Please see here for details about information gain.

Information gain is written as G(S, A)

where S is the collection of data in the data set and A is the attribute for which the information gain will be calculated over the collection S. So Gain(S, Wind) refers to the gain of Wind over S.

Gain(S, A) = Entropy(S) - ∑( ( |Sv|/|S| ) x Entropy(Sv) )

Where,

S is the total collection of the records.

A is the attribute for which gain will be calculated.

v ranges over all the possible values of attribute A; for instance, in this case, if we consider the Wind attribute then the set of values is { Weak, Strong }.

Sv is the subset of S for which attribute A has value v, and |Sv| is its number of elements; for instance |Sweak| = 8 and |Sstrong| = 6.

∑ is the summation of ( ( |Sv|/|S| ) x Entropy(Sv) ) over all the values in the set of v, i.e. ( ( |Sweak|/|S| ) x Entropy(Sweak) ) + ( ( |Sstrong|/|S| ) x Entropy(Sstrong) ).

So, if we want to calculate the information gain of Wind over the collection S using the following formula,

Gain(S, A) = Entropy(S) - ∑( ( |Sv|/|S| ) x Entropy(Sv) )

then it will be as below,

=> Gain(S, Wind) = Entropy(S) - ( ( |Sweak|/|S| ) x Entropy(Sweak) ) - ( ( |Sstrong|/|S| ) x Entropy(Sstrong) )

From the above table, the Wind attribute has two kinds of values (Weak, Strong), so the data set will be divided into the following two subsets:

Outlook    Temperature  Humidity  Wind  Play ball
Sunny      Hot          High      Weak  No
Overcast   Hot          High      Weak  Yes
Rain       Mild         High      Weak  Yes
Rain       Cool         Normal    Weak  Yes
Sunny      Mild         High      Weak  No
Sunny      Cool         Normal    Weak  Yes
Rain       Mild         Normal    Weak  Yes
Overcast   Hot          Normal    Weak  Yes
Total: 8 records

Outlook    Temperature  Humidity  Wind    Play ball
Sunny      Hot          High      Strong  No
Rain       Cool         Normal    Strong  No
Overcast   Cool         Normal    Strong  Yes
Sunny      Mild         Normal    Strong  Yes
Overcast   Mild         High      Strong  Yes
Rain       Mild         High      Strong  No
Total: 6 records
Fig 2: Fig 1 data divided by the two values of Wind, Weak and Strong

So the value set of the Wind attribute is (Weak, Strong), and the element counts of the two subsets are (8, 6).

=> Gain(S, Wind) = 0.940 - ( (8/14) x Entropy(Sweak) ) - ( (6/14) x Entropy(Sstrong) )

Following the entropy calculation procedure shown earlier, we now calculate the entropy of each subset. There are two classifier values, No and Yes: the Weak subset has (No, Yes) counts of (2, 6) and the Strong subset has (No, Yes) counts of (3, 3).

Entropy(Sweak) = ∑ -p(I)log2p(I) = ( - ( (2/8) x log2(2/8) ) ) + ( - ( (6/8) x log2(6/8) ) )

=> Entropy(Sweak) = 0.811

and

Entropy(Sstrong) = ∑ -p(I)log2p(I) = ( - ( (3/6) x log2(3/6) ) ) + ( - ( (3/6) x log2(3/6) ) )

=> Entropy(Sstrong) = 1.000

So Gain(S, Wind) will be as below,

=> Gain(S, Wind) = 0.940 - ( (8/14) x Entropy(Sweak) ) - ( (6/14) x Entropy(Sstrong) )

=> Gain(S, Wind) = 0.940 - ( (8/14) x 0.811 ) - ( (6/14) x 1.000 )

=> Gain(S, Wind) = 0.048
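
The Gain(S, Wind) figure can be verified with a short, self-contained Python sketch. The helper names gain and entropy and the pair-based layout (Wind value, Play ball label) are my own choices; only the Wind and Play ball columns of Fig 1 are needed for this calculation.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(pairs):
    """pairs: list of (attribute value v, class label); returns Gain(S, A)."""
    labels = [label for _, label in pairs]
    g = entropy(labels)
    for v in set(v for v, _ in pairs):
        subset = [label for value, label in pairs if value == v]
        g -= (len(subset) / len(pairs)) * entropy(subset)   # (|Sv|/|S|) x Entropy(Sv)
    return g

# (Wind, Play ball) columns of the Fig 1 records, in order.
wind_pairs = [
    ("Weak", "No"), ("Strong", "No"), ("Weak", "Yes"), ("Weak", "Yes"),
    ("Weak", "Yes"), ("Strong", "No"), ("Strong", "Yes"), ("Weak", "No"),
    ("Weak", "Yes"), ("Weak", "Yes"), ("Strong", "Yes"), ("Strong", "Yes"),
    ("Weak", "Yes"), ("Strong", "No"),
]
print(round(gain(wind_pairs), 3))   # -> 0.048
```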

So the information gain of Wind over S is 0.048. Using the same procedure, it is possible to calculate the information gain for Outlook, Temperature, and Humidity. According to the ID3 algorithm, the attribute with the highest gain is selected as the decision node, which in this case is Outlook. As a result, the data set shown in Fig 1 is divided as below,

Outlook  Temperature  Humidity  Wind    Play ball
Sunny    Hot          High      Weak    No
Sunny    Hot          High      Strong  No
Sunny    Mild         High      Weak    No
Sunny    Cool         Normal    Weak    Yes
Sunny    Mild         Normal    Strong  Yes
Total: 5 records

Outlook   Temperature  Humidity  Wind    Play ball
Overcast  Hot          High      Weak    Yes
Overcast  Cool         Normal    Strong  Yes
Overcast  Mild         High      Strong  Yes
Overcast  Hot          Normal    Weak    Yes
Total: 4 records

Outlook  Temperature  Humidity  Wind    Play ball
Rain     Mild         High      Weak    Yes
Rain     Cool         Normal    Weak    Yes
Rain     Cool         Normal    Strong  No
Rain     Mild         Normal    Weak    Yes
Rain     Mild         High      Strong  No
Total: 5 records
Fig 3: Revised data set divided by the Outlook values Sunny, Overcast, and Rain

Outlook has three different values, Sunny, Overcast, and Rain, which become the branches under the Outlook decision node.

So, using the above three new sets, the information gain will be calculated for Temperature, Humidity, and Wind, and based on the calculated information gain values further nodes will be selected to generate the decision tree nodes. For example, if we want to calculate the information gain of Humidity against the Sunny set, then

Gain(S, A) = Entropy(S) - ∑( ( |Sv|/|S| ) x Entropy(Sv) )

=> Gain(S, A) = ∑ -p(I)log2p(I) - ∑( ( |Sv|/|S| ) x Entropy(Sv) )

=> Gain(Ssunny, Humidity) = ( -(3/5)log2(3/5) - (2/5)log2(2/5) ) - ( (3/5) x Entropy(Shigh) ) - ( (2/5) x Entropy(Snormal) )

where,
3/5 refers to the total number of No / total number of No and Yes in the Sunny set
2/5 refers to the total number of Yes / total number of No and Yes in the Sunny set

=> Gain(Ssunny, Humidity) = (0.4422 + 0.5288) - ( (3/5) x ∑ -p(I)log2p(I) ) - ( (2/5) x ∑ -p(I)log2p(I) )

=> Gain(Ssunny, Humidity) = 0.971 - ( (3/5) x ( -(3/3)log2(3/3) - (0/3)log2(0/3) ) ) - ( (2/5) x ( -(0/2)log2(0/2) - (2/2)log2(2/2) ) )

where,
3/3 refers to the total number of No / total number of No and Yes in the High subset
0/3 refers to the total number of Yes / total number of No and Yes in the High subset
0/2 refers to the total number of No / total number of No and Yes in the Normal subset
2/2 refers to the total number of Yes / total number of No and Yes in the Normal subset

=> Gain(Ssunny, Humidity) = 0.971 - ( (3/5) x 0 ) - ( (2/5) x 0 )

=> Gain(Ssunny, Humidity) = 0.971

So the information gain of Humidity against the Ssunny set is 0.971. In the same way, we can calculate the information gain of Temperature and Wind against Ssunny.
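
As a sanity check, the same sketch from earlier can be run on the five Sunny records of Fig 3 to compare Humidity with Temperature and Wind; again, the helper names and the tuple layout are assumptions of this illustration.

```python
import math
from collections import Counter

def entropy(labels):
    total = len(labels)
    return sum(-(n / total) * math.log2(n / total)
               for n in Counter(labels).values())

def gain(pairs):
    labels = [label for _, label in pairs]
    g = entropy(labels)
    for v in set(v for v, _ in pairs):
        subset = [label for value, label in pairs if value == v]
        g -= (len(subset) / len(pairs)) * entropy(subset)
    return g

# The five Sunny records from Fig 3: (Temperature, Humidity, Wind, Play ball).
sunny = [
    ("Hot",  "High",   "Weak",   "No"),
    ("Hot",  "High",   "Strong", "No"),
    ("Mild", "High",   "Weak",   "No"),
    ("Cool", "Normal", "Weak",   "Yes"),
    ("Mild", "Normal", "Strong", "Yes"),
]
for i, name in enumerate(["Temperature", "Humidity", "Wind"]):
    pairs = [(row[i], row[3]) for row in sunny]
    print(name, round(gain(pairs), 3))
# -> Temperature 0.571, Humidity 0.971, Wind 0.020
```

Under the Sunny branch, Humidity has the largest gain of the three attributes, so it would be the attribute selected next.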


History

Version 1.0

License

This article, along with any associated source code and files, is licensed under The Code Project Open License (CPOL)