Introduction
The algorithm ID3 (Quinlan) uses the method top-down induction of decision trees. Given a set of classified examples a decision tree is induced, biased by the information gain measure, which heuristically leads to small trees. The examples are given in attribute-value representation. The set of possible classes is finite. Only tests, that split the set of instances of the underlying example languages depending on the value of a single attribute are supported.
Details
Depending on whether the attributes are nominal or numerical, the tests either
- have a successor for each possible attribute value, or
- split according to a comparison of an attribute value to a constant, or depending on if an attribute value belongs to a certain interval or not.
The algorithm starts with the complete set of examples, a set of possible tests and the root node as the actual node. As long as the examples propagated to a node do not all belong to the same class and there are tests left,
- a test with highest information gain is chosen,
- the corresponding set of successors is created for the actual node,
- each example is propagated to the successor given by the chosen test,
- ID3 is called recursively for all successors.
How it works
The core of sample is builded with 3 classes (Attribute
, TreeNode
and DecisionTreeID3
).
TreeNode
- are the nodes of the decision tree;
Attribute
- is the class with have a name e any possible values;
DecisionTreeID3
- is the class what get a data samples and generate a decision tree.