C4.5 Decision Tree Implementation

It’s time to go deeper in decision tree induction. In this post, I’ll give summary on real-world implementation (i.e. the implementation has been used in actual data mining scenario) called C4.5.

C4.5

C4.5 is collection of algorithms for performing classifications in machine learning and data mining. It develops the classification model as a decision tree. C4.5 consists of three groups of algorithm: C4.5, C4.5-no-pruning and C4.5-rules. In this summary, we will focus on the basic C4.5 algorithm

Algorithm

In a nutshell, C4.5 is implemented recursively with this following sequence

  1.     Check if algorithm satisfies termination criteria
  2.     Computer information-theoretic criteria for all attributes
  3.     Choose best attribute according to the information-theoretic criteria
  4.     Create a decision node based on the best attribute in step 3
  5.     Induce (i.e. split) the dataset based on newly created decision node in step 4
  6.     For all sub-dataset in step 5, call C4.5 algorithm to get a sub-tree (recursive call)
  7.     Attach the tree obtained in step 6 to the decision node in step 4
  8.     Return tree

Continue reading C4.5 Decision Tree Implementation