It’s time to go deeper into decision tree induction. In this post, I’ll summarize a real-world implementation (i.e. one that has been used in actual data mining scenarios) called C4.5.

**C4.5**

C4.5 is a collection of algorithms for performing classification in machine learning and data mining. It builds the classification model as a decision tree. C4.5 consists of three groups of algorithms: C4.5, C4.5-no-pruning and C4.5-rules. In this summary, we will focus on the basic C4.5 algorithm.

#### Algorithm

In a nutshell, C4.5 is implemented recursively with the following sequence:

1. Check whether the termination criteria are satisfied
2. Compute the information-theoretic criteria for all attributes
3. Choose the best attribute according to the information-theoretic criteria
4. Create a decision node based on the best attribute from step 3
5. Induce (i.e. split) the dataset based on the newly created decision node from step 4
6. For each sub-dataset from step 5, call the C4.5 algorithm to get a sub-tree (recursive call)
7. Attach each sub-tree obtained in step 6 to the decision node from step 4
8. Return the tree
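
To make the recursion concrete, here is a minimal Python sketch of the sequence above. It is an illustration under simplifying assumptions, not the actual C4.5 code: it handles only categorical attributes, uses gain ratio as the information-theoretic criterion, and leaves out pruning, continuous attributes and missing values. The names `entropy`, `partition`, `gain_ratio` and `build_tree`, the dict-based tree layout and the toy dataset are all hypothetical choices made for this example.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of values."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())


def partition(rows, attr):
    """Group rows (dicts) by their value for attribute `attr`."""
    groups = {}
    for row in rows:
        groups.setdefault(row[attr], []).append(row)
    return groups


def gain_ratio(rows, attr, target):
    """Information gain of splitting on `attr`, divided by the split information."""
    total = len(rows)
    groups = partition(rows, attr)
    remainder = sum(
        len(g) / total * entropy([r[target] for r in g]) for g in groups.values()
    )
    gain = entropy([r[target] for r in rows]) - remainder
    split_info = entropy([r[attr] for r in rows])
    return gain / split_info if split_info > 0 else 0.0


def build_tree(rows, attributes, target):
    """Return a class label (leaf) or a dict-based decision node."""
    labels = [r[target] for r in rows]
    majority = Counter(labels).most_common(1)[0][0]
    # Step 1: stop when the node is pure or no attributes are left.
    if len(set(labels)) == 1 or not attributes:
        return majority
    # Steps 2-3: score every attribute and pick the best one.
    scores = {a: gain_ratio(rows, a, target) for a in attributes}
    best = max(attributes, key=scores.get)
    if scores[best] <= 0:  # assumed extra stop rule: no attribute is informative
        return majority
    # Step 4: create the decision node.
    node = {"attribute": best, "children": {}}
    remaining = [a for a in attributes if a != best]
    # Step 5: split the dataset on the chosen attribute.
    for value, subset in partition(rows, best).items():
        # Steps 6-7: recurse on each sub-dataset and attach the sub-tree.
        node["children"][value] = build_tree(subset, remaining, target)
    # Step 8: return the tree.
    return node


# Toy dataset (made up for this example).
rows = [
    {"outlook": "sunny", "windy": "no", "play": "yes"},
    {"outlook": "sunny", "windy": "yes", "play": "yes"},
    {"outlook": "rainy", "windy": "no", "play": "no"},
    {"outlook": "rainy", "windy": "yes", "play": "no"},
]
tree = build_tree(rows, ["outlook", "windy"], "play")
print(tree)
```

On this toy dataset, `outlook` separates the classes perfectly while `windy` carries no information, so the sketch splits once on `outlook` and returns two leaves.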