Classification using Decision Trees

Decision tree based classification uses a decision tree (as a predictive model) to go from observations about an item (represented in the branches) to conclusions about the item’s target value (represented in the leaves).

Overall Process of Building a Decision Tree (Source: Swetha, 2017)

Classification problems suited to decision trees include:

  • Classifying male/female based on the user's first name
  • Classifying rainy or cloudy based on the time of day
  • Classifying spam/ham based on the message received

I will consider the first example: classifying male/female based on the user's first name.

If we were to do this task manually, we would end up with a decision tree in which patterns such as vowel endings are used to classify a name as either male or female.
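To make the manual approach concrete, here is a minimal sketch of such a hand-written rule. The function name `classify_by_hand` and the single vowel-ending rule are assumptions made for illustration; a real hand-built tree would need more branches, and this crude rule will misclassify many names.

```python
VOWELS = set("aeiou")

def classify_by_hand(first_name: str) -> str:
    """Toy hand-written rule: a name ending in a vowel is labelled
    'female', anything else 'male'. Illustrative only."""
    last_letter = first_name.strip().lower()[-1]
    if last_letter in VOWELS:
        return "female"
    return "male"

for name in ["Anna", "John", "Maria", "Peter"]:
    print(name, "->", classify_by_hand(name))
```

The point of the sketch is that the rule is chosen by a person. A decision tree model instead has to discover a rule like this from labelled examples.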

Decision tree models work similarly. The model finds patterns in the training data set and builds its own splitting rules, which it then uses for classification.

The Gini index is the cost function used when constructing a decision tree. It scores candidate splitting rules on the training dataset so that the best rule can be chosen.

Summarized Gini Index (Source: Siraj Raval)

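A Gini score indicates how good a split is by measuring how mixed the classes are in the two groups the split creates. A perfect separation results in a Gini score of 0, whereas the worst-case split, one that leaves a 50/50 class mix in each group, results in a score of 0.5 (for two classes). We calculate the score for each candidate split (every row's value can serve as a split point) and split the data on the candidate with the lowest score. We then repeat this process recursively on each resulting group to grow the binary tree.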
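The score described above can be sketched in a few lines. This assumes the common definition of Gini impurity for one group, 1 minus the sum of squared class proportions, with the score of a split taken as the size-weighted average over its two groups. The function names are chosen for illustration.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of one group: 1 - sum(p_k^2) over the classes k."""
    n = len(labels)
    if n == 0:
        return 0.0  # an empty group contributes nothing
    counts = Counter(labels)
    return 1.0 - sum((c / n) ** 2 for c in counts.values())

def gini_of_split(left, right):
    """Size-weighted average impurity of the two groups made by a split."""
    total = len(left) + len(right)
    return (len(left) / total) * gini_impurity(left) \
         + (len(right) / total) * gini_impurity(right)

# Perfect separation: each group holds a single class.
perfect = gini_of_split(["male", "male"], ["female", "female"])
# Worst case for two classes: each group is a 50/50 mix.
worst = gini_of_split(["male", "female"], ["male", "female"])
print(perfect, worst)  # 0.0 0.5
```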

Example use of Gini to split dataset
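As a rough illustration of how the Gini score picks a split, the sketch below scores two candidate rules on a tiny made-up list of names and keeps the one with the lowest weighted Gini. The data, the two candidate rules, and the helper names are all hypothetical. A real implementation would generate candidate splits from the feature values themselves and recurse into each resulting group.

```python
from collections import Counter

def gini_impurity(labels):
    """1 - sum(p_k^2) over the classes present in the group."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def weighted_gini(groups):
    """Size-weighted average impurity over the groups of a split."""
    total = sum(len(g) for g in groups)
    return sum(len(g) / total * gini_impurity(g) for g in groups)

# Hypothetical toy training data: (first name, label).
rows = [
    ("anna", "female"), ("maria", "female"), ("julia", "female"),
    ("john", "male"), ("peter", "male"), ("luca", "male"),
]

# Hypothetical candidate splitting rules (name -> True/False).
candidate_rules = {
    "ends_with_vowel": lambda name: name[-1] in "aeiou",
    "longer_than_4": lambda name: len(name) > 4,
}

def best_rule(rows, rules):
    """Return (rule name, score) for the rule with the lowest weighted Gini."""
    best_name, best_score = None, float("inf")
    for rule_name, rule in rules.items():
        left = [label for name, label in rows if rule(name)]
        right = [label for name, label in rows if not rule(name)]
        score = weighted_gini([left, right])
        if score < best_score:
            best_name, best_score = rule_name, score
    return best_name, best_score

name, score = best_rule(rows, candidate_rules)
print(name, round(score, 3))  # ends_with_vowel 0.25
```

On this toy data the vowel-ending rule wins because it leaves only one misplaced name ("luca"), while the length rule leaves both groups mixed.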

One problem with decision trees is that they can overfit or underfit. In the overfitting case, the decision tree “memorizes” the training set instead of learning patterns that generalize to new data.

Ensemble learning techniques, such as random forests and gradient boosted trees, can be used to reduce these effects.

Random Forest

A random forest constructs multiple decision trees and takes a majority vote among them when classifying.
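The voting step can be sketched as follows. The three "trees" here are hypothetical hand-written rules standing in for trained decision trees. In an actual random forest each tree would be trained on a different random sample of the data, which is what makes their votes differ.

```python
from collections import Counter

def majority_vote(predictions):
    """Return the label predicted by the most trees."""
    return Counter(predictions).most_common(1)[0][0]

# Hypothetical stand-ins for three trained trees (name -> label).
def tree_a(name):
    return "female" if name[-1] in "aeiou" else "male"

def tree_b(name):
    return "female" if name.endswith("a") else "male"

def tree_c(name):
    return "female" if len(name) > 5 else "male"

def forest_predict(trees, name):
    """Ask every tree for a label, then take the majority."""
    return majority_vote([tree(name) for tree in trees])

trees = [tree_a, tree_b, tree_c]
for name in ["maria", "peter", "george"]:
    print(name, "->", forest_predict(trees, name))
```

Using an odd number of trees avoids ties when there are only two classes. A single tree's mistake (for example `tree_c` on "maria") is outvoted by the others.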

Random Forest Overview (Source: Swetha, 2017)


Senior Software Engineer at 99x | Freelance developer | Interested in History and Philosophy
