By Neha Kanneganti
The concept of decision trees may seem confusing at first, but in reality they can be interpreted in a relatively straightforward way. This blog post will break down the components of decision trees so you can better grasp the concept and how it’s used to solve classification problems.
What are Decision Trees?
In machine learning, decision trees are supervised learning algorithms. In very simple terms, a decision tree lays out the possible outcomes of a problem by representing the decision-making process as a tree, with a new branch formed for each “choice” made. The algorithm makes decisions by splitting up the data based on specific features: to classify a data point, it asks itself a series of questions, often simple yes-or-no questions.
Since all the possible outcomes are split up this way, the diagram looks like a family tree, with the root node analogous to the oldest “ancestor” and each derived node analogous to a “descendant.” This means we can always trace a clear “path” through the questions we answered to arrive at our final classification. Therefore, decision trees make it very clear how we ended up at a certain conclusion.
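To make the “series of yes-or-no questions” idea concrete, here is a minimal sketch of a tiny tree written as nested conditionals. The features (`color`, `kind`) and class labels (`"A"`, `"B"`) are hypothetical, chosen only to illustrate how a prediction is just a path of answers:

```python
# A decision tree's prediction is a chain of yes/no questions.
# The features and labels below are illustrative, not from a real dataset.
def classify(shape):
    """Trace a path of yes/no questions down to a class label."""
    if shape["color"] == "blue":      # root question
        return "A"
    elif shape["kind"] == "circle":   # next question, on the "no" branch
        return "B"
    else:
        return "A"

print(classify({"color": "blue", "kind": "square"}))  # → A
print(classify({"color": "red", "kind": "circle"}))   # → B
```

Reading the function top to bottom recovers the exact path taken for any input, which is why tree predictions are so easy to explain.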
Building a Decision Tree
As an example, let’s try to build a theoretical decision tree model from scratch. In order to build a decision tree, we have to use some form of training data. So for our example, here is our training data:
The training data is divided into two groups, A and B. It contains three different shapes (circle, triangle, and square) in three different colors (red, blue, and yellow). Now, in order to create a decision tree for this training data, we need to find the best split.
When we look at the training data, we first try to find the feature that gives us the best split, then create a new node in the tree based on that split. In this example, the first split we can make is based on whether the shape is blue.
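One common way to score candidate splits is Gini impurity: a split is better when each resulting branch is closer to containing a single class. The sketch below uses a small hypothetical dataset standing in for the shapes and colors in the figure; the exact examples and labels are assumptions for illustration:

```python
# Score a candidate yes/no split by its weighted Gini impurity.
# The dataset here is hypothetical, mirroring the shapes/colors described.
from collections import Counter

def gini(labels):
    """Gini impurity: 1 minus the sum of squared class proportions."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def split_impurity(data, question):
    """Weighted Gini impurity of the two branches created by a question."""
    yes = [label for x, label in data if question(x)]
    no = [label for x, label in data if not question(x)]
    n = len(data)
    return len(yes) / n * gini(yes) + len(no) / n * gini(no)

# Hypothetical examples: (features, group label)
data = [
    ({"color": "blue", "shape": "circle"}, "A"),
    ({"color": "blue", "shape": "square"}, "A"),
    ({"color": "red", "shape": "circle"}, "B"),
    ({"color": "yellow", "shape": "triangle"}, "B"),
]

# In this toy data, "is it blue?" separates the groups perfectly,
# so its weighted impurity is 0.
print(split_impurity(data, lambda x: x["color"] == "blue"))  # → 0.0
```

In practice the algorithm evaluates every candidate question this way and picks the one with the lowest impurity as the next node.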
So then our decision tree should start off like this:
We can continue to make splits and add nodes to the decision tree until most (or all) of the examples are correctly classified. You can do this on your own on paper or on a computer. In the end, it should look something like this:
There are many other ways to go about this; the decision tree above is just one example of a tree that can be constructed from our data.
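In practice you rarely build trees by hand; a library does the splitting for you. Here is a minimal scikit-learn sketch. The feature encoding (`is_blue`, `is_circle`) and labels are hypothetical stand-ins for the shapes and colors in the figures above:

```python
# Fit a decision tree with scikit-learn on a tiny hypothetical dataset.
from sklearn.tree import DecisionTreeClassifier, export_text

# One-hot-style features: [is_blue, is_circle]
X = [[1, 1], [1, 0], [0, 1], [0, 0]]
y = ["A", "A", "B", "B"]

clf = DecisionTreeClassifier(random_state=0).fit(X, y)

# Print the learned tree as text, so the splits are readable.
print(export_text(clf, feature_names=["is_blue", "is_circle"]))
print(clf.predict([[1, 0]]))  # a blue, non-circle example
```

Because this toy data is perfectly separable on `is_blue`, the fitted tree needs only that single split, which matches the hand-built tree's root question.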
Hopefully, you are now familiar with decision trees and how the algorithm works. Decision trees are a simple but powerful classification method and a great tool to have in your machine learning toolkit.