

) Misclassified when cutting at 100kg: 18 In the example above, impurity will include the percentage of people that weight >=100 kg that are not obese and the percentage of people with weight=100) & (data=0),:].shape, "\n",ĭata.loc>=80) & (data=0),:].shape Impurity refers to the fact that, when we make a cut, how likely is it that the target variable will be classified incorrectly. In the case of decision trees, there are two main cost functions: the Gini index and entropy.Īny of the cost functions we can use are based on measuring impurity. Impurity and cost functions of a decision treeĪs in all algorithms, the cost function is the basis of the algorithm. How does the algorithm decide which variable to use as the first cutoff? How do you choose the values? Let’s see it little by little programming our own decision tree from scratch in Python. Now you know the bases of this algorithm, but surely you have doubts.
#DECISION TREE VISUALIZATION PYTHON CODE#
In fact, we will code a decision tree from scratch that can do both. Let’s see a graphic example:īesides,a decision trees can work for both regression problems and for classification problems. This last node is known as a leaf node or leaf node. This is so until we get to a node that does not split. el sitio web de Betcris PerúĪs you can see, decision trees usually have sub-trees that serve to fine-tune the prediction of the previous node. Thus, the decision tree continues to create more branches that generate new conditions to “refine” our predictions.

However, that cut will not be precise: there will be people who weigh 100kg or more who are not obese. In that case, a decision tree would tell us different rules, such as that if the person’s weight is greater than 100kg, it is most likely that the person is obese. Based on the description of the dataset (available on Kaggle), people with an index of 4 or 5 are obese, so we could create a variable that reflects this: data = (data.Index >= 4).astype('int')ĭata.drop('Index', axis = 1, inplace = True) Imagine that we want to predict whether or not the person is obese. To do this, we will use the following dataset. For example, let’s say we train an algorithm that predicts whether or not a person is obese based on their height and weight. Does it sound interesting? Let’s get to it! Understanding how a decision tree worksĪ decision tree consists of creating different rules by which we make the prediction. To do this, we are going to create our own decision tree in Python from scratch. However, do you know how it works? In this post I am going to explain everything that you need about decision trees. The decision tree is one of the most widely used machine learning algorithms due to its ease of interpretation.
