How to Create a Decision Tree Classifier in Python using sklearn

In this article, we show how to create a decision tree classifier in Python using the sklearn module.

A decision tree classifier is a machine learning tool that predicts what is likely to happen based on a given data set.

It is a form of supervised learning, in which a computer program predicts what will happen based on labeled examples of past occurrences.

For example, if it's raining outside and the rain has caused children in the past to not play outside, then if we know that it's raining, the likely result is that children are not playing outside. If it is sunny outside and children normally play while it is sunny, we can predict that children are playing outside.
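To make the idea concrete, the rain and sun example can be written by hand as a tiny set of if/else rules. This is only an illustrative sketch: the function name and the weather strings are hypothetical, and a real decision tree learns rules like these from data rather than having them typed in.

```python
def will_children_play(weather: str) -> bool:
    """Hand-written rule mirroring the example above (hypothetical helper)."""
    if weather == "raining":
        # In the past, rain kept the children inside.
        return False
    if weather == "sunny":
        # In the past, children played when it was sunny.
        return True
    raise ValueError(f"No rule for weather: {weather!r}")


print(will_children_play("raining"))
print(will_children_play("sunny"))
```

A decision tree classifier builds a similar chain of questions automatically, choosing the questions that best separate the outcomes in the training data.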

Using a training set of data, a machine learning program can predict fairly accurately what will occur under given circumstances.

Below, we use a decision tree classifier to classify outcomes.

In our scenario, we want to decide whether kids are likely to play outside given the weather conditions: the temperature, the humidity, and whether or not it is windy.

We put our data in a CSV file. This file can be found at the following link: Play.csv
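Since the contents of the CSV file are not reproduced here, the following is a rough sketch of what such a file might look like. The column names Temperature, Humidity, and Windy and all of the values are assumptions made for illustration; only the 'Played' column name is taken from the code in this article.

```python
import io

import pandas as pd

# Hypothetical stand-in for the CSV file; the real file may use different
# column names, encodings, and values.
csv_text = """Temperature,Humidity,Windy,Played
85,85,0,0
80,90,1,0
83,78,0,1
70,96,0,1
68,80,0,1
65,70,1,0
64,65,1,1
72,95,0,0
69,70,0,1
75,80,1,1
"""

# io.StringIO lets read_csv parse the text as if it were a file on disk.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())
print(df.shape)
```

Here 1 stands for "yes" and 0 for "no" in the Windy and Played columns, which keeps every column numeric so the classifier can use it directly.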

Below is the Python code that uses a decision tree to classify whether the children are likely to play, given the temperature, the humidity, and whether it is windy.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, confusion_matrix

# Load the data set
df = pd.read_csv('Play.csv')

# Separate the features (X) from the outcome column (y)
X = df.drop(columns=['Played'])
y = df['Played']

# Hold out 30% of the rows for testing
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3)

# Train the decision tree on the training data
dtree = DecisionTreeClassifier()
dtree.fit(X_train, y_train)

# Predict the outcomes for the test data
predictions = dtree.predict(X_test)

# Report how well the predictions match the actual outcomes
print(confusion_matrix(y_test, predictions))
print('\n')
print(classification_report(y_test, predictions))

The first thing we do is import our modules: pandas, numpy, matplotlib, seaborn, and the parts of sklearn we need. Only pandas and sklearn are actually used in this short example.

We create a variable, df, and set it equal to pd.read_csv('Play.csv'), which reads the contents of the "Play.csv" file into a DataFrame.

We create a variable, X, which contains all columns of the DataFrame except the 'Played' column, which represents the outcome: whether the children went out to play.

We then create a variable, y, which represents the column of whether the children played or not.
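The split into features and outcome can be checked on a small DataFrame. This is a sketch on made-up data: the feature column names and values are assumptions, and only 'Played' matches the column used in this article.

```python
import pandas as pd

# Hypothetical data; only the 'Played' column name comes from the article.
df = pd.DataFrame({
    "Temperature": [85, 80, 83, 70],
    "Humidity": [85, 90, 78, 96],
    "Windy": [0, 1, 0, 0],
    "Played": [0, 0, 1, 1],
})

X = df.drop(columns=["Played"])  # every column except the outcome
y = df["Played"]                 # the outcome column only

print(X.columns.tolist())
print(y.tolist())
```

Note that drop() returns a new DataFrame and leaves df itself unchanged, so the 'Played' column is still available when we build y.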

The line X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3) gives us X training data, X testing data, y training data, and y testing data. The train_test_split() function randomly assigns 30% of the rows to the testing set and the remaining 70% to the training set, so the model can be evaluated on data it did not see during training.
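Because the split is random, each run can produce different training and testing sets. One common option, which the code in this article does not use, is to pass random_state so the split is repeatable. The sketch below uses made-up data and an arbitrary seed of 42.

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical data with 10 rows; column names other than 'Played' are assumed.
df = pd.DataFrame({
    "Temperature": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75],
    "Humidity": [85, 90, 78, 96, 80, 70, 65, 95, 70, 80],
    "Windy": [0, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    "Played": [0, 0, 1, 1, 1, 0, 1, 0, 1, 1],
})
X = df.drop(columns=["Played"])
y = df["Played"]

# random_state fixes the shuffle so the same rows land in each set every run.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)
X_train_again, X_test_again, _, _ = train_test_split(
    X, y, test_size=0.3, random_state=42
)

print(len(X_train), len(X_test))
print(X_test.index.equals(X_test_again.index))
```

With 10 rows and test_size=0.3, 3 rows go to the testing set and 7 to the training set.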

We then create a variable, dtree, and set it equal to DecisionTreeClassifier().

We then train the model using the fit() function. We feed it the training data.
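After fitting, it can be helpful to look at the rules the tree has learned. The sketch below uses sklearn's export_text() function for this, on made-up data; the feature names and values are assumptions, and the rules printed for the real data set will differ.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Hypothetical training data; only 'Played' is a name taken from the article.
df = pd.DataFrame({
    "Temperature": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75],
    "Humidity": [85, 90, 78, 96, 80, 70, 65, 95, 70, 80],
    "Windy": [0, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    "Played": [0, 0, 1, 1, 1, 0, 1, 0, 1, 1],
})
X = df.drop(columns=["Played"])
y = df["Played"]

dtree = DecisionTreeClassifier(random_state=0)
dtree.fit(X, y)

# export_text renders the learned splits as indented if/else style rules.
rules = export_text(dtree, feature_names=list(X.columns))
print(rules)
print("Tree depth:", dtree.get_depth())
```

Each line of the printed rules is one question about a feature, such as whether a value is above or below a threshold, ending in the class the tree predicts.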

We then create a variable, predictions, which holds the model's predicted outcomes for the test data.
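The same predict() function can also be used on new conditions that were not in the data set at all. The following is a minimal sketch on made-up data; the feature names, the training values, and the new day's weather are all assumptions.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical training data; only 'Played' is a name taken from the article.
df = pd.DataFrame({
    "Temperature": [85, 80, 83, 70, 68, 65, 64, 72, 69, 75],
    "Humidity": [85, 90, 78, 96, 80, 70, 65, 95, 70, 80],
    "Windy": [0, 1, 0, 0, 0, 1, 1, 0, 0, 1],
    "Played": [0, 0, 1, 1, 1, 0, 1, 0, 1, 1],
})
X = df.drop(columns=["Played"])
y = df["Played"]

dtree = DecisionTreeClassifier(random_state=0)
dtree.fit(X, y)

# A new day, with the same columns in the same order as the training data.
new_day = pd.DataFrame({"Temperature": [70], "Humidity": [75], "Windy": [0]})
prediction = dtree.predict(new_day)

print("Predicted 'Played' value:", prediction[0])
```

predict() always returns one value per row, so a single-row DataFrame gives an array with a single prediction.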

We then want to see the metrics of how well the model predicted data from the test set.

We use confusion_matrix() and classification_report() to show how well the model predicted outcomes.

The results are shown below.

[[3 0]
 [0 4]]


              precision    recall  f1-score   support

           0       1.00      1.00      1.00         3
           1       1.00      1.00      1.00         4

    accuracy                           1.00         7
   macro avg       1.00      1.00      1.00         7
weighted avg       1.00      1.00      1.00         7

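The confusion matrix tells us the number of true negatives, false positives, false negatives, and true positives. In sklearn, the rows are the actual classes and the columns are the predicted classes, so for two classes the matrix reads [[TN FP] [FN TP]]. In this case, there were no false negatives or false positives. There were only 4 true positives and 3 true negatives.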
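The four counts can be pulled out of the matrix by flattening it with ravel(). In the sketch below, the two label lists are constructed by hand to reproduce the matrix from this article; they are not the actual rows of the test set.

```python
from sklearn.metrics import confusion_matrix

# Hand-built labels that reproduce the article's matrix: 3 actual 0s and
# 4 actual 1s, all predicted correctly. These are not the real test rows.
y_test = [0, 0, 0, 1, 1, 1, 1]
predictions = [0, 0, 0, 1, 1, 1, 1]

matrix = confusion_matrix(y_test, predictions)
print(matrix)

# For two classes, ravel() flattens [[TN, FP], [FN, TP]] row by row.
tn, fp, fn, tp = matrix.ravel()
print("TN:", tn, "FP:", fp, "FN:", fn, "TP:", tp)
```

Naming the four counts this way avoids having to remember which cell of the matrix is which.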

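The classification report shows a precision, recall, and f1-score of 1.00 for both classes, meaning all 7 test samples were classified correctly. With a test set this small, a perfect score is only weak evidence of how the model will perform on new data.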
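To see what precision and recall measure, it helps to look at a case where the model makes mistakes. The sketch below uses hand-built, hypothetical predictions with one false positive and one false negative, and compares the formulas computed by hand with sklearn's own functions.

```python
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    precision_score,
    recall_score,
)

# Hypothetical labels: one actual 0 is predicted as 1 (false positive)
# and one actual 1 is predicted as 0 (false negative).
y_true = [0, 0, 0, 1, 1, 1, 1]
y_pred = [0, 0, 1, 1, 1, 1, 0]

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

# Precision: of everything predicted as 1, how much really was 1.
manual_precision = tp / (tp + fp)
# Recall: of everything that really was 1, how much was predicted as 1.
manual_recall = tp / (tp + fn)
# Accuracy: the share of all predictions that were correct.
manual_accuracy = (tp + tn) / (tp + tn + fp + fn)

print(manual_precision, precision_score(y_true, y_pred))
print(manual_recall, recall_score(y_true, y_pred))
print(manual_accuracy, accuracy_score(y_true, y_pred))
```

The values computed by hand match those from sklearn, which by default reports precision and recall for class 1.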

A decision tree classifier can be used to help predict outcomes based on given conditions.

In general, the more representative training data you feed into the machine learning model, the better it can learn to predict test data accurately. This is a tendency rather than a guarantee, but it is a good reason to give the model a good amount of training data.
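One way to explore the effect of training set size is to train the same kind of model on increasingly large slices of the training data and score each one on the same test set. The sketch below does this on synthetic data from sklearn's make_classification() function, not on the weather data; the sizes 20 and 200 and the seed are arbitrary choices.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic two-class data with 3 features, standing in for a larger data set.
X, y = make_classification(
    n_samples=2000,
    n_features=3,
    n_informative=3,
    n_redundant=0,
    random_state=0,
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0
)

scores = {}
for n in (20, 200, len(X_train)):
    # Train on only the first n training rows, then score on the full test set.
    dtree = DecisionTreeClassifier(random_state=0)
    dtree.fit(X_train[:n], y_train[:n])
    scores[n] = dtree.score(X_test, y_test)
    print(n, "training rows -> test accuracy", round(scores[n], 3))
```

The exact numbers depend on the data, and the improvement is not always steady, so treat the output as an illustration rather than a rule.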

This is how to create a decision tree classifier in Python using the sklearn module.
