Padhai Time

Hands-On Practice for Precision and Recall

Let us take the same Titanic dataset and build a machine learning model. To evaluate the model, we will use all the metrics we have discussed so far.

If you have worked on a machine learning model before, you will be familiar with the flow. It consists of the following steps, in sequence:

1) Importing Libraries

2) Reading Data

3) Preprocessing Data

4) Training the Model

5) Making Predictions

6) Evaluating the Model Output

## Import Libraries

import pandas as pd

import numpy as np

from sklearn.model_selection import train_test_split

from sklearn.ensemble import RandomForestClassifier

from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score, fbeta_score

## Import Data

data = pd.read_csv("titanic-data.csv")

data = data[['Fare', 'Age', 'Sex', 'Survived']]

print("Shape of the data (rows, columns):", data.shape)

data.head()

## Preprocessing

def replace_string(value):  # encode Sex as a number: male -> 0, female -> 1 (any other value gives None)

    if value == "male":

        return 0

    if value == "female":

        return 1

data['Sex'] = data['Sex'].apply(replace_string)

data['Age'] = data['Age'].fillna(data['Age'].mean())  # fill missing ages with the mean age

data['Age'].describe()

data.head()
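To make the two preprocessing steps concrete, here is a minimal plain-Python sketch of what they do to a column: `fillna(mean)` replaces each missing age with the mean of the ages that are present, and `replace_string` turns the text labels of `Sex` into numbers. The helper names `impute_mean` and `encode_sex` and the toy values are hypothetical, chosen only for illustration; in the notebook, pandas does this work.

```python
from typing import List, Optional


def impute_mean(values: List[Optional[float]]) -> List[float]:
    """Replace None entries with the mean of the non-missing entries,
    mirroring what data['Age'].fillna(data['Age'].mean()) does."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)  # missing values are skipped, as pandas' mean() does by default
    return [mean if v is None else v for v in values]


def encode_sex(values: List[str]) -> List[Optional[int]]:
    """Map 'male' -> 0 and 'female' -> 1, like replace_string;
    any other value becomes None, just as replace_string returns None."""
    mapping = {"male": 0, "female": 1}
    return [mapping.get(v) for v in values]


# Hypothetical toy column values
ages = [22.0, None, 38.0, 26.0, None, 34.0]
sexes = ["male", "female", "female", "male", "male"]

print(impute_mean(ages))  # [22.0, 30.0, 38.0, 26.0, 30.0, 34.0]
print(encode_sex(sexes))  # [0, 1, 1, 0, 0]
```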

## Model Training

X = data.drop(['Survived'], axis=1)

y = data['Survived']

X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42, test_size=0.3)  # 70/30 split, keeping the survival ratio (almost) the same in both parts

rc = RandomForestClassifier()  # no random_state here, so the scores below can vary slightly between runs

rc.fit(X_train, y_train)
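The `stratify=y` argument asks `train_test_split` to keep the share of survivors roughly the same in the training and test sets, which matters here because precision and recall are both computed on the positive (survived) class. Below is a simplified, hypothetical stratified split in plain Python that shows the idea; the names `stratified_split` and `positive_rate` are assumptions for illustration, and scikit-learn allocates rows with its own rounding rules, so it will not pick the same rows.

```python
import random
from collections import defaultdict
from typing import List, Sequence, Tuple


def stratified_split(labels: Sequence[int], test_size: float, seed: int = 42) -> Tuple[List[int], List[int]]:
    """Return (train_indices, test_indices) so that each class puts roughly
    test_size of its rows into the test set. Simplified sketch only."""
    by_class = defaultdict(list)
    for i, label in enumerate(labels):
        by_class[label].append(i)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    train, test = [], []
    for label in sorted(by_class):
        idx = by_class[label][:]
        rng.shuffle(idx)
        n_test = round(len(idx) * test_size)  # this class's share of the test set
        test.extend(idx[:n_test])
        train.extend(idx[n_test:])
    return sorted(train), sorted(test)


def positive_rate(labels: Sequence[int], indices: List[int]) -> float:
    """Fraction of the selected rows whose label is 1."""
    return sum(labels[i] for i in indices) / len(indices)


# Hypothetical labels: 40 positives and 60 negatives
y = [1, 0, 0, 1, 0, 1, 0, 0, 1, 0] * 10
train_idx, test_idx = stratified_split(y, test_size=0.3)
print(len(train_idx), len(test_idx))                            # 70 30
print(positive_rate(y, train_idx), positive_rate(y, test_idx))  # 0.4 0.4
```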

## Model Predictions

predictions = rc.predict(X_test)

y_test = y_test.reset_index(drop=True)  # renumber rows 0..n-1 so they line up with the predictions array

result = pd.DataFrame()

result['Actual_Target'] = y_test

result['Predicted_Target'] = predictions

result.head()

## Confusion Matrix

pd.crosstab(result['Predicted_Target'], result['Actual_Target'])  # rows: predicted label, columns: actual label
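`pd.crosstab` counts how often each (predicted, actual) pair occurs. With the predicted labels passed first, the rows are predictions and the columns are true labels, so the top-left cell is TN, top-right FN, bottom-left FP and bottom-right TP. A minimal plain-Python sketch of the same count, using a hypothetical helper `confusion_counts` and made-up labels, makes the four cells explicit:

```python
from collections import Counter
from typing import Dict, Sequence


def confusion_counts(actual: Sequence[int], predicted: Sequence[int]) -> Dict[str, int]:
    """Count the four cells of a binary confusion matrix (positive class = 1)."""
    pairs = Counter(zip(predicted, actual))  # key is (predicted, actual), like the crosstab rows/columns
    return {
        "tp": pairs[(1, 1)],  # predicted survived, actually survived
        "fp": pairs[(1, 0)],  # predicted survived, actually did not
        "fn": pairs[(0, 1)],  # predicted did not survive, actually survived
        "tn": pairs[(0, 0)],  # predicted did not survive, actually did not
    }


# Hypothetical labels for eight passengers
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 1, 0, 1, 0]
print(confusion_counts(actual, predicted))  # {'tp': 3, 'fp': 1, 'fn': 1, 'tn': 3}
```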

## Calculating Metrics - Manually

tp = result[(result['Actual_Target'] == 1) & (result['Predicted_Target'] == 1)].shape[0]

fn = result[(result['Actual_Target'] == 1) & (result['Predicted_Target'] == 0)].shape[0]

tn = result[(result['Actual_Target'] == 0) & (result['Predicted_Target'] == 0)].shape[0]

fp = result[(result['Actual_Target'] == 0) & (result['Predicted_Target'] == 1)].shape[0]

Accuracy = (tp + tn) / (tp + fn + tn + fp)

Precision = (tp) / (tp + fp)

Recall = (tp) / (tp + fn)

F1Score = 2*Precision*Recall/(Precision+Recall)

Precision, Recall, Accuracy, F1Score

Output:

(0.6831683168316832, 0.6699029126213593, 0.753731343283582, 0.6764705882352942)
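The four formulas above need only the four counts, so they can be checked with plain arithmetic. The counts used below (tp=69, fp=32, fn=34, tn=133) are not printed anywhere in this post. They are simply one set of counts that reproduces the output above exactly (for example 69/101 ≈ 0.6832 and 69/103 ≈ 0.6699), so treat them as a worked example rather than the actual cells of your confusion matrix. The helper name `classification_metrics` is hypothetical.

```python
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    """Accuracy, precision, recall and F1 from confusion-matrix counts.
    Assumes tp + fp > 0 and tp + fn > 0 (otherwise precision/recall divide by zero)."""
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)  # of the passengers predicted to survive, how many did
    recall = tp / (tp + fn)     # of the passengers who survived, how many the model found
    f1 = 2 * precision * recall / (precision + recall)
    return {"precision": precision, "recall": recall, "accuracy": accuracy, "f1": f1}


m = classification_metrics(tp=69, fp=32, fn=34, tn=133)
for name, value in m.items():
    print(f"{name}: {value:.4f}")
# precision: 0.6832
# recall: 0.6699
# accuracy: 0.7537
# f1: 0.6765
```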

## Calculating Metrics - Using Library

print("Precision:", precision_score(y_test, predictions))

print("Recall:", recall_score(y_test, predictions))

print("Accuracy:", accuracy_score(y_test, predictions))

print("F1 Score:", f1_score(y_test, predictions))

print("FBeta Score (beta=1, same as F1):", fbeta_score(y_test, predictions, beta=1))

print("FBeta Score (beta=0.5, favours precision):", fbeta_score(y_test, predictions, beta=0.5))

print("FBeta Score (beta=2, favours recall):", fbeta_score(y_test, predictions, beta=2))

Output:

Precision: 0.6831683168316832

Recall: 0.6699029126213593

Accuracy: 0.753731343283582

F1 Score: 0.6764705882352942

FBeta Score (beta=1, same as F1): 0.6764705882352942

FBeta Score (beta=0.5, favours precision): 0.6804733727810651

FBeta Score (beta=2, favours recall): 0.672514619883041
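F-beta generalises F1 by weighting recall β times as much as precision: F_β = (1 + β²) · P · R / (β² · P + R). With β = 1 it is exactly F1; β < 1 pulls the score towards precision and β > 1 pulls it towards recall. Here precision (0.683) is slightly higher than recall (0.670), which is why the β = 0.5 score sits above F1 and the β = 2 score sits below it. The sketch below applies the formula directly to the precision and recall printed above; the helper name `f_beta` is hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float) -> float:
    """Weighted harmonic mean of precision and recall; beta > 1 favours recall."""
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)


precision = 0.6831683168316832  # precision printed above
recall = 0.6699029126213593     # recall printed above

for beta in (0.5, 1, 2):
    print(f"beta={beta}: {f_beta(precision, recall, beta):.4f}")
# beta=0.5: 0.6805
# beta=1: 0.6765
# beta=2: 0.6725
```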

If you want to download the Jupyter notebook file for the above code, click here.

Bengaluru, India

contact.padhaitime@gmail.com