Let us take the same Titanic data set and build a machine learning model. To evaluate it, we will use all the metrics we have discussed so far.
If you have worked on a machine learning model before, you will be familiar with the flow. It consists of the following steps, in sequence:
1) Import Library
2) Reading Data
3) Preprocessing Data
4) Training the Model
5) Model Predictions
6) Evaluating the Model output
## Import Libraries
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score, accuracy_score, f1_score, fbeta_score
## Import Data
data = pd.read_csv("titanic-data.csv")
data = data[['Fare', 'Age', 'Sex', 'Survived']]
print("No. of records are:", data.shape)
data.head()
## Preprocessing
def replace_string(value):
    if value == "male":
        return 0
    if value == "female":
        return 1
data['Sex'] = data['Sex'].apply(replace_string)
data['Age'] = data['Age'].fillna(data['Age'].mean())
data['Age'].describe()
data.head()
## Model Training
X = data.drop(['Survived'], axis = 1)
y = data['Survived']
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42, test_size=0.3)
rc = RandomForestClassifier()  # no random_state is set, so scores can vary slightly between runs
rc.fit(X_train, y_train)
## Model Predictions
predictions = rc.predict(X_test)
y_test = y_test.reset_index(drop = True)
result = pd.DataFrame()
result['Actual_Target'] = y_test
result['Predicted_Target'] = predictions
result.head()
## Confusion Matrix
pd.crosstab(result['Predicted_Target'], result['Actual_Target'])
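Note that the crosstab above puts the predicted labels on the rows and the actual labels on the columns, so it is easy to mix up false positives and false negatives when reading it. As a minimal sketch of what the four cells mean, the counts can also be tallied in plain Python. The function name `confusion_counts` and the toy label lists are made up for illustration and are not part of the original notebook.

```python
from collections import Counter

def confusion_counts(actual, predicted):
    """Return (tp, fn, tn, fp) for binary labels, treating 1 as the positive class."""
    # Count how often each (actual, predicted) pair occurs
    pairs = Counter(zip(actual, predicted))
    tp = pairs[(1, 1)]  # survived, predicted survived
    fn = pairs[(1, 0)]  # survived, predicted not survived
    tn = pairs[(0, 0)]  # did not survive, predicted not survived
    fp = pairs[(0, 1)]  # did not survive, predicted survived
    return tp, fn, tn, fp

# Toy labels, invented for this example only
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fn, tn, fp = confusion_counts(actual, predicted)
print("TP:", tp, "FN:", fn, "TN:", tn, "FP:", fp)
```

With the notebook's data, the same function could be called as `confusion_counts(result['Actual_Target'], result['Predicted_Target'])`.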
## Calculating Metrics - Manually
tp = result[(result['Actual_Target'] == 1) & (result['Predicted_Target'] == 1)].shape[0]
fn = result[(result['Actual_Target'] == 1) & (result['Predicted_Target'] == 0)].shape[0]
tn = result[(result['Actual_Target'] == 0) & (result['Predicted_Target'] == 0)].shape[0]
fp = result[(result['Actual_Target'] == 0) & (result['Predicted_Target'] == 1)].shape[0]
Accuracy = (tp + tn) / (tp + fn + tn + fp)
Precision = (tp) / (tp + fp)
Recall = (tp) / (tp + fn)
F1Score = 2*Precision*Recall/(Precision+Recall)
Precision, Recall, Accuracy, F1Score
Output:
(0.6831683168316832, 0.6699029126213593, 0.753731343283582, 0.6764705882352942)
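The manual formulas divide by `tp + fp` and `tp + fn`, which fails with a `ZeroDivisionError` if the model never predicts the positive class or the test set has no positives. The sketch below wraps the same formulas in a guarded division. The helper names are hypothetical, and returning 0.0 for an undefined ratio is an illustrative choice. The counts used in the first call are not stated in the original; they are inferred from the ratios in the output above (for example, 69/101 for precision), so treat them as an assumption.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero (illustrative choice)."""
    return numerator / denominator if denominator else 0.0

def metrics_from_counts(tp, fn, tn, fp):
    """Compute precision, recall, accuracy and F1 from confusion matrix counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    accuracy = safe_div(tp + tn, tp + fn + tn + fp)
    f1 = safe_div(2 * precision * recall, precision + recall)
    return precision, recall, accuracy, f1

# Counts inferred from the reported output (assumption, not given in the source)
precision, recall, accuracy, f1 = metrics_from_counts(tp=69, fn=34, tn=133, fp=32)
print(precision, recall, accuracy, f1)

# Degenerate case: the model never predicts the positive class
degenerate = metrics_from_counts(tp=0, fn=5, tn=10, fp=0)
print(degenerate)
```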
## Calculating Metrics - Using Library
print("Precision:", precision_score(y_test, predictions))
print("Recall:", recall_score(y_test, predictions))
print("Accuracy:", accuracy_score(y_test, predictions))
print("F1 Score:", f1_score(y_test, predictions))
print("FBeta Score - Equal weights:", fbeta_score(y_test, predictions, beta=1))
print("FBeta Score - 0.5 weight:", fbeta_score(y_test, predictions, beta=0.5))
print("FBeta Score - 2 weight:", fbeta_score(y_test, predictions, beta=2))
Output:
Precision: 0.6831683168316832
Recall: 0.6699029126213593
Accuracy: 0.753731343283582
F1 Score: 0.6764705882352942
FBeta Score - Equal weights: 0.6764705882352942
FBeta Score - 0.5 weight: 0.6804733727810651
FBeta Score - 2 weight: 0.672514619883041
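To see where the three F-beta values come from, here is a small sketch that recomputes them from the precision and recall printed above, using the standard formula F_beta = (1 + beta^2) * P * R / (beta^2 * P + R). The function name `fbeta` is made up for this example. A beta below 1 gives more weight to precision and a beta above 1 gives more weight to recall. Since precision is slightly higher than recall here, the beta=0.5 score comes out highest and the beta=2 score lowest.

```python
def fbeta(precision, recall, beta):
    """F-beta score from precision and recall; beta=1 gives the usual F1 score."""
    if precision == 0 and recall == 0:
        return 0.0  # avoid dividing by zero when both inputs are zero
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Precision and recall copied from the output above
p = 0.6831683168316832
r = 0.6699029126213593

f_equal = fbeta(p, r, beta=1)
f_half = fbeta(p, r, beta=0.5)
f_two = fbeta(p, r, beta=2)
print("beta=1:", f_equal)
print("beta=0.5:", f_half)
print("beta=2:", f_two)
```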
If you want to download the Jupyter notebook file for the above code, click here.