HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Evaluation Exercise

Introduction

The goal of this exercise is to implement evaluation scores for a classification task in Python. You can use the Python standard library and math functions from numpy. This notebook guides you through the implementation process.

This notebook implements tests using assert or np.testing.assert_almost_equal. If you run the corresponding notebook cell and no output appears, the test has passed. Otherwise an exception is raised.

General Hint:

If you have problems with the implementation (e.g. you don't know how to call a certain function or how to loop through the dataset), make use of the interactive nature of the notebook. You can add new cells at any time to inspect defined variables or to try out small code snippets.

Required Knowledge

This exercise is part of the course "Aktuelle Trends der Informations- und Kommunikationstechnik". The fundamentals of evaluation metrics are taught in class.

  • The PDF slides used in class are available in the educational-materials repository.

Required Python Modules

import os
import socket
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from deep_teaching_commons.data.fundamentals.iris import Iris

Required Data

First we load the dataset and do some preprocessing: we reduce the number of classes to two, make the class labels binary, scale the features and finally split the dataset into a training and a test set. There is no exercise yet; just execute the cells and try to understand the code. All of these preprocessing steps are very common.

base_data_dir = os.path.expanduser('~/deep.TEACHING/data')
dm = Iris(base_data_dir=base_data_dir)  # data manager
iris = dm.dataframe()
iris.head()
df_reduced = iris.query('species == "Iris-versicolor" | species == "Iris-virginica"')
df_reduced.head()
X = df_reduced[['petal_width', 'petal_length']].values
Y = df_reduced['species'].replace({'Iris-versicolor': 0, 'Iris-virginica': 1}).values
X[:5]
Y[:5]
X.shape, Y.shape
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
X_scaled[:5]
# plot data
cp = sns.color_palette()
df_scaled = pd.DataFrame(X_scaled, columns=['x1', 'x2'])
df_scaled['y'] = Y
sns.scatterplot(data=df_scaled, x='x1', y='x2', hue='y', palette=cp[1:3]);
# split data for training and test
X_train, X_test, Y_train, Y_test = train_test_split(
    X_scaled, Y, stratify=Y, test_size=0.2, random_state=42
)
train_classes = dict(zip(*np.unique(Y_train, return_counts=True)))
test_classes = dict(zip(*np.unique(Y_test, return_counts=True)))

train_classes, test_classes

Implementation of Logistic Hypothesis

This implementation was part of the last notebook "Exercise: Logistic Regression".

def sigmoid(t):
    return 1 / (1 + np.exp(-t))
def make_logistic_hypothesis(w1, w2, b):
    def logistic_hypothesis(x1, x2):
        return sigmoid(x1 * w1 + x2 * w2 + b)
    
    return logistic_hypothesis
def make_decision_boundary(w1, w2, b, threshold):
    def decision_boundary(x1):
        return (np.log(threshold / (1 - threshold)) - x1*w1 - b) * (1 / w2)
    
    return decision_boundary
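
For reference, the formula inside decision_boundary follows from solving the hypothesis for $ x_2 $ at a fixed classification threshold $ \tau $ (a short derivation, using the same $ w_1, w_2, b $ as in the code):

$ \sigma(w_1 x_1 + w_2 x_2 + b) = \tau \;\Leftrightarrow\; w_1 x_1 + w_2 x_2 + b = \log\frac{\tau}{1 - \tau} \;\Leftrightarrow\; x_2 = \frac{1}{w_2}\left(\log\frac{\tau}{1 - \tau} - w_1 x_1 - b\right) $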
def plot_boundary(df, decision_boundary):
    sns.scatterplot(data=df, x='x1', y='x2', hue='y', palette=sns.color_palette()[1:3])
    
    spacing = np.linspace(df['x1'].min(), df['x1'].max(), 10)
    boundary_values = np.array([decision_boundary(x1) for x1 in spacing])

    plt.plot(spacing, boundary_values, label='boundary')
def make_classify(w1, w2, b, threshold):
    h = make_logistic_hypothesis(w1, w2, b)
    
    def classify(x1, x2):
        return 1 if h(x1, x2) > threshold else 0
    
    return classify
# Choose some mediocre values, to demonstrate metrics
w1, w2, b = 3.6962765211562245, 2.548083316850051, 0.01089234547433182
# plot training data
df_train = pd.DataFrame(X_train, columns=['x1', 'x2'])
df_train['y'] = Y_train
plot_boundary(df_train, make_decision_boundary(w1, w2, b, 0.5))
# plot test data
df_test = pd.DataFrame(X_test, columns=['x1', 'x2'])
df_test['y'] = Y_test
plot_boundary(df_test, make_decision_boundary(w1, w2, b, 0.5))
# create classifier
classify = make_classify(w1, w2, b, 0.5)
classify(-1, -1)

Classify Datasets

C_train = np.array([classify(x1, x2) for x1, x2 in X_train])
C_test = np.array([classify(x1, x2) for x1, x2 in X_test])
C_train[:5]
C_test[:5]
C_train.shape, C_test.shape
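
The list comprehensions above call classify once per sample. As a small optional sketch (not required for the exercise), the same predictions can be computed in a vectorized way directly from the logistic hypothesis, since sigmoid operates elementwise on numpy arrays:

# optional: vectorized classification, equivalent to the loops above
h = make_logistic_hypothesis(w1, w2, b)
C_train_vec = (h(X_train[:, 0], X_train[:, 1]) > 0.5).astype(int)
np.array_equal(C_train_vec, C_train)  # should be True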

Exercise: Accuracy

Accuracy is defined as

$ accuracy = \frac{T}{T + F} $

where $ T $ is the number of true classifications and $ F $ is the number of false classifications on a dataset.

Task:

Implement the function accuracy below. C contains the predictions of your classifier and Y the true labels.

def accuracy(C, Y):
    raise NotImplementedError('implement this function')
train_accuracy = accuracy(C_train, Y_train)
train_accuracy
test_accuracy = accuracy(C_test, Y_test)
test_accuracy
np.testing.assert_almost_equal(accuracy(C_train, Y_train), 0.975)
np.testing.assert_almost_equal(accuracy(C_test, Y_test), 0.8)
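
If you want an independent cross-check of your implementation (optional; this uses sklearn, which is not required for the exercise):

from sklearn.metrics import accuracy_score
accuracy_score(Y_test, C_test)  # should match accuracy(C_test, Y_test)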

Exercise: TP, FP, TN, FN

The following terms describe the four entries of the confusion matrix for a binary classifier.

  • True Positive (TP): number of true (T) classifications where the predicted class is 1 (P).

  • False Positive (FP): number of false (F) classifications where the predicted class is 1 (P).

  • True Negative (TN): number of true (T) classifications where the predicted class is 0 (N).

  • False Negative (FN): number of false (F) classifications where the predicted class is 0 (N).

Task:

Implement the function tp_fp_tn_fn to calculate the values TP, FP, TN, FN on a dataset. Again, C contains the predictions of your classifier and Y the true labels.

def tp_fp_tn_fn(C, Y):
    raise NotImplementedError('implement this function')
    
    tp, fp, tn, fn = 0, 0, 0, 0
                
    return tp, fp, tn, fn
train_tp, train_fp, train_tn, train_fn = tp_fp_tn_fn(C_train, Y_train)
train_tp, train_fp, train_tn, train_fn
test_tp, test_fp, test_tn, test_fn = tp_fp_tn_fn(C_test, Y_test)
test_tp, test_fp, test_tn, test_fn
np.testing.assert_almost_equal(tp_fp_tn_fn(C_train, Y_train), (39, 1, 39, 1))
np.testing.assert_almost_equal(tp_fp_tn_fn(C_test, Y_test), (7, 1, 9, 3))
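
As an optional cross-check (sklearn is not required for the exercise), sklearn's confusion_matrix yields the same four counts; for the label order [0, 1] its flattened form is (tn, fp, fn, tp):

from sklearn.metrics import confusion_matrix
tn, fp, fn, tp = confusion_matrix(Y_test, C_test).ravel()
(tp, fp, tn, fn)  # should match tp_fp_tn_fn(C_test, Y_test)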

Exercise: Precision and Recall

Precision and recall are defined as follows:

$ precision = \frac{TP}{TP+FP} $

$ recall = \frac{TP}{TP+FN} $

Task:

Implement the function precision_recall to calculate precision and recall from the values TP, FP and FN.

def precision_recall(tp, fp, fn):
    raise NotImplementedError('implement this function')
    
    precision = None
    recall = None
    
    return precision, recall
train_precision, train_recall = precision_recall(train_tp, train_fp, train_fn)
train_precision, train_recall
test_precision, test_recall = precision_recall(test_tp, test_fp, test_fn)
test_precision, test_recall
np.testing.assert_almost_equal(precision_recall(39, 1, 1), (0.975, 0.975))
np.testing.assert_almost_equal(precision_recall(7, 1, 3), (0.875, 0.7))
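
Optionally, you can compare your results against sklearn (not required for the exercise):

from sklearn.metrics import precision_score, recall_score
precision_score(Y_test, C_test), recall_score(Y_test, C_test)  # should match precision_recall(test_tp, test_fp, test_fn)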

Exercise: F-Score

The f-score metric is defined as follows:

$ F_{\beta} = (1 + \beta^2) \cdot \frac{precision \cdot recall}{(\beta^2 \cdot precision) + recall} $

Task:

Implement a function f_score that calculates $ F_{\beta} $ from precision, recall and $ \beta $. For $ \beta = 1 $ precision and recall are weighted equally; smaller values of $ \beta $ emphasize precision, larger values emphasize recall.

def f_score(precision, recall, beta):
    raise NotImplementedError('implement this function')
train_f1_score = f_score(train_precision, train_recall, 1)
train_f1_score
train_f05_score = f_score(train_precision, train_recall, 0.5)
train_f05_score
test_f1_score = f_score(test_precision, test_recall, 1)
test_f1_score
test_f05_score = f_score(test_precision, test_recall, 0.5)
test_f05_score
np.testing.assert_almost_equal(f_score(train_precision, train_recall, 1), 0.975)
np.testing.assert_almost_equal(f_score(train_precision, train_recall, 0.5), 0.975)
np.testing.assert_almost_equal(f_score(test_precision, test_recall, 1), 0.7777777777777777)
np.testing.assert_almost_equal(f_score(test_precision, test_recall, 0.5), 0.8333333333333334)
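
As a final optional cross-check (again using sklearn, which is not required here), fbeta_score computes the same value directly from the predictions and true labels:

from sklearn.metrics import fbeta_score
fbeta_score(Y_test, C_test, beta=1), fbeta_score(Y_test, C_test, beta=0.5)  # should match the test scores above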

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does, however, not apply to any referenced external media (e.g. images).

HTW-Berlin - Informatik und Wirtschaft - Aktuelle Trends - Machine Learning: Evaluation Exercise
by Christoph Jansen (deep.TEACHING - HTW Berlin)
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Christoph Jansen

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.