Exercise - Neural Network with PyTorch

Introduction

In this exercise you will be presented with a classification problem with two classes and two features. The classes are not linearly separable. First you will implement logistic regression, which will yield a very bad decision boundary. Then you will extend your model with a hidden layer consisting of only two hidden neurons. By executing the plot cells you will see that these two hidden neurons are already almost enough to find a decision boundary that separates our data much better.

Finally, you will implement a neural network with multiple hidden layers to solve the problem without any misclassifications.

Requirements

Knowledge

You should have a basic knowledge of:

  • Logistic regression
  • Logistic function
  • Tanh as activation function
  • Cross-entropy loss
  • Gradient descent
  • numpy
  • matplotlib

Suitable sources for acquiring this knowledge are:

Python Modules

By deep.TEACHING convention, all python modules needed to run the notebook are loaded centrally at the beginning.

# External Modules
import numpy as np
import matplotlib.pyplot as plt

import torch
import torch.nn as nn
import torch.optim as optim

torch.manual_seed(1)

%matplotlib inline

Data Generation

For convenience and visualization, we will only use two features in this notebook, so that we are still able to plot them together with the target class and the decision boundary.

First we will create some artificial data:

  • $ m_1 = 10 $ examples for class 0
  • $ m_2 = 15 $ examples for class 1
  • $ n = 2 $ features for each example

No exercise yet, just execute the cells.

m1 = 10
m2 = 15
m = m1 + m2
n = 2
X = np.ndarray((m,n))
X.shape
y = np.zeros((m))
y[m1:] = y[m1:] + 1.0
y
### Execute this to generate linearly separable data
def x2_function_class_0(x):
    return -x*2 + 2

def x2_function_class_1(x):
    return -x*2 + 4
### Execute this to generate NOT linearly separable data
def x2_function_class_0(x):
    return np.sin(x)

def x2_function_class_1(x):
    return np.sin(x) + 1
x1_min = -5
x1_max = +5

X[:m1,0] = np.linspace(x1_min, x1_max, m1)
X[m1:,0] = np.linspace(x1_min+0.5, x1_max-0.2, m2)
X[:m1,1] = x2_function_class_0(X[:m1,0])
X[m1:,1] = x2_function_class_1(X[m1:,0])
def plot_data():
    plt.scatter(X[:m1,0], X[:m1,1], alpha=0.5, label='class 0 train data')
    plt.scatter(X[m1:,0], X[m1:,1], alpha=0.5, label='class 1 train data')

    plt.plot(x1_line, x2_line_class_0, alpha=0.2, label='class 0 true target func')
    plt.plot(x1_line, x2_line_class_1, alpha=0.2, label='class 1 true target func')
    plt.legend(loc=1)
x1_line = np.linspace(x1_min, x1_max, 100)
x2_line_class_0 = x2_function_class_0(x1_line)
x2_line_class_1 = x2_function_class_1(x1_line)    

plot_data()

Exercises

Convert the Data to torch tensors

Task:

  • Convert numpy arrays to tensors.
###############################
##### YOUR SOLUTION START #####
###############################
#
# Task: Convert numpy arrays to tensors
#

###############################
##### YOUR SOLUTION End   #####
###############################
### If your implementation is correct, these tests should not throw an exception

print(X_tensor.shape) ### should be [25,2]
print(y_tensor.shape) ### should be [25,1]

assert X_tensor.shape[0] == 25
assert X_tensor.shape[1] == 2
assert y_tensor.shape[0] == 25
assert y_tensor.shape[1] == 1
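
If you are unsure where to start: a minimal conversion could look like the following sketch. This is only one possible way; the names X_tensor and y_tensor are the ones expected by the tests above, and y is reshaped into a column vector so that its shape matches the model output later on.

### One possible conversion (sketch only - your solution above may look different)
X_tensor = torch.tensor(X, dtype=torch.float32)                 # shape [25, 2]
y_tensor = torch.tensor(y, dtype=torch.float32).reshape(-1, 1)  # shape [25, 1]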

Logistic Regression

Task:

  • Implement the class below for logistic regression.
  • Use torch.nn.Linear and torch.nn.Sigmoid.
  • Add both as class members.
  • Data flow should be: $ x \rightarrow linear \rightarrow sigmoid \rightarrow \hat y $

    • mathematically: $ \hat y = sigmoid(linear(\vec x)) $

    • with $ \hat y $ the prediction of your model

The following picture visualizes the data flow:

logistic_regression_2_features_2_classes.svg

class LogisticRegression(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_features):
        
        super(LogisticRegression, self).__init__()

        ###############################
        ##### YOUR SOLUTION START #####
        ###############################

        raise NotImplementedError()

        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################


    def forward(self, x):
        
        ###############################
        ##### YOUR SOLUTION START #####
        ###############################

        raise NotImplementedError()

        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################
NUM_LABELS = 1
NUM_FEATURES = 2
model = LogisticRegression(NUM_LABELS, NUM_FEATURES)
### Should output something like:
###
### LogisticRegression(
###  (linear): Linear(in_features=2, out_features=1, bias=True)
###  (sigmoid): Sigmoid()
### )
print(model)

Train your model

Task:

  • Iteratively train your model
  • Use torch.nn.BCELoss as cost function
  • Use any Optimizer from torch.optim

Hint:

  • Print the costs every ~100 epochs for instant feedback
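
If you need a starting point, a minimal training loop could look like the following sketch. The optimizer choice (SGD), the learning rate and the number of epochs are arbitrary assumptions, not requirements of the exercise.

### Minimal training-loop sketch (one possible approach, hyperparameters chosen arbitrarily)
model = LogisticRegression(NUM_LABELS, NUM_FEATURES)
criterion = nn.BCELoss()
optimizer = optim.SGD(model.parameters(), lr=0.1)

for epoch in range(1000):
    optimizer.zero_grad()              # reset gradients from the previous step
    y_hat = model(X_tensor)            # forward pass
    cost = criterion(y_hat, y_tensor)  # binary cross-entropy
    cost.backward()                    # backpropagation
    optimizer.step()                   # parameter update
    if epoch % 100 == 0:
        print(epoch, cost.item())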
###############################
##### YOUR SOLUTION START #####
###############################
#
# Task: Create a new model and train it with built-in cost function and optimizer


###############################
##### YOUR SOLUTION End   #####
###############################
### With this function you can access your trained model parameters
model.state_dict()
model.state_dict().keys()
### depending on the name of your class members you might have to adjust the keys
weights = model.state_dict()['linear.weight'].detach().numpy()
bias = model.state_dict()['linear.bias'].detach().numpy()

Plot Decision Boundary

With access to our trained model parameters we can plot the decision boundary together with our data. Executing the cell below should result in a plot like the following:

bad_decision_boundaray_with_log_regression.png

As we should have expected, using plain logistic regression we cannot separate our dataset very well.
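
The boundary formula used in the next cell follows directly from the model: the sigmoid outputs exactly $ 0.5 $ where its input is zero, so the decision boundary is the line on which the linear part vanishes, $ w_1 x_1 + w_2 x_2 + b = 0 $, i.e. $ x_2 = (-b - w_1 x_1) / w_2 $. This is what the expression for x2_boundary below computes.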

### Plot the data and decision boundary, just execute this cell

x2_boundary = (-bias[0] -weights[0,0]*x1_line)/weights[0,1]
plt.plot(x1_line, x2_boundary, c='g', label='boundary')

plot_data()

Adding Hidden Layer

Now we are going to add one hidden layer consisting of two neurons. For the hidden layer neurons use the activation function torch.nn.Tanh instead of torch.nn.Sigmoid.

Task:

  • Implement the class HiddenLayerNN
  • Use nn.ModuleList as class member to store the linear layers
  • Use another of these list objects to store the activation functions (tanh and sigmoid)
  • Also store the intermediate results in class members (linear_results, activation_results) when doing the calculations in the forward pass
  • The data flow should be: $ x \rightarrow linear \rightarrow tanh \rightarrow linear \rightarrow sigmoid \rightarrow \hat y $

    • mathematically: $ \hat y = sigmoid(linear(tanh(linear(\vec{x})))) $
    • with $ \hat y $ the prediction of your model

hidden_layer_2_features_2_neurons_sigmoid_classes.svg
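
As a quick standalone reminder of how nn.ModuleList behaves (this is only an illustration, not the solution of this exercise): modules appended to an nn.ModuleList are registered as submodules, so their parameters show up in model.parameters() and in the printed model summary.

### Standalone nn.ModuleList illustration (not part of the exercise)
example_list = nn.ModuleList()
example_list.append(nn.Linear(2, 2))  # registered as submodule (0), has weights and a bias
example_list.append(nn.Tanh())        # registered as submodule (1), has no parameters
print(example_list)
print(sum(p.numel() for p in example_list.parameters()))  # 2*2 weights + 2 biases = 6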

class HiddenLayerNN(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_features, num_hidden):
        
        super(HiddenLayerNN, self).__init__()

        self.linear_modules = nn.ModuleList()
        self.activation_modules = nn.ModuleList()
        self.linear_results = []
        self.activation_results = []
        
        ###############################
        ##### YOUR SOLUTION START #####
        ###############################
        #
        # Task: add the linear modules and the tanh and sigmoid functions to the ModuleLists


        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################

    def forward(self, x):
                                         
        self.linear_results = [] ### clear after every run
        self.activation_results = [] ### clear after every run
        
        ###############################
        ##### YOUR SOLUTION START #####
        ###############################
        #
        # Task: iterate through both ModuleLists
        #
        #       save intermediate results in the python lists


        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################

        return x_
NUM_LABELS = 1
NUM_FEATURES = 2
NUM_HIDDEN = 2
model = HiddenLayerNN(NUM_LABELS, NUM_FEATURES, NUM_HIDDEN)
### Should output something like:
### 
### HiddenLayerNN(
###   (linear_modules): ModuleList(
###     (0): Linear(in_features=2, out_features=2, bias=True)
###     (1): Linear(in_features=2, out_features=1, bias=True)
###   )
###   (activation_modules): ModuleList(
###     (0): Tanh()
###     (1): Sigmoid()
###   )
### )
print(model)

Train your model

Task:

  • Iteratively train your model
  • Use torch.nn.BCELoss as cost function
  • Use any Optimizer from torch.optim

Sidenote:

  • With only one hidden layer of 2 hidden neurons, we can be unlucky with a bad weight initialization. If the plots a few cells below do not look like the sample pictures after training, rerun your training several times.
###############################
##### YOUR SOLUTION START #####
###############################
#
# Task: Create a new model and train with built-in cost and optimizer


###############################
##### YOUR SOLUTION End   #####
###############################

Now we are going to plot two things:

  • 1st: The two neurons in the hidden layer represent the original data transformed into another 2D space. However, for our data, which follows two different $ sin $ functions, these two hidden neurons are still not enough to transform the data so that it becomes linearly separable.

  • 2nd: We can also plot the decision boundary in our original space.

If your implementation is correct and training succeeded, your plots should look similar to the following:

pytorch_one_hidden_layer_visualiization.png

pytorch_one_hidden_layer_vis_original_space.png

### plot hidden transformation of X with learned w1s (transformation) and learned w2s (boundary)
###
### ATTENTION: ONLY WORKS IF THE (LAST) HIDDEN LAYER HAS EXACTLY 2 NEURONS
###

def plot_last_hidden_layer_feature_space(model):
    preds = model(X_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.scatter(A1[:m1,0], A1[:m1,1], alpha=0.5, label='class 0')
    plt.scatter(A1[m1:,0], A1[m1:,1], alpha=0.5, label='class 1')

    ### Plot true target functions
    data_tmp_tensor = torch.tensor(np.ndarray((len(x1_line), 2)), dtype=torch.float32)
    data_tmp_tensor[:,0] = torch.from_numpy(x1_line)
    data_tmp_tensor[:,1] = torch.from_numpy(x2_line_class_0)

    preds = model(data_tmp_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.plot(A1[:,0], A1[:,1])

    data_tmp_tensor[:,1] = torch.from_numpy(x2_line_class_1)
    preds = model(data_tmp_tensor)
    A1 = model.activation_results[-2].detach().numpy()
    plt.plot(A1[:,0], A1[:,1])

    ### Plot boundary
    preds = model(X_tensor)
    keys = list(model.state_dict().keys())
    weights = model.state_dict()[keys[-2]].detach().numpy()
    bias = model.state_dict()[keys[-1]].detach().numpy()

    x1_boundary_mlp = np.linspace(-1, +1, 10)
    x2_boundary_mlp = (-bias[0] -weights[0,0]*x1_boundary_mlp)/weights[0,1]
    plt.plot(x1_boundary_mlp, x2_boundary_mlp, c='g')
    plt.legend(loc=2)
    plt.title('Data and boundary in hidden space')
    

### Plot transformations
plot_last_hidden_layer_feature_space(model)
def plot_boundary_in_original_space(model):
    grid_density = 100
    x1 = np.linspace(X[:,0].min()-1,X[:,0].max()+1,grid_density)
    x2 = np.linspace(X[:,1].min()-1,X[:,1].max()+1,grid_density)
    mash = np.meshgrid(x1,x2)

    data_tmp = np.ndarray((grid_density**2, n))
    data_tmp[:,0] = mash[0].flatten()
    data_tmp[:,1] = mash[1].flatten()
    data_tmp_tensor = torch.tensor(data_tmp, dtype=torch.float32)

    preds = model(data_tmp_tensor).detach().numpy()
    print(preds.flatten())
    c0 = data_tmp[preds.flatten() < 0.5]
    c1 = data_tmp[preds.flatten() >= 0.5]
    plt.scatter(c0[:,0],c0[:,1], alpha=1.0, marker='s', color="#aaccee")
    plt.scatter(c1[:,0],c1[:,1], alpha=1.0, marker='s', color="#eeccaa")
    plot_data()
    plt.title('Data and boundary in original space')
    
    
### plot boundary in original space
plot_boundary_in_original_space(model)

Adding more Layers and Parametrization

Task:

Now implement the class MultiHiddenLayerNN. This class accepts the parameter num_hidden, a list specifying how many neurons each layer should have.

Instead of a separate num_features parameter, the first entry of num_hidden specifies the number of features (2).

Finalize with the Sigmoid function.

Hint:

  • Use a loop to iterate through the entries of num_hidden to add linear layer modules and tanh activation functions.
  • Again, save intermediate results during your forward function in the according python lists.
  • Note that any hidden layer may have a different number of neurons. BUT the last hidden layer must have exactly 2 neurons if we want to visualize the last stage of the feature space transformation. A possible layer-building loop is sketched after the figure below.

multi_hidden_layer_binary_classification_last_hidden_2_neurons.svg
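
One possible way to build the layer stack from the size list is sketched below as a small standalone helper. The function build_layers and its structure are only illustrative assumptions; your __init__ may be organized differently, as long as the last linear layer maps onto num_labels outputs followed by the sigmoid.

### Standalone sketch: build linear and activation ModuleLists from a size list
### (illustrative only; build_layers is not required by the exercise)
def build_layers(num_hidden, num_labels):
    linears = nn.ModuleList()
    activations = nn.ModuleList()
    for n_in, n_out in zip(num_hidden[:-1], num_hidden[1:]):
        linears.append(nn.Linear(n_in, n_out))  # hidden linear layer
        activations.append(nn.Tanh())           # tanh after every hidden layer
    linears.append(nn.Linear(num_hidden[-1], num_labels))  # output layer
    activations.append(nn.Sigmoid())            # sigmoid for the final prediction
    return linears, activations

print(build_layers([2, 20, 20, 2], 1))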

class MultiHiddenLayerNN(nn.Module):  # inheriting from nn.Module!
    def __init__(self, num_labels, num_hidden):
        super(MultiHiddenLayerNN, self).__init__()

        self.linear_modules = nn.ModuleList()
        self.activation_modules = nn.ModuleList()
        self.linear_results = []
        self.activation_results = []
        
        
        ###############################
        ##### YOUR SOLUTION START #####
        ###############################
        #
        # Task: add the linear modules and the tanh and sigmoid functions to the ModuleLists


        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################

    def forward(self, x):
                                         
        self.linear_results = [] ### clear after every run
        self.activation_results = [] ### clear after every run
        
        ###############################
        ##### YOUR SOLUTION START #####
        ###############################
        #
        # Task: iterate through both ModuleLists
        #
        #       save intermediate results in the python lists


        ###############################
        ##### YOUR SOLUTION End   #####
        ###############################

        return x_
NUM_LABELS = 1

### Meaning: 2 features, 
### 20 neurons in 1st hidden layer, 
### 20 in a 2nd hidden layer, 
### 2 in a 3rd hidden layer
NUM_HIDDEN = [2,20,20,2]
model = MultiHiddenLayerNN(NUM_LABELS, NUM_HIDDEN)

Train your model

Task:

  • Iteratively train your model
  • Use torch.nn.BCELoss as cost function
  • Use any Optimizer from torch.optim
###############################
##### YOUR SOLUTION START #####
###############################
#
# Task: Create a new model and train with built-in cost and optimizer


###############################
##### YOUR SOLUTION End   #####
###############################

The plots should now show that we managed to transform our data into a feature space where the data is linearly separable:

Plots should look similar to:

pytorch_multi_hidden_layer_vis_transformed.png.png

pytorch_multi_hidden_layer_vis_original_space.png

### Plot transformations
plot_last_hidden_layer_feature_space(model)
### plot boundary in original space
plot_boundary_in_original_space(model)

Freestyle Exercise

When you are finished you can try out a lot of different things and see how the hidden feature space and the final decision boundary change. Here are some ideas to try out:

  • adjust the number of neurons in the hidden layer (more / less)
  • add more / bigger layers
  • different activation functions ($ tanh $, $ logistic $, $ relu $)
  • you can try to use mean squared error as cost function
  • you can also try to change the true target functions and make it harder for the network to separate the data!

Things you should note when trying different things:

  • In order to plot the transformed hidden space, the last hidden layer must consist of exactly 2 neurons
  • When using $ relu $ as activation function, do not use it for a layer with only 2 neurons. Chances are high that the weights become negative and you end up with 2 dead neurons.

Summary and Outlook

In this notebook you learned how to set up a more complex and deeper neural network using the tools from previous exercises. You also saw how these more complex networks are able to perform non-linear classification thanks to the feature transformation performed by (more) hidden layers, and how to visualize this process. All techniques learned here can be applied to even more complex problems while using PyTorch as framework.

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Exercise - Neural Network with PyTorch
by Klaus Strohmenger
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2019 Klaus Strohmenger

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.