Neural Network Framework - Exercise: Fully Connected Network

Introduction

For a better understanding of neural networks, you will start to implement a framework on your own. The given notebook explains some core functions and concepts of the framework, so all of you have the same starting point. Our previous exercises were self-contained and not very modular. You are going to change that. Let us begin with a fully connected network on the now well-known MNIST dataset. The pipeline will be:

  • Define a model architecture
  • Construct a neural network based on the architecture
  • Define an evaluation criterion
  • Optimize the model (training)

Read the whole notebook carefully to understand how the pipeline works, even where no specific implementation work is required from you.

Requirements

Knowledge

TODO

Python-Modules

# third party
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist

Data

We load the MNIST dataset. Have a look at the data structure that is necessary to feed data into the framework. A batch is a 4d tensor of shape (image_i, channel, width, height).

# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data/MNIST')

# load all data; labels are not one-hot encoded, pixel values are scaled to [0,1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)

# reshape to match general framework architecture 
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)            
print(train_images.shape, train_labels.shape)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]

Towards a Neural Network Framework

To create custom models you have to be able to define layers and activation functions in a modular way. Layers and activation functions are therefore modelled as objects. Each object that you want to use has to implement a forward and a backward method, which are used later by the NeuralNetwork class. Additionally, the self.params attribute is mandatory to meet the specification of the NeuralNetwork class; it stores all learnable parameters needed by the optimization algorithm. Implemented that way, the objects can be used as building blocks and stacked up to create a custom model. Make sure to use an activation function after each layer except the last one, because the softmax function is applied by default during the loss calculation on the network output.
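
As a rough sketch of this interface contract (not part of the framework itself), a hypothetical do-nothing layer that satisfies the specification could look like this:

class Identity(object):
    ''' Hypothetical minimal layer illustrating the required interface '''
    def __init__(self):
        # no learnable parameters, but the attribute must exist
        self.params = []

    def forward(self, X):
        # pass the input through unchanged
        return X

    def backward(self, dout):
        # pass the upstream gradient through unchanged;
        # empty list because there are no parameter gradients
        return dout, []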

After completing this notebook you can move the implemented functions to the script files for further development. The framework consists of the following files:

  • layer.py
  • activation_func.py
  • network.py
  • cost_func.py
  • optimizer.py
  • utils.py

After working through the notebook, it should become clear which functionality belongs in which file.

Exercise: Define Layers

The first layers added to the framework are a flatten layer and a fully-connected layer, which we need to build an architecture for the corresponding fully connected network. As a side note: depending on the framework, the term dense layer is sometimes used instead of fully-connected.

All kinds of neural network layers and regularization techniques that can be inserted as layers into an architecture will be implemented in the file layer.py later.

Task:

Implement the methods FullyConnected.forward and FullyConnected.backward.

class Flatten(object):
    ''' Flatten layer used to reshape inputs into vector representation
    
    Layer should be used in the forward pass before a dense layer to 
    transform a given tensor into a vector. 
    '''
    def __init__(self):
        self.params = []

    def forward(self, X):
        ''' Reshapes an n-dim representation into a vector 
            by preserving the number of input rows.
        
        Examples:
            [10000,[1,28,28]] -> [10000,784]
        '''
        self.X_shape = X.shape
        self.out_shape = (self.X_shape[0], -1)    
        out = X.reshape(self.out_shape)
        return out

    def backward(self, dout):
        ''' Restore dimensions before flattening operation
        '''
        out = dout.reshape(self.X_shape)
        return out, []

class FullyConnected(object):
    ''' Fully connected layer implementing the linear function hypothesis 
        in the forward pass and its derivative in the backward pass.
    '''
    def __init__(self, in_size, out_size):
        ''' initialize all learning parameters in the layer
        
        Weights will be initialized with modified Xavier initialization.
        Biases will be initialized with zero. 
        '''
        self.W = np.random.randn(in_size, out_size) * np.sqrt(2. / in_size)
        self.b = np.zeros((1, out_size))
        self.params = [self.W, self.b]

    def forward(self, X):
        ''' Linear combination of images, weights and bias terms
            
        Args:
            X: Matrix of images (flattened representation)
    
        Returns:
            out: X * W + b  
        '''
        self.X = X
        ############################################
        #                   TODO                   #    
        ############################################
        raise NotImplementedError("Your task...")
        # out = 
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return out

    def backward(self, dout):
        ''' Backward pass of the linear function: computes gradients
            with respect to the input and the learnable parameters.
            
        Args:
            dout: Upstream derivative (gradient of the loss w.r.t. the layer output)
    
        Returns:
            dX : Derivative with respect to X
            dW : Derivative with respect to W
            db : Derivative with respect to b
        '''
        ############################################
        #                   TODO                   #    
        ############################################
        raise NotImplementedError("Your task...")
        #dX =
        #dW =
        ############################################
        #             END OF YOUR CODE             #
        ############################################        
        db = np.sum(dout, axis=0)
        return dX, [dW, db]

Testing

Once you've connected many types of layers in a network and you notice an error during training, it can be difficult to track down exactly which layer has a buggy implementation. Since you're implementing each layer in a modular fashion, you can also test them individually. So it's good practice to write tests for each of your layers at this point already.

There are properties you know should hold true about the input and output of your layer. In the FullyConnected layer, you may want to test the following (a rough sketch of such checks follows the list):

  • In the forward pass: which shape should the return value have?
  • In the backward pass: which shape should the derivatives dX, dW and db have?
  • In the backward pass: which shape do you expect from the argument dout?
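
As a sketch (using hypothetical sizes and assuming FullyConnected.forward and FullyConnected.backward are implemented), such shape checks could look like this:

# hypothetical shape checks for a FullyConnected(784, 500) layer
fc = FullyConnected(784, 500)
X = np.random.randn(64, 784)     # a batch of 64 flattened images

out = fc.forward(X)
assert out.shape == (64, 500)    # one output row per input row

dout = np.random.randn(64, 500)  # upstream gradient has the same shape as the output
dX, (dW, db) = fc.backward(dout)
assert dX.shape == X.shape       # gradient w.r.t. the input matches the input shape
assert dW.shape == fc.W.shape    # gradient w.r.t. the weights matches W
assert db.shape in [(500,), (1, 500)]  # bias gradient is summed over the batch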

Exercise: Define Activation Function

First, remember that activation functions are non-linearities added to your architecture. As an example the classic ReLU function is implemented here:

$ f ( x ) = \left\{ \begin{array} { l l } { x } & { \text { if } x > 0 } \\ { 0 } & { \text { otherwise } } \end{array} \right. $

The ReLU function matches the current weight initialization in the fully-connected layer. Note that this may have to be changed if you implement other activation functions.

Strictly speaking, the activation functions belong into layer.py as well; for the sake of clarity, however, they are put into a separate file, activation_func.py.

Task:

Implement the ReLU class.

class ReLU(object):
    ''' Implements activation function rectified linear unit (ReLU) 
    
    ReLU activation function is defined as the positive part of 
    its argument. 
    (paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.6419&rep=rep1&type=pdf)
    '''
    def __init__(self):
        self.params = []

    def forward(self, X):
        ''' In the forward pass return the identity for x > 0
        
        Save the input for backprop and forward all values that are above 0.
        '''
        self.X = X
        ############################################
        #                   TODO                   #    
        ############################################
        raise NotImplementedError("Your task...")
        #return        
        ############################################
        #             END OF YOUR CODE             #
        ############################################

    def backward(self, dout):
        ''' Derivative of ReLU
        
        Returns:
            dX: copy of dout with entries set to 0
                for all x \in X <= 0 in the forward pass
            []: no gradients on ReLU operation
        '''
        dX = dout.copy()
        ############################################
        #                   TODO                   #    
        ############################################
        raise NotImplementedError("Your task...")
        #dX =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return dX, []

NeuralNetwork Class

A NeuralNetwork object connects all layers and activation functions of a model architecture using the forward and backward methods of the contained objects. Calling forward on the NeuralNetwork object passes a given input through the whole computational graph. The backward method calculates the gradients via backpropagation.

It further creates a global list of all parameters in the network during initialization, which is used later in the optimization process.

A predict function implements a forward pass with a given score function applied at the end of the calculation. At the moment it is suited for the softmax function, taking only the max argument.

class NeuralNetwork(object):
    ''' Creates a neural network from a given layer architecture 
    
    This class is suited for fully-connected network and
    convolutional neural network architectures. It connects 
    the layers and passes the data from one end to another.
    '''
    def __init__(self, layers, score_func=None):
        ''' Set up a global parameter list and initialize a
            score function that is used for predictions.
        
        Args:
            layers: neural network architecture based on layer and activation function objects
            score_func: function that is used as classifier on the output
        '''
        self.layers = layers
        self.params = []
        for layer in self.layers:
            self.params.append(layer.params)
        self.score_func = score_func

    def forward(self, X):
        ''' Pass input X through all layers in the network 
        '''
        for layer in self.layers:
            X = layer.forward(X)
        return X

    def backward(self, dout):
        ''' Backprop through the network and keep a list of the gradients
            from each layer.
        '''
        grads = []
        for layer in reversed(self.layers):
            dout, grad = layer.backward(dout)
            grads.append(grad)
        return grads

    def predict(self, X):
        ''' Run a forward pass and use the score function to classify 
            the output.
        '''
        X = self.forward(X)
        return np.argmax(self.score_func(X), axis=1)
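
To illustrate the global parameter list described above (constructing layers works even before the TODOs are implemented), a quick inspection could look like this:

# hypothetical inspection of the global parameter list
net = NeuralNetwork([Flatten(), FullyConnected(784, 10)])
print(len(net.params))                     # 2 entries, one per layer
print(net.params[0])                       # [] -> Flatten has no learnable parameters
print([p.shape for p in net.params[1]])    # [(784, 10), (1, 10)] -> W and b of the dense layer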

Exercise: Define a Cost Function

Implementations of different cost functions should be placed into cost_func.py. A cost function object defines the criterion your network is evaluated on during the optimization process. Furthermore, the class contains score functions that can be used as classification criteria for predictions with a given model. It is therefore necessary to provide a cost function object to an optimization algorithm for the training process.

Task:

Implement the softmax method.

class CostCriteria(object):
    ''' Implements different types of loss and score functions for neural networks
    
    Todo:
        - Implement init that defines score and loss function 
    '''
    def softmax(X):
        ''' Numeric stable calculation of softmax
        '''
        ############################################
        #                   TODO                   #    
        ############################################
        raise NotImplementedError("Your task...")
        #return =
        ############################################
        #             END OF YOUR CODE             #
        ############################################        
    
    def cross_entropy_softmax(X, y):
        ''' Computes loss and prepares dout for backprop 

        https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
        '''
        m = y.shape[0]
        p = CostCriteria.softmax(X)
        log_likelihood = -np.log(p[range(m), y])
        loss = np.sum(log_likelihood) / m
        dout = p.copy()
        dout[range(m), y] -= 1
        return loss, dout
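
For reference, the line dout[range(m), y] -= 1 in cross_entropy_softmax follows from the standard derivative of the cross-entropy loss combined with softmax (see the linked article): for a sample with softmax output $p$ and true class $y$,

$ \frac{\partial L}{\partial x_{k}} = p_{k} - y_{k} $

where $y_{k}$ denotes the one-hot encoding of the true class (1 for the correct class, 0 otherwise). Note that the loss is averaged over the $m$ samples of the batch while dout is not divided by $m$; this constant factor is effectively absorbed into the learning rate.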

Testing

Softmax turns each row (each sample) into a probability distribution over the output classes. So you may want to test (a sketch of such checks follows the list):

  • whether the return value has the same number of samples as the input
  • whether each row is a valid probability distribution, i.e. all values of the return value lie in [0, 1] and each row sums to 1.
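
A rough sketch of such checks (assuming CostCriteria.softmax is implemented) could look like this:

# hypothetical sanity checks for the softmax implementation
scores = np.random.randn(64, 10)    # 64 samples, 10 classes
probs = CostCriteria.softmax(scores)

assert probs.shape == scores.shape                  # one probability per sample and class
assert np.all(probs >= 0) and np.all(probs <= 1)    # values lie in [0, 1]
assert np.allclose(np.sum(probs, axis=1), 1.0)      # each row sums to 1

# numeric stability: very large scores must not produce NaN or inf
assert np.all(np.isfinite(CostCriteria.softmax(scores + 1000)))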

Optimization with SGD

The file optimizer.py contains implementations of optimization algorithms. Your optimizer needs your custom network, the data, a cost function and some additional hyperparameters as arguments to optimize your model.
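
The vanilla SGD update applied to every learnable parameter $\theta$ with learning rate $\eta$ and minibatch loss $L$ is

$ \theta \leftarrow \theta - \eta \, \frac{\partial L}{\partial \theta} $

which corresponds to the line param[j] += - learning_rate * grad[j] in the implementation below.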

class Optimizer(object):   
    def get_minibatches(X, y, batch_size):
        ''' Decomposes data set into small subsets (batches)
        '''
        m = X.shape[0]
        batches = []
        for i in range(0, m, batch_size):
            X_batch = X[i:i + batch_size, :, :, :]
            y_batch = y[i:i + batch_size, ]
            batches.append((X_batch, y_batch))
        return batches    

    def sgd(network, X_train, y_train, cost_function, batch_size=32, epoch=100, learning_rate=0.001, X_test=None, y_test=None, verbose=None):
        ''' Optimize a given network with stochastic gradient descent
        
        Args:
            X_train: training data
            y_train: training labels (ground truth)
            cost_function: cost function
            batch_size: size of a single batch
            epoch: number of epochs
            learning_rate: the rate which is multiplied with the gradient
            X_test: test data if you want to evaluate your model in each epoch
            y_test: test labels
            verbose: if set, prints out training accuracy and test accuracy
        Returns:
            Model with optimized parameters
        '''
        minibatches = Optimizer.get_minibatches(X_train, y_train, batch_size)
        for i in range(epoch):
            loss = 0
            if verbose:
                print('Epoch',i)
            for X_mini, y_mini in minibatches:
                # calculate loss and derivation of the last layer
                loss, dout = cost_function(network.forward(X_mini), y_mini)
                # Do not train in epoch 0, so we know the performance before training
                if i > 0:
                    # calculate gradients via backpropagation
                    grads = network.backward(dout)
                    # run vanilla sgd update for all learnable parameters in self.params
                    for param, grad in zip(network.params, reversed(grads)):
                        for j in range(len(grad)):
                            param[j] += - learning_rate * grad[j]
            if verbose:
                train_acc = np.mean(y_train == network.predict(X_train))
                test_acc = np.mean(y_test == network.predict(X_test))                                
                print("Loss = {0} :: Training = {1} :: Test = {2}".format(loss, train_acc, test_acc))
        return network

Put it All Together

Now you have to put all parts together to create and train a fully connected neural network. First, you have to define an individual network architecture by flattening the input and stacking fully-connected layers with activation functions, e.g.:

Input -> Flatten -> Dense -> Activation -> Dense -> Activation -> Dense -> Activation -> Dense

You have to initialize all objects you need to build your custom architecture and put them into a list afterwards. Your architecture is then given to a NeuralNetwork object that handles the inter-layer communication during the forward and backward pass. It also applies the evaluation criterion at the end of the network, which is why your architecture ends with a fully-connected layer. The pipeline above is implemented in the following cell.

Finally, you can train the model with an optimization algorithm and a cost function, here stochastic gradient descent and cross-entropy with softmax. This kind of pipeline is similar to the one you would create with a more sophisticated framework like TensorFlow or PyTorch; a purely illustrative comparison follows the training code below.

# design a three hidden layer architecture with Dense-Layer
# and ReLU as activation function
def fcn_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]

# create a neural network on specified architecture with softmax as score function
fcn = NeuralNetwork(fcn_mnist(), score_func=CostCriteria.softmax)

# optimize the network with SGD and a cross-entropy softmax loss
fcn = Optimizer.sgd(fcn, train_images, train_labels, CostCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images, y_test=test_labels, verbose=True)
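
For comparison, a roughly analogous architecture and training step in PyTorch (purely illustrative, not required for this exercise; assumes torch is installed) might look like this:

# rough PyTorch analogue of the pipeline above (illustrative sketch only)
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Flatten(),
    nn.Linear(784, 500), nn.ReLU(),
    nn.Linear(500, 200), nn.ReLU(),
    nn.Linear(200, 100), nn.ReLU(),
    nn.Linear(100, 10),
)
criterion = nn.CrossEntropyLoss()   # applies softmax internally, like CostCriteria
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

def train_step(X_mini, y_mini):
    ''' One SGD step on a single minibatch (inputs as torch tensors) '''
    optimizer.zero_grad()
    loss = criterion(model(X_mini), y_mini)
    loss.backward()
    optimizer.step()
    return loss.item()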

Exercise: Experiment with the Framework

Here is your last exercise for this notebook:

Now you have a basic idea of how to build a fully-connected neural network with the framework. The next steps are straightforward. Download all script files of the framework from Moodle or the exercise repository and move the implemented functions into the correct script files:

  • layer.py
  • activation_func.py
  • cost_func.py

Afterwards, choose a dataset you like and create a data loader in the script file utils.py. Load your data, build a neural network and try to train a good classifier. Have fun!

Licenses

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).

Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Benjamin Voigt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.