Neural Network Framework - Exercise: Fully Connected Network
Introduction
For a better understanding of neural networks, you will start to implement a framework on your own. This notebook explains some core functions and concepts of the framework, so that all of you have the same starting point. Our previous exercises were self-contained and not very modular. You are going to change that. Let us begin with a fully connected network on the now well-known MNIST dataset. The pipeline will be:
- Define a model architecture
- Construct a neural network based on the architecture
- Define an evaluation criterion
- Optimize the model (training)
Read the whole notebook carefully to understand how the pipeline works even if there is no specific implementation work required from you.
Requirements
Knowledge
TODO
Python-Modules
# third party
import numpy as np
from deep_teaching_commons.data.fundamentals.mnist import Mnist
Data
We load the MNIST dataset. Have a look at the data structure that is required to feed data into the framework. A batch is a 4d tensor with the axes (image_i, channel, height, width).
# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data/MNIST')
# load all data: labels are not one-hot encoded (one_hot_enc=False), images are not flattened (flatten=False) and pixel values are normalized to [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(flatten=False, one_hot_enc=False, normalized=True)
print(train_images.shape, train_labels.shape)
# reshape to match general framework architecture
train_images, test_images = train_images.reshape(60000, 1, 28, 28), test_images.reshape(10000, 1, 28, 28)
print(train_images.shape, train_labels.shape)
# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]
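You can check the reshaping and shuffling steps above without downloading MNIST. The sketch below uses random stand-in arrays. The names `images` and `labels` are hypothetical and are not the real MNIST data. It shows that reshaping a stack of 28x28 images to `(N, 1, 28, 28)` only adds a channel axis. It also shows that indexing images and labels with the same permutation keeps every image paired with its label.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for MNIST: 6 "images" of 28x28 pixels in [0, 1]
images = rng.random((6, 28, 28))
labels = np.arange(6)  # label i belongs to image i

# Add a channel axis: (N, 28, 28) -> (N, channel, height, width)
batch = images.reshape(images.shape[0], 1, 28, 28)
print(batch.shape)  # (6, 1, 28, 28)

# Shuffle images and labels with the SAME permutation
perm = rng.permutation(images.shape[0])
shuffled_batch, shuffled_labels = batch[perm], labels[perm]

# Each image is still paired with its own label after shuffling
for img, lbl in zip(shuffled_batch, shuffled_labels):
    assert np.array_equal(img[0], images[lbl])
```

If you shuffled the two arrays with two independent permutations, the loop above would fail, and training would see random labels.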
Towards a Neural Network Framework
To create custom models you have to be able to define layers and activation functions in a modular way. Layers and activation functions are therefore modelled as objects. Each object that you want to use has to implement a forward and a backward method, which are used later by the NeuralNetwork class. Additionally, the self.params attribute is mandatory to meet the specification of the NeuralNetwork class. It stores all learnable parameters that the optimization algorithm needs. Implemented that way, the objects become building blocks that you can stack to create a custom model. Use an activation function after each layer except the last one, because the softmax function is applied to the network output by default during the loss calculation.
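To make the building-block interface concrete, here is a minimal sketch of a hypothetical `Scale` layer. It is not part of the framework. It multiplies its input by one learnable scalar and exposes `forward`, `backward` (returning the input gradient and a list of parameter gradients) and `params`. It also shows why learnable parameters should be NumPy arrays. The SGD update adds to each entry of `params` in place, and `+=` on an array modifies the same object the layer holds. A plain Python float would be rebound instead, and the layer would never see the update.

```python
import numpy as np

class Scale(object):
    ''' Hypothetical example layer (not part of the framework):
    multiplies its input by a single learnable scalar a. '''
    def __init__(self):
        # learnable parameters are NumPy arrays, so in-place updates reach the layer
        self.a = np.ones(1)
        self.params = [self.a]

    def forward(self, X):
        self.X = X                      # cache the input for the backward pass
        return self.a * X

    def backward(self, dout):
        dX = self.a * dout              # gradient w.r.t. the input, passed to the previous layer
        da = np.array([np.sum(dout * self.X)])  # gradient w.r.t. a, same shape as a
        return dX, [da]                 # gradients in the same order as self.params

layer = Scale()
X = np.array([[1.0, 2.0], [3.0, 4.0]])
out = layer.forward(X)
dX, grads = layer.backward(np.ones_like(out))

# an in-place SGD step in the style of the framework's optimizer
learning_rate = 0.1
for k in range(len(grads)):
    layer.params[k] += -learning_rate * grads[k]
print(layer.a)  # [0.]
```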
After completing this notebook you can move the implemented functions to the script files for further development. The framework consists of the following files:
layer.py
activation_func.py
network.py
cost_func.py
optimizer.py
utils.py
After working through the notebook, it should become clear which functionality belongs in which file.
Exercise: Define Layers
The first layers added to the framework are a flatten layer and a fully-connected layer, which we need to build the architecture of a fully connected network. (Side note: depending on the framework, the term dense layer is sometimes used instead of fully-connected.)
All kinds of neural network layers, including regularization techniques that can be inserted as layers into an architecture, will later be implemented in the file layer.py.
Task:
Implement the methods FullyConnected.forward and FullyConnected.backward.
class Flatten(object):
    ''' Flatten layer used to reshape inputs into vector representation
    Layer should be used in the forward pass before a dense layer to
    transform a given tensor into a vector.
    '''
    def __init__(self):
        self.params = []
    def forward(self, X):
        ''' Reshapes an n-dim representation into a vector
        by preserving the number of input rows.
        Examples:
            [10000,[1,28,28]] -> [10000,784]
        '''
        self.X_shape = X.shape
        self.out_shape = (self.X_shape[0], -1)
        out = X.reshape(self.out_shape)
        return out
    def backward(self, dout):
        ''' Restore dimensions before flattening operation
        '''
        out = dout.reshape(self.X_shape)
        return out, []
class FullyConnected(object):
    ''' Fully connected layer implementing the linear function hypothesis
    in the forward pass and its derivative in the backward pass.
    '''
    def __init__(self, in_size, out_size):
        ''' Initialize all learnable parameters in the layer.
        Weights are initialized with a modified Xavier initialization
        (scaled by sqrt(2 / in_size), also known as He initialization).
        Biases are initialized with zero.
        '''
        self.W = np.random.randn(in_size, out_size) * np.sqrt(2. / in_size)
        self.b = np.zeros((1, out_size))
        self.params = [self.W, self.b]
    def forward(self, X):
        ''' Linear combination of images, weights and bias terms
        Args:
            X: Matrix of images (flattened representation)
        Returns:
            out: X*W+b
        '''
        self.X = X
        ############################################
        #                   TODO                   #
        ############################################
        raise NotImplementedError("Your task...")
        # out =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return out
    def backward(self, dout):
        ''' Backward pass: compute the gradients of the linear function
        Args:
            dout: Derivative of the loss with respect to the output of this layer
        Returns:
            dX : Derivative with respect to X
            dW : Derivative with respect to W
            db : Derivative with respect to b
        '''
        ############################################
        #                   TODO                   #
        ############################################
        raise NotImplementedError("Your task...")
        # dX =
        # dW =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        # keepdims so that db has the same shape (1, out_size) as b
        db = np.sum(dout, axis=0, keepdims=True)
        return dX, [dW, db]
Testing
Once you've connected many types of layers in a network and you notice an error in your training, it can be difficult to track down which layer has a buggy implementation. Since you're implementing each layer in a modular fashion, you can also test each one individually. So it's good practice to write tests for each of your layers at this point already.
There are properties you know should hold true about the input and output of your layer. In the FullyConnected layer, you may want to test:
- In the forward pass: which shape should the return value have?
- In the backward pass: which shapes should the derivatives dX, dW and db have?
- In the backward pass: which shape do you expect from the argument dout?
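Beyond shape checks, a common way to test a backward pass is a numerical gradient check. The sketch below assumes only the forward/backward interface described earlier. `numerical_input_grad` is a hypothetical helper, and `Square` is a hypothetical stand-in layer (element-wise x**2) used only so the example runs on its own. The helper estimates the gradient of the scalar sum(forward(X) * dout) by central differences. That scalar's gradient with respect to the output is exactly dout, so its gradient with respect to X is what backward(dout) should return. For FullyConnected, you can apply the same idea to dW and db by perturbing entries of layer.W and layer.b in place instead of X.

```python
import numpy as np

def numerical_input_grad(layer, X, dout, eps=1e-6):
    ''' Hypothetical helper: central-difference estimate of
    d(sum(layer.forward(X) * dout)) / dX '''
    grad = np.zeros_like(X)
    for idx in np.ndindex(X.shape):
        old = X[idx]
        X[idx] = old + eps
        plus = np.sum(layer.forward(X) * dout)
        X[idx] = old - eps
        minus = np.sum(layer.forward(X) * dout)
        X[idx] = old                     # restore the original value
        grad[idx] = (plus - minus) / (2 * eps)
    return grad

class Square(object):
    ''' Hypothetical test layer (element-wise x**2), used only to exercise the checks '''
    def __init__(self):
        self.params = []
    def forward(self, X):
        self.X = X
        return X ** 2
    def backward(self, dout):
        return 2 * self.X * dout, []

rng = np.random.default_rng(1)
X = rng.standard_normal((4, 3))
layer = Square()
out = layer.forward(X)
dout = rng.standard_normal(out.shape)   # dout has the shape of the layer output
dX, _ = layer.backward(dout)

# Shape test: the input gradient has the shape of the input
assert dX.shape == X.shape
# Value test: analytic and numerical gradients agree
num = numerical_input_grad(layer, X, dout)
print(np.allclose(dX, num, rtol=1e-5, atol=1e-6))  # True
```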
Exercise: Define Activation Function
First, remember that activation functions are non-linearities added to your architecture. As an example, the classic ReLU function is defined as:
$$ f(x) = \begin{cases} x & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} $$
The ReLU function matches the current weight initialization in the fully-connected layer: the sqrt(2 / in_size) scaling was designed for ReLU networks. Note that the initialization may have to be changed if you implement other activation functions.
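For the backward pass, the chain rule multiplies the incoming gradient dout element-wise with the local derivative of ReLU. The derivative is undefined at exactly x = 0. The formulation below follows the common convention of using 0 there, which is consistent with the "otherwise" branch of f:

$$ f'(x) = \begin{cases} 1 & \text{if } x > 0 \\ 0 & \text{otherwise} \end{cases} \qquad\Longrightarrow\qquad \frac{\partial L}{\partial X} = \text{dout} \odot \mathbb{1}[X > 0] $$

Here ⊙ is the element-wise product. 1[X > 0] is 1 wherever the input saved in the forward pass was positive and 0 elsewhere. Because ReLU has no learnable parameters, its list of parameter gradients stays empty.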
Strictly speaking, activation functions belong in layer.py. For the sake of clarity, however, they are put into a separate file, activation_func.py.
Task:
Implement the ReLU class.
class ReLU(object):
    ''' Implements activation function rectified linear unit (ReLU)
    ReLU activation function is defined as the positive part of
    its argument.
    (paper: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.165.6419&rep=rep1&type=pdf)
    '''
    def __init__(self):
        self.params = []
    def forward(self, X):
        ''' In the forward pass return the identity for x > 0
        Save the input for backprop and forward all values that are above 0.
        '''
        self.X = X
        ############################################
        #                   TODO                   #
        ############################################
        raise NotImplementedError("Your task...")
        # return
        ############################################
        #             END OF YOUR CODE             #
        ############################################
    def backward(self, dout):
        ''' Derivative of ReLU
        Returns:
            dX: equals dout where x > 0 in the forward pass, 0 elsewhere
            []: no gradients on ReLU operation (no learnable parameters)
        '''
        dX = dout.copy()
        ############################################
        #                   TODO                   #
        ############################################
        raise NotImplementedError("Your task...")
        # dX =
        ############################################
        #             END OF YOUR CODE             #
        ############################################
        return dX, []
NeuralNetwork Class
A NeuralNetwork object connects all layers and activation functions of a model architecture using the forward and backward methods of the contained objects. Calling forward on the NeuralNetwork object passes a given input through the whole computational graph. The backward function calculates the gradients via backpropagation.
During initialization, it also creates a global list of all parameters in the network, which the optimization process uses later.
A predict function runs a forward pass and then applies the given score function to the output. At the moment it is suited for the softmax function: it returns the index of the highest score (the argmax) for each sample.
class NeuralNetwork(object):
    ''' Creates a neural network from a given layer architecture
    This class is suited for fully-connected and convolutional
    neural network architectures. It connects the layers
    and passes the data from one end to another.
    '''
    def __init__(self, layers, score_func=None):
        ''' Set up a global parameter list and initialize a
        score function that is used for predictions.
        Args:
            layers: neural network architecture based on layer and activation function objects
            score_func: function that is used as classifier on the output
        '''
        self.layers = layers
        self.params = []
        for layer in self.layers:
            self.params.append(layer.params)
        self.score_func = score_func
    def forward(self, X):
        ''' Pass input X through all layers in the network
        '''
        for layer in self.layers:
            X = layer.forward(X)
        return X
    def backward(self, dout):
        ''' Backprop through the network and keep a list of the gradients
        from each layer (ordered from the last layer to the first).
        '''
        grads = []
        for layer in reversed(self.layers):
            dout, grad = layer.backward(dout)
            grads.append(grad)
        return grads
    def predict(self, X):
        ''' Run a forward pass and use the score function to classify
        the output.
        '''
        X = self.forward(X)
        # without a score function, use the raw outputs (softmax does not change the argmax)
        scores = self.score_func(X) if self.score_func is not None else X
        return np.argmax(scores, axis=1)
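Because backward walks the layers in reverse, the gradient list comes out last layer first, while self.params is built first layer first. That is why the SGD optimizer pairs network.params with reversed(grads). The sketch below makes the ordering visible with a hypothetical pass-through `Tag` layer whose "parameters" and "gradients" are just name strings. It repeats the two loops from `__init__` and `backward` inline so that it runs on its own.

```python
import numpy as np

class Tag(object):
    ''' Hypothetical pass-through layer that labels its "gradient" with its name '''
    def __init__(self, name):
        self.name = name
        self.params = [name]          # stand-in parameter list, NOT a real array
    def forward(self, X):
        return X
    def backward(self, dout):
        return dout, [self.name]      # stand-in gradient list

layers = [Tag('first'), Tag('second'), Tag('third')]
params = [layer.params for layer in layers]     # same order as NeuralNetwork.__init__

grads = []
dout = np.ones((2, 3))
for layer in reversed(layers):                  # same loop as NeuralNetwork.backward
    dout, grad = layer.backward(dout)
    grads.append(grad)

print(grads)  # [['third'], ['second'], ['first']]
print(list(zip(params, reversed(grads))))
# [(['first'], ['first']), (['second'], ['second']), (['third'], ['third'])]
```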
Exercise: Define a Cost Function
Implementations of different cost functions should be placed into cost_func.py. A cost function object defines the criterion by which your network is evaluated during the optimization process. Furthermore, the class contains score functions that can be used as classification criteria for predictions with a given model. You therefore have to provide a cost function to the optimization algorithm for the training process.
Task:
Implement the softmax method.
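The docstring of softmax asks for a numerically stable calculation. This is possible because subtracting the same constant c from all scores of one sample does not change the result: the factor e^{-c} cancels.

$$ \operatorname{softmax}(x)_i = \frac{e^{x_i}}{\sum_j e^{x_j}} = \frac{e^{-c}\, e^{x_i}}{e^{-c} \sum_j e^{x_j}} = \frac{e^{x_i - c}}{\sum_j e^{x_j - c}} $$

Choosing c = max_j x_j makes every exponent at most 0. As a result, the exponentials cannot overflow (in float64, exp overflows for arguments above about 709.78). The denominator is also at least 1, because its largest term is e^0 = 1. For a batch, c is taken separately for each row, i.e. for each sample.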
class CostCriteria(object):
    ''' Implements different types of loss and score functions for neural networks
    Todo:
        - Implement init that defines score and loss function
    '''
    @staticmethod
    def softmax(X):
        ''' Numerically stable calculation of softmax
        '''
        ############################################
        #                   TODO                   #
        ############################################
        raise NotImplementedError("Your task...")
        # return
        ############################################
        #             END OF YOUR CODE             #
        ############################################
    @staticmethod
    def cross_entropy_softmax(X, y):
        ''' Computes loss and prepares dout for backprop
        https://eli.thegreenplace.net/2016/the-softmax-function-and-its-derivative/
        '''
        m = y.shape[0]
        p = CostCriteria.softmax(X)
        log_likelihood = -np.log(p[range(m), y])
        loss = np.sum(log_likelihood) / m
        dout = p.copy()
        dout[range(m), y] -= 1
        return loss, dout
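The gradient prepared by cross_entropy_softmax follows from the chain rule through softmax and the logarithm. Let z_i be the scores of sample i, p_i = softmax(z_i) its predicted distribution, and y_i its true class. The per-sample loss and its gradient are then:

$$ \ell_i = -\log p_{i, y_i}, \qquad \frac{\partial \ell_i}{\partial z_{ij}} = p_{ij} - \mathbb{1}[j = y_i] $$

For the mean loss over a batch of m samples, L = (1/m) Σ_i ℓ_i, this gives:

$$ \frac{\partial L}{\partial z_{ij}} = \frac{1}{m}\left(p_{ij} - \mathbb{1}[j = y_i]\right) $$

As written, the code divides loss by m but returns dout = p - onehot(y) without the 1/m factor. So dout is the gradient of the summed loss, which is m times the mean loss, and with a fixed learning_rate the effective step size grows with batch_size. If you divided dout by m to match the reported mean loss, you would have to multiply learning_rate by m to keep the same step size.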
Testing
Softmax turns each row (each sample) into a probability distribution over the output classes. So you may want to test:
- whether the return value contains the same number of samples (rows) as the input
- whether each row is a valid probability distribution, i.e. all values of the return value lie in [0, 1] and each row sums to 1.
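The two properties above translate directly into assertions. The sketch below wraps them in a hypothetical helper, `check_probability_rows`, and tries it on hand-made matrices. Once your softmax works, you might call it as check_probability_rows(CostCriteria.softmax(scores), scores.shape[0]). Include scores with very large values (e.g. 1000) to exercise numerical stability: NaN values also fail the range check.

```python
import numpy as np

def check_probability_rows(P, n_samples, atol=1e-8):
    ''' Hypothetical test helper for the softmax properties listed above '''
    assert P.ndim == 2 and P.shape[0] == n_samples, "one row per sample"
    assert np.all(P >= 0) and np.all(P <= 1), "values must lie in [0, 1]"
    assert np.allclose(P.sum(axis=1), 1.0, atol=atol), "each row must sum to 1"

# A valid hand-made distribution passes ...
check_probability_rows(np.array([[0.2, 0.8], [0.5, 0.5]]), n_samples=2)

# ... while an unnormalized row is rejected
try:
    check_probability_rows(np.array([[0.2, 0.9]]), n_samples=1)
    rejected = False
except AssertionError as err:
    rejected = True
    print(err)  # each row must sum to 1
```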
Optimization with SGD
The file optimizer.py contains implementations of optimization algorithms. To optimize your model, your optimizer needs your custom network, the data, the loss function and some additional hyperparameters as arguments.
class Optimizer(object):
    @staticmethod
    def get_minibatches(X, y, batch_size):
        ''' Decomposes data set into small subsets (batches)
        '''
        m = X.shape[0]
        batches = []
        for i in range(0, m, batch_size):
            X_batch = X[i:i + batch_size, :, :, :]
            y_batch = y[i:i + batch_size]
            batches.append((X_batch, y_batch))
        return batches
    @staticmethod
    def sgd(network, X_train, y_train, cost_function, batch_size=32, epoch=100, learning_rate=0.001, X_test=None, y_test=None, verbose=None):
        ''' Optimize a given network with stochastic gradient descent
        Args:
            network: NeuralNetwork object to optimize
            X_train: training data
            y_train: training labels (ground truth)
            cost_function: cost function
            batch_size: size of a single batch
            epoch: number of epochs
            learning_rate: the rate which is going to be multiplied with the gradient
            X_test: test data if you want to test your model in each epoch
            y_test: test labels
            verbose: if set it prints out the loss, training accuracy and test accuracy
        Returns:
            Model with optimized parameters
        '''
        minibatches = Optimizer.get_minibatches(X_train, y_train, batch_size)
        for i in range(epoch):
            loss = 0
            if verbose:
                print('Epoch', i)
            for X_mini, y_mini in minibatches:
                # calculate loss and derivative of the last layer
                batch_loss, dout = cost_function(network.forward(X_mini), y_mini)
                # average the minibatch losses over the epoch
                loss += batch_loss / len(minibatches)
                # Do not train in epoch 0, so we know the performance before training
                if i > 0:
                    # calculate gradients via backpropagation
                    grads = network.backward(dout)
                    # run vanilla sgd update for all learnable parameters in network.params
                    for param, grad in zip(network.params, reversed(grads)):
                        # use k, not i, so the epoch counter i is not overwritten
                        for k in range(len(grad)):
                            param[k] += - learning_rate * grad[k]
            if verbose:
                train_acc = np.mean(y_train == network.predict(X_train))
                test_acc = np.mean(y_test == network.predict(X_test)) if X_test is not None else 'n/a'
                print("Loss = {0} :: Training = {1} :: Test = {2}".format(loss, train_acc, test_acc))
        return network
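get_minibatches slices the data in order. When the number of samples is not a multiple of batch_size, the last batch is smaller, and the same batches are reused in every epoch (the training data is shuffled only once, when it is loaded). A common variant reshuffles the data at the start of each epoch. The generator below is a hypothetical sketch of that variant and is not part of the framework files. In sgd you would call it inside the epoch loop instead of building minibatches once before it.

```python
import numpy as np

def iterate_minibatches(X, y, batch_size, rng):
    ''' Hypothetical variant of get_minibatches: new random order on every call '''
    perm = rng.permutation(X.shape[0])
    for start in range(0, X.shape[0], batch_size):
        idx = perm[start:start + batch_size]
        yield X[idx], y[idx]

rng = np.random.default_rng(0)
X = np.zeros((10, 1, 28, 28))
y = np.arange(10)

sizes = [len(y_b) for _, y_b in iterate_minibatches(X, y, 4, rng)]
print(sizes)  # [4, 4, 2]

# every sample still appears exactly once per epoch
seen = np.concatenate([y_b for _, y_b in iterate_minibatches(X, y, 4, rng)])
print(sorted(seen.tolist()) == list(range(10)))  # True
```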
Put it All Together
Now you have to put all parts together to create and train a fully connected neural network. First, you have to define an individual network architecture by flattening the input and stacking fully-connected layers with activation functions, e.g.:
Input -> Flatten -> Dense -> Activation -> Dense -> Activation -> Dense -> Activation -> Dense
You have to initialize all objects you need to build your custom architecture and then put them into a list. Your architecture is then given to a NeuralNetwork object, which handles the inter-layer communication during the forward and backward pass. The softmax is not part of the layer list: the score function applies it in predict, and the cost function applies it during training. Because of that, your architecture ends with a fully-connected layer. The pipeline above is implemented in the following cell.
Finally, you can train the model with an optimization algorithm and a cost function, here stochastic gradient descent and cross-entropy with softmax. This kind of pipeline is similar to the one you would create with a more sophisticated framework like TensorFlow or PyTorch.
# design a three hidden layer architecture with Dense-Layer
# and ReLU as activation function
def fcn_mnist():
    flat = Flatten()
    hidden_01 = FullyConnected(784, 500)
    relu_01 = ReLU()
    hidden_02 = FullyConnected(500, 200)
    relu_02 = ReLU()
    hidden_03 = FullyConnected(200, 100)
    relu_03 = ReLU()
    output = FullyConnected(100, 10)
    return [flat, hidden_01, relu_01, hidden_02, relu_02, hidden_03, relu_03, output]
# create a neural network on specified architecture with softmax as score function
fcn = NeuralNetwork(fcn_mnist(), score_func=CostCriteria.softmax)
# optimize the network with SGD and the softmax cross-entropy loss
fcn = Optimizer.sgd(fcn, train_images, train_labels, CostCriteria.cross_entropy_softmax, batch_size=64, epoch=10, learning_rate=0.01, X_test=test_images, y_test=test_labels, verbose=True)
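To get a feeling for the model size, the sketch below counts the learnable parameters of fcn_mnist(). It only uses the layer sizes from the cell above. Flatten and ReLU contribute nothing because their params lists are empty. A batch of 64 images moves through the shapes (64, 1, 28, 28) -> (64, 784) -> (64, 500) -> (64, 200) -> (64, 100) -> (64, 10).

```python
# Layer sizes of fcn_mnist(): 784 -> 500 -> 200 -> 100 -> 10
sizes = [784, 500, 200, 100, 10]

total = 0
for n_in, n_out in zip(sizes[:-1], sizes[1:]):
    n_params = n_in * n_out + n_out   # weights W (n_in x n_out) plus bias b (1 x n_out)
    print(f"FullyConnected({n_in}, {n_out}): {n_params} parameters")
    total += n_params
print("total:", total)  # total: 513810
```

Most of the parameters (392500) sit in the first layer, so the input size of the first FullyConnected layer has the biggest effect on model size.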
Exercise: Experiment with the Framework
Here is your last exercise for this notebook:
Now you have a basic idea of how to build a fully-connected neural network with the framework. The next steps are straightforward. Download all script files of the framework from Moodle or the exercise repository and move the implemented functions into the correct script files:
layer.py
activation_func.py
cost_func.py
Afterwards, choose a dataset you like and create a data loader in the script file utils.py. Load your data, build a neural network and try to build a good classifier. Have fun!
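Whatever dataset you choose, the loader has to return images in the framework's 4d layout (image, channel, height, width). It also has to return integer class labels rather than one-hot vectors, because cross_entropy_softmax indexes p[range(m), y]. Many image datasets are stored channels-last, (N, H, W, C). The helper below is a hypothetical sketch for utils.py that converts such arrays. Its name and the assumption of uint8 pixel values in [0, 255] are illustrative only.

```python
import numpy as np

def to_framework_layout(images, labels):
    ''' Hypothetical helper: channels-last uint8 images -> (N, C, H, W) floats in [0, 1] '''
    if images.ndim == 3:                      # grayscale (N, H, W): add a channel axis
        images = images[:, :, :, np.newaxis]
    images = images.transpose(0, 3, 1, 2)     # (N, H, W, C) -> (N, C, H, W)
    images = images.astype(np.float64) / 255.0
    labels = np.asarray(labels).astype(np.int64).ravel()  # integer class indices
    return images, labels

rng = np.random.default_rng(0)
rgb = rng.integers(0, 256, size=(5, 32, 32, 3), dtype=np.uint8)
X, y = to_framework_layout(rgb, [0, 1, 2, 1, 0])
print(X.shape, y.shape)  # (5, 3, 32, 32) (5,)
```

With such data, Flatten produces 3 * 32 * 32 = 3072 features, so the first FullyConnected layer would need in_size=3072 instead of 784.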
Licenses
Notebook License (CC-BY-SA 4.0)
The following license applies to the complete notebook, including code cells. It does however not apply to any referenced external media (e.g., images).
Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
Based on a work at https://gitlab.com/deep.TEACHING.
Code License (MIT)
The following license only applies to code cells of the notebook.
Copyright 2018 Benjamin Voigt
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.