Neural Networks - Exercise: Simple MNIST Network

Introduction

In this exercise, you will analyze an existing implementation of a neural network that recognises handwritten digits. It is an adaptation of the network presented by Michael Nielsen in his book Neural Networks and Deep Learning. The aim is to identify the different components that make up the network and to understand how they interact. With this overview in mind, you will then refactor the script-like approach into a more modular implementation that clearly separates the forward pass, the loss computation, the backward pass and the training loop.

Requirements

Knowledge

Familiarity with Python and NumPy is assumed. Reading the first two chapters of Michael Nielsen's Neural Networks and Deep Learning is recommended, though not necessary (see the remark in the Simple MNIST Network section below).

Python-Modules

# third party
import numpy as np
import matplotlib.pyplot as plt

# internal
from deep_teaching_commons.data.fundamentals.mnist import Mnist

Data

# create mnist loader from deep_teaching_commons
mnist_loader = Mnist(data_dir='data')

# load all data; labels are one-hot encoded, images are flattened and pixel values squashed into [0, 1]
train_images, train_labels, test_images, test_labels = mnist_loader.get_all_data(one_hot_enc=True, normalized=True)

# shuffle training data
shuffle_index = np.random.permutation(60000)
train_images, train_labels = train_images[shuffle_index], train_labels[shuffle_index]
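
If you want to convince yourself that the data arrived in the expected format, a quick sanity check like the one below can help (the printed shapes assume the standard MNIST split of 60,000 training and 10,000 test images):

print(train_images.shape, train_labels.shape)  # expected: (60000, 784) (60000, 10)
print(test_images.shape, test_labels.shape)    # expected: (10000, 784) (10000, 10)
print(train_images.min(), train_images.max())  # pixel values squashed into [0, 1]
print(train_labels[0])                         # a one-hot vector of length 10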

Simple MNIST Network

The presented network is an adaptation of Michael Nielsen's introductory example to neural networks. It is recommended, though not necessary, to read the first two chapters of his great online book 'Neural Networks and Deep Learning' for a better understanding of the given example. Compared to Nielsen's original, the present variant is vectorized and the sigmoid activation function is replaced by a rectified linear unit (ReLU). As a result, the code is much more compact and the optimization of the model considerably more efficient.
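
To make the swap explicit: Nielsen's original applies the sigmoid function at every layer, while the variant below clamps negative pre-activations to zero. A minimal side-by-side sketch (the helper names sigmoid and relu are ours and do not appear in the code below):

def sigmoid(z):
    # Nielsen's original activation: squashes every value into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

def relu(z):
    # activation used in this variant: identity for positive inputs, zero otherwise
    return np.maximum(z, 0)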

delta_hist = []

def feed_forward(X, weights):
    a = [X]
    for w in weights:
        a.append(np.maximum(a[-1].dot(w),0))
    return a

def grads(X, Y, weights):
    grads = np.empty_like(weights)
    a = feed_forward(X, weights)
    # https://brilliant.org/wiki/backpropagation/ or https://stats.stackexchange.com/questions/154879/a-list-of-cost-functions-used-in-neural-networks-alongside-applications
    delta = a[-1] - Y
    delta_hist.append(np.sum(delta*Y)/len(X))
    grads[-1] = a[-2].T.dot(delta)
    for i in range(len(a)-2, 0, -1):
        delta = (a[i] > 0) * delta.dot(weights[i].T)
        grads[i-1] = a[i-1].T.dot(delta)
    return grads / len(X)

trX, trY, teX, teY = train_images, train_labels, test_images, test_labels
weights = [np.random.randn(*w) * 0.1 for w in [(784, 200), (200,100), (100, 10)]]
num_epochs, batch_size, learn_rate = 20, 50, 0.1
for i in range(num_epochs):
    for j in range(0, len(trX), batch_size):
        X, Y = trX[j:j+batch_size], trY[j:j+batch_size]
        weights -= learn_rate * grads(X, Y, weights)
    prediction_test = np.argmax(feed_forward(teX, weights)[-1], axis=1)
    print(i, np.mean(prediction_test == np.argmax(teY, axis=1)))
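
Once training has finished, it can be instructive to inspect individual predictions. The following sketch reshapes a flattened test image back into its 28x28 grid and compares the network's prediction with the true label (the index 0 is arbitrary):

idx = 0
image = teX[idx].reshape(28, 28)                                   # undo the flattening
prediction = np.argmax(feed_forward(teX[idx:idx+1], weights)[-1])  # index of the largest output
actual = np.argmax(teY[idx])                                       # decode the one-hot label

plt.imshow(image, cmap='gray')
plt.title('predicted: {}, actual: {}'.format(prediction, actual))
plt.show()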

Exercise - Understanding an Implementation

Your goal is to understand how the implementation works. To get there, work through the following tasks:

Task:

  • Plot delta_hist, which stores the delta value calculated on the output layer during each iteration (a minimal plotting sketch is given after this list).
  • Add comments to functions and lines of code. Follow the Google Python style guide or a similar convention for comments.
  • Add a boolean argument verbose to the function. When set to True, print meaningful information about the network during execution.
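
As a starting point for the first task, a minimal plotting sketch (the axis labels are merely a suggestion):

plt.plot(delta_hist)
plt.xlabel('iteration (mini-batch update)')
plt.ylabel('recorded delta value')
plt.show()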

Answer the Questions

After working through the implementation, try and answer the following questions:

  1. Which cost function is used, what is its derivative and how is it implemented?
  2. Why do the values in your plot lie between [-1, 0], why is the curve so noisy, how can you reduce the noise, and how does the plot differ from a typical plot of a loss function?
  3. How does the network implement the backpropagation algorithm?

Exercise - Step towards a NN-Framework

The presented implementation is compact and efficient, but hard to modify or extend. A modular design, however, is crucial if you want to experiment with a neural network and understand the influence of its components. You will now make the first changes towards your own 'toy neural-network framework', which you will expand as the course progresses.

Rework the implementation above using the classes and methods below. You do not have to re-engineer the whole neural network in this step; restructure the code to match the given specification and make only the necessary modifications. Feel free to rename variables to more descriptive ones if that helps your understanding.

class FullyConnectedNetwork:
    def __init__(self, layers):
        raise NotImplementedError("This is your duty")
        
    def forward(self, data):
        raise NotImplementedError("This is your duty")

    def backward(self, X, Y):
        raise NotImplementedError("This is your duty")

    def predict(self, data):
        raise NotImplementedError("This is your duty")
            
class Optimizer:
    def __init__(self, network, train_data, train_labels, test_data=None, test_labels=None, epochs=100, batch_size=20, learning_rate=0.01):
        raise NotImplementedError("This is your duty")
        
    def sgd(self):
        raise NotImplementedError("This is your duty")

    
# The following code should run (note that this implies training is triggered when the Optimizer is constructed):
mnist_NN = FullyConnectedNetwork([(784, 200),(200,100),(100, 10)]) 
epochs, batch_size, learning_rate = 20, 500, 0.1
Optimizer(mnist_NN, train_images, train_labels, test_images, test_labels, epochs, batch_size, learning_rate)
plt.plot(mnist_NN.delta_hist)

Licenses

Literature

Michael A. Nielsen, Neural Networks and Deep Learning, Determination Press, 2015. Available online at http://neuralnetworksanddeeplearning.com/

Notebook License (CC-BY-SA 4.0)

The following license applies to the complete notebook, including code cells. It does not, however, apply to any referenced external media (e.g., images).

Neural Networks - Exercise: Simple MNIST Network
by Benjamin Voigt
is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.

Code License (MIT)

The following license only applies to code cells of the notebook.

Copyright 2018 Benjamin Voigt

Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.