Over the past few weeks, I have been experimenting with the latest and greatest deep learning frameworks, all written in Python, to decide which one I could dive into and become an expert in. After looking at Hebel, Keras, Chainer, and Lasagne, I decided to go with Lasagne because of the documentation and tutorials available online. The other frameworks are great; it just seemed that Lasagne currently has the most tutorials and the best docs.

In this blog post, I am going to show you how to use the Lasagne framework to train a neural network on the MNIST database. In later blog posts, I am going to use Lasagne to solve a variety of deep learning problems in natural language processing and computer vision.

All of my work was done on a g2.8xlarge GPU instance rented on Amazon AWS, using the following AMI: ami-55deaf30.

To start, I need to load the MNIST database into a format that Lasagne accepts, which happens to be NumPy arrays. Luckily, I did not need to write much code here because mnielsen did most of the work for me. I did end up writing a single method that uses his code:

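def load(testing=False):
    # Return the 50,000 MNIST training images as (X, y) float32 arrays.
    tr_d, va_d, te_d = load_data()
    if not testing:
        training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
        training_results = [vectorized_result(y) for y in tr_d[1]]
        X, y = np.array(training_inputs), np.array(training_results)
        X = np.reshape(X, (50000, 784))  # one flattened image per row
        y = np.reshape(y, (50000, 10))   # one one-hot label per row
        X = X.astype(np.float32)
        y = y.astype(np.float32)
        return (X, y)
    else:
        # Loading the test split is not implemented yet.
        raise NotImplementedError("loading the test split is not implemented")

For reference, here is the complete loader module: mnielsen's code with my load method added.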
    
"""
mnist_loader
~~~~~~~~~~~~

A library to load the MNIST image data. For details of the data
structures that are returned, see the doc strings for ``load_data``
and ``load_data_wrapper``. In practice, ``load_data_wrapper`` is the
function usually called by our neural network code.
"""


#### Libraries
# Standard library
import cPickle
import gzip

# Third-party libraries
import numpy as np

def load_data():
    """Return the MNIST data as a tuple containing the training data,
    the validation data, and the test data.

    The ``training_data`` is returned as a tuple with two entries.
    The first entry contains the actual training images. This is a
    numpy ndarray with 50,000 entries. Each entry is, in turn, a
    numpy ndarray with 784 values, representing the 28 * 28 = 784
    pixels in a single MNIST image.

    The second entry in the ``training_data`` tuple is a numpy ndarray
    containing 50,000 entries. Those entries are just the digit
    values (0...9) for the corresponding images contained in the first
    entry of the tuple.

    The ``validation_data`` and ``test_data`` are similar, except
    each contains only 10,000 images.

    This is a nice data format, but for use in neural networks it's
    helpful to modify the format of the ``training_data`` a little.
    That's done in the wrapper function ``load_data_wrapper()``, see
    below.
    """

    f = gzip.open('./data/mnist.pkl.gz', 'rb')
    training_data, validation_data, test_data = cPickle.load(f)
    f.close()
    return (training_data, validation_data, test_data)

def load_data_wrapper():
    """Return a tuple containing ``(training_data, validation_data,
    test_data)``. Based on ``load_data``, but the format is more
    convenient for use in our implementation of neural networks.

    In particular, ``training_data`` is a list containing 50,000
    2-tuples ``(x, y)``. ``x`` is a 784-dimensional numpy.ndarray
    containing the input image. ``y`` is a 10-dimensional
    numpy.ndarray representing the unit vector corresponding to the
    correct digit for ``x``.

    ``validation_data`` and ``test_data`` are lists containing 10,000
    2-tuples ``(x, y)``. In each case, ``x`` is a 784-dimensional
    numpy.ndarray containing the input image, and ``y`` is the
    corresponding classification, i.e., the digit values (integers)
    corresponding to ``x``.

    Obviously, this means we're using slightly different formats for
    the training data and the validation / test data. These formats
    turn out to be the most convenient for use in our neural network
    code."""

    tr_d, va_d, te_d = load_data()
    training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
    training_results = [vectorized_result(y) for y in tr_d[1]]
    training_data = zip(training_inputs, training_results)
    validation_inputs = [np.reshape(x, (784, 1)) for x in va_d[0]]
    validation_data = zip(validation_inputs, va_d[1])
    test_inputs = [np.reshape(x, (784, 1)) for x in te_d[0]]
    test_data = zip(test_inputs, te_d[1])
    return (training_data, validation_data, test_data)

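def load(testing=False):
    # Return the 50,000 MNIST training images as (X, y) float32 arrays.
    tr_d, va_d, te_d = load_data()
    if not testing:
        training_inputs = [np.reshape(x, (784, 1)) for x in tr_d[0]]
        training_results = [vectorized_result(y) for y in tr_d[1]]
        X, y = np.array(training_inputs), np.array(training_results)
        X = np.reshape(X, (50000, 784))  # one flattened image per row
        y = np.reshape(y, (50000, 10))   # one one-hot label per row
        X = X.astype(np.float32)
        y = y.astype(np.float32)
        return (X, y)
    else:
        # Loading the test split is not implemented yet.
        raise NotImplementedError("loading the test split is not implemented")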

def vectorized_result(j):
    """Return a 10-dimensional unit vector with a 1.0 in the jth
    position and zeroes elsewhere. This is used to convert a digit
    (0...9) into a corresponding desired output from the neural
    network."""

    e = np.zeros((10, 1))
    e[j] = 1.0
    return e
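
The loader above builds the one-hot targets one digit at a time with vectorized_result and then reshapes everything. As a side note, the same layout (one flattened image per row, one one-hot label per row, both float32) can also be produced in a vectorized way. The sketch below is only an illustration on a tiny made-up batch; the helper name to_lasagne_arrays is hypothetical and is not part of mnielsen's module.

```python
import numpy as np


def to_lasagne_arrays(images, digits, num_classes=10):
    """Hypothetical helper: return (X, y) as float32 arrays.

    X has one flattened image per row; y has one one-hot label per row.
    """
    X = np.asarray(images, dtype=np.float32).reshape(len(images), -1)
    # Row j of the identity matrix is the unit vector for digit j.
    y = np.eye(num_classes, dtype=np.float32)[np.asarray(digits)]
    return X, y


# Made-up stand-in data: 3 blank "images" labelled 0, 3 and 9.
fake_images = np.zeros((3, 784))
fake_digits = [0, 3, 9]
X, y = to_lasagne_arrays(fake_images, fake_digits)
print(X.shape, y.shape)
```

Indexing an identity matrix with the label array replaces the per-digit loop, which is mostly a readability choice at MNIST's size.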

With the data in hand, the network itself is short. I use nolearn's NeuralNet wrapper around Lasagne to define a network with a single hidden layer and train it:

from lasagne import layers
from lasagne.updates import nesterov_momentum
from nolearn.lasagne import NeuralNet
import numpy as np

# Assumes the loader module above is saved as mnist_loader.py.
from mnist_loader import load

net1 = NeuralNet(
    layers=[  # three layers: one hidden layer
        ('input', layers.InputLayer),
        ('hidden', layers.DenseLayer),
        ('output', layers.DenseLayer),
    ],
    # layer parameters:
    input_shape=(None, 784),  # 28x28 = 784 input pixels per image; None leaves the batch size open
    hidden_num_units=100,  # number of units in hidden layer
    output_nonlinearity=None,  # output layer uses identity function
    output_num_units=10,  # 10 target values, one per digit

    # optimization method:
    update=nesterov_momentum,
    update_learning_rate=0.01,
    update_momentum=0.9,

    regression=True,  # flag to indicate we're dealing with regression problem
    max_epochs=400,  # we want to train this many epochs
    verbose=1,
)


X, y = load()
net1.fit(X, y)
# Neural Network with 79510 learnable parameters

## Layer information

  #  name      size
---  ------  ------
  0  input      784
  1  hidden     100
  2  output      10

  epoch    train loss    valid loss    train/val  dur
-------  ------------  ------------  -----------  -----
      1       0.06673       0.04498      1.48367  0.48s
      2       0.04156       0.03616      1.14935  0.41s
      3       0.03523       0.03184      1.10660  0.41s
      4       0.03165       0.02910      1.08774  0.41s
      5       0.02921       0.02713      1.07678  0.43s
      6       0.02738       0.02560      1.06937  0.44s
      7       0.02593       0.02437      1.06394  0.44s
      8       0.02475       0.02336      1.05926  0.44s
      9       0.02376       0.02251      1.05548  0.42s
     10       0.02293       0.02179      1.05204  0.42s

......

    391       0.00703       0.00939      0.74809  0.44s
    392       0.00702       0.00939      0.74772  0.41s
    393       0.00702       0.00939      0.74724  0.42s
    394       0.00701       0.00939      0.74683  0.43s
    395       0.00700       0.00938      0.74634  0.42s
    396       0.00700       0.00938      0.74590  0.42s
    397       0.00699       0.00938      0.74554  0.43s
    398       0.00699       0.00938      0.74508  0.45s
    399       0.00698       0.00938      0.74453  0.43s
    400       0.00698       0.00937      0.74412  0.41s
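
The figure of 79510 learnable parameters in the log can be checked by hand: each dense layer has one weight per input/output pair plus one bias per output unit. The short sketch below does that arithmetic for the 784-100-10 layout; the function name dense_params is just an illustrative helper, not something from Lasagne or nolearn.

```python
def dense_params(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


hidden = dense_params(784, 100)  # 78400 weights + 100 biases
output = dense_params(100, 10)   # 1000 weights + 10 biases
total = hidden + output
print(total)  # 79510
```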

TODO - add evaluation stuff here …
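
Because the network is trained with regression=True on one-hot targets and has an identity output layer, each prediction is a row of 10 real values rather than a digit. One common way to score such outputs is to take the index of the largest value in each row as the predicted digit and compare it with the true digit. The sketch below shows only that step, using NumPy and made-up numbers; the predicted array is a stand-in for whatever the trained network would return (for example from net1.predict), and none of the values are real results.

```python
import numpy as np

# Made-up network outputs for 3 images (10 real-valued scores each).
predicted = np.array([
    [0.9, 0.0, 0.1, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
    [0.1, 0.0, 0.0, 0.2, 0.0, 0.0, 0.0, 0.6, 0.0, 0.1],
    [0.0, 0.3, 0.0, 0.0, 0.5, 0.0, 0.0, 0.0, 0.2, 0.0],
], dtype=np.float32)

# One-hot targets for the (made-up) true digits 0, 7 and 1.
targets = np.eye(10, dtype=np.float32)[[0, 7, 1]]

# The largest score in each row picks the digit.
predicted_digits = predicted.argmax(axis=1)
true_digits = targets.argmax(axis=1)

# Fraction of rows where the predicted digit matches the true digit.
accuracy = float((predicted_digits == true_digits).mean())
print(predicted_digits, true_digits, accuracy)
```

In this toy batch two of the three rows match, so the accuracy is 2/3. On real data the same comparison would be run on held-out images rather than on the training set.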