Neural Networks: Street View Housing Number Digit Recognition¶

Welcome to the project on classification using Artificial Neural Networks. We will work with the Street View House Numbers (SVHN) image dataset for this project.


Context:¶


One of the most interesting tasks in deep learning is to recognize objects in natural scenes. The ability to process visual information using machine learning algorithms can be very useful as demonstrated in various applications.

The SVHN dataset contains over 600,000 labeled digits cropped from street-level photos and is one of the most popular image recognition datasets. It has been used in neural networks created by Google to improve map quality by automatically transcribing address numbers from a patch of pixels. Combined with a known street address, the transcribed number helps pinpoint the location of the building it represents.


Objective:¶


To build a feed-forward neural network model that can recognize the digits in the images.


Dataset¶


Here, we will use a subset of the original data to save computation time. The dataset is provided as a .h5 file, and basic preprocessing has already been applied to it.

Mount the drive¶

Let us start by mounting Google Drive. Run the cell below to mount it.

In [1]:
from google.colab import drive
drive.mount('/content/drive')
Mounted at /content/drive

Importing the necessary libraries¶

In [ ]:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

from sklearn.model_selection import train_test_split
from sklearn.preprocessing import MinMaxScaler

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense, Dropout, Activation, BatchNormalization
from tensorflow.keras.utils import to_categorical

Let us check the version of TensorFlow.

In [ ]:
print(tf.__version__)
2.8.0

Load the dataset¶

  • Let us now load the dataset, which is available as a .h5 file.
  • The file already contains separate train and test sets, so we load both directly.
In [ ]:
import h5py

# Open the file as read-only; the context manager closes it automatically
# Change the path below as required
with h5py.File('/content/drive/MyDrive/SVHN_single_grey1.h5', 'r') as h5f:

    # Load the training and the test set into memory as NumPy arrays
    X_train = h5f['X_train'][:]
    y_train = h5f['y_train'][:]
    X_test = h5f['X_test'][:]
    y_test = h5f['y_test'][:]

Let's check the number of images in the training and the testing dataset.

In [ ]:
len(X_train), len(X_test)
Out[ ]:
(42000, 18000)

Observations:

  • There are 42,000 images in the training data and 18,000 images in the testing data.

Visualizing images¶

  • Use X_train to visualize the first 10 images.
  • Use y_train to print the first 10 labels.
In [ ]:
# Visualizing the first 10 images in the dataset and their labels
plt.figure(figsize=(10, 1))

for i in range(10):
    plt.subplot(1, 10, i+1)
    plt.imshow(X_train[i], cmap="gray")
    plt.axis('off')

plt.show()

print('Labels for the above images: %s' % (y_train[0:10]))
Labels for the above images: [2 6 7 4 4 0 3 0 7 3]

Data preparation¶

  • Print the shape and the array of pixels for the first image in the training dataset.
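  • Reshape the train and the test dataset to flatten each 32 x 32 image into a 1D array of 1,024 values, because the dense layers of a feed-forward network expect a flat feature vector for each sample.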
  • Normalize the train and the test dataset by dividing by 255.
  • Print the new shapes of the train and the test dataset.
  • One-hot encode the target variable.
In [ ]:
# Shape and the array of pixels for the first image
print("Shape:", X_train[0].shape)
print()
print("First image:\n", X_train[0])
Shape: (32, 32)

First image:
 [[ 33.0704  30.2601  26.852  ...  71.4471  58.2204  42.9939]
 [ 25.2283  25.5533  29.9765 ... 113.0209 103.3639  84.2949]
 [ 26.2775  22.6137  40.4763 ... 113.3028 121.775  115.4228]
 ...
 [ 28.5502  36.212   45.0801 ...  24.1359  25.0927  26.0603]
 [ 38.4352  26.4733  23.2717 ...  28.1094  29.4683  30.0661]
 [ 50.2984  26.0773  24.0389 ...  49.6682  50.853   53.0377]]
In [ ]:
# Flatten each 2D image (32 x 32) into a 1D array of 1024 values
X_train = X_train.reshape(X_train.shape[0], 1024)
X_test = X_test.reshape(X_test.shape[0], 1024)
In [ ]:
# Normalize inputs from 0-255 to 0-1
X_train = X_train / 255.0
X_test = X_test / 255.0
In [ ]:
# New shape
print('Training set:', X_train.shape, y_train.shape)
print('Test set:', X_test.shape, y_test.shape)
Training set: (42000, 1024) (42000,)
Test set: (18000, 1024) (18000,)
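
The reshape and scaling steps above can be sketched on a small synthetic batch. The array `images` and its random contents are stand-ins chosen for illustration; they are not the SVHN data.

```python
import numpy as np

# Synthetic stand-in for a batch of 5 grayscale 32x32 images with values in [0, 255]
rng = np.random.default_rng(0)
images = rng.uniform(0, 255, size=(5, 32, 32))

# Flatten each 32x32 image into a 1024-element vector (row-major order)
flat = images.reshape(images.shape[0], -1)

# Scale pixel intensities to the [0, 1] range
scaled = flat / 255.0

print(flat.shape)  # (5, 1024)
print(scaled.min() >= 0.0, scaled.max() <= 1.0)
```

Because the reshape is row-major, elements 32 to 63 of a flattened vector are the second pixel row of the original image, so no pixel values are lost or reordered within a row.
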
In [ ]:
# One-hot encode output
y_train = to_categorical(y_train)
y_test = to_categorical(y_test)

# Inspect the one-hot encoded test labels
y_test
Out[ ]:
array([[0., 1., 0., ..., 0., 0., 0.],
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 1., ..., 0., 0., 0.],
       ...,
       [0., 0., 0., ..., 1., 0., 0.],
       [0., 0., 0., ..., 0., 0., 1.],
       [0., 0., 1., ..., 0., 0., 0.]], dtype=float32)

Observations:

  • Notice that each entry of the target variable is a one-hot encoded vector instead of a single label.
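
As a minimal sketch of what one-hot encoding does, the snippet below builds the same kind of vectors with plain NumPy and then recovers the integer labels with argmax. Indexing into `np.eye` is a swapped-in technique used here for illustration, not how `to_categorical` is implemented.

```python
import numpy as np

labels = np.array([2, 6, 7, 4, 4, 0, 3, 0, 7, 3])  # first 10 training labels shown earlier
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i
one_hot = np.eye(num_classes, dtype=np.float32)[labels]

# argmax along the class axis recovers the original integer labels
recovered = np.argmax(one_hot, axis=-1)

print(one_hot[0])  # [0. 0. 1. 0. 0. 0. 0. 0. 0. 0.]
print(recovered)   # [2 6 7 4 4 0 3 0 7 3]
```

The same argmax step is what later converts the softmax outputs and the one-hot test labels back to single digits.
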

Model Building¶

Now that the data preprocessing is done, let's build an ANN model.

In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random
random.seed(42)

tf.random.set_seed(42)

Model Architecture¶

  • Build a sequential model with the following architecture:

  • First hidden layer with 64 nodes and the relu activation and the input shape = (1024, )

  • Second hidden layer with 32 nodes and the relu activation

  • Output layer with activation as 'softmax' and number of nodes equal to the number of classes, i.e., 10

  • Compile the model with the loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.001), and metric equal to 'accuracy'.

  • Print the summary of the model.

  • Fit on the train data with a validation split of 0.2, batch size = 128, verbose = 1, and epochs = 20. Store the model building history to use later for visualization.
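
To make the architecture above concrete, here is a rough NumPy sketch of the forward pass such a network computes. The weights are random and untrained, and the input batch `x` is hypothetical, so the output probabilities carry no meaning; the point is only the flow of shapes from 1024 to 64 to 32 to 10.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row-wise max for numerical stability
    e = np.exp(z - z.max(axis=1, keepdims=True))
    return e / e.sum(axis=1, keepdims=True)

rng = np.random.default_rng(42)
layer_sizes = [1024, 64, 32, 10]

# Random (untrained) weights and zero biases, one pair per Dense layer
params = [(rng.normal(0, 0.05, size=(n_in, n_out)), np.zeros(n_out))
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

x = rng.uniform(0, 1, size=(4, 1024))  # 4 hypothetical flattened images

h = x
for i, (W, b) in enumerate(params):
    z = h @ W + b
    # ReLU on the hidden layers, softmax on the output layer
    h = softmax(z) if i == len(params) - 1 else relu(z)

print(h.shape)        # (4, 10)
print(h.sum(axis=1))  # each row sums to 1, up to floating-point error
```
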

In [ ]:
# Define the model
from tensorflow.keras import losses
from tensorflow.keras import optimizers

# Create model
model1 = Sequential()

model1.add(Dense(64, activation='relu', input_shape = (1024,)))
model1.add(Dense(32, activation='relu'))
model1.add(Dense(10, activation='softmax'))

# Compile the model
adam = optimizers.Adam(learning_rate=0.001)
model1.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
In [ ]:
# Model summary
model1.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 64)                65600     
                                                                 
 dense_1 (Dense)             (None, 32)                2080      
                                                                 
 dense_2 (Dense)             (None, 10)                330       
                                                                 
=================================================================
Total params: 68,010
Trainable params: 68,010
Non-trainable params: 0
_________________________________________________________________

Observations:

  • The model has 68,010 parameters.
  • All the parameters are trainable.
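
The parameter counts in the summary can be checked by hand: a dense layer has one weight per input-unit pair plus one bias per unit. The helper `dense_params` below is a hypothetical name introduced only for this check.

```python
def dense_params(n_inputs, n_units):
    # One weight per input-unit pair, plus one bias per unit
    return n_inputs * n_units + n_units

layer_sizes = [1024, 64, 32, 10]
counts = [dense_params(n_in, n_out)
          for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])]

print(counts)       # [65600, 2080, 330]
print(sum(counts))  # 68010
```
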
In [ ]:
# Fit the model
history_model_1 = model1.fit(X_train, y_train, validation_split=0.2, epochs=20, batch_size=128, verbose=1)
Epoch 1/20
263/263 [==============================] - 4s 4ms/step - loss: 2.3008 - accuracy: 0.1144 - val_loss: 2.2670 - val_accuracy: 0.1352
Epoch 2/20
263/263 [==============================] - 1s 3ms/step - loss: 2.1307 - accuracy: 0.2299 - val_loss: 1.9434 - val_accuracy: 0.3024
Epoch 3/20
263/263 [==============================] - 1s 3ms/step - loss: 1.8054 - accuracy: 0.3639 - val_loss: 1.6951 - val_accuracy: 0.4092
Epoch 4/20
263/263 [==============================] - 1s 3ms/step - loss: 1.6367 - accuracy: 0.4389 - val_loss: 1.5765 - val_accuracy: 0.4687
Epoch 5/20
263/263 [==============================] - 1s 3ms/step - loss: 1.5370 - accuracy: 0.4800 - val_loss: 1.4827 - val_accuracy: 0.5077
Epoch 6/20
263/263 [==============================] - 1s 3ms/step - loss: 1.4697 - accuracy: 0.5068 - val_loss: 1.4335 - val_accuracy: 0.5269
Epoch 7/20
263/263 [==============================] - 1s 3ms/step - loss: 1.4359 - accuracy: 0.5207 - val_loss: 1.4063 - val_accuracy: 0.5386
Epoch 8/20
263/263 [==============================] - 1s 3ms/step - loss: 1.4061 - accuracy: 0.5334 - val_loss: 1.3790 - val_accuracy: 0.5501
Epoch 9/20
263/263 [==============================] - 1s 3ms/step - loss: 1.3800 - accuracy: 0.5468 - val_loss: 1.3579 - val_accuracy: 0.5590
Epoch 10/20
263/263 [==============================] - 1s 3ms/step - loss: 1.3612 - accuracy: 0.5553 - val_loss: 1.3392 - val_accuracy: 0.5688
Epoch 11/20
263/263 [==============================] - 1s 3ms/step - loss: 1.3399 - accuracy: 0.5690 - val_loss: 1.3638 - val_accuracy: 0.5569
Epoch 12/20
263/263 [==============================] - 1s 3ms/step - loss: 1.3186 - accuracy: 0.5778 - val_loss: 1.3065 - val_accuracy: 0.5879
Epoch 13/20
263/263 [==============================] - 1s 3ms/step - loss: 1.3025 - accuracy: 0.5879 - val_loss: 1.3119 - val_accuracy: 0.5818
Epoch 14/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2831 - accuracy: 0.5951 - val_loss: 1.2772 - val_accuracy: 0.6001
Epoch 15/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2642 - accuracy: 0.6021 - val_loss: 1.2839 - val_accuracy: 0.5877
Epoch 16/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2583 - accuracy: 0.6060 - val_loss: 1.2592 - val_accuracy: 0.6062
Epoch 17/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2459 - accuracy: 0.6103 - val_loss: 1.2521 - val_accuracy: 0.6061
Epoch 18/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2375 - accuracy: 0.6117 - val_loss: 1.2350 - val_accuracy: 0.6148
Epoch 19/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2308 - accuracy: 0.6143 - val_loss: 1.2356 - val_accuracy: 0.6155
Epoch 20/20
263/263 [==============================] - 1s 3ms/step - loss: 1.2240 - accuracy: 0.6166 - val_loss: 1.2246 - val_accuracy: 0.6210
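
The `263/263` in the log is the number of batches per epoch. A short sketch of the arithmetic, assuming (as the Keras documentation for `validation_split` describes) that the validation samples are held out from the end of the training arrays and are not used for weight updates:

```python
import math

n_samples = 42000
validation_split = 0.2
batch_size = 128

n_val = round(n_samples * validation_split)  # samples held out for validation
n_train = n_samples - n_val                  # samples used for gradient updates
steps_per_epoch = math.ceil(n_train / batch_size)

print(n_train, n_val, steps_per_epoch)  # 33600 8400 263
```

The last batch of each epoch is smaller than 128 because 33,600 is not a multiple of 128.
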

Plotting the validation and training accuracies¶

In [ ]:
# Plotting the accuracies
dict_hist = history_model_1.history
list_ep = list(range(1, 21))

plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()

Observations:

  • The training and validation accuracies are nearly identical (about 62% after 20 epochs), so the model is not overfitting. At about 62%, however, the overall accuracy leaves considerable room for improvement.
  • Training accuracy increases steadily with the number of epochs. Validation accuracy starts to fluctuate after about 10 epochs, but its overall trend is still upward.

Let's build one more model with higher complexity and see if we can improve the performance of the model.

First, we need to clear the previous model's state from the Keras backend. Let's also fix the seed again after clearing the backend.

In [ ]:
# Clearing backend
from tensorflow.keras import backend
backend.clear_session()
In [ ]:
# Fixing the seed for random number generators
np.random.seed(42)

import random
random.seed(42)

tf.random.set_seed(42)

Second Model Architecture¶

  • Building a sequential model with the following architecture
  • First hidden layer with 256 nodes and the relu activation and the input shape = (1024, )
  • Second hidden layer with 128 nodes and the relu activation
  • Add the Dropout layer with the rate equal to 0.2
  • Third hidden layer with 64 nodes and the relu activation
  • Fourth hidden layer with 64 nodes and the relu activation
  • Fifth hidden layer with 32 nodes and the relu activation
  • Add the BatchNormalization layer
  • Output layer with activation as 'softmax' and number of nodes equal to the number of classes, i.e., 10
  • Compile the model with the loss equal to categorical_crossentropy, optimizer equal to Adam(learning_rate = 0.0005), and metric equal to 'accuracy'.
  • Print the summary of the model.
  • Fit on the train data with a validation split of 0.2, batch size = 128, verbose = 1, and epochs = 30. Store the model building history to use later for visualization.
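
The two new layer types in this architecture, Dropout and BatchNormalization, can be sketched in NumPy for the training-mode case. This is an illustration of the underlying arithmetic on a hypothetical batch of activations, not the Keras implementation; `eps` is set to 0.001 to match the Keras default, and `gamma` and `beta` are left at their initial values of 1 and 0.

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.uniform(0, 1, size=(128, 32))  # hypothetical batch of layer outputs

# Inverted dropout (training mode): zero about 20% of units,
# then rescale the survivors by 1 / (1 - rate)
rate = 0.2
keep_mask = rng.random(activations.shape) >= rate
dropped = activations * keep_mask / (1.0 - rate)

# Batch normalization (training mode): standardize each feature over the batch,
# then apply a learnable scale (gamma) and shift (beta)
eps = 1e-3
gamma, beta = np.ones(32), np.zeros(32)
mean = activations.mean(axis=0)
var = activations.var(axis=0)
normalized = gamma * (activations - mean) / np.sqrt(var + eps) + beta

print(keep_mask.mean())                                      # fraction of units kept, close to 0.8
print(np.allclose(normalized.mean(axis=0), 0.0, atol=1e-8))  # per-feature mean is ~0
```

At inference time both layers behave differently: dropout passes its input through unchanged, and batch normalization uses moving averages of the mean and variance collected during training.
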
In [ ]:
# Define model

from tensorflow.keras import losses
from tensorflow.keras import optimizers

# Create model
model2 = Sequential()

model2.add(Dense(256, activation='relu', input_shape = (1024,)))
model2.add(Dense(128, activation='relu'))
model2.add(Dropout(0.2))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(64, activation='relu'))
model2.add(Dense(32, activation='relu'))
model2.add(BatchNormalization())
model2.add(Dense(10, activation='softmax'))

# Compile model
adam = optimizers.Adam(learning_rate=0.0005)
model2.compile(loss=losses.categorical_crossentropy, optimizer=adam, metrics=['accuracy'])
In [ ]:
# Model summary
model2.summary()
Model: "sequential"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense (Dense)               (None, 256)               262400    
                                                                 
 dense_1 (Dense)             (None, 128)               32896     
                                                                 
 dropout (Dropout)           (None, 128)               0         
                                                                 
 dense_2 (Dense)             (None, 64)                8256      
                                                                 
 dense_3 (Dense)             (None, 64)                4160      
                                                                 
 dense_4 (Dense)             (None, 32)                2080      
                                                                 
 batch_normalization (BatchN  (None, 32)               128       
 ormalization)                                                   
                                                                 
 dense_5 (Dense)             (None, 10)                330       
                                                                 
=================================================================
Total params: 310,250
Trainable params: 310,186
Non-trainable params: 64
_________________________________________________________________

Observations:

  • The second model has roughly 4.6 times as many parameters as the first (310,250 vs. 68,010), i.e., it is much more complex.
  • There are 64 non-trainable parameters. They belong to the batch normalization layer (the moving mean and moving variance of its 32 features).
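
The totals in the summary can be reproduced with a few lines of arithmetic. The helper name `dense_params` is hypothetical; the split of the batch normalization parameters into two trainable vectors (scale and shift) and two non-trainable vectors (moving mean and moving variance) follows the standard definition of the layer.

```python
def dense_params(n_inputs, n_units):
    # Weights plus biases of a fully connected layer
    return n_inputs * n_units + n_units

# Input size followed by the five hidden layers (Dropout adds no parameters)
dense_sizes = [1024, 256, 128, 64, 64, 32]
dense_total = sum(dense_params(a, b)
                  for a, b in zip(dense_sizes[:-1], dense_sizes[1:]))

# BatchNormalization over 32 features: gamma and beta are trainable,
# moving mean and moving variance are updated without gradients
bn_trainable = 2 * 32
bn_non_trainable = 2 * 32

output_params = dense_params(32, 10)

total = dense_total + bn_trainable + bn_non_trainable + output_params
print(total)                     # 310250
print(total - bn_non_trainable)  # 310186
print(round(total / 68010, 2))   # 4.56
```
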

Let's fit the model and plot the accuracies of the training and the validation data.

In [ ]:
# Fit the model
history_model_2 = model2.fit(X_train, y_train, validation_split=0.2, epochs=30, batch_size=128, verbose=1)
Epoch 1/30
263/263 [==============================] - 3s 5ms/step - loss: 2.3507 - accuracy: 0.0979 - val_loss: 2.3047 - val_accuracy: 0.1106
Epoch 2/30
263/263 [==============================] - 1s 4ms/step - loss: 2.1886 - accuracy: 0.1686 - val_loss: 2.0674 - val_accuracy: 0.2562
Epoch 3/30
263/263 [==============================] - 1s 4ms/step - loss: 1.7453 - accuracy: 0.3871 - val_loss: 1.5670 - val_accuracy: 0.4777
Epoch 4/30
263/263 [==============================] - 1s 5ms/step - loss: 1.4380 - accuracy: 0.5112 - val_loss: 1.3193 - val_accuracy: 0.5718
Epoch 5/30
263/263 [==============================] - 1s 4ms/step - loss: 1.2700 - accuracy: 0.5784 - val_loss: 1.1708 - val_accuracy: 0.6149
Epoch 6/30
263/263 [==============================] - 1s 4ms/step - loss: 1.1703 - accuracy: 0.6196 - val_loss: 1.0695 - val_accuracy: 0.6592
Epoch 7/30
263/263 [==============================] - 1s 5ms/step - loss: 1.1118 - accuracy: 0.6419 - val_loss: 1.0547 - val_accuracy: 0.6620
Epoch 8/30
263/263 [==============================] - 1s 4ms/step - loss: 1.0657 - accuracy: 0.6578 - val_loss: 1.0521 - val_accuracy: 0.6573
Epoch 9/30
263/263 [==============================] - 1s 4ms/step - loss: 1.0215 - accuracy: 0.6730 - val_loss: 0.9803 - val_accuracy: 0.6917
Epoch 10/30
263/263 [==============================] - 1s 4ms/step - loss: 0.9898 - accuracy: 0.6836 - val_loss: 1.0238 - val_accuracy: 0.6787
Epoch 11/30
263/263 [==============================] - 1s 4ms/step - loss: 0.9717 - accuracy: 0.6895 - val_loss: 0.9391 - val_accuracy: 0.7024
Epoch 12/30
263/263 [==============================] - 1s 4ms/step - loss: 0.9396 - accuracy: 0.7002 - val_loss: 0.8684 - val_accuracy: 0.7246
Epoch 13/30
263/263 [==============================] - 1s 4ms/step - loss: 0.9093 - accuracy: 0.7135 - val_loss: 0.9138 - val_accuracy: 0.7121
Epoch 14/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8993 - accuracy: 0.7153 - val_loss: 0.8496 - val_accuracy: 0.7332
Epoch 15/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8756 - accuracy: 0.7225 - val_loss: 0.8766 - val_accuracy: 0.7195
Epoch 16/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8445 - accuracy: 0.7337 - val_loss: 0.8252 - val_accuracy: 0.7432
Epoch 17/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8383 - accuracy: 0.7357 - val_loss: 0.8077 - val_accuracy: 0.7494
Epoch 18/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8253 - accuracy: 0.7374 - val_loss: 0.7905 - val_accuracy: 0.7524
Epoch 19/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8066 - accuracy: 0.7450 - val_loss: 0.7995 - val_accuracy: 0.7437
Epoch 20/30
263/263 [==============================] - 1s 4ms/step - loss: 0.8033 - accuracy: 0.7452 - val_loss: 0.7711 - val_accuracy: 0.7585
Epoch 21/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7944 - accuracy: 0.7502 - val_loss: 0.7974 - val_accuracy: 0.7496
Epoch 22/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7780 - accuracy: 0.7538 - val_loss: 0.7902 - val_accuracy: 0.7532
Epoch 23/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7677 - accuracy: 0.7565 - val_loss: 0.7841 - val_accuracy: 0.7510
Epoch 24/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7675 - accuracy: 0.7582 - val_loss: 0.7841 - val_accuracy: 0.7602
Epoch 25/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7522 - accuracy: 0.7607 - val_loss: 0.7625 - val_accuracy: 0.7635
Epoch 26/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7393 - accuracy: 0.7642 - val_loss: 0.7293 - val_accuracy: 0.7726
Epoch 27/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7293 - accuracy: 0.7700 - val_loss: 0.7446 - val_accuracy: 0.7664
Epoch 28/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7392 - accuracy: 0.7662 - val_loss: 0.7721 - val_accuracy: 0.7593
Epoch 29/30
263/263 [==============================] - 1s 4ms/step - loss: 0.7196 - accuracy: 0.7713 - val_loss: 0.7433 - val_accuracy: 0.7660
Epoch 30/30
263/263 [==============================] - 1s 5ms/step - loss: 0.7184 - accuracy: 0.7721 - val_loss: 0.7046 - val_accuracy: 0.7793

Plotting the validation and training accuracies¶

In [ ]:
# Plotting the accuracies

dict_hist = history_model_2.history
list_ep = list(range(1, 31))

plt.figure(figsize = (8,8))
plt.plot(list_ep,dict_hist['accuracy'],ls = '--', label = 'accuracy')
plt.plot(list_ep,dict_hist['val_accuracy'],ls = '--', label = 'val_accuracy')
plt.ylabel('Accuracy')
plt.xlabel('Epochs')
plt.legend()
plt.show()

Observations:

  • The second model, which is more complex than the first, performs significantly better.
  • Both the training and the validation accuracy have improved, from about 62% to about 77-78%.
  • The validation accuracy is slightly higher than the training accuracy, so there is no sign of overfitting. Part of this gap is expected because dropout is active only during training. This suggests that the model complexity could be increased further.
  • The training and validation accuracies are still trending upward after 30 epochs, which suggests that training for more epochs could help.
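
One simple way to back up these observations numerically is to look for the epoch with the best validation accuracy and the final gap between the two curves. The helper `summarize_history` below is a hypothetical name for this sketch; the accuracy values are copied from the last five epochs of the training log above.

```python
def summarize_history(train_acc, val_acc, first_epoch=1):
    # Index of the epoch with the highest validation accuracy
    best_idx = max(range(len(val_acc)), key=lambda i: val_acc[i])
    # A positive gap means validation accuracy exceeds training accuracy
    final_gap = val_acc[-1] - train_acc[-1]
    return first_epoch + best_idx, round(final_gap, 4)

# Accuracies for epochs 26-30, copied from the training log above
train_acc = [0.7642, 0.7700, 0.7662, 0.7713, 0.7721]
val_acc = [0.7726, 0.7664, 0.7593, 0.7660, 0.7793]

print(summarize_history(train_acc, val_acc, first_epoch=26))  # (30, 0.0072)
```

The best validation accuracy falls on the final epoch, which is consistent with the curves not having flattened out yet. With the full run, the same function could be called on `history_model_2.history['accuracy']` and `history_model_2.history['val_accuracy']`.
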

Predictions on the test data¶

  • Make predictions on the test set using the second model.
  • Print the obtained results using the classification report and the confusion matrix.
  • Final observations on the obtained results.
In [ ]:
test_pred = model2.predict(X_test)

test_pred = np.argmax(test_pred, axis=-1)

Note: Earlier, we noticed that each entry of y_test is a one-hot encoded vector. To print the classification report and the confusion matrix, we must first convert each entry of y_test back to a single label.

In [ ]:
# Converting each entry to single label from one-hot encoded vector
y_test = np.argmax(y_test, axis=-1)
In [ ]:
# Importing required functions
from sklearn.metrics import classification_report
from sklearn.metrics import confusion_matrix

# Printing the classification report
print(classification_report(y_test, test_pred))

# Plotting the heatmap using confusion matrix
cm = confusion_matrix(y_test, test_pred)
plt.figure(figsize=(8,5))
sns.heatmap(cm, annot=True,  fmt='.0f')
plt.ylabel('Actual')
plt.xlabel('Predicted')
plt.show()
              precision    recall  f1-score   support

           0       0.79      0.80      0.80      1814
           1       0.75      0.83      0.79      1828
           2       0.82      0.78      0.80      1803
           3       0.79      0.71      0.75      1719
           4       0.77      0.85      0.81      1812
           5       0.77      0.73      0.75      1768
           6       0.75      0.80      0.77      1832
           7       0.84      0.81      0.83      1808
           8       0.76      0.72      0.74      1812
           9       0.78      0.75      0.76      1804

    accuracy                           0.78     18000
   macro avg       0.78      0.78      0.78     18000
weighted avg       0.78      0.78      0.78     18000


Observations:¶

  • The accuracy on the test set is 78%. This is comparable to the accuracy on the training and validation sets (about 77% and 78%), which implies that the model generalizes well.
  • The recall for every digit is higher than 70%. Digit 3 has the lowest recall, 71%.
  • The confusion matrix shows that the model most often confused digits 5 and 6 with digits 6 and 4.
  • The highest recall, about 85%, is for digit 4, i.e., the model correctly identifies 85% of the images that show digit 4.
  • The precision values vary less (from 75% to 84%) than the recall values (from 71% to 85%).
  • The lowest precision, 75%, is for digits 1 and 6. The confusion matrix shows that the model most often confused these digits with digit 4.
  • This indicates that the model is not able to pick up small variations among digits.
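
To see how the precision and recall figures in the report relate to the confusion matrix, here is a small sketch on a made-up 3-class matrix. The numbers are hypothetical and are not taken from the SVHN results; the layout (rows for actual labels, columns for predictions) matches what scikit-learn's `confusion_matrix` returns.

```python
import numpy as np

# Hypothetical 3-class confusion matrix: rows are actual labels, columns are predictions
cm = np.array([[8, 1, 1],
               [2, 6, 2],
               [0, 1, 9]])

true_positives = np.diag(cm)
recall = true_positives / cm.sum(axis=1)     # correct / all actual instances of the class
precision = true_positives / cm.sum(axis=0)  # correct / all predictions of the class
accuracy = true_positives.sum() / cm.sum()

print(recall)     # recall per class: 0.8, 0.6, 0.9
print(precision)  # precision per class: 0.8, 0.75, 0.75
print(accuracy)   # 23 correct out of 30
```

A class can have high recall but lower precision when other classes are frequently mistaken for it, which is why the two columns of the report do not move together.
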

Note: We can try tuning this model further or increase the complexity of the model and see if we can get better results. As this is image data, we can also try convolutional neural networks which might be able to identify small variations in the orientation of digits and give better results than simple feed-forward neural networks.
