
Convolutional versus Dense layers in neural networks - Part 2

Dense layers visualization

Using the Dense and Convolutional networks of Part 1 (HTML / Jupyter), with size optimization and regularization, and the MNIST dataset, let's visualize the intermediate responses to test samples.

Since most of the dense layers' outputs are high-dimensional, we will use the dimension reduction technique UMAP [1].

Learning goals

  • Deep Neural Network internal representations
  • UMAP dimension reduction technique
In [1]:
#!pip install umap-learn bokeh
In [2]:
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import activations, datasets, layers, losses, metrics, models, optimizers, regularizers
import seaborn as sns
import pandas as pd
import umap
In [3]:
from io import BytesIO
import base64
from PIL import Image

from bokeh import plotting, palettes
from bokeh.models import HoverTool, ColumnDataSource, CategoricalColorMapper

plotting.output_notebook()
Loading BokehJS ...
In [4]:
# Ignore warnings from UMAP
import warnings
warnings.filterwarnings('ignore')

UMAP

UMAP ([1] Uniform Manifold Approximation and Projection for Dimension Reduction) is a recent (2018) technique to fold high-dimensional feature spaces onto the two-dimensional plane.

It competes with t-SNE ([3] t-distributed Stochastic Neighbor Embedding), which has been shown to have limitations [4].

Both methods attempt to preserve the distances between samples (measured with a given norm in the feature space) when projecting onto the 2D plane.
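
As a minimal sketch (assuming scikit-learn is installed alongside umap-learn; parameter values are illustrative, not tuned), both libraries expose the same fit_transform pattern:

import numpy as np
import umap
from sklearn.manifold import TSNE

rng = np.random.default_rng(0)
features = rng.normal(size=(500, 64))  # 500 samples in a 64-dimension feature space

# UMAP: n_neighbors balances local vs. global structure, min_dist controls
# cluster compactness, metric is the norm used in the feature space
embedding2dUmap = umap.UMAP(n_neighbors=15, min_dist=0.1, metric='euclidean').fit_transform(features)

# t-SNE: perplexity plays a role comparable to n_neighbors
embedding2dTsne = TSNE(n_components=2, perplexity=30).fit_transform(features)

print(embedding2dUmap.shape, embedding2dTsne.shape)  # both (500, 2)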

Dataset and models

For more information on the MNIST dataset and the design of the two classifiers, see Part 1 (HTML / Jupyter).

In [5]:
(xTrain, yTrain),(xTest, yTest) = datasets.mnist.load_data()
# Scale pixel values to [0, 1]
xTrain = xTrain / 255.
xTest  = xTest  / 255.

# Pad the 28x28 images to 32x32 as expected by the LeNet5-like network
xTrainPad  = np.pad(xTrain.reshape(-1, 28, 28, 1), ((0,0),(2,2),(2,2),(0,0)), 'constant')
xTestPad   = np.pad(xTest.reshape(-1, 28, 28, 1), ((0,0),(2,2),(2,2),(0,0)), 'constant')

Classifier for the MNIST dataset based on 2 Dense (Perceptron) layers

In [6]:
modelDense128 = models.load_model('models/MNIST_dense128.h5')
modelDense128.summary()
Model: "sequential_6"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
flatten (Flatten)            (None, 784)               0         
_________________________________________________________________
dropout_0 (Dropout)          (None, 784)               0         
_________________________________________________________________
dense_0 (Dense)              (None, 128)               100480    
_________________________________________________________________
dropout_1 (Dropout)          (None, 128)               0         
_________________________________________________________________
dense_1 (Dense)              (None, 10)                1290      
=================================================================
Total params: 101,770
Trainable params: 101,770
Non-trainable params: 0
_________________________________________________________________

Classifier for MNIST based on 2 convolutional and 3 dense layers, similar to LeNet5 but with dropout and fewer neurons

In [7]:
modelLeNet60 = models.load_model('models/MNIST_LeNet60.h5')
modelLeNet60.summary()
Model: "sequential_131"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
conv_0 (Conv2D)              (None, 30, 30, 6)         60        
_________________________________________________________________
average_pooling2d_248 (Avera (None, 15, 15, 6)         0         
_________________________________________________________________
conv_1 (Conv2D)              (None, 13, 13, 16)        880       
_________________________________________________________________
average_pooling2d_249 (Avera (None, 6, 6, 16)          0         
_________________________________________________________________
flatten (Flatten)            (None, 576)               0         
_________________________________________________________________
dropout_0 (Dropout)          (None, 576)               0         
_________________________________________________________________
dense_0 (Dense)              (None, 60)                34620     
_________________________________________________________________
dropout_1 (Dropout)          (None, 60)                0         
_________________________________________________________________
dense_1 (Dense)              (None, 42)                2562      
_________________________________________________________________
dense_2 (Dense)              (None, 10)                430       
=================================================================
Total params: 38,552
Trainable params: 38,552
Non-trainable params: 0
_________________________________________________________________

Helpers

In [8]:
def predictUntilLayer(model, layerIndex, data):
    """ Execute prediction on a portion of the model """
    intermediateModel = models.Model(inputs=model.input,
                                 outputs=model.layers[layerIndex].output)
    return intermediateModel.predict(data)
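
For instance (a quick hypothetical check reusing the models loaded above), layer index 2 of the Dense network is dense_0, so its 128-dimension output on a few test images can be obtained with:

hidden = predictUntilLayer(modelDense128, 2, xTest[:5])
print(hidden.shape)  # expected: (5, 128)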
In [9]:
def embeddableImage(data):
    img_data = (255 * (1 - data)).astype(np.uint8) 
    image = Image.fromarray(img_data, mode='L')
    buffer = BytesIO()
    image.save(buffer, format='png')
    return 'data:image/png;base64,' + base64.b64encode(buffer.getvalue()).decode()

def umapPlot(embedding, x, y, yTrue=None, title=''):
    """ Plot the embedding of X and y with popovers using Bokeh """
    
    df = pd.DataFrame(embedding, columns=('x', 'y'))
    df['image'] = list(map(embeddableImage, x))
    df['digit'] = [str(d) for d in y]
    if yTrue is not None:
        df['trueDigit'] = [str(d) for d in yTrue]

    datasource = ColumnDataSource(df)

    colorMapping = CategoricalColorMapper(factors=np.arange(10).astype(str), palette=palettes.Spectral10)

    plotFigure = plotting.figure(
        title=title,
        plot_width=600,
        plot_height=600,
        tools=('pan, wheel_zoom, reset')
    )

    if yTrue is None:
        tooltip = """
            <div>
                <div>
                    <img src='@image' style='float: left; margin: 5px 5px 5px 5px'/>
                </div>
                <div>
                    <span style='font-size: 16px; color: #224499'>Digit:</span>
                    <span style='font-size: 18px'>@digit</span>
                </div>
            </div>
            """
    else:
        tooltip = """
            <div>
                <div>
                    <img src='@image' style='float: left; margin: 5px 5px 5px 5px'/>
                </div>
                <div>
                    <span style='font-size: 16px; color: #224499'>Digit:</span>
                    <span style='font-size: 18px'>@digit (true: @trueDigit)</span>
                </div>
            </div>
            """
    plotFigure.add_tools(HoverTool(tooltips=tooltip))

    plotFigure.circle(
        'x', 'y',
        source=datasource,
        color=dict(field='digit', transform=colorMapping),
        line_alpha=0.6, fill_alpha=0.6, size=4
    )
    plotting.show(plotFigure)
    
    return plotFigure

Visualization of the test samples

Using the technique from the UMAP documentation [2]

In [10]:
reducerMNIST = umap.UMAP()
embeddingMNIST = reducerMNIST.fit_transform(xTest.reshape(-1, 28*28));
In [11]:
fig = umapPlot(embeddingMNIST, xTest, yTest, title='UMAP projection of the MNIST Test dataset')

Using features made of the 28x28 pixels of each image, UMAP is able to form clusters of homogeneous digits in most cases. But there are locations where the digit clouds are heterogeneous.

However, the clusters are not very compact, and the relative distances between them are small.

In [12]:
plotting.save(fig, 'mnist_raw.html')
Out[12]:
'/Users/antoinehue/Code/data-science-github/cnn/mnist_raw.html'

Representation of the last layer input

Let's represent the output of the last large Dense layer, just before the final softmax layer: these are the features on which the final classification decision is based.

Dense network

Showing the output of the first dense layer, which is the input to the second and last one (two dense layers in total)

In [13]:
denseAnteLastTest = predictUntilLayer(modelDense128, len(modelDense128.layers) - 2, xTest)
denseEst = modelDense128.predict(xTest)
In [14]:
reducerDenseAnteLast = umap.UMAP()
embeddingDenseAnteLast = reducerDenseAnteLast.fit_transform(denseAnteLastTest);
In [15]:
umapPlot(embeddingDenseAnteLast, xTest, np.argmax(denseEst, axis=1), yTest, 
         title='Input of last layer of Dense network')
Out[15]:
Figure(
id = '1166', …)

LeNet5 Convolutional network

Showing the output of the second dense layer, input to the last dense layer (2 convolutional + 3 dense layers in total).

In [16]:
leNetAnteLastTest = predictUntilLayer(modelLeNet60, len(modelLeNet60.layers) - 2, xTestPad)
leNetEst = modelLeNet60.predict(xTestPad)
In [17]:
reducerLeNetAnteLast = umap.UMAP()
embeddingLeNetAnteLast = reducerLeNetAnteLast.fit_transform(leNetAnteLastTest);
In [18]:
umapPlot(embeddingLeNetAnteLast, xTest, np.argmax(leNetEst, axis=1), yTest, 
         title='Input of last layer of LeNet5 network')
Out[18]:
Figure(
id = '1260', …)

Compared to the original data (MNIST), the two networks fold the test samples into more compact clusters, with still some mispositioned (misclassified) samples. The convolutional network shows a better separation of the clusters.

You may check some of the digits that appear misclassified: some of them are really challenging, for the machine as well as for a human being. You may also observe that the classifier's decisions mostly match the clusters created by UMAP, but not in all cases.

Representation of the output of the softmax layer of the networks

Let's represent the output of the last layer that computes softmax probabilities of both networks.

Dense network

Showing the output of the last (softmax) layer

In [19]:
reducerDenseLast = umap.UMAP()
embeddingDenseLast = reducerDenseLast.fit_transform(denseEst);
In [20]:
umapPlot(embeddingDenseLast, xTest, np.argmax(denseEst, axis=1), yTest, 
         title='Softmax layer output of Dense network')
Out[20]:
Figure(
id = '1361', …)

LeNet5 Convolutional network

Showing the output of the last (softmax) layer

In [21]:
reducerLeNetLast = umap.UMAP()
embeddingLeNetLast = reducerLeNetLast.fit_transform(leNetEst);
In [22]:
umapPlot(embeddingLeNetLast, xTest, np.argmax(leNetEst, axis=1), yTest, 
         title='Softmax layer output of LeNet network')
Out[22]:
Figure(
id = '1469', …)

Compared to a hard decision (argmax), UMAP shows the similarities between the 10-probability vectors of the test samples.

Sometimes the max probability is very close to 1, leading to samples that are stacked on top of each other in the UMAP graphic. But that does not mean that the classifier is correct!

A single digit may also form several clusters that are spread far apart.
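
As a quick sketch of that caveat (reusing leNetEst and yTest from above; the 0.99 threshold is an arbitrary illustration), we can count samples that are misclassified despite a near-certain softmax output:

pred = np.argmax(leNetEst, axis=1)
confidence = np.max(leNetEst, axis=1)
confidentButWrong = np.where((pred != yTest) & (confidence > 0.99))[0]
print(len(confidentButWrong), 'samples misclassified with > 99% confidence')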

Conclusion

The UMAP representation is a good tool to visually inspect the processing of a deep neural network and spot corner cases.

It does not provide a measurement of the network quality such as accuracy or confusion matrices, but we were still able to spot differences between the lesser performing network (Dense) and the better performing one (LeNet5) by looking at the output of the ante-last layer, before the softmax probabilities are computed.
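
Such measurements are easy to obtain separately; here is a minimal sketch, assuming scikit-learn is available:

from sklearn.metrics import accuracy_score, confusion_matrix

print('Dense accuracy:', accuracy_score(yTest, np.argmax(denseEst, axis=1)))
print('LeNet accuracy:', accuracy_score(yTest, np.argmax(leNetEst, axis=1)))
print(confusion_matrix(yTest, np.argmax(leNetEst, axis=1)))  # rows: true digit, columns: predicted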

Where to go from here

  • Architecture review of Dense vs. Convolutional in Part-1 (HTML / Jupyter)
  • Dense vs. Convolutional Part 3 (coming soon): we will study the impact of geometric transformations (translation, rotation, scale...) on the two kinds of networks

References

  1. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, Leland McInnes et al., 2018 - https://arxiv.org/abs/1802.03426
  2. How to use UMAP - https://umap-learn.readthedocs.io/en/latest/basic_usage.html
  3. Visualizing Data using t-SNE, Laurens van der Maaten et al., 2008 - http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
  4. Misread t-SNE - https://distill.pub/2016/misread-tsne/