Convolutional versus Dense layers in neural networks - Part 2¶
Dense layers visualization¶
Using the Dense and Convolutional networks of Part 1 (HTML / Jupyter), with size optimization and regularization, and the MNIST dataset, let's visualize the intermediate responses to test samples.
Since most of the dense layers' outputs are high dimensional, we will use the dimension reduction technique UMAP [1].
Learning goals¶
- Deep Neural Network internal representations
- UMAP dimension reduction technique
#!pip install umap-learn bokeh
import numpy as np
import matplotlib.pyplot as plt
from tensorflow.keras import activations, datasets, layers, losses, metrics, models, optimizers, regularizers
import seaborn as sns
import pandas as pd
import umap
from io import BytesIO
import base64
from PIL import Image
from bokeh import plotting, palettes
from bokeh.models import HoverTool, ColumnDataSource, CategoricalColorMapper
plotting.output_notebook()
# Ignore warnings from UMAP
import warnings
warnings.filterwarnings('ignore')
UMAP¶
UMAP ([1] Uniform Manifold Approximation and Projection for Dimension Reduction) is a recent (2018) technique to fold high dimensional feature spaces onto the two dimensional plane.
It competes with t-SNE ([3] t-distributed Stochastic Neighbor Embedding), which has been shown to have limitations [4].
Both methods attempt to preserve the distances between samples, measured with a given norm in the feature space, when projecting onto the 2D plane.
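The umap-learn API follows scikit-learn conventions. As a minimal sketch, the main parameters are the neighborhood size, the minimum spacing in the embedding, and the norm used in the feature space (the values below are the library defaults):
reducer = umap.UMAP(
    n_neighbors=15,     # size of the local neighborhood used to approximate the manifold
    min_dist=0.1,       # minimum spacing between points in the low dimension embedding
    n_components=2,     # target dimension (the 2D plane)
    metric='euclidean'  # norm used to measure distances in the feature space
)
# embedding = reducer.fit_transform(features)  # -> array of shape (nSamples, 2)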
(xTrain, yTrain),(xTest, yTest) = datasets.mnist.load_data()
xTrain = xTrain / 255.
xTest = xTest / 255.
# Pad the 28x28 images to 32x32, the input size expected by the LeNet5-like model
xTrainPad = np.pad(xTrain.reshape(-1, 28, 28, 1), ((0,0),(2,2),(2,2),(0,0)), 'constant')
xTestPad = np.pad(xTest.reshape(-1, 28, 28, 1), ((0,0),(2,2),(2,2),(0,0)), 'constant')
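A quick shape check: MNIST has 60000 training and 10000 test images, now padded to the 32x32 single-channel input of the LeNet5-like model.
print(xTrainPad.shape, xTestPad.shape)  # expected: (60000, 32, 32, 1) (10000, 32, 32, 1)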
A classifier for the MNIST dataset based on 2 Dense (perceptron) layers.
modelDense128 = models.load_model('models/MNIST_dense128.h5')
modelDense128.summary()
A classifier for MNIST based on 2 convolutional and 3 dense layers, similar to LeNet5 but with dropout and fewer neurons.
modelLeNet60 = models.load_model('models/MNIST_LeNet60.h5')
modelLeNet60.summary()
Helpers¶
def predictUntilLayer(model, layerIndex, data):
    """ Run the model forward pass up to and including the layer at layerIndex """
    intermediateModel = models.Model(inputs=model.input,
                                     outputs=model.layers[layerIndex].output)
    return intermediateModel.predict(data)
def embeddableImage(data):
    """ Encode an image as an inline PNG data URI (inverted grayscale) for Bokeh tooltips """
    img_data = (255 * (1 - data)).astype(np.uint8)
    image = Image.fromarray(img_data, mode='L')
    buffer = BytesIO()
    image.save(buffer, format='png')
    return 'data:image/png;base64,' + base64.b64encode(buffer.getvalue()).decode()
def umapPlot(embedding, x, y, yTrue=None, title=''):
""" Plot the embedding of X and y with popovers using Bokeh """
df = pd.DataFrame(embedding, columns=('x', 'y'))
df['image'] = list(map(embeddableImage, x))
df['digit'] = [str(d) for d in y]
if yTrue is not None:
df['trueDigit'] = [str(d) for d in yTrue]
datasource = ColumnDataSource(df)
    # use the built-in str: the np.str alias was removed in recent NumPy versions
    colorMapping = CategoricalColorMapper(factors=np.arange(10).astype(str), palette=palettes.Spectral10)
plotFigure = plotting.figure(
title=title,
plot_width=600,
plot_height=600,
tools=('pan, wheel_zoom, reset')
)
if yTrue is None:
tooltip = """
<div>
<div>
<img src='@image' style='float: left; margin: 5px 5px 5px 5px'/>
</div>
<div>
<span style='font-size: 16px; color: #224499'>Digit:</span>
<span style='font-size: 18px'>@digit</span>
</div>
</div>
"""
else:
tooltip = """
<div>
<div>
<img src='@image' style='float: left; margin: 5px 5px 5px 5px'/>
</div>
<div>
<span style='font-size: 16px; color: #224499'>Digit:</span>
<span style='font-size: 18px'>@digit (true: @trueDigit)</span>
</div>
</div>
"""
plotFigure.add_tools(HoverTool(tooltips=tooltip))
plotFigure.circle(
'x', 'y',
source=datasource,
color=dict(field='digit', transform=colorMapping),
line_alpha=0.6, fill_alpha=0.6, size=4
)
plotting.show(plotFigure)
return plotFigure
reducerMNIST = umap.UMAP()
embeddingMNIST = reducerMNIST.fit_transform(xTest.reshape(-1, 28*28));
fig = umapPlot(embeddingMNIST, xTest, yTest, title='UMAP projection of the MNIST Test dataset')
Using features made of the 28x28 pixels of each image, UMAP is able to form clusters of homogeneous digits in most cases, but there are locations in which the digit clouds are heterogeneous.
However, the clusters are not very compact and the relative distances between them remain small.
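To put a rough number on this observation, a silhouette score on the embedding can be computed (a sketch assuming scikit-learn is installed; higher means more compact and better separated clusters, and sample_size keeps the computation fast):
from sklearn.metrics import silhouette_score
score = silhouette_score(embeddingMNIST, yTest, sample_size=2000, random_state=0)
print(f'Silhouette score of the raw-pixel embedding: {score:.3f}')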
plotting.save(fig, 'mnist_raw.html')
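Dense network¶
Showing the input of the last (softmax) layer.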
denseAnteLastTest = predictUntilLayer(modelDense128, len(modelDense128.layers) - 2, xTest)
denseEst = modelDense128.predict(xTest)
reducerDenseAnteLast = umap.UMAP()
embeddingDenseAnteLast = reducerDenseAnteLast.fit_transform(denseAnteLastTest);
umapPlot(embeddingDenseAnteLast, xTest, np.argmax(denseEst, axis=1), yTest,
title='Input of last layer of Dense network')
LeNet5 Convolutional network¶
Showing the output of the second dense layer, the input to the last dense layer (the network has 2 convolutional and 3 dense layers in total).
leNetAnteLastTest = predictUntilLayer(modelLeNet60, len(modelLeNet60.layers) - 2, xTestPad)
leNetEst = modelLeNet60.predict(xTestPad)
reducerLeNetAnteLast = umap.UMAP()
embeddingLeNetAnteLast = reducerLeNetAnteLast.fit_transform(leNetAnteLastTest);
umapPlot(embeddingLeNetAnteLast, xTest, np.argmax(leNetEst, axis=1), yTest,
title='Input of last layer of LeNet5 network')
Compared to the original data (MNIST), the two networks fold the test samples into more compact clusters, still with some mis-positioned (mis-classified) samples. The convolutional network shows better separation between the clusters.
You may check some of the digits that appear misclassified: some of them are truly challenging for both the machine and a human being. You may also observe that the classifier's decisions mostly match the clusters created by UMAP, but not in all cases.
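As a sketch for pulling up such samples (variable names below are illustrative), filter the LeNet predictions computed above for disagreements with the ground truth:
leNetPred = np.argmax(leNetEst, axis=1)
misclassified = np.where(leNetPred != yTest)[0]
figM, axes = plt.subplots(1, 8, figsize=(12, 2))
for ax, i in zip(axes, misclassified[:8]):
    ax.imshow(xTest[i], cmap='gray_r')
    ax.set_title(f'{leNetPred[i]} (true: {yTest[i]})', fontsize=9)
    ax.axis('off')
plt.show()
Dense network¶
Showing the output of the last (softmax) layer.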
reducerDenseLast = umap.UMAP()
embeddingDenseLast = reducerDenseLast.fit_transform(denseEst);
umapPlot(embeddingDenseLast, xTest, np.argmax(denseEst, axis=1), yTest,
title='Softmax layer output of Dense network')
LeNet5 Convolutional network¶
Showing the output of the last (softmax) layer.
reducerLeNetLast = umap.UMAP()
embeddingLeNetLast = reducerLeNetLast.fit_transform(leNetEst);
umapPlot(embeddingLeNetLast, xTest, np.argmax(leNetEst, axis=1), yTest,
title='Softmax layer output of LeNet network')
Compared to a hard decision (argmax), UMAP shows the similarities between the 10-probability vectors of the test samples.
The maximum probability is often very close to 1, so many samples are stacked on top of each other in the UMAP graphic. But a confident prediction does not mean that the classifier is correct!
A given digit may also form several clusters that are spread far apart.
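A quick way to visualize this stacking effect is the distribution of the winning probability over the test set (a minimal sketch using the LeNet predictions computed above):
plt.hist(np.max(leNetEst, axis=1), bins=50)
plt.xlabel('maximum softmax probability')
plt.ylabel('number of test samples')
plt.show()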
Conclusion¶
The UMAP representation is a good tool to visually inspect the processing of a deep neural network and to spot corner cases.
It does not provide a measurement of the network quality, such as accuracy or confusion matrices, but we were nevertheless able to spot differences between the lesser performing network (Dense) and the better performing one (LeNet5) by looking at the output of the ante-last layer, before the softmax probabilities are computed.
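For reference, such quantitative measurements are straightforward to compute from the predictions above (a sketch assuming scikit-learn is available for the confusion matrix):
from sklearn.metrics import confusion_matrix
for name, est in (('Dense128', denseEst), ('LeNet60', leNetEst)):
    pred = np.argmax(est, axis=1)
    print(f'{name} test accuracy: {np.mean(pred == yTest):.4f}')
# Confusion matrix of the LeNet model as a heatmap
sns.heatmap(confusion_matrix(yTest, np.argmax(leNetEst, axis=1)), annot=True, fmt='d', cmap='Blues')
plt.xlabel('predicted digit')
plt.ylabel('true digit')
plt.show()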
Where to go from here¶
References¶
1. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction, Leland McInnes et al., 2018 - https://arxiv.org/abs/1802.03426
2. How to use UMAP - https://umap-learn.readthedocs.io/en/latest/basic_usage.html
3. Visualizing Data using t-SNE, Laurens van der Maaten et al., 2008 - http://www.jmlr.org/papers/volume9/vandermaaten08a/vandermaaten08a.pdf
4. How to Use t-SNE Effectively (misread t-SNE), Distill, 2016 - https://distill.pub/2016/misread-tsne/