Exercise:Convolution and Pooling
From Ufldl
Convolution and Pooling
In this problem set, you will take the features you learned on 8x8 patches sampled from STL10 images in the earlier exercise on linear decoders and use them, through convolution and pooling, to classify images from a reduced STL10 dataset. The reduced STL10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).
In the file cnn_exercise.zip we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.
For this exercise, you will need to modify cnnConvolve.m and cnnPool.m.
Dependencies
The following additional files are required for this exercise:
You will also need:
- sparseAutoencoderLinear.m or your saved features from Exercise:Learning color features with Sparse Autoencoders
- feedForwardAutoencoder.m (and related functions) from Exercise:Self-Taught Learning
- softmaxTrain.m (and related functions) from Exercise:Softmax Regression
If you have not completed the exercises listed above, we strongly suggest you complete them first.
Step 1: Load learned features
In this step, we will load the color features you learned in Exercise:Learning color features with Sparse Autoencoders. To verify that the features are correct, the loaded features will be visualized, and you should get something like the following:
Step 2: Implement and test convolution and pooling
In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL10 images.
Step 2a: Implement convolution
Implement convolution, as described in feature extraction using convolution, in the function cnnConvolve in cnnConvolve.m. Implementing convolution is somewhat involved, so we will guide you through the process below.
First of all, what we want to compute is σ(Wx_{(r,c)} + b) for all valid (r,c), where W and b are the learned weights and biases from the input layer to the hidden layer, and x_{(r,c)} is the 8x8 patch with the upper left corner at (r,c). Here "valid" means that the entire 8x8 patch is contained within the image, as opposed to a full convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0. To accomplish this, we could loop over all such patches and compute σ(Wx_{(r,c)} + b) for each of them. In theory, this is correct. In practice, however, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.
Observe that the convolution above can be broken down into the following three small steps. First, compute Wx_{(r,c)} for all (r,c). Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, conv2, speeding up the process significantly.
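The equivalence between the patch loop and the three steps can be checked on a toy example. The following is a minimal sketch, not the exercise solution: it assumes a single-channel 6x6 image, a single 3x3 feature, and made-up values for Wfeat and b. The filter is flipped before it is passed to conv2; the reason is covered in the discussion of complications that follows.

```matlab
% Toy sizes (hypothetical; the exercise uses 64x64 images and 8x8 patches)
imageDim = 6;
patchDim = 3;
outDim = imageDim - patchDim + 1;

% Deterministic stand-ins for an image channel, one feature, and its bias
im = reshape(1:imageDim^2, imageDim, imageDim) / imageDim^2;
Wfeat = reshape(1:patchDim^2, patchDim, patchDim) / 10;
b = -0.5;
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Reference: loop over every valid patch and compute sigmoid(W*x + b)
naive = zeros(outDim, outDim);
for r = 1:outDim
    for c = 1:outDim
        patch = im(r:r+patchDim-1, c:c+patchDim-1);
        naive(r, c) = sigmoid(Wfeat(:)' * patch(:) + b);
    end
end

% Step 1: compute W*x for all valid (r,c) with conv2 (filter flipped first)
Wx = conv2(im, flipud(fliplr(Wfeat)), 'valid');
% Step 2: add the bias
Wxb = Wx + b;
% Step 3: apply the sigmoid
fast = sigmoid(Wxb);

% Largest disagreement between the two approaches
maxDiff = max(abs(naive(:) - fast(:)));
```

If the two approaches agree, maxDiff should be at the level of floating-point rounding error.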
However, there are two complications in using conv2.
First, conv2 performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number imageNum, feature number featureNum, and the channel number of the image channel, with the 2-D convolution of the weight matrix for the featureNum-th feature and channel-th channel with the image matrix for the imageNum-th image going inside.
Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to conv2. This is explained in greater detail in the implementation tip section following the code.
Concretely, the code to do the convolution using conv2 will look something like the following:
convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
for imageNum = 1:numImages
  for featureNum = 1:hiddenSize

    % Obtain the feature matrix for this feature
    Wfeat = W(featureNum, :);
    Wfeat = reshape(Wfeat, patchDim, patchDim, 3);

    % Get convolution of image with feature matrix for each channel
    convolvedImage = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1);
    for channel = 1:3

      % Flip the feature matrix because of the definition of convolution, as explained later
      filter = flipud(fliplr(squeeze(Wfeat(:, :, channel))));
      im = squeeze(images(:, :, channel, imageNum));

      % Convolve "filter" with "im", adding the result
      convolvedImage = convolvedImage + conv2(im, filter, 'valid');
    end

    % The convolved feature is the sum of the convolved values for all channels
    convolvedFeatures(featureNum, imageNum, :, :) = convolvedImage;
  end
end
The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:
Implementation tip: Using conv2 and convn
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with (reversing its rows and its columns), to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices image (a large image) and W (the feature) using conv2(image, W), and W is a 3x3 matrix as below:
If you use conv2(image, W), MATLAB will first "flip" W, reversing its rows and columns, before convolving W with image, as below:
If the original layout of W was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip W before passing it into conv2, so that after MATLAB flips W in conv2, the layout will be correct. For conv2, this means reversing the rows and columns, which can be done with flipud and fliplr, as we did in the example code above. This is also true for the general convolution function convn, in which case MATLAB reverses every dimension. In general, you can flip the matrix W using the following code snippet, which works for W of any dimension
% Flip W for use in conv2 / convn
temp = W(:);
temp = flipud(temp);
temp = reshape(temp, size(W));
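The flipping described in this tip can be tried out on a small case. The sketch below uses a hypothetical 3x3 feature W and a hypothetical 3x3 image im chosen only for illustration. It checks that the general snippet agrees with flipud and fliplr in two dimensions, that it reverses every dimension of a 3-D array, and that passing the pre-flipped W to conv2 yields the plain element-wise product sum of W with the patch.

```matlab
% Hypothetical 3x3 feature, chosen so the flip is easy to see
W = [1 2 3; 4 5 6; 7 8 9];

% 2-D flip: reverse the rows and the columns
flipped2d = flipud(fliplr(W));

% General flip from the snippet above: reverse the linear order, then reshape
temp = W(:);
temp = flipud(temp);
flippedGeneral = reshape(temp, size(W));

% The same snippet on a 3-D array reverses every dimension
W3 = reshape(1:24, 2, 3, 4);
temp3 = reshape(flipud(W3(:)), size(W3));
reversedAllDims = W3(end:-1:1, end:-1:1, end:-1:1);

% With a pre-flipped filter, conv2 computes the un-flipped product sum
im = [8 1 6; 3 5 7; 4 9 2];              % hypothetical 3x3 image
viaConv = conv2(im, flipped2d, 'valid'); % a single value for a 3x3 input
direct = sum(sum(im .* W));
```

Here flipped2d and flippedGeneral hold the same matrix, and viaConv equals direct.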
To each of convolvedFeatures, you should then add b, the corresponding bias for the featureNum-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.
In particular, you did the following to the patches:
- subtract the mean patch, meanPatch, to zero the mean of the patches
- ZCA whiten using the whitening matrix ZCAWhite.
These same two steps must also be applied to the convolved patches.
Taking the preprocessing steps into account, the feature activations that you should compute are σ(W(T(x − \bar{x})) + b), where T is the whitening matrix and \bar{x} is the mean patch. Expanding this, you obtain σ(WTx − WT\bar{x} + b), which suggests that you should convolve the images with WT rather than W as earlier, and you should add (b − WT\bar{x}), rather than just b, to convolvedFeatures before finally applying the sigmoid function.
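The expansion above can be verified numerically on a single patch. This is a minimal sketch under stated assumptions: the sizes are toy values, and W, b, ZCAWhite, and meanPatch are filled with made-up deterministic numbers rather than learned parameters. The names WT and bAdjusted are hypothetical.

```matlab
% Toy sizes (hypothetical): 2x2 patches with 3 channels, 4 hidden units
patchDim = 2;
numChannels = 3;
hiddenSize = 4;
visibleSize = patchDim^2 * numChannels;

% Made-up stand-ins for the learned weights and preprocessing parameters
W = reshape(sin(1:hiddenSize*visibleSize), hiddenSize, visibleSize);
b = cos(1:hiddenSize)';
ZCAWhite = eye(visibleSize) + 0.1 * reshape(cos(1:visibleSize^2), visibleSize, visibleSize);
meanPatch = 0.5 * ones(visibleSize, 1);
x = reshape(linspace(0, 1, visibleSize), visibleSize, 1);  % one raw patch

sigmoid = @(z) 1 ./ (1 + exp(-z));

% Direct form: preprocess the patch, then apply the hidden layer
direct = sigmoid(W * (ZCAWhite * (x - meanPatch)) + b);

% Folded form: absorb the preprocessing into the weights and the bias
WT = W * ZCAWhite;               % convolve with this instead of W
bAdjusted = b - WT * meanPatch;  % add this instead of b
folded = sigmoid(WT * x + bAdjusted);

maxDiff = max(abs(direct - folded));
```

Folding the preprocessing into WT and the adjusted bias means the raw image can be convolved directly, without whitening every patch separately.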
Step 2b: Check your convolution
We have provided some code for you to check that you have done the convolution correctly. The code randomly selects a number of (feature, row, column) tuples, uses feedForwardAutoencoder to compute the feature activations for the selected features and patches directly with the sparse autoencoder, and compares them with your convolved values.
Step 2c: Pooling
Implement pooling in the function cnnPool in cnnPool.m.
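As a rough illustration of what pooling computes, the sketch below averages disjoint square regions of each convolved feature map. It is a sketch under assumptions, not the cnnPool solution: it assumes mean pooling over non-overlapping regions, a 4-D layout of (feature, image, row, column) as in the convolution code above, and a hypothetical region size poolDim of 4.

```matlab
% Hypothetical pooling region size and a toy input:
% one feature, one image, an 8x8 convolved map holding 1..64
poolDim = 4;
convolvedFeatures = reshape(1:64, 1, 1, 8, 8);

numFeatures = size(convolvedFeatures, 1);
numImages = size(convolvedFeatures, 2);
convolvedDim = size(convolvedFeatures, 3);
numRegions = floor(convolvedDim / poolDim);

pooledFeatures = zeros(numFeatures, numImages, numRegions, numRegions);
for poolRow = 1:numRegions
    for poolCol = 1:numRegions
        % Rows and columns covered by this pooling region
        rows = (poolRow - 1) * poolDim + (1:poolDim);
        cols = (poolCol - 1) * poolDim + (1:poolDim);
        region = convolvedFeatures(:, :, rows, cols);
        % Average over the row and column dimensions of the region
        pooledFeatures(:, :, poolRow, poolCol) = mean(mean(region, 3), 4);
    end
end
```

For this 8x8 input and a region size of 4, the result holds one mean per quadrant of the map.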
Step 2d: Check your pooling
We have provided some code for you to check that you have done the pooling correctly. The code runs cnnPool against a test matrix to see if it produces the expected result.
Step 3: Convolve and pool with the dataset
In this step, you will convolve each of the features you learned with the full 64x64 images from the reduced STL10 dataset to obtain the convolved features for both train and test sets. You will then pool the convolved features to obtain the pooled features for both train and test sets. The pooled features for the train set will be used for classification, and those for the test set will be used to test the trained classifier.
Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.
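The chunking idea can be sketched as a loop over blocks of feature indices. This is not the provided code: hiddenSize, numImages, and pooledDim are hypothetical toy values, and the convolution and pooling of each chunk are replaced by a stand-in that simply records the feature index, so that only the bookkeeping is shown.

```matlab
% Hypothetical sizes; only stepSize = 50 comes from the text above
hiddenSize = 120;
stepSize = 50;
numImages = 3;
pooledDim = 2;

pooledFeatures = zeros(hiddenSize, numImages, pooledDim, pooledDim);
numChunks = ceil(hiddenSize / stepSize);
chunkSizes = zeros(1, numChunks);

for chunk = 1:numChunks
    % Feature indices handled in this pass (the last chunk may be smaller)
    featureStart = (chunk - 1) * stepSize + 1;
    featureEnd = min(chunk * stepSize, hiddenSize);
    idx = featureStart:featureEnd;

    % Stand-in for "convolve, then pool, these features only":
    % each entry just stores its own feature index
    pooledChunk = repmat(idx(:), [1, numImages, pooledDim, pooledDim]);

    % Keep only the small pooled result; the large convolved array
    % for this chunk could be discarded at this point
    pooledFeatures(idx, :, :, :) = pooledChunk;
    chunkSizes(chunk) = numel(idx);
end
```

Only the pooled output is kept between passes, which is what keeps memory use bounded.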
Step 4: Use pooled features for classification
In this step, you will use the pooled features to train a softmax classifier to map the pooled features to the class labels. The code in this section uses softmaxTrain from the softmax exercise to train a softmax classifier on the pooled features for 500 iterations, which should take around 5 minutes.
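Before a classifier can use the pooled features, the 4-D array has to be flattened into a 2-D matrix. The sketch below assumes the (feature, image, row, column) layout used earlier and assumes the classifier expects one column per example; the sizes and the name X are hypothetical.

```matlab
% Toy pooled features: 2 features, 3 images, 2x2 pooled maps
numFeatures = 2;
numImages = 3;
pooledDim = 2;
pooledFeatures = reshape(1:numFeatures*numImages*pooledDim^2, ...
                         numFeatures, numImages, pooledDim, pooledDim);

% Move the image dimension last so each image's values are contiguous
X = permute(pooledFeatures, [1 3 4 2]);

% Flatten to (values per image) x (number of images)
X = reshape(X, numel(pooledFeatures) / numImages, numImages);

% Column k of X now holds every pooled value of image k
secondImage = reshape(pooledFeatures(:, 2, :, :), [], 1);
```

In this toy case X has 8 rows and 3 columns, and its second column matches secondImage.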
Step 5: Test classifier
Now that you have a trained softmax classifier, you can see how well it performs on the test set. The pooled features for the test set will be run through the softmax classifier, and the accuracy of the predictions will be computed. You should expect to get an accuracy of around 77-78%.
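Accuracy here is the fraction of test images whose predicted label matches the true label. A minimal sketch with made-up label vectors (pred and testLabels are hypothetical and are not outputs of the exercise):

```matlab
% Made-up predictions and ground-truth labels for six test images
pred = [1 2 3 4 1 2];
testLabels = [1 2 3 3 1 4];

% Fraction of positions where prediction and label agree
acc = mean(pred(:) == testLabels(:));
fprintf('Accuracy: %2.3f%%\n', acc * 100);
```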