Exercise:Convolution and Pooling
== Convolution and Pooling ==

This problem set is divided into two parts. In the first part, you will implement a [[linear decoders | linear decoder]] to learn features on color images from the STL10 dataset. In the second part, you will use these learned features with convolution and pooling to classify STL10 images.

In the file <tt>cnnExercise.zip</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files. For this exercise, you will need to modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise, as well as '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.

=== Dependencies ===

The following additional files are required for this exercise:
* STL10 dataset

You will also need:
* <tt>sparseAutoencoderCost.m</tt> (and related functions) from [[Exercise:Sparse Autoencoder]]
* <tt>softmaxTrain.m</tt> (and related functions) from [[Exercise:Softmax Regression]]

''If you have not completed the exercises listed above, we strongly suggest you complete them first.''

=== Part I: Linear decoder on color images ===

In all the exercises so far, you have worked only with grayscale images. In this exercise, you will work with RGB color images for the first time. Conveniently, the fact that an image has three color channels (RGB) rather than a single gray channel presents little difficulty for the sparse autoencoder: you can simply concatenate the intensities from all three color channels into one long vector, as if you were working with a grayscale image with 3x the number of pixels of the original image.

=== Step 0: Initialization ===

In this step, we initialize some parameters used in the exercise.

=== Step 1: Modify sparseAutoencoderCost.m to use a linear decoder ===

You should modify <tt>sparseAutoencoderCost</tt> in <tt>sparseAutoencoderCost.m</tt> from your earlier exercise to use a [[linear decoders | linear decoder]].
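The exercise code is in MATLAB, but the change a linear decoder requires can be sketched in a few lines of NumPy. The sketch below is a hypothetical, minimal autoencoder cost function (no sparsity or weight-decay terms, unlike the real <tt>sparseAutoencoderCost.m</tt>); the key point is that with a linear output layer, the output delta loses the sigmoid-derivative factor:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_decoder_cost(W1, b1, W2, b2, x):
    """Squared-error cost and gradients for a tiny autoencoder with a
    sigmoid hidden layer and a *linear* output layer. Illustrative
    sketch only: sparsity and weight-decay terms are omitted."""
    m = x.shape[1]                       # number of training examples
    a2 = sigmoid(W1 @ x + b1)            # hidden-layer activations
    a3 = W2 @ a2 + b2                    # linear decoder: a3 = z3, no sigmoid
    cost = 0.5 * np.sum((a3 - x) ** 2) / m
    # With a linear decoder there is no f'(z3) factor in the output delta.
    d3 = (a3 - x) / m
    d2 = (W2.T @ d3) * a2 * (1 - a2)     # hidden layer keeps the sigmoid term
    gradW2 = d3 @ a2.T
    gradb2 = d3.sum(axis=1, keepdims=True)
    gradW1 = d2 @ x.T
    gradb1 = d2.sum(axis=1, keepdims=True)
    return cost, gradW1, gradb1, gradW2, gradb2
```

As the exercise instructs, a numerical gradient check (perturb one weight by a small epsilon in each direction and compare the finite difference against the analytic gradient) is the quickest way to confirm the change is correct.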
In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.

=== Step 2: Learn features on small patches ===

You will now use your sparse autoencoder to learn features on small 8x8 patches sampled from the larger 96x96 STL10 images. (The STL10 dataset comprises 5000 training and 8000 test 96x96 labelled color images, each belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.)

Code has been provided to load the dataset and sample patches from the images. However, because the dataset is relatively large (about 150 megabytes on disk, and close to 1 gigabyte when loaded), we recommend that you load the dataset, sample patches from it, and then save the patches, so that you will not need to load the entire dataset into memory repeatedly. Furthermore, since you will need to apply the exact same preprocessing steps to the convolved images as you did to the patches used for training the autoencoder (you have to subtract the same mean image and use the exact same whitening matrix), storing the original set of patches means that you can recompute these matrices if necessary. Code to save and load the sampled patches has already been provided, so no additional changes are required on your part.

We have also provided a wrapper function, <tt>sparseAutoencoderTrain</tt>, analogous to <tt>softmaxTrain</tt>, which trains a sparse autoencoder on the given parameters and data. This function wraps around the <tt>sparseAutoencoderCost</tt> function that you modified in this exercise, providing a convenient way to train a sparse autoencoder with a single function call, which may be useful in future exercises. In this step, you will use <tt>sparseAutoencoderTrain</tt> to train a sparse autoencoder on the sampled patches.
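The requirement to reuse the exact same mean and whitening matrix can be made concrete with a short sketch. The helpers below (hypothetical names, NumPy rather than the exercise's MATLAB) fit ZCA whitening on the training patches and then apply the stored transform unchanged to new data, which is what you must later do to the convolved image patches:

```python
import numpy as np

def fit_zca(patches, epsilon=0.1):
    """Compute the mean patch and ZCA whitening matrix from training
    patches (one patch per column). Both must be stored and reused
    verbatim on any data fed through the learned features later."""
    mean = patches.mean(axis=1, keepdims=True)
    centered = patches - mean
    sigma = centered @ centered.T / centered.shape[1]   # covariance
    U, S, _ = np.linalg.svd(sigma)
    # ZCA: rotate into the eigenbasis, rescale, rotate back.
    zca = U @ np.diag(1.0 / np.sqrt(S + epsilon)) @ U.T
    return mean, zca

def apply_zca(data, mean, zca):
    """Apply the stored preprocessing: subtract the SAME mean and
    multiply by the SAME whitening matrix as at training time."""
    return zca @ (data - mean)
```

Fitting a fresh mean or whitening matrix on the test images instead would silently shift the feature activations, which is why the starter code saves the sampled patches.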
The code provided trains your sparse autoencoder for 800 iterations with the default parameters initialized in step 0. This should take less than 15 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below.

[[File:cnn_Features_Good.png|480px]]

If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:

<table>
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr>
</table>

=== Part II: Convolution and pooling ===
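The two operations you will implement in <tt>cnnConvolve.m</tt> and <tt>cnnPool.m</tt> can be sketched as follows. This is a simplified single-channel NumPy illustration (hypothetical function names; the real <tt>cnnConvolve.m</tt> handles three color channels and folds in the whitening preprocessing): each learned feature is swept across the image as a valid "convolution" (really a correlation, with no kernel flip) followed by the autoencoder's sigmoid, and the resulting feature map is then mean-pooled over non-overlapping regions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_feature(image, w, b):
    """Valid correlation of one learned feature (patch-sized weight
    matrix w, scalar bias b) over a single-channel image, followed by
    the sigmoid, as in the autoencoder's hidden layer."""
    H, W = image.shape
    p, q = w.shape
    out = np.empty((H - p + 1, W - q + 1))
    for i in range(H - p + 1):
        for j in range(W - q + 1):
            # Feature response at each valid patch location.
            out[i, j] = np.sum(w * image[i:i + p, j:j + q]) + b
    return sigmoid(out)

def mean_pool(features, pool_dim):
    """Mean pooling over non-overlapping pool_dim x pool_dim regions."""
    H, W = features.shape
    Hp, Wp = H // pool_dim, W // pool_dim
    trimmed = features[:Hp * pool_dim, :Wp * pool_dim]
    return trimmed.reshape(Hp, pool_dim, Wp, pool_dim).mean(axis=(1, 3))
```

For example, convolving a 5x5 feature over a 12x12 image yields an 8x8 feature map, and pooling that map with <tt>pool_dim</tt> = 4 yields a 2x2 array of pooled feature activations, greatly reducing the number of features fed to the classifier.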