Exercise:Convolution and Pooling

== Convolution and Pooling ==

This problem set is divided into two parts. In the first part, you will implement a [[linear decoders | linear decoder]] to learn features on color images from the STL10 dataset. In the second part, you will use these learned features in convolution and pooling to classify STL10 images.

In the file <tt>cnnExercise.zip</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files.

For this exercise, you will need to modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise, as well as '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.

=== Dependencies ===

Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can just combine the intensities from all the color channels for the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels as the original image.
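
For example, a minimal sketch of this flattening (the 8x8x3 patch below is a random stand-in for a real image patch):

<pre>
% Flatten an 8x8 RGB patch into a single column vector.
patchDim = 8;                                     % patch width and height
imageChannels = 3;                                % R, G, B
patch = rand(patchDim, patchDim, imageChannels);  % stand-in for a real patch

% Stack all channel intensities into one 8*8*3 = 192 element column,
% exactly as if this were a grayscale image with 3x the pixels.
x = patch(:);
</pre>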

=== Step 0: Initialization ===

In this step, we initialize some parameters used in the exercise.
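
For reference, here is a sketch of the kinds of parameters set up in this step; the values below are illustrative, and the actual defaults are already set for you in the starter code:

<pre>
% Illustrative parameter values; the starter code sets the real defaults.
imageChannels = 3;           % number of channels (RGB)
patchDim      = 8;           % patch dimension
visibleSize   = patchDim * patchDim * imageChannels;  % number of input units
outputSize    = visibleSize; % number of output units (linear decoder)
hiddenSize    = 400;         % number of hidden units
sparsityParam = 0.035;       % desired average activation of the hidden units
lambda        = 3e-3;        % weight decay parameter
beta          = 5;           % weight of the sparsity penalty term
epsilon       = 0.1;         % epsilon for ZCA whitening
</pre>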

=== Step 1: Modify sparseAutoencoderCost.m to use a linear decoder ===

You should modify <tt>sparseAutoencoderCost</tt> in <tt>sparseAutoencoderCost.m</tt> from your earlier exercise to use a [[linear decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.
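
Concretely, the change amounts to removing the sigmoid from the output layer only (the hidden layer keeps its sigmoid activation). A sketch of the affected lines, with variable names such as <tt>z3</tt>, <tt>a3</tt>, <tt>delta3</tt>, and <tt>data</tt> assumed from a typical implementation:

<pre>
% Forward pass: the output layer is now linear.
% a3 = sigmoid(z3);                          % old: sigmoid output layer
a3 = z3;                                     % new: linear output layer

% Backward pass: since f(z) = z implies f'(z) = 1, the output-layer
% error term loses its sigmoid-derivative factor.
% delta3 = -(data - a3) .* a3 .* (1 - a3);   % old
delta3 = -(data - a3);                       % new
</pre>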

=== Step 2: Learn features on small patches ===

You will now use your sparse autoencoder to learn features on small 8x8 patches sampled from the larger 96x96 STL10 images. (The STL10 dataset comprises 5000 training and 8000 test 96x96 labelled color images, each belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck.)

Code has been provided to load the dataset and sample patches from the images. However, because the dataset is relatively large (about 150 megabytes on disk, and close to 1 gigabyte loaded), we recommend that you load the dataset and sample patches from it, then save the patches so that you will not need to load the entire dataset into memory repeatedly. Furthermore, since you will need to apply the exact same preprocessing steps to the convolved images as you do to the patches used for training the autoencoder (you have to subtract the same mean image and use the exact same whitening matrix), storing the original set of patches means that you can recompute these matrices if necessary. Code to save and load the sampled patches has already been provided, so no additional changes are required on your part.
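
For reference, the preprocessing that must later be reapplied to the convolved patches looks roughly like this (a sketch only; the starter code computes and saves these quantities for you, and the file name below is illustrative):

<pre>
% Zero-mean the data, then apply ZCA whitening (epsilon from Step 0).
meanPatch = mean(patches, 2);                  % per-pixel mean across patches
patches = bsxfun(@minus, patches, meanPatch);

sigma = patches * patches' / size(patches, 2); % covariance of the patches
[u, s, v] = svd(sigma);
ZCAWhite = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';
patches = ZCAWhite * patches;                  % whitened patches

% Save meanPatch and ZCAWhite: the exact same transform must be
% applied to the image patches during convolution later on.
save('stlSampledPatches.mat', 'patches', 'meanPatch', 'ZCAWhite');
</pre>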

We have also provided a wrapper function, <tt>sparseAutoencoderTrain</tt>, analogous to <tt>softmaxTrain</tt>, which trains a sparse autoencoder on the given parameters and data. This function wraps around the function <tt>sparseAutoencoderCost</tt> that you modified in this exercise, providing a convenient way to train a sparse autoencoder using a single function, which may be useful in future exercises.

In this step, you will use <tt>sparseAutoencoderTrain</tt> to train a sparse autoencoder on the sampled patches. The code provided trains your sparse autoencoder for 800 iterations with the default parameters initialized in step 0. This should take less than 15 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below.
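
In outline, the training and visualization step looks like the sketch below; the exact argument order of <tt>sparseAutoencoderTrain</tt> is an assumption here, so consult the provided wrapper for the real interface (<tt>displayColorNetwork</tt> ships with the starter code):

<pre>
% Train the sparse autoencoder on the whitened patches (sketch only;
% the real signature is defined in the provided wrapper).
optTheta = sparseAutoencoderTrain(visibleSize, hiddenSize, lambda, ...
                                  sparsityParam, beta, patches);

% Visualize the learned features. Training was done on whitened data,
% so fold the whitening matrix into the first-layer weights.
W = reshape(optTheta(1 : hiddenSize * visibleSize), hiddenSize, visibleSize);
displayColorNetwork((W * ZCAWhite)');
</pre>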

[[File:cnn_Features_Good.png|480px]]

If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:

<table>
<tr><td>[[File:cnn_Features_Bad1.png|240px]]</td><td>[[File:cnn_Features_Bad2.png|240px]]</td></tr>
</table>

=== Part II: Convolution and pooling ===
