Exercise:Convolution and Pooling

== Convolution and Pooling ==
This problem set is divided into two parts. In the first part, you will implement a [[Linear Decoders | linear decoder]] to learn features on color images from the STL10 dataset. In the second part, you will use these learned features in convolution and pooling for classifying STL10 images.
In the file <tt>cnnExercise.zip</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.
For this exercise, you will need to modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise, as well as '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.
=== Dependencies ===
=== Part I: Linear decoder on color images ===
In all the exercises so far, you have been working only with grayscale images. In this exercise, you will get the opportunity to work with RGB color images for the first time.
Conveniently, the fact that an image has three color channels (RGB), rather than a single gray channel, presents little difficulty for the sparse autoencoder. You can simply concatenate the intensities from all the color channels of the pixels into one long vector, as if you were working with a grayscale image with 3x the number of pixels of the original image.
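For instance, a minimal sketch of this vectorization for a single 8x8 RGB patch (the variable names below are only illustrative):

  % Turn an 8x8x3 RGB patch into a single 192-dimensional column vector (illustrative names)
  patch = rand(8, 8, 3);     % stand-in for one 8x8 color patch, values in [0, 1]
  x = patch(:);              % stack the R, G and B intensities into one 192x1 vector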
=== Step 0: Initialization ===
=== Step 1: Modify sparseAutoencoderCost.m to use a linear decoder ===
You should modify <tt>sparseAutoencoderCost</tt> in <tt>sparseAutoencoderCost.m</tt> from your earlier exercise to use a [[Linear Decoders | linear decoder]]. In particular, you should change the cost and gradients returned to reflect the change from a sigmoid to a linear decoder. After making this change, check your gradients to ensure that they are correct.
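As a rough sketch of the change (the variable names <tt>z3</tt>, <tt>a3</tt>, <tt>delta3</tt> and <tt>data</tt> below are hypothetical; your implementation may use different names), the output layer's activation and error term become:

  % Sketch of the linear-decoder change (hypothetical variable names)
  % Sigmoid decoder:  a3 = sigmoid(z3);   delta3 = -(data - a3) .* a3 .* (1 - a3);
  a3 = z3;                  % linear decoder: the output activation is the pre-activation itself
  delta3 = -(data - a3);    % f'(z) = 1, so the sigmoid-derivative factor disappears

The hidden layer keeps its sigmoid activation; only the output layer and the terms that depend on it change.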
=== Step 2: Learn features on small patches ===
You will now use your sparse autoencoder to learn features on small 8x8 patches sampled from the larger 96x96 STL10 images (the STL10 dataset comprises 5000 training and 8000 test 96x96 labelled color images, each belonging to one of ten classes: airplane, bird, car, cat, deer, dog, horse, monkey, ship, truck).
Code has been provided to load the dataset and sample patches from the images. However, because the dataset is relatively large (about 150 megabytes on disk, and close to 1 gigabyte loaded), we recommend that you load the dataset and sample patches from it, then save the patches so that you will not need to load the entire dataset into memory repeatedly. Furthermore, since you will need to apply the exact same preprocessing steps to the convolved images as you do to the patches used for training the autoencoder (you have to subtract the same mean image and use the exact same whitening matrix), storing the original set of patches means that you can recompute these matrices if necessary. Code to save and load the sampled patches has already been provided, so no additional changes are required on your part.
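For reference, the preprocessing applied to the sampled patches has roughly the following form (a sketch only; the starter code already does this and saves the results, and the variable names are illustrative, with each column of <tt>patches</tt> holding one vectorized patch):

  % Sketch of the patch preprocessing that must later be reproduced at convolution time (illustrative names)
  epsilon   = 0.1;                                    % small whitening regularization constant
  meanPatch = mean(patches, 2);                       % per-component mean image
  patches   = bsxfun(@minus, patches, meanPatch);     % subtract the mean image
  sigma     = patches * patches' / size(patches, 2);  % covariance of the zero-mean patches
  [u, s, v] = svd(sigma);
  ZCAWhite  = u * diag(1 ./ sqrt(diag(s) + epsilon)) * u';   % ZCA whitening matrix
  patches   = ZCAWhite * patches;                     % whitened patches used to train the autoencoder

It is exactly <tt>meanPatch</tt> and <tt>ZCAWhite</tt> that must be reused when convolving over the large images later.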
We have also provided a wrapper function, <tt>sparseAutoencoderTrain</tt>, analogous to <tt>softmaxTrain</tt>, which trains a sparse autoencoder on the given parameters and data. This function wraps around the function <tt>sparseAutoencoderCost</tt> that you modified in this exercise, providing a convenient way to train a sparse autoencoder using a single function, which may be useful in future exercises.  
In this step, you will use <tt>sparseAutoencoderTrain</tt> to train a sparse autoencoder on the sampled patches. The code provided trains your sparse autoencoder for 800 iterations with the default parameters initialized in step 0. This should take less than 15 minutes. Your sparse autoencoder should learn features which, when visualized, look like edges and opponent colors, as in the figure below.
[[File:cnn_Features_Good.png|480px]]
If your parameters are improperly tuned (the default parameters should work), or if your implementation of the autoencoder is buggy, you might get one of the following images instead:
<table>
=== Step 3: Convolution and pooling ===
Now that you have learned features for small patches, you will convolve these learned features with the large images, and pool these convolved features for use in a classifier later.
==== Step 3a: Convolution ====
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all valid <math>(r, c)</math>, where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, what we could do is loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. However, this is not very efficient.
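Written out directly, this patch-by-patch computation would look something like the following sketch (illustrative variable names, with the image <tt>im</tt> treated as a single channel for simplicity):

  % Naive sketch: evaluate the hidden units on every patch of one image channel (illustrative names)
  sigmoid = @(z) 1 ./ (1 + exp(-z));
  numPositions = imageDim - patchDim + 1;                    % e.g. 96 - 8 + 1 = 89 valid positions per side
  features = zeros(size(W, 1), numPositions, numPositions);  % one response map per hidden unit
  for r = 1:numPositions
      for c = 1:numPositions
          x = im(r:r+patchDim-1, c:c+patchDim-1);            % patch with upper-left corner at (r, c)
          features(:, r, c) = sigmoid(W * x(:) + b);         % sigma(W x_(r,c) + b)
      end
  end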
Observe that what we are doing above can be broken down into three small steps. First, we need to compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, we can add b to all the computed values. Finally, we apply the sigmoid function to the resultant values. The first substep still requires a loop, but if instead of using a loop, we use MATLAB's convolution functions which are optimized for such computations, we can speed up the process slightly.
To use these convolution functions, we will have to convolve the features with the large images one at a time. That is, for every feature, we take the matrix <math>W_f</math>, the weights from the input layer to the fth unit in the hidden layer, and convolve it with the large image.  
''Implementation tip:'' Using <tt>conv2</tt> and <tt>convn</tt>:
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with, to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <math>X</math> (the large image) and <math>W</math> (the feature) using <tt>conv2(X, W)</tt>, and W is a 3x3 matrix as below:
<math>
W =
\begin{pmatrix}
 1 & 2 & 3 \\
 4 & 5 & 6 \\
 7 & 8 & 9
\end{pmatrix}
</math>
If you use <tt>conv2(X, W)</tt>, MATLAB will first "flip" <math>W</math> before convolving <math>W</math> with <math>X</math>, as below:
<math>
\begin{pmatrix}
 1 & 2 & 3 \\
 4 & 5 & 6 \\
 7 & 8 & 9
\end{pmatrix}
\quad \longrightarrow \quad
\begin{pmatrix}
 9 & 8 & 7 \\
 6 & 5 & 4 \\
 3 & 2 & 1
\end{pmatrix}
</math>
If the original layout of <math>W</math> was correct, after flipping, it would be incorrect. For the layout to be correct after flipping, you will have to flip <math>W</math> before passing it into <tt>conv2</tt>, so that after MATLAB flips <math>W</math> in <tt>conv2</tt>, the layout will be correct. This is also true for the general convolution function <tt>convn</tt>. In general, you can flip the matrix <math>W</math> using the following code snippet, which works for <math>W</math> of any dimension:
  % Flip W for use in conv2 / convn
  for d = 1:ndims(W)        % reverse W along every one of its dimensions
      W = flipdim(W, d);    % (for a 2-D W this is the same as rot90(W, 2))
  end
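Putting the pieces together, the per-feature convolution might look roughly like the following sketch (for a single image channel, with illustrative variable names such as <tt>im</tt>, <tt>imageDim</tt>, <tt>patchDim</tt> and <tt>convolvedFeatures</tt>, and omitting the whitening preprocessing from Step 2):

  % Sketch: convolve every learned feature with one image channel via conv2 (illustrative names)
  sigmoid = @(z) 1 ./ (1 + exp(-z));
  numFeatures  = size(W, 1);
  convolvedDim = imageDim - patchDim + 1;
  convolvedFeatures = zeros(convolvedDim, convolvedDim, numFeatures);
  for f = 1:numFeatures
      feature = reshape(W(f, :), patchDim, patchDim);   % the f-th hidden unit's weights as a patchDim x patchDim filter
      feature = flipdim(flipdim(feature, 1), 2);        % pre-flip so conv2's internal flip restores the layout
      convolvedImage = conv2(im, feature, 'valid');     % W_f x_(r,c) for every valid (r, c)
      convolvedFeatures(:, :, f) = sigmoid(convolvedImage + b(f));   % add the bias and apply the sigmoid
  end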
