Exercise:Convolution and Pooling

Revision as of 01:15, 26 May 2011 (view source)

Jngiam (Talk | contribs)

m (→Step 2c: Pooling)

Revision as of 19:04, 27 May 2011 (view source)

Ang (Talk | contribs)

(→Convolution and Pooling)

Line 1:

== Convolution and Pooling ==

-

In this ~~problem set,~~ you will use the features you learned on 8x8 patches sampled from images from the ~~STL10~~ dataset in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] for classifying images from a reduced ~~STL10~~ dataset applying [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced ~~STL10~~ dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).

+

In this exercise, you will use the features you learned on 8x8 patches sampled from STL-10 images in [[Exercise:Learning color features with Sparse Autoencoders | the earlier exercise on linear decoders]] to classify images from a reduced STL-10 dataset, using [[Feature extraction using convolution | convolution]] and [[Pooling | pooling]]. The reduced STL-10 dataset comprises 64x64 images from 4 classes (airplane, car, cat, dog).

In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated "YOUR CODE HERE" in the files.

Line 22:

=== Step 1: Load learned features ===

-

In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that ~~was~~ previously saved. To verify that the features are good, the visualized features should look like the following:

+

In this step, you will use the features from [[Exercise:Learning color features with Sparse Autoencoders]]. If you have completed that exercise, you can load the color features that were previously saved. If the features are good, they should look like the following when visualized:

[[File:CNN_Features_Good.png|300px]]

Line 28:

=== Step 2: Implement and test convolution and pooling ===

-

In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the ~~STL10~~ images.

+

In this step, you will implement convolution and pooling, and test them on a small part of the data set to ensure that you have implemented these two functions correctly. In the next step, you will actually convolve and pool the features with the STL-10 images.

==== Step 2a: Implement convolution ====

Line 34:

Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.

-

First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math> (''valid'' meaning that the entire 8x8 patch is contained within the image; as opposed to a ''full'' convolution which allows the patch to extend outside the image, with the area outside the image assumed to be 0) , where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, one naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can very slow. Hence, we usually use Matlab's built in convolution functions which are well optimized.

+

First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math>, where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. Here ''valid'' means that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0. One naive method is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them; while this is fine in theory, it can be very slow. Hence, we usually use MATLAB's built-in convolution functions, which are well optimized.
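The naive method can be sketched as follows for a single-channel image. This is only an illustration: the sizes, the random data, and the names <tt>imageDim</tt>, <tt>patchDim</tt>, <tt>outDim</tt> and <tt>activations</tt> are assumptions, not part of the starter code.

```matlab
% Naive "valid" feature extraction for one single-channel image.
% All sizes and values below are made up for illustration.
imageDim    = 10;   % the image is imageDim x imageDim
patchDim    = 8;    % features were learned on 8x8 patches
numFeatures = 3;

im = rand(imageDim, imageDim);
W  = randn(numFeatures, patchDim * patchDim);   % one row per feature
b  = randn(numFeatures, 1);

sigmoid = @(z) 1 ./ (1 + exp(-z));

% A patch fits entirely inside the image only if its upper-left corner
% lies in 1 .. imageDim - patchDim + 1 along each axis.
outDim = imageDim - patchDim + 1;
activations = zeros(numFeatures, outDim, outDim);

for r = 1:outDim
    for c = 1:outDim
        patch = im(r:r + patchDim - 1, c:c + patchDim - 1);
        % patch(:) flattens the patch in column-major order
        activations(:, r, c) = sigmoid(W * patch(:) + b);
    end
end

disp(size(activations));
```

With a 10x10 image and 8x8 patches there are only 3x3 valid positions; for 64x64 images the same double loop runs 57x57 times per image, which is where the cost comes from.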

-

Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the ~~resultant~~ values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.

+

Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resulting values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process significantly.

However, there are two important points to note in using <tt>conv2</tt>.

-

First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>~~, with~~ the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel ~~with~~ the image matrix for the <tt>imageNum</tt>-th image ~~going inside~~.

+

First, <tt>conv2</tt> performs a 2-D convolution, but you have 5 "dimensions" - image number, feature number, row of image, column of image, and (color) channel of image - that you want to convolve over. Because of this, you will have to convolve each feature and image channel separately for each image, using the row and column of the image as the 2 dimensions you convolve over. This means that you will need three outer loops over the image number <tt>imageNum</tt>, feature number <tt>featureNum</tt>, and the channel number of the image <tt>channel</tt>. Inside the three nested for-loops, you will perform a <tt>conv2</tt> 2-D convolution, using the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel, and the image matrix for the <tt>imageNum</tt>-th image.

Second, because of the mathematical definition of convolution, the feature matrix must be "flipped" before passing it to <tt>conv2</tt>. The following implementation tip explains the "flipping" of feature matrices when using MATLAB's convolution functions:

Line 86:

</div>
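The two points above (the three outer loops and the flipping) can be put together in a minimal sketch. It assumes that each row of <tt>W</tt> is an 8x8x3 patch flattened in column-major order and that <tt>convolvedFeatures</tt> is indexed as (feature, image, row, column); both layouts, the sizes, and the random data are assumptions made for illustration, so check them against your own code.

```matlab
% Sketch of the three nested loops around conv2, with made-up sizes.
imageDim = 12; patchDim = 8; numChannels = 3;
numImages = 2; numFeatures = 4;

images = rand(imageDim, imageDim, numChannels, numImages);
% Assumed layout: each row of W is a patchDim x patchDim x numChannels
% patch flattened in column-major order (as patch(:) would produce).
W = randn(numFeatures, patchDim * patchDim * numChannels);

outDim = imageDim - patchDim + 1;
convolvedFeatures = zeros(numFeatures, numImages, outDim, outDim);

for imageNum = 1:numImages
    for featureNum = 1:numFeatures
        acc = zeros(outDim, outDim);   % running sum over channels
        Wf = reshape(W(featureNum, :), patchDim, patchDim, numChannels);
        for channel = 1:numChannels
            % conv2 flips its kernel, so flip the feature first to get
            % a plain sliding dot product with each patch.
            feature = rot90(Wf(:, :, channel), 2);
            im = images(:, :, channel, imageNum);
            acc = acc + conv2(im, feature, 'valid');
        end
        convolvedFeatures(featureNum, imageNum, :, :) = acc;
    end
end

% Spot check against the direct product W * x for one patch.
r = 2; c = 3; imageNum = 1;
patch  = images(r:r + patchDim - 1, c:c + patchDim - 1, :, imageNum);
direct = W * patch(:);
err = max(abs(direct - convolvedFeatures(:, imageNum, r, c)));
fprintf('max difference from direct computation: %g\n', err);
```

The spot check is a cheap way to catch a missing flip: without <tt>rot90</tt>, the difference is generally far from zero. The bias and the sigmoid are deliberately left out here.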

-

To each of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature. If ~~you~~ had not done any preprocessing of the patches, you could ~~then~~ apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.

+

Next, to each of the <tt>convolvedFeatures</tt>, you should add <tt>b</tt>, the corresponding bias for the <tt>featureNum</tt>-th feature.

+

There is one additional complication, however. If you had not done any preprocessing of the input patches, you could just follow the procedure described above and apply the sigmoid function to obtain the convolved features. Because you preprocessed the patches before learning features on them, though, you must also apply the same preprocessing steps to the input image patches to get the correct feature activations.

In particular, you did the following to the patches:

Line 93:

Line 95:

<li> ZCA whiten using the whitening matrix <tt>ZCAWhite</tt>.

</ol>

-

These same three steps must also be applied to the ~~convolved~~ patches.

+

These same three steps must also be applied to the input image patches.

Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <math>WT</math> rather than <math>W</math> as earlier, and that you should add <math>(b - WT\bar{x})</math>, rather than just <math>b</math>, to <tt>convolvedFeatures</tt> before finally applying the sigmoid function.
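The expansion above can be checked numerically on a single patch. The sketch below uses random stand-ins for <math>W</math>, <math>b</math>, the whitening matrix and the mean patch; the names <tt>meanPatch</tt>, <tt>WT</tt> and <tt>bEff</tt> are hypothetical, chosen only for this example.

```matlab
% Fold whitening and mean subtraction into the weights and bias.
% Random stand-ins are used; in the exercise these come from the
% earlier sparse autoencoder exercise.
patchSize = 8 * 8 * 3; numFeatures = 5;
W         = randn(numFeatures, patchSize);
b         = randn(numFeatures, 1);
ZCAWhite  = 0.1 * randn(patchSize, patchSize);  % stand-in for T
meanPatch = rand(patchSize, 1);                 % stand-in for the mean patch

WT   = W * ZCAWhite;          % convolve with this instead of W
bEff = b - WT * meanPatch;    % add this instead of b

% Compare the pre-activations of both formulations on one patch.
x = rand(patchSize, 1);
zTwoStep = W * (ZCAWhite * (x - meanPatch)) + b;
zFolded  = WT * x + bEff;
err = max(abs(zTwoStep - zFolded));
fprintf('max difference in pre-activations: %g\n', err);
```

Computing <tt>WT</tt> and <tt>bEff</tt> once, outside the convolution loops, means the images themselves never have to be whitened patch by patch.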

Line 111:

Line 113:

=== Step 3: Convolve and pool with the dataset ===

-

In this step, you will convolve each of the features you learned with the full 64x64 images from the STL dataset to obtain the convolved features for both ~~train~~ and test sets. You will then pool the convolved features to obtain the pooled features for both ~~train~~ and test sets. The pooled features for the ~~train~~ set will be used ~~for classification~~, ~~and those for~~ the test set ~~will be used to test the trained classifier~~.

+

In this step, you will convolve each of the features you learned with the full 64x64 images from the STL-10 dataset to obtain the convolved features for both the training and test sets. You will then pool the convolved features to obtain the pooled features for both training and test sets. The pooled features for the training set will be used to train your classifier, which you can then test on the test set.

Because the convolved features matrix is very large, the code provided does the convolution and pooling 50 features at a time to avoid running out of memory.
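The pooling in this step can be sketched as follows, assuming mean pooling over non-overlapping square regions and a <tt>convolvedFeatures</tt> array indexed as (feature, image, row, column). The pooling type, the region size <tt>poolDim</tt>, and the array layout are assumptions for illustration; follow the pooling section of the exercise for the actual requirements.

```matlab
% Mean pooling over non-overlapping poolDim x poolDim regions.
% Sizes and data are made up for illustration.
numFeatures = 2; numImages = 3; convolvedDim = 6; poolDim = 3;
convolvedFeatures = rand(numFeatures, numImages, convolvedDim, convolvedDim);

numRegions = floor(convolvedDim / poolDim);   % regions per axis
pooledFeatures = zeros(numFeatures, numImages, numRegions, numRegions);

for pr = 1:numRegions
    for pc = 1:numRegions
        rows = (pr - 1) * poolDim + (1:poolDim);
        cols = (pc - 1) * poolDim + (1:poolDim);
        region = convolvedFeatures(:, :, rows, cols);
        % Average over the two spatial dimensions (3 and 4), leaving
        % one value per feature and image.
        pooledFeatures(:, :, pr, pc) = mean(mean(region, 3), 4);
    end
end

disp(size(pooledFeatures));
```

Pooling shrinks each 6x6 feature map here to 2x2, which is why the pooled features for a batch of features are much cheaper to keep in memory than the convolved ones.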
