Exercise:Convolution and Pooling

In the file <tt>[http://ufldl.stanford.edu/wiki/resources/cnn_exercise.zip cnn_exercise.zip]</tt> we have provided some starter code. You should write your code at the places indicated by "YOUR CODE HERE" in the files.
For this exercise, you will need to copy and modify '''<tt>sparseAutoencoderCost.m</tt>''' from your earlier exercise. You will also need to modify '''<tt>cnnConvolve.m</tt>''' and '''<tt>cnnPool.m</tt>''' from this exercise.
=== Dependencies ===
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add b to all the computed values. Finally, apply the sigmoid function to the resultant values. This doesn't seem to buy you anything, since the first step still requires a loop. However, you can replace the loop in the first step with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, speeding up the process slightly.
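For a single feature and a single-channel image, the three steps might look like the following minimal sketch (the variable names are illustrative; the pre-flipping of the feature matrix is explained in the implementation tip below):

 % Step 1: a single 'valid' conv2 call computes W x_(r,c) for all (r, c);
 % Wf is the feature matrix pre-flipped as explained below
 WX = conv2(image, Wf, 'valid');
 % Step 2: add the bias to every entry
 WX = WX + b;
 % Step 3: apply the sigmoid elementwise to get the activations
 convolvedFeature = 1 ./ (1 + exp(-WX));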
There are a few complications in using <tt>conv2</tt>. First, <tt>conv2</tt> performs a 2-D convolution, but you have five "dimensions" you want to convolve over: image number, feature number, image row, image column, and image channel. Because of this, you will have to convolve each feature and each channel separately for each image, using the row and column of the image as the two dimensions of the convolution. This means that you will need three outer loops over the image number <tt>imageNum</tt>, the feature number <tt>featureNum</tt>, and the channel number <tt>channel</tt>, with the 2-D convolution of the weight matrix for the <tt>featureNum</tt>-th feature and <tt>channel</tt>-th channel with the <tt>imageNum</tt>-th image going inside.
More concretely, your code will look something like the following:
 % convolvedFeatures(featureNum, imageNum, r, c) holds the convolved value
 % of the featureNum-th feature at position (r, c) of the imageNum-th image
 convolvedFeatures = zeros(hiddenSize, numImages, imageDim - patchDim + 1, imageDim - patchDim + 1);
 for imageNum = 1:numImages
   for featureNum = 1:hiddenSize
     % Obtain the feature matrix for this feature
     Wt = W(featureNum, :);
     Wt = reshape(Wt, patchDim, patchDim, 3);
 
     % Get convolution of image with feature matrix for each channel
     convolvedTemp = zeros(imageDim - patchDim + 1, imageDim - patchDim + 1, 3);
     for channel = 1:3
       % Flip the feature matrix because of the definition of convolution,
       % as explained later
       Wt(:, :, channel) = flipud(fliplr(squeeze(Wt(:, :, channel))));
       convolvedTemp(:, :, channel) = conv2(squeeze(images(:, :, channel, imageNum)), squeeze(Wt(:, :, channel)), 'valid');
     end
 
     % The convolved feature is the sum of the convolved values for all channels
     convolvedFeatures(featureNum, imageNum, :, :) = sum(convolvedTemp, 3);
   end
 end
One detail in the above code needs explanation: observe that we "flip" the feature matrix about its rows and columns before passing it into <tt>conv2</tt>. This is necessary because the mathematical definition of convolution involves "flipping" the matrix being convolved with, as explained in more detail in the implementation tip below.
<div style="border:1px solid black; padding: 5px">
'''Implementation tip: Using <tt>conv2</tt> and <tt>convn</tt>'''

Because the mathematical definition of convolution involves "flipping" the matrix being convolved with (reversing its rows and its columns), to use MATLAB's convolution functions you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition, the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <tt>image</tt> (a large image) and <tt>W</tt> (the feature) using <tt>conv2(image, W)</tt>, and that <tt>W</tt> is the 3x3 matrix below:

<math>
W =
\begin{pmatrix}
 w_{11} & w_{12} & w_{13} \\
 w_{21} & w_{22} & w_{23} \\
 w_{31} & w_{32} & w_{33}
\end{pmatrix}
</math>
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <math>W</math>, reversing its rows and columns, before convolving <math>W</math> with <math>image</math>, as below:
<math>
\begin{pmatrix}
 w_{33} & w_{32} & w_{31} \\
 w_{23} & w_{22} & w_{21} \\
 w_{13} & w_{12} & w_{11}
\end{pmatrix}
</math>
If the original layout of <math>W</math> was correct, it would be incorrect after flipping. For the layout to remain correct, you will have to flip <math>W</math> before passing it into <tt>conv2</tt>, so that after MATLAB flips <math>W</math> inside <tt>conv2</tt>, the layout is correct again. For <tt>conv2</tt>, this means reversing the rows and columns, which can be done with <tt>flipud</tt> and <tt>fliplr</tt>, as we did in the example code above. The same is true for the general convolution function <tt>convn</tt>, in which case MATLAB reverses every dimension. In general, you can flip the matrix <math>W</math> using the following code snippet, which works for <math>W</math> of any dimension:
  % Flip W for use in conv2 / convn
  for dim = 1:ndims(W)
      W = flipdim(W, dim);
  end
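As a quick sanity check (not part of the starter code), you can verify numerically that pre-flipping makes <tt>conv2</tt> compute the correlation you want:

 % After pre-flipping, the 'valid' convolution at (1, 1) equals the
 % elementwise product of W with the top-left patch of the image, summed
 W = [1 2; 3 4];
 image = magic(4);
 C = conv2(image, flipud(fliplr(W)), 'valid');
 assert(C(1, 1) == sum(sum(W .* image(1:2, 1:2))));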
</div>
To each entry of <tt>convolvedFeatures</tt>, you should then add <tt>b</tt>, the bias corresponding to the <tt>featureNum</tt>-th feature. If you had done no preprocessing of the patches, you could then apply the sigmoid function to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.
In particular, you did the following to the patches:
# subtracted the mean (<tt>meanImage</tt>) to zero the mean of the patches
# ZCA whitened using the whitening matrix <tt>ZCAWhite</tt>
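Rather than preprocessing every patch of every image explicitly, one convenient approach (a sketch, assuming the whitening matrix <tt>ZCAWhite</tt> and the mean patch <tt>meanImage</tt> from the list above) is to fold the preprocessing into the weights and biases before convolving:

 % The features were learned on ZCAWhite * (x - meanImage), so the activation
 % on a raw (vectorized) patch x is sigmoid(W * ZCAWhite * (x - meanImage) + b).
 % The preprocessing can therefore be absorbed into equivalent parameters:
 WT = W * ZCAWhite;        % convolve with WT in place of W
 bT = b - WT * meanImage;  % add bT in place of b

With this substitution, the convolution code above can be used unchanged.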
=== Step 5: Test classifier ===
Now that you have a trained softmax classifier, you can see how well it performs on the test set. This section contains code that will load the test set (a smaller part of the STL-10 dataset: 3200 rescaled 64x64 images from 4 classes) and obtain the pooled, convolved features for the images using the functions <tt>cnnConvolve</tt> and <tt>cnnPool</tt> which you wrote earlier, together with the preprocessing matrices <tt>ZCAWhite</tt> and <tt>meanImage</tt> computed earlier when preprocessing the training images. These pooled features will then be run through the softmax classifier, and the accuracy of the predictions will be computed. Because object recognition is a difficult task, the accuracy will be relatively low; we obtained an accuracy of around XX%.
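The accuracy computation itself is short. As a sketch (assuming <tt>softmaxPredict</tt> from the earlier softmax exercise, and hypothetical names for the pooled test features and labels):

 % Predict labels for the pooled, convolved test features, then
 % compute the fraction of correct predictions
 pred = softmaxPredict(softmaxModel, testFeatures);
 acc = mean(pred(:) == testLabels(:));
 fprintf('Accuracy: %2.3f%%\n', acc * 100);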
