Exercise:Convolution and Pooling

From Ufldl

(Convolution and Pooling)
(Step 3a: Convolution)
Line 60: Line 60:
Implement convolution, as described in [[feature extraction using convolution]], in the function <tt>cnnConvolve</tt> in <tt>cnnConvolve.m</tt>. Implementing convolution is somewhat involved, so we will guide you through the process below.
-
First of all, what we want to compute is <math>\sigma(Wx_{(r,c)} + b)</math> for all valid <math>(r, c)</math>, where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with the upper left corner at <math>(r, c)</math>. To accomplish this, what we could do is loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. However, this is not very efficient.  
+
First, we want to compute <math>\sigma(Wx_{(r,c)} + b)</math> for all ''valid'' <math>(r, c)</math>, where <math>W</math> and <math>b</math> are the learned weights and biases from the input layer to the hidden layer, and <math>x_{(r,c)}</math> is the 8x8 patch with its upper left corner at <math>(r, c)</math>. Here ''valid'' means that the entire 8x8 patch is contained within the image, as opposed to a ''full'' convolution, which allows the patch to extend outside the image, with the area outside the image assumed to be 0. One way to do this is to loop over all such patches and compute <math>\sigma(Wx_{(r,c)} + b)</math> for each of them. In theory, this is correct. In practice, however, the convolution is usually done in three small steps to take advantage of MATLAB's optimized convolution functions.
-
Observe that what we are doing above can be broken down into three small steps. First, we need to compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, we can add b to all the computed values. Finally, we apply the sigmoid function to the resultant values. The first substep still requires a loop, but if instead of using a loop, we use MATLAB's convolution functions which are optimized for such computations, we can speed up the process slightly.
+
Observe that the convolution above can be broken down into the following three small steps. First, compute <math>Wx_{(r,c)}</math> for all <math>(r, c)</math>. Next, add <math>b</math> to all the computed values. Finally, apply the sigmoid function to the resulting values. At first this does not seem to gain anything, since the first step still requires a loop over the patches. However, you can replace that loop with one of MATLAB's optimized convolution functions, <tt>conv2</tt>, which speeds up the process slightly.
-
To use these convolution functions, we will have to convolve the features with the large images one at a time. That is, for every feature, we take the matrix <math>W_f</math>, the weights from the input layer to the fth unit in the hidden layer, and convolve it with the large image.  
+
To use <tt>conv2</tt>, you will have to convolve the features with the large images one feature at a time (so you will still need a loop over the features). For every feature, take the matrix <math>W_f</math>, the weights from the input layer to the <math>f</math>th unit in the hidden layer, and convolve it with the large image using <tt>C = conv2(images, W, 'valid')</tt>. This call performs a ''valid'' convolution (as opposed to a ''full'' convolution, as described earlier) of <tt>W</tt> with <tt>images</tt>, yielding a matrix <tt>C</tt> of convolved features.
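As a rough illustration of the difference between ''valid'' and ''full'' convolution, the sketch below uses a small made-up image and feature (both hypothetical, not taken from the exercise). The feature is all ones, so "flipping" has no effect and each ''valid'' output entry is simply the sum of the patch whose upper left corner is at that position.

```matlab
% Hypothetical toy inputs, chosen only to show the output sizes.
image = magic(5);          % 5x5 stand-in for a large image
W = ones(2, 2);            % 2x2 stand-in for one feature (symmetric)

Cvalid = conv2(image, W, 'valid');   % patch must lie entirely inside the image
Cfull  = conv2(image, W, 'full');    % patch may hang over the edge (zeros outside)

% A valid convolution yields (imageDim - patchDim + 1) entries per side,
% a full convolution yields (imageDim + patchDim - 1).
assert(isequal(size(Cvalid), [4 4]));
assert(isequal(size(Cfull),  [6 6]));

% Entry (r, c) of the valid result corresponds to the patch with
% upper left corner at (r, c).
r = 2; c = 3;
patch = image(r:r+1, c:c+1);
assert(Cvalid(r, c) == sum(patch(:)));
```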
 +
 
 +
[[File:Convolution_schematic.png|300px]]
 +
 
 +
<div style="border:1px solid black">
'''Implementation tip:''' Using <tt>conv2</tt> and <tt>convn</tt>
-
Because the mathematical definition of convolution involves "flipping" the matrix to convolve with, to use MATLAB's convolution functions, you must first "flip" the weight matrix so that when MATLAB "flips" it according to the mathematical definition the entries will be at the correct place. For example, suppose you wanted to convolve two matrices <math>X</math> (the large image) and <math>W</math> (the feature) using <tt>conv2(X, W)</tt>, and W is a 3x3 matrix as below:
+
The mathematical definition of convolution involves "flipping" the matrix to convolve with. To use MATLAB's convolution functions, you must therefore first "flip" the weight matrix yourself, so that when MATLAB "flips" it according to the mathematical definition, the entries end up in the correct place. For example, suppose you want to convolve two matrices <math>image</math> (a large image) and <math>W</math> (the feature) using <tt>conv2(image, W)</tt>, and <math>W</math> is a 3x3 matrix as below:
<math>
<math>
Line 79: Line 83:
</math>
</math>
-
If you use <tt>conv2(X, W)</tt>, MATLAB will first "flip" <math>W</math> before convolving <math>W</math> with <math>X</math>, as below:
+
If you use <tt>conv2(image, W)</tt>, MATLAB will first "flip" <math>W</math> before convolving <math>W</math> with <math>image</math>, as below:
<math>
<math>
Line 103: Line 107:
  temp = flipud(temp);
  temp = reshape(temp, size(W));
 +
 +
</div>
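The effect of pre-flipping can be checked on a toy example. The sketch below uses a hypothetical 5x5 image and a hypothetical asymmetric 3x3 feature (not the matrix from the tip), and compares <tt>conv2</tt> on the pre-flipped feature against a direct patch-by-patch computation. It also shows three ways of rotating a 2-D matrix by 180 degrees; the third assumes that <tt>temp</tt> in the tip starts out as <tt>W(:)</tt>.

```matlab
% Hypothetical toy inputs.
image = magic(5);                 % 5x5 stand-in for a large image
W = [1 2 3; 4 5 6; 7 8 9];        % asymmetric, so flipping matters

% Three equivalent ways to rotate a 2-D matrix by 180 degrees.
flipA = flipud(fliplr(W));
flipB = rot90(W, 2);
flipC = reshape(flipud(W(:)), size(W));   % vectorize, reverse, reshape
assert(isequal(flipA, flipB) && isequal(flipB, flipC));

% conv2 flips its second argument, so passing the pre-flipped matrix
% leaves W in its original orientation when it is applied to each patch.
C = conv2(image, flipA, 'valid');

% Direct computation: elementwise product of W with every 3x3 patch.
expected = zeros(3, 3);
for r = 1:3
  for c = 1:3
    patch = image(r:r+2, c:c+2);
    expected(r, c) = sum(sum(patch .* W));
  end
end
assert(isequal(C, expected));
```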
 +
 +
To <tt>C</tt>, you can then add <tt>b</tt>, the corresponding bias for the <math>f</math>th feature. If you had done no preprocessing of the patches, you could now apply the sigmoid function to <tt>C</tt> to obtain the convolved features. However, because you preprocessed the patches before learning features on them, you must also apply the same preprocessing steps to the convolved patches to get the correct feature activations.
 +
 +
In particular, you did the following to the patches: (1) divided by 255 to normalize them into the range <math>[0, 1]</math>; (2) subtracted the mean patch, <tt>meanPatch</tt>, to zero the mean of the patches; and (3) ZCA whitened them using the whitening matrix <tt>ZCAWhite</tt>. These same three steps must also be applied to the convolved patches.
 +
 +
Taking the preprocessing steps into account, the feature activations that you should compute are <math>\sigma(W(T(x-\bar{x})) + b)</math>, where <math>T</math> is the whitening matrix and <math>\bar{x}</math> is the mean patch. Expanding this, you obtain <math>\sigma(WTx - WT\bar{x} + b)</math>, which suggests that you should convolve the images with <tt>WT</tt> rather than <tt>W</tt> as earlier, and that you should add <math>b - WT\bar{x}</math>, rather than just <tt>b</tt>, to the resulting matrix <tt>C</tt> before finally applying the sigmoid function.
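The expansion above can be sanity-checked numerically. The following is a minimal sketch, not the exercise solution: it assumes a single-channel image already scaled to <math>[0, 1]</math>, uses random stand-ins for <tt>W</tt>, <tt>b</tt>, <tt>ZCAWhite</tt> and <tt>meanPatch</tt>, and picks small hypothetical sizes. The names <tt>bFolded</tt>, <tt>feature</tt> and <tt>activation</tt> are introduced here for illustration only.

```matlab
% Hypothetical sizes and random stand-ins for the learned parameters.
patchDim = 3; imageDim = 6; numFeatures = 2;
n = patchDim^2;
W = rand(numFeatures, n);        % weights, one row per feature
b = rand(numFeatures, 1);        % biases
ZCAWhite = rand(n, n);           % stand-in for the whitening matrix T
meanPatch = rand(n, 1);          % stand-in for the mean patch
image = rand(imageDim);          % single channel, assumed already in [0, 1]
sigmoid = @(z) 1 ./ (1 + exp(-z));

WT = W * ZCAWhite;               % fold the whitening into the weights
bFolded = b - WT * meanPatch;    % fold the mean subtraction into the bias

f = 1;                           % feature to convolve
feature = reshape(WT(f, :), patchDim, patchDim);  % column-major, matches x(:)
C = conv2(image, rot90(feature, 2), 'valid');     % pre-flip cancels conv2's flip
activation = sigmoid(C + bFolded(f));

% Compare one location against sigma(W(T(x - xbar)) + b) computed directly.
r = 2; c = 3;
x = image(r:r+patchDim-1, c:c+patchDim-1);
x = x(:);
direct = sigmoid(W(f, :) * (ZCAWhite * (x - meanPatch)) + b(f));
assert(abs(activation(r, c) - direct) < 1e-10);
```

Folding <tt>T</tt> and <math>\bar{x}</math> into the weights and bias once per feature means the preprocessing does not have to be repeated for every patch.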
==== Step 3b: Checking ====

Revision as of 07:42, 14 May 2011
