Feature extraction using convolution

From Ufldl

Formally, given some large <math>r \times c</math> images <math>x_{large}</math>, we first train a sparse autoencoder on small <math>a \times b</math> patches <math>x_{small}</math> sampled from these images, learning <math>k</math> features <math>f = \sigma(W^{(1)}x_{small} + b^{(1)})</math> (where <math>\sigma</math> is the sigmoid function), given by the weights <math>W^{(1)}</math> and biases <math>b^{(1)}</math> from the visible units to the hidden units. For every <math>a \times b</math> patch <math>x_s</math> in the large image, we compute <math>f_s = \sigma(W^{(1)}x_s + b^{(1)})</math>, giving us <math>f_{convolved}</math>, a <math>k \times (r - a + 1) \times (c - b + 1)</math> array of convolved features. These convolved features can then be [[#pooling | pooled]] for classification, as described below.
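The convolution step described above can be sketched in NumPy as follows. This is a minimal illustration rather than the tutorial's own implementation: the function name <code>convolve_features</code> is hypothetical, the weights are random stand-ins for a trained autoencoder, and it assumes the training patches were flattened in row-major order.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def convolve_features(image, W1, b1, a, b):
    """Evaluate the learned features on every a x b patch of an r x c image.

    W1 has shape (k, a*b) and b1 has shape (k,).
    Returns an array of shape (k, r - a + 1, c - b + 1).
    """
    r, c = image.shape
    k = W1.shape[0]
    out = np.empty((k, r - a + 1, c - b + 1))
    for i in range(r - a + 1):
        for j in range(c - b + 1):
            # Assumption: patches are flattened row by row, as in training.
            x_s = image[i:i + a, j:j + b].reshape(-1)
            out[:, i, j] = sigmoid(W1 @ x_s + b1)
    return out

# Toy sizes: r = c = 12, a = b = 4, k = 5.
rng = np.random.default_rng(0)
image = rng.random((12, 12))
W1 = rng.normal(scale=0.1, size=(5, 16))  # stand-in for trained weights
b1 = np.zeros(5)                          # stand-in for trained biases

f_convolved = convolve_features(image, W1, b1, a=4, b=4)
print(f_convolved.shape)  # (5, 9, 9)
```

The explicit double loop mirrors the definition directly; a practical implementation would more likely use a library 2-D convolution routine per feature for speed.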
 
=== Pooling ===
 
Now that you have obtained an array of convolved features, you might try using these features directly for classification. However, thinking about why we decided to obtain convolved features suggests a further step that could improve classification performance. Recall that we obtained convolved features because we expected the features of the large image to be the features of smaller patches, translated around the large image. This suggests that what we are really interested in is the feature activations independent of small translations. Intuitively, if you were to take an MNIST digit and translate it left or right, you would want your classifier to still classify it as the same digit regardless of its final position.
 
Hence, what we are really interested in is the '''translation-invariant''' feature activation: we want to know whether there is an edge, regardless of whether it is at <math>(1, 1), (3, 3)</math> or <math>(5, 5)</math>, though perhaps if it is at <math>(50, 50)</math> we might want to treat it as a separate edge. This suggests taking the maximum (or perhaps mean) activation of the convolved features over a small region, which makes the resulting pooled features less sensitive to small translations.
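As a small illustration of this idea, the sketch below places a single "edge" activation at different positions of a toy feature map and takes the maximum over one fixed region. The map size, the <math>8 \times 8</math> region, and the helper name <code>max_over_region</code> are assumptions chosen for the example.

```python
import numpy as np

def max_over_region(feature_map, top, left, m, n):
    """Maximum activation inside the m x n region starting at (top, left)."""
    return feature_map[top:top + m, left:left + n].max()

def map_with_edge_at(row, col, size=60):
    """Toy feature map that is zero except for one activation of 1.0."""
    fmap = np.zeros((size, size))
    fmap[row, col] = 1.0
    return fmap

# Edges at (1, 1), (3, 3) and (5, 5) all fall inside the same 8 x 8 region,
# so the pooled value is identical for all three positions.
near = [max_over_region(map_with_edge_at(p, p), 0, 0, 8, 8) for p in (1, 3, 5)]

# An edge at (50, 50) lies outside that region and leaves it inactive.
far = max_over_region(map_with_edge_at(50, 50), 0, 0, 8, 8)

print(near, far)
```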
 
[[File:Pooling_schematic.gif]]
 
Formally, after obtaining our convolved features as earlier, we decide the size of the region, say <math>m \times n</math>, over which to pool our convolved features. We then divide the convolved features into disjoint <math>m \times n</math> regions and take the maximum (or mean) feature activation over each region to obtain the pooled convolved features. These pooled features can then be used for classification.
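The pooling step above can be sketched in NumPy as follows. The function name <code>pool_features</code> is hypothetical, and cropping any rows or columns that do not fill a complete <math>m \times n</math> region is an assumption of this sketch, since the text does not say how leftover cells should be handled.

```python
import numpy as np

def pool_features(f_convolved, m, n, mode="max"):
    """Pool a (k, rows, cols) array over disjoint m x n regions.

    Returns an array of shape (k, rows // m, cols // n).
    """
    k, rows, cols = f_convolved.shape
    rows_p, cols_p = rows // m, cols // n
    # Assumption: drop trailing rows/columns that do not fill a whole region.
    cropped = f_convolved[:, :rows_p * m, :cols_p * n]
    # Axes 2 and 4 index the position inside each m x n region.
    blocks = cropped.reshape(k, rows_p, m, cols_p, n)
    if mode == "max":
        return blocks.max(axis=(2, 4))
    if mode == "mean":
        return blocks.mean(axis=(2, 4))
    raise ValueError("mode must be 'max' or 'mean'")

# Toy input: k = 2 features on a 4 x 6 grid, pooled over 2 x 3 regions.
f_toy = np.arange(2 * 4 * 6, dtype=float).reshape(2, 4, 6)
pooled_max = pool_features(f_toy, m=2, n=3, mode="max")
pooled_mean = pool_features(f_toy, m=2, n=3, mode="mean")

print(pooled_max[0])   # [[ 8. 11.] [20. 23.]]
print(pooled_mean[0])  # [[ 4.  7.] [16. 19.]]
```

Each feature is pooled independently, so the number of features <math>k</math> is unchanged while each spatial dimension shrinks by the pooling factor.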
 

Revision as of 07:00, 14 May 2011
