Pooling

From Ufldl

== Pooling ==
After obtaining features using convolution, the next step is to use them for classification. In theory, one could use all the extracted features with a classifier (e.g., softmax regression), but this can be computationally challenging. Consider, for instance, images of size 96x96 pixels and 400 features, each 8x8, convolved over the entire image. Each feature, after (valid) convolution, produces <math>(96-8+1) \times (96-8+1) = 7921</math> outputs, and since we have 400 features, this results in a vector of <math>89^2 \times 400 = 3{,}168{,}400</math> features per example. Learning a classifier with more than 3 million input features can be unwieldy and is also prone to over-fitting.
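The arithmetic above can be checked with a short sketch. The function names here are illustrative and not part of the original tutorial; the sketch assumes square images, square patches, and a valid convolution with stride 1.

```python
def valid_conv_output_dim(image_dim, patch_dim):
    """Side length of the output of a valid, stride-1 convolution."""
    return image_dim - patch_dim + 1


def convolved_feature_count(image_dim, patch_dim, num_features):
    """Total number of convolved features per example (hypothetical helper)."""
    out_dim = valid_conv_output_dim(image_dim, patch_dim)
    return out_dim * out_dim * num_features


# Numbers from the example in the text: 96x96 images, 8x8 patches, 400 features.
print(valid_conv_output_dim(96, 8))         # 89
print(convolved_feature_count(96, 8, 400))  # 3168400
```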
== Invariances ==
== Pooling Methods ==
=== Average Pooling ===
=== Max Pooling ===
Now that you have obtained an array of convolved features, you might try using these features directly for classification. However, recalling why we chose to compute convolved features suggests a further step that could improve classification performance. We computed convolved features because we expected the features of a large image to be the features of smaller patches, translated around the large image. This suggests that what we are really interested in are feature activations that are independent of small translations. Intuitively, if you take an MNIST digit and translate it left or right, you want your classifier to still assign it the same digit regardless of its final position.
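One way to obtain activations that tolerate small translations is to summarize each feature map over local regions, for example by taking the mean or the maximum. The following is a minimal sketch, assuming a square feature map stored as a list of lists and non-overlapping square pooling regions; the `pool` and `mean` helpers and the toy data are illustrative, not the tutorial's own implementation.

```python
def pool(feature_map, pool_dim, reduce_fn):
    """Pool a square 2-D list over non-overlapping pool_dim x pool_dim regions.

    Trailing rows/columns that do not fill a whole region are dropped.
    """
    out_dim = len(feature_map) // pool_dim
    pooled = []
    for i in range(out_dim):
        row = []
        for j in range(out_dim):
            # Collect every activation inside region (i, j).
            region = [feature_map[r][c]
                      for r in range(i * pool_dim, (i + 1) * pool_dim)
                      for c in range(j * pool_dim, (j + 1) * pool_dim)]
            row.append(reduce_fn(region))
        pooled.append(row)
    return pooled


def mean(values):
    return sum(values) / len(values)


# A toy 4x4 feature map with one strong activation,
# and the same map translated one column to the right.
original = [[9, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0],
            [0, 0, 0, 0]]
shifted = [[0, 9, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]

print(pool(original, 2, max))   # [[9, 0], [0, 0]]
print(pool(shifted, 2, max))    # [[9, 0], [0, 0]]
print(pool(original, 2, mean))  # [[2.25, 0.0], [0.0, 0.0]]
```

In this toy case the pooled outputs are identical for the original and the shifted map because the activation stays inside the same 2x2 region. A shift that crosses a region boundary would change the pooled output, so the invariance holds only for translations that are small relative to the pooling region.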

Revision as of 23:29, 21 May 2011
