Pooling

Revision as of 23:01, 25 May 2011 (view source)

171.64.68.245 (Talk)

(→Pooling: Overview)

← Older edit

Revision as of 18:29, 27 May 2011 (view source)

Ang (Talk | contribs)

(→Pooling: Overview)

Newer edit →

Line 1:

== Pooling: Overview ==

-

After obtaining features using convolution, ~~the~~ next ~~step is~~ to use them for classification. In theory, one could use all the extracted features with a classifier ~~(e.g.~~, ~~softmax regression)~~ but this can be computationally challenging. Consider for instance images of size 96x96 pixels and 400 features ~~that are 8x8 each and convolved~~ over ~~the entire image; each features after (valid)~~ convolution results in <math>(96-8+1)*(96-8+1)=7921</math> and since we have 400 features, this results in a ~~feature~~ vector of <math>89^2 * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy and also prone to over-fitting.

+

After obtaining features using convolution, we would next like to use them for classification. In theory, one could use all the extracted features with a classifier such as a softmax classifier, but this can be computationally challenging. Consider for instance images of size 96x96 pixels, and suppose we have learned 400 features over 8x8 inputs. Each convolution results in an output of size <math>(96-8+1)*(96-8+1)=7921</math>, and since we have 400 features, this results in a vector of <math>89^2 * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy, and can also be prone to over-fitting.

-

~~However~~, ~~thinking about why we decided to obtain convolved features suggests a further step that could improve our feature extraction pipeline. Recall~~ that we decided to obtain convolved features because images have the property that features that are useful in one region ~~will~~ be useful for other regions ~~(stationary)~~.

+

To address this, first recall that we decided to obtain convolved features because images have the "stationarity" property, which implies that features that are useful in one region are also likely to be useful for other regions. Thus, to describe a large image, one natural approach is to aggregate statistics of these features at various locations. For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all of the extracted features) and can also improve results (less over-fitting). This aggregation operation is called '''pooling''', or sometimes '''mean pooling''' or '''max pooling''' (depending on the pooling operation applied).

-

+

-

~~Then~~, to describe a large image, one natural approach is to aggregate statistics of these features at various locations~~: ''pooling'' over regions of the image~~. For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all extracted features) and can also improve results (less over-fitting).

+

The following image shows how pooling is done over 4 non-overlapping regions of the image.

From Ufldl


@@ Line 1: / Line 1: @@
 == Pooling: Overview ==
-After obtaining features using convolution, the next step is to use them for classification. In theory, one could use all the extracted features with a classifier (e.g., softmax regression) but this can be computationally challenging. Consider for instance images of size 96x96 pixels and 400 features that are 8x8 each and convolved over the entire image; each features after (valid) convolution results in <math>(96-8+1)*(96-8+1)=7921</math> and since we have 400 features, this results in a feature vector of <math>89^2 * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy and also prone to over-fitting.
+After obtaining features using convolution, we would next like to use them for classification. In theory, one could use all the extracted features with a classifier such as a softmax classifier, but this can be computationally challenging. Consider for instance images of size 96x96 pixels, and suppose we have learned 400 features over 8x8 inputs.  Each convolution results in an output of size <math>(96-8+1)*(96-8+1)=7921</math>, and since we have 400 features, this results in a vector of <math>89^2 * 400 = 3,168,400</math> features per example. Learning a classifier with inputs having 3+ million features can be unwieldy, and can also be prone to over-fitting.
-However, thinking about why we decided to obtain convolved features suggests a further step that could improve our feature extraction pipeline. Recall that we decided to obtain convolved features because images have the property that features that are useful in one region will be useful for other regions (stationary).
+To address this, first recall that we decided to obtain convolved features because images have the "stationarity" property, which implies that features that are useful in one region are also likely to be useful for other regions.  Thus, to describe a large image, one natural approach is to aggregate statistics of these features at various locations.  For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all of the extracted features) and can also improve results (less over-fitting).  This aggregation operation is called '''pooling''', or sometimes '''mean pooling''' or '''max pooling''' (depending on the pooling operation applied).
-Then, to describe a large image, one natural approach is to aggregate statistics of these features at various locations: ''pooling'' over regions of the image. For example, one could compute the mean (or max) value of a particular feature over a region of the image. These summary statistics are much lower in dimension (compared to using all extracted features) and can also improve results (less over-fitting).
 The following image shows how pooling is done over 4 non-overlapping regions of the image.
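
The arithmetic and the pooling step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not code from the tutorial: the function names <code>convolved_size</code> and <code>pool</code>, the 4x4 toy feature map, and the 2x2 pooling region are all assumptions chosen for the example, and the pooling region is assumed to divide the feature map evenly.

```python
from statistics import mean


def convolved_size(image_dim, patch_dim):
    """Side length of the output of a 'valid' convolution."""
    return image_dim - patch_dim + 1


def pool(feature_map, pool_dim, reduce_fn=mean):
    """Pool a square 2D feature map over non-overlapping
    pool_dim x pool_dim regions, using reduce_fn (mean or max)."""
    n = len(feature_map)
    if n % pool_dim != 0:
        # Assumption for this sketch: regions tile the map exactly.
        raise ValueError("pool_dim must evenly divide the feature map size")
    pooled = []
    for r in range(0, n, pool_dim):
        row = []
        for c in range(0, n, pool_dim):
            # Collect every value inside the current region.
            region = [feature_map[i][j]
                      for i in range(r, r + pool_dim)
                      for j in range(c, c + pool_dim)]
            row.append(reduce_fn(region))
        pooled.append(row)
    return pooled


# Feature count from the text: 96x96 images, 8x8 patches, 400 features.
side = convolved_size(96, 8)               # 89
features_per_example = side * side * 400   # 3,168,400

# Hypothetical 4x4 convolved feature map, pooled over 4 non-overlapping
# 2x2 regions.
toy_map = [
    [1, 2, 5, 6],
    [3, 4, 7, 8],
    [9, 10, 13, 14],
    [11, 12, 15, 16],
]
mean_pooled = pool(toy_map, 2, mean)  # [[2.5, 6.5], [10.5, 14.5]]
max_pooled = pool(toy_map, 2, max)    # [[4, 8], [12, 16]]
```

In this toy case each 4x4 map (16 values) is reduced to a 2x2 summary (4 values), which is the kind of dimensionality reduction the paragraph above attributes to pooling.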