Stacked Autoencoders

[[File:Stacked_Combined.png|500px]]
===Discussion===
A stacked autoencoder enjoys all the benefits of any deep network, including greater expressive power.
Further, it often captures a useful "hierarchical grouping" or "part-whole decomposition" of the input. To see this, recall that an autoencoder tends to learn features that form a good representation of its input. The first layer of a stacked autoencoder tends to learn first-order features in the raw input (such as edges in an image). The second layer tends to learn second-order features corresponding to patterns in the appearance of first-order features (e.g., which edges tend to occur together, for instance to form contour or corner detectors). Higher layers of the stacked autoencoder tend to learn even higher-order features; for example, if the inputs are images of faces, higher layers may learn features corresponding to parts of the face such as eyes, noses, or mouths.
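
To make this layer-wise picture concrete, here is a minimal NumPy sketch (not the tutorial's own code) of greedy layer-wise training of a two-layer stacked autoencoder: a first autoencoder is trained on the raw input, and a second autoencoder is then trained on the first layer's activations, so it learns "features of features". The layer sizes, learning rate, sigmoid activations, and squared-error loss below are illustrative assumptions rather than a prescribed setup.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_autoencoder(X, n_hidden, lr=0.5, epochs=100, seed=0):
    # Train a single sigmoid autoencoder with squared-error loss on the rows of X.
    # Returns the encoder parameters (W, b) so that sigmoid(X @ W + b) gives the features.
    rng = np.random.default_rng(seed)
    n_visible = X.shape[1]
    W1 = rng.normal(0.0, 0.1, (n_visible, n_hidden))   # encoder weights
    b1 = np.zeros(n_hidden)
    W2 = rng.normal(0.0, 0.1, (n_hidden, n_visible))   # decoder weights
    b2 = np.zeros(n_visible)
    m = X.shape[0]
    for _ in range(epochs):
        h = sigmoid(X @ W1 + b1)        # hidden features
        Xhat = sigmoid(h @ W2 + b2)     # reconstruction of the input
        # Backpropagate the squared reconstruction error through both layers.
        delta_out = (Xhat - X) * Xhat * (1.0 - Xhat)
        delta_hid = (delta_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * (h.T @ delta_out) / m
        b2 -= lr * delta_out.mean(axis=0)
        W1 -= lr * (X.T @ delta_hid) / m
        b1 -= lr * delta_hid.mean(axis=0)
    return W1, b1

# Greedy layer-wise stacking: the second autoencoder is trained on the first
# layer's activations rather than on the raw input.
X = np.random.default_rng(1).random((200, 64))   # toy stand-in for 8x8 image patches
W1, b1 = train_autoencoder(X, n_hidden=32)       # first-order features (e.g., edge-like)
H1 = sigmoid(X @ W1 + b1)
W2, b2 = train_autoencoder(H1, n_hidden=16)      # second-order features
H2 = sigmoid(H1 @ W2 + b2)                       # features fed to a classifier or further layers
</pre>

In practice, each layer's training would typically also include weight decay and a sparsity penalty, and the full stack is usually fine-tuned jointly with backpropagation after this layer-wise pretraining.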
