Stacked Autoencoders
From Ufldl
for
Stacked Autoencoders
Jump to:
navigation
,
search
===Overview=== A stacked autoencoder is a neural network consisting of multiple layers of sparse autoencoders in which the outputs of each layer is wired to the inputs of the successive layer. Formally, consider a stacked autoencoder with n layers. Using notation from the autoencoder section, let <math>W^{(k, 1)}, W^{(k, 2)}, b^{(k, 1)}, b^{(k, 2)}</math> denote the parameters <math>W^{(1)}, W^{(2)}, b^{(1)}, b^{(2)}</math> for kth autoencoder. Then the encoding step for the stacked autoencoder is given by running the encoding step of each layer in forward order: <math> \begin{align} a^{(l)} = f(z^{(l)}) \\ z^{(l + 1)} = W^{(l, 1)}a^{(l)} + b^{(l, l)} \end{align} </math> The decoding step is given by running the decoding stack of each autoencoder in reverse order: <math> \begin{align} a^{(n + l)} = f(z^{(n + l)}) \\ z^{(n + l + 1)} = W^{(n - l, 2)}a^{(n + l)} + b^{(n - l, 2)} \end{align} </math> The information of interest is contained within <math>a^{(n)}</math>, which is the activation of the deepest layer of hidden units. This vector gives us a representation of the input in terms of higher-order features. The stacked autoencoder can be used for classification problems by feeding a(1) to a softmax classifier. ===Training=== A good way to obtain good parameters for a stacked autoencoder is to use greedy layer-wise training. To do this, first train the first layer on raw input to obtain parameters W1, W2, b1 and b2. Use the first layer to transform the raw input into a vector consisting of activation of the hidden units, A. Train the second layer on this vector to obtain parameters W1, W2, b1 and b2. Repeat for subsequent layers, using the output of each layer as input for the subsequent layer. This method trains the parameters of each layer individually while freezing parameters for the remainder of the model. To produce better results, after this phase of training is complete, fine-tuning can be used to improve the results. In fine-tuning, the parameters of all layers are changed at the same time. The loss function can be back-propagated to each preceding layer, and the In practice, fine-tuning should be use when the parameters have been brought close to convergence through layer-wise training. Attempting to use fine-tuning with the weights initialized randomly will lead to poor results due to local optima. ===Motivation=== A stacked autoencoder inherits all the benefits of any deep network: greater expressive power and greater statistical efficiency. In addition, its purpose can be described in an intuitive sense as follows. Recall that an autoencoder tends to learn features that form a good representation of its input. The first layer of a stacked autoencoder tends to learn first-order features in the raw input. The second layer of a stacked autoencoder tends to learn second-order features corresponding to patterns in the appearance of first-order features. Higher layers of the stacked autoencoder tend to learn even higher-order features. For instance, in the context of image input, the first layers usually learns to recognize edges. The second layer usually learns features that arise from combinations of the edges, such as corners. With certain types of network configuration and input modes, the higher layers can learn meaningful combinations of features. For instance, if the input set consists of images of faces, higher layers may learn features corresponding to parts of the face such as eyes, noses or mouths.
Template:CNN
(
view source
)
Template:Languages
(
view source
)
Template:Quote
(
view source
)
Return to
Stacked Autoencoders
.
Views
Page
Discussion
View source
History
Personal tools
Log in
ufldl resources
UFLDL Tutorial
Recommended Readings
wiki
Main page
Recent changes
Random page
Help
Search
Toolbox
What links here
Related changes
Special pages