Stacked Autoencoders

===Training===

If one is only interested in finetuning for the purposes of classification, the common practice is to then discard the "decoding" layers of the stacked autoencoder and link the last hidden layer <math>a^{(n)}</math> to the softmax classifier. The gradients from the (softmax) classification error will then be backpropagated into the encoding layers.
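To illustrate, here is a minimal NumPy sketch of one such fine-tuning step. It is not the exercise code: the two-hidden-layer depth, the parameter names <code>W1, b1, W2, b2, Wsoft, bsoft</code>, the plain gradient-descent update, and the learning rate are all assumptions made for the example. The decoding layers have already been discarded, the last hidden activation feeds the softmax classifier, and the classification gradient is backpropagated into both encoding layers.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_step(x, labels, W1, b1, W2, b2, Wsoft, bsoft, lr=0.1):
    """One fine-tuning step (hypothetical names): forward through the two
    encoding layers and the softmax, then backpropagate the classification
    error into the encoding layers."""
    m = x.shape[1]                                   # examples are columns of x

    # Forward pass: only the encoding halves of the two autoencoders remain.
    a1 = sigmoid(W1 @ x + b1)                        # first hidden layer
    a2 = sigmoid(W2 @ a1 + b2)                       # second (last) hidden layer
    scores = Wsoft @ a2 + bsoft
    scores -= scores.max(axis=0, keepdims=True)      # numerical stability
    p = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)

    # Cross-entropy gradient at the softmax layer.
    ground_truth = np.zeros_like(p)
    ground_truth[labels, np.arange(m)] = 1.0
    d_soft = (p - ground_truth) / m

    # Backpropagate the classification error into the encoding layers.
    d2 = (Wsoft.T @ d_soft) * a2 * (1 - a2)
    d1 = (W2.T @ d2) * a1 * (1 - a1)

    # Plain gradient-descent updates, chosen only for illustration.
    Wsoft -= lr * (d_soft @ a2.T)
    bsoft -= lr * d_soft.sum(axis=1, keepdims=True)
    W2 -= lr * (d2 @ a1.T)
    b2 -= lr * d2.sum(axis=1, keepdims=True)
    W1 -= lr * (d1 @ x.T)
    b1 -= lr * d1.sum(axis=1, keepdims=True)
    return W1, b1, W2, b2, Wsoft, bsoft
</pre>
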
===Concrete example===
To give a concrete example, suppose you wished to train a stacked autoencoder with 2 hidden layers for classification of MNIST digits, as you will be doing in [[Exercise: Implement deep networks for digit classification | the next exercise]].
First, you would train a sparse autoencoder on the raw inputs <math>x^{(k)}</math> to learn primary features <math>h^{(1)(k)}</math>.
[[File:Stacked_SparseAE_Features1.png|400px]]
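
As a rough NumPy sketch of this first step (not the exercise's code; the function name <code>train_sparse_autoencoder</code>, the hyperparameter values, and the batch gradient-descent loop are assumptions), a sparse autoencoder with a squared-error reconstruction cost, weight decay, and a KL-divergence sparsity penalty could be trained as follows:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_sparse_autoencoder(X, hidden_size, rho=0.05, lam=3e-3, beta=3.0,
                             lr=0.5, n_iter=400, seed=0):
    """Train one sparse autoencoder on the columns of X by batch gradient
    descent and return the encoding parameters (W1, b1)."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    r = np.sqrt(6.0 / (n + hidden_size + 1))          # small random initialization
    W1 = rng.uniform(-r, r, (hidden_size, n))         # encoding weights
    W2 = rng.uniform(-r, r, (n, hidden_size))         # decoding weights
    b1 = np.zeros((hidden_size, 1))
    b2 = np.zeros((n, 1))

    for _ in range(n_iter):
        # Forward pass.
        a2 = sigmoid(W1 @ X + b1)                     # hidden activations
        a3 = sigmoid(W2 @ a2 + b2)                    # reconstruction of X
        rho_hat = a2.mean(axis=1, keepdims=True)      # average hidden activation

        # Backward pass: reconstruction error plus the sparsity term.
        d3 = -(X - a3) * a3 * (1 - a3)
        sparsity = beta * (-rho / rho_hat + (1 - rho) / (1 - rho_hat))
        d2 = (W2.T @ d3 + sparsity) * a2 * (1 - a2)

        # Gradient-descent step with weight decay on the weights only.
        W2 -= lr * (d3 @ a2.T / m + lam * W2)
        b2 -= lr * d3.mean(axis=1, keepdims=True)
        W1 -= lr * (d2 @ X.T / m + lam * W1)
        b1 -= lr * d2.mean(axis=1, keepdims=True)

    return W1, b1
</pre>

On MNIST, <code>X</code> would be a 784-by-<math>m</math> matrix whose columns are the raw pixel inputs <math>x^{(k)}</math>; after a call such as <code>W1, b1 = train_sparse_autoencoder(X, 200)</code>, the primary features are <math>h^{(1)(k)} = f(W_1 x^{(k)} + b_1)</math> with <math>f</math> the sigmoid. (A real implementation would typically use a better optimizer, such as L-BFGS, rather than the fixed-step gradient descent shown here.)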
Next, you would feed the raw input into this trained sparse autoencoder, obtaining the primary feature activations <math>h^{(1)(k)}</math> for each of the inputs <math>x^{(k)}</math>. You would then use these primary features as the "raw input" to another sparse autoencoder to learn secondary features <math>h^{(2)(k)}</math> on these primary features.
[[File:Stacked_SparseAE_Features2.png|400px]]
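
Continuing the sketch above (and reusing its hypothetical <code>sigmoid</code> and <code>train_sparse_autoencoder</code> helpers), this second step is just a forward pass through the first autoencoder followed by the same training routine applied to the resulting features:

<pre>
# Primary feature activations h^(1)(k) for every input x^(k) (columns of X).
H1 = sigmoid(W1 @ X + b1)

# Train a second sparse autoencoder, treating H1 as its "raw input",
# to learn the secondary features h^(2)(k).
W2_enc, b2_enc = train_sparse_autoencoder(H1, hidden_size=200)
</pre>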
Following this, you would feed the primary features into the second sparse autoencoder to obtain the secondary feature activations <math>h^{(2)(k)}</math> for each of the primary features <math>h^{(1)(k)}</math> (which correspond to the primary features of the corresponding inputs <math>x^{(k)}</math>). You would then treat these secondary features as "raw input" to a softmax classifier, training it to map secondary features to digit labels.
[[File:Stacked_Softmax_Classifier.png|400px]]
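
In the same hypothetical NumPy setting (continuing the previous snippets), the softmax classifier is trained on the secondary features rather than on the raw pixels. The sketch below assumes <code>y</code> is a length-<math>m</math> array of integer digit labels (0 through 9) and again uses plain gradient descent purely for illustration:

<pre>
import numpy as np

def train_softmax(F, y, num_classes=10, lr=0.5, lam=1e-4, n_iter=400, seed=0):
    """Softmax regression on the feature matrix F (features x examples)."""
    rng = np.random.default_rng(seed)
    m = F.shape[1]
    theta = 0.005 * rng.standard_normal((num_classes, F.shape[0]))
    ground_truth = np.zeros((num_classes, m))
    ground_truth[y, np.arange(m)] = 1.0              # one-hot label matrix

    for _ in range(n_iter):
        scores = theta @ F
        scores -= scores.max(axis=0, keepdims=True)  # numerical stability
        p = np.exp(scores) / np.exp(scores).sum(axis=0, keepdims=True)
        grad = -(ground_truth - p) @ F.T / m + lam * theta
        theta -= lr * grad
    return theta

# Secondary feature activations h^(2)(k), computed from the primary features,
# then the softmax classifier trained on top of them.
H2 = sigmoid(W2_enc @ H1 + b2_enc)
theta = train_softmax(H2, y)
</pre>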
Finally, you would combine all three layers together to form a stacked autoencoder with 2 hidden layers and a final softmax classifier layer capable of classifying the MNIST digits as desired.
[[File:Stacked_Combined.png|400px]]
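
Putting the pieces of the sketch together, classifying a new image is then a single forward pass through the two encoding layers and the softmax classifier. The parameters below are the ones produced by the earlier hypothetical snippets, and <code>X_test</code> is an assumed matrix of test images; after this, the whole network could be fine-tuned as described above.

<pre>
def predict(X, W1, b1, W2_enc, b2_enc, theta):
    """Forward pass of the stacked network: two encoding layers, then softmax."""
    h1 = sigmoid(W1 @ X + b1)                # primary features
    h2 = sigmoid(W2_enc @ h1 + b2_enc)       # secondary features
    scores = theta @ h2                      # softmax scores (argmax suffices)
    return scores.argmax(axis=0)             # predicted digit for each column

predictions = predict(X_test, W1, b1, W2_enc, b2_enc, theta)  # X_test: hypothetical test set
</pre>
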
===Motivation===
