Linear Decoders
Recall that in the sparse autoencoder, the output layer computed its activations as

<math>
\begin{align}
a^{(3)} = f(z^{(3)}) = f(W^{(2)} a^{(2)} + b^{(2)})
\end{align}
</math>

where <math>a^{(3)}</math> is the output. In the autoencoder, <math>a^{(3)}</math> is our approximate reconstruction of the input <math>x = a^{(1)}</math>.
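To make this recap concrete, here is a minimal NumPy sketch of the forward pass. The names <code>W1, b1, W2, b2</code> and the vector shapes are illustrative assumptions, not the tutorial's own code (the accompanying exercises use MATLAB):

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def autoencoder_forward(x, W1, b1, W2, b2):
    # a1 = x is the input, a2 the hidden activations, a3 the reconstruction.
    a1 = x
    z2 = W1 @ a1 + b1
    a2 = sigmoid(z2)
    z3 = W2 @ a2 + b2
    a3 = sigmoid(z3)  # sigmoid output: every entry of a3 lies in (0, 1)
    return a2, a3
</pre>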
Because we used a sigmoid activation function for <math>f(z^{(3)})</math>, we needed to constrain or scale the inputs to be in the range <math>[0,1]</math>, since the sigmoid function outputs numbers in the range <math>[0,1]</math>. While some datasets like MNIST fit well with this scaling of the output, this can sometimes be awkward to satisfy. For example, if one uses PCA whitening, the input is no longer constrained to <math>[0,1]</math>, and it is not clear what the best way is to scale the data to ensure it fits into the constrained range.
== Linear Decoder ==
One easy fix for this problem is to set <math>a^{(3)} = z^{(3)}</math>. Formally, this is achieved by having the output layer use the identity activation function <math>f(z) = z</math>, so that the reconstruction is the linear function <math>\hat{x} = a^{(3)} = z^{(3)} = W^{(2)} a^{(2)} + b^{(2)}</math>. Because the output is no longer squashed by a sigmoid, it can take on arbitrary real values, and real-valued inputs (such as PCA-whitened data) no longer need to be rescaled into <math>[0,1]</math>. An autoencoder whose hidden layer uses a sigmoid (or tanh) activation and whose output layer is linear in this way is called a '''linear decoder'''.

Since the activation function of the output units has changed, the error term for the output layer changes as well. With <math>f(z) = z</math> we have <math>f'(z) = 1</math>, so the output error term simplifies to

<math>
\begin{align}
\delta_i^{(3)} = -(y_i - a_i^{(3)})
\end{align}
</math>

where <math>y = x</math> is the desired output, i.e. the input we are trying to reconstruct.
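Continuing the sketch above (again with assumed names, not the tutorial's code), the only change for a linear decoder is dropping the sigmoid at the output; the output error term then needs no <math>f'</math> factor:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def linear_decoder_forward(x, W1, b1, W2, b2):
    a2 = sigmoid(W1 @ x + b1)  # hidden layer keeps its sigmoid
    a3 = W2 @ a2 + b2          # identity activation: a3 = z3, unconstrained
    return a2, a3

def output_delta(y, a3):
    # f(z) = z implies f'(z) = 1, so delta3 = -(y - a3); here y = x.
    return -(y - a3)
</pre>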
Of course, when using backpropagation to compute the error terms for the ''hidden'' layer:

<math>
\begin{align}
\delta^{(2)} = \left( (W^{(2)})^T \delta^{(3)} \right) \bullet f'(z^{(2)})
\end{align}
</math>

(where <math>\bullet</math> denotes the element-wise product). Because the hidden layer is using a sigmoid (or tanh) activation <math>f</math>, in the equation above <math>f'(\cdot)</math> should still be the derivative of the sigmoid (or tanh) function.
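A sketch of this hidden-layer error term for a sigmoid hidden layer, using the identity <math>f'(z^{(2)}) = a^{(2)} \bullet (1 - a^{(2)})</math> so that <math>z^{(2)}</math> itself need not be stored (same assumed names as above):

<pre>
import numpy as np

def hidden_delta(W2, delta3, a2):
    # delta2 = ((W2)^T delta3) . f'(z2); for the sigmoid,
    # f'(z2) = a2 * (1 - a2), computed from the stored activations.
    return (W2.T @ delta3) * a2 * (1.0 - a2)
</pre>

The weight gradients then follow exactly as in the sparse autoencoder, e.g. <math>\nabla_{W^{(2)}} = \delta^{(3)} (a^{(2)})^T</math> for a single example; only the output-layer <math>\delta^{(3)}</math> has changed.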
{{Languages|线性解码器|中文}}