Sparse Coding: Autoencoder Interpretation

Line 59:
Observe that, with our modified objective function, <math>J(A, s)</math> for fixed <math>s</math>, that is <math>J(A; s) = \lVert As - x \rVert_2^2 + \gamma \lVert A \rVert_2^2</math> (the L1 term in <math>s</math> can be omitted since it is not a function of <math>A</math>), is simply quadratic in <math>A</math>, and hence has an easily derivable analytic solution in <math>A</math>. A quick way to derive this solution is to use matrix calculus; some pages about matrix calculus can be found in the [[Useful Links | useful links]] section. Unfortunately, the objective function for fixed <math>A</math> does not have a similarly nice analytic solution in <math>s</math>, so that minimization step has to be carried out using gradient descent or a similar optimization method.
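As an illustration of the analytic solution mentioned above, the following is a minimal NumPy sketch rather than the tutorial's own implementation. It assumes the examples are stacked as columns of a matrix <math>X</math> with the corresponding feature vectors as columns of <math>S</math>, so the objective becomes <math>\lVert AS - X \rVert_F^2 + \gamma \lVert A \rVert_F^2</math>. Setting the gradient <math>2(AS - X)S^T + 2\gamma A</math> to zero gives <math>A = XS^T(SS^T + \gamma I)^{-1}</math>. The names <code>solve_A</code> and <code>grad_A</code>, the matrix sizes, and the value of <math>\gamma</math> are hypothetical choices for this example.

```python
import numpy as np

def solve_A(X, S, gamma):
    """Minimise ||A S - X||_F^2 + gamma * ||A||_F^2 over A, with S held fixed.

    Stationarity condition: A (S S^T + gamma I) = X S^T.
    """
    k = S.shape[0]
    G = S @ S.T + gamma * np.eye(k)  # symmetric positive definite when gamma > 0
    # G is symmetric, so A G = X S^T is equivalent to G A^T = S X^T.
    return np.linalg.solve(G, S @ X.T).T

def grad_A(A, X, S, gamma):
    """Gradient of the objective with respect to A."""
    return 2.0 * (A @ S - X) @ S.T + 2.0 * gamma * A

# Synthetic data, used only to check the formula (an assumption of this sketch).
rng = np.random.default_rng(0)
X = rng.standard_normal((8, 50))  # 8-dimensional inputs, 50 examples as columns
S = rng.standard_normal((5, 50))  # 5 features per example
A = solve_A(X, S, gamma=0.1)

# At the analytic solution the gradient should vanish up to rounding error.
residual = np.abs(grad_A(A, X, S, gamma=0.1)).max()
```

Using <code>np.linalg.solve</code> instead of forming the inverse explicitly is a numerical-stability choice made for this sketch; it computes the same solution as the formula above.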
-
Optimizing this objective function using the iterative method above will yield features (the basis vectors of <math>A</math>) similar to those learned using the sparse autoencoder. For more practical tips on implementing sparse coding, you may wish to refer to [[Exercise:Sparse Coding | the sparse coding exercise]]. For assistance with deriving the gradients, you may wish to refer to [[Deriving gradients using the backpropagation idea]].
+
Optimizing this objective function using the iterative method above will yield features (the basis vectors of <math>A</math>) similar to those learned using the sparse autoencoder. In practice, however, quite a few tricks are required for better convergence of the algorithm; these are described in greater detail in [[Exercise:Sparse Coding | the sparse coding exercise]]. The gradient computations may be slightly tricky as well, and using matrix calculus or [[Deriving gradients using backpropagation | the backpropagation intuition]] can be helpful.
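The iterative method can be sketched roughly as follows. This is a hedged illustration, not the procedure from the sparse coding exercise, and it omits the convergence tricks mentioned above. It assumes the L1 term is smoothed as <math>\lambda \sum \sqrt{s^2 + \epsilon}</math> (the exact form is not shown in this section), stacks examples as columns of <math>X</math> and <math>S</math>, and uses plain gradient descent on <math>S</math> with a fixed step size of <math>1/L</math>, where <math>L = 2\sigma_{\max}(A)^2 + \lambda/\sqrt{\epsilon}</math> bounds the curvature in <math>S</math>. All function names and hyperparameter values are hypothetical.

```python
import numpy as np

def objective(A, S, X, lam, gamma, eps):
    """Reconstruction error + smoothed L1 penalty on S + weight decay on A."""
    recon = np.sum((A @ S - X) ** 2)
    sparsity = lam * np.sum(np.sqrt(S ** 2 + eps))
    return recon + sparsity + gamma * np.sum(A ** 2)

def update_S(A, S, X, lam, eps, n_steps=20):
    """A few gradient-descent steps on S with A held fixed."""
    # Upper bound on the curvature in S; stepping by 1/L cannot increase the objective.
    L = 2.0 * np.linalg.norm(A, 2) ** 2 + lam / np.sqrt(eps)
    for _ in range(n_steps):
        grad = 2.0 * A.T @ (A @ S - X) + lam * S / np.sqrt(S ** 2 + eps)
        S = S - grad / L
    return S

def update_A(S, X, gamma):
    """Exact minimiser over A with S held fixed: A = X S^T (S S^T + gamma I)^{-1}."""
    G = S @ S.T + gamma * np.eye(S.shape[0])
    return np.linalg.solve(G, S @ X.T).T

def sparse_coding(X, k, lam=0.1, gamma=0.01, eps=1e-2, n_iters=30, seed=0):
    """Alternate between the S step and the A step, recording the objective."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((X.shape[0], k))
    S = rng.standard_normal((k, X.shape[1]))
    history = [objective(A, S, X, lam, gamma, eps)]
    for _ in range(n_iters):
        S = update_S(A, S, X, lam, eps)
        A = update_A(S, X, gamma)
        history.append(objective(A, S, X, lam, gamma, eps))
    return A, S, history

# Synthetic data for demonstration only.
X = np.random.default_rng(1).standard_normal((8, 40))
A, S, history = sparse_coding(X, k=5)
```

Because the <math>A</math> step is an exact minimisation and the <math>S</math> step uses a conservative step size, the recorded objective values should not increase from one iteration to the next. A practical implementation would likely replace the fixed-step descent with a faster optimizer.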
== Topographic sparse coding ==

Revision as of 05:46, 29 May 2011
