# Sparse Coding: Autoencoder Interpretation

(Note that the third term, $\lVert A \rVert_2^2$, is simply the sum of squares of the entries of $A$, or $\sum_r{\sum_c{A_{rc}^2}}$.)

This objective function presents one last problem: the L1 norm is not differentiable at 0, and hence poses a problem for gradient-based methods. While the problem can be solved using other non-gradient-descent-based methods, we will "smooth out" the L1 norm using an approximation which will allow us to use gradient descent. To "smooth out" the L1 norm, we use $\sqrt{x^2 + \epsilon}$ in place of $\left| x \right|$, where $\epsilon$ is a "smoothing parameter" which can also be interpreted as a sort of "sparsity parameter" (to see this, observe that when $\epsilon$ is large compared to $x^2$, the expression $x^2 + \epsilon$ is dominated by $\epsilon$, and taking the square root yields approximately $\sqrt{\epsilon}$).
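The behavior of this smooth approximation can be checked numerically. A minimal sketch (the value of $\epsilon$ below is purely illustrative):

```python
import numpy as np

eps = 1e-4                            # smoothing parameter (illustrative value)
x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])

smooth_l1 = np.sqrt(x**2 + eps)       # smooth approximation to |x|
grad = x / np.sqrt(x**2 + eps)        # its derivative, defined everywhere (0 at x = 0)

# Away from 0 the approximation is very close to |x|;
# at x = 0 it equals sqrt(eps) instead of 0, and its gradient is 0 rather than undefined.
print(smooth_l1 - np.abs(x))
print(grad)
```

Note that the derivative $x/\sqrt{x^2+\epsilon}$ is finite at every point, which is exactly what makes gradient descent applicable.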
This "smoothing" will come in handy later when considering topographic sparse coding below. Our final objective function is hence: