Sparse Coding: Autoencoder Interpretation

Looking at the modified objective function $J(A, s)$: given $s$, the objective simplifies to $J(A; s) = \lVert As - x \rVert_2^2 + \gamma \lVert A \rVert_2^2$ (the L1 norm of $s$ is not a function of $A$, so it can be dropped). The simplified objective is a simple quadratic in $A$, so deriving its gradient with respect to $A$ is easy. One quick way to do this is matrix calculus (the links section lists material related to matrix calculus). Unfortunately, given $A$, the objective admits no such convenient form, so the minimization over $s$ can only be carried out using gradient descent or a similar optimization method.

In theory, optimizing this objective function using the iterative method above should (eventually) yield features (the basis vectors of $A$) similar to those learned by the sparse autoencoder. In practice, however, quite a few tricks are required for better convergence of the algorithm, and these tricks are described in greater detail in the later section on [[ Sparse Coding: Autoencoder Interpretation#Sparse coding in practice | sparse coding in practice]]. Deriving the gradients for the objective function may be slightly tricky as well, and using matrix calculus or [[Deriving gradients using the backpropagation idea | the backpropagation intuition]] can be helpful.
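The alternating scheme above can be sketched in a few lines of NumPy. This is a minimal illustration, not the reference implementation: given $s$, the quadratic objective $J(A; s)$ yields a closed-form ridge solution for $A$ via matrix calculus; given $A$, we take a plain gradient-descent step on $s$. The smoothed L1 penalty $\sqrt{s^2 + \epsilon}$ (to make the sparsity term differentiable at zero), the step size, and the constants are all assumptions for this sketch.

```python
import numpy as np

def solve_A(X, S, gamma):
    # Given codes S (one column per example), J(A; S) = ||A S - X||_F^2 + gamma ||A||_F^2
    # is quadratic in A. Setting the matrix-calculus gradient 2 (A S - X) S^T + 2 gamma A = 0
    # gives the closed-form ridge solution A = X S^T (S S^T + gamma I)^{-1}.
    k = S.shape[0]
    M = S @ S.T + gamma * np.eye(k)          # symmetric positive definite
    return np.linalg.solve(M, S @ X.T).T     # equals X S^T M^{-1}

def step_S(X, A, S, lam, lr=1e-3, eps=1e-6):
    # Given A, there is no closed form for S, so take one gradient-descent step.
    # The L1 term is smoothed as sum(sqrt(s^2 + eps)) so its gradient exists at 0
    # (a common trick; the smoothing constant eps is an assumption of this sketch).
    grad = 2 * A.T @ (A @ S - X) + lam * S / np.sqrt(S**2 + eps)
    return S - lr * grad
```

Alternating these two updates (exact minimization over $A$, a few gradient steps over $S$) is one concrete instance of the iterative method the text describes; in practice the convergence tricks discussed later are still needed.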