Sparse Coding: Autoencoder Interpretation

From Ufldl

(Good initialization of s)
Line 314: Line 314:
-
=== Good initialization of <math>s</math> ===
+
=== Good initialization of <math>s</math> [Good initial value of s] ===
Another important trick for faster and better convergence is to initialize the feature matrix <math>s</math> well before using gradient descent (or another method) to optimize the objective function for <math>s</math> given <math>A</math>. In practice, initializing <math>s</math> randomly at each iteration can result in poor convergence unless a good optimum is found for <math>s</math> before moving on to optimize for <math>A</math>. A better way to initialize <math>s</math> is the following:
Line 324: Line 324:
[First-pass translation]
 +
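Another important trick for fast, efficient convergence is to find a good initial value for the feature matrix <math>s</math> before using gradient descent (or another method) to solve for <math>s</math> under the objective function, given <math>A</math>. In practice, randomly initializing <math>s</math> at every iteration leads to poor convergence unless an optimal solution for <math>s</math> has already been obtained before solving for the optimal <math>A</math>. A better way to initialize <math>s</math> is given below: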
 +
<ol>
 +
<li>Set <math>s \leftarrow W^Tx</math> (where <math>x</math> is the matrix of patches in the mini-batch)
 +
<li>Normalize <math>s</math>: divide each feature in <math>s</math> (each column of <math>s</math>) by its corresponding offset in <math>A</math>. In other words, if <math>s_{r, c}</math> is the <math>r</math>th feature of the <math>c</math>th example and <math>A_c</math> is the <math>c</math>th offset in <math>A</math>, set <math>s_{r, c} \leftarrow \frac{ s_{r, c} } { \lVert A_c \rVert }.</math>
 +
</ol>
[First review]
-
[Original text]
+
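Another important trick for fast, efficient convergence is to find a good initial value for the feature matrix <math>s</math> before using gradient descent (or another method) to solve for <math>s</math> under the objective function, given <math>A</math>. In practice, randomly initializing <math>s</math> at every iteration leads to poor convergence unless an optimal solution for <math>s</math> has already been obtained before optimizing <math>A</math>. A better way to initialize <math>s</math> is given below: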
 +
<ol>
 +
<li>Set <math>s \leftarrow W^Tx</math> (where <math>x</math> is the matrix of patches in the mini-batch)
 +
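<li>Normalize <math>s</math>: divide each feature in <math>s</math> (each column of <math>s</math>) by the norm of its corresponding basis vector in <math>A</math>. That is, if <math>s_{r, c}</math> is the <math>r</math>th feature of the <math>c</math>th example and <math>A_c</math> is the <math>c</math>th basis vector in <math>A</math>, set <math>s_{r, c} \leftarrow \frac{ s_{r, c} } { \lVert A_c \rVert }.</math>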
 +
</ol>
 +
[Original text]
Very roughly and informally speaking, this initialization helps because the first step is an attempt to find a good <math>s</math> such that <math>Ws \approx x</math>, and the second step "normalizes" <math>s</math> in an attempt to keep the sparsity penalty small. It turns out that initializing <math>s</math> using only one of the two steps results in poor performance in practice. ([[TODO]]: a better explanation for why this initialization helps?)
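The two-step initialization can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the tutorial's own code: it takes <math>W</math> in the first step to be the basis matrix <math>A</math>, stores examples as columns of <math>x</math>, and pairs feature <math>r</math> (row <math>r</math> of <math>s</math>) with column <math>r</math> of <math>A</math>, since that is the pairing whose dimensions match. The helper names (<code>transpose</code>, <code>matmul</code>, <code>column_norms</code>, <code>init_features</code>) are hypothetical, and plain lists are used only to keep the example dependency-free.

```python
import math


def transpose(m):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*m)]


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    b_cols = transpose(b)
    return [[sum(p * q for p, q in zip(row, col)) for col in b_cols]
            for row in a]


def column_norms(a):
    """Euclidean norm of each column of a (one norm per basis vector)."""
    return [math.sqrt(sum(v * v for v in col)) for col in zip(*a)]


def init_features(A, x):
    """Initialize s for a mini-batch x (hypothetical helper).

    A: n-by-k basis matrix, one basis vector per column (assumed nonzero).
    x: n-by-m matrix of patches, one example per column.
    Returns s as a k-by-m matrix.
    """
    # Step 1: s <- A^T x (assumes W in the text is the basis matrix A).
    s = matmul(transpose(A), x)
    # Step 2: divide row r of s by the norm of the r-th basis vector.
    norms = column_norms(A)
    return [[v / n for v in row] for row, n in zip(s, norms)]


# Tiny example: two basis vectors, two patches.
A = [[3.0, 0.0],
     [4.0, 2.0]]
x = [[1.0, 0.0],
     [0.0, 1.0]]
s0 = init_features(A, x)
```

Here <code>s0</code> would then serve as the starting point for the optimization of <math>s</math> on that mini-batch, in place of a random matrix.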

Revision as of 07:21, 8 March 2013
