Sparse Coding: Autoencoder Interpretation
From Ufldl
(→Good initialization of s)
Line 314:
- | === Good initialization of <math>s</math> === | + | === Good initialization of <math>s</math> [Good initial values for s] === |
Another important trick for obtaining faster and better convergence is good initialization of the feature matrix <math>s</math> before using gradient descent (or another method) to optimize the objective function for <math>s</math> given <math>A</math>. In practice, initializing <math>s</math> randomly at each iteration can result in poor convergence unless a good optimum is found for <math>s</math> before moving on to optimize for <math>A</math>. A better way to initialize <math>s</math> is the following:
Line 324:
[First-draft translation]
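+ | Another important trick for fast, efficient convergence is to find a good initial value for the feature matrix <math>s</math> before using gradient descent (or another method) to solve for <math>s</math> under the objective function, given <math>A</math>. In practice, randomly initializing <math>s</math> at every iteration leads to poor convergence unless an optimal solution for <math>s</math> has already been obtained before solving for the optimal value of <math>A</math>. A better way to initialize <math>s</math> is given below: | ||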
+ | <ol> | ||
+ | <li>Set <math>s \leftarrow W^Tx</math> (where <math>x</math> is the matrix of patches in the mini-batch) | ||
+ | <li>Normalize <math>s</math>: divide each feature in <math>s</math> (each column of <math>s</math>) by its corresponding offset in <math>A</math>. In other words, if <math>s_{r, c}</math> is the <math>r</math>-th feature of example <math>c</math> and <math>A_c</math> is the <math>c</math>-th offset in <math>A</math>, set <math>s_{r, c} \leftarrow \frac{ s_{r, c} } { \lVert A_c \rVert }.</math> | ||
+ | </ol> | ||
[First review]
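- | + | Another important trick for fast, efficient convergence is to find a good initial value for the feature matrix <math>s</math> before using gradient descent (or another method) to solve for <math>s</math> under the objective function, given <math>A</math>. In practice, randomly initializing <math>s</math> at every iteration leads to poor convergence unless an optimal solution for <math>s</math> has already been obtained before optimizing for <math>A</math>. A better way to initialize <math>s</math> is given below: | |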
+ | <ol> | ||
+ | <li>Set <math>s \leftarrow W^Tx</math> (where <math>x</math> is the matrix of patches in the mini-batch) | ||
+ | <li>Normalize <math>s</math>: divide each feature in <math>s</math> (each column of <math>s</math>) by the norm of its corresponding basis vector in <math>A</math>. That is, if <math>s_{r, c}</math> is the <math>r</math>-th feature of example <math>c</math> and <math>A_c</math> is the <math>c</math>-th basis vector in <math>A</math>, set <math>s_{r, c} \leftarrow \frac{ s_{r, c} } { \lVert A_c \rVert }.</math> | ||
+ | </ol> | ||
+ | [Original text] | ||
Very roughly and informally speaking, this initialization helps because the first step is an attempt to find a good <math>s</math> such that <math>Ws \approx x</math>, and the second step "normalizes" <math>s</math> in an attempt to keep the sparsity penalty small. It turns out that initializing <math>s</math> using only one of the two steps results in poor performance in practice. ([[TODO]]: a better explanation for why this initialization helps?)
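
The two-step initialization can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the tutorial's reference code. It assumes that <math>W</math> in the first step is the same matrix as <math>A</math>, with basis vectors stored as columns. It also assumes the normalization is applied per feature, so that row <math>r</math> of <math>s</math> is divided by the norm of the <math>r</math>-th basis vector. The text indexes the basis vector as <math>A_c</math>, so this per-feature reading is an interpretation. The function name <code>init_features</code> and the toy data are hypothetical.

```python
import math


def init_features(A, x):
    """Return an initial feature matrix s for basis A (n x k) and patches x (n x m).

    Assumptions (not confirmed by the text): W is the same matrix as A, and
    row r of s is divided by the norm of column r of A. Columns of A are
    assumed to be nonzero.
    """
    n, k, m = len(A), len(A[0]), len(x[0])

    # Step 1: s = A^T x, so s[r][c] is feature r of example c.
    s = [[sum(A[i][r] * x[i][c] for i in range(n)) for c in range(m)]
         for r in range(k)]

    # Step 2: divide each feature by the Euclidean norm of its basis vector.
    norms = [math.sqrt(sum(A[i][r] ** 2 for i in range(n))) for r in range(k)]
    return [[s[r][c] / norms[r] for c in range(m)] for r in range(k)]


# Toy data: two basis vectors (the columns of A) with norms 5 and 2,
# and a mini-batch containing a single 2-dimensional patch.
A = [[3.0, 0.0],
     [4.0, 2.0]]
x = [[1.0],
     [1.0]]
s0 = init_features(A, x)
```

In an alternating scheme, <code>s0</code> would serve as the starting point for the gradient-based optimization of <math>s</math> with <math>A</math> held fixed, in place of a random draw.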