Sparse Coding: Autoencoder Interpretation

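Another important trick for faster and more reliable convergence is to initialize the feature matrix $s$ well before using gradient descent (or another method) to optimize the objective for $s$ given $A$. In practice, initializing $s$ randomly at every iteration can lead to poor convergence, unless a good optimum for $s$ is found before moving on to optimize $A$. A better way to initialize $s$ is the following: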
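
1. Set $s \leftarrow W^Tx$ (where $x$ is the matrix of patches in the mini-batch).
2. Normalize $s$: for each feature in $s$ (i.e. each column of $s$), divide the feature by the norm of the corresponding basis vector in $A$. That is, if $s_{r, c}$ is the $r$th feature of the $c$th example and $A_c$ is the $c$th basis vector in $A$, then set $s_{r, c} \leftarrow \frac{ s_{r, c} } { \lVert A_c \rVert }.$
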
Very roughly and informally speaking, this initialization helps because the first step is an attempt to find a good $s$ such that $Ws \approx x$, and the second step "normalizes" $s$ in an attempt to keep the sparsity penalty small. It turns out that initializing $s$ using only one but not both steps results in poor performance in practice. ([[TODO]]: a better explanation for why this initialization helps?)
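
The two initialization steps can be sketched in NumPy as follows. This is a minimal illustration under stated assumptions, not the tutorial's reference code: the function name `init_features` and the small constant `eps` (added to avoid division by zero) are hypothetical, and the patches are assumed to be stored as columns of `x`. Under that layout $s$ has one row per feature, so row $r$ of $s$ is divided by the norm of the $r$th column of $A$; if your data are stored along the other axis, the indices need to be adapted accordingly.

```python
import numpy as np

def init_features(W, A, x, eps=1e-12):
    """Sketch of the two-step initialization of s for one mini-batch.

    Assumed shapes (not specified in the text):
      W: (n, k) weight matrix, A: (n, k) basis vectors as columns,
      x: (n, m) mini-batch with one patch per column.
    Returns s with shape (k, m): one row per feature, one column per patch.
    """
    # Step 1: s <- W^T x
    s = W.T @ x

    # Step 2: divide each feature by the norm of its basis vector in A.
    # norms[r] is the Euclidean norm of the r-th column of A.
    norms = np.linalg.norm(A, axis=0)
    # eps is a hypothetical safeguard against zero-norm basis vectors.
    s = s / (norms[:, None] + eps)
    return s
```

For example, with `A` of shape `(64, 16)` and a mini-batch `x` of shape `(64, 5)`, calling `init_features(A, A, x)` returns a `(16, 5)` feature matrix. Passing the same matrix for `W` and `A` is itself an assumption made here for illustration.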