Autoencoders and Sparsity

From Ufldl


@@ Line 273: / Line 273: @@
 \end{align}</math>
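 Here, <math>\textstyle J(W,b)</math> is as defined in the previous lesson, <math>\textstyle \beta</math> controls the weight of the sparsity penalty term, and <math>\textstyle \hat\rho_j</math> (implicitly) depends on <math>\textstyle W,b</math>. This is because it is the average activation of hidden unit <math>\textstyle j</math>, and the activations of the hidden units depend on the parameters <math>\textstyle W,b</math>.
 [Second review]
 We can see that the relative entropy attains its minimum value of 0 when <math>\textstyle \hat\rho_j = \rho</math>, and becomes very large (in fact, it tends to <math>\textstyle \infty</math>) as <math>\textstyle \hat\rho_j</math> approaches 0 or 1. Therefore, minimizing this penalty term has the effect of driving <math>\textstyle \hat\rho_j</math> toward <math>\textstyle \rho</math>.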
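The quantities in the two paragraphs above can be sketched in Python as follows. This is a minimal sketch, not UFLDL's own code. It assumes that the hidden-layer activations of each of the m training examples are already available as a list of floats. The function names kl_divergence, average_activations and sparsity_penalty are hypothetical. The sketch computes <math>\textstyle \hat\rho_j</math> as the mean activation of hidden unit j. It then computes the relative entropy between Bernoulli variables with means <math>\textstyle \rho</math> and <math>\textstyle \hat\rho_j</math>, and the penalty <math>\textstyle \beta \sum_j \mathrm{KL}(\rho \| \hat\rho_j)</math> that is added to <math>\textstyle J(W,b)</math>.

```python
import math


def kl_divergence(rho, rho_hat):
    """Relative entropy KL(rho || rho_hat) between Bernoulli variables with means rho and rho_hat."""
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))


def average_activations(hidden_activations):
    """rho_hat_j: mean activation of hidden unit j over the m training examples.

    Hypothetical input layout: hidden_activations holds m lists, each containing
    the s2 hidden-layer activations of one training example.
    """
    m = len(hidden_activations)
    s2 = len(hidden_activations[0])
    return [sum(a[j] for a in hidden_activations) / m for j in range(s2)]


def sparsity_penalty(rho, rho_hats, beta):
    """beta * sum_j KL(rho || rho_hat_j), the term added to J(W, b)."""
    return beta * sum(kl_divergence(rho, r) for r in rho_hats)


print(kl_divergence(0.05, 0.05))  # 0.0
```

The penalty is exactly 0 when every <math>\textstyle \hat\rho_j</math> equals <math>\textstyle \rho</math>, and it grows as <math>\textstyle \hat\rho_j</math> moves toward 0 or 1. Both logarithms require 0 < rho_hat < 1. Sigmoid hidden units satisfy this, although saturated units can round to exactly 0 or 1 in floating point.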
@@ Line 284: / Line 282: @@
 \end{align}</math>
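 Here <math>\textstyle J(W,b)</math> is as defined previously, and <math>\textstyle \beta</math> controls the weight of the sparsity penalty term. The term <math>\textstyle \hat\rho_j</math> also depends (implicitly) on <math>\textstyle W,b</math>. This is because it is the average activation of hidden neuron <math>\textstyle j</math>, and the activations of the hidden-layer neurons depend on <math>\textstyle W,b</math>.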
+[Original]
+To incorporate the KL-divergence term into your derivative calculation, there is a simple-to-implement
+trick involving only a small change to your code. Specifically, where previously for
+the second layer (<math>\textstyle l=2</math>), during backpropagation you would have computed
+:<math>\begin{align}
+\delta^{(2)}_i = \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right) f'(z^{(2)}_i),
+\end{align}</math>
+now instead compute
+:<math>\begin{align}
+\delta^{(2)}_i =
+  \left( \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right)
++ \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i) .
+\end{align}</math>
+[First translation]
+To bring the relative entropy into the derivative calculation, we can use an easy-to-implement trick that needs only a small change to your program. Specifically, when computing the update for the second layer (<math>\textstyle l=2</math>) in the backpropagation algorithm earlier, we had computed
+:<math>\begin{align}
+\delta^{(2)}_i = \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right) f'(z^{(2)}_i),
+\end{align}</math>
+now we simply replace it with
+:<math>\begin{align}
+\delta^{(2)}_i =
+  \left( \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right)
++ \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i) .
+\end{align}</math>
+That is all it takes.
+[First review]
+To incorporate the KL-divergence term into the derivative calculation, here is an easy-to-implement trick that requires only a small change to your code. Specifically, in the earlier lesson on backpropagation, for the second layer (<math>\textstyle l=2</math>) you would have computed:
+:<math>\begin{align}
+\delta^{(2)}_i = \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right) f'(z^{(2)}_i),
+\end{align}</math>
+now we replace it with:
+:<math>\begin{align}
+\delta^{(2)}_i =
+  \left( \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right)
++ \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i) .
+\end{align}</math>
+[Second review]
+To incorporate the relative-entropy term into the derivative calculation, we can use an easy-to-implement trick that requires only a small change to your program. Specifically, when computing the update for the second layer (<math>\textstyle l=2</math>) in the backpropagation algorithm earlier, we had computed
+:<math>\begin{align}
+\delta^{(2)}_i = \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right) f'(z^{(2)}_i),
+\end{align}</math>
+now we simply replace it with
+:<math>\begin{align}
+\delta^{(2)}_i =
+  \left( \left( \sum_{j=1}^{s_{3}} W^{(2)}_{ji} \delta^{(3)}_j \right)
++ \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i) .
+\end{align}</math>
+That is all it takes.
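The modified backpropagation step for the second layer can be sketched in Python as follows. This is a sketch under stated assumptions, not UFLDL's reference code:

* <math>\textstyle f</math> is taken to be the sigmoid, so <math>\textstyle f'(z) = f(z)(1 - f(z))</math>.
* Weights are stored as nested lists, with W2[j][i] connecting hidden unit i to output unit j.
* The sum runs over the <math>\textstyle s_3</math> units of layer 3 that <math>\textstyle \delta^{(3)}_j</math> indexes.
* The helper names are hypothetical.

At the end, a central-difference check confirms that the added term <math>\textstyle \beta\left(-\frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i}\right)</math> is the derivative of <math>\textstyle \beta\,\mathrm{KL}(\rho \| \hat\rho_i)</math> with respect to <math>\textstyle \hat\rho_i</math>.

```python
import math


def sigmoid(z):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    """f'(z) for the sigmoid: f(z) * (1 - f(z))."""
    s = sigmoid(z)
    return s * (1.0 - s)


def kl_divergence(rho, rho_hat):
    """KL(rho || rho_hat) between Bernoulli variables with means rho and rho_hat."""
    return rho * math.log(rho / rho_hat) + (1 - rho) * math.log((1 - rho) / (1 - rho_hat))


def delta2_with_sparsity(W2, delta3, z2, rho, rho_hat, beta):
    """Layer-2 error terms delta^(2)_i with the KL sparsity term folded in.

    Hypothetical layout: W2[j][i] is the weight from hidden unit i (layer 2) to
    output unit j (layer 3); delta3 holds the s3 output-layer error terms;
    z2 and rho_hat hold one entry per hidden unit (s2 entries each).
    """
    s2, s3 = len(z2), len(delta3)
    delta2 = []
    for i in range(s2):
        # Ordinary backpropagated error: sum over the s3 units of layer 3.
        backprop = sum(W2[j][i] * delta3[j] for j in range(s3))
        # Extra term: beta * d KL(rho || rho_hat_i) / d rho_hat_i.
        sparsity = beta * (-rho / rho_hat[i] + (1 - rho) / (1 - rho_hat[i]))
        delta2.append((backprop + sparsity) * sigmoid_prime(z2[i]))
    return delta2


# Central-difference check that the extra term is the derivative of beta * KL.
rho, beta, r, h = 0.05, 3.0, 0.2, 1e-6
numeric = beta * (kl_divergence(rho, r + h) - kl_divergence(rho, r - h)) / (2 * h)
analytic = beta * (-rho / r + (1 - rho) / (1 - r))
print(abs(numeric - analytic) < 1e-6)  # prints True
```

Setting beta to 0, or passing a rho_hat equal to rho for every unit, makes the extra term vanish and recovers the unmodified <math>\textstyle \delta^{(2)}_i</math>. Because <math>\textstyle \hat\rho_i</math> is an average over all m training examples, it has to be computed with a forward pass over the whole training set before this per-example backpropagation step can run.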

Revision as of 14:07, 12 March 2013
