Deriving gradients using the backpropagation idea
From Ufldl
== Introduction ==

In the section on the [[Backpropagation Algorithm | backpropagation algorithm]], you were briefly introduced to backpropagation as a means of deriving gradients for learning in the sparse autoencoder. It turns out that together with matrix calculus, this provides a powerful method and intuition for deriving gradients for more complex matrix functions (functions from matrices to the reals, or symbolically, from <math>\mathbb{R}^{r \times c} \rightarrow \mathbb{R}</math>).
First, recall the backpropagation idea, which we present in a modified form appropriate for our purposes below:
<ol>
<li>For each output unit <math>i</math> in layer <math>n_l</math> (the final layer), set
:<math>
\delta^{(n_l)}_i
= \frac{\partial}{\partial z^{(n_l)}_i} \;\;
J(z^{(n_l)})
</math>
where <math>J(z)</math> is our "objective function" (explained below).
<li>For <math>l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>
:For each node <math>i</math> in layer <math>l</math>, set
::<math>
\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i)
</math>
<li>Compute the desired partial derivatives,
:<math>
\begin{align}
\nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T. \\
\end{align}
</math>
</ol>
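The three steps above can be sketched directly in NumPy. This is a minimal illustration, not the tutorial's own code: the dictionary-based layer indexing, the sigmoid activation, and the helper names (`backprop_deltas`, `weight_grad`) are assumptions made for the example.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    # Derivative of the sigmoid: f'(z) = f(z) * (1 - f(z))
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_deltas(W, z, dJ_dz_final, fprime=sigmoid_prime):
    """Steps 1-2: compute delta^(l) for l = n_l, n_l - 1, ..., 2.

    W[l] maps layer l to layer l+1 (rows index layer-(l+1) units),
    z[l] is the vector of inputs to layer l, and dJ_dz_final is the
    gradient of the objective J with respect to z^(n_l).
    """
    n_l = len(z)                      # index of the final layer
    delta = {n_l: dJ_dz_final}        # step 1: delta at the output layer
    for l in range(n_l - 1, 1, -1):   # step 2: propagate backwards
        # delta^(l) = ((W^(l))^T delta^(l+1)) .* f'(z^(l))
        delta[l] = (W[l].T @ delta[l + 1]) * fprime(z[l])
    return delta

def weight_grad(delta_next, a):
    """Step 3: nabla_{W^(l)} J = delta^(l+1) (a^(l))^T."""
    return np.outer(delta_next, a)
```

For a 2-3-1 network, `backprop_deltas` returns a delta vector per layer whose length matches that layer's unit count, and `weight_grad(delta[l+1], a[l])` has the same shape as `W[l]`.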

Quick notation recap:
<ul>
<li><math>l</math> is the number of layers in the neural network
<li><math>n_l</math> is the number of neurons in the <math>l</math>th layer
<li><math>W^{(l)}_{ji}</math> is the weight from the <math>i</math>th unit in the <math>l</math>th layer to the <math>j</math>th unit in the <math>(l + 1)</math>th layer
<li><math>z^{(l)}_i</math> is the input to the <math>i</math>th unit in the <math>l</math>th layer
<li><math>a^{(l)}_i</math> is the activation of the <math>i</math>th unit in the <math>l</math>th layer
<li><math>A \bullet B</math> is the Hadamard or element-wise product, which for <math>r \times c</math> matrices <math>A</math> and <math>B</math> yields the <math>r \times c</math> matrix <math>C = A \bullet B</math> such that <math>C_{r, c} = A_{r, c} \cdot B_{r, c}</math>
<li><math>f^{(l)}</math> is the activation function for units in the <math>l</math>th layer
</ul>
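As a quick sanity check on the Hadamard-product notation, note that in NumPy the plain <code>*</code> operator on arrays is exactly this element-wise product (the matrices below are made-up examples):

```python
import numpy as np

A = np.array([[1.0, 2.0],
              [3.0, 4.0]])
B = np.array([[10.0, 20.0],
              [30.0, 40.0]])

# Hadamard product: C[r, c] = A[r, c] * B[r, c]
C = A * B
# C == [[ 10.,  40.],
#       [ 90., 160.]]
```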