== Introduction ==

In the section on the [[Backpropagation Algorithm | backpropagation algorithm]], you were briefly introduced to backpropagation as a means of deriving gradients for learning in the sparse autoencoder. It turns out that together with matrix calculus, this provides a powerful method and intuition for deriving gradients of more complex matrix functions (functions from matrices to the reals, or symbolically, from <math>\mathbb{R}^{r \times c} \rightarrow \mathbb{R}</math>).

First, recall the backpropagation idea, which we present in a modified form appropriate for our purposes below (a short code sketch follows the list):
<ol>
<li>For each output unit <math>i</math> in layer <math>n_l</math> (the final layer), set
:<math>
\delta^{(n_l)}_i
= \frac{\partial}{\partial z^{(n_l)}_i} \;\;
J(z^{(n_l)})
</math>
where <math>J(z)</math> is our "objective function" (explained below).
<li>For <math>l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>
:For each node <math>i</math> in layer <math>l</math>, set
::<math>
\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i)
</math>
<li>Compute the desired partial derivatives,
:<math>
\begin{align}
\nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T. \\
\end{align}
</math>
</ol>
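
To make the recursion concrete, here is a minimal NumPy sketch of the three steps above for a fully-connected network. It is an illustration under our own naming assumptions (<code>W</code>, <code>z</code>, <code>a</code>, <code>f_prime</code>, <code>dJ_dz_out</code> are hypothetical names, not from the original text), not a reference implementation:

<pre>
import numpy as np

def backprop_deltas(W, z, a, f_prime, dJ_dz_out):
    # Hypothetical sketch of the three steps above.
    # W[l]       : weight matrix W^{(l)} from layer l to layer l+1
    # z[l], a[l] : inputs z^{(l)} and activations a^{(l)} (column vectors)
    # f_prime[l] : derivative of the activation function f^{(l)}
    # dJ_dz_out  : gradient of the objective J w.r.t. z^{(n_l)}
    # Lists are indexed from 1; index 0 is unused.
    n_l = len(z) - 1                     # index of the final layer
    delta = [None] * (n_l + 1)

    # Step 1: delta at the output layer.
    delta[n_l] = dJ_dz_out

    # Step 2: propagate backwards for l = n_l-1, ..., 2.
    # (W^{(l)})^T delta^{(l+1)} realizes the sum over j; '*' is the
    # Hadamard (element-wise) product.
    for l in range(n_l - 1, 1, -1):
        delta[l] = (W[l].T @ delta[l + 1]) * f_prime[l](z[l])

    # Step 3: the desired partial derivatives, one outer product per layer.
    grad_W = {l: delta[l + 1] @ a[l].T for l in range(1, n_l)}
    return delta, grad_W
</pre>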
| |
- |
| |
- |
| |
- | [初译]
| |
- |
| |
- | 首先,回忆一下反向传导思想,这里我们用一种变形的形式逐渐逼近我们的目的:
| |
- | <ol>
| |
- | <li>对每一个第<math>n_l</math>层(最后一层)中的输出单元<math>i</math>,令
| |
- | :<math>
| |
- | \delta^{(n_l)}_i
| |
- | = \frac{\partial}{\partial z^{(n_l)}_i} \;\;
| |
- | J(z^{(n_l)})
| |
- | </math>
| |
- | ,其中<math>J(z)</math>是我们的“目标函数”(下面解释);
| |
- | <li>对<math>l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>,
| |
- | 对每个第<math>l</math>层中的节点<math>i</math>,令
| |
- | :<math>
| |
| \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i) | | \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i) |
- | </math> | + | </math> |
- | <li>计算所需偏导数 | + | <li>计算我们要的偏导数 |
| :<math> | | :<math> |
| \begin{align} | | \begin{align} |
| \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\ | | \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\ |
| \end{align} | | \end{align} |
- | </math> 。
| |
- | </ol>
| |
- |
| |
- | [一审]
| |
- |
| |
- | 首先,我们回顾一下反向传导的思想,为了更适合我们的目的稍作修改呈现于下:
| |
- | <ol>
| |
- | <li>对每一个第<math>nl</math>层(最后一层)中的输出单元<math>i</math>,令
| |
- | :<math>
| |
- | \delta^{(n_l)}_i
| |
- | = \frac{\partial}{\partial z^{(n_l)}_i} \;\;
| |
- | J(z^{(n_l)})
| |
| </math> | | </math> |
- | ,其中<math>J(z)</math>是我们的“目标函数”(下面解释)。
| |
- | <li>对<math>l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>,
| |
- | 对每个第<math>l</math>层中的节点<math>i</math>, 令
| |
- | :<math>
| |
- | \delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i)
| |
- | </math>
| |
- | <li>计算所需偏导数
| |
- | :<math>
| |
- | \begin{align}
| |
- | \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T, \\
| |
- | \end{align}
| |
- | </math> .
| |
| </ol> | | </ol> |
| | | |

Quick notation recap:
<ul>
<li><math>l</math> is the number of layers in the neural network
<li><math>n_l</math> is the number of neurons in the <math>l</math>th layer
<li><math>W^{(l)}_{ji}</math> is the weight from the <math>i</math>th unit in the <math>l</math>th layer to the <math>j</math>th unit in the <math>(l + 1)</math>th layer
<li><math>z^{(l)}_i</math> is the input to the <math>i</math>th unit in the <math>l</math>th layer
<li><math>a^{(l)}_i</math> is the activation of the <math>i</math>th unit in the <math>l</math>th layer
<li><math>A \bullet B</math> is the Hadamard or element-wise product, which for <math>r \times c</math> matrices <math>A</math> and <math>B</math> yields the <math>r \times c</math> matrix <math>C = A \bullet B</math> such that <math>C_{r, c} = A_{r, c} \cdot B_{r, c}</math> (a brief example follows this list)
<li><math>f^{(l)}</math> is the activation function for units in the <math>l</math>th layer
</ul>
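
As a quick illustration of the Hadamard product used above: in NumPy it is simply the <code>*</code> operator on equal-shaped arrays (a hypothetical <math>2 \times 2</math> example):

<pre>
import numpy as np

A = np.array([[1., 2.],
              [3., 4.]])
B = np.array([[10., 20.],
              [30., 40.]])

# Hadamard (element-wise) product: C[i, j] = A[i, j] * B[i, j]
C = A * B
print(C)   # [[ 10.  40.]
           #  [ 90. 160.]]
</pre>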