Deriving Gradients Using the Backpropagation Idea
== Introduction ==

In the section on the [[Backpropagation Algorithm | backpropagation algorithm]], you were briefly introduced to backpropagation as a means of deriving gradients for learning in the sparse autoencoder. It turns out that together with matrix calculus, this provides a powerful method and intuition for deriving gradients for more complex matrix functions (functions from matrices to the reals, or symbolically, from <math>\mathbb{R}^{r \times c} \rightarrow \mathbb{R}</math>).

First, recall the backpropagation idea, which we present in a modified form appropriate for our purposes below (a code sketch of these steps follows the list):

<ol>
<li>For each output unit <math>i</math> in layer <math>n_l</math> (the final layer), set
:<math>
\delta^{(n_l)}_i = \frac{\partial}{\partial z^{(n_l)}_i} \;\; J(z^{(n_l)})
</math>
where <math>J(z)</math> is our "objective function" (explained below).
<li>For <math>l = n_l-1, n_l-2, n_l-3, \ldots, 2</math>
:For each node <math>i</math> in layer <math>l</math>, set
::<math>
\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \bullet \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i)
</math>
<li>Compute the desired partial derivatives,
:<math>
\begin{align}
\nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T. \\
\end{align}
</math>
</ol>
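To make the three numbered steps concrete, here is a minimal NumPy sketch of the recursion for a fully-connected network with sigmoid activations and a squared-error objective <math>J = \tfrac{1}{2}\|a^{(n_l)} - y\|^2</math>. The function names and the list layout of <code>W</code> and <code>b</code> are illustrative assumptions, not part of the original text, and the objective here stands in for whatever <math>J(z)</math> you are differentiating.

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def backprop_grads(W, b, x, y):
    """Gradients nabla_{W^{(l)}} J for J = 0.5 * ||a_out - y||^2.

    W and b are Python lists; W[l] maps the activations of layer l+1
    to the inputs of layer l+2 in the tutorial's 1-indexed notation.
    """
    # Forward pass: cache the inputs z and activations a of every layer.
    activations = [x]
    zs = []
    for Wl, bl in zip(W, b):
        z = Wl @ activations[-1] + bl
        zs.append(z)
        activations.append(sigmoid(z))

    # Step 1: delta at the output layer, dJ/dz^{(n_l)} for squared error.
    delta = (activations[-1] - y) * sigmoid_prime(zs[-1])

    grads = [None] * len(W)
    # Step 3 for the topmost weights: delta^{(n_l)} (a^{(n_l - 1)})^T.
    grads[-1] = np.outer(delta, activations[-2])

    # Step 2: propagate delta backwards, layer by layer.
    for l in range(len(W) - 2, -1, -1):
        # (W^T delta) combined with f'(z) via the Hadamard product.
        delta = (W[l + 1].T @ delta) * sigmoid_prime(zs[l])
        grads[l] = np.outer(delta, activations[l])

    return grads
</pre>

A finite-difference check on small random matrices is an easy way to confirm such a recursion; the sketch is meant only to mirror the three steps above, not to reproduce the sparse autoencoder objective itself.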
Quick notation recap:

<ul>
<li><math>l</math> is the number of layers in the neural network
<li><math>n_l</math> is the number of neurons in the <math>l</math>th layer
<li><math>W^{(l)}_{ji}</math> is the weight from the <math>i</math>th unit in the <math>l</math>th layer to the <math>j</math>th unit in the <math>(l + 1)</math>th layer
<li><math>z^{(l)}_i</math> is the input to the <math>i</math>th unit in the <math>l</math>th layer
<li><math>a^{(l)}_i</math> is the activation of the <math>i</math>th unit in the <math>l</math>th layer
<li><math>A \bullet B</math> is the Hadamard or element-wise product, which for <math>r \times c</math> matrices <math>A</math> and <math>B</math> yields the <math>r \times c</math> matrix <math>C = A \bullet B</math> such that <math>C_{r, c} = A_{r, c} \cdot B_{r, c}</math>
<li><math>f^{(l)}</math> is the activation function for units in the <math>l</math>th layer
</ul>
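The <math>A \bullet B</math> entry in the list above is worth emphasizing: in code it corresponds to element-wise multiplication, not the ordinary matrix product. A small NumPy illustration (the specific matrices are made up purely for demonstration):

<pre>
import numpy as np

A = np.array([[1., 2., 3.],
              [4., 5., 6.]])
B = np.array([[10., 20., 30.],
              [40., 50., 60.]])

# Hadamard product: same shape as A and B, with C[r, c] = A[r, c] * B[r, c].
C = A * B
print(C)        # [[ 10.  40.  90.]
                #  [160. 250. 360.]]

# Ordinary matrix product needs compatible inner dimensions (here via B^T).
D = A @ B.T     # shape (2, 2)
print(D)        # [[140. 320.]
                #  [320. 770.]]
</pre>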