Deriving gradients using the backpropagation idea
To have <math>J(z^{(4)}) = F(x)</math>, we can set <math>J(z^{(4)}) = \sum_k J(z^{(4)}_k)</math>, where <math>J(z^{(4)}_k) = \left( z^{(4)}_k \right)^2</math>.
Now that we can see <math>F</math> as a neural network, we can try to compute the gradient <math>\nabla_W F</math>. However, we now face the difficulty that <math>W</math> appears twice in the network. Fortunately, it turns out that if <math>W</math> appears multiple times in the network, the gradient with respect to <math>W</math> is simply the sum of gradients for each instance of <math>W</math> in the network (you may wish to work out a formal proof of this fact to convince yourself). With this in mind, we will proceed to work out the deltas first:
<table align="center">
<tr>
<th>Layer</th>
<th>Derivative of activation function <math>f'</math></th>
<th>Delta</th>
<th>Input <math>z</math> to this layer</th>
</tr>
<tr>
<td>4</td>
<td><math>f'(z^{(4)}) = 1</math> (identity)</td>
<td><math>\nabla_{z^{(4)}} J = 2 \left( W^T W x - x \right)</math></td>
<td><math>W^T W x - x</math></td>
</tr>
<tr>
<td>3</td>
<td><math>f'(z^{(3)}) = 1</math> (identity)</td>
<td><math>\left( I^T \delta^{(4)} \right) \bullet 1 = 2 \left( W^T W x - x \right)</math></td>
<td><math>W^T W x</math></td>
</tr>
<tr>
<td>2</td>
<td><math>f'(z^{(2)}) = 1</math> (identity)</td>
<td><math>\left( \left( W^T \right)^T \delta^{(3)} \right) \bullet 1 = 2 W \left( W^T W x - x \right)</math></td>
<td><math>W x</math></td>
</tr>
</table>
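To make the table concrete, here is a minimal NumPy sketch of the forward pass and the deltas above; the dimensions, the random seed, and the variable names are illustrative assumptions rather than part of the original derivation:

<pre>
import numpy as np

# Minimal sketch (not from the original page) checking the forward pass
# and deltas above; it assumes the layered view z2 = W x, z3 = W^T z2,
# z4 = z3 - x with identity activations, so that
# F(x) = J(z4) = sum_k (z4_k)^2 = ||W^T W x - x||^2.
rng = np.random.default_rng(0)
k, n = 3, 5
W = rng.standard_normal((k, n))   # W appears twice: as W and as W^T
x = rng.standard_normal(n)

# Forward pass through the network view of F.
z2 = W @ x          # layer 2 (identity activation, so a2 = z2)
z3 = W.T @ z2       # layer 3
z4 = z3 - x         # layer 4
F = z4 @ z4         # J(z4) = sum_k (z4_k)^2

# Deltas from the table, outermost layer first.
delta4 = 2 * z4                 # dJ / dz4
delta3 = delta4                 # identity weight I between layers 3 and 4
delta2 = W @ delta3             # (W^T)^T delta3 = W delta3

# J(z4) should equal F(x) = ||W^T W x - x||^2.
assert np.isclose(F, np.linalg.norm(W.T @ W @ x - x) ** 2)
</pre>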
To find the gradients with respect to <math>W</math>, first we find the gradients with respect to each instance of <math>W</math> in the network.
With respect to <math>W^T</math>:

<math>
\begin{align}
\nabla_{W^T} F & = \delta^{(3)} \left( a^{(2)} \right)^T \\
& = 2 \left( W^T W x - x \right) \left( W x \right)^T
\end{align}
</math>

With respect to <math>W</math>:

<math>
\begin{align}
\nabla_{W} F & = \delta^{(2)} \left( a^{(1)} \right)^T \\
& = 2 W \left( W^T W x - x \right) x^T
\end{align}
</math>

Finally, since <math>W</math> appears twice in the network, we sum the per-instance gradients, transposing the gradient with respect to the <math>W^T</math> instance so that both terms are gradients with respect to <math>W</math> (with a slight abuse of notation, we reuse <math>\nabla_W F</math> for the total gradient):

<math>
\begin{align}
\nabla_{W} F & = 2 \left( W x \right) \left( W^T W x - x \right)^T + 2 W \left( W^T W x - x \right) x^T
\end{align}
</math>
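As a sanity check that summing the per-instance gradients gives the right answer, the following sketch compares the closed-form gradient above with a centered finite-difference approximation; again, the sizes, seed, and helper function are illustrative assumptions:

<pre>
import numpy as np

# Sketch (not from the original page): finite-difference check of
#   grad_W F = 2 (W x)(W^T W x - x)^T + 2 W (W^T W x - x) x^T,
# i.e. the sum of the gradients for the two instances of W.
rng = np.random.default_rng(1)
k, n = 3, 5
W = rng.standard_normal((k, n))
x = rng.standard_normal(n)

def F(W):
    r = W.T @ W @ x - x
    return r @ r

r = W.T @ W @ x - x
grad_WT = 2 * np.outer(r, W @ x)   # w.r.t. the W^T instance (n x k)
grad_W  = 2 * np.outer(W @ r, x)   # w.r.t. the W instance   (k x n)
analytic = grad_WT.T + grad_W      # transpose the first, then sum

# Centered finite differences, one entry of W at a time.
eps = 1e-6
numeric = np.zeros_like(W)
for i in range(k):
    for j in range(n):
        Wp = W.copy(); Wp[i, j] += eps
        Wm = W.copy(); Wm[i, j] -= eps
        numeric[i, j] = (F(Wp) - F(Wm)) / (2 * eps)

assert np.allclose(analytic, numeric, atol=1e-6)
</pre>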
+ | |||
+ | |||
+ | |||
{{Languages|用反向传导思想求导|中文}}