Deriving gradients using the backpropagation idea

From Ufldl

Jump to: navigation, search

@@ Line 227: / Line 227: @@
 To have <math>J(z^{(4)}) = F(x)</math>, we can set <math>J(z^{(4)}) = \sum_k J(z^{(4)}_k)</math>.
-Now that we can see <math>F</math> as a neural network, we can try to compute the gradient <math>\nabla_W F</math>. However, we now face the difficulty that <math>W</math> appears twice in the network. Fortunately, it turns out that if <math>W</math> appears multiple times in the network, the gradient with respect to <math>W</math> is simply the sum of gradients for each <math>W</math> in the network (you may wish to work out a formal proof of this fact to convince yourself). With this in mind, we can proceed to work out the deltas first:
+Now that we can see <math>F</math> as a neural network, we can try to compute the gradient <math>\nabla_W F</math>. However, we now face the difficulty that <math>W</math> appears twice in the network. Fortunately, it turns out that if <math>W</math> appears multiple times in the network, the gradient with respect to <math>W</math> is simply the sum of gradients for each instance of <math>W</math> in the network (you may wish to work out a formal proof of this fact to convince yourself). With this in mind, we will proceed to work out the deltas first:
 <table align="center">
@@ Line 258: / Line 258: @@
 </table>
-First we find the gradients with respect to each <math>W</math>.
+To find the gradients with respect to <math>W</math>, first we find the gradients with respect to each instance of <math>W</math> in the network.
 With respect to <math>W^T</math>:

Deriving gradients using the backpropagation idea

From Ufldl

Revision as of 06:46, 30 May 2011

Views

Personal tools

ufldl resources

wiki

Search

Toolbox