Deriving gradients using the backpropagation idea

Example 3: ICA reconstruction cost
To have <math>J(z^{(4)}) = F(x)</math>, we can set <math>J(z^{(4)}) = \sum_k J(z^{(4)}_k)</math>.
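As a quick sanity check (a sketch that assumes the usual setup for this cost, namely that the final layer computes <math>z^{(4)} = W^TWx - x</math> and that the per-unit objective is <math>J(z^{(4)}_k) = (z^{(4)}_k)^2</math>), the sum then recovers the ICA reconstruction cost:

<math>
\sum_k J(z^{(4)}_k) = \sum_k \left( (W^TWx - x)_k \right)^2 = \lVert W^TWx - x \rVert_2^2 = F(x).
</math>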
Now that we can see <math>F</math> as a neural network, we can try to compute the gradient <math>\nabla_W F</math>. However, we now face the difficulty that <math>W</math> appears twice in the network. Fortunately, it turns out that if <math>W</math> appears multiple times in the network, the gradient with respect to <math>W</math> is simply the sum of gradients for each instance of <math>W</math> in the network (you may wish to work out a formal proof of this fact to convince yourself; a brief justification is also sketched below). With this in mind, we will proceed to work out the deltas first:
<table align="center">
</table>
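To see why summing over the two appearances is valid, here is a brief sketch (the notation <math>G</math>, <math>W_1</math>, <math>W_2</math> is introduced only for this argument): write <math>F(W) = G(W_1, W_2)</math> evaluated at <math>W_1 = W_2 = W</math>, where <math>W_1</math> and <math>W_2</math> denote the first and second appearance of <math>W</math> in the network. By the chain rule,

<math>
\nabla_W F = \left. \nabla_{W_1} G \right|_{W_1 = W_2 = W} + \left. \nabla_{W_2} G \right|_{W_1 = W_2 = W},
</math>

which is precisely the sum of the gradients computed for each instance of <math>W</math> separately.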
To find the gradient with respect to <math>W</math>, we first find the gradients with respect to each instance of <math>W</math> in the network.
With respect to <math>W^T</math>:
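What follows is a minimal sketch obtained by differentiating the cost directly (rather than reading the result off the delta table), assuming <math>F(x) = \lVert W^TWx - x \rVert_2^2</math> and writing <math>r = W^TWx - x</math> for the residual. Treating the first-layer <math>W</math> as fixed and differentiating with respect to the <math>W^T</math> instance,

<math>
\nabla_{W^T} F = 2\, r\, (Wx)^T.
</math>

With respect to <math>W</math> (treating the <math>W^T</math> instance as fixed):

<math>
\nabla_{W} F = 2\, W r\, x^T.
</math>

Summing the two contributions, with the first transposed so that it has the same shape as <math>W</math>, gives the overall gradient

<math>
\nabla_W \lVert W^TWx - x \rVert_2^2 = 2\,(Wx)(W^TWx - x)^T + 2\,W(W^TWx - x)\,x^T.
</math>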
