Backpropagation Algorithm
From Ufldl
Line 71: | Line 71: | ||
partial derivatives of the cost function <math>J(W,b;x,y)</math> defined with respect | partial derivatives of the cost function <math>J(W,b;x,y)</math> defined with respect | ||
to a single example <math>(x,y)</math>. | to a single example <math>(x,y)</math>. | ||
- | Once we can compute these | + | Once we can compute these, we see that |
- | + | ||
the derivative of the overall cost function <math>J(W,b)</math> can be computed as | the derivative of the overall cost function <math>J(W,b)</math> can be computed as | ||
:<math>\begin{align} | :<math>\begin{align} | ||
Line 126: | Line 125: | ||
The algorithm can then be written: | The algorithm can then be written: | ||
- | : 1. Perform a feedforward pass, computing the activations for layers <math>\textstyle L_2</math>, <math>\textstyle L_3</math>, up to the output layer <math>\textstyle L_{n_l}</math>, using | + | : 1. Perform a feedforward pass, computing the activations for layers <math>\textstyle L_2</math>, <math>\textstyle L_3</math>, up to the output layer <math>\textstyle L_{n_l}</math>, using the equations defining the forward propagation steps. |
: 2. For the output layer (layer <math>\textstyle n_l</math>), set | : 2. For the output layer (layer <math>\textstyle n_l</math>), set | ||
::<math>\begin{align} | ::<math>\begin{align} | ||
Line 159: | Line 158: | ||
: 1. Set <math>\textstyle \Delta W^{(l)} := 0</math>, <math>\textstyle \Delta b^{(l)} := 0</math> (matrix/vector of zeros) for all <math>\textstyle l</math>. | : 1. Set <math>\textstyle \Delta W^{(l)} := 0</math>, <math>\textstyle \Delta b^{(l)} := 0</math> (matrix/vector of zeros) for all <math>\textstyle l</math>. | ||
: 2. For <math>\textstyle i = 1</math> to <math>\textstyle m</math>, | : 2. For <math>\textstyle i = 1</math> to <math>\textstyle m</math>, | ||
- | :: 2a. Use backpropagation to compute <math>\textstyle \nabla_{W^{(l)}} J(W,b;x,y)</math> and | + | :: 2a. Use backpropagation to compute <math>\textstyle \nabla_{W^{(l)}} J(W,b;x,y)</math> and |
<math>\textstyle \nabla_{b^{(l)}} J(W,b;x,y)</math>. | <math>\textstyle \nabla_{b^{(l)}} J(W,b;x,y)</math>. | ||
:: 2b. Set <math>\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)</math>. | :: 2b. Set <math>\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)</math>. |
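The accumulation in steps 1 to 2b can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the tutorial's own code: `accumulate_gradients` and `linear_unit_grad` are hypothetical names, and a single linear unit with a squared-error cost stands in for a full backpropagation pass over a multi-layer network. The bias accumulator is assumed to be updated in the same way as the weight accumulator.

```python
def linear_unit_grad(W, b, x, y):
    """Hypothetical stand-in for step 2a.

    Returns the gradients of 0.5 * ||W x + b - y||^2 with respect to W and b
    for one example (x, y). A real network would run backpropagation here.
    """
    err = [sum(W[i][j] * x[j] for j in range(len(x))) + b[i] - y[i]
           for i in range(len(b))]
    grad_W = [[err[i] * x[j] for j in range(len(x))] for i in range(len(b))]
    grad_b = err
    return grad_W, grad_b


def accumulate_gradients(examples, grad_fn, n_out, n_in):
    """Sum per-example gradients into Delta W and Delta b."""
    # Step 1: zero-initialised accumulators (matrix for W, vector for b).
    delta_W = [[0.0] * n_in for _ in range(n_out)]
    delta_b = [0.0] * n_out
    # Step 2: loop over the m training examples.
    for x, y in examples:
        # Step 2a: gradients of the single-example cost J(W,b;x,y).
        grad_W, grad_b = grad_fn(x, y)
        # Step 2b: Delta W := Delta W + grad_W (bias handled the same way).
        for i in range(n_out):
            for j in range(n_in):
                delta_W[i][j] += grad_W[i][j]
            delta_b[i] += grad_b[i]
    return delta_W, delta_b


# Toy data: one output unit, two inputs, m = 2 examples (illustrative values).
W = [[1.0, 0.0]]
b = [0.0]
examples = [([1.0, 2.0], [0.0]), ([3.0, 1.0], [1.0])]

delta_W, delta_b = accumulate_gradients(
    examples, lambda x, y: linear_unit_grad(W, b, x, y), n_out=1, n_in=2
)
print(delta_W, delta_b)
```

Passing the per-example gradient as a callable keeps the accumulation loop independent of how step 2a is implemented, so a real backpropagation routine could be substituted without changing the loop.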