# Backpropagation Algorithm

Revision as of 23:26, 26 February 2011 by Ang → Revision as of 23:53, 1 March 2011 by Ang (minor edit)

Line 71 (old) / Line 71 (new):

  partial derivatives of the cost function $J(W,b;x,y)$ defined with respect
  to a single example $(x,y)$.
- Once we can compute these,
- then by referring to Equation~(\ref{eqn-costfunction}), we see that
+ Once we can compute these, we see that
  the derivative of the overall cost function $J(W,b)$ can be computed as
  :\begin{align}

Line 126 (old) / Line 125 (new):

  The algorithm can then be written:
- : 1. Perform a feedforward pass, computing the activations for layers $\textstyle L_2$, $\textstyle L_3$, up to the output layer $\textstyle L_{n_l}$, using Equations~(\ref{eqn-forwardprop1}-\ref{eqn-forwardprop2}).
+ : 1. Perform a feedforward pass, computing the activations for layers $\textstyle L_2$, $\textstyle L_3$, up to the output layer $\textstyle L_{n_l}$, using the equations defining the forward propagation steps.
  : 2. For the output layer (layer $\textstyle n_l$), set
  ::\begin{align}

Line 159 (old) / Line 158 (new):

  : 1. Set $\textstyle \Delta W^{(l)} := 0$, $\textstyle \Delta b^{(l)} := 0$ (matrix/vector of zeros) for all $\textstyle l$.
  : 2. For $\textstyle i = 1$ to $\textstyle m$,
- :: 2a. Use backpropagation to compute $\textstyle \nabla_{W^{(l)}} J(W,b;x,y)$ and \\
- $\textstyle \nabla_{b^{(l)}} J(W,b;x,y)$.
+ :: 2a. Use backpropagation to compute $\textstyle \nabla_{W^{(l)}} J(W,b;x,y)$ and $\textstyle \nabla_{b^{(l)}} J(W,b;x,y)$.
  :: 2b. Set $\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x,y)$.
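
The accumulation loop quoted in the last hunk (steps 1, 2, 2a and 2b) can be sketched in plain Python as follows. This is only an illustrative sketch under assumptions that the excerpt does not state: a sigmoid activation, a per-example cost $J(W,b;x,y) = \frac{1}{2}\|h_{W,b}(x) - y\|^2$, and weights stored as nested lists. The function names `forward`, `backprop` and `accumulate_gradients` are hypothetical, and the update of $\Delta b^{(l)}$ is assumed to mirror step 2b, since the excerpt is cut off before it.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(W, b, x):
    """Feedforward pass: return the activations of every layer, input first."""
    activations = [x]
    for Wl, bl in zip(W, b):
        a_prev = activations[-1]
        z = [sum(w * a for w, a in zip(row, a_prev)) + bias
             for row, bias in zip(Wl, bl)]
        activations.append([sigmoid(v) for v in z])
    return activations

def backprop(W, b, x, y):
    """Gradients of the assumed cost 0.5 * ||h(x) - y||^2 for one example."""
    acts = forward(W, b, x)
    # Output-layer error term; for a sigmoid unit f'(z) = a * (1 - a).
    delta = [-(yi - ai) * ai * (1.0 - ai) for yi, ai in zip(y, acts[-1])]
    grad_W = [None] * len(W)
    grad_b = [None] * len(W)
    for l in range(len(W) - 1, -1, -1):
        a_prev = acts[l]  # activations feeding weight matrix W[l]
        grad_W[l] = [[d * a for a in a_prev] for d in delta]
        grad_b[l] = list(delta)
        if l > 0:
            # Propagate the error term back one layer: (W^T delta) * f'(z).
            delta = [sum(W[l][j][i] * delta[j] for j in range(len(delta)))
                     * a_prev[i] * (1.0 - a_prev[i])
                     for i in range(len(a_prev))]
    return grad_W, grad_b

def accumulate_gradients(W, b, examples):
    # Step 1: set Delta W and Delta b to zeros of the right shapes.
    dW = [[[0.0] * len(row) for row in Wl] for Wl in W]
    db = [[0.0] * len(bl) for bl in b]
    # Step 2: loop over the m training examples.
    for x, y in examples:
        gW, gb = backprop(W, b, x, y)           # step 2a
        for l in range(len(W)):
            for j in range(len(W[l])):
                for i in range(len(W[l][j])):
                    dW[l][j][i] += gW[l][j][i]  # step 2b
                db[l][j] += gb[l][j]            # assumed analogue for b
    return dW, db

# Hypothetical 2-2-1 network and two made-up training examples.
W = [[[0.1, -0.2], [0.3, 0.4]], [[0.5, -0.6]]]
b = [[0.0, 0.1], [-0.1]]
examples = [([1.0, 0.5], [1.0]), ([0.0, -1.0], [0.0])]
dW, db = accumulate_gradients(W, b, examples)
```

The returned `dW` and `db` are plain sums of the per-example gradients. How they enter the derivative of the overall cost $J(W,b)$ depends on the equation that this excerpt truncates, so the sketch stops at the accumulation step.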