Backpropagation Algorithm

Backpropagation computes the partial derivatives of the cost function $J(W,b;x,y)$ defined with respect
to a single example $(x,y)$.
Once we can compute these,
the derivative of the overall cost function $J(W,b)$ can be computed as

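:$\begin{align}
\nabla_{W^{(l)}} J(W,b) &= \left[ \frac{1}{m} \sum_{i=1}^m \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)}) \right] + \lambda W^{(l)} \\
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \sum_{i=1}^m \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})
\end{align}$
:Here $m$ is the number of training examples. The $\lambda W^{(l)}$ term comes from the weight decay term of $J(W,b)$, which penalizes $W$ but not $b$.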
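The relation between per-example and overall gradients can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `overall_gradients`, the list-of-lists matrix layout, and the parameter name `lam` (standing in for $\lambda$) are assumptions made for the example, and it assumes the overall cost applies weight decay to the weights only.

```python
def overall_gradients(per_example_grads_W, per_example_grads_b, W, lam):
    """Combine per-example gradients for one layer into the overall gradient.

    per_example_grads_W: list of m matrices (lists of rows), one per example
    per_example_grads_b: list of m vectors, one per example
    W:   the layer's weight matrix (same shape as each gradient matrix)
    lam: weight decay parameter (assumed; applied to W only, not to b)
    """
    m = len(per_example_grads_W)
    rows, cols = len(W), len(W[0])
    # Average the per-example weight gradients, then add the weight decay term.
    grad_W = [
        [sum(g[r][c] for g in per_example_grads_W) / m + lam * W[r][c]
         for c in range(cols)]
        for r in range(rows)
    ]
    # Bias gradients are averaged only; no weight decay is applied to b.
    grad_b = [sum(g[r] for g in per_example_grads_b) / m for r in range(rows)]
    return grad_W, grad_b


# Usage with two examples and a single 1x2 weight matrix.
W = [[1.0, -2.0]]
grads_W = [[[0.5, 1.0]], [[1.5, 3.0]]]
grads_b = [[0.2], [0.4]]
grad_W, grad_b = overall_gradients(grads_W, grads_b, W, lam=0.1)
```

Keeping the averaging separate from the per-example computation mirrors the text: backpropagation is only responsible for the single-example gradients.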

The algorithm can then be written:

: 1. Perform a feedforward pass, computing the activations for layers $\textstyle L_2$, $\textstyle L_3$, up to the output layer $\textstyle L_{n_l}$, using the equations defining the forward propagation steps.
: 2. For the output layer (layer $\textstyle n_l$), set
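::$\begin{align}
\delta^{(n_l)} = -(y - a^{(n_l)}) \bullet f'(z^{(n_l)})
\end{align}$
::where $\bullet$ denotes the element-wise product. This form holds for the squared-error cost $J(W,b;x,y) = \frac{1}{2} \left\| a^{(n_l)} - y \right\|^2$.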
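The steps above, together with the standard recursion for the hidden layers, can be sketched as a small pure-Python routine. Several things here are assumptions rather than statements from the text: sigmoid activations (so that $f'(z) = a(1-a)$), a squared-error cost for a single example, weights stored as lists of rows, and the hypothetical function names `forward`, `backprop`, and `cost`. The hidden-layer step $\delta^{(l)} = ((W^{(l)})^T \delta^{(l+1)}) \bullet f'(z^{(l)})$ and the gradient formulas $\delta^{(l+1)} (a^{(l)})^T$ and $\delta^{(l+1)}$ are the usual backpropagation relations, supplied here to make the sketch complete.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def forward(Ws, bs, x):
    """Feedforward pass. Returns the activations of every layer, input first."""
    activations = [x]
    for W, b in zip(Ws, bs):
        a_prev = activations[-1]
        z = [sum(w * a for w, a in zip(row, a_prev)) + b_i
             for row, b_i in zip(W, b)]
        activations.append([sigmoid(z_i) for z_i in z])
    return activations


def cost(Ws, bs, x, y):
    """Squared-error cost for one example (an assumption of this sketch)."""
    out = forward(Ws, bs, x)[-1]
    return 0.5 * sum((o - t) ** 2 for o, t in zip(out, y))


def backprop(Ws, bs, x, y):
    """Return (grads_W, grads_b) of the single-example cost, one entry per layer."""
    acts = forward(Ws, bs, x)
    a_out = acts[-1]
    # Output layer: delta = -(y - a) * f'(z), with f'(z) = a * (1 - a) for sigmoid.
    delta = [-(y_i - a_i) * a_i * (1.0 - a_i) for y_i, a_i in zip(y, a_out)]
    grads_W = [None] * len(Ws)
    grads_b = [None] * len(Ws)
    for l in range(len(Ws) - 1, -1, -1):
        a_prev = acts[l]
        # Gradient w.r.t. this layer's weights: outer product of delta and a_prev.
        grads_W[l] = [[d * a for a in a_prev] for d in delta]
        grads_b[l] = list(delta)
        if l > 0:
            # Propagate the error back: (W^T delta) * f'(z) for the previous layer.
            W = Ws[l]
            delta = [
                sum(W[i][j] * delta[i] for i in range(len(delta)))
                * a_prev[j] * (1.0 - a_prev[j])
                for j in range(len(a_prev))
            ]
    return grads_W, grads_b


# Usage: a 2-2-1 network with arbitrary illustrative weights.
Ws = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]
bs = [[0.0, 0.1], [-0.1]]
grads_W, grads_b = backprop(Ws, bs, x=[1.0, 0.5], y=[1.0])
```

A common sanity check for a routine like this is to compare one entry of `grads_W` against a finite-difference estimate computed from `cost`.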

The per-example gradients can then be accumulated over the training set as follows:

: 1. Set $\textstyle \Delta W^{(l)} := 0$, $\textstyle \Delta b^{(l)} := 0$ (matrix/vector of zeros) for all $\textstyle l$.
: 2. For $\textstyle i = 1$ to $\textstyle m$,
:: 2a. Use backpropagation to compute $\textstyle \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$ and $\textstyle \nabla_{b^{(l)}} J(W,b;x^{(i)},y^{(i)})$.
:: 2b. Set $\textstyle \Delta W^{(l)} := \Delta W^{(l)} + \nabla_{W^{(l)}} J(W,b;x^{(i)},y^{(i)})$.