Neural Networks

In the sequel, we also let $z^{(l)}_i$ denote the total weighted sum of inputs to unit $i$ in layer $l$,
including the bias term (e.g., $\textstyle z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i$), so that
$a^{(l)}_i = f(z^{(l)}_i)$.
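
As a minimal sketch of this notation, the following computes $z^{(2)}$ and $a^{(2)}$ for a single layer. The function names `sigmoid` and `layer_forward`, the choice of the sigmoid for $f$, and the numeric values of `W1`, `b1`, and `x` are all illustrative assumptions rather than anything fixed by the text above.

```python
import math


def sigmoid(z):
    # Assumed activation function f; the text above leaves f generic.
    return 1.0 / (1.0 + math.exp(-z))


def layer_forward(W, b, x, f=sigmoid):
    """Return (z, a) for one layer, where
    z_i = sum_j W[i][j] * x[j] + b[i]  and  a_i = f(z_i)."""
    z = [
        sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
        for row, b_i in zip(W, b)
    ]
    a = [f(z_i) for z_i in z]
    return z, a


# Hypothetical weights and inputs: 3 input units feeding 2 units in layer 2.
W1 = [[0.5, -1.0, 0.25],
      [1.0, 0.0, -0.5]]
b1 = [0.0, 1.0]
x = [2.0, 1.0, 4.0]

z2, a2 = layer_forward(W1, b1, x)
print(z2)  # weighted sums z^(2), bias included
print(a2)  # activations a^(2) = f(z^(2))
```

Here row $i$ of `W1` holds the weights $W^{(1)}_{ij}$ for unit $i$ in layer 2, so each entry of `z2` is the weighted sum defined above, and `a2` applies $f$ elementwise.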