# Neural Networks

If we extend the activation function $f(\cdot)$ to apply to vectors in an element-wise fashion (i.e.,
$f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]$), then we can write
the equations above more compactly as:
:[itex]\begin{align}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
a^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
h_{W,b}(x) &= a^{(3)} = f(z^{(3)})
\end{align}[/itex]
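As an illustration, here is a minimal NumPy sketch of these four equations. It assumes that $f$ is the logistic sigmoid, though any element-wise function would do. It also assumes a network with 3 inputs, 3 hidden units and 1 output. The function names `sigmoid` and `forward_3layer` are hypothetical and chosen only for this example.

```python
import numpy as np

def sigmoid(z):
    # Assumed activation f: the logistic function, applied element-wise,
    # so sigmoid([z1, z2, z3]) == [sigmoid(z1), sigmoid(z2), sigmoid(z3)].
    return 1.0 / (1.0 + np.exp(-z))

def forward_3layer(x, W1, b1, W2, b2):
    # Hidden layer: z^(2) = W^(1) x + b^(1), a^(2) = f(z^(2))
    z2 = W1 @ x + b1
    a2 = sigmoid(z2)
    # Output layer: z^(3) = W^(2) a^(2) + b^(2), h = a^(3) = f(z^(3))
    z3 = W2 @ a2 + b2
    return sigmoid(z3)

rng = np.random.default_rng(0)
x = rng.normal(size=3)          # input vector, a^(1) = x
W1 = rng.normal(size=(3, 3))    # W^(1): 3 hidden units x 3 inputs
b1 = rng.normal(size=3)         # b^(1): one bias per hidden unit
W2 = rng.normal(size=(1, 3))    # W^(2): 1 output unit x 3 hidden units
b2 = rng.normal(size=1)         # b^(2): bias of the output unit

h = forward_3layer(x, W1, b1, W2, b2)
print(h.shape)  # (1,)
```

Each line of the function corresponds to one line of the equations, and `@` performs the matrix-vector product.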
We call this step '''forward propagation.'''  More generally, recalling that we also use $a^{(1)} = x$ to denote the values from the input layer,
given layer $l$'s activations $a^{(l)}$, we can compute layer $l+1$'s activations $a^{(l+1)}$ as:
:[itex]\begin{align}
z^{(l+1)} &= W^{(l)} a^{(l)} + b^{(l)}   \\
a^{(l+1)} &= f(z^{(l+1)})
\end{align}[/itex]
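This recurrence translates directly into a loop over layers. The sketch below again assumes a sigmoid activation. The `forward` helper is hypothetical: it stores $W^{(l)}, b^{(l)}$ at list index `l-1`, because Python lists are 0-indexed. With two $(W, b)$ pairs, it reproduces the three-layer equations exactly.

```python
import numpy as np

def sigmoid(z):
    # Assumed element-wise activation f.
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases, f=sigmoid):
    """Apply z^(l+1) = W^(l) a^(l) + b^(l), a^(l+1) = f(z^(l+1)) layer by layer.

    Returns the list of activations [a^(1), a^(2), ..., a^(n_l)].
    """
    activations = [x]          # a^(1) = x
    a = x
    for W, b in zip(weights, biases):
        z = W @ a + b          # z^(l+1)
        a = f(z)               # a^(l+1)
        activations.append(a)
    return activations

rng = np.random.default_rng(1)
x = rng.normal(size=3)
W1, b1 = rng.normal(size=(3, 3)), rng.normal(size=3)
W2, b2 = rng.normal(size=(1, 3)), rng.normal(size=1)

acts = forward(x, [W1, W2], [b1, b2])
explicit = sigmoid(W2 @ sigmoid(W1 @ x + b1) + b2)
print(np.allclose(acts[-1], explicit))  # True
```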
By organizing our parameters in matrices and using matrix-vector operations, we can take
advantage of optimized linear algebra routines to perform the network's computations efficiently.
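To see what the matrix form replaces, compare a unit-by-unit computation of $z = W a + b$ with the single matrix-vector expression. The helper names `layer_loops` and `layer_matrix` are hypothetical. The two functions compute the same values. In the matrix version, NumPy can hand `W @ a` to a compiled linear algebra (BLAS) routine instead of running interpreted Python loops.

```python
import numpy as np

def layer_loops(W, b, a):
    # Unit by unit: z_i = sum_j W[i, j] * a[j] + b[i]
    n_out, n_in = W.shape
    z = np.zeros(n_out)
    for i in range(n_out):
        total = b[i]
        for j in range(n_in):
            total += W[i, j] * a[j]
        z[i] = total
    return z

def layer_matrix(W, b, a):
    # The same computation as one matrix-vector product plus a vector addition.
    return W @ a + b

rng = np.random.default_rng(2)
W = rng.normal(size=(4, 5))   # 4 units in layer l+1, 5 units in layer l
b = rng.normal(size=4)
a = rng.normal(size=5)
print(np.allclose(layer_loops(W, b, a), layer_matrix(W, b, a)))  # True
```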

We have so far focused on one example neural network, but one can also build neural
networks with other '''architectures''' (meaning patterns of connectivity between neurons), including ones with multiple hidden layers.
The most common choice is an $\textstyle n_l$-layered network
where layer $\textstyle 1$ is the input layer, layer $\textstyle n_l$ is the output layer, and each
layer $\textstyle l$ is densely connected to layer $\textstyle l+1$.  In this setting, to compute the
output of the network, we can successively compute all the activations in layer
$\textstyle L_2$, then layer $\textstyle L_3$, and so on, up to layer $\textstyle L_{n_l}$, using the forward propagation equations above.  This is one
example of a '''feedforward''' neural network, since its connectivity graph
does not have any directed loops or cycles.
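The sketch below describes such an architecture entirely by its list of layer sizes $[s_1, s_2, \ldots, s_{n_l}]$. It makes three assumptions: a sigmoid activation, small random Gaussian weights, and zero biases. The helpers `init_params` and `feedforward` are hypothetical names. Because layer $l$ has $s_l$ units and layer $l+1$ has $s_{l+1}$ units, each $W^{(l)}$ must have shape $(s_{l+1}, s_l)$.

```python
import numpy as np

def sigmoid(z):
    # Assumed element-wise activation f.
    return 1.0 / (1.0 + np.exp(-z))

def init_params(layer_sizes, rng):
    # W^(l) maps layer l (s_l units) to layer l+1 (s_{l+1} units),
    # so it has shape (s_{l+1}, s_l). Small Gaussian weights and zero
    # biases are an assumed initialization, used only for illustration.
    weights = [rng.normal(scale=0.1, size=(n_next, n_cur))
               for n_cur, n_next in zip(layer_sizes[:-1], layer_sizes[1:])]
    biases = [np.zeros(n_next) for n_next in layer_sizes[1:]]
    return weights, biases

def feedforward(x, weights, biases):
    # Compute L_2, L_3, ..., L_{n_l} in order. Each layer depends only on
    # the one before it, so a single pass over the layers suffices.
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

rng = np.random.default_rng(0)
sizes = [4, 5, 5, 2]   # n_l = 4: input layer, two hidden layers, output layer
weights, biases = init_params(sizes, rng)
print([W.shape for W in weights])  # [(5, 4), (5, 5), (2, 5)]
out = feedforward(rng.normal(size=4), weights, biases)
print(out.shape)  # (2,)
```

Changing `sizes` changes the number of hidden layers and units without touching the forward-propagation code.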