# Neural Networks

including the bias term (e.g., $z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i$), so that $a^{(l)}_i = f(z^{(l)}_i)$.

Note that this easily lends itself to a more compact notation.
Specifically, if we extend the activation function $f(\cdot)$ to apply to vectors in an element-wise fashion (i.e., $f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]$), then we can write Equations~(\ref{eqn-network331a}-\ref{eqn-network331d}) more compactly as:

:\begin{align}
z^{(2)} &= W^{(1)} x + b^{(1)} \\
a^{(2)} &= f(z^{(2)}) \\
z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
h_{W,b}(x) &= a^{(3)} = f(z^{(3)})
\end{align}

More generally, recalling that we also use $a^{(1)} = x$ to denote the values from the input layer, then given layer $l$'s activations $a^{(l)}$, we can compute layer $l+1$'s activations $a^{(l+1)}$ as:

:\begin{align}
z^{(l+1)} &= W^{(l)} a^{(l)} + b^{(l)} \\
a^{(l+1)} &= f(z^{(l+1)})
\end{align}

By organizing our parameters in matrices and using matrix-vector operations, we can take advantage of fast linear algebra routines to quickly perform calculations in our network.

We have so far focused on one example neural network, but one can also build neural networks with other {\bf architectures} (meaning patterns of connectivity between neurons), including ones with multiple hidden layers. The most common choice is an $n_l$-layered network where layer $1$ is the input layer, layer $n_l$ is the output layer, and each layer $l$ is densely connected to layer $l+1$. In this setting, to compute the output of the network, we can successively compute all the activations in layer $L_2$, then layer $L_3$, and so on, up to layer $L_{n_l}$, using Equations~(\ref{eqn-forwardprop1}-\ref{eqn-forwardprop2}). This is one example of a {\bf feedforward} neural network, since the connectivity graph does not have any directed loops or cycles.

Neural networks can also have multiple output units.
For example, here is a network with two hidden layers $L_2$ and $L_3$ and two output units in layer $L_4$:

[[Image:Network3322.png|500px|center]]

To train this network, we would need training examples $(x^{(i)}, y^{(i)})$ where $y^{(i)} \in \Re^2$. This sort of network is useful when there are multiple outputs that you are interested in predicting. (For example, in a medical diagnosis application, the vector $x$ might give the input features of a patient, and the different outputs $y_i$ might indicate the presence or absence of different diseases.)
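To make the forward-propagation equations concrete, here is a short NumPy sketch (not part of the original tutorial; the sigmoid activation, random weight initialization, and the 3-3-2-2 layer sizes are illustrative assumptions) that computes $z^{(l+1)} = W^{(l)} a^{(l)} + b^{(l)}$ and $a^{(l+1)} = f(z^{(l+1)})$ layer by layer for a network shaped like the two-output example above:

```python
import numpy as np

def sigmoid(z):
    # element-wise activation f(z) = 1 / (1 + e^{-z}),
    # applied to a whole vector at once
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, weights, biases):
    """Forward propagation: start with a^(1) = x, then repeatedly apply
    z^(l+1) = W^(l) a^(l) + b^(l) and a^(l+1) = f(z^(l+1))."""
    a = x
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)   # matrix-vector product plus bias
    return a                     # output-layer activations, h_{W,b}(x)

# An illustrative 3-3-2-2 architecture: 3 inputs, hidden layers of
# 3 and 2 units, and 2 output units (one per predicted quantity).
rng = np.random.default_rng(0)
sizes = [3, 3, 2, 2]
weights = [rng.standard_normal((n_out, n_in))
           for n_in, n_out in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n_out) for n_out in sizes[1:]]

x = np.array([0.5, -1.0, 2.0])
y_hat = forward(x, weights, biases)
print(y_hat.shape)  # (2,) -- one activation per output unit
```

Because each layer is a single matrix-vector product, the same code extends unchanged to any number of layers: only the `sizes` list changes.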