Neural Networks

Revision as of 06:23, 26 February 2011 by Ang

Revision as of 22:56, 26 February 2011 by Ang



From Ufldl


@@ Line 8: / Line 8: @@
 diagram to denote a single neuron:
-[[Image:SingleNeuron.png|400px|center]]
+[[Image:SingleNeuron.png|300px|center]]
 This "neuron" is a computational unit that takes as input <math>x_1, x_2, x_3</math> (and a +1 intercept term), and
@@ Line 49: / Line 49: @@
-== Neural Network formulation ==
+== Neural Network model ==
 A neural network is put together by hooking together many of our simple
@@ Line 97: / Line 96: @@
 including the bias term (e.g., <math>z_i^{(2)} = \sum_{j=1}^n W^{(1)}_{ij} x_j + b^{(1)}_i</math>), so that
 <math>a^{(l)}_i = f(z^{(l)}_i)</math>.
+Note that this easily lends itself to a more compact notation.  Specifically, if we extend the
+activation function <math>f(\cdot)</math>
+to apply to vectors in an element-wise fashion (i.e.,
+<math>f([z_1, z_2, z_3]) = [f(z_1), f(z_2), f(z_3)]</math>), then we can write
+the equations above more
+compactly as:
+:<math>\begin{align}
+z^{(2)} &= W^{(1)} x + b^{(1)} \\
+a^{(2)} &= f(z^{(2)}) \\
+z^{(3)} &= W^{(2)} a^{(2)} + b^{(2)} \\
+h_{W,b}(x) &= a^{(3)} = f(z^{(3)})
+\end{align}</math>
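The compact form above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sigmoid is assumed as the activation <math>f</math>, the parameter values are made up, and the helper names `sigmoid`, `f` and `affine` are hypothetical. Plain lists are used so the sketch runs without any libraries.

```python
import math

def sigmoid(z):
    """Scalar activation; the sigmoid is an assumed choice of f."""
    return 1.0 / (1.0 + math.exp(-z))

def f(z_vec):
    """Apply the activation element-wise: f([z1, z2, z3]) = [f(z1), f(z2), f(z3)]."""
    return [sigmoid(z) for z in z_vec]

def affine(W, a, b):
    """Compute W a + b for a matrix W (list of rows) and vectors a, b."""
    return [sum(w_ij * a_j for w_ij, a_j in zip(row, a)) + b_i
            for row, b_i in zip(W, b)]

# Made-up parameters for a network with 3 inputs, 3 hidden units and 1 output.
W1 = [[0.1, -0.2, 0.3],
      [0.0, 0.5, -0.5],
      [0.2, 0.2, 0.2]]
b1 = [0.0, 0.1, -0.1]
W2 = [[1.0, -1.0, 0.5]]
b2 = [0.0]

x = [1.0, 2.0, 3.0]

z2 = affine(W1, x, b1)   # z^(2) = W^(1) x + b^(1)
a2 = f(z2)               # a^(2) = f(z^(2))
z3 = affine(W2, a2, b2)  # z^(3) = W^(2) a^(2) + b^(2)
h = f(z3)                # h_{W,b}(x) = a^(3) = f(z^(3))
```

Each line of the computation corresponds to one line of the equations, which is the main appeal of the vector notation.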
+More generally, recalling that we also use <math>a^{(1)} = x</math> to denote the values from the input layer,
+then given layer <math>l</math>'s activations <math>a^{(l)}</math>, we can compute layer <math>l+1</math>'s activations <math>a^{(l+1)}</math> as:
+:<math>\begin{align}
+z^{(l+1)} &= W^{(l)} a^{(l)} + b^{(l)}   \\
+a^{(l+1)} &= f(z^{(l+1)})
+\end{align}</math>
+By organizing our parameters in matrices and using matrix-vector operations, we can take
+advantage of fast linear algebra routines to quickly perform calculations in our network.
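To see why the matrix form computes the same quantities as the per-unit sums, it may help to write the product out once. The sizes here (three inputs, three units in layer 2) are chosen only for illustration:

:<math>
z^{(2)} = W^{(1)} x + b^{(1)} =
\begin{bmatrix}
W^{(1)}_{11} & W^{(1)}_{12} & W^{(1)}_{13} \\
W^{(1)}_{21} & W^{(1)}_{22} & W^{(1)}_{23} \\
W^{(1)}_{31} & W^{(1)}_{32} & W^{(1)}_{33}
\end{bmatrix}
\begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix}
+
\begin{bmatrix} b^{(1)}_1 \\ b^{(1)}_2 \\ b^{(1)}_3 \end{bmatrix}
=
\begin{bmatrix}
W^{(1)}_{11} x_1 + W^{(1)}_{12} x_2 + W^{(1)}_{13} x_3 + b^{(1)}_1 \\
W^{(1)}_{21} x_1 + W^{(1)}_{22} x_2 + W^{(1)}_{23} x_3 + b^{(1)}_2 \\
W^{(1)}_{31} x_1 + W^{(1)}_{32} x_2 + W^{(1)}_{33} x_3 + b^{(1)}_3
\end{bmatrix}
</math>

Row <math>i</math> of the result is <math>\sum_{j=1}^3 W^{(1)}_{ij} x_j + b^{(1)}_i</math>, which is the per-unit expression for <math>z_i^{(2)}</math> with <math>n = 3</math>.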
+We have so far focused on one example neural network, but one can also build neural
+networks with other '''architectures'''
+(meaning patterns of connectivity between neurons), including ones with multiple hidden layers.
+The most common choice is an <math>n_l</math>-layered network
+where layer <math>1</math> is the input layer, layer <math>n_l</math> is the output layer, and each
+layer <math>l</math> is densely connected to layer <math>l+1</math>.  In this setting, to compute the
+output of the network, we can successively compute all the activations in layer
+<math>L_2</math>, then layer <math>L_3</math>, and so on, up to layer <math>L_{n_l}</math>, using the equations above for <math>z^{(l+1)}</math> and <math>a^{(l+1)}</math>.  This is one
+example of a '''feedforward''' neural network, since the connectivity graph
+does not have any directed loops or cycles.
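The layer-by-layer computation described above can be sketched as a single loop. This is a minimal illustration, not a reference implementation: the function name `forward`, the list-of-rows matrix layout and the default sigmoid activation are all assumptions made for the example.

```python
import math

def sigmoid(z):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, biases, x, f=sigmoid):
    """Return the activations [a^(1), ..., a^(n_l)] of a densely connected network.

    weights[k] and biases[k] hold W^(k+1) and b^(k+1), because Python
    lists are 0-indexed while the layers in the text start at 1.
    """
    activations = [list(x)]                       # a^(1) = x
    for W, b in zip(weights, biases):
        a = activations[-1]
        # z^(l+1) = W^(l) a^(l) + b^(l)
        z = [sum(w * a_j for w, a_j in zip(row, a)) + b_i
             for row, b_i in zip(W, b)]
        # a^(l+1) = f(z^(l+1)), applied element-wise
        activations.append([f(z_i) for z_i in z])
    return activations

# Usage: a made-up network with 2 inputs, 3 hidden units and 1 output,
# with all parameters set to zero so every unit outputs f(0) = 0.5.
weights = [[[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0, 0.0]]]
biases = [[0.0, 0.0, 0.0], [0.0]]
acts = forward(weights, biases, [1.0, 2.0])
output = acts[-1]                                 # a^(n_l), the network output
```

Because each layer depends only on the layer before it, one pass through the loop is enough; this is what the absence of cycles in the connectivity graph makes possible.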
+<!-- We will write <math>s_l</math> to denote the
+number of units in layer <math>l</math> of the network (not counting the bias unit). -->
+Neural networks can also have multiple output units.  For example, here is a network
+with two hidden layers, <math>L_2</math> and <math>L_3</math>, and two output units in layer <math>L_4</math>:
+[[Image:Network3322.png|500px|center]]
+To train this network, we would need training examples <math>(x^{(i)}, y^{(i)})</math>
+where <math>y^{(i)} \in \Re^2</math>.  This sort of network is useful if there are multiple
+outputs that you're interested in predicting.  (For example, in a medical
+diagnosis application, the vector <math>x</math> might give the input features of a
+patient, and the different outputs <math>y_i</math> might indicate presence or absence
+of different diseases.)
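As a rough sketch of what a two-output network looks like in code, the example below builds a network with layer sizes 3, 3, 2, 2. Those sizes are an assumption read off the image file name, and the parameters, the example <math>(x, y)</math> pair, the sigmoid activation and the 0.5 decision threshold are all made up for illustration.

```python
import math

def sigmoid(z):
    """Assumed activation function f."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(W, b, a):
    """One step a^(l+1) = f(W^(l) a^(l) + b^(l)) on plain Python lists."""
    return [sigmoid(sum(w * a_j for w, a_j in zip(row, a)) + b_i)
            for row, b_i in zip(W, b)]

# Made-up parameters; layer sizes 3, 3, 2, 2 are assumed.
W1 = [[0.2, -0.1, 0.4], [0.3, 0.3, -0.2], [-0.5, 0.1, 0.1]]
b1 = [0.0, 0.1, -0.1]
W2 = [[0.5, -0.5, 0.2], [0.1, 0.4, -0.3]]
b2 = [0.0, 0.0]
W3 = [[0.1, 0.1], [0.1, 0.1]]
b3 = [5.0, -5.0]

def predict(x):
    """Return h_{W,b}(x) = a^(4), a vector with one entry per output unit."""
    a2 = layer(W1, b1, x)
    a3 = layer(W2, b2, a2)
    return layer(W3, b3, a3)

# One training example: the label has one entry per output unit.
x_example = [0.5, -1.0, 2.0]
y_example = [1.0, 0.0]      # made-up label: first disease present, second absent

h = predict(x_example)
# Assumed decision rule: report a disease as present when its output exceeds 0.5.
present = [h_k > 0.5 for h_k in h]
```

The only structural change from the single-output case is that the last weight matrix has two rows, so the output and each label <math>y^{(i)}</math> are two-dimensional vectors.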