Deep Networks: Overview

Revision as of 19:39, 13 May 2011 (view source)

Ang (Talk | contribs)

Revision as of 20:21, 13 May 2011 (view source)

Ang (Talk | contribs)

(→Overview)

Line 2:
 In the previous sections, you constructed a 3-layer neural network comprising
-an input, hidden and output layer. While fairly effective for MNIST, ~~the~~
+an input, hidden and output layer. While fairly effective for MNIST, this
-3-layer ~~network~~ is a fairly '''shallow''' network; by this, we mean that the
+3-layer model is a fairly '''shallow''' network; by this, we mean that the
 features (hidden layer activations <math>a^{(2)}</math>) are computed using
 only "one layer" of computation (the hidden layer).
 In this section, we begin to discuss '''deep''' neural networks, meaning ones
-in which we have multiple hidden layers~~, so that we use multiple layers of~~
+in which we have multiple hidden layers; this will allow us to compute much
-~~computation~~ to compute ~~increasingly~~ complex features ~~from~~ the input. ~~Each~~
+more complex features of the input. Because each hidden layer computes a
-hidden layer computes a non-linear transformation of the previous layer~~. By~~
+non-linear transformation of the previous layer, a deep network can have
-~~using more hidden layers~~, deep ~~networks~~ can have significantly greater
+significantly greater representational power (i.e., can learn
-~~expressive~~ power (i.e., can learn significantly more complex functions)
+significantly more complex functions) than a shallow one.
-than ~~simple ones~~.
+
-~~When~~ training a deep network, it is important ~~that we~~ use a ''non-linear''
+Note that when training a deep network, it is important to use a ''non-linear''
 activation function <math>f(\cdot)</math> in each hidden layer. This is
 because multiple layers of linear functions would itself compute only a linear

From Ufldl

@@ Line 2: / Line 2: @@
 In the previous sections, you constructed a 3-layer neural network comprising
-an input, hidden and output layer.  While fairly effective for MNIST, the
+an input, hidden and output layer.  While fairly effective for MNIST, this
-3-layer network is a fairly '''shallow''' network; by this, we mean that the
+3-layer model is a fairly '''shallow''' network; by this, we mean that the
 features (hidden layer activations <math>a^{(2)}</math>) are computed using
 only "one layer" of computation (the hidden layer).
 In this section, we begin to discuss '''deep''' neural networks, meaning ones
-in which we have multiple hidden layers, so that we use multiple layers of
+in which we have multiple hidden layers; this will allow us to compute much
-computation to compute increasingly complex features from the input.  Each
+more complex features of the input.  Because each hidden layer computes a
-hidden layer computes a non-linear transformation of the previous layer.  By
+non-linear transformation of the previous layer, a deep network can have
-using more hidden layers, deep networks can have significantly greater
+significantly greater representational power (i.e., can learn
-expressive power (i.e., can learn significantly more complex functions)
+significantly more complex functions) than a shallow one.
-than simple ones.
-When training a deep network, it is important that we use a ''non-linear''
+Note that when training a deep network, it is important to use a ''non-linear''
 activation function <math>f(\cdot)</math> in each hidden layer.   This is
 because multiple layers of linear functions would itself compute only a linear
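
The closing remark of the hunk, that stacked layers of linear functions add no representational power, can be checked numerically. The following is a minimal sketch and not code from the tutorial: the weights <code>W1</code>, <code>b1</code>, <code>W2</code>, <code>b2</code>, the input <code>x</code>, and the helper names are all hypothetical, and the sigmoid is only an assumed example of a non-linear <math>f(\cdot)</math>. It compares a two-layer network against the single layer obtained by multiplying the weight matrices together.

```python
import math

def matvec(W, x):
    """Multiply matrix W (a list of rows) by vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

def matmul(A, B):
    """Multiply matrix A (m x k) by matrix B (k x n)."""
    cols = list(zip(*B))
    return [[sum(a * b for a, b in zip(row, col)) for col in cols] for row in A]

def layer(W, b, x, f):
    """One layer: apply the activation f elementwise to W x + b."""
    return [f(z + bi) for z, bi in zip(matvec(W, x), b)]

def identity(z):
    """Linear activation, f(z) = z."""
    return z

def sigmoid(z):
    """Assumed example of a non-linear activation."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical toy parameters: 2 inputs -> 2 hidden units -> 1 output.
W1, b1 = [[1, 2], [0, -1]], [0.5, 1.0]
W2, b2 = [[2, -1]], [0.25]
x = [1.0, 2.0]

# Two stacked layers with a linear activation in the hidden layer.
stacked_linear = layer(W2, b2, layer(W1, b1, x, identity), identity)

# The equivalent single layer: W = W2 W1 and b = W2 b1 + b2.
W = matmul(W2, W1)
b = [z + bi for z, bi in zip(matvec(W2, b1), b2)]
collapsed = layer(W, b, x, identity)

# Largest difference between the stacked and the collapsed output.
linear_gap = max(abs(u - v) for u, v in zip(stacked_linear, collapsed))

# Same weights, but with a sigmoid in the hidden layer: the collapsed
# single layer no longer reproduces the network's output.
stacked_sigmoid = layer(W2, b2, layer(W1, b1, x, sigmoid), identity)
sigmoid_gap = max(abs(u - v) for u, v in zip(stacked_sigmoid, collapsed))

print("gap with linear hidden layer: ", linear_gap)
print("gap with sigmoid hidden layer:", sigmoid_gap)
```

With the linear activation the gap is zero up to rounding, since <math>W^{(2)}(W^{(1)}x + b^{(1)}) + b^{(2)} = (W^{(2)}W^{(1)})x + (W^{(2)}b^{(1)} + b^{(2)})</math>. With the sigmoid the two outputs differ, which is why a non-linear hidden layer is needed for depth to add anything.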