Deep Networks: Overview

In the previous sections, you constructed a 3-layer neural network comprising an input, a hidden, and an output layer. While fairly effective for MNIST, this 3-layer model is a '''shallow''' network; by this, we mean that the features (hidden layer activations <math>a^{(2)}</math>) are computed using only "one layer" of computation (the hidden layer).

In this section, we begin to discuss '''deep''' neural networks, meaning ones in which we have multiple hidden layers; this will allow us to compute much more complex features of the input. Because each hidden layer computes a non-linear transformation of the previous layer, a deep network can have significantly greater representational power (i.e., it can learn significantly more complex functions) than a shallow one.
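
As an illustrative sketch of this idea in Python/NumPy (the function and variable names below, such as <code>forward</code> and <code>sizes</code>, are chosen purely for illustration), the forward pass of such a network simply iterates the familiar update <math>a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)})</math> over several hidden layers:

<pre>
import numpy as np

def sigmoid(z):
    # Logistic activation f(z) = 1 / (1 + exp(-z)), as in the earlier sections.
    return 1.0 / (1.0 + np.exp(-z))

def forward(a, weights, biases):
    # Repeatedly apply a^{(l+1)} = f(W^{(l)} a^{(l)} + b^{(l)}); each pass
    # through the loop is one more "layer of computation" on the features.
    for W, b in zip(weights, biases):
        a = sigmoid(W @ a + b)
    return a

# Toy 4-layer network: input, two hidden layers, output (widths are arbitrary).
rng = np.random.default_rng(0)
sizes = [8, 6, 4, 2]
weights = [0.01 * rng.standard_normal((m, n))
           for n, m in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(m) for m in sizes[1:]]

x = rng.standard_normal(sizes[0])
print(forward(x, weights, biases))   # final activations, shape (2,)
</pre>

Each additional <math>(W, b)</math> pair in <code>weights</code>/<code>biases</code> adds one hidden layer, i.e., one more stage of non-linear feature computation.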

Note that when training a deep network, it is important to use a ''non-linear'' activation function <math>f(\cdot)</math> in each hidden layer. This is because multiple layers of linear functions would themselves compute only a linear function of the input.
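
To see this concretely, suppose two consecutive layers used the identity activation (using the <math>W^{(l)}, b^{(l)}</math> notation from the earlier sections; the two-layer composition here is just for illustration). The output would then be

:<math>W^{(2)}\left(W^{(1)} x + b^{(1)}\right) + b^{(2)} = \left(W^{(2)} W^{(1)}\right) x + \left(W^{(2)} b^{(1)} + b^{(2)}\right),</math>

which is again a single linear (affine) function of <math>x</math>. Stacking further linear layers collapses in the same way, so it is the non-linear <math>f(\cdot)</math> that gives depth its extra representational power.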
