Autoencoders and Sparsity

So far, we have described the application of neural networks to supervised learning, in which we have labeled
training examples.  Now suppose we have only a set of unlabeled training examples <math>\textstyle \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots\}</math>,
where <math>\textstyle x^{(i)} \in \Re^{n}</math>.  An
'''autoencoder''' neural network is an unsupervised learning algorithm that applies backpropagation,
setting the target values to be equal to the inputs, i.e., it uses <math>\textstyle y^{(i)} = x^{(i)}</math>.
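
In other words, the autoencoder tries to learn a function <math>\textstyle h_{W,b}(x) \approx x</math>, i.e., an approximation to the identity function.  As a brief sketch of the training objective (written here with a squared-error reconstruction cost; any weight-decay term is omitted), the network is trained to minimize the reconstruction error

:<math>
\frac{1}{m} \sum_{i=1}^{m} \frac{1}{2} \left\| h_{W,b}(x^{(i)}) - x^{(i)} \right\|^2
</math>

over the parameters <math>\textstyle W, b</math>.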

As a concrete example, suppose the inputs <math>\textstyle x</math> are the pixel intensity values of a <math>\textstyle 10 \times 10</math> image (100
pixels) so <math>\textstyle n=100</math>, and there are <math>\textstyle s_2=50</math> hidden units in layer <math>\textstyle L_2</math>.  Note that
we also have <math>\textstyle y \in \Re^{100}</math>.  Since there are only 50 hidden units, the
network is forced to learn a ''compressed'' representation of the input.
I.e., given only the vector of hidden unit activations <math>\textstyle a^{(2)} \in \Re^{50}</math>,
it must try to '''reconstruct''' the 100-pixel input <math>\textstyle x</math>.  If the input were completely
random---say, with each input feature independent of the other
features---then this compression task would be very difficult.  But if there is
structure in the data, for example, if some of the input features are correlated,
then this algorithm will be able to discover some of those correlations.  In fact,
this simple autoencoder often ends up learning a low-dimensional representation very similar
to the one learned by PCA.
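
To make the dimensions concrete, here is a minimal NumPy sketch (illustrative only, not code from these notes; the variable names and random initialization are placeholders) of a single forward pass through such a 100-50-100 autoencoder, with the reconstruction error measured against the input itself:

<pre>
# A minimal sketch: one forward pass through a 100-50-100 autoencoder,
# with the input itself used as the target.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n, s2 = 100, 50                            # input size and hidden-layer size

W1 = rng.normal(scale=0.01, size=(s2, n))  # encoder weights
b1 = np.zeros(s2)
W2 = rng.normal(scale=0.01, size=(n, s2))  # decoder weights
b2 = np.zeros(n)

x = rng.random(n)                          # one flattened 10x10 image (placeholder data)
a2 = sigmoid(W1 @ x + b1)                  # hidden activations a^(2), a vector in R^50
x_hat = sigmoid(W2 @ a2 + b2)              # reconstruction h_{W,b}(x), a vector in R^100

# Since the target equals the input, the per-example cost is the squared
# reconstruction error that backpropagation would minimize.
cost = 0.5 * np.sum((x_hat - x) ** 2)
</pre>

Training would repeat this forward pass over many examples and use backpropagation to adjust <code>W1</code>, <code>b1</code>, <code>W2</code> and <code>b2</code>.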

Our argument above relied on the number of hidden units <math>\textstyle s_2</math> being small.  But
we can still discover interesting structure even when the number of hidden units is large,
by imposing other constraints on the network.  In particular, by imposing a ''sparsity''
constraint on the hidden units, the autoencoder will still discover interesting structure
in the data even if <math>\textstyle s_2</math> is large.

Informally, we will think of a neuron as being "active" (or as "firing") if
its output value is close to 1, or as being "inactive" if its output value is
close to 0.  We would like to constrain the neurons to be inactive most of the
time.  (This discussion assumes a sigmoid activation function.  If you are
using a tanh activation function, then we think of a neuron as being inactive
when it outputs values close to -1.)

Recall that <math>\textstyle a^{(2)}_j</math> denotes the activation of hidden unit <math>\textstyle j</math> in the
autoencoder.
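
To make the sparsity constraint precise, here is the standard sparse-autoencoder formulation restated in this notation (a sketch following common usage; the symbols <math>\textstyle \rho</math>, <math>\textstyle \hat{\rho}_j</math> and <math>\textstyle \beta</math> are introduced for this restatement).  Writing <math>\textstyle a^{(2)}_j(x)</math> for the activation of hidden unit <math>\textstyle j</math> when the network is given input <math>\textstyle x</math>, the average activation of hidden unit <math>\textstyle j</math> over the <math>\textstyle m</math> training examples is

:<math>
\hat{\rho}_j = \frac{1}{m} \sum_{i=1}^{m} a^{(2)}_j(x^{(i)}),
</math>

and the sparsity constraint asks that <math>\textstyle \hat{\rho}_j \approx \rho</math> for a small sparsity parameter <math>\textstyle \rho</math> (for example, <math>\textstyle \rho = 0.05</math>).  This is enforced by adding to the cost function <math>\textstyle J(W,b)</math> a penalty based on the KL divergence between <math>\textstyle \rho</math> and <math>\textstyle \hat{\rho}_j</math>:

:<math>
J_{\rm sparse}(W,b) = J(W,b) + \beta \sum_{j=1}^{s_2} {\rm KL}(\rho \| \hat{\rho}_j),
\qquad
{\rm KL}(\rho \| \hat{\rho}_j) = \rho \log \frac{\rho}{\hat{\rho}_j} + (1-\rho) \log \frac{1-\rho}{1-\hat{\rho}_j},
</math>

where <math>\textstyle \beta</math> controls the weight of the sparsity penalty.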

Backpropagation, modified to take this sparsity penalty into account, then performs gradient descent on
<math>\textstyle J_{\rm sparse}(W,b)</math>.  Using the derivative checking method, you will be able to verify
this for yourself as well.
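
As an illustration of that check, here is a minimal sketch of centered-difference derivative checking.  The function <code>cost_and_grad</code>, assumed to return the cost and its analytic gradient for a flattened parameter vector, is hypothetical:

<pre>
# A minimal sketch of derivative checking: compare the analytic gradient
# against a centered finite-difference estimate of the same gradient.
import numpy as np

def check_gradient(cost_and_grad, theta, eps=1e-4):
    """cost_and_grad(theta) -> (cost, gradient); theta is a 1-D parameter vector."""
    _, grad = cost_and_grad(theta)
    num_grad = np.zeros_like(theta)
    for k in range(theta.size):
        e = np.zeros_like(theta)
        e[k] = eps
        num_grad[k] = (cost_and_grad(theta + e)[0]
                       - cost_and_grad(theta - e)[0]) / (2 * eps)
    # The relative difference should be very small if the analytic
    # gradient of the sparsity-penalized cost is implemented correctly.
    return np.linalg.norm(num_grad - grad) / np.linalg.norm(num_grad + grad)
</pre>
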
{{Sparse_Autoencoder}}
{{Languages|自编码算法与稀疏性|中文}}
