Self-Taught Learning to Deep Networks

== Overview ==
In this section, we will improve upon the features learned from self-taught learning by ''finetuning'' them for our classification objective.

Recall that in self-taught learning, we first train a sparse autoencoder on our unlabeled data.  Then, given a new example <math>\textstyle x</math>, we can use the hidden layer to extract features <math>\textstyle a</math>.  This is shown as follows:

[[File:STL_SparseAE_Features.png|200px]]
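For concreteness, a minimal sketch of this feature extraction step in Python/NumPy might look as follows, where <tt>W1</tt> and <tt>b1</tt> stand for the first-layer (encoder) weights and biases learned by the sparse autoencoder; the random values below are only placeholders for trained parameters:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(x, W1, b1):
    # Map an input x to the hidden activations a of the sparse autoencoder.
    # The autoencoder's reconstruction (output) layer is discarded; only the
    # encoder parameters W1, b1 are kept and used as a feature extractor.
    return sigmoid(W1.dot(x) + b1)

# Placeholder parameters: 64-dimensional input, 25 hidden units.
W1 = 0.01 * np.random.randn(25, 64)
b1 = np.zeros(25)
x = np.random.randn(64)
a = extract_features(x, W1, b1)   # a is the new 25-dimensional feature vector
</pre>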
Now, we are interested in solving a classification task, where our goal is to predict labels <math>\textstyle y</math>.

Since the overall classifier (the sparse autoencoder's hidden layer followed by the classifier trained on its features) is simply a large neural network, we can now further perform gradient descent from the current value of the weights to try to further drive down training error.
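As a rough sketch, one gradient-descent step of such fine-tuning might look as follows in Python/NumPy, assuming a single sigmoid hidden layer initialized from the autoencoder (<tt>W1</tt>, <tt>b1</tt>), a sigmoid output layer taken from the supervised classifier (<tt>W2</tt>, <tt>b2</tt>), and a squared-error training objective:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def finetune_step(X, Y, W1, b1, W2, b2, alpha=0.1):
    # X: (n, m) inputs, Y: (k, m) targets; each column is one training example.
    # Unlike training the classifier on fixed features, fine-tuning updates
    # ALL parameters, starting from their current (pre-trained) values.
    m = X.shape[1]

    # Forward pass through the whole stacked network.
    A = sigmoid(W1.dot(X) + b1[:, None])    # hidden features a
    H = sigmoid(W2.dot(A) + b2[:, None])    # network output h_W(x)

    # Backpropagation for the objective (1/2m) * sum_i ||h_W(x^(i)) - y^(i)||^2.
    delta2 = (H - Y) * H * (1 - H)
    delta1 = W2.T.dot(delta2) * A * (1 - A)

    # Gradient-descent update on every layer's parameters.
    W2 -= alpha * delta2.dot(A.T) / m
    b2 -= alpha * delta2.mean(axis=1)
    W1 -= alpha * delta1.dot(X.T) / m
    b1 -= alpha * delta1.mean(axis=1)
    return W1, b1, W2, b2
</pre>

Repeating such steps (or handing the same gradients to a method like conjugate gradient or L-BFGS) drives the training error down from the value attained by the pre-trained initialization.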
=== Discussion ===

Given that the whole algorithm is just a big neural network, why don't we just carry out the fine-tuning step, without doing any pre-training/unsupervised feature learning?  There are several reasons:

<ul>
<li> First and most important, labeled data is often scarce, while unlabeled data is cheap and plentiful.  The promise of self-taught learning is that by exploiting the massive amount of unlabeled data, we can learn much better models.  The fine-tuning step can use only labeled data.  In contrast, by using unlabeled data to learn a good initial value for the first layer of weights <math>\textstyle W^{(1)}</math>, we usually get much better classifiers after fine-tuning.

<li> Second, training a neural network using supervised learning involves solving a highly non-convex optimization problem (say, minimizing the training error <math>\textstyle \sum_i ||h_W(x^{(i)}) - y^{(i)}||^2</math> as a function of the network parameters <math>\textstyle W</math>).  The optimization problem can therefore be rife with local optima, and training with gradient descent (or methods like conjugate gradient and L-BFGS) does not work well.  In contrast, by first initializing the parameters using an unsupervised feature learning/pre-training step, we can end up at much better solutions.  (Actually, pre-training has benefits beyond just helping to escape local optima.  In particular, it has also been shown to have a useful "regularization" effect (Erhan et al., 2010).  But a full discussion is beyond the scope of these notes.)
</ul>
