Self-Taught Learning

== Overview ==

Assuming that we have a sufficiently powerful learning algorithm, one of the most reliable
ways to get better performance is to give the algorithm more data.  This has led to the
aphorism that in machine learning, "sometimes it's not who has the best algorithm that
wins; it's who has the most data."

supervised learning on that labeled data to solve the classification task.

These ideas probably have the most powerful effects in problems where we have a lot of
unlabeled data, and a smaller amount of labeled data.  However, they typically give good
results even if we have only labeled data (in which case we usually perform the feature
learning step using the labeled data, but ignoring the labels).

== Learning features ==

Given the unlabeled data, we can train a sparse autoencoder on it
(perhaps with appropriate whitening or other pre-processing):

[[File:STL_SparseAE.png|350px]]

Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model,
given any new input <math>\textstyle x</math>, we can now compute the corresponding vector of
activations <math>\textstyle a</math> of the hidden units.  As we saw previously, this often gives a
better representation of the data than the raw input <math>\textstyle x</math>.  We can also visualize
the computation of <math>\textstyle a</math> as the following neural network:

[[File:STL_SparseAE_Features.png|300px]]

This is just the sparse autoencoder that we previously had, with the final layer removed.
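
As a concrete illustration, here is a minimal numpy sketch (ours, not part of the original
tutorial) of computing the hidden activations <math>\textstyle a</math> from a trained autoencoder,
assuming the sigmoid activation used for the sparse autoencoder, with <code>W1</code> and
<code>b1</code> standing for <math>\textstyle W^{(1)}</math> and <math>\textstyle b^{(1)}</math>:

<pre>
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def hidden_activations(W1, b1, X):
    """Hidden-unit activations a = f(W1 x + b1), one column of X per example.

    W1 : (hidden_size, input_size) weights of the trained autoencoder
    b1 : (hidden_size,)            biases of the hidden layer
    X  : (input_size, n_examples)  inputs, pre-processed the same way as the
                                   data the autoencoder was trained on
    """
    return sigmoid(W1 @ X + b1[:, np.newaxis])
</pre>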

Feeding each labeled example <math>\textstyle x_l^{(i)}</math> through the autoencoder gives the corresponding
activations <math>\textstyle a_l^{(i)}</math>, so the training set becomes <math>\textstyle \{ (a_l^{(1)}, y^{(1)}),
(a_l^{(2)}, y^{(2)}), \ldots, (a_l^{(m_l)}, y^{(m_l)}) \}</math> (if we use the replacement representation,
and use <math>\textstyle a_l^{(i)}</math> to represent the <math>\textstyle i</math>-th training example), or <math>\textstyle \{
((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots,
((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (if we use the concatenated
representation).  In practice, the concatenated representation often works better.
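
As an illustration of the two representations (a hypothetical numpy sketch reusing the
<code>hidden_activations</code> helper above; the variable names are ours), the features for
the labeled training set could be assembled as:

<pre>
import numpy as np

# X_labeled: (input_size, m_l) matrix of labeled inputs x_l; y: (m_l,) labels
A_labeled = hidden_activations(W1, b1, X_labeled)          # activations a_l

features_replacement  = A_labeled                          # use a_l alone
features_concatenated = np.vstack([X_labeled, A_labeled])  # use (x_l, a_l)

# Either feature matrix, together with y, is then used to train an ordinary
# supervised classifier (e.g. softmax or logistic regression).
</pre>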

Given a test example <math>\textstyle x_{\rm test}</math>, we would then follow the same procedure:
First, feed it to the autoencoder to get <math>\textstyle a_{\rm test}</math>.  Then, feed
either <math>\textstyle a_{\rm test}</math> or <math>\textstyle (x_{\rm test}, a_{\rm test})</math> to the trained classifier to get a prediction.
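
The test-time step, again as a hypothetical sketch (with <code>classifier</code> standing for
whatever supervised model was trained above, assumed here to expose a scikit-learn-style
<code>predict</code> method), would be:

<pre>
import numpy as np

# x_test: (input_size,) a single test input, pre-processed identically to training
a_test = hidden_activations(W1, b1, x_test[:, np.newaxis])[:, 0]

# Use whichever representation the classifier was trained on:
features_test = a_test                                     # replacement: a_test alone
# features_test = np.concatenate([x_test, a_test])         # concatenated: (x_test, a_test)

prediction = classifier.predict(features_test.reshape(1, -1))
</pre>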

== On pre-processing the data ==

During the feature learning stage, where the sparse autoencoder is trained on the unlabeled
data, you may have computed various pre-processing parameters.  For example, one may have computed
a mean value of the data and subtracted off this mean to perform mean normalization,
or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or used
PCA whitening or ZCA whitening).  If this is the case, then it is important to
save away these preprocessing parameters, and to use the ''same'' parameters when later
computing features on the labeled and test data, rather than re-estimating them on the
labeled training set, since that might result in a dramatically different
pre-processing transformation, which would make the input distribution to
the autoencoder very different from what it was actually trained on.
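
As a concrete (hypothetical) numpy sketch of this workflow, the mean and a ZCA whitening
matrix are estimated once on the unlabeled data, saved, and then applied unchanged to the
labeled and test inputs; the function and variable names below are ours:

<pre>
import numpy as np

def fit_preprocessing(X_unlabeled, eps=1e-5):
    """Estimate mean-normalization and ZCA whitening parameters on unlabeled data.

    X_unlabeled: (input_size, n_unlabeled) matrix, one example per column.
    Returns (mu, W_zca): the saved mean vector and whitening matrix.
    """
    mu = X_unlabeled.mean(axis=1, keepdims=True)
    Xc = X_unlabeled - mu
    sigma = Xc @ Xc.T / Xc.shape[1]
    U, S, _ = np.linalg.svd(sigma)
    W_zca = U @ np.diag(1.0 / np.sqrt(S + eps)) @ U.T
    return mu, W_zca

def apply_preprocessing(X, mu, W_zca):
    """Apply the *same* saved parameters to labeled or test data."""
    return W_zca @ (X - mu)

# Fit once on the unlabeled data, then reuse everywhere:
# mu, W_zca   = fit_preprocessing(X_unlabeled)
# X_labeled_pp = apply_preprocessing(X_labeled, mu, W_zca)
# x_test_pp    = apply_preprocessing(x_test.reshape(-1, 1), mu, W_zca)
</pre>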

== On the terminology of unsupervised feature learning ==

There are two common unsupervised feature learning settings, depending on what type of
unlabeled data you have.  The more general and powerful setting is the '''self-taught learning'''
setting, which does not assume that your unlabeled data <math>x_u</math> has to
be drawn from the same distribution as your labeled data <math>x_l</math>.  The
more restrictive setting assumes that the unlabeled data comes from exactly the same
distribution as the labeled data.  For example, if the task is to distinguish cars from
motorcycles, and we had additional unlabeled images each of which is either a car or a
motorcycle (we just do not know which ones are cars and which
ones are motorcycles), then we could use this form of unlabeled data to
learn the features.  This setting, where each unlabeled example is drawn from the same
distribution as your labeled examples, is sometimes called the semi-supervised
setting.  In practice, we often do not have this sort of unlabeled data (where would you
get a database of images where every image is either a car or a motorcycle, but
just missing its label?), and so in the context of learning features from unlabeled
data, the self-taught learning setting is more broadly applicable.

{{STL}}

{{Languages|自我学习|中文}}