Self-Taught Learning
From Ufldl
== Overview ==

Assuming that we have a sufficiently powerful learning algorithm, one of the most reliable ways to get better performance is to give the algorithm more data. This has led to the aphorism that in machine learning, "sometimes it's not who has the best algorithm that wins; it's who has the most data."
supervised learning on that labeled data to solve the classification task.

These ideas probably have the most powerful effects in problems where we have a lot of unlabeled data, and a smaller amount of labeled data. However, they typically give good results even if we have only labeled data (in which case we usually perform the feature learning step using the labeled data, but ignoring the labels).
== Learning features ==
(perhaps with appropriate whitening or other pre-processing):

[[File:STL_SparseAE.png|350px]]

Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model, given any new input <math>\textstyle x</math>, we can now compute the corresponding vector of activations <math>\textstyle a</math> of the hidden units. As we saw previously, this often gives a

neural network:

[[File:STL_SparseAE_Features.png|300px]]

This is just the sparse autoencoder that we previously had, with the final
\}</math> (if we use the replacement representation, and use <math>\textstyle a_l^{(i)}</math> to represent the <math>\textstyle i</math>-th training example), or <math>\textstyle \{ ((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots, ((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (if we use the concatenated representation). In practice, the concatenated representation often works
Given a test example <math>\textstyle x_{\rm test}</math>, we would then follow the same procedure: First, feed it to the autoencoder to get <math>\textstyle a_{\rm test}</math>. Then, feed either <math>\textstyle a_{\rm test}</math> or <math>\textstyle (x_{\rm test}, a_{\rm test})</math> to the trained classifier to get a prediction.
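The feature-extraction step above can be sketched in a few lines of Python/NumPy. This is only an illustration under assumed names: <code>W1</code> and <code>b1</code> stand for the learned <math>\textstyle W^{(1)}, b^{(1)}</math>, and the dimensions are toy values, with random values standing in for a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(W1, b1, X):
    """Feedforward through the autoencoder's hidden layer.

    X is (n_inputs, m) with one example per column; W1, b1 are the
    first-layer parameters learned on the unlabeled data.  Returns
    the hidden activations a, one column per example.
    """
    return sigmoid(W1 @ X + b1[:, np.newaxis])

# Hypothetical toy sizes: 8 input units, 3 hidden units, 5 labeled examples.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 8))   # stand-in for a trained W^{(1)}
b1 = rng.standard_normal(3)        # stand-in for a trained b^{(1)}
X_labeled = rng.standard_normal((8, 5))

A = extract_features(W1, b1, X_labeled)   # "replacement" representation a
X_concat = np.vstack([X_labeled, A])      # "concatenated" representation (x, a)

print(A.shape)         # (3, 5)
print(X_concat.shape)  # (11, 5)
```

Either <code>A</code> or <code>X_concat</code> (column-wise) would then be paired with the labels <math>\textstyle y^{(i)}</math> and passed to the supervised classifier; a test example is pushed through the same <code>extract_features</code> before prediction.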
== On pre-processing the data ==
various pre-processing parameters. For example, one may have computed a mean value of the data and subtracted off this mean to perform mean normalization, or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or used PCA whitening or ZCA whitening). If this is the case, then it is important to save away these preprocessing parameters, and to use the ''same'' parameters
labeled training set, since that might result in a dramatically different pre-processing transformation, which would make the input distribution to the autoencoder very different from what it was actually trained on.
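A minimal sketch of this discipline, using mean normalization and a PCA basis as the pre-processing steps (file name and array sizes are hypothetical): the parameters are estimated once on the unlabeled set, saved, and then reloaded and applied unchanged at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((10, 1000))  # n_features x m_unlabeled
X_test = rng.standard_normal((10, 20))         # n_features x m_test

# Estimate pre-processing parameters ONCE, on the unlabeled training data...
mean = X_unlabeled.mean(axis=1, keepdims=True)
X_centered = X_unlabeled - mean
Sigma = X_centered @ X_centered.T / X_unlabeled.shape[1]  # covariance
U, S, _ = np.linalg.svd(Sigma)                            # PCA basis U

# ...and save them, so the *same* transform can be applied later.
np.savez("preprocess_params.npz", mean=mean, U=U)

# At test time, reload the stored parameters and apply them; do NOT
# re-estimate the mean or U from the test (or labeled training) set.
params = np.load("preprocess_params.npz")
X_test_pca = params["U"].T @ (X_test - params["mean"])
print(X_test_pca.shape)  # (10, 20)
```

The key point is that <code>mean</code> and <code>U</code> are treated as frozen parameters of the pipeline, exactly like the autoencoder's weights.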
== On the terminology of unsupervised feature learning ==
There are two common unsupervised feature learning settings, depending on what type of unlabeled data you have. The more general and powerful setting is the '''self-taught learning''' setting, which does not assume that your unlabeled data <math>x_u</math> has to be drawn from the same distribution as your labeled data <math>x_l</math>. The
ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting---where each unlabeled example is drawn from the same distribution as your labeled examples---is sometimes called the '''semi-supervised''' setting. In practice, we often do not have this sort of unlabeled data (where would you get a database of images where every image is either a car or a motorcycle, but just missing its label?), and so in the context of learning features from unlabeled data, the self-taught learning setting is more broadly applicable.