Self-Taught Learning
From Ufldl
== Overview ==

Assuming that we have a sufficiently powerful learning algorithm, one of the most reliable ways to get better performance is to give the algorithm more data. This has led to the aphorism that in machine learning, "sometimes it's not who has the best algorithm that wins; it's who has the most data."
supervised learning on that labeled data to solve the classification task.

These ideas probably have the most powerful effects in problems where we have a lot of unlabeled data, and a smaller amount of labeled data. However, they typically give good results even if we have only labeled data (in which case we usually perform the feature learning step using the labeled data, but ignoring the labels).
== Learning features ==
(perhaps with appropriate whitening or other pre-processing):

[[File:STL_SparseAE.png|350px]]

Having trained the parameters <math>\textstyle W^{(1)}, b^{(1)}, W^{(2)}, b^{(2)}</math> of this model, given any new input <math>\textstyle x</math>, we can now compute the corresponding vector of activations <math>\textstyle a</math> of the hidden units. As we saw previously, this often gives a

neural network:

[[File:STL_SparseAE_Features.png|300px]]

This is just the sparse autoencoder that we previously had, with the final
\}</math> (if we use the replacement representation, and use <math>\textstyle a_l^{(i)}</math> to represent the <math>\textstyle i</math>-th training example), or <math>\textstyle \{ ((x_l^{(1)}, a_l^{(1)}), y^{(1)}), ((x_l^{(2)}, a_l^{(2)}), y^{(2)}), \ldots, ((x_l^{(m_l)}, a_l^{(m_l)}), y^{(m_l)}) \}</math> (if we use the concatenated representation). In practice, the concatenated representation often works
Given a test example <math>\textstyle x_{\rm test}</math>, we would then follow the same procedure: First, feed it to the autoencoder to get <math>\textstyle a_{\rm test}</math>. Then, feed either <math>\textstyle a_{\rm test}</math> or <math>\textstyle (x_{\rm test}, a_{\rm test})</math> to the trained classifier to get a prediction.
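The feature-extraction step above can be sketched in a few lines of Python/NumPy. This is only an illustration under assumed names: <code>W1</code> and <code>b1</code> stand for the learned <math>\textstyle W^{(1)}, b^{(1)}</math>, and the dimensions are toy values, with random values standing in for a trained network.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_features(W1, b1, X):
    """Feedforward through the autoencoder's hidden layer.

    X is (n_inputs, m) with one example per column; W1, b1 are the
    first-layer parameters learned on the unlabeled data.  Returns
    the hidden activations a, one column per example.
    """
    return sigmoid(W1 @ X + b1[:, np.newaxis])

# Hypothetical toy sizes: 8 input units, 3 hidden units, 5 labeled examples.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 8))   # stand-in for a trained W^{(1)}
b1 = rng.standard_normal(3)        # stand-in for a trained b^{(1)}
X_labeled = rng.standard_normal((8, 5))

A = extract_features(W1, b1, X_labeled)   # "replacement" representation a
X_concat = np.vstack([X_labeled, A])      # "concatenated" representation (x, a)

print(A.shape)         # (3, 5)
print(X_concat.shape)  # (11, 5)
```

Either <code>A</code> or <code>X_concat</code> (column-wise) would then be paired with the labels <math>\textstyle y^{(i)}</math> and passed to the supervised classifier; a test example is pushed through the same <code>extract_features</code> before prediction.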
== On pre-processing the data ==
various pre-processing parameters. For example, one may have computed a mean value of the data and subtracted off this mean to perform mean normalization, or used PCA to compute a matrix <math>\textstyle U</math> to represent the data as <math>\textstyle U^Tx</math> (or used PCA whitening or ZCA whitening). If this is the case, then it is important to save away these preprocessing parameters, and to use the ''same'' parameters
labeled training set, since that might result in a dramatically different pre-processing transformation, which would make the input distribution to the autoencoder very different from what it was actually trained on.
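A minimal sketch of this discipline, using mean normalization and a PCA basis as the pre-processing steps (file name and array sizes are hypothetical): the parameters are estimated once on the unlabeled set, saved, and then reloaded and applied unchanged at test time.

```python
import numpy as np

rng = np.random.default_rng(0)
X_unlabeled = rng.standard_normal((10, 1000))  # n_features x m_unlabeled
X_test = rng.standard_normal((10, 20))         # n_features x m_test

# Estimate pre-processing parameters ONCE, on the unlabeled training data...
mean = X_unlabeled.mean(axis=1, keepdims=True)
X_centered = X_unlabeled - mean
Sigma = X_centered @ X_centered.T / X_unlabeled.shape[1]  # covariance
U, S, _ = np.linalg.svd(Sigma)                            # PCA basis U

# ...and save them, so the *same* transform can be applied later.
np.savez("preprocess_params.npz", mean=mean, U=U)

# At test time, reload the stored parameters and apply them; do NOT
# re-estimate the mean or U from the test (or labeled training) set.
params = np.load("preprocess_params.npz")
X_test_pca = params["U"].T @ (X_test - params["mean"])
print(X_test_pca.shape)  # (10, 20)
```

The key point is that <code>mean</code> and <code>U</code> are treated as frozen parameters of the pipeline, exactly like the autoencoder's weights.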
== On the terminology of unsupervised feature learning ==
There are two common unsupervised feature learning settings, depending on what type of unlabeled data you have. The more general and powerful setting is the '''self-taught learning''' setting, which does not assume that your unlabeled data <math>x_u</math> has to be drawn from the same distribution as your labeled data <math>x_l</math>. The
ones are motorcycles), then we could use this form of unlabeled data to learn the features. This setting---where each unlabeled example is drawn from the same distribution as your labeled examples---is sometimes called the '''semi-supervised''' setting. In practice, we often do not have this sort of unlabeled data (where would you get a database of images where every image is either a car or a motorcycle, but just missing its label?), and so in the context of learning features from unlabeled data, the self-taught learning setting is more broadly applicable.