Whitening

We have used PCA to reduce the dimension of the data. There is a closely related
preprocessing step called '''whitening''' (or, in some other literatures, '''sphering''')
which is needed for some algorithms. If we are training on images,
the raw input is redundant, since adjacent pixel values
are highly correlated. The goal of whitening is to make the input less redundant; more formally,
our desiderata are that our learning algorithm sees a training input where (i) the features are less
correlated with each other, and (ii) the features all have the same variance.

== 2D example ==

[[File:PCA-rotated.png | 600px]]

The covariance matrix of this data is given by:

<math>\begin{align}
\begin{bmatrix}
7.29 & 0 \\
0 & 0.69
\end{bmatrix}.
\end{align}</math>
+ | |||
+ | (Note: Technically, many of the | ||
+ | statements in this section about the "covariance" will be true only if the data | ||
+ | has zero mean. In the rest of this section, we will take this assumption as | ||
+ | implicit in our statements. However, even if the data's mean isn't exactly zero, | ||
+ | the intuitions we're presenting here still hold true, and so this isn't something | ||
+ | that you should worry about.) | ||
+ | |||
It is no accident that the diagonal values are <math>\textstyle \lambda_1</math> and <math>\textstyle \lambda_2</math>.
Further,
the off-diagonal entries are zero; thus,
<math>\textstyle x_{{\rm rot},1}</math> and <math>\textstyle x_{{\rm rot},2}</math> are uncorrelated, satisfying one of our desiderata
for whitened data (that the features be less correlated).
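
As an illustrative aside (not part of the original tutorial), the following NumPy sketch checks this numerically on some synthetic zero-mean 2D data. The names <tt>x</tt>, <tt>U</tt>, and <tt>S</tt> are assumptions of this sketch: the columns of <tt>U</tt> hold the eigenvectors and <tt>S</tt> holds the eigenvalues <math>\textstyle \lambda_i</math> of the covariance.

<pre>
import numpy as np

# Hypothetical 2D data (a stand-in for the tutorial's example); columns are
# training examples, rows are features, and the data is made zero-mean.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]      # empirical covariance of the zero-mean data
U, S, _ = np.linalg.svd(sigma)    # columns of U: eigenvectors; S: eigenvalues

x_rot = U.T @ x                   # rotate the data into the PCA basis
cov_rot = x_rot @ x_rot.T / x.shape[1]

# cov_rot is diagonal: lambda_1, lambda_2 on the diagonal, ~0 off-diagonal.
print(np.round(cov_rot, 4))
print(np.round(S, 4))
</pre>
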
To make each of our input features have unit variance, we can simply rescale
each feature <math>\textstyle x_{{\rm rot},i}</math> by <math>\textstyle 1/\sqrt{\lambda_i}</math>. Concretely, we define our
whitened data <math>\textstyle x_{{\rm PCAwhite}} \in \Re^n</math> as follows:

<math>\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i}}.
\end{align}</math>

Plotting <math>\textstyle x_{{\rm PCAwhite}}</math>, we get:

[[File:PCA-whitened.png | 600px]]

This data now has covariance equal to the identity matrix <math>\textstyle I</math>. We say that
<math>\textstyle x_{{\rm PCAwhite}}</math> is our '''PCA whitened''' version of the data: The
different components of <math>\textstyle x_{{\rm PCAwhite}}</math> are uncorrelated and have
unit variance.
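
Continuing the same hypothetical NumPy sketch, PCA whitening just divides each rotated coordinate by <math>\textstyle \sqrt{\lambda_i}</math>; the resulting covariance comes out (numerically) as the identity.

<pre>
import numpy as np

# Same synthetic zero-mean data as in the previous sketch.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

x_rot = U.T @ x
x_pca_white = x_rot / np.sqrt(S)[:, None]   # x_PCAwhite,i = x_rot,i / sqrt(lambda_i)

# The whitened covariance is (numerically) the identity matrix I.
print(np.round(x_pca_white @ x_pca_white.T / x.shape[1], 4))
</pre>
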
- | |||
- | |||
'''Whitening combined with dimensionality reduction.'''
If you want to have data that is whitened and which is lower dimensional than
the original input, you can also optionally keep only the top <math>\textstyle k</math> components of
<math>\textstyle x_{{\rm PCAwhite}}</math>. When we combine PCA whitening with regularization
(described later), the last few components of <math>\textstyle x_{{\rm PCAwhite}}</math> will be
nearly zero anyway, and thus can safely be dropped.
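
A hypothetical sketch of that combination: keep only the top <math>\textstyle k</math> rotated components before whitening. The value of <tt>k</tt> and the synthetic 3D data here are assumptions of the example, not from the original text.

<pre>
import numpy as np

# Synthetic zero-mean 3D data; we will whiten and keep only k = 2 dimensions.
rng = np.random.default_rng(0)
cov = [[7.0, 2.0, 0.5],
       [2.0, 1.0, 0.3],
       [0.5, 0.3, 0.2]]
x = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

k = 2                                              # number of components to keep
x_tilde = U[:, :k].T @ x                           # top-k rotated components
x_pca_white_k = x_tilde / np.sqrt(S[:k])[:, None]  # whiten only what we keep

print(x_pca_white_k.shape)       # (2, 1000): whitened and lower dimensional
</pre>
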
== ZCA Whitening ==

Finally, it turns out that this way of getting the
data to have covariance identity <math>\textstyle I</math> isn't unique.
Concretely, if <math>\textstyle R</math> is any orthogonal matrix, so that it satisfies
<math>\textstyle RR^T = R^TR = I</math> (less formally, if <math>\textstyle R</math> is a rotation/reflection
matrix), then <math>\textstyle R \, x_{{\rm PCAwhite}}</math> will also have identity covariance.
In '''ZCA whitening''', we choose <math>\textstyle R = U</math> and define:

<math>\begin{align}
x_{{\rm ZCAwhite}} = U x_{{\rm PCAwhite}}
\end{align}</math>

Plotting <math>\textstyle x_{\rm ZCAwhite}</math>, we get:

[[File:ZCA-whitened.png | 600px]]

It can be shown that out of all possible choices for <math>\textstyle R</math>,
this choice of rotation causes <math>\textstyle x_{\rm ZCAwhite}</math> to be as close as possible to the
original input data <math>\textstyle x</math>.

When using ZCA whitening (unlike PCA whitening), we usually keep all <math>\textstyle n</math> dimensions
of the data, and do not try to reduce its dimension.
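
As a hypothetical NumPy sketch of the same idea, ZCA whitening rotates the PCA-whitened data back with <tt>U</tt>; the covariance is still the identity, but the result stays close to the original <math>\textstyle x</math>.

<pre>
import numpy as np

# Same synthetic zero-mean 2D data as before.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

x_pca_white = (U.T @ x) / np.sqrt(S)[:, None]
x_zca_white = U @ x_pca_white     # choose the rotation R = U: ZCA whitening

# Covariance is still (numerically) the identity; all n dimensions are kept.
print(np.round(x_zca_white @ x_zca_white.T / x.shape[1], 4))
</pre>
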
== Regularization ==

When implementing PCA whitening or ZCA whitening in practice, sometimes some
of the eigenvalues <math>\textstyle \lambda_i</math> will be numerically close to 0, and thus the scaling
step where we divide by <math>\sqrt{\lambda_i}</math> would involve dividing by a value close to zero; this
may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we
therefore implement this scaling step using
a small amount of regularization, and add a small constant <math>\textstyle \epsilon</math>
to the eigenvalues before taking their square root and inverse:

<math>\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i + \epsilon}}.
\end{align}</math>

When <math>\textstyle x</math> takes values around <math>\textstyle [-1,1]</math>, a value of <math>\textstyle \epsilon \approx 10^{-5}</math>
might be typical.
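
A hypothetical sketch of this regularized scaling step in NumPy; <tt>epsilon = 1e-5</tt> follows the rule of thumb above and is otherwise an assumption of the example.

<pre>
import numpy as np

# Same synthetic zero-mean data; now whiten with the regularized denominator.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

epsilon = 1e-5                                           # typical for inputs in [-1, 1]
x_pca_white = (U.T @ x) / np.sqrt(S + epsilon)[:, None]  # divide by sqrt(lambda_i + eps)
x_zca_white = U @ x_pca_white                            # regularized ZCA whitening
</pre>
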
For the case of images, adding <math>\textstyle \epsilon</math> here also has the effect of slightly smoothing (or low-pass
filtering) the input image. This whitening step is also a rough model of how the biological eye
(the retina) processes images: adjacent "pixels" in the eye perceive highly correlated values, and rather
than transmit every pixel separately, the retina performs a decorrelation operation similar to that
performed by ZCA. This results in a less redundant representation of the input
image, which is then transmitted to your brain.
+ | |||
+ | |||
+ | |||
+ | {{PCA}} | ||
+ | |||
+ | |||
+ | {{Languages|白化|中文}} |