Whitening

We have used PCA to reduce the dimension of the data. There is a closely related
preprocessing step called '''whitening''' (or, in some other literatures, '''sphering''')
which is needed for some algorithms. If we are training on images,
the raw input is redundant, since adjacent pixel values
are highly correlated. The goal of whitening is to make the input less redundant; more formally,
our desiderata are that our learning algorithm sees a training input where (i) the features are less
correlated with each other, and (ii) the features all have the same variance.

== 2D example ==

[[File:PCA-rotated.png | 600px]]

The covariance matrix of this data is given by:

<math>\begin{align}
\begin{bmatrix}
7.29 & 0 \\
0 & 0.69
\end{bmatrix}.
\end{align}</math>
+ | |||
+ | (Note: Technically, many of the | ||
+ | statements in this section about the "covariance" will be true only if the data | ||
+ | has zero mean. In the rest of this section, we will take this assumption as | ||
+ | implicit in our statements. However, even if the data's mean isn't exactly zero, | ||
+ | the intuitions we're presenting here still hold true, and so this isn't something | ||
+ | that you should worry about.) | ||
+ | |||
It is no accident that the diagonal values are <math>\textstyle \lambda_1</math> and <math>\textstyle \lambda_2</math>.
Further,
the off-diagonal entries are zero; thus,
<math>\textstyle x_{{\rm rot},1}</math> and <math>\textstyle x_{{\rm rot},2}</math> are uncorrelated, satisfying one of our desiderata
for whitened data (that the features be less correlated).
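
As an illustrative aside (not part of the original tutorial), the following NumPy sketch checks this numerically on some synthetic zero-mean 2D data. The names <tt>x</tt>, <tt>U</tt>, and <tt>S</tt> are assumptions of this sketch: the columns of <tt>U</tt> hold the eigenvectors and <tt>S</tt> holds the eigenvalues <math>\textstyle \lambda_i</math> of the covariance.

<pre>
import numpy as np

# Hypothetical 2D data (a stand-in for the tutorial's example); columns are
# training examples, rows are features, and the data is made zero-mean.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]      # empirical covariance of the zero-mean data
U, S, _ = np.linalg.svd(sigma)    # columns of U: eigenvectors; S: eigenvalues

x_rot = U.T @ x                   # rotate the data into the PCA basis
cov_rot = x_rot @ x_rot.T / x.shape[1]

# cov_rot is diagonal: lambda_1, lambda_2 on the diagonal, ~0 off-diagonal.
print(np.round(cov_rot, 4))
print(np.round(S, 4))
</pre>
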
To make each of our input features have unit variance, we can simply rescale
each feature <math>\textstyle x_{{\rm rot},i}</math> by <math>\textstyle 1/\sqrt{\lambda_i}</math>. Concretely, we define our
whitened data <math>\textstyle x_{{\rm PCAwhite}} \in \Re^n</math> as follows:

<math>\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i}}.
\end{align}</math>

Plotting <math>\textstyle x_{{\rm PCAwhite}}</math>, we get:

[[File:PCA-whitened.png | 600px]]

This data now has covariance equal to the identity matrix <math>\textstyle I</math>. We say that
<math>\textstyle x_{{\rm PCAwhite}}</math> is our '''PCA whitened''' version of the data: The
different components of <math>\textstyle x_{{\rm PCAwhite}}</math> are uncorrelated and have
unit variance.
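
Continuing the same hypothetical NumPy sketch, PCA whitening just divides each rotated coordinate by <math>\textstyle \sqrt{\lambda_i}</math>; the resulting covariance comes out (numerically) as the identity.

<pre>
import numpy as np

# Same synthetic zero-mean data as in the previous sketch.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

x_rot = U.T @ x
x_pca_white = x_rot / np.sqrt(S)[:, None]   # x_PCAwhite,i = x_rot,i / sqrt(lambda_i)

# The whitened covariance is (numerically) the identity matrix I.
print(np.round(x_pca_white @ x_pca_white.T / x.shape[1], 4))
</pre>
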
- | |||
- | |||
'''Whitening combined with dimensionality reduction.'''
If you want to have data that is whitened and which is lower dimensional than
the original input, you can also optionally keep only the top <math>\textstyle k</math> components of
<math>\textstyle x_{{\rm PCAwhite}}</math>. When we combine PCA whitening with regularization
(described later), the last few components of <math>\textstyle x_{{\rm PCAwhite}}</math> will be
nearly zero anyway, and thus can safely be dropped.
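
A hypothetical sketch of that combination: keep only the top <math>\textstyle k</math> rotated components before whitening. The value of <tt>k</tt> and the synthetic 3D data here are assumptions of the example, not from the original text.

<pre>
import numpy as np

# Synthetic zero-mean 3D data; we will whiten and keep only k = 2 dimensions.
rng = np.random.default_rng(0)
cov = [[7.0, 2.0, 0.5],
       [2.0, 1.0, 0.3],
       [0.5, 0.3, 0.2]]
x = rng.multivariate_normal([0.0, 0.0, 0.0], cov, size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

k = 2                                              # number of components to keep
x_tilde = U[:, :k].T @ x                           # top-k rotated components
x_pca_white_k = x_tilde / np.sqrt(S[:k])[:, None]  # whiten only what we keep

print(x_pca_white_k.shape)       # (2, 1000): whitened and lower dimensional
</pre>
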
== ZCA Whitening ==

Finally, it turns out that this way of getting the
data to have covariance identity <math>\textstyle I</math> isn't unique.
Concretely, if <math>\textstyle R</math> is any orthogonal matrix, so that it satisfies
<math>\textstyle RR^T = R^TR = I</math> (less formally, if <math>\textstyle R</math> is a rotation/reflection
matrix), then <math>\textstyle R \, x_{{\rm PCAwhite}}</math> will also have identity covariance.
In '''ZCA whitening''', we choose <math>\textstyle R = U</math> and define:

<math>\begin{align}
x_{{\rm ZCAwhite}} = U x_{{\rm PCAwhite}}
\end{align}</math>

Plotting <math>\textstyle x_{\rm ZCAwhite}</math>, we get:

[[File:ZCA-whitened.png | 600px]]

It can be shown that out of all possible choices for <math>\textstyle R</math>,
this choice of rotation causes <math>\textstyle x_{\rm ZCAwhite}</math> to be as close as possible to the
original input data <math>\textstyle x</math>.

When using ZCA whitening (unlike PCA whitening), we usually keep all <math>\textstyle n</math> dimensions
of the data, and do not try to reduce its dimension.
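
As a hypothetical NumPy sketch of the same idea, ZCA whitening rotates the PCA-whitened data back with <tt>U</tt>; the covariance is still the identity, but the result stays close to the original <math>\textstyle x</math>.

<pre>
import numpy as np

# Same synthetic zero-mean 2D data as before.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

x_pca_white = (U.T @ x) / np.sqrt(S)[:, None]
x_zca_white = U @ x_pca_white     # choose the rotation R = U: ZCA whitening

# Covariance is still (numerically) the identity; all n dimensions are kept.
print(np.round(x_zca_white @ x_zca_white.T / x.shape[1], 4))
</pre>
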
== Regularization ==

When implementing PCA whitening or ZCA whitening in practice, sometimes some
of the eigenvalues <math>\textstyle \lambda_i</math> will be numerically close to 0, and thus the scaling
step where we divide by <math>\sqrt{\lambda_i}</math> would involve dividing by a value close to zero; this
may cause the data to blow up (take on large values) or otherwise be numerically unstable. In practice, we
therefore implement this scaling step using
a small amount of regularization, and add a small constant <math>\textstyle \epsilon</math>
to the eigenvalues before taking their square root and inverse:

<math>\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i} }{\sqrt{\lambda_i + \epsilon}}.
\end{align}</math>

When <math>\textstyle x</math> takes values around <math>\textstyle [-1,1]</math>, a value of <math>\textstyle \epsilon \approx 10^{-5}</math>
might be typical.
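
A hypothetical sketch of this regularized scaling step in NumPy; <tt>epsilon = 1e-5</tt> follows the rule of thumb above and is otherwise an assumption of the example.

<pre>
import numpy as np

# Same synthetic zero-mean data; now whiten with the regularized denominator.
rng = np.random.default_rng(0)
x = rng.multivariate_normal([0.0, 0.0], [[7.0, 2.0], [2.0, 1.0]], size=1000).T
x -= x.mean(axis=1, keepdims=True)

sigma = x @ x.T / x.shape[1]
U, S, _ = np.linalg.svd(sigma)

epsilon = 1e-5                                           # typical for inputs in [-1, 1]
x_pca_white = (U.T @ x) / np.sqrt(S + epsilon)[:, None]  # divide by sqrt(lambda_i + eps)
x_zca_white = U @ x_pca_white                            # regularized ZCA whitening
</pre>
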
For the case of images, adding <math>\textstyle \epsilon</math> here also has the effect of slightly smoothing (or low-pass
filtering) the input image. This whitening step is also a rough model of how the biological eye
(the retina) processes images: adjacent "pixels" in the eye perceive highly correlated values, and rather
than transmit every pixel separately, the retina performs a decorrelation operation similar to that
performed by ZCA. This results in a less redundant representation of the input
image, which is then transmitted to your brain.
+ | |||
+ | |||
+ | |||
+ | {{PCA}} | ||
+ | |||
+ | |||
+ | {{Languages|白化|中文}} |