Whitening

Whitening is a preprocessing step, closely related to PCA,
which is needed for some algorithms.  If we are training on images,
the raw input is redundant, since adjacent pixel values
are highly correlated.  The goal of whitening is to make the input less redundant; more formally,
our desiderata are that our learning algorithm sees a training input where (i) the features are less
correlated with each other, and (ii) the features all have the same variance.
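
These two desiderata are easy to check empirically: data that has been whitened
should have a covariance matrix close to the identity.  Below is a minimal numpy
sketch (not part of the original tutorial); the helper name <code>check_whitening</code>
and the convention that zero-mean examples are stored one per column are our own assumptions.

<pre>
import numpy as np

def check_whitening(X_white):
    # Hypothetical helper: X_white is (n features) x (m examples), zero mean.
    m = X_white.shape[1]
    Sigma = X_white @ X_white.T / m                       # empirical covariance
    off_diag = Sigma - np.diag(np.diag(Sigma))
    print("max |off-diagonal|:", np.abs(off_diag).max())  # desideratum (i): ~0
    print("feature variances :", np.diag(Sigma))          # desideratum (ii): ~1
</pre>
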
Further,
the off-diagonal entries are zero; thus,
<math>\textstyle x_{{\rm rot},1}</math> and <math>\textstyle x_{{\rm rot},2}</math> are uncorrelated, satisfying one of our desiderata
for whitened data (that the features be less correlated).
To make each of our input features have unit variance, we can simply rescale
each feature <math>\textstyle x_{{\rm rot},i}</math> by <math>\textstyle 1/\sqrt{\lambda_i}</math>.
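
Putting the rotation and the rescaling together gives PCA whitening.  A short
numpy sketch of the whole procedure (the function name <code>pca_whiten</code>
and the data layout are our own assumptions, not the tutorial's code):

<pre>
import numpy as np

def pca_whiten(X):
    # X: zero-mean data, (n features) x (m examples).
    m = X.shape[1]
    Sigma = X @ X.T / m                         # covariance matrix
    lam, U = np.linalg.eigh(Sigma)              # eigenvalues/eigenvectors (ascending)
    lam, U = lam[::-1], U[:, ::-1]              # reorder to decreasing eigenvalue
    X_rot = U.T @ X                             # rotate: x_rot = U^T x
    X_pcawhite = X_rot / np.sqrt(lam)[:, None]  # rescale feature i by 1/sqrt(lambda_i)
    return X_pcawhite, U, lam                   # assumes lambda_i > 0; see Regularization
</pre>
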
When we are also reducing the data's dimension, the trailing components of <math>\textstyle x_{\rm rot}</math> are
nearly zero anyway, and thus can safely be dropped.
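
In code, this dimension reduction amounts to keeping only the top <math>\textstyle k</math>
components before rescaling.  A hypothetical continuation of the <code>pca_whiten</code>
sketch above (<code>X</code> and the choice <code>k = 50</code> are assumptions):

<pre>
import numpy as np

_, U, lam = pca_whiten(X)
k = 50                                      # pick k from the eigenvalue spectrum
X_pcawhite_k = (U.T @ X)[:k, :] / np.sqrt(lam[:k])[:, None]   # keep top k, then rescale
</pre>
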

== ZCA Whitening ==
Finally, it turns out that this way of getting the
data to have covariance identity <math>\textstyle I</math> isn't unique.
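
To see why: if <math>\textstyle R</math> is any orthogonal matrix, then
<math>\textstyle R x_{\rm PCAwhite}</math> also has covariance
<math>\textstyle R I R^T = I</math>.  A quick numerical check (a sketch of our own;
the random-rotation construction is not from the tutorial):

<pre>
import numpy as np

rng = np.random.default_rng(0)
X_pcawhite, U, lam = pca_whiten(X)                 # from the sketch above
n, m = X_pcawhite.shape
R, _ = np.linalg.qr(rng.standard_normal((n, n)))   # a random orthogonal matrix
X_r = R @ X_pcawhite
print(np.round(X_r @ X_r.T / m, 3))                # still ~ the identity matrix
</pre>
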
Concretely, ZCA whitening chooses <math>\textstyle R = U</math>, and we define

<math>\begin{align}
x_{\rm ZCAwhite} = U x_{\rm PCAwhite}
\end{align}</math>
Plotting <math>\textstyle x_{\rm ZCAwhite}</math>, we get:

[[File:ZCA-whitened.png | 600px]]

It can be shown that out of all possible choices for <math>\textstyle R</math>,
this choice of rotation causes <math>\textstyle x_{\rm ZCAwhite}</math> to be as close as possible to the
original input data <math>\textstyle x</math>.
When using ZCA whitening (unlike PCA whitening), we usually keep all <math>\textstyle n</math> dimensions
of the data, and do not try to reduce its dimension.
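
In code, ZCA whitening is one extra rotation on top of the PCA-whitening sketch
above (again a sketch with assumed names, not the tutorial's implementation):

<pre>
def zca_whiten(X):
    # Sketch: rotate the PCA-whitened data back with U.
    X_pcawhite, U, lam = pca_whiten(X)   # from the earlier sketch
    return U @ X_pcawhite                # x_ZCAwhite = U x_PCAwhite
</pre>
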

== Regularization ==
When implementing PCA whitening or ZCA whitening in practice, sometimes some
of the eigenvalues <math>\textstyle \lambda_i</math> will be numerically close to 0, and thus the scaling
step where we divide by <math>\sqrt{\lambda_i}</math> would involve dividing by a value close to zero; this
may cause the data to blow up (take on large values) or otherwise be numerically unstable.  In practice, we
therefore implement this scaling step using
a small amount of regularization, and add a small constant <math>\textstyle \epsilon</math>
to the eigenvalues before taking their square root and inverse:
<math>\begin{align}
x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i + \epsilon}}.
\end{align}</math>
When <math>\textstyle x</math> takes values around <math>\textstyle [-1,1]</math>, a value of <math>\textstyle \epsilon \approx 10^{-5}</math>
might be typical.
For the case of images, adding <math>\textstyle \epsilon</math> here also has the effect of slightly smoothing (or low-pass
filtering) the input image.
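
A regularized version of the whitening sketch above, with <math>\textstyle \epsilon</math>
added to the eigenvalues before the square root (the function name and the
default value are assumptions):

<pre>
import numpy as np

def pca_whiten_reg(X, eps=1e-5):
    # Regularized PCA whitening sketch; eps guards against lambda_i ~ 0.
    m = X.shape[1]
    Sigma = X @ X.T / m
    lam, U = np.linalg.eigh(Sigma)
    X_rot = U.T @ X
    return X_rot / np.sqrt(lam + eps)[:, None]   # sqrt(lambda_i + eps), not sqrt(lambda_i)
</pre>

For ZCA whitening, the result is simply multiplied by <code>U</code> on the left, exactly as before.
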
Biologically, the eye is believed to perform a rough decorrelation of its input,
similar to that performed by ZCA.  This results in a less redundant representation of the input
image, which is then transmitted to your brain.
{{PCA}}
{{Languages|白化|中文}}
