Implementing PCA/Whitening

In this section, we summarize the PCA, PCA whitening and ZCA whitening algorithms, and also describe how you can implement them using efficient linear algebra libraries.

First, we need to ensure that the data has (approximately) zero-mean. For natural images, we achieve this (approximately) by subtracting the mean value of each image patch.
We achieve this by computing the mean of each patch and subtracting it from that patch. In Matlab, we can do this as follows:
  avg = mean(x, 1);     % Compute the mean pixel intensity value separately for each patch.
  x = x - repmat(avg, size(x, 1), 1);
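In newer versions of Matlab (R2016b and later) and in Octave, implicit expansion makes the <tt>repmat</tt> call unnecessary. The following is just an equivalent sketch of the same mean subtraction, not part of the original notes:

  avg = mean(x, 1);   % 1-by-m row vector of per-patch means
  x = x - avg;        % implicit expansion subtracts avg from every row of x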
Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>.  If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute this in one fell swoop as
  sigma = x * x' / size(x, 2);
(Check the math yourself for correctness.)  Here, we assume that <math>x</math> is a data structure that contains one training example per column (so, <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix).
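For intuition, here is a sketch (not part of the original notes) that checks the one-line expression against the explicit sum; <tt>sigmaLoop</tt> and <tt>sigmaVec</tt> are illustrative names:

  m = size(x, 2);                  % number of training examples (columns of x)
  sigmaLoop = zeros(size(x, 1));   % explicit sum: correct but slow for large m
  for i = 1:m
      sigmaLoop = sigmaLoop + x(:, i) * x(:, i)';
  end
  sigmaLoop = sigmaLoop / m;
  sigmaVec = x * x' / m;           % the vectorized one-liner used above
  norm(sigmaLoop - sigmaVec)       % should be (numerically) zero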
Next, PCA computes the eigenvectors of <math>\Sigma</math>.  One could do this using the Matlab <tt>eig</tt> function.  However, because <math>\Sigma</math> is a symmetric positive semi-definite matrix, it is more numerically reliable to do this using the <tt>svd</tt> function. Concretely, if you implement  
  [U,S,V] = svd(sigma);
then the matrix <math>U</math> will contain the eigenvectors of <math>\Sigma</math> (one eigenvector per column, sorted in order from top to bottom eigenvector), and the diagonal entries of the matrix <math>S</math> will contain the corresponding eigenvalues (also sorted in decreasing order).  The matrix <math>V</math> will be equal to the transpose of <math>U</math>, and can be safely ignored.
(Note: The <tt>svd</tt> function actually computes the singular vectors and singular values of a matrix, which for the special case of a symmetric positive semi-definite matrix---which is all that we're concerned with here---are equal to its eigenvectors and eigenvalues.  A full discussion of singular vectors vs. eigenvectors is beyond the scope of these notes.)
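As a quick sanity check (a sketch, not part of the original notes), you can verify that the SVD of <tt>sigma</tt> behaves like an eigendecomposition here:

  [U, S, V] = svd(sigma);
  norm(U' * U - eye(size(U, 1)))   % U is orthonormal, so this is close to 0
  norm(U * S * U' - sigma)         % sigma is reconstructed, so this is close to 0
  diag(S)'                         % eigenvalues of sigma, sorted in decreasing order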
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows:

  xRot = U' * x;          % rotated version of the data.
  xTilde = U(:,1:k)' * x; % reduced dimension representation of the data,
                          % where k is the number of eigenvectors to keep
This gives your PCA representation of the data in terms of <math>\textstyle \tilde{x} \in \Re^k</math>.  Incidentally, if <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized implementation, and the expressions above work too for computing <math>x_{\rm rot}</math> and <math>\tilde{x}</math> for your entire training set all in one go.  The resulting <math>x_{\rm rot}</math> and <math>\tilde{x}</math> will have one column corresponding to each training example.
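If you also want to map the reduced representation back to the original space, a common sketch (not covered by the code above) is to project back through the retained eigenvectors; <tt>xHat</tt> is an illustrative name:

  xHat = U(:, 1:k) * xTilde;   % approximate reconstruction of x from the top k components
                               % (with k = n this recovers x exactly, up to numerical error)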
To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use

  xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;

Since <math>S</math>'s diagonal contains the eigenvalues <math>\textstyle \lambda_i</math>, this turns out to be a compact way of computing <math>\textstyle x_{\rm PCAwhite,i} = x_{\rm rot,i} / \sqrt{\lambda_i}</math> simultaneously for all <math>\textstyle i</math>.
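To make the per-component rescaling explicit, the same whitening can be written element-wise. This is a sketch for illustration (not part of the original notes), with <tt>lambda</tt> as an illustrative name and <tt>epsilon</tt> the small regularization constant used above:

  lambda = diag(S);                % eigenvalues of sigma
  xRot = U' * x;                   % rotated data, as above
  xPCAwhite = zeros(size(xRot));
  for i = 1:size(xRot, 1)
      xPCAwhite(i, :) = xRot(i, :) / sqrt(lambda(i) + epsilon);
  end
  % This matches diag(1./sqrt(diag(S) + epsilon)) * U' * x up to numerical error.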
Finally, you can also compute the ZCA whitened data <math>\textstyle x_{\rm ZCAwhite}</math> as:
  xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;
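Putting the steps together, here is a minimal end-to-end sketch (not part of the original notes), assuming <tt>x</tt> is an n-by-m matrix with one training example per column; the values of <tt>epsilon</tt> and <tt>k</tt> below are illustrative:

  epsilon = 1e-5;                        % small regularization constant (illustrative value)
  k = size(x, 1);                        % number of components to keep (here: all of them)
  avg = mean(x, 1);                      % per-patch mean
  x = x - repmat(avg, size(x, 1), 1);    % zero-mean the data
  sigma = x * x' / size(x, 2);           % covariance matrix
  [U, S, V] = svd(sigma);                % eigenvectors / eigenvalues via SVD
  xRot = U' * x;                                               % rotated data
  xTilde = U(:, 1:k)' * x;                                     % reduced-dimension representation
  xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;       % PCA whitening
  xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;   % ZCA whitening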
{{PCA}}
{{Languages|实现主成分分析和白化|中文}}
