Implementing PCA/Whitening

First, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>.  If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute it in one fell swoop as
  sigma = x * x' / size(x, 2);
(Check the math yourself for correctness.)
Here, we assume that <math>x</math> is a data structure that contains one training example per column (so <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix).
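With this convention, the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> is simply the <math>\textstyle i</math>-th column of <math>x</math>; for example (the variable name <tt>xi</tt> is just illustrative):

  xi = x(:, i);   % the i-th training example, an n-by-1 column vector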
Next, PCA computes the eigenvectors of <math>\Sigma</math>.  One could do this using the Matlab <tt>eig</tt> function.  However, because <math>\Sigma</math> is a symmetric positive semi-definite matrix, it is more numerically reliable to do this using the <tt>svd</tt> function. Concretely, if you implement
  [U,S,V] = svd(sigma);
then the matrix <math>U</math> will contain the eigenvectors of <math>\Sigma</math> (one eigenvector per column, sorted from the top eigenvector to the bottom), and the diagonal entries of the matrix <math>S</math> will contain the corresponding eigenvalues (also sorted in decreasing order).  The matrix <math>V</math> will be equal to the transpose of <math>U</math>, and can be safely ignored.
(Note: The <tt>svd</tt> function actually computes the singular vectors and singular values of a matrix, which for the special case of a symmetric positive semi-definite matrix---which is all that we're concerned with here---is equal to its eigenvectors and eigenvalues.  A full discussion of singular vectors vs. eigenvectors is beyond the scope of these notes.)
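If you want to convince yourself of this, one quick optional check is that the factors returned by <tt>svd</tt> reconstruct <tt>sigma</tt> and that the columns of <tt>U</tt> are orthonormal; for example (the variable names here are just illustrative):

  % Both quantities should be close to zero (up to numerical error).
  reconError = norm(sigma - U * S * U', 'fro');
  orthoError = norm(U' * U - eye(size(U, 1)), 'fro');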
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows:
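For example (a minimal sketch, assuming <tt>k</tt> is the number of components you want to retain; the variable names <tt>xRot</tt> and <tt>xTilde</tt> are illustrative):

  xRot = U' * x;            % rotated version of the data
  xTilde = U(:, 1:k)' * x;  % reduced-dimension representation: the top k components of xRot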
Incidentally, if <math>x</math> is an <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized implementation, and the expressions above work too for computing <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> for your entire training set all in one go.  The resulting <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> will have one column corresponding to each training example.
To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use
  xPCAwhite = diag(1./sqrt(diag(S) + epsilon)) * U' * x;
Here <tt>epsilon</tt> is a small positive constant added to the eigenvalues for regularization.  Since <math>S</math>'s diagonal contains the eigenvalues <math>\textstyle \lambda_i</math>, this turns out to be a compact way of computing <math>\textstyle x_{{\rm PCAwhite},i} = \frac{x_{{\rm rot},i}}{\sqrt{\lambda_i}}</math> simultaneously for all <math>\textstyle i</math>.
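Written per component, this divides row <math>\textstyle i</math> of <math>\textstyle x_{\rm rot}</math> by <math>\textstyle \sqrt{\lambda_i + \epsilon}</math>, and for small <tt>epsilon</tt> the whitened data should have covariance close to the identity matrix.  A sketch of both points (the variable names <tt>xPCAwhiteAlt</tt> and <tt>covWhite</tt> are illustrative):

  % Equivalent per-component form of the whitening step.
  xRot = U' * x;
  xPCAwhiteAlt = bsxfun(@rdivide, xRot, sqrt(diag(S) + epsilon));
  % Covariance of the whitened data; approximately the identity for small epsilon.
  covWhite = xPCAwhite * xPCAwhite' / size(xPCAwhite, 2);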
