Implementing PCA/Whitening
Line 6: | Line 6: | ||
We achieve this by computing the mean of each patch and subtracting it from that patch. In Matlab, we can do this by using | We achieve this by computing the mean of each patch and subtracting it from that patch. In Matlab, we can do this by using | ||
- | avg = mean(x, 1); | + | avg = mean(x, 1); % Compute the mean pixel intensity value separately for each patch. |
x = x - repmat(avg, size(x, 1), 1); | x = x - repmat(avg, size(x, 1), 1); | ||
- | Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can | + | Next, we need to compute <math>\textstyle \Sigma = \frac{1}{m} \sum_{i=1}^m (x^{(i)})(x^{(i)})^T</math>. If you're implementing this in Matlab (or even if you're implementing this in C++, Java, etc., but have access to an efficient linear algebra library), doing it as an explicit sum is inefficient. Instead, we can compute this in one fell swoop as |
sigma = x * x' / size(x, 2); | sigma = x * x' / size(x, 2); | ||
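As a quick sanity check, the following sketch compares the explicit sum with the vectorized expression. The data here is a small random matrix standing in for real patches, and the sizes n and m, along with the names sigmaLoop and maxDiff, are arbitrary assumptions made for illustration.

```matlab
% Stand-in data (assumption): n = 4 dimensions, m = 10 examples,
% stored one example per column as in the text.
n = 4; m = 10;
x = randn(n, m);

% Explicit sum over examples, following the formula for Sigma term by term.
sigmaLoop = zeros(n, n);
for i = 1:m
    sigmaLoop = sigmaLoop + x(:, i) * x(:, i)';
end
sigmaLoop = sigmaLoop / m;

% Vectorized form from the text.
sigma = x * x' / size(x, 2);

% Largest absolute difference between the two results;
% this should be on the order of machine precision.
maxDiff = max(abs(sigmaLoop(:) - sigma(:)));
```

Both versions compute the same n-by-n matrix; the vectorized one simply hands the whole sum to a single matrix product.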
Line 26: | Line 26: | ||
Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows: | Finally, you can compute <math>\textstyle x_{\rm rot}</math> and <math>\textstyle \tilde{x}</math> as follows: | ||
- | xRot = U | + | xRot = U' * x; % rotated version of the data. |
- | xTilde = U(:,1:k) * | + | xTilde = U(:,1:k)' * x; % reduced dimension representation of the data, |
- | + | % where k is the number of eigenvectors to keep | |
This gives your PCA representation of the data in terms of <math>\textstyle \tilde{x} \in \Re^k</math>. | This gives your PCA representation of the data in terms of <math>\textstyle \tilde{x} \in \Re^k</math>. | ||
Incidentally, if <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized | Incidentally, if <math>x</math> is a <math>\textstyle n</math>-by-<math>\textstyle m</math> matrix containing all your training data, this is a vectorized | ||
implementation, and the expressions | implementation, and the expressions | ||
- | above work too for computing <math>x_{rot}</math> and <math>\tilde{x}</math> for your entire training set | + | above work too for computing <math>x_{\rm rot}</math> and <math>\tilde{x}</math> for your entire training set |
all in one go. The resulting | all in one go. The resulting | ||
- | <math> | + | <math>x_{\rm rot}</math> and <math>\tilde{x}</math> will have one column corresponding to each training example. |
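To make the shapes concrete, here is a minimal sketch on random stand-in data. It assumes U holds the eigenvectors of sigma as columns, sorted by decreasing eigenvalue, and obtains it with svd; the sizes n, m and k are arbitrary choices for illustration.

```matlab
% Stand-in data (assumption): n = 5 dimensions, m = 20 examples, keep k = 2.
n = 5; m = 20; k = 2;
x = randn(n, m);

% Covariance and its eigenvectors. For a symmetric positive semidefinite
% matrix, svd returns the eigenvectors in U, ordered by decreasing eigenvalue.
sigma = x * x' / size(x, 2);
[U, S, V] = svd(sigma);

xRot = U' * x;              % rotated data, n-by-m
xTilde = U(:, 1:k)' * x;    % reduced representation, k-by-m

sizeRot = size(xRot);
sizeTilde = size(xTilde);

% xTilde should match the first k rows of xRot up to rounding error.
maxDiff = max(max(abs(xRot(1:k, :) - xTilde)));
```

Column i of xRot and of xTilde corresponds to column i of x, so each training example keeps its position.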
To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use | To compute the PCA whitened data <math>\textstyle x_{\rm PCAwhite}</math>, use | ||
Line 49: | Line 49: | ||
xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x; | xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x; | ||
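One way to check a whitening implementation is to confirm that the whitened data has a covariance close to the identity. The sketch below does this on random stand-in data; the sizes and the value epsilon = 1e-5 are assumptions chosen for illustration, not values prescribed here.

```matlab
% Stand-in data (assumption): n = 5 dimensions, m = 1000 examples.
n = 5; m = 1000;
x = randn(n, m);
epsilon = 1e-5;   % assumed regularization value

% Covariance and its eigen-decomposition (U: eigenvectors, diag(S): eigenvalues).
sigma = x * x' / size(x, 2);
[U, S, V] = svd(sigma);

% ZCA whitening, as in the text.
xZCAwhite = U * diag(1./sqrt(diag(S) + epsilon)) * U' * x;

% Covariance of the whitened data and its largest deviation from the identity.
covWhite = xZCAwhite * xZCAwhite' / size(xZCAwhite, 2);
maxDev = max(max(abs(covWhite - eye(n))));
```

With a small epsilon and no near-zero eigenvalues, maxDev should be tiny; larger values of epsilon shrink the diagonal slightly below 1.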
+ | |||
+ | |||
+ | {{PCA}} | ||
+ | |||
+ | |||
+ | {{Languages|实现主成分分析和白化|中文}} |