PCA

From Ufldl

m (sigma bug - should be 1/m xx^T, not just xx^T)
Line 201:
approximation to the data.
-
To decide how to set <math>\textstyle k</math>, we will usually look at the {\bf percentage of variance
+
To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance
-
retained} for different values of <math>\textstyle k</math>.  Concretely, if <math>\textstyle k=n</math>, then we have
+
retained''' for different values of <math>\textstyle k</math>.  Concretely, if <math>\textstyle k=n</math>, then we have
-
an exact approximation to the data, and we say that 100\% of the variance is
+
an exact approximation to the data, and we say that 100% of the variance is
retained.  I.e., all of the variation of the original data is retained.   
Conversely, if <math>\textstyle k=0</math>, then we are approximating all the data with the zero vector,
-
and thus 0\% of the variance is retained.  
+
and thus 0% of the variance is retained.  
More generally, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the eigenvalues  
Line 217:
In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math>, and <math>\textstyle \lambda_2 = 0.69</math>.  Thus,
by keeping only <math>\textstyle k=1</math> principal components, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>,
-
or 91.3\% of the variance.
+
or 91.3% of the variance.
A more formal definition of percentage of variance retained is beyond the scope
Line 229:
and for which we would incur a greater approximation error if we were to set them to zero.
-
In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99\% of
+
In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of
the variance.  In other words, we pick the smallest value of <math>\textstyle k</math> that satisfies  
:<math>\begin{align}
Line 235:
\end{align}</math>
Depending on the application, if you are willing to incur some  
-
additional error, values in the 90-98\% range are also sometimes used.  When you
+
additional error, values in the 90-98% range are also sometimes used.  When you
-
describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95\% of
+
describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95% of
the variance will also be a much more easily interpretable description than saying
that you retained 120 (or whatever other number of) components.
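
As a rough illustration of the heuristic described above, the following Python sketch computes the fraction of variance retained as the ratio of the top <math>\textstyle k</math> eigenvalues to the sum of all eigenvalues (the same ratio used in the 2D example), and then picks the smallest <math>\textstyle k</math> that reaches a chosen threshold. The function names <code>variance_retained</code> and <code>smallest_k</code> are hypothetical and not part of the original text, and the sketch assumes the eigenvalues are non-negative with a positive sum.

```python
def variance_retained(eigenvalues, k):
    """Fraction of total variance kept by the k largest eigenvalues."""
    lam = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    return sum(lam[:k]) / sum(lam)


def smallest_k(eigenvalues, threshold=0.99):
    """Smallest k whose top-k eigenvalues retain at least `threshold` of the variance."""
    lam = sorted(eigenvalues, reverse=True)
    total = sum(lam)
    running = 0.0
    for k, value in enumerate(lam, start=1):
        running += value
        if running / total >= threshold:
            return k
    # Fallback for thresholds that rounding error keeps just out of reach.
    return len(lam)


# Eigenvalues from the 2D example in the text.
example = [7.29, 0.69]
print(variance_retained(example, 1))  # roughly 0.913
print(smallest_k(example, 0.90))      # 1
print(smallest_k(example, 0.99))      # 2
```

With the two eigenvalues from the 2D example, a 90% threshold is already met by one component, whereas a 99% threshold requires both.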

Revision as of 01:02, 23 April 2011
