PCA

Revision as of 03:25, 6 April 2011 (view source)

Cyfoo (Talk | contribs)

m (sigma bug - should be 1/m xx^T, not just xx^T)


Revision as of 01:02, 23 April 2011 (view source)

Maiyifan (Talk | contribs)



From Ufldl


@@ Line 201: / Line 201: @@
 approximation to the data.
-To decide how to set <math>\textstyle k</math>, we will usually look at the {\bf percentage of variance
+To decide how to set <math>\textstyle k</math>, we will usually look at the '''percentage of variance
-retained} for different values of <math>\textstyle k</math>.  Concretely, if <math>\textstyle k=n</math>, then we have
+retained''' for different values of <math>\textstyle k</math>.  Concretely, if <math>\textstyle k=n</math>, then we have
-an exact approximation to the data, and we say that 100\% of the variance is
+an exact approximation to the data, and we say that 100% of the variance is
 retained.  I.e., all of the variation of the original data is retained.
 Conversely, if <math>\textstyle k=0</math>, then we are approximating all the data with the zero vector,
-and thus 0\% of the variance is retained.
+and thus 0% of the variance is retained.
 More generally, let <math>\textstyle \lambda_1, \lambda_2, \ldots, \lambda_n</math> be the eigenvalues
@@ Line 217: / Line 217: @@
 In our simple 2D example above, <math>\textstyle \lambda_1 = 7.29</math>, and <math>\textstyle \lambda_2 = 0.69</math>.  Thus,
 by keeping only <math>\textstyle k=1</math> principal components, we retained <math>\textstyle 7.29/(7.29+0.69) = 0.913</math>,
-or 91.3\% of the variance.
+or 91.3% of the variance.
 A more formal definition of percentage of variance retained is beyond the scope
@@ Line 229: / Line 229: @@
 and for which we would incur a greater approximation error if we were to set them to zero.
-In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99\% of
+In the case of images, one common heuristic is to choose <math>\textstyle k</math> so as to retain 99% of
 the variance.  In other words, we pick the smallest value of <math>\textstyle k</math> that satisfies
 :<math>\begin{align}
@@ Line 235: / Line 235: @@
 \end{align}</math>
 Depending on the application, if you are willing to incur some
-additional error, values in the 90-98\% range are also sometimes used.  When you
+additional error, values in the 90-98% range are also sometimes used.  When you
-describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95\% of
+describe to others how you applied PCA, saying that you chose <math>\textstyle k</math> to retain 95% of
 the variance will also be a much more easily interpretable description than saying
 that you retained 120 (or whatever other number of) components.
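
The rule for choosing k described in the revised text can be illustrated with a short sketch. It assumes the usual definition of variance retained, namely the sum of the k largest eigenvalues divided by the sum of all eigenvalues, which matches the 7.29/(7.29+0.69) calculation quoted above. The function names `variance_retained` and `smallest_k` are hypothetical and are not part of the wiki page.

```python
from itertools import accumulate


def variance_retained(eigenvalues, k):
    """Fraction of total variance kept by the top-k principal components.

    Assumes the eigenvalues are non-negative and not all zero.
    """
    lams = sorted(eigenvalues, reverse=True)  # largest eigenvalue first
    return sum(lams[:k]) / sum(lams)


def smallest_k(eigenvalues, target=0.99):
    """Smallest k whose retained-variance fraction is at least `target`."""
    lams = sorted(eigenvalues, reverse=True)
    total = sum(lams)
    for k, partial in enumerate(accumulate(lams), start=1):
        if partial >= target * total:
            return k
    # Guard against floating-point rounding: keeping everything always suffices.
    return len(lams)


if __name__ == "__main__":
    example = [7.29, 0.69]  # eigenvalues from the 2D example in the text
    print(variance_retained(example, 1))
    print(smallest_k(example, target=0.90))
    print(smallest_k(example, target=0.99))
```

With the two eigenvalues from the 2D example, keeping one component retains roughly 91% of the variance, so a 90% target is met with k=1 while a 99% target needs both components.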