主成分分析

Revision as of 15:58, 18 March 2013 (view source)

Kandeng (Talk | contribs)

← Older edit

Revision as of 16:19, 18 March 2013 (view source)

Kandeng (Talk | contribs)

Newer edit →

Line 64:

[[File:PCA-rotated.png|600px]]

-

这就是把训练数据集旋转到 <math>\textstyle u_1</math>，<math>\textstyle u_2</math> 基后的结果。一般而言，运算 <math>\textstyle U^Tx</math> 表示旋转到基 <math>\textstyle u_1</math>,<math>\textstyle u_2</math>, ...,<math>\textstyle u_n</math> 之上的训练数据。矩阵 <math>\textstyle U</math> 有正交性，即满足 <math>\textstyle U^TU = UU^T = I</math> ，所以若想将旋转后的向量 <math>\textstyle x_{\rm rot}</math> 还原为原始数据 <math>\textstyle x</math> ，将其左乘矩阵<math>\textstyle U</math>即可： <math>\textstyle x=~~Ux_~~{\rm rot}</math> , 验算一下：

+

这就是把训练数据集旋转到 <math>\textstyle u_1</math>，<math>\textstyle u_2</math> 基后的结果。一般而言，运算 <math>\textstyle U^Tx</math> 表示旋转到基 <math>\textstyle u_1</math>,<math>\textstyle u_2</math>, ...,<math>\textstyle u_n</math> 之上的训练数据。矩阵 <math>\textstyle U</math> 有正交性，即满足 <math>\textstyle U^TU = UU^T = I</math> ，所以若想将旋转后的向量 <math>\textstyle x_{\rm rot}</math> 还原为原始数据 <math>\textstyle x</math> ，将其左乘矩阵<math>\textstyle U</math>即可： <math>\textstyle x=U x_{\rm rot}</math> , 验算一下： <math>\textstyle U x_{\rm rot} = UU^T x = x</math>.

-

+

-

~~:<math>\begin{align}~~

+

-

~~x = U x_{\rm rot} ,~~

+

-

~~\end{align}</math>~~

+

-

~~because~~ <math>\textstyle U x_{\rm rot} = UU^T x = x</math>.

+

== 数据降维 ==

-

+

数据的主方向就是旋转数据的第一维 <math>\textstyle x_{{\rm rot},1}</math> 。因此，若想把这数据降到一维，可令：

-

+

-

~~【原文】：We see that the principal direction of variation of the data is the first~~

+

-

~~dimension~~ <math>\textstyle x_{{\rm rot},1}</math> ~~of this rotated data. Thus, if we want to~~

+

-

~~reduce this data to one dimension, we can set~~

+

-

+

-

~~【初译】：数据变化的主方向相应于旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，如若想把数据降到一维，我们仅需令：~~

+

-

+

-

~~【一审】：数据变化的主方向相应于旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，如若想把数据降到一维，我们仅需令：~~

+

-

+

-

~~【二审】：数据变化的主方向是旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，若想把这数据降到一维，需令：~~

+

:<math>\begin{align}

Line 88:

Line 73:

\end{align}</math>

+

更一般的，假如想把数据 <math>\textstyle x \in \Re^n</math> 降到 <math>\textstyle k</math> 维表示

+

<math>\textstyle \tilde{x} \in \Re^k</math> （令 <math>\textstyle k < n</math> ）,只需选取 <math>\textstyle x_{\rm rot}</math> 的前 <math>\textstyle k</math> 个成分，分别对应前 <math>\textstyle k</math> 个数据变化的主方向。

-

~~【原文】：More generally, if~~ <math>\textstyle x \~~in \Re^n~~</math> ~~and we want to reduce it to~~

+

PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math> 是一个 <math>\textstyle n</math> 维向量，其中前几个成分可能比较大（例如，上例中大部分样本第一个成分 <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> 的取值相对较大），而后面成分可能会比较小（例如，上例中大部分样本的 <math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> 较小）。

-

a <math>\textstyle k</math> ~~dimensional representation~~ <math>\textstyle \~~tilde~~{x} ~~\in \Re~~^~~k</math>~~ (~~where <math>\textstyle k < n</math>~~)~~, we would~~

+

-

~~take the first <math>\textstyle k~~</math> ~~components of~~ <math>\textstyle x_{\rm rot}~~</math>~~, ~~which correspond to~~

+

-

~~the top <math>\textstyle k~~</math> ~~directions of variation.~~

+

-

【初译】：一般情况下，假如想把<math>\textstyle x \in \Re^n</math>的数据降到<math>\textstyle k</math>维，即<math>\textstyle \tilde{x} \in \Re^k</math>（<math>\textstyle k < n</math>），需要选取<math>\textstyle x_{\rm rot}</math>~~的前<math>\textstyle k</math>个成分，其相应于变量的前<math>\textstyle k</math>个主方向。~~

+

PCA算法做的其实就是丢弃 <math>\textstyle x_{\rm rot}</math> 中后面（取值较小）的成分，就是将这些成分的值近似为零。具体的说，设 <math>\textstyle \tilde{x}</math> 是 <math>\textstyle x_{{\rm rot}}</math> 的近似表示，那么将 <math>\textstyle x_{{\rm rot}}</math> 除了前 <math>\textstyle k</math> 个成分外，其余全赋值为零，就得到：

-

+

-

~~【一审】：一般情况下，假如想把<math>\textstyle x \in \Re^n</math>的数据降到<math>\textstyle k</math>维，即~~<math>\textstyle \tilde{x} ~~\in \Re^k~~</math>~~（<math>\textstyle k < n</math>），需要选取~~<math>\textstyle x_{~~\rm rot}</math>的前<math>\textstyle k</math>个成分，其相应于变量的前<math>\textstyle k</math>个方向。~~

+

-

+

-

【二审】：更一般的情况下，假如想把数据<math>\textstyle x \in \Re^n</math>降到<math>\textstyle k</math>维表示<math>\textstyle \tilde{x} \in \Re^k</math>（令<math>\textstyle k < n</math>），需要选取<math>\textstyle x_{\rm rot}~~</math>的前<math>\textstyle k</math>个成分，对应方差的前<math>\textstyle k</math>个方向。~~

+

-

+

-

+

-

~~【原文】：Another way of explaining PCA is that <math>\textstyle x_{\rm rot}</math> is an <math>\textstyle n</math> dimensional~~

+

-

~~vector, where the first few components are likely to~~

+

-

~~be large (e.g., in our example, we saw that <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> takes~~

+

-

~~reasonably large values for most examples <math>\textstyle i</math>), and~~

+

-

~~the later components are likely to be small (e.g., in our example,~~

+

-

~~<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> was more likely to be small).~~

+

-

+

-

【初译】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维的向量，其中前面几维比较大（例如，上例中对于大部分的样本<math>\textstyle i</math>通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面几维可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出了值较小）。

+

-

+

-

【一审】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维的向量，其中前面几维比较大（例如，上例中对于大部分的样本<math>\textstyle i</math>通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面几维可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出了值较小）。

+

-

+

-

【二审】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维向量，其中前几维比较大（例如，上例中对于大部分样本<math>\textstyle i</math>，通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面的成分可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出的值较小）。

+

-

+

-

+

-

~~【原文】：What~~

+

-

~~PCA does it it~~

+

-

~~drops the the later (smaller) components of <math>\textstyle x_{\rm rot}</math>, and~~

+

-

~~just approximates them with 0's. Concretely, our definition of~~

+

-

~~<math>\textstyle \tilde{x~~}</math> ~~can also be arrived at by using an approximation to~~

+

-

<math>\textstyle x_{{\rm rot}}</math> ~~where~~

+

-

~~all but the first~~

+

-

<math>\textstyle k</math> ~~components are zeros. In other words, we have:~~

+

:<math>\begin{align}

Line 146:

Line 102:

\end{align}</math>

-

~~In our example, this gives us the following plot of~~ <math>\textstyle \tilde{x}</math> ~~(using <math>\textstyle n=2, k=1</math>):~~

+

在本例中，可得 <math>\textstyle \tilde{x}</math> 的点图如下（取 <math>\textstyle n=2, k=1</math> ）：

-

+

-

【初译】：而PCA算法就是丢弃<math>\textstyle x_{\rm rot}</math>中后面的（较小的）成分，把他们赋值为零。更直观的说就是，<math>\textstyle \tilde{x}</math>也可以用<math>\textstyle x_{{\rm rot}}</math>的除了前<math>\textstyle k</math>个成分，其余全赋值为零来表示。换句话说就是，我们可以运用如下算式：

+

-

~~:<math>\begin{align}~~

+

-

~~\tilde{x} =~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~0 \\~~

+

-

~~\vdots \\~~

+

-

~~0 \\~~

+

-

~~\end{bmatrix}~~

+

-

~~\approx~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~x_{{\rm rot},k+1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},n}~~

+

-

~~\end{bmatrix}~~

+

-

~~= x_{\rm rot}~~

+

-

~~\end{align}</math>~~

+

-

+

-

~~在我们的实验中，得到了如下的<math>\textstyle \tilde{x}</math>的图形（取<math>\textstyle n=2, k=1</math>）：~~

+

-

+

-

【一审】：而PCA算法就是丢弃<math>\textstyle x_{\rm rot}</math>中后面的（较小的）成分，把他们赋值为零。更直观地说就是，<math>\textstyle \tilde{x}</math>也可以用<math>\textstyle x_{{\rm rot}}</math>的除了前<math>\textstyle k</math>个成分，其余全赋值为零来表示。换句话说就是，我们可以运用如下算式：

+

-

~~:<math>\begin{align}~~

+

-

~~\tilde{x} =~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~0 \\~~

+

-

~~\vdots \\~~

+

-

~~0 \\~~

+

-

~~\end{bmatrix}~~

+

-

~~\approx~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~x_{{\rm rot},k+1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},n}~~

+

-

~~\end{bmatrix}~~

+

-

~~= x_{\rm rot}~~

+

-

~~\end{align}</math>~~

+

-

+

-

~~在我们的实验中，得到了如下的<math>\textstyle \tilde{x}</math>的图形（取<math>\textstyle n=2, k=1</math>）：~~

+

-

+

-

【二审】：PCA算法做的就是丢弃<math>\textstyle x_{\rm rot}</math>后面的（较小的）成分，将其取近似为零。具体的说，<math>\textstyle \tilde{x}</math>可以定义为，<math>\textstyle x_{{\rm rot}}</math>除了前<math>\textstyle k</math>个成分，其余全赋值为零的近似表示。换句话说：

+

-

~~:<math>\begin{align}~~

+

-

~~\tilde{x} =~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~0 \\~~

+

-

~~\vdots \\~~

+

-

~~0 \\~~

+

-

~~\end{bmatrix}~~

+

-

~~\approx~~

+

-

~~\begin{bmatrix}~~

+

-

~~x_{{\rm rot},1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},k} \\~~

+

-

~~x_{{\rm rot},k+1} \\~~

+

-

~~\vdots \\~~

+

-

~~x_{{\rm rot},n}~~

+

-

~~\end{bmatrix}~~

+

-

~~= x_{\rm rot}~~

+

-

~~\end{align}</math>~~

+

-

~~在我们的示例中，可得<math>\textstyle \tilde{x}</math>的图形如下（取~~<math>\textstyle n=2, k=1</math>）：

+

[[File:PCA-xtilde.png | 600px]]

+

然而，由于上面 <math>\textstyle \tilde{x}</math> 的后<math>\textstyle n-k</math>项均为零，没必要把这些零项保留下来。所以，我们仅用前 <math>\textstyle n-k</math> 个（非零）成分来定义 <math>\textstyle n-k</math> 维向量 <math>\textstyle \tilde{x}</math> 。

-

~~【原文】：However, since the final <math>\textstyle n-k</math> components of <math>\textstyle \tilde{x}</math> as defined above would~~

+

这也解释了我们为什么会以 <math>\textstyle u_1, u_2, \ldots, u_n</math> 为基来表示数据：要决定保留哪些成分变得很简单，只需取前 <math>\textstyle k</math> 个成分即可。这时也可以说，我们“保留了前 <math>\textstyle k</math> 个PCA（主）成分”。

-

~~always be zero, there is no need to keep these zeros around, and so we~~

+

-

~~define <math>\textstyle \tilde{x}</math> as a <math>\textstyle k</math>-dimensional vector with just the first <math>\textstyle k</math> (non-zero) components.~~

+

-

+

-

【初译】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项都归为了零，所以也就没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个成分来定义<math>\textstyle k</math>维向量的<math>\textstyle \tilde{x}</math>。

+

-

+

-

【一审】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项都归为了零，所以也就没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个成分来定义<math>\textstyle k</math>维向量的<math>\textstyle \tilde{x}</math>。

+

-

+

-

【二审】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项均为零，所以没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个（非零）成分来定义<math>\textstyle k</math>维向量<math>\textstyle \tilde{x}</math>。

+

-

+

-

+

-

~~【原文】：This also explains why we wanted to express our data in the~~ <math>\textstyle u_1, u_2, \ldots, u_n</math> ~~basis:~~

+

-

~~Deciding which components to keep becomes just keeping the top~~ <math>\textstyle k</math> ~~components. When we~~

+

-

~~do this, we also say that we are "retaining the top <math>\textstyle k</math> PCA (or principal) components."~~

+

-

+

-

【初译】：这也解释了为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示我们的数据：任务从决定哪些成分需要保留变成了只需简单选取<math>\textstyle k</math>个成分。这种运算我们也可以描述为“我们保留了<math>\textstyle k</math>个PCA（主）成分”。

+

-

+

-

【一审】：这也解释了为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示我们的数据：从决定哪些成分需要保留演变成了只需简单选取<math>\textstyle k</math>个关键成分。这种运算我们也可以描述为“我们保留了<math>\textstyle k</math>个PCA（主）成分”。

+

-

+

-

【二审】：这也解释了我们为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示数据：从决定哪些成分需要保留，变成只需简单选取前<math>\textstyle k</math>个成分。当我们这么做的时候，可以描述为我们“保留了前<math>\textstyle k</math>个PCA（主）成分”。

+

-

+

-

+

-

+

-

~~== Recovering an Approximation of the Data 数据还原 ==~~

+

-

+

-

+

-

~~【原文】：Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation~~

+

-

~~of the original <math>\textstyle x \in \Re^n</math>. Given <math>\textstyle \tilde{x}</math>, how can we recover an approximation <math>\textstyle \hat{x}</math> to~~

+

-

~~the original value of <math>\textstyle x</math>? From an [[#Rotating the Data|earlier section]], we know that <math>\textstyle x = U x_{\rm rot}</math>. Further,~~

+

-

~~we can think of <math>\textstyle \tilde{x}</math> as an approximation to <math>\textstyle x_{\rm rot}</math>, where we have~~

+

-

~~set the last <math>\textstyle n-k</math> components to zeros. Thus, given <math>\textstyle \tilde{x} \in \Re^k</math>, we can~~

+

-

~~pad it out with <math>\textstyle n-k</math> zeros to get our approximation to <math>\textstyle x_{\rm rot} \in \Re^n</math>. Finally, we pre-multiply~~

+

-

~~by <math>\textstyle U</math> to get our approximation to <math>\textstyle x</math>. Concretely, we get~~

+

-

+

-

【初译】：经过上一步骤，我们得到了对应原始数据<math>\textstyle x \in \Re^n</math>的低维“压缩”表征量<math>\textstyle \tilde{x} \in \Re^k</math>，反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何尽可能地还原原始数据<math>\textstyle \hat{x}</math>呢？由本章第三节可知<math>\textstyle x = U x_{\rm rot}</math>，且<math>\textstyle \tilde{x}</math>可以看作<math>\textstyle x_{\rm rot}</math>的近似值，因为<math>\textstyle \tilde{x}</math>只是将<math>\textstyle x_{\rm rot}</math>最后的<math>\textstyle n-k</math>个元素用0代替并省略而得到，因此如果给定<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>个0来得到<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，接着用<math>\textstyle U</math>左乘该<math>\textstyle x_{\rm rot}</math>近似值便可得到对原数据<math>\textstyle x</math>的近似还原。具体来说，我们需进行如下计算：

+

-

+

-

【一审】：现在，<math>\textstyle \tilde{x} \in \Re^k</math>成为<math>\textstyle x \in \Re^n</math>在低维空间的一个“压缩”表征。反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何尽可能地还原原始数据<math>\textstyle \hat{x}</math>呢？由本章第三节可知<math>\textstyle x = U x_{\rm rot}</math>，且<math>\textstyle \tilde{x}</math>可以看作<math>\textstyle x_{\rm rot}</math>的近似值，因为<math>\textstyle \tilde{x}</math>只是将<math>\textstyle x_{\rm rot}</math>最后的<math>\textstyle n-k</math>个元素用0代替并省略而得到，因此如果给定<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>个0来得到<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，接着用<math>\textstyle U</math>左乘该<math>\textstyle x_{\rm rot}</math>近似值便可得到对原数据<math>\textstyle x</math>的近似还原。具体来说，我们需进行如下计算：

+

-

~~【二审】：现在，我们得到了原始数据~~<math>\textstyle x \in \Re^n</math>的低维“压缩”表征量<math>\textstyle \tilde{x} \in \Re^k</math>，反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何还原原始数据<math>\textstyle ~~\hat{~~x}</math>~~呢？查看以往章节可知~~<math>\textstyle x = U x_{\rm rot}</math>~~，且可把~~<math>\textstyle \tilde{x}</math>看作将<math>\textstyle x_{\rm rot}</math>的最后<math>\textstyle n-k</math>~~个元素置0所得的近似值，因此如果给定~~<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>~~个0来得到~~<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，最后，左乘<math>\textstyle U</math>~~便可得原数据<math>\textstyle x</math>的近似。具体来说，计算如下：~~

+

== 还原近似数据 ==

+

现在，我们得到了原始数据 <math>\textstyle x \in \Re^n</math> 的低维“压缩”表征量 <math>\textstyle \tilde{x} \in \Re^k</math> ，反过来，如果给定 <math>\textstyle \tilde{x}</math> ，我们应如何还原原始数据 <math>\textstyle x</math> 呢？查看[[#Rotating the Data|以往章节]]以往章节可知，要转换回来，只需 <math>\textstyle x = U x_{\rm rot}</math> 即可。进一步，我们把 <math>\textstyle \tilde{x}</math> 看作将 <math>\textstyle x_{\rm rot}</math> 的最后 <math>\textstyle n-k</math> 个元素被置0所得的近似表示，因此如果给定 <math>\textstyle \tilde{x} \in \Re^k</math> ，可以通过在其末尾添加 <math>\textstyle n-k</math> 个0来得到对 <math>\textstyle x_{\rm rot} \in \Re^n</math> 的近似，最后，左乘 <math>\textstyle U</math> 便可近似还原出原数据。具体来说，计算如下：

:<math>\begin{align}

From Ufldl

Revision as of 16:19, 18 March 2013

Views

Personal tools

ufldl resources

wiki

Search

Toolbox

@@ Line 64: / Line 64: @@
 [[File:PCA-rotated.png|600px]]
-这就是把训练数据集旋转到 <math>\textstyle u_1</math>，<math>\textstyle u_2</math> 基后的结果。一般而言，运算 <math>\textstyle U^Tx</math> 表示旋转到基 <math>\textstyle u_1</math>,<math>\textstyle u_2</math>, ...,<math>\textstyle u_n</math> 之上的训练数据。矩阵 <math>\textstyle U</math> 有正交性，即满足  <math>\textstyle U^TU = UU^T = I</math> ，所以若想将旋转后的向量 <math>\textstyle x_{\rm rot}</math> 还原为原始数据 <math>\textstyle x</math> ，将其左乘矩阵<math>\textstyle U</math>即可： <math>\textstyle x=Ux_{\rm rot}</math> , 验算一下：
+这就是把训练数据集旋转到 <math>\textstyle u_1</math>，<math>\textstyle u_2</math> 基后的结果。一般而言，运算 <math>\textstyle U^Tx</math> 表示旋转到基 <math>\textstyle u_1</math>,<math>\textstyle u_2</math>, ...,<math>\textstyle u_n</math> 之上的训练数据。矩阵 <math>\textstyle U</math> 有正交性，即满足  <math>\textstyle U^TU = UU^T = I</math> ，所以若想将旋转后的向量 <math>\textstyle x_{\rm rot}</math> 还原为原始数据 <math>\textstyle x</math> ，将其左乘矩阵<math>\textstyle U</math>即可： <math>\textstyle x=U x_{\rm rot}</math> , 验算一下： <math>\textstyle U x_{\rm rot} =  UU^T x = x</math>.
-:<math>\begin{align}
-x = U x_{\rm rot}   ,
-\end{align}</math>
-because <math>\textstyle U x_{\rm rot} =  UU^T x = x</math>.
 == 数据降维 ==
+数据的主方向就是旋转数据的第一维 <math>\textstyle x_{{\rm rot},1}</math> 。因此，若想把这数据降到一维，可令：
-【原文】：We see that the principal direction of variation of the data is the first
-dimension <math>\textstyle x_{{\rm rot},1}</math> of this rotated data.  Thus, if we want to
-reduce this data to one dimension, we can set
-【初译】：数据变化的主方向相应于旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，如若想把数据降到一维，我们仅需令：
-【一审】：数据变化的主方向相应于旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，如若想把数据降到一维，我们仅需令：
-【二审】：数据变化的主方向是旋转数据的第一维<math>\textstyle x_{{\rm rot},1}</math>。因此，若想把这数据降到一维，需令：
 :<math>\begin{align}
@@ Line 88: / Line 73: @@
 \end{align}</math>
+更一般的，假如想把数据 <math>\textstyle x \in \Re^n</math> 降到 <math>\textstyle k</math> 维表示
+<math>\textstyle \tilde{x} \in \Re^k</math> （令 <math>\textstyle k < n</math> ）,只需选取 <math>\textstyle x_{\rm rot}</math> 的前 <math>\textstyle k</math> 个成分，分别对应前 <math>\textstyle k</math> 个数据变化的主方向。
-【原文】：More generally, if <math>\textstyle x \in \Re^n</math> and we want to reduce it to
+PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math> 是一个 <math>\textstyle n</math> 维向量，其中前几个成分可能比较大（例如，上例中大部分样本第一个成分 <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> 的取值相对较大），而后面成分可能会比较小（例如，上例中大部分样本的 <math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> 较小）。
-a <math>\textstyle k</math> dimensional representation <math>\textstyle \tilde{x} \in \Re^k</math> (where <math>\textstyle k < n</math>), we would
-take the first <math>\textstyle k</math> components of <math>\textstyle x_{\rm rot}</math>, which correspond to
-the top <math>\textstyle k</math> directions of variation.
-【初译】：一般情况下，假如想把<math>\textstyle x \in \Re^n</math>的数据降到<math>\textstyle k</math>维，即<math>\textstyle \tilde{x} \in \Re^k</math>（<math>\textstyle k < n</math>），需要选取<math>\textstyle x_{\rm rot}</math>的前<math>\textstyle k</math>个成分，其相应于变量的前<math>\textstyle k</math>个主方向。
+PCA算法做的其实就是丢弃 <math>\textstyle x_{\rm rot}</math> 中后面（取值较小）的成分，就是将这些成分的值近似为零。具体的说，设 <math>\textstyle \tilde{x}</math> 是 <math>\textstyle x_{{\rm rot}}</math> 的近似表示，那么将 <math>\textstyle x_{{\rm rot}}</math> 除了前 <math>\textstyle k</math> 个成分外，其余全赋值为零，就得到：
-【一审】：一般情况下，假如想把<math>\textstyle x \in \Re^n</math>的数据降到<math>\textstyle k</math>维，即<math>\textstyle \tilde{x} \in \Re^k</math>（<math>\textstyle k < n</math>），需要选取<math>\textstyle x_{\rm rot}</math>的前<math>\textstyle k</math>个成分，其相应于变量的前<math>\textstyle k</math>个方向。
-【二审】：更一般的情况下，假如想把数据<math>\textstyle x \in \Re^n</math>降到<math>\textstyle k</math>维表示<math>\textstyle \tilde{x} \in \Re^k</math>（令<math>\textstyle k < n</math>），需要选取<math>\textstyle x_{\rm rot}</math>的前<math>\textstyle k</math>个成分，对应方差的前<math>\textstyle k</math>个方向。
-【原文】：Another way of explaining PCA is that <math>\textstyle x_{\rm rot}</math> is an <math>\textstyle n</math> dimensional
-vector, where the first few components are likely to
-be large (e.g., in our example, we saw that <math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math> takes
-reasonably large values for most examples <math>\textstyle i</math>), and
-the later components are likely to be small (e.g., in our example,
-<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math> was more likely to be small).
-【初译】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维的向量，其中前面几维比较大（例如，上例中对于大部分的样本<math>\textstyle i</math>通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面几维可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出了值较小）。
-【一审】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维的向量，其中前面几维比较大（例如，上例中对于大部分的样本<math>\textstyle i</math>通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面几维可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出了值较小）。
-【二审】：PCA的另外一种解释是：<math>\textstyle x_{\rm rot}</math>是一个<math>\textstyle n</math>维向量，其中前几维比较大（例如，上例中对于大部分样本<math>\textstyle i</math>，通过<math>\textstyle x_{{\rm rot},1}^{(i)} = u_1^Tx^{(i)}</math>计算出了比较大的值），后面的成分可能会比较小（例如，上例中<math>\textstyle x_{{\rm rot},2}^{(i)} = u_2^Tx^{(i)}</math>计算出的值较小）。
-【原文】：What
-PCA does it it
-drops the the later (smaller) components of <math>\textstyle x_{\rm rot}</math>, and
-just approximates them with 0's.  Concretely, our definition of
-<math>\textstyle \tilde{x}</math> can also be arrived at by using an approximation to
-<math>\textstyle x_{{\rm rot}}</math> where
-all but the first
-<math>\textstyle k</math> components are zeros.  In other words, we have:
 :<math>\begin{align}
@@ Line 146: / Line 102: @@
 \end{align}</math>
-In our example, this gives us the following plot of <math>\textstyle \tilde{x}</math> (using <math>\textstyle n=2, k=1</math>):
+在本例中，可得 <math>\textstyle \tilde{x}</math> 的点图如下（取 <math>\textstyle n=2, k=1</math> ）：
-【初译】：而PCA算法就是丢弃<math>\textstyle x_{\rm rot}</math>中后面的（较小的）成分，把他们赋值为零。更直观的说就是，<math>\textstyle \tilde{x}</math>也可以用<math>\textstyle x_{{\rm rot}}</math>的除了前<math>\textstyle k</math>个成分，其余全赋值为零来表示。换句话说就是，我们可以运用如下算式：
-:<math>\begin{align}
-\tilde{x} =
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-\\
-\vdots \\
-\\
-\end{bmatrix}
-\approx
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-x_{{\rm rot},k+1} \\
-\vdots \\
-x_{{\rm rot},n}
-\end{bmatrix}
-= x_{\rm rot}
-\end{align}</math>
-在我们的实验中，得到了如下的<math>\textstyle \tilde{x}</math>的图形（取<math>\textstyle n=2, k=1</math>）：
-【一审】：而PCA算法就是丢弃<math>\textstyle x_{\rm rot}</math>中后面的（较小的）成分，把他们赋值为零。更直观地说就是，<math>\textstyle \tilde{x}</math>也可以用<math>\textstyle x_{{\rm rot}}</math>的除了前<math>\textstyle k</math>个成分，其余全赋值为零来表示。换句话说就是，我们可以运用如下算式：
-:<math>\begin{align}
-\tilde{x} =
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-\\
-\vdots \\
-\\
-\end{bmatrix}
-\approx
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-x_{{\rm rot},k+1} \\
-\vdots \\
-x_{{\rm rot},n}
-\end{bmatrix}
-= x_{\rm rot}
-\end{align}</math>
-在我们的实验中，得到了如下的<math>\textstyle \tilde{x}</math>的图形（取<math>\textstyle n=2, k=1</math>）：
-【二审】：PCA算法做的就是丢弃<math>\textstyle x_{\rm rot}</math>后面的（较小的）成分，将其取近似为零。具体的说，<math>\textstyle \tilde{x}</math>可以定义为，<math>\textstyle x_{{\rm rot}}</math>除了前<math>\textstyle k</math>个成分，其余全赋值为零的近似表示。换句话说：
-:<math>\begin{align}
-\tilde{x} =
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-\\
-\vdots \\
-\\
-\end{bmatrix}
-\approx
-\begin{bmatrix}
-x_{{\rm rot},1} \\
-\vdots \\
-x_{{\rm rot},k} \\
-x_{{\rm rot},k+1} \\
-\vdots \\
-x_{{\rm rot},n}
-\end{bmatrix}
-= x_{\rm rot}
-\end{align}</math>
-在我们的示例中，可得<math>\textstyle \tilde{x}</math>的图形如下（取<math>\textstyle n=2, k=1</math>）：
 [[File:PCA-xtilde.png | 600px]]
+然而，由于上面 <math>\textstyle \tilde{x}</math> 的后<math>\textstyle n-k</math>项均为零，没必要把这些零项保留下来。所以，我们仅用前 <math>\textstyle n-k</math> 个（非零）成分来定义 <math>\textstyle n-k</math> 维向量 <math>\textstyle \tilde{x}</math> 。
-【原文】：However, since the final <math>\textstyle n-k</math> components of <math>\textstyle \tilde{x}</math> as defined above would
+这也解释了我们为什么会以 <math>\textstyle u_1, u_2, \ldots, u_n</math> 为基来表示数据：要决定保留哪些成分变得很简单，只需取前 <math>\textstyle k</math> 个成分即可。这时也可以说，我们“保留了前 <math>\textstyle k</math> 个PCA（主）成分”。
-always be zero, there is no need to keep these zeros around, and so we
-define <math>\textstyle \tilde{x}</math> as a <math>\textstyle k</math>-dimensional vector with just the first <math>\textstyle k</math> (non-zero) components.
-【初译】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项都归为了零，所以也就没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个成分来定义<math>\textstyle k</math>维向量的<math>\textstyle \tilde{x}</math>。
-【一审】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项都归为了零，所以也就没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个成分来定义<math>\textstyle k</math>维向量的<math>\textstyle \tilde{x}</math>。
-【二审】：然而，由于上面<math>\textstyle \tilde{x}</math>的后<math>\textstyle n-k</math>项均为零，所以没必要把这些零项保留下来。因此，我们仅用前<math>\textstyle k</math>个（非零）成分来定义<math>\textstyle k</math>维向量<math>\textstyle \tilde{x}</math>。
-【原文】：This also explains why we wanted to express our data in the <math>\textstyle u_1, u_2, \ldots, u_n</math> basis:
-Deciding which components to keep becomes just keeping the top <math>\textstyle k</math> components.  When we
-do this, we also say that we are "retaining the top <math>\textstyle k</math> PCA (or principal) components."
-【初译】：这也解释了为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示我们的数据：任务从决定哪些成分需要保留变成了只需简单选取<math>\textstyle k</math>个成分。这种运算我们也可以描述为“我们保留了<math>\textstyle k</math>个PCA（主）成分”。
-【一审】：这也解释了为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示我们的数据：从决定哪些成分需要保留演变成了只需简单选取<math>\textstyle k</math>个关键成分。这种运算我们也可以描述为“我们保留了<math>\textstyle k</math>个PCA（主）成分”。
-【二审】：这也解释了我们为什么会以<math>\textstyle u_1, u_2, \ldots, u_n</math>为基来表示数据：从决定哪些成分需要保留，变成只需简单选取前<math>\textstyle k</math>个成分。当我们这么做的时候，可以描述为我们“保留了前<math>\textstyle k</math>个PCA（主）成分”。
-== Recovering an Approximation of the Data 数据还原 ==
-【原文】：Now, <math>\textstyle \tilde{x} \in \Re^k</math> is a lower-dimensional, "compressed" representation
-of the original <math>\textstyle x \in \Re^n</math>.  Given <math>\textstyle \tilde{x}</math>, how can we recover an approximation <math>\textstyle \hat{x}</math> to
-the original value of <math>\textstyle x</math>?  From an [[#Rotating the Data|earlier section]], we know that <math>\textstyle x = U x_{\rm rot}</math>.  Further,
-we can think of <math>\textstyle \tilde{x}</math> as an approximation to <math>\textstyle x_{\rm rot}</math>, where we have
-set the last <math>\textstyle n-k</math> components to zeros.  Thus, given <math>\textstyle \tilde{x} \in \Re^k</math>, we can
-pad it out with <math>\textstyle n-k</math> zeros to get our approximation to <math>\textstyle x_{\rm rot} \in \Re^n</math>.  Finally, we pre-multiply
-by <math>\textstyle U</math> to get our approximation to <math>\textstyle x</math>.  Concretely, we get
-【初译】：经过上一步骤，我们得到了对应原始数据<math>\textstyle x \in \Re^n</math>的低维“压缩”表征量<math>\textstyle \tilde{x} \in \Re^k</math>，反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何尽可能地还原原始数据<math>\textstyle \hat{x}</math>呢？由本章第三节可知<math>\textstyle x = U x_{\rm rot}</math>，且<math>\textstyle \tilde{x}</math>可以看作<math>\textstyle x_{\rm rot}</math>的近似值，因为<math>\textstyle \tilde{x}</math>只是将<math>\textstyle x_{\rm rot}</math>最后的<math>\textstyle n-k</math>个元素用0代替并省略而得到，因此如果给定<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>个0来得到<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，接着用<math>\textstyle U</math>左乘该<math>\textstyle x_{\rm rot}</math>近似值便可得到对原数据<math>\textstyle x</math>的近似还原。具体来说，我们需进行如下计算：
-【一审】：现在，<math>\textstyle \tilde{x} \in \Re^k</math>成为<math>\textstyle x \in \Re^n</math>在低维空间的一个“压缩”表征。 反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何尽可能地还原原始数据<math>\textstyle \hat{x}</math>呢？由本章第三节可知<math>\textstyle x = U x_{\rm rot}</math>，且<math>\textstyle \tilde{x}</math>可以看作<math>\textstyle x_{\rm rot}</math>的近似值，因为<math>\textstyle \tilde{x}</math>只是将<math>\textstyle x_{\rm rot}</math>最后的<math>\textstyle n-k</math>个元素用0代替并省略而得到，因此如果给定<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>个0来得到<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，接着用<math>\textstyle U</math>左乘该<math>\textstyle x_{\rm rot}</math>近似值便可得到对原数据<math>\textstyle x</math>的近似还原。具体来说，我们需进行如下计算：
-【二审】：现在，我们得到了原始数据<math>\textstyle x \in \Re^n</math>的低维“压缩”表征量<math>\textstyle \tilde{x} \in \Re^k</math>， 反过来，如果给定<math>\textstyle \tilde{x}</math>，我们应如何还原原始数据<math>\textstyle \hat{x}</math>呢？查看以往章节可知<math>\textstyle x = U x_{\rm rot}</math>，且可把<math>\textstyle \tilde{x}</math>看作将<math>\textstyle x_{\rm rot}</math>的最后<math>\textstyle n-k</math>个元素置0所得的近似值，因此如果给定<math>\textstyle \tilde{x} \in \Re^k</math>，可以通过在其末尾添加<math>\textstyle n-k</math>个0来得到<math>\textstyle x_{\rm rot} \in \Re^n</math>的近似，最后，左乘<math>\textstyle U</math>便可得原数据<math>\textstyle x</math>的近似。具体来说，计算如下：
+== 还原近似数据 ==
+现在，我们得到了原始数据 <math>\textstyle x \in \Re^n</math> 的低维“压缩”表征量 <math>\textstyle \tilde{x} \in \Re^k</math> ， 反过来，如果给定 <math>\textstyle \tilde{x}</math> ，我们应如何还原原始数据 <math>\textstyle x</math> 呢？查看[[#Rotating the Data|以往章节]]以往章节可知，要转换回来，只需 <math>\textstyle x = U x_{\rm rot}</math> 即可。进一步，我们把 <math>\textstyle \tilde{x}</math> 看作将 <math>\textstyle x_{\rm rot}</math> 的最后 <math>\textstyle n-k</math> 个元素被置0所得的近似表示，因此如果给定 <math>\textstyle \tilde{x} \in \Re^k</math> ，可以通过在其末尾添加 <math>\textstyle n-k</math> 个0来得到对 <math>\textstyle x_{\rm rot} \in \Re^n</math> 的近似，最后，左乘 <math>\textstyle U</math> 便可近似还原出原数据 。具体来说，计算如下：
 :<math>\begin{align}