Autoencoders and Sparsity

First-draft translation: Sina Weibo, @上篮高手要抓紧时间少上微博 http://www.weibo.com/gapbridger

First review: Sina Weibo, @达博西 http://www.weibo.com/mercivi

Second review: Sina Weibo, @大黄蜂的思索 http://weibo.com/u/1733291480

Wiki upload: Sina Weibo, @上篮高手要抓紧时间少上微博 http://www.weibo.com/gapbridger

【Original】

So far, we have described the application of neural networks to supervised learning, in which we have labeled
training examples.  Now suppose we have only a set of unlabeled training examples <math>\textstyle \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots\}</math>,
where <math>\textstyle x^{(i)} \in \Re^{n}</math>.  An
'''autoencoder''' neural network is an unsupervised learning algorithm that applies backpropagation,
setting the target values to be equal to the inputs.  I.e., it uses <math>\textstyle y^{(i)} = x^{(i)}</math>.

Here is an autoencoder:

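【First-draft translation】

So far, we have discussed the application of neural networks to supervised learning, in which the training examples carry class labels. Now suppose we have only a set of unlabeled training examples <math>\textstyle \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots\}</math>, where <math>\textstyle x^{(i)} \in \Re^{n}</math>. An autoencoder neural network is an unsupervised learning algorithm that uses backpropagation and sets the target values equal to the inputs, that is, <math>\textstyle y^{(i)} = x^{(i)}</math>. The figure below shows an example of an autoencoder neural network.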

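【First review】

So far, we have discussed the application of neural networks to supervised learning, in which the training examples carry class labels. Now suppose we have only a set of unlabeled training examples <math>\textstyle \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots\}</math>, where <math>\textstyle x^{(i)} \in \Re^{n}</math>. An autoencoder neural network is an unsupervised learning algorithm that uses backpropagation and sets the target values equal to the inputs, that is, <math>\textstyle y^{(i)} = x^{(i)}</math>. The figure below shows an example of an autoencoder neural network.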

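【Second review】

So far, we have discussed the application of neural networks to supervised learning, where every training example has a class label. Now suppose we have only a set of training examples without class labels <math>\textstyle \{x^{(1)}, x^{(2)}, x^{(3)}, \ldots\}</math>, where <math>\textstyle x^{(i)} \in \Re^{n}</math>. An autoencoder neural network is an unsupervised learning algorithm that uses backpropagation and makes the target values equal to the inputs, that is, <math>\textstyle y^{(i)} = x^{(i)}</math>. The figure below shows an example of an autoencoder neural network.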

[[Image:Autoencoder636.png|400px|center]]
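The idea of setting the targets equal to the inputs can be sketched in a few lines of NumPy. This is only an illustrative sketch: the function names (`init_autoencoder`, `forward`, `reconstruction_loss`), the tanh hidden units, the linear output units and the squared-error loss are assumptions made for this example, since the text above does not fix them.

```python
import numpy as np


def init_autoencoder(n_inputs, n_hidden, seed=0):
    """Create small random weights for an n_inputs -> n_hidden -> n_inputs network."""
    rng = np.random.default_rng(seed)
    W1 = 0.05 * rng.normal(size=(n_hidden, n_inputs))  # input -> hidden
    b1 = np.zeros(n_hidden)
    W2 = 0.05 * rng.normal(size=(n_inputs, n_hidden))  # hidden -> output
    b2 = np.zeros(n_inputs)
    return W1, b1, W2, b2


def forward(params, X):
    """Return hidden activations a^(2) and reconstructions h_{W,b}(x), one row per example."""
    W1, b1, W2, b2 = params
    A2 = np.tanh(X @ W1.T + b1)  # assumed tanh hidden units
    X_hat = A2 @ W2.T + b2       # assumed linear output units
    return A2, X_hat


def reconstruction_loss(params, X):
    """Average squared error with the targets set equal to the inputs."""
    Y = X  # y^(i) = x^(i): no labels are needed
    _, X_hat = forward(params, X)
    return 0.5 * np.mean(np.sum((X_hat - Y) ** 2, axis=1))


if __name__ == "__main__":
    X = np.random.default_rng(1).normal(size=(5, 8))  # five unlabeled examples, n = 8
    params = init_autoencoder(n_inputs=8, n_hidden=3)
    A2, X_hat = forward(params, X)
    print(A2.shape, X_hat.shape, reconstruction_loss(params, X))
```

The only place a "label" appears is the line `Y = X`. Everything else is an ordinary feedforward network.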

【Original】

The autoencoder tries to learn a function <math>\textstyle h_{W,b}(x) \approx x</math>.  In other
words, it is trying to learn an approximation to the identity function, so as
to output <math>\textstyle \hat{x}</math> that is similar to <math>\textstyle x</math>.  The identity function seems a
particularly trivial function to be trying to learn; but by placing constraints
on the network, such as by limiting the number of hidden units, we can discover
interesting structure about the data.  As a concrete example, suppose the
inputs <math>\textstyle x</math> are the pixel intensity values from a <math>\textstyle 10 \times 10</math> image (100
pixels) so <math>\textstyle n=100</math>, and there are <math>\textstyle s_2=50</math> hidden units in layer <math>\textstyle L_2</math>.  Note that
we also have <math>\textstyle y \in \Re^{100}</math>.  Since there are only 50 hidden units, the
network is forced to learn a ''compressed'' representation of the input.
I.e., given only the vector of hidden unit activations <math>\textstyle a^{(2)} \in \Re^{50}</math>,
it must try to '''reconstruct''' the 100-pixel input <math>\textstyle x</math>.  If the input were completely
random, say with each <math>\textstyle x_i</math> drawn IID from a Gaussian independent of the other
features, then this compression task would be very difficult.  But if there is
structure in the data, for example, if some of the input features are correlated,
then this algorithm will be able to discover some of those correlations. In fact,
this simple autoencoder often ends up learning a low-dimensional representation very similar
to the one found by PCA.
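The contrast between random and correlated inputs can be illustrated with PCA itself, which serves here as a stand-in for the best linear 50-dimensional code. The sketch below is a hedged illustration rather than part of the original tutorial: the helper names, the sample size of 500, the choice of 5 latent factors and the noise level 0.1 are all assumptions picked for the example.

```python
import numpy as np


def pca_residual_fraction(X, k):
    """Fraction of total variance lost when X is rebuilt from its top-k principal components."""
    Xc = X - X.mean(axis=0)                  # PCA works on centered data
    s = np.linalg.svd(Xc, compute_uv=False)  # singular values, largest first
    return float(np.sum(s[k:] ** 2) / np.sum(s ** 2))


def make_iid_inputs(m, n, rng):
    """m examples whose n features are independent standard Gaussians."""
    return rng.normal(size=(m, n))


def make_correlated_inputs(m, n, n_factors, noise, rng):
    """m examples whose n features are noisy linear mixtures of a few latent factors."""
    latent = rng.normal(size=(m, n_factors))
    mixing = rng.normal(size=(n_factors, n))
    return latent @ mixing + noise * rng.normal(size=(m, n))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_iid = make_iid_inputs(500, 100, rng)
    X_corr = make_correlated_inputs(500, 100, n_factors=5, noise=0.1, rng=rng)
    # Keep 50 of 100 dimensions, mirroring the 100 -> 50 -> 100 example in the text.
    print("IID inputs:       ", pca_residual_fraction(X_iid, 50))
    print("correlated inputs:", pca_residual_fraction(X_corr, 50))
```

With independent features, a 50-dimensional code has to discard a substantial share of the variance. With correlated features, almost nothing is lost, which is the structure the autoencoder is expected to pick up.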

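【First-draft translation】

An autoencoder neural network tries to learn a function <math>\textstyle h_{W,b}(x) \approx x</math>. In other words, it tries to approximate the identity function, so that the output <math>\textstyle \hat{x}</math> is close to the input <math>\textstyle x</math>. The identity function may look very easy to learn, but when we impose certain constraints on the autoencoder, such as limiting the number of hidden units, we can discover interesting structure in the input data. For example, suppose the input <math>\textstyle x</math> to an autoencoder is the pixel values of a <math>\textstyle 10 \times 10</math> image, so <math>\textstyle n=100</math>, and its hidden layer <math>\textstyle L_2</math> has <math>\textstyle s_2=50</math> hidden units. Note that the output is 100-dimensional, <math>\textstyle y \in \Re^{100}</math>. Because there are only 50 hidden units, we force the autoencoder to learn a '''compressed''' representation of the input, since it must '''reconstruct''' the 100-dimensional pixel input <math>\textstyle x</math> from the 50-dimensional vector of hidden unit activations <math>\textstyle a^{(2)} \in \Re^{50}</math>. If the input to the network were completely random, say with each input <math>\textstyle x_i</math> an IID Gaussian random variable independent of the other features, this compressed representation would be very hard to learn. But if the input data contains some particular structure, for example if some input features are correlated, the algorithm can discover those correlations. In fact, this simple autoencoder can often learn a low-dimensional representation of the input data that closely resembles the result of principal component analysis (PCA).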

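【First review】

The autoencoder algorithm aims to learn a function <math>\textstyle h_{W,b}(x) \approx x</math>. In other words, it seeks an approximation to the identity function, so that the output <math>\textstyle \hat{x}</math> is close to the input <math>\textstyle x</math>. Although learning the identity function seems trivial, by imposing constraints on the network, such as limiting the number of hidden-layer units, we can discover interesting structure in the input data. For example, suppose the input <math>\textstyle x</math> to an autoencoder is an image with a resolution of <math>\textstyle 10 \times 10</math> (100 pixels), so <math>\textstyle n=100</math>, and its hidden layer <math>\textstyle L_2</math> has 50 hidden units; note that <math>\textstyle y \in \Re^{100}</math> as well. Because there are only 50 hidden units, the network is forced to learn a '''compressed''' representation of the input. That is, given only the hidden-layer activation vector <math>\textstyle a^{(2)} \in \Re^{50}</math>, it must '''reconstruct''' the 100-pixel input <math>\textstyle x</math>. If the input were random, that is, with each <math>\textstyle x_i</math> an independent, identically distributed Gaussian, this compression task would be extremely difficult. But if the sample data has some internal correlation structure, for example if some of the input features are correlated, the algorithm can find those correlations. In fact, this basic autoencoder often learns a low-dimensional representation of the data that is much like the one produced by principal component analysis.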

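【Second review】

An autoencoder neural network tries to learn a function <math>\textstyle h_{W,b}(x) \approx x</math>. In other words, it tries to approximate the identity function, so that the output <math>\textstyle \hat{x}</math> is close to the input <math>\textstyle x</math>. The identity function may not seem worth learning, but when we impose certain constraints on the autoencoder, such as limiting the number of hidden units, we can discover interesting structure in the input data. For example, suppose the input <math>\textstyle x</math> to an autoencoder is the pixel intensity values of a <math>\textstyle 10 \times 10</math> image (100 pixels in total), so <math>\textstyle n=100</math>, and its hidden layer <math>\textstyle L_2</math> has 50 hidden units. Note that the output is also 100-dimensional, <math>\textstyle y \in \Re^{100}</math>. Because there are only 50 hidden units, we force the autoencoder to learn a '''compressed''' representation of the input; that is, it must '''reconstruct''' the 100-dimensional pixel intensity input <math>\textstyle x</math> from the 50-dimensional vector of hidden unit activations <math>\textstyle a^{(2)} \in \Re^{50}</math>. If the input to the network were completely random, say with each input <math>\textstyle x_i</math> an IID Gaussian random variable independent of the other features, this compressed representation would be very hard to learn. But if the input data contains some particular structure, for example if some input features are correlated with one another, the algorithm can discover those correlations. In fact, this simple autoencoder can often learn a low-dimensional representation of the input data that closely resembles the result of principal component analysis (PCA).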
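The 100-input, 50-hidden-unit example can also be tried end to end with a small backpropagation loop. What follows is a minimal sketch under stated assumptions, not the tutorial's reference implementation: the tanh hidden units, linear outputs, plain batch gradient descent, the learning rate, the step count, the synthetic data and every function name are choices made only for this illustration.

```python
import numpy as np


def standardize(X):
    """Give every feature zero mean and unit variance."""
    return (X - X.mean(axis=0)) / X.std(axis=0)


def train_autoencoder(X, n_hidden=50, lr=0.005, steps=800, seed=0):
    """Batch gradient descent on squared reconstruction error.

    Returns the relative reconstruction error measured before each update.
    """
    rng = np.random.default_rng(seed)
    m, n = X.shape
    W1 = 0.05 * rng.normal(size=(n_hidden, n))
    b1 = np.zeros(n_hidden)
    W2 = 0.05 * rng.normal(size=(n, n_hidden))
    b2 = np.zeros(n)
    total = np.sum(X ** 2)
    history = []
    for _ in range(steps):
        A2 = np.tanh(X @ W1.T + b1)        # hidden activations a^(2)
        X_hat = A2 @ W2.T + b2             # reconstruction; the target is X itself
        history.append(float(np.sum((X_hat - X) ** 2) / total))
        delta3 = (X_hat - X) / m           # output-layer error term
        delta2 = (delta3 @ W2) * (1.0 - A2 ** 2)  # backpropagated through tanh
        W2 -= lr * (delta3.T @ A2)
        b2 -= lr * delta3.sum(axis=0)
        W1 -= lr * (delta2.T @ X)
        b1 -= lr * delta2.sum(axis=0)
    return history


def compare_structured_vs_iid(m=400, n=100, seed=0):
    """Train on correlated inputs and on IID Gaussian inputs; return both final errors."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(m, 5))       # 5 hidden factors drive all 100 features
    X_corr = standardize(latent @ rng.normal(size=(5, n)) + 0.1 * rng.normal(size=(m, n)))
    X_iid = standardize(rng.normal(size=(m, n)))
    return train_autoencoder(X_corr)[-1], train_autoencoder(X_iid)[-1]


if __name__ == "__main__":
    err_corr, err_iid = compare_structured_vs_iid()
    print("relative error, correlated inputs:", err_corr)
    print("relative error, IID inputs:       ", err_iid)
```

Under these assumptions the correlated inputs should be reconstructed far better than the IID ones. A linear decoder fed by 50 hidden units cannot beat the 50-component PCA reconstruction, so the IID error has a floor, while the correlated data has no comparable limit.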