Sparse Coding: Autoencoder Interpretation

== Topographic sparse coding ==

With sparse coding, we can learn a set of features useful for representing the data. However, drawing inspiration from the brain, we would like to learn a set of features that are "orderly" in some manner. For instance, consider visual features. As suggested earlier, the V1 cortex of the brain contains neurons which detect edges at particular orientations. However, these neurons are also organized into hypercolumns in which adjacent neurons detect edges at similar orientations. One neuron could detect a horizontal edge, its neighbors detect edges oriented slightly off the horizontal, and, moving further along the hypercolumn, the neurons detect edges oriented further off the horizontal.
Inspired by this example, we would like to learn features which are similarly "topographically ordered". What does this imply for our learned features? Intuitively, if "adjacent" features are "similar", we would expect that if one feature is activated, its neighbors will also be activated to a lesser extent.

Concretely, suppose we (arbitrarily) organized our features into a square matrix. We would then like adjacent features in the matrix to be similar. The way this is accomplished is to group these adjacent features together in the smoothed L1 penalty, so that if we group in 3x3 regions, we use, say, <math>\sqrt{s_{1,1}^2 + s_{1,2}^2 + s_{1,3}^2 + s_{2,1}^2 + s_{2,2}^2 + s_{2,3}^2 + s_{3,1}^2 + s_{3,2}^2 + s_{3,3}^2 + \epsilon}</math> instead of, say, <math>\sqrt{s_{1,1}^2 + \epsilon}</math>. The grouping is usually overlapping, so that the 3x3 region starting at the 1st row and 1st column is one group, the 3x3 region starting at the 1st row and 2nd column is another group, and so on. Further, the grouping is also usually done wrapping around, as if the matrix were a torus, so that every feature is counted an equal number of times.

Replacing the smoothed L1 penalty with the sum of the smoothed L1 penalties over all groups, we obtain the new objective function:
:<math>
J(A, s) = \lVert As - x \rVert_2^2 + \lambda \sum_{\text{all groups } g}{\sqrt{ \left( \sum_{\text{all } s \in g}{s^2} \right) + \epsilon} } + \gamma \lVert A \rVert_2^2
</math>
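To make the overlapping, wraparound grouping concrete, below is a minimal sketch in Python/numpy (our own illustration, not part of the original tutorial; the function name, the 5x5 grid size, and the parameter defaults are arbitrary). It enumerates every 3x3 group over a square grid of feature activations, wrapping indices around as on a torus, and evaluates the grouped smoothed L1 penalty:

<source lang="python">
import numpy as np

def grouped_penalty(S, region=3, epsilon=1e-2):
    """Grouped smoothed-L1 penalty over a square grid of activations.

    S       -- (k, k) array: feature activations arranged in a square matrix
    region  -- side length of each group (3 gives the 3x3 regions above)
    epsilon -- smoothing constant inside the square root
    """
    k = S.shape[0]
    total = 0.0
    # One group starts at every (row, col) position, and indices wrap
    # around as on a torus, so each feature lands in region**2 groups.
    for r in range(k):
        for c in range(k):
            rows = [(r + dr) % k for dr in range(region)]
            cols = [(c + dc) % k for dc in range(region)]
            block = S[np.ix_(rows, cols)]
            total += np.sqrt(np.sum(block ** 2) + epsilon)
    return total

# A 5x5 grid of activations yields 25 overlapping 3x3 groups.
rng = np.random.default_rng(0)
print(grouped_penalty(rng.normal(size=(5, 5))))
</source>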
In practice, the "grouping" can be accomplished using a "grouping matrix" <math>V</math>, such that the <math>r</math>th row of <math>V</math> indicates which features are grouped in the <math>r</math>th group, so <math>V_{r, c} = 1</math> if group <math>r</math> contains feature <math>c</math>. Thinking of the grouping as being achieved by a grouping matrix makes the computation of the gradients more intuitive. Using this grouping matrix, the objective function can be rewritten as:
:<math>
J(A, s) = \lVert As - x \rVert_2^2 + \lambda \sum{ \sqrt{Vss^T + \epsilon} } + \gamma \lVert A \rVert_2^2
</math>

(Letting <math>D = \sqrt{Vss^T + \epsilon}</math>, <math>\sum{ \sqrt{Vss^T + \epsilon} }</math> is equivalent to <math>\sum_r{ \sum_c { D_{r, c} } }</math>.)
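As a sketch of how the grouping matrix might be built and used (again our own illustration; we assume the grouped sums of squares are computed as <math>V</math> applied to the elementwise square of <math>s</math>, with one example per column, which is the elementwise reading of the <math>Vss^T</math> term above):

<source lang="python">
import numpy as np

def grouping_matrix(k, region=3):
    """Binary grouping matrix V for overlapping region x region groups,
    with wraparound, over a k x k grid flattened to length k*k.
    V[r, c] = 1 iff group r contains feature c."""
    n = k * k
    V = np.zeros((n, n))  # one group per starting position: k*k groups
    for g in range(n):
        r, c = divmod(g, k)
        for dr in range(region):
            for dc in range(region):
                V[g, ((r + dr) % k) * k + ((c + dc) % k)] = 1
    return V

def objective(A, s, x, V, lam=0.1, gamma=0.01, epsilon=1e-2):
    """J(A, s) with one example per column of s and x."""
    recon = np.sum((A @ s - x) ** 2)       # ||As - x||^2
    D = np.sqrt(V @ (s * s) + epsilon)     # D = sqrt(V ss^T + eps)
    return recon + lam * np.sum(D) + gamma * np.sum(A ** 2)

# Sanity check: with 3x3 groups every column of V sums to 9, confirming
# that each feature is counted an equal number of times.
V = grouping_matrix(5)
assert np.all(V.sum(axis=0) == 9)
</source>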
This objective function can be optimized using the iterative method described in the earlier section. Topographic sparse coding will learn features similar to those learned by sparse coding, except that the features will now be "ordered" in some way.
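A hedged sketch of one such alternating scheme, reusing <code>grouping_matrix</code> from the previous sketch (the step size, iteration counts, and initialization below are illustrative choices, not the tutorial's): with <math>s</math> fixed, the reconstruction and weight-decay terms form a ridge problem in <math>A</math> with the analytic minimizer <math>A = xs^T(ss^T + \gamma I)^{-1}</math>; with <math>A</math> fixed, <math>s</math> is improved by gradient steps on the smoothed objective.

<source lang="python">
import numpy as np

def grad_s(A, s, x, V, lam=0.1, epsilon=1e-2):
    """Gradient of J(A, s) with respect to s, holding A fixed."""
    P = V @ (s * s) + epsilon              # grouped sums of squares
    return 2 * A.T @ (A @ s - x) + lam * s * (V.T @ (1.0 / np.sqrt(P)))

def alternating_minimization(x, k=5, n_outer=50, n_inner=10, lr=1e-3,
                             lam=0.1, gamma=0.01):
    """Alternate a closed-form update of A with gradient steps on s."""
    n = k * k
    V = grouping_matrix(k)                 # from the previous sketch
    rng = np.random.default_rng(0)
    s = rng.normal(size=(n, x.shape[1]))
    A = None
    for _ in range(n_outer):
        # With s fixed, J is ridge regression in A:
        #   A = x s^T (s s^T + gamma I)^{-1}
        A = x @ s.T @ np.linalg.inv(s @ s.T + gamma * np.eye(n))
        # With A fixed, descend the smoothed objective in s.
        for _ in range(n_inner):
            s -= lr * grad_s(A, s, x, V, lam)
    return A, s
</source>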
