Sparse Coding: Autoencoder Interpretation

From Ufldl

(Batching examples into mini-batches)
Line 299: Line 299:
[Original text]
-
=== Batching examples into mini-batches ===
+
=== Batching examples into mini-batches [Processing examples into "mini-batches"] ===
If you run the simple iterative algorithm on a large dataset of, say, 10 000 patches all at once, each iteration takes a long time, and so the algorithm may take a long time to converge. To increase the rate of convergence, you can run the algorithm on mini-batches instead. Rather than using all 10 000 patches in each iteration, select a mini-batch (a different random subset of, say, 2000 patches from the 10 000) and run the algorithm on that mini-batch for that iteration. This accomplishes two things: first, it speeds up each iteration, since each iteration now operates on 2000 rather than 10 000 patches; second, and more importantly, it increases the rate of convergence (TODO: explain why).
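The mini-batch selection described above can be sketched as follows. This is a minimal illustration, not the tutorial's own code: the function names `sample_minibatch_indices` and `run_on_minibatches`, the `update_step` callback (standing in for one iteration of the simple iterative algorithm), and the `seed` parameter are all assumptions made for this example. Each iteration draws a fresh random subset without replacement.

```python
import random


def sample_minibatch_indices(num_patches, batch_size, rng):
    """Pick batch_size distinct patch indices uniformly at random."""
    return rng.sample(range(num_patches), batch_size)


def run_on_minibatches(patches, batch_size, num_iterations, update_step, seed=0):
    """Call update_step once per iteration on a freshly drawn mini-batch.

    update_step is a hypothetical placeholder for one iteration of the
    simple iterative algorithm; it receives the list of selected patches.
    """
    rng = random.Random(seed)
    for _ in range(num_iterations):
        # A different random subset is drawn in every iteration.
        indices = sample_minibatch_indices(len(patches), batch_size, rng)
        minibatch = [patches[i] for i in indices]
        update_step(minibatch)


if __name__ == "__main__":
    # Toy stand-in for 10 000 patches; real patches would be pixel vectors.
    patches = [[float(i)] for i in range(10000)]
    batch_sizes = []
    run_on_minibatches(patches, 2000, 3, lambda batch: batch_sizes.append(len(batch)))
    print(batch_sizes)  # [2000, 2000, 2000]
```

Here each call to `update_step` sees 2000 patches instead of 10 000, which is the per-iteration saving the paragraph describes.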
Line 305: Line 305:
[Initial translation]
 +
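If the simple iterative algorithm described above is run on a large dataset (for example, 10 000 patches), each iteration takes a long time, so the algorithm converges very slowly. To speed up convergence, the algorithm can be run on mini-batches instead. In each iteration, a mini-batch replaces the full set of 10 000 patches, and the algorithm is run on that mini-batch. Here a mini-batch is 2000 patches drawn at random from the 10 000 patches. This has two benefits: first, each iteration is faster, because the algorithm now runs on 2000 patches rather than 10 000; more importantly, convergence is faster.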
[First review]
 +
 +
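If the simple iterative algorithm described above is run on a large dataset (for example, 10 000 patches), each iteration takes a long time, so the algorithm converges very slowly. To speed up convergence, the algorithm can be run on mini-batches instead. In each iteration, a mini-batch replaces the full set of 10 000 patches, and the algorithm is run on that mini-batch. Here a mini-batch is 2000 patches drawn at random from the 10 000 patches. This has two benefits: first, each iteration is faster, because the algorithm now runs on 2000 patches rather than 10 000; more importantly, convergence is faster.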
[Original text]

Revision as of 07:26, 8 March 2013
