Gradient checking and advanced optimization

Backpropagation is a notoriously difficult algorithm to debug and get right,
especially since many subtly buggy implementations of it (for example, one
that has an off-by-one error in the indices and that thus only trains some of
the layers of weights, or an implementation that omits the bias term) will
manage to learn something that can look surprisingly reasonable
(while performing less well than a correct implementation).  Thus, even with a
buggy implementation, it may not at all be apparent that anything is amiss.
In this section, we describe a method for numerically checking the derivatives
computed by your code to make sure that your implementation is correct.

Suppose we want to minimize <math>\textstyle J(\theta)</math> as a function of <math>\textstyle \theta</math>, and that we have
implemented some function <math>\textstyle g(\theta)</math> that purportedly computes the derivative
<math>\textstyle \frac{d}{d\theta}J(\theta)</math>.  Recalling the definition of the derivative, at any
specific value of <math>\textstyle \theta</math> we can numerically approximate it as:
<math>\begin{align}
\frac{J(\theta+{\rm EPSILON}) - J(\theta-{\rm EPSILON})}{2 \times {\rm EPSILON}}
\end{align}</math>
In practice, we set <math>{\rm EPSILON}</math> to a small constant, say around <math>\textstyle 10^{-4}</math>.
(There's a large range of values of <math>{\rm EPSILON}</math> that should work well, but
we don't set <math>{\rm EPSILON}</math> to be "extremely" small, say <math>\textstyle 10^{-20}</math>,
as that would lead to numerical roundoff errors.)
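For example, here is a minimal sketch of this check in Python (the objective <code>J</code> and its purported derivative <code>g</code> are illustrative stand-ins, not part of the tutorial):

<pre>
EPSILON = 1e-4  # the small constant suggested above

def J(theta):
    # Illustrative objective: J(theta) = theta^2 + 3*theta
    return theta ** 2 + 3 * theta

def g(theta):
    # Purported implementation of dJ/dtheta = 2*theta + 3
    return 2 * theta + 3

theta = 1.5
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(numeric, g(theta))  # the two values should agree to ~4+ significant digits
</pre>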
The same procedure generalizes to the case where <math>\textstyle \theta \in \Re^n</math> is a vector.
Suppose we have a function <math>\textstyle g_i(\theta)</math> that purportedly computes
<math>\textstyle \frac{\partial}{\partial \theta_i} J(\theta)</math>.  Let
<math>\textstyle \theta^{(i+)} = \theta + {\rm EPSILON} \times \vec{e}_i</math>, where
<math>\textstyle \vec{e}_i</math> is the <math>\textstyle i</math>-th basis vector (a vector of the
same dimension as <math>\textstyle \theta</math>, with a "1" in the <math>\textstyle i</math>-th position
and "0"s everywhere else).  So,
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
by <math>{\rm EPSILON}</math>.  Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the
corresponding vector with the <math>\textstyle i</math>-th element decreased by <math>{\rm EPSILON}</math>.
We can now numerically verify <math>\textstyle g_i(\theta)</math>'s correctness by checking, for each <math>\textstyle i</math>,
that:
<math>\begin{align}
g_i(\theta) \approx
\frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times {\rm EPSILON}}.
\end{align}</math>
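As an illustration, here is a minimal sketch of this elementwise check in Python with NumPy (the helper <code>numerical_gradient</code> and the example objective are our own names, assumed for illustration):

<pre>
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta):
    # Two-sided numerical approximation of the gradient of J at theta,
    # computed one coordinate at a time using the basis vectors e_i.
    numgrad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0
        theta_plus = theta + EPSILON * e_i    # theta^{(i+)}
        theta_minus = theta - EPSILON * e_i   # theta^{(i-)}
        numgrad[i] = (J(theta_plus) - J(theta_minus)) / (2 * EPSILON)
    return numgrad

# Example: J(theta) = ||theta||^2, whose gradient is 2*theta.
J = lambda t: np.sum(t ** 2)
g = lambda t: 2 * t

theta = np.random.randn(5)
numgrad = numerical_gradient(J, theta)
# Relative difference; should be very small (e.g. < 1e-8) for a correct g.
print(np.linalg.norm(numgrad - g(theta)) / np.linalg.norm(numgrad + g(theta)))
</pre>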
When implementing backpropagation to train a neural network, a correct
implementation will compute gradients satisfying
<math>\begin{align}
\nabla_{W^{(l)}} J(W,b) &= \left( \frac{1}{m} \Delta W^{(l)} \right) + \lambda W^{(l)}, \\
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \Delta b^{(l)}.
\end{align}</math>
This result shows that the final block of pseudo-code in [[Backpropagation Algorithm]] is indeed
implementing gradient descent.
To make sure your implementation of gradient descent is correct, it is
usually very helpful to use the method described above to numerically compute
the derivatives of <math>\textstyle J(W,b)</math>, and thereby verify that the quantities
computed by backpropagation are indeed the derivatives you want.

Finally, gradient descent is not the only algorithm for minimizing <math>\textstyle J(\theta)</math>.
Once you have implemented <math>\textstyle J(\theta)</math> and <math>\textstyle \nabla_\theta J(\theta)</math>,
more sophisticated algorithms can be used
to automatically search for a value of <math>\textstyle \theta</math> that minimizes <math>\textstyle J(\theta)</math>.  Algorithms
such as L-BFGS and conjugate gradient can often be much faster than gradient descent.
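For instance, once you have implemented <math>\textstyle J(\theta)</math> and <math>\textstyle \nabla_\theta J(\theta)</math>, an off-the-shelf optimizer can be handed both functions.  The sketch below uses SciPy's L-BFGS implementation; SciPy is just one of several libraries providing this, and the quadratic objective is a placeholder:

<pre>
import numpy as np
from scipy.optimize import minimize

def J(theta):
    # Placeholder objective: a quadratic bowl with minimum at all-ones.
    return 0.5 * np.sum((theta - 1.0) ** 2)

def grad_J(theta):
    # Analytic gradient of J; verify it with gradient checking first.
    return theta - 1.0

theta0 = np.zeros(10)  # initial parameters
result = minimize(J, theta0, jac=grad_J, method='L-BFGS-B')
print(result.x)  # converges to the minimizer (a vector of ones)
</pre>

Such routines typically require only <math>\textstyle J(\theta)</math> and <math>\textstyle \nabla_\theta J(\theta)</math>, and select step sizes internally.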
{{Sparse_Autoencoder}}
{{Languages|梯度检验与高级优化|中文}}
