Gradient checking and advanced optimization
Backpropagation is a notoriously difficult algorithm to debug and get right, especially since many subtly buggy implementations of it (for example, one that has an off-by-one error in the indices and that thus trains only some of the layers of weights, or one that omits the bias term) will manage to learn something that can look surprisingly reasonable, while performing less well than a correct implementation. Thus, even with a buggy implementation, it may not at all be apparent that anything is amiss. In this section, we describe a method for numerically checking the derivatives computed by your code to make sure that your implementation is correct.
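As an illustration (not part of the original text), the following Python/NumPy sketch shows one way to carry out such a numerical check, using the centered-difference approximation <math>\textstyle g_i(\theta) \approx \frac{J(\theta + \epsilon e_i) - J(\theta - \epsilon e_i)}{2 \epsilon}</math> for each coordinate; the helper name <tt>numerical_gradient</tt> and the toy cost function are hypothetical, chosen only for the example.

<pre>
import numpy as np

def numerical_gradient(J, theta, epsilon=1e-4):
    # Centered-difference approximation to the gradient of J at theta:
    # grad[i] ~ (J(theta + eps*e_i) - J(theta - eps*e_i)) / (2*eps)
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = epsilon
        grad[i] = (J(theta + e) - J(theta - e)) / (2.0 * epsilon)
    return grad

# Toy example: J(theta) = theta_0^2 + 3*theta_0*theta_1, whose exact
# gradient is (2*theta_0 + 3*theta_1, 3*theta_0).
J = lambda t: t[0] ** 2 + 3.0 * t[0] * t[1]

theta = np.array([4.0, 10.0])
analytic = np.array([2.0 * theta[0] + 3.0 * theta[1], 3.0 * theta[0]])
numeric = numerical_gradient(J, theta)

# If the analytic gradient is implemented correctly, the relative
# difference should be tiny (on the order of 1e-8 or smaller).
print(np.linalg.norm(numeric - analytic) / np.linalg.norm(numeric + analytic))
</pre>

In the same way, the analytic gradient can be the output of your backpropagation code evaluated at <math>\textstyle \theta</math>, with each parameter perturbed in turn.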
More sophisticated optimization algorithms than gradient descent are also available. These algorithms require only that you provide code that computes <math>\textstyle J(\theta)</math> and <math>\textstyle \nabla_\theta J(\theta)</math>; they then do their own internal tuning of the learning rate/step-size to automatically search for a value of <math>\textstyle \theta</math> that minimizes <math>\textstyle J(\theta)</math>. Algorithms such as L-BFGS and conjugate gradient can often be much faster than gradient descent.
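As a minimal sketch of this workflow (an assumption, not the tutorial's own code), here is how one might hand a cost function and its gradient to SciPy's L-BFGS implementation; the toy quadratic cost is hypothetical, standing in for a cost whose gradient comes from backpropagation.

<pre>
import numpy as np
from scipy.optimize import minimize

# Toy convex cost and its gradient; in practice these would be the
# network's cost function and the gradient computed by backpropagation.
def J(theta):
    return theta[0] ** 2 + 3.0 * theta[0] * theta[1] + 5.0 * theta[1] ** 2

def grad_J(theta):
    return np.array([2.0 * theta[0] + 3.0 * theta[1],
                     3.0 * theta[0] + 10.0 * theta[1]])

theta0 = np.array([4.0, 10.0])

# L-BFGS needs only J and its gradient; it tunes its own step size and
# builds its own curvature (Hessian) approximation internally.
result = minimize(J, theta0, jac=grad_J, method='L-BFGS-B')
print(result.x, result.fun)  # converges to theta = (0, 0)
</pre>

Before handing <tt>grad_J</tt> to such an optimizer, it is worth running the numerical check from the previous section against it, since the optimizer will silently follow whatever gradient it is given.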
{{Sparse_Autoencoder}}
{{Languages|梯度检验与高级优化|中文}}