Gradient checking and advanced optimization
From Ufldl
\frac{J(\theta+{\rm EPSILON}) - J(\theta-{\rm EPSILON})}{2 \times {\rm EPSILON}}
\end{align}</math>
In practice, we set <math>{\rm EPSILON}</math> to a small constant, say around <math>\textstyle 10^{-4}</math>.
(There's a large range of values of <math>{\rm EPSILON}</math> that should work well, but
we don't set <math>{\rm EPSILON}</math> to be "extremely" small, say <math>\textstyle 10^{-20}</math>,
as that would lead to numerical roundoff errors.)
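The centered-difference approximation above can be sketched in a few lines of Python (a minimal illustration, not the tutorial's own code; the function names here are made up for the example):

```python
import math

EPSILON = 1e-4  # small, but not so small that roundoff error dominates

def numerical_derivative(J, theta, eps=EPSILON):
    """Centered-difference approximation to dJ/dtheta at a scalar theta."""
    return (J(theta + eps) - J(theta - eps)) / (2 * eps)

# Example: J(theta) = theta^2 has true derivative 2*theta, so at theta = 3
# the approximation should come out very close to 6.
approx = numerical_derivative(lambda t: t * t, 3.0)
print(approx)
```

Note that making `eps` far smaller (say `1e-20`) would make the numerator a difference of nearly equal floating-point numbers, which is exactly the roundoff problem the text warns about.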
and "0"s everywhere else). So,
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
by <math>{\rm EPSILON}</math>. Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the
corresponding vector with the <math>\textstyle i</math>-th element decreased by <math>{\rm EPSILON}</math>.
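This per-coordinate perturbation scheme can be sketched as follows (a hedged Python/NumPy version of the idea; the tutorial itself works in MATLAB-style notation, and `numerical_gradient` is a name chosen for this example):

```python
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta, eps=EPSILON):
    """Approximate each partial derivative of J at the vector theta by
    perturbing one coordinate at a time along the basis vector e_i."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0
        theta_plus = theta + eps * e_i    # theta^{(i+)}
        theta_minus = theta - eps * e_i   # theta^{(i-)}
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * eps)
    return grad
```

For example, with <math>\textstyle J(\theta) = \sum_i \theta_i^2</math> the true gradient is <math>\textstyle 2\theta</math>, and the loop above recovers it to high accuracy at each coordinate.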
We can now numerically verify <math>\textstyle g_i(\theta)</math>'s correctness by checking, for each <math>\textstyle i</math>,
that:
\nabla_{b^{(l)}} J(W,b) &= \frac{1}{m} \Delta b^{(l)}.
\end{align}</math>
This result shows that the final block of pseudo-code in [[Backpropagation Algorithm]] is indeed
implementing gradient descent.
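One common way to apply this in practice is to compare the analytic gradient against the numerical one and report a relative difference. The sketch below is illustrative only (the helper name, test function, and tolerance are assumptions, not part of the tutorial):

```python
import numpy as np

def gradient_check(J, grad_J, theta, eps=1e-4):
    """Relative difference between the analytic gradient grad_J(theta)
    and the centered-difference numerical gradient at theta."""
    analytic = grad_J(theta)
    numeric = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0
        numeric[i] = (J(theta + eps * e_i) - J(theta - eps * e_i)) / (2 * eps)
    # Small values indicate the two gradients agree.
    return np.linalg.norm(analytic - numeric) / (
        np.linalg.norm(analytic) + np.linalg.norm(numeric))

# Illustrative check on J(theta) = 0.5 * ||theta||^2, whose gradient is theta.
diff = gradient_check(lambda t: 0.5 * float(np.dot(t, t)),
                      lambda t: t,
                      np.array([1.0, -2.0, 3.0]))
print(diff)  # very small when the analytic gradient is correct
```

A correct backpropagation implementation should likewise produce a tiny relative difference against its numerical gradient, while a buggy one typically produces a difference that is orders of magnitude larger.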
To make sure your implementation of gradient descent is correct, it is