Gradient checking and advanced optimization

Backpropagation is a notoriously difficult algorithm to debug and get right, especially since many subtly buggy implementations of it (for example, one that has an off-by-one error in the indices, and that thus trains only some of the layers of weights) will manage to learn something that can look surprisingly reasonable, while performing less well than a correct implementation. Thus, even with a buggy implementation, it may not at all be apparent that anything is amiss. In this section, we describe a method for numerically checking the derivatives computed by your code, to make sure that your implementation is correct.

Suppose we want to verify that a function <math>\textstyle g(\theta)</math> correctly computes the derivative <math>\textstyle \frac{d}{d\theta}J(\theta)</math>. We can compare <math>\textstyle g(\theta)</math> against the two-sided difference approximation <math>\textstyle \frac{J(\theta + {\rm EPSILON}) - J(\theta - {\rm EPSILON})}{2 \times {\rm EPSILON}}</math>; if the two values are close, the implementation is probably correct. In practice, we set {\rm EPSILON} to a small constant, say around <math>\textstyle 10^{-4}</math>. (There's a large range of values of {\rm EPSILON} that should work well, but we don't set {\rm EPSILON} to be "extremely" small, say <math>\textstyle 10^{-20}</math>, as that would lead to numerical roundoff errors.)
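
To make the scalar case concrete, here is a minimal sketch in Python (the tutorial itself does not prescribe a language); the objective <code>J</code> and its analytic derivative <code>grad_J</code> are hypothetical placeholders:

<source lang="python">
EPSILON = 1e-4  # the small constant discussed above

def J(theta):
    # Hypothetical scalar objective: J(theta) = theta^2 + 3*theta
    return theta ** 2 + 3 * theta

def grad_J(theta):
    # Analytic derivative we want to verify: dJ/dtheta = 2*theta + 3
    return 2 * theta + 3

theta = 1.5
# Two-sided difference approximation of the derivative at theta
numeric = (J(theta + EPSILON) - J(theta - EPSILON)) / (2 * EPSILON)
print(numeric, grad_J(theta))  # the two values should agree closely
</source>
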
Now, consider the case where <math>\textstyle \theta \in \Re^n</math> is a vector rather than a single real number (so that we have <math>\textstyle n</math> parameters that we want to learn), and <math>\textstyle J: \Re^n \mapsto \Re</math>. In our neural network example we used "<math>\textstyle J(W,b)</math>," but one can imagine "unrolling" the parameters <math>\textstyle W,b</math> into a long vector <math>\textstyle \theta</math>. We now generalize our derivative checking procedure to the case where <math>\textstyle \theta</math> may be a vector.
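
As an aside, the "unrolling" is just a matter of flattening and concatenating the parameter arrays. A minimal NumPy sketch, with hypothetical shapes for <math>\textstyle W</math> and <math>\textstyle b</math>:

<source lang="python">
import numpy as np

# Hypothetical single-layer parameters, as in the J(W, b) example
W = np.random.randn(5, 3)
b = np.random.randn(5)

# "Unroll" W and b into one long vector theta
theta = np.concatenate([W.ravel(), b.ravel()])

# Recover W and b from theta whenever J(theta) needs them
W_back = theta[:W.size].reshape(W.shape)
b_back = theta[W.size:]
assert np.array_equal(W, W_back) and np.array_equal(b, b_back)
</source>
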
To check the derivative with respect to the <math>\textstyle i</math>-th parameter, let <math>\textstyle \theta^{(i+)} = \theta + {\rm EPSILON} \times \vec{e}_i</math>, where
<math>\begin{align}
\vec{e}_i = \begin{bmatrix} 0 \\ 0 \\ \vdots \\ 1 \\ \vdots \\ 0 \end{bmatrix}
\end{align}</math>
is the <math>\textstyle i</math>-th basis vector (a
vector of the same dimension as <math>\textstyle \theta</math>, with a "1" in the <math>\textstyle i</math>-th position
and "0"s everywhere else).  So,
<math>\textstyle \theta^{(i+)}</math> is the same as <math>\textstyle \theta</math>, except its <math>\textstyle i</math>-th element has been incremented
by {\rm EPSILON}.  Similarly, let <math>\textstyle \theta^{(i-)} = \theta - {\rm EPSILON} \times \vec{e}_i</math> be the
corresponding vector with the <math>\textstyle i</math>-th element decreased by {\rm EPSILON}.  We can then check, for each <math>\textstyle i</math>, that the <math>\textstyle i</math>-th component of our computed gradient is close to <math>\textstyle \frac{J(\theta^{(i+)}) - J(\theta^{(i-)})}{2 \times {\rm EPSILON}}</math>.
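
Putting the pieces together, here is a minimal NumPy sketch of the vector-valued check; <code>numerical_gradient</code> and the example objective are hypothetical names, not part of the tutorial:

<source lang="python">
import numpy as np

EPSILON = 1e-4

def numerical_gradient(J, theta):
    # Estimate each partial derivative of J at theta by a two-sided difference
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e_i = np.zeros_like(theta)
        e_i[i] = 1.0                          # i-th basis vector
        theta_plus = theta + EPSILON * e_i    # theta^(i+)
        theta_minus = theta - EPSILON * e_i   # theta^(i-)
        grad[i] = (J(theta_plus) - J(theta_minus)) / (2 * EPSILON)
    return grad

# Hypothetical example: J(theta) = sum(theta_i^2), whose gradient is 2*theta
J = lambda t: np.sum(t ** 2)
theta = np.array([1.0, -2.0, 0.5])
print(numerical_gradient(J, theta))  # should be close to [2.0, -4.0, 1.0]
</source>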
