Backpropagation Algorithm

From Ufldl

as reflected in our definition for <math>J(W, b)</math>.  Applying weight decay
to the bias units usually makes only a small difference to the final network,
however.  If you've taken CS229 (Machine Learning) at Stanford or watched the course's videos
on YouTube, you may also recognize weight decay as
essentially a variant of the Bayesian regularization method you saw there,
where we placed a Gaussian prior on the parameters and did MAP (instead of
maximum likelihood) estimation.]
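As a concrete illustration of the point above, here is a minimal sketch (not from the original text) of a cost with a weight decay term: the penalty <math>\textstyle \frac{\lambda}{2} \sum W^2</math> is applied to the weights only, with the bias units deliberately left out. The function names and the squared-error data-fit term are illustrative assumptions.

```python
def squared_error(predictions, targets):
    # Plain sum-of-squares data-fit term (illustrative choice of cost).
    return 0.5 * sum((p - t) ** 2 for p, t in zip(predictions, targets))

def cost_with_decay(predictions, targets, weights, lam):
    # J(W, b) = data-fit term + (lambda / 2) * sum of squared weights.
    # Bias terms are deliberately excluded from the decay penalty,
    # matching the text: penalizing them makes little difference.
    decay = 0.5 * lam * sum(w ** 2 for w in weights)
    return squared_error(predictions, targets) + decay
```

With a perfect fit the data term vanishes, so the cost reduces to the decay penalty alone; the MAP view reads this penalty as the log of a Gaussian prior on the weights.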
 
The '''weight decay parameter''' <math>\lambda</math> controls the relative importance
the algorithm using matrix-vectorial notation.
We will use "<math>\textstyle \bullet</math>" to denote the element-wise product
operator (denoted ``<tt>.*</tt>'' in Matlab or Octave, and also called the Hadamard product), so
that if <math>\textstyle a = b \bullet c</math>, then <math>\textstyle a_i = b_ic_i</math>.
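The element-wise product above can be sketched directly from its definition <math>\textstyle a_i = b_ic_i</math>. This is a plain-Python illustration (the function name is ours, not from the text); in Matlab or Octave it is simply the built-in <tt>.*</tt> operator.

```python
def hadamard(b, c):
    # Element-wise (Hadamard) product: a_i = b_i * c_i for each index i.
    # Equivalent to "b .* c" in Matlab/Octave.
    return [bi * ci for bi, ci in zip(b, c)]
```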

Revision as of 20:59, 22 April 2011
