Backpropagation Algorithm

Revision as of 01:19, 22 April 2011

Revision as of 20:59, 22 April 2011

@@ Line 25: / Line 25: @@
 as reflected in our definition for <math>J(W, b)</math>.  Applying weight decay
 to the bias units usually makes only a small difference to the final network,
-however.  If you took CS229, you may also recognize weight decay this as
+however.  If you've taken CS229 (Machine Learning) at Stanford or watched the course's videos
+on YouTube, you may also recognize weight decay this as
 essentially a variant of the Bayesian regularization method you saw there,
 where we placed a Gaussian prior on the parameters and did MAP (instead of
 maximum likelihood) estimation.]
 The '''weight decay parameter''' <math>\lambda</math> controls the relative importance
@@ Line 115: / Line 115: @@
 the algorithm using matrix-vectorial notation.
 We will use "<math>\textstyle \bullet</math>" to denote the element-wise product
-operator (denoted ``{\tt .*}'' in Matlab or Octave, and also called the Hadamard product),
+operator (denoted ``<tt>.*</tt>'' in Matlab or Octave, and also called the Hadamard product),
 so
 that if <math>\textstyle a = b \bullet c</math>, then <math>\textstyle a_i = b_ic_i</math>.
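
As a small illustration of the element-wise product described in the hunk above, the following Python sketch computes a_i = b_i c_i for two equal-length vectors. The function name `hadamard` and the sample values are assumptions made for this example; they do not come from the tutorial, which refers only to the `.*` operator in Matlab or Octave.

```python
def hadamard(b, c):
    """Element-wise (Hadamard) product of two equal-length sequences.

    Hypothetical helper for illustration; mirrors what `b .* c` does for
    vectors of the same size in Matlab or Octave.
    """
    if len(b) != len(c):
        raise ValueError("operands must have the same length")
    # Multiply matching entries: a_i = b_i * c_i
    return [b_i * c_i for b_i, c_i in zip(b, c)]


# Sample vectors (made up for this example)
b = [1.0, 2.0, 3.0]
c = [4.0, 5.0, 6.0]
a = hadamard(b, c)
print(a)  # [4.0, 10.0, 18.0]
```

Unlike a dot product, the result is a vector of the same length as the inputs, not a scalar.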