# Backpropagation Algorithm

Revision as of 01:19, 22 April 2011 (Maiyifan) → Revision as of 20:59, 22 April 2011 (Ang)

Line 25:

as reflected in our definition for $J(W, b)$.  Applying weight decay
to the bias units usually makes only a small difference to the final network,
however.  If you've taken CS229 (Machine Learning) at Stanford or watched the
course's videos on YouTube, you may also recognize weight decay as
essentially a variant of the Bayesian regularization method you saw there,
where we placed a Gaussian prior on the parameters and did MAP (instead of
maximum likelihood) estimation.]

The '''weight decay parameter''' $\lambda$ controls the relative importance

Line 115:

the algorithm using matrix-vectorial notation.
We will use "$\textstyle \bullet$" to denote the element-wise product
operator (denoted ''.*'' in Matlab or Octave, and also called the Hadamard
product), so that if $\textstyle a = b \bullet c$, then
$\textstyle a_i = b_ic_i$.
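The weight decay penalty described in the first excerpt can be sketched in code. This is a minimal illustration, not the tutorial's implementation: the network shapes, the variable names `W1`, `W2`, and the value of `lam` (the weight decay parameter $\lambda$) are assumptions for the example.

```python
import numpy as np

# Hypothetical weights for a small two-layer network (shapes are
# illustrative only); bias vectors b1, b2 would exist alongside these.
rng = np.random.default_rng(0)
W1 = rng.standard_normal((3, 2))   # layer-1 weight matrix
W2 = rng.standard_normal((1, 3))   # layer-2 weight matrix
lam = 1e-3                         # assumed weight decay parameter lambda

# Weight decay term: (lambda/2) * sum of squared weights.
# Note the bias units are deliberately excluded, as the text above
# explains that decaying them makes only a small difference.
decay = (lam / 2.0) * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
print(decay)
```

The full cost $J(W, b)$ would add this `decay` term to the average squared-error term over the training examples; $\lambda$ then trades off the two.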
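The element-wise (Hadamard) product from the second excerpt is easy to demonstrate. The sketch below uses NumPy, where `*` on same-shaped arrays is element-wise, as a stand-in for Matlab/Octave's `.*`; the particular vectors are arbitrary.

```python
import numpy as np

b = np.array([1.0, 2.0, 3.0])
c = np.array([4.0, 5.0, 6.0])

# Element-wise (Hadamard) product: a_i = b_i * c_i,
# i.e. a = b .* c in Matlab/Octave notation.
a = b * c
print(a)   # [ 4. 10. 18.]
```

This is distinct from the matrix product `b @ c`, which would contract over an index rather than multiplying component-wise.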