# Backpropagation Algorithm

### From Ufldl

as reflected in our definition for <math>J(W, b)</math>. Applying weight decay to the bias units usually makes only a small difference to the final network, however. If you've taken CS229 (Machine Learning) at Stanford or watched the course's videos on YouTube, you may also recognize weight decay as essentially a variant of the Bayesian regularization method you saw there, where we placed a Gaussian prior on the parameters and did MAP (instead of maximum likelihood) estimation.]
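To spell out that correspondence (a sketch; <math>J_0(W,b)</math> here is our name for the unregularized data-fitting part of the cost, not notation used elsewhere on this page): placing an independent Gaussian prior <math>W^{(l)}_{ji} \sim \mathcal{N}(0, \sigma^2)</math> on each weight, MAP estimation maximizes the log posterior, which up to constants and the scaling of the likelihood term gives

<math>
\hat{W} = \arg\max_W \left[ \log p(\mathrm{data} \mid W) + \log p(W) \right]
        = \arg\min_W \left[ J_0(W,b) + \frac{1}{2\sigma^2} \sum_{l} \sum_{i} \sum_{j} \left( W^{(l)}_{ji} \right)^2 \right],
</math>

so the weight decay term appears as the negative log prior, with <math>\lambda</math> playing the role of <math>1/\sigma^2</math>.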

- | |||

The '''weight decay parameter''' <math>\lambda</math> controls the relative importance


the algorithm using matrix-vectorial notation. We will use "<math>\textstyle \bullet</math>" to denote the element-wise product operator (denoted ``<tt>.*</tt>'' in Matlab or Octave, and also called the Hadamard product), so that if <math>\textstyle a = b \bullet c</math>, then <math>\textstyle a_i = b_ic_i</math>.
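As a concrete illustration (here in Python with NumPy rather than Matlab/Octave; on NumPy arrays the <tt>*</tt> operator plays the same role as <tt>.*</tt>):

```python
import numpy as np

# Element-wise (Hadamard) product: a_i = b_i * c_i
b = np.array([1.0, 2.0, 3.0])
c = np.array([4.0, 5.0, 6.0])

a = b * c  # NumPy's * on arrays is element-wise, like Matlab's .*
print(a)   # [ 4. 10. 18.]
```

Note that this is distinct from the matrix product (<tt>*</tt> in Matlab, <tt>np.dot</tt> or <tt>@</tt> in NumPy); backpropagation uses both, so keeping the two operations straight matters when vectorizing the algorithm.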