Deriving gradients using the backpropagation idea


Introduction

In the section on the backpropagation algorithm, you were briefly introduced to backpropagation as a means of deriving gradients for learning in the sparse autoencoder. It turns out that, together with matrix calculus, this provides a powerful method and intuition for deriving gradients of more complex matrix functions (functions from matrices to the reals, or symbolically, from $\mathbb{R}^{r \times c} \rightarrow \mathbb{R}$).

First, recall the backpropagation idea, which we present in a modified form appropriate for our purposes below:

1. For $l = n_l, n_l-1, n_l-2, \ldots, 2$
For each node i in layer l, set
$\delta^{(l)}_i = \left( \sum_{j=1}^{s_{l+1}} W^{(l)}_{ji} \delta^{(l+1)}_j \right) \cdot \frac{\partial}{\partial z^{(l)}_i} f^{(l)} (z^{(l)}_i)$
2. Compute the desired partial derivatives,
\begin{align} \nabla_{W^{(l)}} J(W,b;x,y) &= \delta^{(l+1)} (a^{(l)})^T \end{align}

Quick notation recap:

• $n_l$ is the number of layers in the neural network
• $s_l$ is the number of neurons in the $l$th layer
• $W^{(l)}_{ji}$ is the weight from the $i$th unit in the $l$th layer to the $j$th unit in the $(l+1)$th layer
• $z^{(l)}_i$ is the input to the $i$th unit in the $l$th layer
• $a^{(l)}_i$ is the activation of the $i$th unit in the $l$th layer
• $A \bullet B$ is the Hadamard or element-wise product, which for $r \times c$ matrices $A$ and $B$ yields the $r \times c$ matrix $C = A \bullet B$ such that $C_{r, c} = A_{r, c} \cdot B_{r, c}$
• $f^{(l)}$ is the activation function for units in the $l$th layer

Notice that we don't consider an objective function in this case, and we allow each layer to have a different activation function $f^{(l)}$. This will be useful in allowing us to compute the gradients of functions of matrices.
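The recursion above can be sketched numerically. The following is a minimal illustration, not part of the tutorial: the network sizes, the random data, and the choice of activations ($f^{(2)} = \tanh$, $f^{(3)}$ the element-wise square, with the "objective" being the sum of the top-layer activations) are all assumptions made only to exercise the two formulas.

```python
import numpy as np

# Illustrative 3-layer network: a^(1) = x, z^(l+1) = W^(l) a^(l), a^(l) = f^(l)(z^(l)),
# with J = sum of the top-layer activations.  Sizes and data are arbitrary.
rng = np.random.default_rng(0)
x  = rng.standard_normal((4, 1))
W1 = rng.standard_normal((5, 4))
W2 = rng.standard_normal((3, 5))

f2  = np.tanh                       # layer-2 activation (assumed)
f2p = lambda z: 1 - np.tanh(z) ** 2
f3  = np.square                     # layer-3 activation (assumed)
f3p = lambda z: 2 * z

def grads(W1, W2, x):
    # Forward pass.
    a1 = x
    z2 = W1 @ a1; a2 = f2(z2)
    z3 = W2 @ a2; a3 = f3(z3)
    # Backward pass: delta^(l) = (sum_j W^(l)_ji delta^(l+1)_j) * f'^(l)(z^(l)_i).
    d3 = np.ones_like(z3) * f3p(z3)   # J = sum(a3), so dJ/da3 = 1
    d2 = (W2.T @ d3) * f2p(z2)
    # Desired partials: grad_{W^(l)} J = delta^(l+1) (a^(l))^T.
    return d2 @ a1.T, d3 @ a2.T

gW1, gW2 = grads(W1, W2, x)
```

A finite-difference check of `gW1` against $J = \sum f^{(3)}(W^{(2)} f^{(2)}(W^{(1)} x))$ confirms the recursion.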

The method

To compute the gradient with respect to some matrix X of a complicated function of matrices, it may be helpful to consider the function as a complicated multi-layer neural network, if possible. We will use two functions from the section on sparse coding to illustrate this.

Example 1: Objective for weight matrix in sparse coding

Recall the objective function for the weight matrix A, given the feature matrix s:

$J(A; s) = \lVert As - x \rVert_2^2 + \gamma \lVert A \rVert_2^2$

We would like to find the gradient of $J$ with respect to $A$, or in symbols, $\nabla_A J(A; s)$. Since the objective function is a sum of two terms in $A$, the gradient is the sum of the gradients of the individual terms. The gradient of the second term is simply $2 \gamma A$, so we will consider only the gradient of the first term.

The first term, $\lVert As - x \rVert_2^2$, can be seen as an instantiation of a neural network taking $s$ as an input, and proceeding in four steps, as described and illustrated in the paragraph and diagram below:

1. Apply A as the weights from the first layer to the second layer.
2. Subtract x from the activation of the second layer, which uses the identity activation function.
3. Pass this unchanged to the third layer, via identity weights. Use the square function as the activation function for the third layer.
4. Sum all the activations of the third layer.
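The four steps above can be traced in code. This is a sketch under assumed dimensions and random data; it runs the forward and backward passes of the little network and returns $\nabla_A \lVert As - x \rVert_2^2$, which the construction shows equals $2(As - x)s^T$.

```python
import numpy as np

# Illustrative sizes and data (assumptions, not from the tutorial).
rng = np.random.default_rng(1)
A = rng.standard_normal((6, 4))   # weights from layer 1 to layer 2
s = rng.standard_normal((4, 1))   # input feature vector
x = rng.standard_normal((6, 1))   # subtracted at the second layer

def grad_first_term(A, s, x):
    z2 = A @ s            # step 1: apply A as the layer-1 -> layer-2 weights
    a2 = z2 - x           # step 2: identity activation, then subtract x
    z3 = a2               # step 3: identity weights into layer 3
    # step 4: summing the squared layer-3 activations makes dJ/da3 = 1,
    # and f^(3)(z) = z^2 gives delta^(3) = 1 * f'^(3)(z^(3)) = 2 z^(3).
    d3 = 2 * z3
    d2 = d3               # identity weights and identity activation pass it back
    return d2 @ s.T       # grad_A = delta^(2) (a^(1))^T = 2(As - x) s^T

g = grad_first_term(A, s, x)
```

The result agrees with differentiating $\lVert As - x \rVert_2^2$ directly.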

Example 2: Smoothed topographic L1 sparsity penalty in sparse coding

Recall the smoothed topographic L1 sparsity penalty on s in sparse coding:

$\sum{ \sqrt{Vss^T + \epsilon} }$

We would like to find $\nabla_s \sum{ \sqrt{Vss^T + \epsilon} }$. As above, let's see this term as an instantiation of a neural network:
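While the diagram is not reproduced here, the end result of this construction can still be sanity-checked numerically. Assuming the square root and the sum are taken element-wise over the entries of the matrix $Vss^T + \epsilon$, the chain rule gives $\nabla_s = (V^T W + W^T V)\, s$ with $W = 1 / (2\sqrt{Vss^T + \epsilon})$ element-wise; this closed form is derived here for illustration, not quoted from the text, and the data below are arbitrary assumptions.

```python
import numpy as np

# Illustrative grouping matrix V and feature matrix s (assumptions).
# Entries are kept nonnegative so every element of V s s^T is positive.
rng = np.random.default_rng(2)
V = rng.random((3, 5))            # nonnegative grouping matrix
s = rng.random((5, 5)) + 0.1      # feature matrix
eps = 1e-2

def penalty(s_):
    # sum over all elements of sqrt(V s s^T + eps), element-wise
    return float(np.sum(np.sqrt(V @ s_ @ s_.T + eps)))

def grad_penalty(s_):
    # W_ij = 1 / (2 sqrt((V s s^T)_ij + eps)); grad = (V^T W + W^T V) s
    W = 1.0 / (2.0 * np.sqrt(V @ s_ @ s_.T + eps))
    return (V.T @ W + W.T @ V) @ s_

g = grad_penalty(s)
```

A finite-difference check against `penalty` confirms the expression on this data.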