Neural Network Vectorization

 
\end{align}
</math>
Here, <math>\bullet</math> denotes the element-wise product.  For simplicity, our description here will ignore the derivatives with respect to <math>b^{(l)}</math>, though your implementation of backpropagation will have to compute those derivatives too.  
Suppose we have already implemented the vectorized forward propagation method, so that the matrix-valued <tt>z2</tt>, <tt>a2</tt>, <tt>z3</tt> and <tt>h</tt> are computed as described above. We can then implement an ''unvectorized'' version of backpropagation as follows:
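
The listing below is only a sketch of what such a loop might look like; it assumes a squared-error cost, training inputs <tt>x</tt> and targets <tt>y</tt> stacked one example per column, gradient accumulators <tt>gradW1</tt> and <tt>gradW2</tt>, and a helper <tt>fprime</tt> that applies <math>f'(\cdot)</math> element-wise (these names and choices are illustrative rather than fixed by the text above).

<syntaxhighlight>
% Sketch of an unvectorized backpropagation pass over m training examples.
% Assumes a squared-error cost; x and y hold one training example per column.
gradW1 = zeros(size(W1));
gradW2 = zeros(size(W2));
for i=1:m,
  delta3 = -(y(:,i) - h(:,i)) .* fprime(z3(:,i));   % error term at the output layer
  delta2 = (W2' * delta3) .* fprime(z2(:,i));       % error term at the hidden layer
  gradW2 = gradW2 + delta3 * a2(:,i)';              % accumulate the gradient for W2
  gradW1 = gradW1 + delta2 * x(:,i)';               % accumulate the gradient for W1
end;
</syntaxhighlight>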
This implementation has a <tt>for</tt> loop.  We would like to come up with an implementation that simultaneously performs backpropagation on all the examples, and eliminates this <tt>for</tt> loop.
To do so, we will replace the vectors <tt>delta3</tt> and <tt>delta2</tt> with matrices, where one column of each matrix corresponds to each training example.  We will also implement a function <tt>fprime(z)</tt> that takes as input a matrix <tt>z</tt>, and applies <math>f'(\cdot)</math> element-wise.  Each of the four lines of Matlab in the <tt>for</tt> loop above can then be vectorized and replaced with a single line of Matlab code (without a surrounding <tt>for</tt> loop). 
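
For instance, if <math>f</math> were the sigmoid activation (an assumption made only for this sketch), <tt>fprime</tt> could be written so that it works on matrices directly:

<syntaxhighlight>
% Element-wise derivative of the activation, assuming f is the sigmoid.
% Since every operation is element-wise, z may be a vector or a matrix.
function fp = fprime(z)
  a = 1 ./ (1 + exp(-z));   % f(z), computed element-wise
  fp = a .* (1 - a);        % f'(z) = f(z) .* (1 - f(z)) for the sigmoid
end
</syntaxhighlight>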
In the [[Exercise:Vectorization|Vectorization exercise]], we ask you to derive the vectorized version of this algorithm by yourself.  If you are able to do it from this description, we strongly encourage you to do so.  Here also are some [[Backpropagation vectorization hints]]; however, we encourage you to try to carry out the vectorization yourself without looking at the hints.

== Sparse autoencoder ==

The [[Autoencoders_and_Sparsity|sparse autoencoder]] neural network has an additional sparsity penalty that constrains neurons' average firing rate to be close to some target activation <math>\rho</math>.  When performing backpropagation on a single training example, we took the sparsity penalty into account by computing the following:

:<math>\begin{align}
\delta^{(2)}_i =
  \left( \left( \sum_{j=1}^{s_{2}} W^{(2)}_{ji} \delta^{(3)}_j \right)
+ \beta \left( - \frac{\rho}{\hat\rho_i} + \frac{1-\rho}{1-\hat\rho_i} \right) \right) f'(z^{(2)}_i) .
\end{align}</math>

In the ''unvectorized'' case, this was computed as:

<syntaxhighlight>
% Sparsity Penalty Delta
sparsity_delta = - rho ./ rho_hat + (1 - rho) ./ (1 - rho_hat);
for i=1:m,
  ...
  delta2 = (W2'*delta3(:,i) + beta*sparsity_delta).* fprime(z2(:,i));
  ...
end;
</syntaxhighlight>

The code above still had a <tt>for</tt> loop over the training set, and <tt>delta2</tt> was a column vector.

In contrast, recall that in the vectorized case, <tt>delta2</tt> is a matrix with <math>m</math> columns, one per training example.  Notice also that the <tt>sparsity_delta</tt> term is the same regardless of which training example we are processing.  This suggests that the computation above can be vectorized by adding <tt>sparsity_delta</tt> (e.g., using <tt>repmat</tt>) to every column when constructing the <tt>delta2</tt> matrix.
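
For example (a sketch only, assuming the matrix-valued <tt>delta3</tt> and <tt>z2</tt> from above and the <tt>fprime</tt> helper described earlier), the vectorized update could be written as:

<syntaxhighlight>
% Vectorized sparse-autoencoder delta2: the same sparsity term is added to each of the m columns.
delta2 = (W2' * delta3 + beta * repmat(sparsity_delta, 1, m)) .* fprime(z2);
</syntaxhighlight>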

{{Vectorized Implementation}}

{{Languages|神经网络向量化|中文}}
