Logistic Regression Vectorization Example

Consider training a logistic regression classifier using batch gradient ascent, where our hypothesis is

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term.  We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient
ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math>
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.
[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class
CS229: Machine Learning.  For details, see either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or Lecture Notes #1 of http://cs229.stanford.edu/ .]
We thus need to compute the gradient:
<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}.
\end{align}</math>
Suppose that the Matlab/Octave variable <tt>x</tt> is a matrix containing the training inputs, so that
<tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>, and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.  (Here we differ from the
OpenClassroom/CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)
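For concreteness, here is a small hypothetical dataset laid out according to these conventions; the feature values and labels below are made up purely for illustration, with <math>\textstyle n=2</math> input features and <math>\textstyle m=4</math> examples:

<syntaxhighlight lang="matlab">
% Hypothetical toy data: each column of x is one example x^(i); the first row holds x_0 = 1.
x = [1.0  1.0  1.0  1.0 ;
     2.0  0.5  1.5  3.0 ;
     0.7  1.2  0.3  2.1];
y = [1 0 0 1];                  % row vector of labels, y(i) in {0,1}
theta = zeros(size(x,1), 1);    % theta in R^(n+1), initialized to zero
</syntaxhighlight>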
Here's a truly horrible, extremely slow, implementation of the gradient computation:
<syntaxhighlight lang="matlab">
% Implementation 1: explicit loops over every training example and every feature
grad = zeros(n+1,1);
for i=1:m,
  h = sigmoid(theta'*x(:,i));
  for j=1:n+1,
    grad(j) = grad(j) + (y(i) - h) * x(j,i);
  end;
end;
</syntaxhighlight>
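Vectorizing the inner loop over the features <math>\textstyle j</math> removes one of the two loops; a minimal sketch of this intermediate version, referred to as Implementation 2 below, is:

<syntaxhighlight lang="matlab">
% Implementation 2: a single loop over the examples; each per-example update is a vector operation
grad = zeros(n+1,1);
for i=1:m,
  grad = grad + (y(i) - sigmoid(theta'*x(:,i))) * x(:,i);
end;
</syntaxhighlight>

This still loops over all <math>\textstyle m</math> training examples. To eliminate that loop as well, recall how a matrix-vector product <tt>grad = A*b</tt> can be computed, for a matrix <tt>A</tt> <math>\in \Re^{(n+1)\times m}</math> and a column vector <tt>b</tt> <math>\in \Re^{m}</math>: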
<syntaxhighlight lang="matlab">
% Slow implementation of matrix-vector multiply
grad = zeros(n+1,1);
for i=1:m,
  grad = grad + b(i) * A(:,i);
end;

% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>
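The speed difference is easy to check empirically; here is a hypothetical timing comparison, with the matrix dimensions chosen arbitrarily for illustration:

<syntaxhighlight lang="matlab">
% Hypothetical timing comparison of the two matrix-vector multiply implementations.
A = randn(500, 10000);  b = randn(10000, 1);                             % arbitrary sizes
tic; g1 = zeros(500,1); for i=1:10000, g1 = g1 + b(i)*A(:,i); end; toc   % slow loop
tic; g2 = A*b; toc                                                       % fast built-in multiply
</syntaxhighlight>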
We recognize that Implementation 2 of our gradient calculation above is using the slow version with a for-loop, with
<tt>b(i)</tt> playing the role of <tt>(y(i) - sigmoid(theta'*x(:,i)))</tt>, and <tt>A</tt> playing the role of <tt>x</tt>.  We can derive a fast implementation as follows:
<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>
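To see why this one-line expression computes the desired gradient, note that <tt>x</tt> is an <math>\textstyle (n+1)\times m</math> matrix and <tt>(y - sigmoid(theta'*x))'</tt> is an <math>\textstyle m \times 1</math> column vector of residuals, so the product reproduces the sum over the examples:

<math>\begin{align}
x \left(y - \sigma(\theta^T x)\right)^T = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)})\right) x^{(i)} = \nabla_\theta \ell(\theta),
\end{align}</math>

where <math>\textstyle \sigma(\cdot)</math> denotes the sigmoid applied component-wise to the row vector <math>\textstyle \theta^T x \in \Re^{1\times m}</math>.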
Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result.  The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt>.
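If <tt>sigmoid</tt> is not already defined in your environment, a minimal definition consistent with the description above is:

<syntaxhighlight lang="matlab">
function g = sigmoid(z)
  % Save as sigmoid.m.  Applies the logistic function 1/(1+exp(-z)) component-wise;
  % works for scalars, vectors, and matrices.
  g = 1 ./ (1 + exp(-z));
end
</syntaxhighlight>

Putting the pieces together, one full run of vectorized batch gradient ascent might look like the sketch below; the learning rate <tt>alpha</tt> and the iteration count are illustrative values, not prescriptions from this page:

<syntaxhighlight lang="matlab">
alpha = 0.1;                               % learning rate (illustrative choice)
for iter = 1:100,                          % number of ascent steps (illustrative choice)
  grad  = x * (y - sigmoid(theta'*x))';    % Implementation 3: vectorized gradient
  theta = theta + alpha * grad;            % batch gradient ascent update
end;
</syntaxhighlight>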
Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought.  But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.

{{Vectorized Implementation}}

{{Languages|逻辑回归的向量化实现样例|中文}}
