Logistic Regression Vectorization Example

We consider training a logistic regression model using batch gradient ascent.  Suppose our hypothesis is
<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term.  We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient
ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math>
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.
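In code, this update amounts to a simple loop; here is a minimal sketch, assuming a step size <tt>alpha</tt>, an iteration count <tt>num_iters</tt>, and a hypothetical helper <tt>grad_l</tt> that returns the gradient computed below (none of these names are from the page):

<syntaxhighlight lang="matlab">
% Batch gradient ascent (sketch).  alpha, num_iters, and grad_l are
% illustrative names, not from this page.
for iter = 1:num_iters,
  grad  = grad_l(theta);         % gradient of the log likelihood at theta
  theta = theta + alpha * grad;  % theta := theta + alpha * grad
end;
</syntaxhighlight>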
[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class
CS229: Machine Learning.  For details, see either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or Lecture Notes #1 of http://cs229.stanford.edu/ .]
We thus need to compute the gradient:
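For logistic regression, this gradient is the familiar sum of the inputs weighted by the prediction errors:

<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m x^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right).
\end{align}</math>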
Suppose the Matlab/Octave variable <tt>x</tt> is a matrix holding the training inputs, one example per column, so that <tt>x(:,i)</tt> is <math>\textstyle x^{(i)}</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.  (Here we differ from the
OpenClassroom/CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)
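As a concrete illustration of this layout (the numbers below are arbitrary, not from the page), with <math>\textstyle n = 2</math> features plus the intercept and <math>\textstyle m = 3</math> examples:

<syntaxhighlight lang="matlab">
% Toy data illustrating the layout described above (arbitrary values).
x = [ 1.0   1.0   1.0 ;       % row of x_0 = 1 intercept entries
      0.5   1.2  -0.3 ;       % feature 1
      2.0  -0.7   0.4 ];      % feature 2   -> x is (n+1)-by-m
y = [ 1     0     1 ];        % labels      -> y is 1-by-m
theta = zeros(size(x,1), 1);  % parameters  -> theta is (n+1)-by-1
</syntaxhighlight>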
<syntaxhighlight lang="matlab">
end;
% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>
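Only the tail of the listing above appears in this excerpt; a minimal sketch of how the loop and the final matrix-vector multiply could fit together is given below.  The definitions of <tt>A</tt> and <tt>b</tt> here are assumptions inferred from the fragment, not the page's own code:

<syntaxhighlight lang="matlab">
% Sketch (assumed context for the fragment above): b collects the errors
% y(i) - h_theta(x^(i)), and A holds the training inputs, one per column.
b = zeros(m, 1);
for i = 1:m,
  b(i) = y(i) - sigmoid(theta' * x(:,i));
end;
A = x;    % columns of x are the training examples

% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>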
<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>
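To see why this one-liner computes the same quantity: <tt>theta'*x</tt> is a <math>\textstyle 1\times m</math> row vector whose <math>\textstyle i</math>-th entry is <math>\textstyle \theta^Tx^{(i)}</math>, so <tt>y - sigmoid(theta'*x)</tt> is the <math>\textstyle 1\times m</math> row vector of errors <math>\textstyle y^{(i)} - h_\theta(x^{(i)})</math>; transposing it and left-multiplying by <tt>x</tt> sums <math>\textstyle x^{(i)}(y^{(i)} - h_\theta(x^{(i)}))</math> over all <math>\textstyle m</math> examples.  A quick numerical check against an explicit loop (a sketch, reusing the variables above) is:

<syntaxhighlight lang="matlab">
% Sanity check: the vectorized gradient should match the loop version.
grad_loop = zeros(size(x,1), 1);
for i = 1:size(x,2),
  grad_loop = grad_loop + x(:,i) * (y(i) - sigmoid(theta'*x(:,i)));
end;
grad_vec = x * (y - sigmoid(theta'*x))';
disp(max(abs(grad_loop - grad_vec)));   % should print (roughly) zero
</syntaxhighlight>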
Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result.  The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt>.
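A minimal implementation matching that description (a sketch; the page's own definition is not shown in this excerpt) is:

<syntaxhighlight lang="matlab">
% Component-wise sigmoid: works for scalars, vectors, or matrices,
% returning an output the same size as z.
function h = sigmoid(z)
  h = 1 ./ (1 + exp(-z));
end
</syntaxhighlight>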
Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought.  But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.
{{Vectorized Implementation}}

{{Languages|逻辑回归的向量化实现样例|中文}}
