Logistic Regression Vectorization Example

We consider training a logistic regression model using batch gradient ascent.  Suppose our hypothesis is
<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term.  We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient
ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math>
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.
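In code, this update amounts to a simple loop; here is a minimal sketch, assuming a step size <tt>alpha</tt>, an iteration count <tt>num_iters</tt>, and a hypothetical helper <tt>grad_l</tt> that returns the gradient computed below (none of these names are from the page):

<syntaxhighlight lang="matlab">
% Batch gradient ascent (sketch).  alpha, num_iters, and grad_l are
% illustrative names, not from this page.
for iter = 1:num_iters,
  grad  = grad_l(theta);         % gradient of the log likelihood at theta
  theta = theta + alpha * grad;  % theta := theta + alpha * grad
end;
</syntaxhighlight>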
[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class
CS229: Machine Learning.  For details, see either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or Lecture Notes #1 of http://cs229.stanford.edu/ .]
We thus need to compute the gradient:
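For logistic regression, this gradient is the familiar sum of the inputs weighted by the prediction errors:

<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m x^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right).
\end{align}</math>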
Suppose the Matlab/Octave variable <tt>x</tt> is a matrix holding the training inputs, one example per column, so that <tt>x(:,i)</tt> is <math>\textstyle x^{(i)}</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>.  (Here we differ from the
OpenClassroom/CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)
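As a concrete illustration of this layout (the numbers below are arbitrary, not from the page), with <math>\textstyle n = 2</math> features plus the intercept and <math>\textstyle m = 3</math> examples:

<syntaxhighlight lang="matlab">
% Toy data illustrating the layout described above (arbitrary values).
x = [ 1.0   1.0   1.0 ;       % row of x_0 = 1 intercept entries
      0.5   1.2  -0.3 ;       % feature 1
      2.0  -0.7   0.4 ];      % feature 2   -> x is (n+1)-by-m
y = [ 1     0     1 ];        % labels      -> y is 1-by-m
theta = zeros(size(x,1), 1);  % parameters  -> theta is (n+1)-by-1
</syntaxhighlight>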
<syntaxhighlight lang="matlab">
end;
% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>
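Only the tail of the listing above appears in this excerpt; a minimal sketch of how the loop and the final matrix-vector multiply could fit together is given below.  The definitions of <tt>A</tt> and <tt>b</tt> here are assumptions inferred from the fragment, not the page's own code:

<syntaxhighlight lang="matlab">
% Sketch (assumed context for the fragment above): b collects the errors
% y(i) - h_theta(x^(i)), and A holds the training inputs, one per column.
b = zeros(m, 1);
for i = 1:m,
  b(i) = y(i) - sigmoid(theta' * x(:,i));
end;
A = x;    % columns of x are the training examples

% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>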
<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>
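To see why this one-liner computes the same quantity: <tt>theta'*x</tt> is a <math>\textstyle 1\times m</math> row vector whose <math>\textstyle i</math>-th entry is <math>\textstyle \theta^Tx^{(i)}</math>, so <tt>y - sigmoid(theta'*x)</tt> is the <math>\textstyle 1\times m</math> row vector of errors <math>\textstyle y^{(i)} - h_\theta(x^{(i)})</math>; transposing it and left-multiplying by <tt>x</tt> sums <math>\textstyle x^{(i)}(y^{(i)} - h_\theta(x^{(i)}))</math> over all <math>\textstyle m</math> examples.  A quick numerical check against an explicit loop (a sketch, reusing the variables above) is:

<syntaxhighlight lang="matlab">
% Sanity check: the vectorized gradient should match the loop version.
grad_loop = zeros(size(x,1), 1);
for i = 1:size(x,2),
  grad_loop = grad_loop + x(:,i) * (y(i) - sigmoid(theta'*x(:,i)));
end;
grad_vec = x * (y - sigmoid(theta'*x))';
disp(max(abs(grad_loop - grad_vec)));   % should print (roughly) zero
</syntaxhighlight>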
Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result.  The output of <tt>sigmoid(z)</tt> is therefore itself also a vector, of the same dimension as the input <tt>z</tt>.
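A minimal implementation matching that description (a sketch; the page's own definition is not shown in this excerpt) is:

<syntaxhighlight lang="matlab">
% Component-wise sigmoid: works for scalars, vectors, or matrices,
% returning an output the same size as z.
function h = sigmoid(z)
  h = 1 ./ (1 + exp(-z));
end
</syntaxhighlight>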
Coming up with vectorized implementations isn't always easy, and sometimes requires careful thought.  But as you gain familiarity with vectorized operations, you'll find that there are design patterns (i.e., a small number of ways of vectorizing) that apply to many different pieces of code.
{{Vectorized Implementation}}

{{Languages|逻辑回归的向量化实现样例|中文}}
