Logistic Regression Vectorization Example
We consider training a logistic regression classifier, whose hypothesis is

<math>\begin{align}
h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)},
\end{align}</math>
where (following the notational convention from the OpenClassroom videos and from CS229) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math>
and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set
<math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math>
is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative.
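For concreteness, here is a minimal Matlab/Octave sketch of evaluating this hypothesis on a single example; the variable names <tt>theta</tt> and <tt>x</tt> are illustrative column vectors in <math>\textstyle \Re^{n+1}</math>, with the first entry of <tt>x</tt> playing the role of the intercept term <math>\textstyle x_0 = 1</math>.

<syntaxhighlight lang="matlab">
% Hypothesis h_theta(x) for a single example (illustrative sketch).
% theta and x are (n+1)-dimensional column vectors; x(1) = 1 is the intercept entry.
h = 1 / (1 + exp(-theta' * x));
</syntaxhighlight>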
[Note: Most of the notation below follows that defined in the OpenClassroom videos or in the class
CS229: Machine Learning. For details, see either the [http://openclassroom.stanford.edu/MainFolder/CoursePage.php?course=MachineLearning OpenClassroom videos] or Lecture Notes #1 of http://cs229.stanford.edu/ .]
We thus need to compute the gradient:

<math>\begin{align}
\nabla_\theta \ell(\theta) = \sum_{i=1}^m x^{(i)} \left( y^{(i)} - h_\theta(x^{(i)}) \right).
\end{align}</math>
Suppose that the Matlab/Octave variable <tt>x</tt> is a matrix containing the training inputs, stored one example per column, so that <tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math>.
Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the
training set, so that the variable <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the
OpenClassroom/CS229 notation. Specifically, in the matrix-valued <tt>x</tt> we stack the training inputs in columns rather than in rows;
and <tt>y</tt><math>\in \Re^{1\times m}</math> is a row vector rather than a column vector.)
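For reference, here is an unvectorized sketch of computing this gradient under the data layout just described; the explicit loop and the inline sigmoid expression are written out purely for illustration (they are not the page's own listing), with <tt>n</tt>, <tt>m</tt>, <tt>theta</tt>, <tt>x</tt>, and <tt>y</tt> as above.

<syntaxhighlight lang="matlab">
% Unvectorized sketch: accumulate grad = sum_i x^{(i)} * (y^{(i)} - h_theta(x^{(i)})).
% Assumes x is (n+1) x m with examples in columns, y is 1 x m, theta is (n+1) x 1.
grad = zeros(n+1, 1);
for i = 1:m,
  h = 1 / (1 + exp(-theta' * x(:,i)));   % h_theta(x^{(i)}) for the i-th example
  grad = grad + (y(i) - h) * x(:,i);     % add this example's contribution
end;
</syntaxhighlight>

Looping over the training set one example at a time is correct but slow in Matlab/Octave; the point of vectorization is to express the same sum as a single matrix-vector multiply <tt>A*b</tt>, where in the listing below <tt>A</tt> can be taken to be the matrix of training inputs and <tt>b</tt> the column vector of residuals <math>\textstyle y^{(i)} - h_\theta(x^{(i)})</math>.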
<syntaxhighlight lang="matlab">
% Slow implementation of matrix-vector multiply (one way to write the loop)
grad = zeros(n+1,1);
for i = 1:m, grad = grad + A(:,i) * b(i); end;
% Fast implementation of matrix-vector multiply
grad = A*b;
</syntaxhighlight>
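Putting the pieces together, here is a self-contained vectorized sketch of the gradient computation under the conventions above; the <tt>sigmoid</tt> anonymous function and the direct use of <tt>x</tt> and <tt>y</tt> (rather than <tt>A</tt> and <tt>b</tt>) are illustrative choices, not the page's own code.

<syntaxhighlight lang="matlab">
% Fully vectorized sketch (illustrative): x is (n+1) x m with examples in columns,
% y is 1 x m, theta is (n+1) x 1.
sigmoid = @(z) 1 ./ (1 + exp(-z));   % elementwise logistic function
h = sigmoid(theta' * x);             % 1 x m row vector; h(i) = h_theta(x^{(i)})
grad = x * (y - h)';                 % (n+1) x 1; equals sum_i x^{(i)} * (y^{(i)} - h(i))
</syntaxhighlight>

A single batch gradient ascent step is then <tt>theta = theta + alpha * grad;</tt> for a chosen learning rate <tt>alpha</tt>.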