Logistic Regression Vectorization Example
Consider training a logistic regression model using batch gradient ascent. Suppose our hypothesis is

:<math>\begin{align} h_\theta(x) = \frac{1}{1+\exp(-\theta^Tx)}, \end{align}</math>

where (following CS229 notational convention) we let <math>\textstyle x_0=1</math>, so that <math>\textstyle x \in \Re^{n+1}</math> and <math>\textstyle \theta \in \Re^{n+1}</math>, and <math>\textstyle \theta_0</math> is our intercept term. We have a training set <math>\textstyle \{(x^{(1)}, y^{(1)}), \ldots, (x^{(m)}, y^{(m)})\}</math> of <math>\textstyle m</math> examples, and the batch gradient ascent update rule is <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>, where <math>\textstyle \ell(\theta)</math> is the log likelihood and <math>\textstyle \nabla_\theta \ell(\theta)</math> is its derivative. [Note: Most of the notation below follows that defined in the class CS229: Machine Learning. Please see Lecture Notes #1 from http://cs229.stanford.edu/ for details.]

We thus need to compute the gradient:

:<math>\begin{align} \nabla_\theta \ell(\theta) = \sum_{i=1}^m \left(y^{(i)} - h_\theta(x^{(i)}) \right) x^{(i)}. \end{align}</math>

Suppose that the Matlab/Octave variable <tt>x</tt> is the design matrix, so that <tt>x(:,i)</tt> is the <math>\textstyle i</math>-th training example <math>\textstyle x^{(i)}</math> and <tt>x(j,i)</tt> is <math>\textstyle x^{(i)}_j</math>. Further, suppose the Matlab/Octave variable <tt>y</tt> is a ''row'' vector of the labels in the training set, so that <tt>y(i)</tt> is <math>\textstyle y^{(i)} \in \{0,1\}</math>. (Here we differ from the CS229 notation, because in <tt>x</tt> we stack the training inputs in columns rather than in rows, and <tt>y</tt> <math>\in \Re^{1\times m}</math> is a row rather than a column vector.)

Here's a truly horrible, extremely slow implementation:

<syntaxhighlight lang="matlab">
% Implementation 1
grad = zeros(n+1,1);
for i=1:m,
  h = sigmoid(theta'*x(:,i));
  temp = y(i) - h;
  for j=1:n+1,
    grad(j) = grad(j) + temp * x(j,i);
  end;
end;
</syntaxhighlight>

The two nested for-loops make this very slow. Here's a more typical implementation that partially vectorizes the algorithm and gets better performance:

<syntaxhighlight lang="matlab">
% Implementation 2
grad = zeros(n+1,1);
for i=1:m,
  grad = grad + (y(i) - sigmoid(theta'*x(:,i))) * x(:,i);
end;
</syntaxhighlight>

However, it turns out to be possible to vectorize this even further. In Matlab/Octave, we can eliminate the remaining for-loop as well, and doing so speeds up the algorithm. In particular, we can write:

<syntaxhighlight lang="matlab">
% Implementation 3
grad = x * (y - sigmoid(theta'*x))';
</syntaxhighlight>

Here, we assume that the Matlab/Octave function <tt>sigmoid(z)</tt> takes as input a vector <tt>z</tt>, applies the sigmoid function component-wise to the input, and returns the result. The output of <tt>sigmoid(z)</tt> is therefore itself a vector of the same dimension as the input <tt>z</tt>. When the training set is large, this final implementation takes the greatest advantage of Matlab/Octave's highly optimized numerical linear algebra libraries to carry out the matrix-vector operations, and so it is far more efficient than the earlier implementations.
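As a quick sanity check, one can define <tt>sigmoid</tt> as a component-wise anonymous function, confirm on random data that the double-loop and fully vectorized gradients agree, and then run a few batch gradient ascent updates <math>\textstyle \theta := \theta + \alpha \nabla_\theta \ell(\theta)</math>. This is a minimal illustrative sketch: the sizes <tt>n</tt> and <tt>m</tt>, the random data, the number of iterations, and the learning rate <tt>alpha</tt> are all assumptions chosen for the example, not values from the tutorial.

<syntaxhighlight lang="matlab">
% Illustrative sketch with assumed sizes, data, and learning rate.
n = 5; m = 100;                       % n features plus intercept; m examples
x = [ones(1,m); rand(n,m)];           % design matrix; first row is x_0 = 1
y = double(rand(1,m) > 0.5);          % random labels in {0,1}, as a row vector
theta = zeros(n+1,1);
alpha = 0.1;                          % assumed learning rate

% Component-wise sigmoid, as assumed in the text.
sigmoid = @(z) 1 ./ (1 + exp(-z));

% Implementation 1 (double loop), for comparison.
grad1 = zeros(n+1,1);
for i=1:m,
  h = sigmoid(theta'*x(:,i));
  for j=1:n+1,
    grad1(j) = grad1(j) + (y(i) - h) * x(j,i);
  end;
end;

% Implementation 3 (fully vectorized).
grad3 = x * (y - sigmoid(theta'*x))';

disp(max(abs(grad1 - grad3)));        % should print (nearly) 0

% A few batch gradient ascent updates using the vectorized gradient.
for iter = 1:100,
  theta = theta + alpha * (x * (y - sigmoid(theta'*x))');
end;
</syntaxhighlight>

On large training sets, virtually all of the speedup comes from the vectorized update inside the loop: each iteration is a single matrix-vector product handled by the optimized linear algebra libraries, rather than <math>\textstyle m</math> separate per-example computations.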