Previously we learned how to predict continuous-valued quantities (e.g., housing prices) as a linear function of input values (e.g., the size of the house). Sometimes we will instead wish to predict a discrete variable, such as whether a grid of pixel intensities represents a "0" digit or a "1" digit. This is a classification problem. Logistic regression is a simple classification algorithm for learning to make such decisions.
In linear regression we tried to predict the value of $y^{(i)}$ for the $i$'th example $x^{(i)}$ using a linear function $y = h_\theta(x) = \theta^\top x$. This is clearly not a great solution for predicting binary-valued labels ($y^{(i)} \in \{0,1\}$). In logistic regression we use a different hypothesis class: we try to predict the probability that a given example belongs to the "1" class versus the probability that it belongs to the "0" class. Specifically, we will try to learn a function of the form:

$$P(y=1|x) = h_\theta(x) = \frac{1}{1 + \exp(-\theta^\top x)} \equiv \sigma(\theta^\top x),$$

$$P(y=0|x) = 1 - P(y=1|x) = 1 - h_\theta(x).$$
The function $\sigma(z) = \frac{1}{1 + \exp(-z)}$ is often called the "sigmoid" or "logistic" function. It is an S-shaped function that "squashes" the value of $\theta^\top x$ into the range $[0, 1]$ so that we may interpret $h_\theta(x)$ as a probability. Our goal is to search for a value of $\theta$ so that the probability $P(y=1|x) = h_\theta(x)$ is large when $x$ belongs to the "1" class and small when $x$ belongs to the "0" class (so that $P(y=0|x)$ is large). For a set of training examples with binary labels $\{ (x^{(i)}, y^{(i)}) : i = 1, \ldots, m \}$, the following cost function measures how well a given $h_\theta$ does this:

$$J(\theta) = -\sum_i \left( y^{(i)} \log(h_\theta(x^{(i)})) + (1 - y^{(i)}) \log(1 - h_\theta(x^{(i)})) \right).$$
Note that only one of the two terms in the summation is non-zero for each training example (depending on whether the label $y^{(i)}$ is 0 or 1). When $y^{(i)} = 1$, minimizing the cost function means we need to make $h_\theta(x^{(i)})$ large, and when $y^{(i)} = 0$ we want to make $1 - h_\theta(x^{(i)})$ large.
We now have a cost function that measures how well a given hypothesis $h_\theta$ fits our training data. We can learn to classify the training data by minimizing $J(\theta)$ to find the best choice of $\theta$. Once we have done so, we can classify a new test point as "1" or "0" by checking which of the two class labels is most probable: if $P(y=1|x) > P(y=0|x)$ we label the example as a "1", and as a "0" otherwise. This is the same as checking whether $h_\theta(x) > 0.5$.
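In code, this decision rule is a one-line threshold. Here is a minimal MATLAB sketch, assuming the convention used later in this exercise that each training example is a column of $X$; the toy data and variable values are illustrative:

```matlab
% Minimal sketch of the decision rule (toy data; in the exercise the
% examples are columns of X and theta is learned by minimizing J).
sigmoid = @(z) 1 ./ (1 + exp(-z));   % logistic function sigma(z)
X = randn(784, 5);                   % toy data: 5 examples as columns
theta = zeros(784, 1);               % parameter vector (toy values)
h = sigmoid(theta' * X);             % 1 x 5 row of P(y = 1 | x) values
ypred = h > 0.5;                     % label "1" exactly when h > 0.5
```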
To minimize $J(\theta)$ we can use the same tools as for linear regression. We need to provide a function that computes $J(\theta)$ and $\nabla_\theta J(\theta)$ for any requested choice of $\theta$. The derivative of $J(\theta)$ as given above with respect to $\theta_j$ is:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \sum_i x_j^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right).$$
Written in its vector form, the entire gradient can be expressed as:

$$\nabla_\theta J(\theta) = \sum_i x^{(i)} \left( h_\theta(x^{(i)}) - y^{(i)} \right).$$
This is essentially the same as the gradient for linear regression, except that now $h_\theta(x) = \sigma(\theta^\top x)$.
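Because this exercise stores examples as columns of $X$ with labels in a row vector $y$, the objective and gradient above can be computed in a few vectorized lines. The following is a sketch under those assumptions, not the starter code itself:

```matlab
% Toy setup; in the exercise X, y, and theta come from the starter code.
m = 5;                                          % number of examples
X = randn(785, m);                              % columns are examples (784 pixels + intercept row)
y = double(rand(1, m) > 0.5);                   % binary labels, 1 x m
theta = zeros(size(X, 1), 1);

sigmoid = @(z) 1 ./ (1 + exp(-z));
h = sigmoid(theta' * X);                        % 1 x m predicted probabilities
f = -(log(h) * y' + log(1 - h) * (1 - y)');     % objective J(theta)
g = X * (h - y)';                               % gradient, same shape as theta
```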
Exercise 1B
Starter code for this exercise is included in the Starter Code GitHub Repo in the ex1/ directory.
In this exercise you will implement the objective function and gradient computations for logistic regression and use your code to learn to classify images of digits from the MNIST dataset as either “0” or “1”. Some examples of these digits are shown below:
Each of the digits is represented by a 28x28 grid of pixel intensities, which we will reformat as a vector $x^{(i)}$ with $28 \cdot 28 = 784$ elements.
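For concreteness, the flattening of one digit could look like this in MATLAB (the loading code does this for you; the image variable here is a stand-in):

```matlab
img = rand(28, 28);   % stand-in for one 28x28 MNIST digit image
x = img(:);           % column-major flatten into a 784 x 1 vector
```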
You will find starter code for this exercise in the ex1/ex1b_logreg.m file. The starter code performs the following tasks for you:
- Calls ex1_load_mnist.m to load the MNIST training and testing data. In addition to loading the pixel values into a matrix $X$ (so that the $j$'th pixel of the $i$'th example is $X_{ji} = x^{(i)}_j$) and the labels into a row vector $y$, it will also perform some simple normalizations of the pixel intensities so that they tend to have zero mean and unit variance. Even though the MNIST dataset contains 10 different digits (0-9), in this exercise we will only load the 0 and 1 digits; the ex1_load_mnist function does this for you.
- Appends a row of 1's to the data so that $\theta_0$ will act as an intercept term.
- Calls minFunc with logistic_regression.m as the objective function. Your job will be to fill in logistic_regression.m to return the objective function value and its gradient. (A sketch of the minFunc call appears after this list.)
- After minFunc completes, prints the classification accuracy on the training set and the test set.
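For reference, the call into minFunc made by the starter code looks roughly like the following; the option values and the data field names (train.X, train.y) are illustrative rather than copied from ex1b_logreg.m:

```matlab
n = 784;                                   % pixels per image
theta0 = 0.001 * randn(n + 1, 1);          % small random init (+1 for intercept)
options = struct('MaxIter', 100);          % illustrative minFunc option
% train.X and train.y are loaded by ex1_load_mnist.m in the starter code.
theta = minFunc(@logistic_regression, theta0, options, train.X, train.y);
```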
As in the linear regression exercise, you will need to implement logistic_regression.m to loop over all of the training examples $x^{(i)}$ and compute the objective $J(\theta)$, storing the result in the variable f. You must also compute the gradient $\nabla_\theta J(\theta)$ and store it in the variable g. Once you have completed these tasks, you will be able to run the ex1b_logreg.m script to train the classifier and test it.
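Below is one possible loop-based sketch of logistic_regression.m under the conventions above (f for the objective value, g for the gradient); treat it as a guide, not the official solution:

```matlab
function [f, g] = logistic_regression(theta, X, y)
  % Objective and gradient for binary logistic regression.
  % X: n x m matrix, one training example per column; y: 1 x m binary labels.
  m = size(X, 2);
  f = 0;
  g = zeros(size(theta));
  for i = 1:m
    h = 1 / (1 + exp(-theta' * X(:, i)));              % P(y = 1 | x^(i))
    f = f - (y(i) * log(h) + (1 - y(i)) * log(1 - h)); % accumulate objective
    g = g + X(:, i) * (h - y(i));                      % accumulate gradient
  end
end
```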
If your code is functioning correctly, you should find that your classifier is able to achieve 100% accuracy on both the training and testing sets! It turns out that this is a relatively easy classification problem because 0 and 1 digits tend to look very different. In future exercises it will be much more difficult to get perfect results like this.