Exercise:Softmax Regression

== Softmax regression ==

In this problem set, you will use [[softmax regression]] to classify MNIST images. The goal of this exercise is to build a softmax classifier that you will be able to reuse in the future exercises and also on other classification problems that you might encounter.

In the file <tt>[http://ufldl.stanford.edu/wiki/resources/softmax_exercise.zip softmax_exercise.zip]</tt>, we have provided some starter code. You should write your code in the places indicated by "YOUR CODE HERE" in the files.

In the starter code, you will need to modify '''<tt>softmaxCost.m</tt>''' and '''<tt>softmaxPredict.m</tt>''' for this exercise.

We have also provided '''<tt>softmaxExercise.m</tt>''', which will help walk you through the steps in this exercise.

=== Dependencies ===

The following additional files are required for this exercise:
* [http://yann.lecun.com/exdb/mnist/ MNIST Dataset]
* [[Using the MNIST Dataset | Support functions for loading MNIST in Matlab ]]
* [http://ufldl.stanford.edu/wiki/resources/softmax_exercise.zip Starter Code (softmax_exercise.zip)]

You will also need:
* <tt>computeNumericalGradient.m</tt> from [[Exercise:Sparse Autoencoder]]

''If you have not completed the exercises listed above, we strongly suggest you complete them first.''

=== Step 0: Initialize constants and parameters ===

We've provided the code for this step in <tt>softmaxExercise.m</tt>.

Two constants, <tt>inputSize</tt> and <tt>numClasses</tt>, corresponding to the size of each input vector and the number of class labels, have been defined in the starter code. This will allow you to reuse your code on a different data set in a later exercise. We also initialize <tt>lambda</tt>, the weight decay parameter, here.
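
For reference, the initialization in <tt>softmaxExercise.m</tt> looks roughly like the following. This is just a sketch to show what the constants mean - the starter code already contains the actual values, and the <tt>lambda</tt> shown here is the value suggested later in this exercise.

  inputSize = 28 * 28; % each MNIST image is 28x28 pixels, so each input vector has 784 entries
  numClasses = 10;     % the digits 0-9, with the label 0 remapped to class 10
  lambda = 1e-4;       % weight decay parameter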
=== Step 1: Load data ===

The starter code loads the MNIST images and labels into <tt>inputData</tt> and <tt>labels</tt> respectively. The images are pre-processed to scale the pixel values to the range <math>[0, 1]</math>, and the label 0 is remapped to 10 for convenience of implementation, so that the labels take values in <math>\{1, 2, \ldots, 10\}</math>. You will not need to change any code in this step for this exercise, but note that your code should be general enough to operate on data of arbitrary size belonging to any number of classes.
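
If you are curious what this pre-processing amounts to, it is roughly the following (this is already done for you by the loading code; the snippet assumes the helper functions from [[Using the MNIST Dataset]] and the standard MNIST file names, which may differ on your machine):

  inputData = loadMNISTImages('train-images-idx3-ubyte'); % pixel values are scaled to [0, 1]
  labels = loadMNISTLabels('train-labels-idx1-ubyte');
  labels(labels == 0) = 10;                               % remap label 0 to 10, so labels lie in {1, ..., 10}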
=== Step 2: Implement softmaxCost ===

In <tt>softmaxCost.m</tt>, implement code to compute the softmax cost function <math>J(\theta)</math>. Remember to include the weight decay term in the cost as well. Your code should also compute the appropriate gradients, as well as the predictions for the input data (which will be used in the cross-validation step later).
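
As a reminder, following the [[softmax regression]] notes, the cost to compute is the weight-decayed negative log-likelihood (writing <math>m</math> for the number of examples and <math>k</math> for the number of classes, and summing the decay term over all entries of <math>\theta</math>):

<math>
\begin{align}
J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{ y^{(i)} = j \right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^{k} e^{\theta_l^T x^{(i)}}} \right] + \frac{\lambda}{2} \sum_{i,j} \theta_{ij}^2
\end{align}
</math>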

It is important to vectorize your code so that it runs quickly. We also provide several implementation tips below:

{{Quote|
Note: In the provided starter code, <tt>theta</tt> is a matrix where the ''j<sup>th</sup> row'' is <math>\theta_j^T</math>.
}}

'''Implementation Tip''': Computing the ground truth matrix - In your code, you may need to compute the ground truth matrix <tt>M</tt>, such that <tt>M(r, c)</tt> is 1 if <math>y^{(c)} = r</math> and 0 otherwise. This can be done quickly, without a loop, using the MATLAB functions <tt>sparse</tt> and <tt>full</tt>. Specifically, the command <tt>M = sparse(r, c, v)</tt> creates a sparse matrix such that <tt>M(r(i), c(i)) = v(i)</tt> for all i. That is, the vectors <tt>r</tt> and <tt>c</tt> give the positions of the elements whose values we wish to set, and <tt>v</tt> the corresponding values of those elements. Running <tt>full</tt> on a sparse matrix gives a "full" representation of the matrix for use (meaning that Matlab will no longer try to represent it as a sparse matrix in memory). The code for using <tt>sparse</tt> and <tt>full</tt> to compute the ground truth matrix has already been included in <tt>softmaxCost.m</tt>.
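
For concreteness, the relevant line is of roughly the following form (variable names here are illustrative - see <tt>softmaxCost.m</tt> for the exact version used by the starter code):

  % Ground truth matrix: groundTruth(r, c) = 1 exactly when labels(c) == r.
  % numCases is the number of training examples (columns of the data matrix).
  numCases = size(data, 2);
  groundTruth = full(sparse(labels, 1:numCases, 1));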

'''Implementation Tip''': Preventing overflows - in softmax regression, you will have to compute the hypothesis

<math>
\begin{align}
h(x^{(i)}) =
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix}
\end{align}
</math>

When the products <math>\theta_j^T x^{(i)}</math> are large, the exponentials <math>e^{\theta_j^T x^{(i)}}</math> can overflow, and the hypothesis can no longer be computed. However, observe that subtracting some large constant <math>\alpha</math> from every <math>\theta_j^T x^{(i)}</math> before exponentiating leaves the hypothesis unchanged:

<math>
\begin{align}
h(x^{(i)}) &=
\frac{1}{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix} \\
&=
\frac{ e^{-\alpha} }{ e^{-\alpha} \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} } \\
e^{ \theta_2^T x^{(i)} } \\
\vdots \\
e^{ \theta_k^T x^{(i)} } \\
\end{bmatrix} \\
&=
\frac{ 1 }{ \sum_{j=1}^{k}{e^{ \theta_j^T x^{(i)} - \alpha }} }
\begin{bmatrix}
e^{ \theta_1^T x^{(i)} - \alpha } \\
e^{ \theta_2^T x^{(i)} - \alpha } \\
\vdots \\
e^{ \theta_k^T x^{(i)} - \alpha } \\
\end{bmatrix}
\end{align}
</math>

Hence, to prevent overflows, simply subtract the largest of the <math>\theta_j^T x^{(i)}</math> terms from each example before computing the exponential. If <tt>M</tt> is the matrix such that <tt>M(r, c)</tt> is <math>\theta_r^T x^{(c)}</math>, this can be done efficiently with <tt>bsxfun</tt>:

  % M is the matrix as described in the text
  M = bsxfun(@minus, M, max(M, [], 1));

<tt>max(M, [], 1)</tt> yields a row vector with each element giving the maximum value in that column. <tt>bsxfun</tt> (short for binary singleton expansion function) applies minus along each row of <tt>M</tt>, hence subtracting the maximum of each column from every element in the column.

'''Implementation Tip:''' Computing the predictions - you may also find <tt>bsxfun</tt> useful in computing your predictions - if you have a matrix <tt>M</tt> containing the <math>e^{\theta_j^T x^{(i)}}</math> terms, such that <tt>M(r, c)</tt> contains the <math>e^{\theta_r^T x^{(c)}}</math> term, you can use the following code to compute the hypothesis (by dividing all elements in each column by their column sum):

  % M is the matrix as described in the text
  M = bsxfun(@rdivide, M, sum(M));

Use the following parameter when training your softmax classifier:
  lambda = 1e-4
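
If you want a picture of how training might be wired up, the following sketch minimizes the cost with <tt>minFunc</tt> (L-BFGS). The <tt>softmaxCost</tt> argument order shown here is an assumption - check <tt>softmaxExercise.m</tt> for the exact call used by the starter code:

  addpath minFunc/
  options = struct;
  options.maxIter = 100;
  % randomly initialize the parameters as a single vector, as minFunc expects
  theta = 0.005 * randn(numClasses * inputSize, 1);
  [optTheta, cost] = minFunc( @(p) softmaxCost(p, numClasses, inputSize, lambda, inputData, labels), theta, options);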
=== Step 5: Testing ===

Now that you've trained your model, you will test it against the MNIST test set, comprising 10000 28x28 images. However, to do so, you will first need to complete the function <tt>softmaxPredict</tt> in <tt>softmaxPredict.m</tt>, a function which generates predictions for input data under a trained softmax model.

Once that is done, you will be able to compute the accuracy (the proportion of correctly classified images) of your model using the code provided. Our implementation achieved an accuracy of '''92.6%'''. If your model's accuracy is significantly less (less than 91%), check your code, ensure that you are using the trained weights, and that you are training your model on the full 60000 training images. Conversely, if your accuracy is too high (99-100%), ensure that you have not accidentally trained your model on the test set as well.
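
As a hint, the core of the prediction step can be a single vectorized line. The sketch below assumes <tt>theta</tt> is the <tt>numClasses</tt>-by-<tt>inputSize</tt> parameter matrix and <tt>data</tt> has one example per column; the starter code may instead hand <tt>softmaxPredict</tt> a model structure containing the trained parameters, so adapt the names accordingly:

  % For each example (column of data), predict the class j that maximizes theta_j' * x.
  % Exponentiating and normalizing is unnecessary for the argmax, since exp is monotonic.
  [~, pred] = max(theta * data, [], 1);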
[[Category:Exercises]]

{{Softmax}}