Softmax Regression

Revision as of 19:09, 10 May 2011

(→Relationship to Logistic Regression)

Latest revision as of 13:24, 7 April 2013


@@ Line 73: / Line 73: @@
 For convenience, we will also write
 <math>\theta</math> to denote all the
-parameters of our model.  When you implement softmax regression, is is usually
+parameters of our model.  When you implement softmax regression, it is usually
 convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by
 stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that
@@ Line 202: / Line 202: @@
 regression's parameters are "redundant."  More formally, we say that our
 softmax model is '''overparameterized,''' meaning that for any hypothesis we might
-fit to the data, there're multiple parameter settings that give rise to exactly
+fit to the data, there are multiple parameter settings that give rise to exactly
 the same hypothesis function <math>h_\theta</math> mapping from inputs <math>x</math>
 to the predictions.
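
The overparameterization described in the hunk above can be checked numerically. The following is a minimal sketch, not the tutorial's own code: it stores <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> list of rows, as the Line 73 hunk suggests, and subtracts one fixed vector from every row. The helper name `softmax_probs`, the vector `psi`, all numeric values, and the placement of the intercept as the first entry of `x` are assumptions made for illustration.

```python
import math

def softmax_probs(theta, x):
    """Return the class probabilities for one input x.

    theta: list of k rows, each of length n+1 (one parameter vector per class).
    x: list of length n+1 (assumed here to carry the intercept term first).
    """
    scores = [sum(t * xi for t, xi in zip(row, x)) for row in theta]
    m = max(scores)  # subtracting the max score avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical values: k = 3 classes, n = 2 features plus an intercept.
theta = [
    [0.5, -1.0, 2.0],
    [1.5, 0.3, -0.7],
    [-0.2, 0.8, 0.1],
]
x = [1.0, 0.4, -1.2]

# Subtract the same arbitrary vector psi from every row of theta.
psi = [3.0, -2.0, 0.5]
theta_shifted = [[t - p for t, p in zip(row, psi)] for row in theta]

p_original = softmax_probs(theta, x)
p_shifted = softmax_probs(theta_shifted, x)

# Both parameter settings give the same hypothesis on this input.
same = all(math.isclose(a, b, rel_tol=1e-9) for a, b in zip(p_original, p_shifted))
print(same)
```

Shifting every row by the same vector adds the same constant to every class score, and that constant cancels in the normalization, which is why two different parameter matrices yield identical predictions.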
@@ Line 380: / Line 380: @@
 or three logistic regression classifiers?  (ii) Now suppose your classes are
 indoor_scene, black_and_white_image, and image_has_people.  Would you use softmax
-regression of multiple logistic regression classifiers?
+regression or multiple logistic regression classifiers?
 In the first case, the classes are mutually exclusive, so a softmax regression
 classifier would be appropriate.  In the second case, it would be more appropriate to build
 three separate logistic regression classifiers.
+{{Softmax}}
+{{Languages|Softmax回归|中文}}
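
The distinction drawn in the Line 380 hunk can be illustrated with a short sketch. Softmax outputs compete and sum to one, which suits mutually exclusive classes, whereas separate logistic classifiers each produce an independent probability, so several labels can be likely at once. The scores below are hypothetical values standing in for the outputs of trained classifiers; only the three label names come from the text.

```python
import math

def softmax(scores):
    """Normalize scores into probabilities that sum to one."""
    m = max(scores)  # subtracting the max score avoids overflow in exp
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def sigmoid(score):
    """Independent probability from a single logistic classifier."""
    return 1.0 / (1.0 + math.exp(-score))

labels = ["indoor_scene", "black_and_white_image", "image_has_people"]
scores = [2.0, 1.5, -0.5]  # hypothetical per-label scores for one image

softmax_out = softmax(scores)                # forced to share one unit of probability
logistic_out = [sigmoid(s) for s in scores]  # each label judged on its own

for name, p_soft, p_log in zip(labels, softmax_out, logistic_out):
    print(f"{name}: softmax={p_soft:.3f} logistic={p_log:.3f}")
```

With these scores, the independent logistic outputs rate both of the first two labels as more likely than not, which softmax cannot express because its outputs must sum to one.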