Softmax Regression
From Ufldl
For convenience, we will also write <math>\theta</math> to denote all the parameters of our model. When you implement softmax regression, it is usually convenient to represent <math>\theta</math> as a <math>k</math>-by-<math>(n+1)</math> matrix obtained by stacking up <math>\theta_1, \theta_2, \ldots, \theta_k</math> in rows, so that

<math>
\theta = \begin{bmatrix}
\mbox{---} \ \theta_1^T \ \mbox{---} \\
\mbox{---} \ \theta_2^T \ \mbox{---} \\
\vdots \\
\mbox{---} \ \theta_k^T \ \mbox{---}
\end{bmatrix}.
</math>
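This representation can be sketched as follows. A minimal example assuming numpy, with illustrative dimensions <math>k = 3</math> classes and <math>n = 4</math> features (the extra column of <math>\theta</math> multiplies the intercept term <math>x_0 = 1</math>):

```python
import numpy as np

k, n = 3, 4                                # illustrative dimensions
rng = np.random.default_rng(0)

# theta stacks theta_1, ..., theta_k as rows: a k-by-(n+1) matrix.
theta = rng.normal(size=(k, n + 1))

def h(theta, x):
    """Softmax hypothesis: the k class probabilities for input x."""
    x = np.concatenate(([1.0], x))         # prepend the intercept term x_0 = 1
    scores = theta @ x                     # theta_j^T x for each class j
    scores -= scores.max()                 # shift for numerical stability
    e = np.exp(scores)
    return e / e.sum()

x = rng.normal(size=n)
p = h(theta, x)
print(p)            # k nonnegative probabilities that sum to 1
```

Subtracting the maximum score before exponentiating does not change the result (softmax is invariant to a constant shift) but avoids overflow for large scores.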
regression's parameters are "redundant." More formally, we say that our softmax model is '''overparameterized''', meaning that for any hypothesis we might fit to the data, there are multiple parameter settings that give rise to exactly the same hypothesis function <math>h_\theta</math> mapping from inputs <math>x</math> to the predictions.
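The redundancy is easy to check numerically: subtracting the same fixed vector <math>\psi</math> from every <math>\theta_j</math> leaves all predicted probabilities unchanged, since <math>\psi^T x</math> cancels between the numerator and denominator of the softmax. A minimal sketch assuming numpy (dimensions and the hypothesis function `h` are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
k, n = 3, 4
theta = rng.normal(size=(k, n + 1))

def h(theta, x):
    """Softmax hypothesis: the k class probabilities for input x."""
    x = np.concatenate(([1.0], x))
    scores = theta @ x
    scores -= scores.max()
    e = np.exp(scores)
    return e / e.sum()

# Subtract the same vector psi from every row theta_j:
# exp((theta_j - psi)^T x) / sum_l exp((theta_l - psi)^T x)
#   = exp(theta_j^T x) / sum_l exp(theta_l^T x),
# so the hypothesis is unchanged.
psi = rng.normal(size=n + 1)
theta_shifted = theta - psi

x = rng.normal(size=n)
same = np.allclose(h(theta, x), h(theta_shifted, x))
print(same)   # True: two distinct parameter settings, one hypothesis
```

This is why the softmax cost function has no unique minimizer, and why one common remedy is to fix <math>\theta_k = 0</math> or to add a weight-decay term.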
classifier would be appropriate. In the second case, it would be more appropriate to build three separate logistic regression classifiers.
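The practical difference between the two designs can be sketched numerically: a single softmax classifier produces coupled probabilities that sum to 1 (mutually exclusive classes), while <math>k</math> independent logistic classifiers each produce their own probability, and the total need not be 1 (classes may overlap). A minimal illustration assuming numpy, with both models sharing the same illustrative score matrix:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(2)
k, n = 3, 4                                # illustrative dimensions
theta = rng.normal(size=(k, n + 1))
x = np.concatenate(([1.0], rng.normal(size=n)))
scores = theta @ x                         # one score per class

# Softmax: one classifier, probabilities coupled, summing to exactly 1 --
# appropriate when each example belongs to exactly one class.
softmax_p = np.exp(scores - scores.max())
softmax_p /= softmax_p.sum()

# k binary logistic classifiers: each probability is computed
# independently, so the total is generally not 1 -- appropriate when
# an example can belong to several classes at once.
binary_p = sigmoid(scores)

print(softmax_p.sum())   # 1.0
print(binary_p.sum())    # generally differs from 1.0
```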
+ | |||
+ | |||
+ | {{Softmax}} | ||
+ | |||
+ | |||
+ | {{Languages|Softmax回归|中文}} |