Softmax Regression

By minimizing <math>J(\theta)</math> with respect to the parameters <math>\theta</math>, we obtain a working implementation of softmax regression.

==Relationship to Logistic Regression==

'''Original''':

In the special case where <math>k = 2</math>, one can show that softmax regression reduces to logistic regression. This shows that softmax regression is a generalization of logistic regression. Concretely, when <math>k=2</math>, the softmax regression hypothesis outputs

<math>
\begin{align}
h_\theta(x) &=
\frac{1}{ e^{\theta_1^T x} + e^{ \theta_2^T x } }
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x }
\end{bmatrix}
\end{align}
</math>

'''Translation''':

When the number of classes is <math>k = 2</math>, softmax regression reduces to logistic regression. This shows that softmax regression is a generalization of logistic regression. Concretely, when <math>k = 2</math>, the softmax regression hypothesis function is:

<math>
\begin{align}
h_\theta(x) &=
\frac{1}{ e^{\theta_1^T x} + e^{ \theta_2^T x } }
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x }
\end{bmatrix}
\end{align}
</math>

'''First review''':

In the special case where the number of classes is <math>k = 2</math>, we can see that softmax regression reduces to logistic regression. This shows that softmax regression is a generalization of logistic regression. Concretely, when <math>k = 2</math>, the softmax regression hypothesis is:

<math>
\begin{align}
h_\theta(x) &=
\frac{1}{ e^{\theta_1^T x} + e^{ \theta_2^T x } }
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x }
\end{bmatrix}
\end{align}
</math>

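To make the two-class hypothesis above concrete, here is a minimal NumPy sketch (not part of the original tutorial; the values of theta1, theta2, and x are made up for illustration):

<pre>
import numpy as np

# Hypothetical parameter vectors and input; values are for illustration only.
theta1 = np.array([0.5, -1.0, 2.0])   # parameters for class 1
theta2 = np.array([1.5,  0.3, -0.7])  # parameters for class 2
x      = np.array([1.0,  2.0,  0.5])  # a single input vector

# Two-class softmax hypothesis: exponentiate both scores and normalize.
scores = np.array([theta1 @ x, theta2 @ x])
h = np.exp(scores) / np.exp(scores).sum()

print(h)        # the two class probabilities
print(h.sum())  # sums to 1
</pre>
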
'''Original''':

Taking advantage of the fact that this hypothesis is overparameterized and setting <math>\psi = \theta_1</math>, we can subtract <math>\theta_1</math> from each of the two parameters, giving us

<math>
\begin{align}
h(x) &=
\frac{1}{ e^{\vec{0}^T x} + e^{ (\theta_2-\theta_1)^T x } }
\begin{bmatrix}
e^{ \vec{0}^T x } \\
e^{ (\theta_2-\theta_1)^T x }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
\frac{e^{ (\theta_2-\theta_1)^T x }}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
1 - \frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix}
\end{align}
</math>

'''Translation''':

Taking advantage of the parameter redundancy of softmax regression, we set <math>\psi = \theta_1</math> and then subtract <math>\theta_1</math> from each of the two parameter vectors, giving:

<math>
\begin{align}
h(x) &=
\frac{1}{ e^{\vec{0}^T x} + e^{ (\theta_2-\theta_1)^T x } }
\begin{bmatrix}
e^{ \vec{0}^T x } \\
e^{ (\theta_2-\theta_1)^T x }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
\frac{e^{ (\theta_2-\theta_1)^T x }}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
1 - \frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix}
\end{align}
</math>

'''First review''':

Taking advantage of the fact that the hypothesis is overparameterized, we set <math>\psi = \theta_1</math> and subtract the vector <math>\theta_1</math> from both parameter vectors, giving:

<math>
\begin{align}
h(x) &=
\frac{1}{ e^{\vec{0}^T x} + e^{ (\theta_2-\theta_1)^T x } }
\begin{bmatrix}
e^{ \vec{0}^T x } \\
e^{ (\theta_2-\theta_1)^T x }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
\frac{e^{ (\theta_2-\theta_1)^T x }}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
1 - \frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix}
\end{align}
</math>

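Writing <math>z = (\theta_2-\theta_1)^T x</math> for brevity (a shorthand not used in the original text), the last vector above is exactly the logistic (sigmoid) function <math>\sigma(z) = \frac{1}{1+e^{-z}}</math> and its complement:

<math>
\begin{align}
h(x) &=
\begin{bmatrix}
1 - \sigma(z) \\
\sigma(z)
\end{bmatrix}
\end{align}
</math>
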
'''Original''':

Thus, replacing <math>\theta_2-\theta_1</math> with a single parameter vector <math>\theta'</math>, we find that softmax regression predicts the probability of one of the classes as <math>\frac{1}{ 1 + e^{ (\theta')^T x } }</math>, and that of the other class as <math>1 - \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, same as logistic regression.

'''Translation''':

Then, writing <math>\theta_2-\theta_1</math> as <math>\theta'</math>, we find that softmax regression predicts the probability of one of the classes as <math>\frac{1}{ 1 + e^{ (\theta')^T x } }</math> and that of the other class as <math>1 - \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, which is consistent with logistic regression.

'''First review''':

Thus, writing <math>\theta_2-\theta_1</math> as <math>\theta'</math>, we find that softmax regression predicts the probability of one of the classes as <math>\frac{1}{ 1 + e^{ (\theta')^T x } }</math> and that of the other class as <math>1 - \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, which is consistent with logistic regression.
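
The equivalence can also be checked numerically. The following NumPy sketch (not part of the original tutorial; theta1, theta2, and x are arbitrary example values) compares the two-class softmax output with the logistic regression probabilities computed from <math>\theta' = \theta_2 - \theta_1</math>:

<pre>
import numpy as np

# Arbitrary example values; not from the tutorial.
theta1 = np.array([0.5, -1.0, 2.0])
theta2 = np.array([1.5,  0.3, -0.7])
x      = np.array([1.0,  2.0,  0.5])

# Two-class softmax probabilities.
scores  = np.array([theta1 @ x, theta2 @ x])
softmax = np.exp(scores) / np.exp(scores).sum()

# Logistic regression with theta' = theta2 - theta1.
theta_prime = theta2 - theta1
p_class2 = 1.0 / (1.0 + np.exp(-theta_prime @ x))  # sigmoid of theta'^T x
p_class1 = 1.0 - p_class2

print(softmax)                                     # softmax probabilities
print(np.allclose(softmax, [p_class1, p_class2]))  # True
</pre>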
