Softmax Regression
By minimizing <math>J(\theta)</math> with respect to <math>\theta</math>, we will have a working implementation of softmax regression.
==Relationship to Logistic Regression==
+ | |||
+ | '''原文''': | ||
+ | |||
+ | In the special case where <math>k = 2</math>, one can show that softmax regression reduces to logistic regression. | ||
+ | This shows that softmax regression is a generalization of logistic regression. Concretely, when <math>k=2</math>, | ||
+ | the softmax regression hypothesis outputs | ||
+ | |||
+ | <math> | ||
+ | \begin{align} | ||
+ | h_\theta(x) &= | ||
+ | |||
+ | \frac{1}{ e^{\theta_1^Tx} + e^{ \theta_2^T x^{(i)} } } | ||
+ | \begin{bmatrix} | ||
+ | e^{ \theta_1^T x } \\ | ||
+ | e^{ \theta_2^T x } | ||
+ | \end{bmatrix} | ||
+ | \end{align} | ||
+ | </math> | ||
+ | |||
+ | '''译文''': | ||
+ | |||
+ | 当类别数<math>k = 2</math>时,softmax回归退化为logistic回归。这一点表明了softmax回归是logistic回归的推广形式。具体地说,当<math>k = 2</math>时,softmax 回归的假设函数: | ||
+ | |||
+ | <math> | ||
+ | \begin{align} | ||
+ | h_\theta(x) &= | ||
+ | |||
+ | \frac{1}{ e^{\theta_1^Tx} + e^{ \theta_2^T x^{(i)} } } | ||
+ | \begin{bmatrix} | ||
+ | e^{ \theta_1^T x } \\ | ||
+ | e^{ \theta_2^T x } | ||
+ | \end{bmatrix} | ||
+ | \end{align} | ||
+ | </math> | ||
+ | |||
+ | '''一审''': | ||
+ | 在类别数<math>k = 2</math>的特例中 ,我们会看到softmax回归退化成了logistic 回归。这一点表明了softmax回归是logistic 回归的 一般化形式。具体地说,当<math>k = 2</math>时,softmax回归的估值函数为 : | ||
+ | |||
+ | <math> | ||
+ | \begin{align} | ||
+ | h_\theta(x) &= | ||
+ | |||
+ | \frac{1}{ e^{\theta_1^Tx} + e^{ \theta_2^T x^{(i)} } } | ||
+ | \begin{bmatrix} | ||
+ | e^{ \theta_1^T x } \\ | ||
+ | e^{ \theta_2^T x } | ||
+ | \end{bmatrix} | ||
+ | \end{align} | ||
+ | </math> | ||
+ | |||
Taking advantage of the fact that this hypothesis is overparameterized, we can set <math>\psi = \theta_1</math> and subtract <math>\theta_1</math> from each of the two parameter vectors, giving us

<math>
\begin{align}
h(x) &=
\frac{1}{ e^{\vec{0}^T x} + e^{ (\theta_2-\theta_1)^T x } }
\begin{bmatrix}
e^{ \vec{0}^T x } \\
e^{ (\theta_2-\theta_1)^T x }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
\frac{e^{ (\theta_2-\theta_1)^T x }}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
1 - \frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix}
\end{align}
</math>
Thus, replacing <math>\theta_2 - \theta_1</math> with a single parameter vector <math>\theta'</math>, we find that softmax regression predicts the probability of one of the classes as <math>\frac{1}{ 1 + e^{ (\theta')^T x } }</math>, and that of the other class as <math>1 - \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, the same as logistic regression.
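This equivalence can also be verified numerically: with <math>\theta' = \theta_2 - \theta_1</math>, the logistic regression probabilities match the two softmax outputs exactly. A minimal sketch, again with made-up values:

<pre>
import numpy as np

# Made-up parameters and input, for illustration only.
theta1 = np.array([0.5, -1.0, 2.0])
theta2 = np.array([1.5, 0.25, -0.5])
x = np.array([1.0, 2.0, 0.5])

theta_prime = theta2 - theta1  # the single parameter vector theta'

# Two-class softmax probabilities:
e = np.exp([theta1 @ x, theta2 @ x])
p_softmax = e / e.sum()

# Logistic regression probabilities with parameter theta':
p_one   = 1.0 / (1.0 + np.exp(theta_prime @ x))  # 1 / (1 + e^{theta'^T x})
p_other = 1.0 - p_one

assert np.allclose(p_softmax, [p_one, p_other])  # the same model
</pre>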