By minimizing <math>\textstyle J(\theta)</math>, we obtain a working implementation of softmax regression.

==Relationship to Logistic Regression==

In the special case where <math>\textstyle k = 2</math>, one can show that softmax regression reduces to logistic regression. This shows that softmax regression is a generalization of logistic regression. Concretely, when <math>\textstyle k = 2</math>, the softmax regression hypothesis outputs

<math>
\begin{align}
h_\theta(x) &=
\frac{1}{ e^{\theta_1^T x} + e^{\theta_2^T x} }
\begin{bmatrix}
e^{ \theta_1^T x } \\
e^{ \theta_2^T x }
\end{bmatrix}
\end{align}
</math>

Taking advantage of the fact that this hypothesis is overparameterized, and setting <math>\textstyle \psi = \theta_1</math>, we can subtract the vector <math>\textstyle \theta_1</math> from each of the two parameter vectors, giving us

<math>
\begin{align}
h(x) &=
\frac{1}{ e^{\vec{0}^T x} + e^{ (\theta_2-\theta_1)^T x } }
\begin{bmatrix}
e^{ \vec{0}^T x } \\
e^{ (\theta_2-\theta_1)^T x }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
\frac{e^{ (\theta_2-\theta_1)^T x }}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix} \\
&=
\begin{bmatrix}
\frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } } \\
1 - \frac{1}{ 1 + e^{ (\theta_2-\theta_1)^T x } }
\end{bmatrix}
\end{align}
</math>
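
To make the overparameterization step concrete, here is a minimal numerical sketch (the helper softmax_hypothesis and the random values are ours for illustration, not part of the tutorial) checking that subtracting <math>\textstyle \theta_1</math> from both parameter vectors leaves the hypothesis unchanged:

<pre>
import numpy as np

# Hypothetical input and parameter vectors for the k = 2 case.
rng = np.random.default_rng(0)
x = rng.normal(size=5)
theta1 = rng.normal(size=5)
theta2 = rng.normal(size=5)

def softmax_hypothesis(t1, t2, x):
    # h(x) = [e^{t1^T x}, e^{t2^T x}] / (e^{t1^T x} + e^{t2^T x})
    e = np.exp(np.array([t1 @ x, t2 @ x]))
    return e / e.sum()

h_original = softmax_hypothesis(theta1, theta2, x)
h_shifted = softmax_hypothesis(np.zeros(5), theta2 - theta1, x)
print(np.allclose(h_original, h_shifted))  # True: the shift changes nothing
</pre>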

Thus, replacing <math>\textstyle \theta_2-\theta_1</math> with a single parameter vector <math>\textstyle \theta'</math>, we find that softmax regression predicts the probability of one of the classes as <math>\textstyle \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, and that of the other class as <math>\textstyle 1 - \frac{1}{ 1 + e^{ (\theta')^T x } }</math>, the same as logistic regression.
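
As a quick sanity check of this equivalence, the following sketch (again with hypothetical values; theta_prime stands in for <math>\textstyle \theta_2-\theta_1</math>) compares the two-class softmax output with the logistic regression probabilities:

<pre>
import numpy as np

# Hypothetical input and the single remaining parameter vector theta'.
rng = np.random.default_rng(1)
x = rng.normal(size=5)
theta_prime = rng.normal(size=5)

# Two-class softmax with parameter vectors 0 and theta_prime ...
e = np.exp(np.array([0.0, theta_prime @ x]))
p_softmax = e / e.sum()

# ... versus logistic regression: 1/(1 + e^{theta'^T x}) and its complement.
p_one = 1.0 / (1.0 + np.exp(theta_prime @ x))
p_logistic = np.array([p_one, 1.0 - p_one])

print(np.allclose(p_softmax, p_logistic))  # True
</pre>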

==Softmax Regression vs. k Binary Classifiers==