Softmax Regression

Revision as of 09:55, 10 March 2013

Kandeng

(→3)

Revision as of 11:37, 10 March 2013

Kandeng

(→4)

From Ufldl

@@ Line 492: / Line 492: @@
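 In practice, it is often simpler and clearer to implement a model that keeps all the parameters <math>(\theta_1, \theta_2,\ldots, \theta_k)</math> rather than arbitrarily setting one of the parameter vectors to 0. However, we then need to make one change to the cost function: adding weight decay. This helps with the numerical problems caused by the over-parameterized form of softmax regression.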
-== 4 ==
+== Weight Decay ==
+'''Original''':
+We will modify the cost function by adding a weight decay term
+<math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>
+which penalizes large values of the parameters.  Our cost function is now
+<math>
+\begin{align}
+J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
+              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
+\end{align}
+</math>
+'''Translation''':
+We modify the loss function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>, which penalizes large parameter values. Our loss function now becomes:
+<math>
+\begin{align}
+J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
+              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
+\end{align}
+</math>
+'''First review''':
+We modify the cost function by adding a weight decay term <math>\textstyle \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^{n} \theta_{ij}^2</math>, which penalizes large parameter values. Our cost function is now:
+<math>
+\begin{align}
+J(\theta) = - \frac{1}{m} \left[ \sum_{i=1}^{m} \sum_{j=1}^{k} 1\left\{y^{(i)} = j\right\} \log \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{ \theta_l^T x^{(i)} }}  \right]
+              + \frac{\lambda}{2} \sum_{i=1}^k \sum_{j=0}^n \theta_{ij}^2
+\end{align}
+</math>
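The cost function above can be sketched in code. This is only an illustrative sketch, not part of the original tutorial: the names `softmax_probs` and `cost` are hypothetical, classes are numbered from 0 rather than 1, and each input vector is assumed to carry a leading 1 as the intercept term so that the index <math>j=0</math> in the decay sum is covered.

```python
import math

def softmax_probs(theta, x):
    """Return p(y = j | x; theta) for every class j.

    theta: list of k parameter vectors, each of length n + 1
    x: input vector of length n + 1 (x[0] = 1 is the assumed intercept term)
    """
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)  # subtracting the max avoids overflow; the ratios are unchanged
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost(theta, xs, ys, lam):
    """J(theta): mean negative log-likelihood plus (lam / 2) * sum of theta_ij ** 2."""
    m = len(xs)
    # The indicator 1{y = j} simply selects the log-probability of the true class.
    data_term = -sum(math.log(softmax_probs(theta, x)[y]) for x, y in zip(xs, ys)) / m
    decay_term = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return data_term + decay_term

# Toy data (made up for illustration): 3 classes, 1 feature plus intercept.
theta = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
xs = [[1.0, 2.0], [1.0, -1.0]]
ys = [0, 2]
print(cost(theta, xs, ys, lam=0.1))  # all-zero theta gives uniform probabilities, so J = log(3)
```

With all parameters at zero the decay term vanishes and every class has probability 1/3, which makes the value easy to check by hand.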
+'''Original''':
+With this weight decay term (for any <math>\lambda > 0</math>), the cost function
+<math>J(\theta)</math> is now strictly convex, and is guaranteed to have a
+unique solution.  The Hessian is now invertible, and because <math>J(\theta)</math> is
+convex, algorithms such as gradient descent, L-BFGS, etc. are guaranteed
+to converge to the global minimum.
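+'''Translation''':
+With this weight decay term (for any <math>\lambda > 0</math>), the loss function becomes strictly convex, so the solution is guaranteed to be unique. The Hessian matrix is now invertible, and because <math>J(\theta)</math> is convex, algorithms such as gradient descent and L-BFGS are guaranteed to converge to the global optimum.
+'''First review''':
+With this weight decay term (for any <math>\lambda > 0</math>), the cost function becomes strictly convex, which guarantees a unique solution. The Hessian matrix becomes invertible, and because <math>J(\theta)</math> is a convex function, algorithms such as gradient descent and L-BFGS are guaranteed to converge to the global optimum.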
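A small numerical check can make the uniqueness claim concrete. The sketch below is an illustration under assumed toy data, not part of the original text: it subtracts the same vector `psi` (a hypothetical name) from every <math>\theta_j</math>. Without weight decay the cost is unchanged, so many parameter settings are equally good; with <math>\lambda > 0</math> the two settings get different costs.

```python
import math

def cost(theta, xs, ys, lam):
    """Softmax cost with weight decay; classes are 0-indexed in this sketch."""
    total = 0.0
    for x, y in zip(xs, ys):
        scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
        top = max(scores)
        # log of the softmax denominator, computed stably
        log_norm = top + math.log(sum(math.exp(s - top) for s in scores))
        total -= scores[y] - log_norm
    decay = 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)
    return total / len(xs) + decay

# Made-up parameters and data for illustration only.
theta = [[0.5, -1.0], [0.2, 0.3], [-0.4, 0.1]]
psi = [1.0, -2.0]
shifted = [[t - p for t, p in zip(theta_j, psi)] for theta_j in theta]
xs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
ys = [0, 2, 1]

# lam = 0: the shift cancels inside the softmax, so the cost is the same.
same_without_decay = abs(cost(theta, xs, ys, 0.0) - cost(shifted, xs, ys, 0.0)) < 1e-9
# lam > 0: the decay term tells the two parameter settings apart.
differs_with_decay = abs(cost(theta, xs, ys, 0.1) - cost(shifted, xs, ys, 0.1)) > 1e-6
print(same_without_decay, differs_with_decay)
```

This only illustrates why the redundant directions disappear once the decay term is added; it is not a proof of strict convexity.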
+'''Original''':
+To apply an optimization algorithm, we also need the derivative of this
+new definition of <math>J(\theta)</math>.  One can show that the derivative is:
+<math>
+\begin{align}
+\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
+\end{align}
+</math>
+'''Translation''':
+To apply an optimization algorithm, we need the derivative of this new <math>J(\theta)</math>, which is:
+<math>
+\begin{align}
+\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
+\end{align}
+</math>
+'''First review''':
+To apply an optimization algorithm, we need the derivative of this new definition of <math>J(\theta)</math>. One can show that the derivative is:
+<math>
+\begin{align}
+\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} ( 1\{ y^{(i)} = j\}  - p(y^{(i)} = j | x^{(i)}; \theta) ) \right]  } + \lambda \theta_j
+\end{align}
+</math>
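The derivative formula can be checked against finite differences. The following is a minimal sketch under assumptions: the function names (`probs`, `cost`, `gradient`, `numerical_gradient`) and the toy data are hypothetical, and classes are numbered from 0. `gradient` follows the formula above term by term, and `numerical_gradient` is only a checking aid.

```python
import math

def probs(theta, x):
    """p(y = j | x; theta) for every class j."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost(theta, xs, ys, lam):
    data = -sum(math.log(probs(theta, x)[y]) for x, y in zip(xs, ys)) / len(xs)
    return data + 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)

def gradient(theta, xs, ys, lam):
    """Return one gradient vector per class j."""
    m = len(xs)
    grad = [[lam * t for t in theta_j] for theta_j in theta]  # the lam * theta_j term
    for x, y in zip(xs, ys):
        p = probs(theta, x)
        for j in range(len(theta)):
            residual = (1.0 if y == j else 0.0) - p[j]  # 1{y = j} - p(y = j | x; theta)
            for d, xi in enumerate(x):
                grad[j][d] -= xi * residual / m
    return grad

def numerical_gradient(theta, xs, ys, lam, eps=1e-6):
    """Central finite differences of the cost, one parameter at a time."""
    grad = []
    for j, theta_j in enumerate(theta):
        row = []
        for d in range(len(theta_j)):
            plus = [list(t) for t in theta]
            minus = [list(t) for t in theta]
            plus[j][d] += eps
            minus[j][d] -= eps
            row.append((cost(plus, xs, ys, lam) - cost(minus, xs, ys, lam)) / (2 * eps))
        grad.append(row)
    return grad

# Made-up parameters and data for illustration only.
theta = [[0.5, -1.0], [0.2, 0.3], [-0.4, 0.1]]
xs = [[1.0, 2.0], [1.0, -1.0], [1.0, 0.5]]
ys = [0, 2, 1]
analytic = gradient(theta, xs, ys, 0.1)
numeric = numerical_gradient(theta, xs, ys, 0.1)
max_gap = max(abs(a - n) for ra, rn in zip(analytic, numeric) for a, n in zip(ra, rn))
print(max_gap < 1e-6)
```

A gradient check of this kind is a common way to catch sign or indexing mistakes before handing the gradient to an optimizer.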
+'''Original''':
+By minimizing <math>J(\theta)</math> with respect to <math>\theta</math>, we will have a working implementation of softmax regression.
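+'''Translation''':
+By minimizing <math>J(\theta)</math>, we obtain a working softmax regression model.
+'''First review''':
+By minimizing <math>J(\theta)</math> with respect to the parameters <math>\theta</math>, we obtain a working implementation of softmax regression.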
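As a rough illustration of the minimization step, the sketch below runs plain batch gradient descent on a tiny made-up data set. Everything specific here is an assumption for the example rather than something the original text prescribes: the `train` function, the fixed step size of 0.5, the 500 iterations, and <math>\lambda = 0.01</math>. The text mentions L-BFGS as an alternative; gradient descent is used here only because it fits in a few lines.

```python
import math

def probs(theta, x):
    """p(y = j | x; theta) for every class j (classes are 0-indexed)."""
    scores = [sum(t * xi for t, xi in zip(theta_j, x)) for theta_j in theta]
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def cost(theta, xs, ys, lam):
    data = -sum(math.log(probs(theta, x)[y]) for x, y in zip(xs, ys)) / len(xs)
    return data + 0.5 * lam * sum(t * t for theta_j in theta for t in theta_j)

def gradient(theta, xs, ys, lam):
    m = len(xs)
    grad = [[lam * t for t in theta_j] for theta_j in theta]
    for x, y in zip(xs, ys):
        p = probs(theta, x)
        for j in range(len(theta)):
            residual = (1.0 if y == j else 0.0) - p[j]
            for d, xi in enumerate(x):
                grad[j][d] -= xi * residual / m
    return grad

def train(xs, ys, k, lam, step=0.5, iterations=500):
    """Batch gradient descent from an all-zero start (step and iterations are guesses)."""
    theta = [[0.0] * len(xs[0]) for _ in range(k)]
    for _ in range(iterations):
        grad = gradient(theta, xs, ys, lam)
        theta = [[t - step * g for t, g in zip(theta_j, grad_j)]
                 for theta_j, grad_j in zip(theta, grad)]
    return theta

# Made-up data: x[0] = 1 is the intercept, x[1] is a single feature.
xs = [[1.0, 0.0], [1.0, 0.2], [1.0, 1.0], [1.0, 1.2], [1.0, 2.0], [1.0, 2.2]]
ys = [0, 0, 1, 1, 2, 2]
start = [[0.0, 0.0] for _ in range(3)]
theta = train(xs, ys, k=3, lam=0.01)
print(cost(start, xs, ys, 0.01), cost(theta, xs, ys, 0.01))  # cost before and after training
```

On real data the step size would need tuning or a line search, and an off-the-shelf optimizer would usually be preferable to a hand-written loop.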
 == 5 ==