Softmax Regression
== Cost Function ==

Taking derivatives, one can show that the gradient of <math>J(\theta)</math> is:
<math>
\begin{align}
\nabla_{\theta_j} J(\theta) = - \frac{1}{m} \sum_{i=1}^{m}{ \left[ x^{(i)} \left( 1\{ y^{(i)} = j\} - p(y^{(i)} = j | x^{(i)}; \theta) \right) \right] }
\end{align}
</math>
where <math>p(y^{(i)} = j | x^{(i)} ; \theta) = \frac{e^{\theta_j^T x^{(i)}}}{\sum_{l=1}^k e^{\theta_l^T x^{(i)}}}</math> is the softmax probability of class <math>j</math> defined earlier.
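As a concrete illustration (not part of the original tutorial), here is a minimal NumPy sketch that evaluates this gradient for all classes at once. The array names <code>Theta</code>, <code>X</code>, and <code>y</code>, and their shapes, are conventions assumed for the example only.

<pre>
import numpy as np

def softmax_gradient(Theta, X, y):
    """Evaluate grad_{theta_j} J(Theta) for every class j at once.

    Hypothetical conventions assumed for this sketch:
      Theta : (k, n) array, row j holds the parameters theta_j
      X     : (m, n) array, row i holds the input x^{(i)}
      y     : (m,) integer labels in {0, ..., k-1}
    Returns a (k, n) array whose j-th row is grad_{theta_j} J(Theta).
    """
    m = X.shape[0]
    scores = X @ Theta.T                        # theta_j^T x^{(i)}, shape (m, k)
    scores -= scores.max(axis=1, keepdims=True) # stability shift; probabilities unchanged
    exp_scores = np.exp(scores)
    P = exp_scores / exp_scores.sum(axis=1, keepdims=True)  # p(y^{(i)} = j | x^{(i)}; theta)
    Y = np.zeros_like(P)                        # one-hot encoding of the indicator 1{y^{(i)} = j}
    Y[np.arange(m), y] = 1.0
    # grad_{theta_j} J = -(1/m) * sum_i x^{(i)} ( 1{y^{(i)} = j} - p(y^{(i)} = j | x^{(i)}; theta) )
    return -(Y - P).T @ X / m
</pre>

Subtracting each row's maximum score before exponentiating scales numerator and denominator by the same factor, so the probabilities are unchanged, but it prevents overflow in <code>np.exp</code>; this stabilization is a standard implementation detail, not part of the formula above.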
Recall the meaning of the "<math>\nabla_{\theta_j}</math>" notation. In particular, <math>\nabla_{\theta_j} J(\theta)</math> is itself a vector, so that its <math>l</math>-th element is <math>\frac{\partial J(\theta)}{\partial \theta_{jl}}</math>, the partial derivative of <math>J(\theta)</math> with respect to the <math>l</math>-th element of <math>\theta_j</math>.
Armed with this formula for the derivative, one can then plug it into an algorithm such as gradient descent, and have it minimize <math>J(\theta)</math>.
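For instance, each iteration of plain gradient descent applies the update <math>\theta_j := \theta_j - \alpha \nabla_{\theta_j} J(\theta)</math> for every class <math>j</math>. A minimal sketch, reusing the hypothetical <code>softmax_gradient</code> above; the learning rate <code>alpha</code> and iteration count are illustrative choices, not values from the tutorial:

<pre>
def gradient_descent(Theta, X, y, alpha=0.5, num_iters=200):
    # Repeatedly step all theta_j in the direction of steepest descent
    # of J(Theta); alpha and num_iters are illustrative, not prescribed.
    for _ in range(num_iters):
        Theta = Theta - alpha * softmax_gradient(Theta, X, y)
    return Theta
</pre>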