Softmax Regression

@@ Line 49: / Line 49: @@
 \begin{align}
 \frac{\partial \ell(\theta)}{\partial \theta_k} &= \frac{\partial}{\partial \theta_k} \left[ \ln e^{ \theta^T_{y^{(i)}} x^{(i)} } - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right] \\
-&= I_{ \{ y^{(i)} = k\} } x^{(i)} - \frac{1}{ \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} } e^{ \theta_k^T x^{(i)} } \\
+&= I_{ \{ y^{(i)} = k\} } x^{(i)} - \frac{1}{ \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} } e^{ \theta_k^T x^{(i)} } x^{(i)} \qquad \text{(where } I_{ \{ y^{(i)} = k\} } \text{ is 1 when } y^{(i)} = k \text{ and 0 otherwise)} \\
 &= I_{ \{ y^{(i)} = k\} } x^{(i)} - P(y^{(i)} = k | x^{(i)}) x^{(i)}
 \end{align}
 </math>
 With this, we can now find a set of parameters that maximises <math>\ell(\theta)</math>, for instance by using gradient ascent.
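As a rough illustration of gradient ascent on <math>\ell(\theta)</math>, the sketch below uses plain Python. It assumes the per-example gradient with respect to <math>\theta_k</math> is <math>(I_{ \{ y^{(i)} = k\} } - P(y^{(i)} = k | x^{(i)})) x^{(i)}</math>, summed over all training examples. The function names, the fixed step size, the iteration count, and the toy data are all assumptions made for this example and are not taken from the article.

```python
import math

def softmax_probs(theta, x):
    """Return P(y = k | x) for every class k.

    theta is a list with one weight vector per class (hypothetical layout).
    """
    scores = [sum(t * xd for t, xd in zip(theta_k, x)) for theta_k in theta]
    m = max(scores)  # subtracting the max leaves the ratio unchanged but avoids overflow
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def log_likelihood(theta, xs, ys):
    """Sum over examples of ln P(y_i | x_i)."""
    return sum(math.log(softmax_probs(theta, x)[y]) for x, y in zip(xs, ys))

def gradient(theta, xs, ys):
    """Gradient of the log-likelihood with respect to each theta_k."""
    grad = [[0.0] * len(theta[0]) for _ in theta]
    for x, y in zip(xs, ys):
        p = softmax_probs(theta, x)
        for k in range(len(theta)):
            coeff = (1.0 if y == k else 0.0) - p[k]  # indicator minus probability
            for d, xd in enumerate(x):
                grad[k][d] += coeff * xd
    return grad

def gradient_ascent(xs, ys, num_classes, step=0.05, iters=500):
    """Plain fixed-step gradient ascent starting from all-zero parameters."""
    theta = [[0.0] * len(xs[0]) for _ in range(num_classes)]
    for _ in range(iters):
        g = gradient(theta, xs, ys)
        theta = [[t + step * gd for t, gd in zip(theta_k, g_k)]
                 for theta_k, g_k in zip(theta, g)]
    return theta

# Toy data (made up): first feature is a constant bias term, three classes.
xs = [[1.0, 0.0], [1.0, 0.2], [1.0, 1.0], [1.0, 1.2], [1.0, 2.0], [1.0, 2.2]]
ys = [0, 0, 1, 1, 2, 2]
theta_fit = gradient_ascent(xs, ys, num_classes=3)
```

Comparing <code>log_likelihood(theta_fit, xs, ys)</code> with the value at the all-zero starting point is a quick check that the updates move uphill. In practice the step size would need tuning, and a regularisation term is often added because the softmax parameters are not uniquely determined.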

Revision as of 23:03, 10 April 2011