Softmax Regression
== Introduction ==

'''Softmax regression''', also known as '''multinomial logistic regression''', is a generalisation of logistic regression to problems with more than 2 class labels. An example is classifying the digits in the MNIST data set: each input is labelled with 1 of 10 possible class labels.

== Mathematical background ==

Formally, we consider the classification problem where we have <math>m</math> <math>k</math>-dimensional inputs <math>x^{(1)}, x^{(2)}, \ldots, x^{(m)}</math> with corresponding class labels <math>y^{(1)}, y^{(2)}, \ldots, y^{(m)}</math>, where <math>y^{(i)} \in \{1, 2, \ldots, n\}</math>, with <math>n</math> being the number of classes. Our hypothesis <math>h_{\theta}(x)</math> returns a vector of probabilities, such that

<math> \begin{align} h_\theta(x^{(i)}) = \begin{bmatrix} P(y^{(i)} = 1 | x^{(i)}) \\ P(y^{(i)} = 2 | x^{(i)}) \\ \vdots \\ P(y^{(i)} = n | x^{(i)}) \end{bmatrix} = \frac{1}{ \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} } \begin{bmatrix} e^{ \theta_1^T x^{(i)} } \\ e^{ \theta_2^T x^{(i)} } \\ \vdots \\ e^{ \theta_n^T x^{(i)} } \\ \end{bmatrix} \end{align} </math>

where <math>\theta_1, \theta_2, \ldots, \theta_n</math> are each <math>k</math>-dimensional column vectors that constitute the parameters of our hypothesis. Our objective is to maximise the likelihood of the data, <math>L(\theta; x, y) = \prod_{i=1}^{m}{ P(y^{(i)} | x^{(i)}) }</math>.
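The hypothesis above can be sketched in a few lines of NumPy. This is our own illustrative sketch (the function and variable names are not from the tutorial); note that we subtract the largest logit before exponentiating, which leaves the probabilities unchanged but prevents <code>exp</code> from overflowing:

```python
import numpy as np

def softmax_hypothesis(Theta, x):
    """Return h_theta(x): the vector of class probabilities.

    Theta : (n, k) array whose rows are theta_1, ..., theta_n.
    x     : (k,) input vector.
    """
    logits = Theta @ x           # theta_j^T x for each class j
    logits -= logits.max()       # stability: shift so exp() cannot overflow
    exp = np.exp(logits)
    return exp / exp.sum()       # normalise so the entries sum to 1
```

For example, with all-zero parameters every class receives equal probability <math>1/n</math>, since every exponent is the same.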
For convenience, we consider the log-likelihood of the data:

<math> \begin{align} \ell(\theta) &= \ln L(\theta; x, y) \\ &= \ln \prod_{i=1}^{m}{ P(y^{(i)} | x^{(i)}) } \\ &= \sum_{i=1}^{m}{ \ln \frac{ e^{ \theta^T_{y^{(i)}} x^{(i)} } }{ \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} } } \\ &= \sum_{i=1}^{m}{ \left[ \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right] } \end{align} </math>

To find <math>\theta</math> such that <math>\ell(\theta)</math> is maximised, we first find the derivative of <math>\ell(\theta)</math> with respect to <math>\theta_c</math>, the parameter vector of class <math>c</math>:

<math> \begin{align} \frac{\partial \ell(\theta)}{\partial \theta_c} &= \frac{\partial}{\partial \theta_c} \sum_{i=1}^{m}{ \left[ \theta^T_{y^{(i)}} x^{(i)} - \ln \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} \right] } \\ &= \sum_{i=1}^{m}{ \left[ I_{ \{ y^{(i)} = c \} } x^{(i)} - \frac{ e^{ \theta_c^T x^{(i)} } }{ \sum_{j=1}^{n}{e^{ \theta_j^T x^{(i)} }} } x^{(i)} \right] } \\ &= \sum_{i=1}^{m}{ x^{(i)} \left( I_{ \{ y^{(i)} = c \} } - P(y^{(i)} = c | x^{(i)}) \right) } \end{align} </math>

where <math>I_{\{\cdot\}}</math> is the indicator function, equal to 1 when its condition holds and 0 otherwise. With this gradient, we can now find a set of parameters that maximises <math>\ell(\theta)</math>, for instance by using gradient ascent.
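The gradient above leads directly to a batch gradient-ascent loop. The following is our own NumPy sketch, not the tutorial's reference implementation; it uses 0-based class labels (unlike the 1-based labels in the text), omits a bias term, and averages the gradient over the <math>m</math> examples so the step size is insensitive to the data set size:

```python
import numpy as np

def softmax_probs(Theta, X):
    """Row i holds P(y = c | x^{(i)}) for every class c.  X: (m, k), Theta: (n, k)."""
    logits = X @ Theta.T                         # (m, n) matrix of theta_c^T x^{(i)}
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    exp = np.exp(logits)
    return exp / exp.sum(axis=1, keepdims=True)

def gradient_ascent(X, y, n_classes, lr=0.1, steps=500):
    """Maximise the log-likelihood by batch gradient ascent.

    y holds class indices 0 .. n_classes-1.  Returns the (n, k) parameter matrix.
    """
    m, k = X.shape
    Theta = np.zeros((n_classes, k))
    Y = np.eye(n_classes)[y]                 # one-hot: Y[i, c] = I{y^{(i)} = c}
    for _ in range(steps):
        P = softmax_probs(Theta, X)
        grad = (Y - P).T @ X                 # rows are d(ell)/d(theta_c)
        Theta += lr * grad / m               # ascend the averaged gradient
    return Theta
```

The update <code>(Y - P).T @ X</code> is exactly the derived gradient stacked over classes: each row accumulates <math>x^{(i)} ( I_{\{y^{(i)}=c\}} - P(y^{(i)}=c|x^{(i)}) )</math> over the examples.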