# Independent Component Analysis

### From Ufldl

(→Introduction) |
(→Introduction) |
||

Line 1: | Line 1: | ||

== Introduction == | == Introduction == | ||

- | If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do | + | If you recall, in [[Sparse Coding | sparse coding]], we wanted to learn an '''over-complete''' basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an '''orthonormal''' basis for the data. (An orthonormal basis is a basis <math>(\phi_1, \ldots \phi_n)</math> such that <math>\phi_i \cdot \phi_j = 0</math> if <math>i \ne j</math> and <math>1</math> if <math>i = j</math>). |

Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function: | Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data <math>x</math>, we would like to learn a set of basis vectors which we represent in the columns of a matrix <math>W</math>, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix <math>A</math> was for mapping '''features''' <math>s</math> to '''raw data''', in independent component analysis, our matrix <math>W</math> works in the opposite direction, mapping '''raw data''' <math>x</math> to '''features''' instead). This gives us the following objective function: |

## Revision as of 07:24, 30 May 2011

## Introduction

If you recall, in sparse coding, we wanted to learn an **over-complete** basis for the data. In particular, this implies that the basis vectors that we learn in sparse coding will not be linearly independent. While this may be desirable in certain situations, sometimes we want to learn a linearly independent basis for the data. In independent component analysis (ICA), this is exactly what we want to do. Further, in ICA, we want to learn not just any linearly independent basis, but an **orthonormal** basis for the data. (An orthonormal basis is a basis such that if and 1 if *i* = *j*).

Like sparse coding, independent component analysis has a simple mathematical formulation. Given some data *x*, we would like to learn a set of basis vectors which we represent in the columns of a matrix *W*, such that, firstly, as in sparse coding, our features are sparse; and secondly, our basis is an orthonormal basis. (Note that while in sparse coding, our matrix *A* was for mapping **features** *s* to **raw data**, in independent component analysis, our matrix *W* works in the opposite direction, mapping **raw data** *x* to **features** instead). This gives us the following objective function:

This objective function is equivalent to the sparsity penalty on the features *s* in sparse coding, since *W**x* is precisely the features that represent the data. Adding in the orthonormality constraint gives us the full optimization problem for independent component analysis:

As is usually the case in deep learning, this problem has no simple analytic solution, and to make matters worse, the orthonormality constraint makes it slightly more difficult to optimize for the objective using gradient descent - every iteration of gradient descent must be followed by a step that maps the new basis back to the space of orthonormal bases (hence enforcing the constraint).