Exercise:Independent Component Analysis

From Ufldl

(Backtracking line search)
 
Line 43:
In step 4, you will optimize the orthonormal ICA objective using gradient descent with backtracking line search. The code for the line search has already been provided for you; for more details, you may wish to consult the [[Exercise:Independent Component Analysis#Appendix| appendix ]] of this exercise. The orthonormality constraint should be enforced with a projection, which you should fill in.
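As a hedged sketch of what such a projection can look like: one standard choice is symmetric orthogonalization, which maps a weight matrix <math>W</math> to the nearest matrix with orthonormal rows. The symbol <math>W</math> and the assumption that <math>WW^T</math> is invertible are introduced here for illustration and may not match the names used in the starter code.

<math>W \leftarrow (WW^T)^{-\frac{1}{2}} W</math>

After this update the rows are orthonormal, since <math>(WW^T)^{-\frac{1}{2}} \, WW^T \, (WW^T)^{-\frac{1}{2}} = I</math>.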
-
Once you have filled in the code for the projection, check that it is correct by using the verification code provided. Once you have verified that your projection is correct, comment out the verification code and run the optimization. 10 000 iterations of gradient descent should take around 2 hours, and produce a basis which looks like the following:
+
Once you have filled in the code for the projection, check that it is correct by using the verification code provided. Once you have verified that your projection is correct, comment out the verification code and run the optimization. 1000 iterations of gradient descent should take less than 15 minutes, and produce a basis which looks like the following:
[[File:OrthonormalICAFeatures.png | 350px]]
-
Observe that few of the bases have been completely learned even after 10 000 iterations, highlighting a weakness of orthonormal ICA - it is difficult to optimize for the objective while enforcing the orthonormality constraint using gradient descent, and convergence can be very slow. Hence, in situations where an orthonormal basis is not required, other faster methods of learning bases (such as [[Sparse Coding: Autoencoder Interpretation | sparse coding]]) may be preferable.
+
It is comparatively difficult to optimize for the objective while enforcing the orthonormality constraint using gradient descent, and convergence can be slow. Hence, in situations where an orthonormal basis is not required, other faster methods of learning bases (such as [[Sparse Coding: Autoencoder Interpretation | sparse coding]]) may be preferable.
=== Appendix ===
Line 54:
The backtracking line search used in the exercise is based on that in [http://www.stanford.edu/~boyd/cvxbook/ Convex Optimization by Boyd and Vandenberghe]. In the backtracking line search, given a descent direction <math>\vec{u}</math> (in this exercise we use <math>\vec{u} = -\nabla f(\vec{x})</math>), we want to find a good step size <math>t</math> that gives us a steep descent. The general idea is to use a linear approximation (the first-order Taylor approximation) to the function <math>f</math> at the current point <math>\vec{x}</math>, and to search for a step size <math>t</math> such that the function's value decreases by at least <math>\alpha</math> times the decrease predicted by the linear approximation, where <math>\alpha \in (0, 0.5)</math>. For more details, you may wish to consult [http://www.stanford.edu/~boyd/cvxbook/ the book].
 +
 +
However, it is not necessary to use the backtracking line search here. Either gradient descent with a small fixed step size, or simply shrinking the step size until the objective decreases, is sufficient for this exercise.
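The backtracking rule described in this appendix can be sketched in a few lines of Python. This is an illustrative sketch, not the code provided with the exercise: the function name, the shrink factor <code>beta</code> (a constant in (0, 1) that multiplies the step size after each failed trial), the initial step <code>t0</code>, and the toy quadratic objective are all assumptions made for the example.

```python
def backtracking_line_search(f, grad_fx, x, u, alpha=0.3, beta=0.5, t0=1.0, max_shrinks=50):
    """Shrink t until f(x + t*u) <= f(x) + alpha * t * <grad f(x), u>.

    alpha is the fraction of the linearly predicted decrease we require;
    beta (hypothetical name) is the factor by which t shrinks on failure.
    """
    fx = f(x)
    # Directional derivative along u; negative when u is a descent direction.
    slope = sum(g * d for g, d in zip(grad_fx, u))
    t = t0
    for _ in range(max_shrinks):
        candidate = [xi + t * ui for xi, ui in zip(x, u)]
        if f(candidate) <= fx + alpha * t * slope:
            return t
        t *= beta
    return t  # fall back to the smallest step tried


# Toy objective (an assumption for illustration): f(x) = sum of squares.
def quadratic(x):
    return sum(xi * xi for xi in x)


def quadratic_grad(x):
    return [2.0 * xi for xi in x]


x = [3.0, -4.0]
g = quadratic_grad(x)
u = [-gi for gi in g]  # steepest-descent direction, u = -grad f(x)
t = backtracking_line_search(quadratic, g, x, u)
x_next = [xi + t * ui for xi, ui in zip(x, u)]
```

In the exercise itself, a step like this would be followed by the orthonormality projection before the objective is evaluated again.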

Latest revision as of 04:31, 4 October 2011
