UFLDL Recommended Readings

If you're learning about UFLDL (Unsupervised Feature Learning and Deep Learning), here is a list of papers to consider reading.  We're assuming you're already familiar with basic machine learning at the level of [[http://cs229.stanford.edu/ CS229 (lecture notes available)]]. 

The basics: 
* [[http://cs294a.stanford.edu CS294A]] neural network/sparse autoencoder tutorial. (Most of this is now in the [[UFLDL Tutorial]], but the exercise is still on the CS294A website.) 
* Natural Image Statistics book, Hyvarinen et al.  This is long, so just skim or skip the chapters that you already know.  Important chapters: 5 (PCA and whitening; you'll probably already know the PCA stuff), 6 (sparse coding), 7 (ICA), 10 (ISA), 11 (TICA), 16 (temporal models).  
* Olshausen and Field Sparse Coding paper (1996) 
* [http://www.cs.stanford.edu/~ang/papers/icml07-selftaughtlearning.pdf]  Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer and Andrew Y. Ng. Self-taught learning: Transfer learning from unlabeled data. ICML 2007
* [http://www.iro.umontreal.ca/~bengioy/papers/ftml_book.pdf] Yoshua Bengio. Learning Deep Architectures for AI. FTML 2009. (Gives a broad overview of the field, but the technical details are hard to follow, so feel free to skip them.)


Autoencoders: 
* [http://www.cs.toronto.edu/~hinton/science.pdf]  Hinton, G. E. and Salakhutdinov, R. R. Reducing the dimensionality of data with neural networks. Science 2006.  If you want to play with the code, you can also find it at [http://www.cs.toronto.edu/~hinton/MatlabForSciencePaper.html]. 
* [http://www-etud.iro.umontreal.ca/~larocheh/publications/greedy-deep-nets-nips-06.pdf] Bengio, Y., Lamblin, P., Popovici, D., Larochelle, H. Greedy Layer-Wise Training of Deep Networks. NIPS 2006 
* [http://www.cs.toronto.edu/~larocheh/publications/icml-2008-denoising-autoencoders.pdf] Pascal Vincent, Hugo Larochelle, Yoshua Bengio and Pierre-Antoine Manzagol. Extracting and Composing Robust Features with Denoising Autoencoders. ICML 2008.

Analyzing deep learning/why does deep learning work: 
* Larochelle, Erhan, Courville, Bergstra, Bengio, ICML 2007.  (Someone read this and let us know if it is worth keeping.) 
* [http://www.jmlr.org/papers/volume11/erhan10a/erhan10a.pdf] Dumitru Erhan, Yoshua Bengio, Aaron Courville, Pierre-Antoine Manzagol, Pascal Vincent, and Samy Bengio. Why Does Unsupervised Pre-training Help Deep Learning? JMLR 2010  
* Goodfellow et al.'s invariance test.  (Not sure if this should be included--someone let us know.) 

RBMs:
* [http://deeplearning.net/tutorial/rbm.html] Tutorial on RBMs. (You can ignore the Theano code examples.) 
* A practical guide (read this if you're trying to implement an RBM; otherwise skip it, since it is not really a tutorial). [http://www.cs.toronto.edu/~hinton/absps/guideTR.pdf] Geoff Hinton. A practical guide to training restricted Boltzmann machines. UTML TR 2010–003. 

Applications:
* Computer Vision
** [http://www.ifp.illinois.edu/~jyang29/ScSPM.htm] Jianchao Yang, Kai Yu, Yihong Gong, Thomas Huang. Linear Spatial Pyramid Matching using Sparse Coding for Image Classification, CVPR 2009 
** Small codes and large image databases for recognition.  Torralba, Fergus, Weiss. 
* Audio Recognition
** [http://www.cs.stanford.edu/people/ang/papers/nips09-AudioConvolutionalDBN.pdf] Unsupervised feature learning for audio classification using convolutional deep belief networks, Honglak Lee, Yan Largman, Peter Pham and Andrew Y. Ng. In NIPS*2009.



Natural Language Processing:
* [http://www.iro.umontreal.ca/~lisa/publications2/index.php/attachments/single/57] Yoshua Bengio, Réjean Ducharme, Pascal Vincent and Christian Jauvin, A Neural Probabilistic Language Model. JMLR 2003.
* [http://ronan.collobert.com/pub/matos/2008_nlp_icml.pdf] R. Collobert and J. Weston. A Unified Architecture for Natural Language Processing: Deep Neural Networks with Multitask Learning. ICML 2008.
* [http://www.cs.toronto.edu/~hinton/absps/threenew.pdf] Mnih, A. and Hinton, G. E. Three New Graphical Models for Statistical Language Modelling. ICML 2007

Advanced stuff:
* Slow Feature Analysis:
** [http://itb.biologie.hu-berlin.de/~wiskott/Publications/BerkWisk2005c-SFAComplexCells-JoV.pdf] Slow feature analysis yields a rich repertoire of complex cell properties. Journal of Vision, 2005.
* Predictive Sparse Decomposition
** [http://cs.nyu.edu/~koray/publis/koray-psd-08.pdf] Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun, "Fast Inference in Sparse Coding Algorithms with Applications to Object Recognition", Computational and Biological Learning Lab, Courant Institute, NYU, 2008. 
** [http://cs.nyu.edu/~koray/publis/jarrett-iccv-09.pdf] Kevin Jarrett, Koray Kavukcuoglu, Marc'Aurelio Ranzato, and Yann LeCun, "What is the Best Multi-Stage Architecture for Object Recognition?", In ICCV 2009

Mean-Covariance models:
* 3-way RBM
* mcRBM  (someone tell us whether you need to read the 3-way RBM paper before the mcRBM one)
* [http://www.cs.toronto.edu/~hinton/absps/mcphone.pdf] Dahl, G., Ranzato, M., Mohamed, A. and Hinton, G. E. Phone Recognition with the Mean-Covariance Restricted Boltzmann Machine. NIPS 2010.
* Karklin & Lewicki Nature paper.  (Someone tell us if this should be here.  Interesting algorithm + nice visualizations, though maybe slightly hard to understand.) 



Also, for other lists of papers:
* [http://www.eecs.umich.edu/~honglak/teaching/eecs598/schedule.html] Honglak Lee's Course
* [http://www.cs.toronto.edu/~hinton/deeprefs.html] Reading list from Geoff Hinton's tutorial