Data Preprocessing
== Overview ==

Data preprocessing plays a very important role in many deep learning algorithms. In practice, many methods work best after the data has been normalized and whitened. However, the exact parameters for data preprocessing are usually not immediately apparent unless one has much experience working with the algorithms. In this page, we hope to demystify some of the preprocessing methods and also provide tips (and a "standard pipeline") for preprocessing data.

{{quote | Tip: When approaching a dataset, the first thing to do is to look at the data itself and observe its properties. While the techniques here apply generally, you might want to opt to do certain things differently given your dataset. For example, one standard preprocessing trick is to subtract the mean of each data point from itself (also known as remove DC, local mean subtraction, or subtractive normalization). While this makes sense for data such as natural images, it is less obvious for data with a natural "zero" point, such as MNIST images (where all data points use the same value of 0 to represent an empty background). }}

== Feature Normalization ==

== PCA/ZCA Whitening ==

How to choose epsilon? Do we need low-pass filtering?

== Large Images ==

1/f Whitening

== Standard Pipeline ==

== Model Idiosyncrasies ==

=== Sparse Autoencoder ===

==== Sigmoid Decoders ====

==== Linear Decoders ====

=== Independent Component Analysis ===
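As a worked illustration of the "remove DC" / local mean subtraction trick mentioned in the Overview tip, the following is a minimal NumPy sketch. The toy <code>patches</code> array is an assumption for illustration, not data from this tutorial; the point is that the mean is computed per data point (per row), not per feature.

```python
import numpy as np

# Toy dataset: each row is one flattened image patch.
# These values are illustrative only.
patches = np.array([[0.2, 0.4, 0.6],
                    [1.0, 1.0, 1.0]])

# Local mean subtraction ("remove DC"): subtract each data
# point's own mean from itself -- one mean per row.
patch_means = patches.mean(axis=1, keepdims=True)
centered = patches - patch_means
```

After this step each patch has zero mean, which removes per-patch brightness (DC) variation in natural images; for MNIST-like data, where 0 already denotes empty background, this same operation would shift the background value and is less obviously appropriate.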