Data Preprocessing

From Ufldl

Line 8: Line 8:
-
== Feature Normalization ==
+
== Data Normalization ==
 +
A standard first step in data preprocessing is data normalization. While there are several possible approaches, the appropriate choice is usually clear from the data itself. The common methods for feature normalization are:
 +
 +
* Simple Rescaling
 +
* Per-example mean subtraction (a.k.a. removing the DC component)
 +
* Feature Standardization (zero-mean and unit variance for each feature across the dataset)
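
The difference between the last two methods in the list is the axis along which the statistics are computed: per-example mean subtraction works within each example, while feature standardization works on each feature across the dataset. The following is a minimal sketch of that distinction, not the tutorial's own implementation. It assumes the data is a list of examples, each a list of feature values; the function names, the use of the population standard deviation, and the handling of constant features are illustrative assumptions.

```python
from statistics import fmean, pstdev


def subtract_per_example_mean(data):
    """Subtract each example's own mean (its DC component) from that example."""
    centered = []
    for example in data:
        mean = fmean(example)
        centered.append([x - mean for x in example])
    return centered


def standardize_features(data):
    """Give each feature zero mean and unit variance across the dataset."""
    columns = list(zip(*data))  # one tuple per feature
    means = [fmean(col) for col in columns]
    # Assumption: population standard deviation; a constant feature
    # (zero deviation) is only centered, not scaled.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [
        [(x - m) / s for x, m, s in zip(example, means, stds)]
        for example in data
    ]


# Two examples (rows) with three features (columns) each.
data = [[1.0, 2.0, 3.0], [3.0, 6.0, 9.0]]
per_example = subtract_per_example_mean(data)
standardized = standardize_features(data)
```

In this toy dataset, `per_example` has rows that each sum to zero, whereas `standardized` has columns that each have zero mean and unit variance.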
 +
 +
=== Simple Rescaling ===
 +
 +
In simple rescaling, our goal is to rescale the data along each data dimension (possibly independently) so that the final data vectors lie in the range <math>[0, 1]</math> or <math>[-1, 1]</math> (depending on your dataset). This is useful for later processing because many ''default'' parameters (e.g., epsilon in PCA-whitening) assume that the data has already been scaled to a reasonable range.
 +
 +
'''Example: ''' When processing natural images, we often obtain pixel values in the range <math>[0, 255]</math>. A common operation is to rescale these values to <math>[0, 1]</math> by dividing them by 255.
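
The rescaling described above can be sketched as follows. This is an illustrative sketch rather than the tutorial's own code: the function names are hypothetical, and the min-max mapping used for the <math>[-1, 1]</math> case is one possible way to rescale a dimension independently, assumed here for illustration.

```python
def rescale_pixels(pixels, max_value=255.0):
    """Rescale raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in pixels]


def rescale_min_max(values, low=0.0, high=1.0):
    """Linearly map one data dimension onto [low, high] using its own min and max."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        # Assumption: a constant dimension has no range to stretch,
        # so every value is mapped to `low`.
        return [low for _ in values]
    scale = (high - low) / (v_max - v_min)
    return [low + (v - v_min) * scale for v in values]


pixels = [0, 51, 102, 255]
scaled = rescale_pixels(pixels)                         # values in [0, 1]
centered = rescale_min_max(pixels, low=-1.0, high=1.0)  # values in [-1, 1]
```

Dividing by a fixed constant such as 255 keeps the scaling identical for every image, whereas the min-max variant depends on the values actually observed in the dimension being rescaled.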
 +
 +
 +
=== Per-example mean subtraction ===
 +
 +
If the data has the property that the
== PCA/ZCA Whitening ==

Revision as of 06:08, 29 April 2011
