Data Preprocessing



Overview

Data preprocessing plays a very important role in many deep learning algorithms. In practice, many methods work best after the data has been normalized and whitened. However, the exact parameters for data preprocessing are usually not immediately apparent unless one has much experience working with the algorithms. In this page, we hope to demystify some of the preprocessing methods and also provide tips (and a "standard pipeline") for preprocessing data.

Tip: When approaching a dataset, the first thing to do is to look at the data itself and observe its properties. While the techniques here apply generally, you may want to do certain things differently depending on your dataset. For example, one standard preprocessing trick is to subtract the mean of each data point from itself (also known as remove DC, local mean subtraction, or subtractive normalization). While this makes sense for data such as natural images, it is less obvious for data with a natural "zero" point, such as MNIST images (where all data points use the same value of 0 to represent an empty background).


Data Normalization

A standard first step in data preprocessing is data normalization. While there are a few possible approaches, the right choice is usually clear from the data itself. The common methods for feature normalization are:

  • Simple Rescaling
  • Per-example mean subtraction (a.k.a. remove DC)
  • Feature Standardization (zero-mean and unit variance for each feature across the dataset; see the sketch just after this list)
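
A minimal numpy sketch of feature standardization, assuming a data matrix X with one example per row and one feature per column (the names, shapes, and the small epsilon guard are illustrative assumptions, not prescribed here):

    import numpy as np

    # Hypothetical data matrix: one example per row, one feature per column.
    X = np.random.rand(100, 50)

    # Zero mean and unit variance for each feature, computed across the dataset.
    mu = X.mean(axis=0)
    sigma = X.std(axis=0)
    X_standardized = (X - mu) / (sigma + 1e-8)  # epsilon guards against zero-variance features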

Simple Rescaling

In simple rescaling, our goal is to rescale the data along each data dimension (possibly independently) so that the final data vectors lie in the range [0,1] or [−1,1] (depending on your dataset). This is useful for later processing as many default parameters (e.g., epsilon in PCA-whitening) treat the data as if it has been scaled to a reasonable range.

Example: When processing natural images, we often obtain pixel values in the range [0,255]. It is a common operation to rescale these values to [0,1] by dividing the data by 255.
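
A minimal numpy sketch of this rescaling (the array name and shape are assumptions for illustration):

    import numpy as np

    # Hypothetical raw pixel data in [0, 255], one flattened image per row.
    images = np.random.randint(0, 256, size=(100, 28 * 28)).astype(np.float64)

    # Simple rescaling: divide by 255 so every value lies in [0, 1].
    images_rescaled = images / 255.0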


Per-example mean subtraction

If the data is stationary (i.e., the statistics for each data dimension follow the same distribution), it can make sense to subtract the mean computed per example, removing the DC component of each data point. For natural images, this corresponds to removing the average brightness (intensity) of each image; in many tasks we are not interested in the illumination conditions, so removing the per-example mean is a reasonable normalization. As noted in the tip above, this is less appropriate for data such as MNIST, where the background already provides a natural zero.
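
A minimal numpy sketch of per-example mean subtraction, assuming one example per row (names and shapes are illustrative):

    import numpy as np

    # Hypothetical data: one flattened image patch per row.
    X = np.random.rand(100, 64)

    # Remove DC: subtract each example's own mean from all of its entries.
    X_centered = X - X.mean(axis=1, keepdims=True)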

PCA/ZCA Whitening
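
PCA whitening projects mean-normalized data onto its principal components and rescales each component to unit variance; ZCA whitening then rotates the result back into the original coordinate space. A minimal numpy sketch, assuming a data matrix X with one example per column and a small regularization constant epsilon (all names, shapes, and the epsilon value are illustrative assumptions):

    import numpy as np

    # Hypothetical data matrix: one example per column (64 dims, 1000 examples).
    X = np.random.rand(64, 1000)
    X = X - X.mean(axis=1, keepdims=True)  # mean-normalize each dimension

    # Covariance matrix and its eigendecomposition (SVD of a symmetric
    # PSD matrix yields the eigenvectors and eigenvalues).
    cov = X @ X.T / X.shape[1]
    U, S, _ = np.linalg.svd(cov)

    epsilon = 1e-5  # assumed regularizer; choosing it is the subject of the next heading

    # PCA whitening: rotate onto principal components, rescale to unit variance.
    X_pca_white = np.diag(1.0 / np.sqrt(S + epsilon)) @ U.T @ X

    # ZCA whitening: rotate the PCA-whitened data back to the original space.
    X_zca_white = U @ X_pca_white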

How to choose epsilon? Do we need low-pass filtering?

Large Images

1/f Whitening


Standard Pipeline

Model Idiosyncrasies

Sparse Autoencoder

Sigmoid Decoders

Linear Decoders

Independent Component Analysis
