# Data Preprocessing

### From Ufldl


## Data Normalization

A standard first step in data preprocessing is data normalization. While there are a few possible approaches, the right choice usually depends on the data. The common methods for feature normalization are:

* Simple rescaling
* Per-example mean subtraction (a.k.a. removing the DC component)
* Feature standardization (zero mean and unit variance for each feature across the dataset)
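The third method in the list, feature standardization, is not elaborated further below. A minimal NumPy sketch (the array values are made up for illustration) might look like:

```python
import numpy as np

# Hypothetical toy design matrix: rows are examples, columns are features.
X = np.array([[1.0, 200.0],
              [2.0, 100.0],
              [3.0, 300.0]])

# Feature standardization: make each feature (column) zero-mean and
# unit-variance across the dataset.
mu = X.mean(axis=0)       # per-feature mean
sigma = X.std(axis=0)     # per-feature standard deviation
X_standardized = (X - mu) / sigma
```

Note that the mean and standard deviation are computed over the dataset (here, over rows), not per example.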

### Simple Rescaling

In simple rescaling, our goal is to rescale the data along each data dimension (possibly independently) so that the final data vectors lie in the range [0, 1] or [-1, 1] (depending on your dataset). This is useful for later processing, since many *default* parameters (e.g., epsilon in PCA whitening) treat the data as if it has been scaled to a reasonable range.


**Example:** When processing natural images, we often obtain pixel values in the range [0, 255]. It is common to rescale these values to [0, 1] by dividing the data by 255.
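The rescaling described above can be sketched in NumPy (the pixel values here are made up for illustration):

```python
import numpy as np

# Hypothetical 8-bit grayscale patch with pixel values in [0, 255].
pixels = np.array([[0, 64],
                   [128, 255]], dtype=np.uint8)

# Simple rescaling to [0, 1]: divide by the maximum representable value.
rescaled = pixels.astype(np.float64) / 255.0

# If the range [-1, 1] is preferred instead, shift and scale the result.
rescaled_signed = 2.0 * rescaled - 1.0
```

Casting to a floating-point type before dividing avoids integer division truncating all values to zero.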


### Per-example mean subtraction

If the data is stationary (i.e., the statistics for each data dimension follow the same distribution), then it can make sense to subtract the mean value of each example, computed per-example. In images, this corresponds to removing the average brightness (DC component) of each image; in many applications the overall illumination level is not of interest, so this normalization aids learning.
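Per-example mean subtraction can be sketched in NumPy as follows (the array values are made up for illustration):

```python
import numpy as np

# Hypothetical data: each row is one example (e.g., a flattened image patch).
X = np.array([[10.0, 20.0, 30.0],
              [ 5.0,  5.0,  5.0]])

# Per-example mean subtraction: remove each example's own mean
# (its DC component), computed along the feature axis.
X_zero_dc = X - X.mean(axis=1, keepdims=True)
```

In contrast to feature standardization, the mean here is taken along each row (one example) rather than down each column (one feature).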

## PCA/ZCA Whitening