Data Preprocessing

Revision as of 07:26, 29 April 2011

Jngiam

(→PCA/ZCA Whitening)

Revision as of 07:32, 29 April 2011

Jngiam

Data Preprocessing

From Ufldl

@@ Line 4: / Line 4: @@
 {{quote |
-Tip: When approaching a dataset, the first thing to do is to look at the data itself and observe its properties. While the techniques here apply generally, you might want to opt to do certain things differently given your dataset. For example, one standard preprocessing trick is to subtract the mean of each data point from itself (also known as remove DC, local mean subtraction, subtractive normalization). While this makes sense for data such as natural images, it is less obvious for data with with a natural "zero" point such as MNIST images (where all examples use the same value of 0 to represent an empty background).
+Tip: When approaching a dataset, the first thing to do is to look at the data itself and observe its properties. While the techniques here apply generally, you might want to opt to do certain things differently given your dataset. For example, one standard preprocessing trick is to subtract the mean of each data point from itself (also known as remove DC, local mean subtraction, subtractive normalization). While this makes sense for data such as natural images, it is less obvious for data where stationarity does not hold.
 }}
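As a rough illustration of the per-example mean subtraction ("remove DC") mentioned in the tip, the sketch below uses plain Python lists. The function name `remove_dc` and the toy patches are hypothetical and are not taken from the wiki. It shows that two patches that differ only by a constant brightness offset become identical once each patch's own mean is subtracted.

```python
def remove_dc(example):
    """Subtract the example's own mean from each of its values."""
    mean = sum(example) / len(example)
    return [v - mean for v in example]


# Two toy "patches" with the same structure but different overall brightness.
patch = [0.25, 0.5, 0.75, 0.5]
brighter_patch = [v + 0.25 for v in patch]

centered = remove_dc(patch)
centered_brighter = remove_dc(brighter_patch)
```

After this step each example has zero mean, so only the variation within the patch remains. Whether discarding the overall brightness is desirable depends on the dataset, which is the point the tip makes.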
@@ Line 89: / Line 89: @@
 === MNIST Handwritten Digits ===
-The MNIST dataset has pixel values in the range <math>[0, 255]</math>. We thus start with simple rescaling to shift the data into the range <math>[0, 1]</math>. A sparse autoencoder often works well after this simple normalization. While one could also elect to use PCA/ZCA whitening if desired, this is not often done in practice. ''Note: Since the 0 value is meaningful in MNIST, we do ''not'' perform per-example mean normalization.''
+The MNIST dataset has pixel values in the range <math>[0, 255]</math>. We thus start with simple rescaling to shift the data into the range <math>[0, 1]</math>. In practice, removing the mean-value per example can also help feature learning. ''Note: While one could also elect to use PCA/ZCA whitening if desired, this is not often done in practice.''
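The two steps described for MNIST in the newer revision (rescaling to [0, 1], then optionally removing the per-example mean) might look like the following minimal sketch. It assumes an image is a flat list of pixel intensities; the helper names `rescale_pixels` and `remove_example_mean` and the 4-pixel "image" are made up for illustration.

```python
def rescale_pixels(image, max_value=255.0):
    """Map raw pixel intensities from [0, max_value] to [0, 1]."""
    return [p / max_value for p in image]


def remove_example_mean(image):
    """Subtract the image's own mean intensity from every pixel."""
    mean = sum(image) / len(image)
    return [p - mean for p in image]


# A made-up 4-pixel "image": empty background (0) with one bright stroke pixel.
raw = [0, 0, 0, 255]

scaled = rescale_pixels(raw)            # values now lie in [0, 1]
centered = remove_example_mean(scaled)  # background pixels become negative
```

Note that after mean removal the background pixels are no longer exactly 0, which is the concern the older revision raised about MNIST's meaningful zero value.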