Using the MNIST Dataset

From Ufldl

=== Introduction ===
The MNIST dataset is a collection of handwritten digits comprising 60 000 training examples and 10 000 test examples, each a 28x28 grayscale image. The dataset can be downloaded from http://yann.lecun.com/exdb/mnist/.
=== Usage ===
The image and label data are stored in a binary format (IDX) described on the website. For your convenience, we have provided two MATLAB helper functions for extracting the data, available at http://ufldl.stanford.edu/wiki/resources/mnistHelper.zip.
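
The magic numbers checked by the loaders (2051 for image files, 2049 for label files) follow the IDX format described on the MNIST website: a 4-byte big-endian integer whose first two bytes are zero, whose third byte codes the element type (0x08 for unsigned bytes) and whose fourth byte gives the number of dimensions. As a small illustration, the following sketch decodes both values with plain arithmetic; it reads no files.
<syntaxhighlight>
% Decode the IDX magic numbers used by the MNIST files.
% Byte layout, most significant first: 0x00 0x00 <type code> <number of dimensions>
magicNumbers = [2051 2049];   % image files, label files

% The first two bytes are zero, so the value is typeCode*256 + numDims
typeCode = floor(magicNumbers / 256);   % 8 (0x08) means unsigned byte
numDims  = mod(magicNumbers, 256);      % number of dimensions

fprintf('magic %d: type code 0x%02X, %d dimension(s)\n', ...
        [magicNumbers; typeCode; numDims]);
</syntaxhighlight>

Both files report type code 0x08; the image files have 3 dimensions (count, rows, columns) and the label files have 1.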

For loading the images:
<syntaxhighlight>
function images = loadMNISTImages(filename)
%loadMNISTImages returns a [28*28]x[number of MNIST images] matrix
%containing the MNIST images, one image per column, with pixel values
%rescaled to the range [0,1]

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename]);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2051, ['Bad magic number in ', filename]);

numImages = fread(fp, 1, 'int32', 0, 'ieee-be');
numRows = fread(fp, 1, 'int32', 0, 'ieee-be');
numCols = fread(fp, 1, 'int32', 0, 'ieee-be');

% Pixels are stored image by image, row by row, one unsigned byte each
images = fread(fp, inf, 'unsigned char');
fclose(fp);

% MATLAB fills arrays column by column, so read each image as
% numCols x numRows and then swap the first two dimensions
images = reshape(images, numCols, numRows, numImages);
images = permute(images, [2 1 3]);

% Flatten each image into one column and rescale from [0,255] to [0,1]
images = reshape(images, numRows * numCols, numImages);
images = double(images) / 255;

end
</syntaxhighlight>
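
The reshape followed by permute in loadMNISTImages is needed because the IDX format stores each image row by row, while MATLAB fills arrays column by column. The following minimal sketch checks this on a tiny hypothetical file (two 2x3 images, not MNIST data) written to a temporary path from tempname, so it assumes that location is writable. It writes the same header layout, reads it back with equivalent fread calls, and confirms that each image comes out in its original orientation.
<syntaxhighlight>
% Two tiny 2x3 "images" (hypothetical test data, not MNIST)
img1 = uint8([1 2 3; 4 5 6]);
img2 = uint8([7 8 9; 10 11 12]);
numImages = 2; numRows = 2; numCols = 3;

% Write them in IDX order: big-endian header, then each image row by row
filename = [tempname, '.idx3-ubyte'];
fp = fopen(filename, 'wb');
fwrite(fp, [2051 numImages numRows numCols], 'int32', 0, 'ieee-be');
fwrite(fp, [reshape(img1', 1, []) reshape(img2', 1, [])], 'uint8');
fclose(fp);

% Read the file back the way the image loader does
fp = fopen(filename, 'rb');
header = fread(fp, 4, 'int32', 0, 'ieee-be');   % magic, count, rows, cols
pixels = fread(fp, inf, 'unsigned char');
fclose(fp);
delete(filename);

% Column-major fill yields numCols x numRows per image, so swap dims 1 and 2
images = reshape(pixels, numCols, numRows, numImages);
images = permute(images, [2 1 3]);
disp(images(:, :, 1));   % same values and orientation as img1
</syntaxhighlight>

Without the permute, each image would come out transposed (3x2 instead of 2x3 in this example).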

For loading the labels:
<syntaxhighlight>
function labels = loadMNISTLabels(filename)
%loadMNISTLabels returns a [number of MNIST images]x1 matrix containing
%the labels for the MNIST images

fp = fopen(filename, 'rb');
assert(fp ~= -1, ['Could not open ', filename]);

magic = fread(fp, 1, 'int32', 0, 'ieee-be');
assert(magic == 2049, ['Bad magic number in ', filename]);

numLabels = fread(fp, 1, 'int32', 0, 'ieee-be');

% Each label is stored as a single unsigned byte (a digit from 0 to 9)
labels = fread(fp, inf, 'unsigned char');
fclose(fp);

assert(size(labels,1) == numLabels, 'Mismatch in label count');

end
</syntaxhighlight>
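
The label files use a shorter header: the magic number 2049, the number of labels, then one unsigned byte per label. A similar round-trip sketch, again using made-up labels and a temporary file from tempname rather than real MNIST data, mirrors the reads and the count check performed by loadMNISTLabels:
<syntaxhighlight>
% Hypothetical labels for five examples (digits 0-9), not real MNIST data
trueLabels = [3; 0; 7; 7; 2];

% Write an IDX label file: magic number, label count, then one byte per label
filename = [tempname, '.idx1-ubyte'];
fp = fopen(filename, 'wb');
fwrite(fp, [2049 numel(trueLabels)], 'int32', 0, 'ieee-be');
fwrite(fp, trueLabels, 'uint8');
fclose(fp);

% Read it back the way the label loader does
fp = fopen(filename, 'rb');
magicNumber = fread(fp, 1, 'int32', 0, 'ieee-be');
numLabels = fread(fp, 1, 'int32', 0, 'ieee-be');
labels = fread(fp, inf, 'unsigned char');
fclose(fp);
delete(filename);

assert(magicNumber == 2049, 'Bad magic number');
assert(size(labels, 1) == numLabels, 'Mismatch in label count');
disp(labels');
</syntaxhighlight>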

As an example of how to use these functions, you can check the images and labels using the following code:
<syntaxhighlight>
% Change the filenames if you've saved the files under different names
% On some platforms, the files might be saved as
% train-images.idx3-ubyte / train-labels.idx1-ubyte
images = loadMNISTImages('train-images-idx3-ubyte');
labels = loadMNISTLabels('train-labels-idx1-ubyte');

% We are using display_network from the autoencoder code
display_network(images(:,1:100)); % Show the first 100 images
disp(labels(1:10));
</syntaxhighlight>

Latest revision as of 14:46, 3 May 2011
