In this project I carried out Exploratory Data Analysis, Data Visualisation and, lastly, Modelling. The Fashion-MNIST dataset contains 70,000 rows split across two files. Each example is a 28x28 image associated with one of 10 labels (targets). After examining the dataset, I preprocessed it by reshaping the 784 pixel columns into (28,28,1) arrays and saving the target feature as a separate vector. For modelling, I trained a Sequential model with multiple convolution layers for 50 epochs. To prevent overfitting and underfitting, I adjusted Dropout layers. Overall, the model reaches a test accuracy of 0.9236. Data augmentation and/or a larger dataset could help achieve better results.
Fashion-MNIST is a dataset of Zalando's article images consisting of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with one of 10 labels. Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels. Each pixel has a single pixel-value indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255. The training and test data sets have 785 columns. The first column consists of the class labels (listed below) and represents the article of clothing. The remaining columns contain the pixel-values of the associated image.
- To locate a pixel on the image, decompose its index x as x = i * 28 + j, where i and j are integers between 0 and 27. The pixel is located on row i and column j of a 28 x 28 matrix.
- For example, pixel31 (31 = 1 * 28 + 3) is the pixel in the fourth column from the left and the second row from the top.
- Each row is a separate image.
- Column 1 is the class label.
- Remaining columns are pixel numbers (784 total).
- Each value is the darkness of the pixel (0 to 255).
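As a quick check of the decomposition above, the sketch below maps a 0-based pixel index x to its (row, column) position and reshapes a flat row of 784 values into a 28 x 28 matrix. The helper names `pixel_position` and `row_to_image` are illustrative, not part of any library.

```python
import numpy as np

IMG_SIDE = 28  # Fashion-MNIST images are 28 x 28 pixels

def pixel_position(x: int) -> tuple[int, int]:
    """Hypothetical helper: return (row i, column j) for pixel index x, where x = i * 28 + j."""
    # x is the 0-based index used in the decomposition above; if a CSV numbers its
    # pixel columns from 1, subtract 1 first.
    if not 0 <= x < IMG_SIDE * IMG_SIDE:
        raise ValueError(f"pixel index must be in [0, 783], got {x}")
    return divmod(x, IMG_SIDE)

def row_to_image(pixels) -> np.ndarray:
    """Hypothetical helper: reshape 784 pixel values (row-major) into a 28 x 28 matrix."""
    return np.asarray(pixels).reshape(IMG_SIDE, IMG_SIDE)

print(pixel_position(31))                  # (1, 3): second row from the top, fourth column from the left
print(row_to_image(np.arange(784))[1, 3])  # 31
```

Because NumPy reshapes in row-major order, flat index x lands at `[x // 28, x % 28]`, which is exactly the i and j of the decomposition.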
Each training and test example is assigned to one of the following labels:
- 0 T-shirt/top
- 1 Trouser
- 2 Pullover
- 3 Dress
- 4 Coat
- 5 Sandal
- 6 Shirt
- 7 Sneaker
- 8 Bag
- 9 Ankle boot
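The label list above is often kept as a lookup table so that plots and reports can show article names instead of numbers. A minimal sketch; the names `LABEL_NAMES` and `label_name` are hypothetical:

```python
# Hypothetical lookup table: class index -> article name, as listed above
LABEL_NAMES = {
    0: "T-shirt/top",
    1: "Trouser",
    2: "Pullover",
    3: "Dress",
    4: "Coat",
    5: "Sandal",
    6: "Shirt",
    7: "Sneaker",
    8: "Bag",
    9: "Ankle boot",
}

def label_name(label: int) -> str:
    """Translate a numeric class label (int or NumPy integer) into its article name."""
    return LABEL_NAMES[int(label)]

print(label_name(6))  # Shirt
```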
Train Dataset Example
Test Dataset Example
First, I checked the data, which comes as two separate datasets: train and test. Then I checked the distribution of labels in both datasets and created a list of example images for each. All classes (labels) are equally distributed, so neither oversampling nor undersampling is needed.
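The class-balance check described above can be sketched with pandas. This assumes the CSV has been loaded into a DataFrame whose label column is named `label` (an assumption; use the actual header), and a small toy frame stands in for the real data. The helpers `label_distribution` and `is_balanced` are hypothetical.

```python
import numpy as np
import pandas as pd

def label_distribution(df: pd.DataFrame, label_col: str = "label") -> pd.Series:
    """Count how many images belong to each class, sorted by class index."""
    return df[label_col].value_counts().sort_index()

def is_balanced(counts: pd.Series, tolerance: float = 0.0) -> bool:
    """True if every class count lies within `tolerance` (a fraction) of the mean count."""
    mean = counts.mean()
    return bool(((counts - mean).abs() <= tolerance * mean).all())

# Toy stand-in for the loaded CSV: 10 classes with 5 images each
toy = pd.DataFrame({"label": np.repeat(np.arange(10), 5)})
counts = label_distribution(toy)
print(counts.tolist())      # [5, 5, 5, 5, 5, 5, 5, 5, 5, 5]
print(is_balanced(counts))  # True
```

A tolerance of 0 demands identical counts; a small positive tolerance is a more forgiving test for datasets that are only roughly balanced.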
To prepare the datasets for the model, I reshaped the pixel columns from (784) to (28,28,1) and saved the label feature as a separate vector, processing both the train and test data this way. After that, I split the train set into train and validation sets: the validation set contains 30% of the original train set, giving a 0.7/0.3 split. After this step, I checked the distribution of labels in the train and validation sets.
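The preprocessing and split steps above might look like the sketch below. It assumes the data sits in a pandas DataFrame with the label in a column named `label` followed by the 784 pixel columns, and that scikit-learn's `train_test_split` performs the 0.7/0.3 split (the document does not name the function). The toy frame, its column names, the helper `split_features_labels` and `random_state=42` are illustrative. Since the model is compiled with categorical cross-entropy, which expects one-hot targets, the labels are also one-hot encoded.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

def split_features_labels(df: pd.DataFrame, label_col: str = "label"):
    """Hypothetical helper: separate the label vector and reshape the 784 pixel columns to (28, 28, 1)."""
    y = df[label_col].to_numpy()
    X = df.drop(columns=[label_col]).to_numpy().reshape(-1, 28, 28, 1)
    return X, y

# Toy frame with the CSV layout: 1 label column + 784 pixel columns (names are illustrative)
rng = np.random.default_rng(0)
n = 100
toy = pd.DataFrame(rng.integers(0, 256, size=(n, 784)),
                   columns=[f"pixel{k}" for k in range(1, 785)])
toy.insert(0, "label", np.tile(np.arange(10), n // 10))

X, y = split_features_labels(toy)

# Hold out 30% of the training set for validation (a 0.7 / 0.3 split)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.3, random_state=42)

# One-hot targets for categorical cross-entropy: row k of the identity matrix for class k
y_train_onehot = np.eye(10)[y_train]
y_val_onehot = np.eye(10)[y_val]

print(X.shape, X_train.shape, X_val.shape)  # (100, 28, 28, 1) (70, 28, 28, 1) (30, 28, 28, 1)
```

Passing `stratify=y` to `train_test_split` would keep the class proportions identical in the train and validation parts, which turns the distribution check after the split into a formality.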
Number of Items in Each Class in Dataset
Number of Items in Each Class in Validation Dataset
I used a Sequential model, which is appropriate for a plain stack of layers where each layer has exactly one input tensor and one output tensor. I then added Conv2D, MaxPooling2D, Flatten and Dense layers, with the following parameters for each layer.
1. Conv2D
- filters = 32
- kernel_size = (3,3)
- activation function = relu
- kernel_initializer = normal
- input_shape = (28,28,1)
2. MaxPooling2D
- pool_size = (2,2)
3. Conv2D
- filters = 64
- kernel_size = (3,3)
- activation function = relu
4. Flatten
A flatten operation reshapes a tensor into a single dimension whose size equals the number of elements in the tensor, excluding the batch dimension. It has no weights and needs no parameters here.
5. Dense
In the first Dense layer:
- units = 128
- activation function = relu
In the second Dense layer:
- units = 10
- activation function = softmax
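To see what the Flatten layer actually receives, the sketch below walks the (28, 28, 1) input through the layers listed above. It assumes the Keras defaults of stride 1 and `padding="valid"` for Conv2D and a stride equal to the pool size for MaxPooling2D, since the document does not state these settings. It also counts the trainable parameters of each layer.

```python
import numpy as np

def conv_valid(size: int, kernel: int) -> int:
    """Output height/width of a stride-1 convolution with padding='valid' (assumed default)."""
    return size - kernel + 1

def max_pool_out(size: int, pool_size: int) -> int:
    """Output height/width of max pooling with stride = pool size (assumed default)."""
    return size // pool_size

h = 28                      # input: 28 x 28 x 1
h = conv_valid(h, 3)        # Conv2D(32, 3x3)    -> 26 x 26 x 32
h = max_pool_out(h, 2)      # MaxPooling2D(2x2)  -> 13 x 13 x 32
h = conv_valid(h, 3)        # Conv2D(64, 3x3)    -> 11 x 11 x 64
flat = h * h * 64           # Flatten            -> 7744 values per image
print(h, flat)              # 11 7744

# Flatten keeps the batch dimension and merges all remaining dimensions
batch_flat = np.zeros((5, h, h, 64)).reshape(5, -1)
print(batch_flat.shape)     # (5, 7744)

# Trainable parameters: (kernel area * input channels + 1 bias) * filters, or (inputs + 1) * units
conv1 = (3 * 3 * 1 + 1) * 32     # 320
conv2 = (3 * 3 * 32 + 1) * 64    # 18,496
dense1 = (flat + 1) * 128        # 991,360
dense2 = (128 + 1) * 10          # 1,290
print(conv1 + conv2 + dense1 + dense2)  # 1011466
```

Most of the roughly one million parameters sit in the first Dense layer, because it connects every one of the 7744 flattened values to each of its 128 units.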
Finally, I compiled the model with these parameters:
- loss = categorical cross-entropy
- optimizer = adam
- metrics = accuracy
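A sketch of the architecture and compile settings above in Keras, assuming TensorFlow's bundled Keras (`from tensorflow import keras`). The `kernel_initializer = normal` setting is written as `keras.initializers.RandomNormal()`, the initializer the `"normal"` string alias refers to. The layer list contains no Dropout layer, although the project summary mentions adjusting Dropout, so none is added here. The commented `fit` call uses placeholder array names and the 50 epochs mentioned in the summary; `build_model` is a hypothetical helper name.

```python
from tensorflow import keras

def build_model() -> keras.Model:
    """Hypothetical helper: Sequential CNN following the layer list and compile settings above."""
    model = keras.Sequential([
        keras.Input(shape=(28, 28, 1)),
        keras.layers.Conv2D(32, (3, 3), activation="relu",
                            kernel_initializer=keras.initializers.RandomNormal()),  # "normal"
        keras.layers.MaxPooling2D(pool_size=(2, 2)),
        keras.layers.Conv2D(64, (3, 3), activation="relu"),
        keras.layers.Flatten(),
        keras.layers.Dense(128, activation="relu"),
        keras.layers.Dense(10, activation="softmax"),  # one probability per class
    ])
    # categorical_crossentropy expects one-hot targets of shape (batch, 10)
    model.compile(loss="categorical_crossentropy", optimizer="adam", metrics=["accuracy"])
    return model

model = build_model()
model.summary()

# Training as described in the summary; the array names are placeholders:
# model.fit(X_train, y_train_onehot, epochs=50, validation_data=(X_val, y_val_onehot))
```

If the labels are kept as integers instead of one-hot vectors, `"sparse_categorical_crossentropy"` is the matching Keras loss.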
As a result, the model performs well overall, as the accuracy and loss curves and the test metrics below show.
Accuracy of the Model
Loss of the Model
Test Loss is 0.2166
Test Accuracy is 0.9236
Classification Report
Correctly Predicted Items
Falsely Predicted Items
The best precision is for Trousers (Class 1) and Sandals (Class 5) with 0.99, and the worst precision is for Shirt (Class 6) with 0.78.
The best recall is for Trousers (Class 1) with 0.99, and the worst recall is for Shirt (Class 6) with 0.79.
The best F1-score is for Trousers (Class 1) with 0.99, and the worst F1-score is for Shirt (Class 6) with 0.78.
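The per-class figures above come from a classification report. The sketch below shows how they, along with the correctly and falsely predicted items, can be derived with scikit-learn, assuming integer true labels and a softmax probability matrix like the one `model.predict` returns. The toy arrays and the helper `per_class_extremes` are hypothetical. `np.argmax` returns only the first class on ties, so two classes sharing the best score (as Trousers and Sandals do for precision) would need an explicit check.

```python
import numpy as np
from sklearn.metrics import classification_report, precision_recall_fscore_support

def per_class_extremes(y_true, y_pred):
    """Hypothetical helper: for precision, recall and F1 return
    (best_class, best_score, worst_class, worst_score)."""
    precision, recall, f1, _ = precision_recall_fscore_support(y_true, y_pred, zero_division=0)
    extremes = {}
    for name, scores in (("precision", precision), ("recall", recall), ("f1", f1)):
        best, worst = int(np.argmax(scores)), int(np.argmin(scores))  # first index wins on ties
        extremes[name] = (best, float(scores[best]), worst, float(scores[worst]))
    return extremes

# Toy stand-in for model.predict(X_test): one row of class probabilities per image
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([
    [0.8, 0.1, 0.1],
    [0.7, 0.2, 0.1],
    [0.1, 0.8, 0.1],
    [0.2, 0.7, 0.1],
    [0.1, 0.1, 0.8],
    [0.6, 0.1, 0.3],  # misclassified: true class 2, predicted class 0
])
y_pred = probs.argmax(axis=1)  # softmax output -> predicted class index

correct = np.flatnonzero(y_pred == y_true)  # indices of correctly predicted items
wrong = np.flatnonzero(y_pred != y_true)    # indices of falsely predicted items

print(classification_report(y_true, y_pred, digits=2))
print(per_class_extremes(y_true, y_pred))
```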
For better results, data augmentation could be applied or the training data could be expanded.
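As one possible starting point for the augmentation idea above, the NumPy sketch below applies a random horizontal flip and a small random shift to each image. The choice of transforms, the 2-pixel shift limit and the function name `augment_batch` are assumptions, not something this project tested. Keras also provides preprocessing layers such as `RandomFlip` and `RandomTranslation` for the same purpose.

```python
import numpy as np

def augment_batch(images: np.ndarray, rng: np.random.Generator, max_shift: int = 2) -> np.ndarray:
    """Hypothetical helper: randomly flip each (28, 28, 1) image left-right and shift it by
    up to `max_shift` pixels vertically and horizontally, filling exposed borders with 0."""
    out = np.empty_like(images)
    n, h, w, _ = images.shape
    for k in range(n):
        img = images[k]
        if rng.random() < 0.5:
            img = img[:, ::-1, :]  # horizontal flip
        dy, dx = rng.integers(-max_shift, max_shift + 1, size=2)
        # Zero-pad, then crop a shifted 28 x 28 window so the content moves by (dy, dx)
        padded = np.pad(img, ((max_shift, max_shift), (max_shift, max_shift), (0, 0)))
        y0, x0 = max_shift - dy, max_shift - dx
        out[k] = padded[y0:y0 + h, x0:x0 + w, :]
    return out

rng = np.random.default_rng(0)
batch = rng.integers(0, 256, size=(4, 28, 28, 1))  # toy images standing in for X_train
augmented = augment_batch(batch, rng)
print(augmented.shape)  # (4, 28, 28, 1)
```

Augmented copies would typically be generated fresh each epoch, so the model sees slightly different versions of the same images rather than a fixed, enlarged dataset.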