The Street Number Challenge aims to detect and interpret house numbers in images. We present a two-stage solution for this challenge: the first stage detects individual digits using two different CNN models (EfficientDet D0 and YOLOv4); the second stage concatenates the detected digits with a rule-based algorithm to produce the predicted number shown in the image.
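The exact concatenation rules live in the notebook; the core idea can be sketched as filtering low-confidence detections and joining the remaining digits left to right. The tuple format `(digit, x_min, score)` and the threshold value are assumptions for illustration, not the notebook's actual interface:

```python
def concatenate_digits(detections, score_threshold=0.5):
    """Join per-digit detections into a street-number string.

    `detections` is assumed to be a list of (digit, x_min, score)
    tuples from the detector; the real rule-based algorithm in the
    notebook may apply additional rules.
    """
    kept = [d for d in detections if d[2] >= score_threshold]
    kept.sort(key=lambda d: d[1])  # order digits left-to-right by x coordinate
    return "".join(str(d[0]) for d in kept)

# The low-confidence 7 (score 0.30) is dropped; "1" and "2" are
# reordered by position, yielding "12".
print(concatenate_digits([(2, 0.40, 0.98), (1, 0.10, 0.95), (7, 0.70, 0.30)]))
```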
Important: only EfficientDet D0 is covered in this repository.
The project was built using:
- Tensorflow 2.3.1
- Tensorflow Object Detection API 2.3.1
- Python 3.6.8
The folder structure of the project can be seen as follows:
```
├── Model_Inference                        # Test the trained model and concatenation on the test set
│   ├── Images                             # Test-set images from the StreetNumber dataset
│   ├── Prediction                         # Test-set images with the predictions from EfficientDet D0
│   ├── Street Number Concatenation.ipynb  # Jupyter notebook with digit detection and concatenation
│   ├── prediction.txt                     # Predicted classes for the test set using EfficientDet D0
│   └── testset_gt.txt                     # Ground-truth classes of the test set
├── RealTimeObjectDetection                # Train the model using EfficientDet D0
│   └── Tensorflow
│       ├── scripts                        # Script to convert images and Pascal VOC annotations to tfrecords
│       └── workspace
│           ├── annotations                # Contains the label maps and tfrecords
│           ├── images                     # Raw images and annotations for each partition (train, test, validation)
│           ├── models                     # Contains the trained model
│           ├── pre-trained-models         # Contains the model pre-trained on the COCO dataset
│           └── Tutorial.ipynb             # Jupyter notebook that creates the label map and tfrecords
├── models                                 # Tensorflow Object Detection API folder
└── ...
```
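`Tutorial.ipynb` creates the label map consumed by the Object Detection API. A minimal sketch of generating a ten-class digit label map in pbtxt format is shown below; the id scheme (digits `'0'`–`'9'` mapped to ids 1–10, since id 0 is reserved for background) is an assumption and may differ from the notebook:

```python
def digit_label_map():
    """Build a hypothetical label_map.pbtxt body for digit classes 0-9."""
    entries = []
    for i in range(10):
        # id i+1 because id 0 is conventionally reserved for background
        entries.append("item {\n  id: %d\n  name: '%d'\n}" % (i + 1, i))
    return "\n".join(entries)

print(digit_label_map())
```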
During the training phase, two models stood out; the metrics evaluated are mAP, recall, and loss, as shown in the following images: ![](Images/loss function.PNG)
For the test set, we only consider the model with the highest mAP, which achieved 0.9190 on the test set. The AP for each digit is shown below:
Classes | [email protected] |
---|---|
0 | 1.0000 |
1 | 0.9841 |
2 | 0.9662 |
3 | 0.8187 |
4 | 0.9030 |
5 | 0.8292 |
6 | 0.9042 |
7 | 0.8666 |
8 | 0.9616 |
9 | 1.0000 |
Next, we evaluate the precision, recall, and F1-score of the predicted numbers after the concatenation step. On the test set we obtained the following results:
| | Micro Avg | Macro Avg | Weighted Avg |
|---|---|---|---|
Precision | 0.92 | 0.66 | 0.71 |
Recall | 0.68 | 0.66 | 0.68 |
F1-Score | 0.78 | 0.65 | 0.68 |
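The gap between the micro and macro averages above reflects how they aggregate: micro averaging pools true/false positives over all predicted numbers, while macro averaging takes the unweighted mean of per-class scores, so rare classes weigh more heavily. A toy illustration in pure Python (the labels below are illustrative, not the project's data):

```python
from collections import Counter

def precision_scores(y_true, y_pred):
    """Per-class, micro-, and macro-averaged precision for single-label data."""
    classes = sorted(set(y_true) | set(y_pred))
    tp, fp = Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[p] += 1  # correct prediction for class p
        else:
            fp[p] += 1  # p was predicted but the true label was t
    per_class = {c: tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
                 for c in classes}
    micro = sum(tp.values()) / len(y_pred)          # pooled over all classes
    macro = sum(per_class.values()) / len(classes)  # unweighted class mean
    return per_class, micro, macro

y_true = ["12", "34", "34", "7"]
y_pred = ["12", "34", "7", "7"]
per_class, micro, macro = precision_scores(y_true, y_pred)
# "7" is predicted twice but correct once, so its precision is 0.5,
# pulling the macro average (~0.83) below the micro average (0.75)... wait,
# here macro > micro because the two perfect classes dominate the class mean.
```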
In this section we provide several examples of the system working. The output of the system is written in the top-left corner of the image.
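The overlay on each example image can be reproduced with a few lines of Pillow; the position, color, and default font below are assumptions for illustration, not the notebook's exact settings:

```python
from PIL import Image, ImageDraw

def annotate(image, number):
    """Write the predicted street number near the top-left corner (sketch)."""
    draw = ImageDraw.Draw(image)
    draw.text((5, 5), number, fill=(255, 0, 0))  # red text, default bitmap font
    return image

# Draw "123" on a small black canvas as a stand-in for a test-set image.
img = annotate(Image.new("RGB", (100, 60), "black"), "123")
```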
Distributed under the MIT License. See LICENSE
for more information.