Species Identification is an open-source repository that is part of Syngenta's Biodiversity Sensor Project. This document describes how a YOLOv5-based architecture can be used to detect and identify insect species in remotely captured images, and presents the results achieved. It also includes a runnable example with detailed instructions for reproduction.
In the initial phase of the research, a comprehensive dataset was generated using internet images sourced from iNaturalist and GBIF. As the project progressed, new images obtained during field deployment were added iteratively to enhance the robustness of the model.
Roboflow, a web-based platform for managing, augmenting, and labeling computer vision datasets, was used to make data annotation more efficient. By reusing previously trained models to assist labeling, Roboflow significantly accelerated annotation of the additional images.
Bee species | Butterfly species | Hoverfly species | Other |
---|---|---|---|
Amegilla sp. | Aglais io | Episyrphus balteatus | Chrysoperla carnea |
Andrena cineraria | Aglais urticae | Eristalis tenax | Nezara viridula |
Andrena fulva | Lasiommata megera | | |
Andrena haemorrhoa | Lycaena phlaeas | | |
Apis mellifera ** | Maniola jurtina | | |
Bombus terrestris ** | Pieris rapae | | |
Halictus sp. | Polyommatus icarus | | |
Hylaeus signatus | Vanessa atalanta | | |
Lasioglossum sp. | Vanessa cardui | | |
Lipotriches sp. | | | |
Megachile sp. ** | | | |
Osmia cornuta ** | | | |
Xylocopa violacea | | | |
** Species for which a robust dataset of on-field images is available
The attached script provides a clear overview of the steps to create a YOLOv5 model, from training to image detection and identification. For detailed information, the full documentation is available on the Ultralytics YOLOv5 webpage.
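As a rough orientation for the training-to-detection flow, the sketch below builds a YOLOv5 dataset file and the command lines for the `train.py` and `detect.py` scripts from the Ultralytics YOLOv5 repository. It is not the attached script. The dataset location (`datasets/bees`), the file name `bees.yaml`, the folder `field_images/`, the helper function names, and all hyperparameter values are assumptions chosen for illustration.

```python
import sys


def dataset_yaml(root, class_names):
    """Return the text of a YOLOv5 dataset YAML file (paths are hypothetical)."""
    lines = [
        f"path: {root}",        # dataset root directory
        "train: images/train",  # training images, relative to path
        "val: images/val",      # validation images, relative to path
        "names:",
    ]
    # One class index per species, in the order given.
    lines += [f"  {i}: {name}" for i, name in enumerate(class_names)]
    return "\n".join(lines) + "\n"


def train_command(data_yaml, weights="yolov5s.pt", img=640, batch=16, epochs=100):
    """Build the argument list for YOLOv5's train.py (values are illustrative)."""
    return [
        sys.executable, "train.py",
        "--img", str(img),
        "--batch-size", str(batch),
        "--epochs", str(epochs),
        "--data", str(data_yaml),
        "--weights", weights,  # pretrained checkpoint to fine-tune from
    ]


def detect_command(weights, source, conf=0.25):
    """Build the argument list for YOLOv5's detect.py."""
    return [
        sys.executable, "detect.py",
        "--weights", str(weights),
        "--source", str(source),    # image file, folder, or video
        "--conf-thres", str(conf),  # minimum confidence for a detection
    ]


# Example: a two-species model, as in the combined honey bee / bumblebee case.
print(dataset_yaml("datasets/bees", ["Apis mellifera", "Bombus terrestris"]))
print(" ".join(train_command("bees.yaml")))
print(" ".join(detect_command("runs/train/exp/weights/best.pt", "field_images/")))
```

To run the commands rather than print them, each list can be passed to `subprocess.run(cmd, check=True)` from inside a checkout of the YOLOv5 repository with its requirements installed. The weights path in the last line assumes YOLOv5's default output folder for a first training run.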
This section presents confusion matrices and inference samples for species models trained on a combination of on-field and internet images. We focus on the species with robust on-field data, as these are our most comprehensive real-world datasets. Note that having the most on-field examples does not necessarily yield the best performance: on-field imagery is challenging, often with lower resolution, varying environmental conditions, and less clearly defined subjects. In contrast, models trained solely on internet images tend to show higher true positive rates because those images are higher resolution and contain larger, more clearly defined objects, which makes pattern recognition and feature extraction easier during training. Despite these challenges, the results presented here offer valuable insight into model performance under real-world conditions.
- Apis mellifera + Bombus terrestris
- Osmia cornuta
- Megachile sp.
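To make the true positive rates discussed above concrete, the sketch below derives a per-class rate from a confusion matrix of raw counts. The counts are invented purely for illustration and are not results from this project. The layout (rows are predicted classes, columns are true classes, plus a `background` class for missed or spurious detections) is an assumption that should be checked against the matrices produced by your own run.

```python
def true_positive_rates(matrix, labels):
    """Per-class true positive rate from a confusion matrix of counts.

    Assumed layout: matrix[i][j] is the number of objects whose true class
    is labels[j] and whose predicted class is labels[i].
    """
    rates = {}
    for j, label in enumerate(labels):
        # Everything in column j truly belongs to class j.
        column_total = sum(row[j] for row in matrix)
        correct = matrix[j][j]
        rates[label] = correct / column_total if column_total else float("nan")
    return rates


# Hypothetical counts, for illustration only.
labels = ["Apis mellifera", "Bombus terrestris", "background"]
counts = [
    [80, 5, 3],   # predicted Apis mellifera
    [10, 90, 2],  # predicted Bombus terrestris
    [10, 5, 0],   # predicted background (missed detections)
]

for label, rate in true_positive_rates(counts, labels).items():
    print(f"{label}: {rate:.2f}")
```

In this layout, the `background` row holds insects that were missed entirely and the `background` column holds detections with no matching insect, so the rate reported for `background` itself is not meaningful and can be ignored.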
We invite you to explore our open-source motion detection code (https://github.com/syngenta/DigitalEntomologist_MotionDetectionCode) and the machine-learning process described here, from training to species detection and identification, and to freely develop this work further, bringing science, technology, and data together in support of our collective responsibility: biodiversity.
Bug reports and pull requests are welcome on GitHub at https://github.com/syngenta/BiodiversitySensorProject_SpeciesIdentificationCode
Please check our Contribution guide for more details (https://github.com/syngenta/DigitalEntomologist_MotionDetectionCode/blob/main/CONTRIBUTING.md).
This project adheres to the Code of Conduct (https://github.com/syngenta/BiodiversitySensorProject_SpeciesIdentificationCode/blob/main/CODE_OF_CONDUCT.md). We pledge to act and interact in ways that contribute to an open, welcoming, diverse, inclusive, and healthy community.
The project uses the MIT License. See LICENSE (https://github.com/syngenta/BiodiversitySensorProject_SpeciesIdentificationCode/blob/main/LICENSE) for details.