A detailed report of the project and dataset can be found here: Exploring Land Use and Land Cover Classification.
The Google Colab Notebook can be opened and run directly using this link: EuroSAT_CNN
As our population continues to grow and cities continue to expand, the need to understand and analyze our physical surroundings grows as well. To acquire structured information about our planet’s land, we need access to high-quality satellite images. This type of data was traditionally kept within the government and private organizations that had the capability to produce satellite images. Now, with advancements in the field and a growing need to invite innovation, groups like NASA and ESA have opened up their satellite data to the public. Their systems, Sentinel-2 (ESA) and Landsat-9 (NASA), offer high-quality RGB images as well as multi-band TIFF data [3]. These systems are opening doors for future expansion in fields like urban planning, environmental conservation, and disaster relief. However, these innovations start with building initial systems that can classify satellite images by terrain, objects, buildings, and so on. This kind of analysis is called Land Use and Land Cover Classification. Examples of Land Use and Land Cover classes include Forests, Cities, Roads, and Sea Coasts.
The premise of this experiment was to begin my analysis of satellite images with the simplest case: RGB images. RGB images are the basic three-band (red, green, blue) versions of image data that we encounter daily. Since much of the experimentation on satellite data uses multi-band images, it seemed reasonable to test whether a Convolutional Neural Network could distinguish between land features that are almost identically colored but very different in nature. For example, water and vegetation can seem almost identical in color within some RGB images but look drastically different when observed in another band such as near-infrared. This is because water absorbs near-infrared light, while plants with high amounts of chlorophyll reflect it strongly. This experiment aims to show that RGB images taken from satellite systems, in this case ESA’s Sentinel-2, can be analyzed and classified effectively by a Convolutional Neural Network.
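To make the near-infrared contrast concrete, here is a minimal sketch using the Normalized Difference Vegetation Index (NDVI), a standard index computed as (NIR - Red) / (NIR + Red). It is not part of this experiment, which uses RGB only, and the reflectance values below are assumed, illustrative numbers rather than measurements from Sentinel-2.

```python
def ndvi(nir: float, red: float) -> float:
    """Normalized Difference Vegetation Index: (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


# Illustrative reflectance values (assumptions, not measured data):
# vegetation reflects near-infrared strongly, water absorbs most of it.
vegetation = {"red": 0.05, "nir": 0.50}
water = {"red": 0.04, "nir": 0.01}

ndvi_vegetation = ndvi(vegetation["nir"], vegetation["red"])
ndvi_water = ndvi(water["nir"], water["red"])

print(f"vegetation NDVI: {ndvi_vegetation:.2f}")
print(f"water NDVI:      {ndvi_water:.2f}")
```

With these assumed values the two surfaces have similar red reflectance but land on opposite sides of zero once near-infrared is included, which is exactly the signal an RGB-only model has to do without.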
Due to the limited computing power of my personal computer, I decided to run this project in a Google Colab Notebook. Colab Notebooks are especially useful for testing and running large Data Science and Machine Learning projects because they let you connect remotely to Google's GPUs and TPUs. This makes loading, training, and testing on large datasets much more efficient. When I created this project, I used the T4 GPU, which allowed me to make many adjustments during training without long waits between runs.
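As a small sketch of how a notebook can pick up the Colab GPU when one is attached, the snippet below selects a device and moves a tensor onto it. The variable names and the random batch are placeholders for illustration; the actual notebook may do this differently.

```python
import torch

# Use the GPU if the runtime has one (e.g. a Colab T4), otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Placeholder batch shaped like four 64x64 RGB images, moved to the chosen device.
batch = torch.rand(4, 3, 64, 64).to(device)

print(f"running on: {device}, batch lives on: {batch.device}")
```

The model and every batch need to be on the same device, so the same `.to(device)` call is typically applied to both.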
The dataset we are going to explore is called the EuroSAT dataset. It consists of 27,000 images divided among ten classes, with the distribution per class shown in the figure below. The images are 64x64 pixels at a resolution of 10 meters per pixel, so each patch covers roughly 640 m by 640 m on the ground. The creators of the dataset used the European Urban Atlas as the basis for the chosen classes, because these classes appear often enough in the Atlas to generate thousands of image patches. The Atlas contains image data covering 34 European countries, from which the creators chose only images with low cloud cover to improve the chance of detecting features of the land itself. The pool of data spans a year’s worth of captures to ensure a high level of inherent variance. Within each class, the land cover varies so that models are exposed to different varieties of the same class (e.g. different kinds of forests in the Forest class).
All credit goes to the team that put together this dataset, which can be found here.
The "torch" library in Python includes numerous useful tools for developing image classifiers. PyTorch offers ways to load datasets as iterable objects (DataLoader), define the layers of the network (nn.Module), and access loss functions and optimizers, while its companion torchvision package handles image transformations and data augmentations (transforms). Having all these capabilities available within the same framework makes it much easier to build an end-to-end model, since the same documentation can be used to find the relevant functions. PyTorch can also be used alongside other popular libraries like NumPy and scikit-learn, which further expands what a project can accomplish.
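The pieces named above can be sketched together in a few lines. This is a minimal illustration under stated assumptions, not the model from the notebook: `SmallCNN` is a hypothetical two-block network, the learning rate and batch size are arbitrary, and random tensors stand in for EuroSAT images so the snippet runs without downloading anything.

```python
import torch
from torch import nn
from torch.utils.data import DataLoader, TensorDataset

NUM_CLASSES = 10  # EuroSAT has ten classes
IMAGE_SIZE = 64   # EuroSAT patches are 64x64 pixels


class SmallCNN(nn.Module):
    """Hypothetical two-block CNN for illustration only."""

    def __init__(self, num_classes: int = NUM_CLASSES):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),  # 32x32 -> 16x16
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)
        return self.classifier(torch.flatten(x, start_dim=1))


torch.manual_seed(0)

# Random stand-in data (assumption): 32 fake RGB patches with random labels.
images = torch.rand(32, 3, IMAGE_SIZE, IMAGE_SIZE)
labels = torch.randint(0, NUM_CLASSES, (32,))
loader = DataLoader(TensorDataset(images, labels), batch_size=8, shuffle=True)

model = SmallCNN()
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# One pass over the data: forward, loss, backward, parameter update.
model.train()
for batch_images, batch_labels in loader:
    optimizer.zero_grad()
    logits = model(batch_images)
    loss = criterion(logits, batch_labels)
    loss.backward()
    optimizer.step()

print(f"last batch logits shape: {tuple(logits.shape)}, loss: {loss.item():.3f}")
```

Swapping the random tensors for a real image dataset only changes the object handed to `DataLoader`; the training loop itself stays the same.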