FLAIR #1

Semantic segmentation and domain adaptation for land-cover from aerial imagery

Challenge proposed by the French National Institute of Geographical and Forest Information (IGN).

Participate in obtaining more accurate maps for a more comprehensive description and a better understanding of our environment! Come push the limits of state-of-the-art semantic segmentation approaches on a large and challenging dataset. Get in touch at 📧 [email protected]

Context & Data

The FLAIR #1 dataset is sampled countrywide and comprises over 20 billion annotated pixels, acquired over three years and in different months (spatio-temporal domains). The dataset is available to download here. It consists of 512 x 512 patches annotated with 13 (baselines) or 19 (full) semantic classes (see associated datapaper). Each patch has 5 channels (RGB-Infrared-Elevation).


ORTHO HR® aerial image cover of France (left), train and test spatial domains of the dataset (middle) and acquisition months defining temporal domains (right).


Example of input data (first three columns) and corresponding supervision masks (last column).

flair_data = {
1   : ['building','#db0e9a'] ,
2   : ['pervious surface','#938e7b'],
3   : ['impervious surface','#f80c00'],
4   : ['bare soil','#a97101'],
5   : ['water','#1553ae'],
6   : ['coniferous','#194a26'],
7   : ['deciduous','#46e483'],
8   : ['brushwood','#f3a60d'],
9   : ['vineyard','#660082'],
10  : ['herbaceous vegetation','#55ff00'],
11  : ['agricultural land','#fff30d'],
12  : ['plowed land','#e4df7c'],
13  : ['swimming_pool','#3de6eb'],
14  : ['snow','#ffffff'],
15  : ['clear cut','#8ab3a0'],
16  : ['mixed','#6b714f'],
17  : ['ligneous','#c5dc42'],
18  : ['greenhouse','#9999ff'],
19  : ['other','#000000'],
}
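The color table above can be used to render a class-index mask as an RGB image. The following is a minimal, stdlib-only sketch rather than code from this repository: `hex_to_rgb` and `colorize_mask` are hypothetical helper names, and only a three-class subset of `flair_data` is repeated so that the block is self-contained.

```python
# Subset of the class table above: class id -> [name, hex color].
flair_subset = {
    1: ['building', '#db0e9a'],
    5: ['water', '#1553ae'],
    19: ['other', '#000000'],
}

def hex_to_rgb(hex_color):
    """Convert '#rrggbb' to an (r, g, b) tuple of ints in 0..255."""
    h = hex_color.lstrip('#')
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))

def colorize_mask(mask, table):
    """Map a 2D list of class ids to a 2D list of RGB tuples."""
    palette = {cid: hex_to_rgb(color) for cid, (_, color) in table.items()}
    return [[palette[cid] for cid in row] for row in mask]

# Tiny 2 x 2 mask, for illustration only.
mask = [[1, 5], [19, 1]]
rgb = colorize_mask(mask, flair_subset)
```

For full-size patches, the same lookup would typically be done with an array library instead of nested lists.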

Baseline model

The baselines use a U-Net architecture with a pre-trained ResNet34 encoder, implemented with the segmentation-models-pytorch library. The architecture allows the integration of patch-wise metadata and employs commonly used image data augmentation techniques. It has about 24.4M parameters. Results are evaluated with the Intersection over Union (IoU) metric, and a single mIoU is reported (see associated datapaper).
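As a reminder of how the metric is defined, here is a minimal sketch of per-class IoU and mIoU computed from a confusion matrix. It uses the standard definition IoU = TP / (TP + FP + FN). The helper names are hypothetical, labels are assumed to be 0-indexed, and the exact aggregation used for the baselines (for example, which classes are excluded) is described in the datapaper, not here.

```python
def confusion_matrix(y_true, y_pred, n_classes):
    """Rows are ground-truth classes, columns are predicted classes."""
    cm = [[0] * n_classes for _ in range(n_classes)]
    for t, p in zip(y_true, y_pred):
        cm[t][p] += 1
    return cm

def per_class_iou(cm):
    """IoU = TP / (TP + FP + FN) for each class; NaN if the class is absent."""
    n = len(cm)
    ious = []
    for c in range(n):
        tp = cm[c][c]
        fn = sum(cm[c]) - tp
        fp = sum(cm[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        ious.append(tp / denom if denom else float('nan'))
    return ious

def mean_iou(ious):
    """Average the per-class IoUs, ignoring NaN entries."""
    valid = [v for v in ious if v == v]  # NaN is not equal to itself
    return sum(valid) / len(valid)

# Flattened toy labels (3 classes), for illustration only.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]
ious = per_class_iou(confusion_matrix(y_true, y_pred, 3))
miou = mean_iou(ious)
```

On this toy input the per-class IoUs are 1/3, 2/3 and 1/2, which gives an mIoU of 0.5.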

The metadata strategy encodes the metadata with a shallow MLP and concatenates the encoded information to the U-Net encoder output. The augmentation strategy employs three typical geometrical augmentations (see associated datapaper).
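The three augmentations are not listed in this README. The sketch below assumes horizontal flip, vertical flip and 90° rotation as stand-ins, and only illustrates the constraint that any geometrical transform must be applied identically to the image channels and to the mask. `augment_pair` and the other helper names are hypothetical, not part of the library.

```python
import random

def hflip(grid):
    """Mirror a 2D list left-right."""
    return [row[::-1] for row in grid]

def vflip(grid):
    """Mirror a 2D list top-bottom."""
    return [list(row) for row in grid[::-1]]

def rot90(grid):
    """Rotate a 2D list by 90 degrees clockwise."""
    return [list(row) for row in zip(*grid[::-1])]

def augment_pair(image, mask, rng, p=0.5):
    """Apply each transform with probability p to all channels AND the mask.

    image: list of 2D channel grids; mask: one 2D grid of class ids.
    """
    for transform in (hflip, vflip, rot90):
        if rng.random() < p:
            image = [transform(channel) for channel in image]
            mask = transform(mask)
    return image, mask

rng = random.Random(0)  # seeded for reproducibility
mask = [[1, 2], [3, 4]]
image = [[row[:] for row in mask]]  # one channel, identical to the mask
aug_image, aug_mask = augment_pair(image, mask, rng)
```

Because the single channel starts out identical to the mask, it stays identical after augmentation, which makes the pairing easy to check.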

Example of a semantic segmentation of an urban and coastal area in the D076 spatial domain, obtained with the baseline trained model:


Example of a semantic segmentation result using the baseline model.


Pre-trained models

Pre-trained models ⚡⚡⚡ with different modalities and architectures are available in IGNF's HuggingFace collection: huggingface.co/collections/IGNF/flair-models-landcover-semantic-segmentation
See datacards for more details about each model.


Lib usage



Installation 📌

# it's recommended to install on a conda virtual env
conda create -n FLAIR-INC -c conda-forge python=3.12.4
conda activate FLAIR-INC
git clone [email protected]:IGNF/FLAIR-INC.git
cd FLAIR-INC*
pip install -e .
# if torch.cuda.is_available() returns False, do the following:
# pip install "torch>=2.0.0" --extra-index-url=https://download.pytorch.org/whl/cu117



Tasks 🔎

This library comprises two main entry points:

📁 flair_inc

The flair module is used for training, inference and metrics calculation at the patch level. To run this pipeline:

flair --conf=/my/conf/file.yaml

This will perform the tasks specified in the configuration file. If ‘train’ is enabled, it will train the model and save it to the output folder. If ‘predict’ is enabled, it will load the trained model (or the specified checkpoint if ‘train’ is not enabled) and run prediction on the test data. If ‘metrics’ is enabled, it will calculate the mean Intersection over Union (mIoU) and other IoU metrics from the predicted and ground-truth masks. A toy dataset (reduced size) is available to check that your installation and the information in the configuration file are correct. Note: a notebook that was used during the challenge is available in the legacy-torch branch, which uses different library versions and a different structure.
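The rule for choosing which model the predict step uses can be sketched as follows. This is an illustration of the behavior described above, not the library's code: `checkpoint_for_prediction` is a hypothetical helper, and the flat dictionary layout is an assumption (the real YAML file may nest these keys).

```python
def checkpoint_for_prediction(conf, trained_ckpt=None):
    """Return the checkpoint path the predict step should load, or None.

    conf: dict with the 'train', 'predict' and 'ckpt_model_path' keys
          (flat layout assumed for illustration).
    trained_ckpt: path written by the training step of the same run, if any.
    """
    if not conf.get("predict", False):
        return None  # nothing to predict, so no checkpoint is needed
    if conf.get("train", False):
        if trained_ckpt is None:
            raise ValueError("train is enabled but no trained checkpoint was produced")
        return trained_ckpt  # use the model trained in this run
    ckpt = conf.get("ckpt_model_path")
    if not ckpt:
        raise ValueError("predict without train requires ckpt_model_path")
    return ckpt  # use the user-specified checkpoint

# Prediction only: the configured checkpoint is used.
conf = {"train": False, "predict": True, "ckpt_model_path": "/my/model.ckpt"}
selected = checkpoint_for_prediction(conf)
```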

📁 zone_detect

This module runs inference with a pre-trained model at a larger scale than individual patches. It allows overlapping inferences using a margin argument. Specifically, this module expects a single georeferenced TIFF file as input.

flair-detect --conf=/my/conf/file-detect.yaml
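To give an idea of what overlapping inference over a large raster involves, here is a generic sliding-window sketch. It is not the module's slicing code: `patch_origins`, `patch_windows` and the explicit `stride` parameter are hypothetical, and how the library derives its step from the margin argument is not asserted here.

```python
def patch_origins(length, patch_size, stride):
    """Top-left offsets of patches along one axis.

    The last patch is shifted back so that it ends exactly at the raster edge.
    """
    if length <= patch_size:
        return [0]
    origins = list(range(0, length - patch_size + 1, stride))
    if origins[-1] != length - patch_size:
        origins.append(length - patch_size)
    return origins

def patch_windows(width, height, patch_size=512, stride=128):
    """Return (x, y, w, h) pixel windows covering the raster with overlap."""
    return [
        (x, y, patch_size, patch_size)
        for y in patch_origins(height, patch_size, stride)
        for x in patch_origins(width, patch_size, stride)
    ]

# A 1024 x 1024 raster, 512-pixel patches, a new patch every 128 pixels.
windows = patch_windows(1024, 1024)
```

With these values there are 5 offsets per axis (0, 128, 256, 384, 512), so 25 overlapping windows in total.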



Configuration for flair 📄

The pipeline is configured using a YAML file (flair-1-config.yaml). The configuration file includes sections for data paths, tasks, model configuration, hyperparameters and computational resources.

out_folder: The path to the output folder where the results will be saved.
out_model_name: The name of the output model.
train_csv: Path to the CSV file containing paths to image-mask pairs for training.
val_csv: Path to the CSV file containing paths to image-mask pairs for validation.
test_csv: Path to the CSV file containing paths to image-mask pairs for testing.
ckpt_model_path: The path to the checkpoint file of the model. Used if train_tasks/init_weights_only_from_ckpt or resume_training_from_ckpt is True and for prediction if train is disabled.
path_metadata_aerial: The path to the aerial metadata JSON file if used with FLAIR data and model_provider is SegmentationModelsPytorch.

train: If set to True, the model will be trained.
init_weights_only_from_ckpt: Use when fine-tuning: loads the weights from the ckpt file and then performs training.
resume_training_from_ckpt: Use if you want to resume an aborted training or complete a training. This will load the weights, optimizer, scheduler and all relevant hyperparameters from the provided ckpt.

predict: If set to True, predictions will be made using the model.
metrics: If set to True, metrics will be calculated.
delete_preds: Remove prediction files after metrics calculation.

model_provider: the library providing models, either HuggingFace or SegmentationModelsPytorch.
org_model: to be used if model_provider is HuggingFace in the form HFOrganization_Modelname, e.g., "openmmlab/upernet-swin-small".
encoder_decoder: to be used if model_provider is SegmentationModelsPytorch in the form encodername_decoder_name, e.g., "resnet34_unet".

use_augmentation: If set to True, data augmentation will be applied during training.
use_metadata: If set to True, metadata will be used. If using a dataset other than FLAIR, see the structure to be provided.

channels: The channels opened in your input images. Images are opened with rasterio, whose band indexing starts at 1 for the first channel.
norm_type: Normalization to be applied: scaling (linear interpolation in the range [0,1]), custom (center-reduced with provided means and standard deviations), or without (no normalization).
norm_means: If custom, means for each input band.
norm_stds: If custom, standard deviation for each input band.

seed: The seed for random number generation to ensure reproducibility.
batch_size: The batch size for training.
learning_rate: The learning rate for training.
num_epochs: The number of epochs for training.

use_weights: If set to True, class weights will be used during training.
classes: Dict of semantic classes with value in images as key and list [weight, classname] as value. See config file for an example.

georeferencing_output: If set to True, the output will be georeferenced.

accelerator: The type of accelerator to use (‘gpu’ or ‘cpu’).
num_nodes: The number of nodes to use for training.
gpus_per_node: The number of GPUs to use per node for training.
strategy: The strategy to use for distributed training (‘auto’,‘ddp’,...).
num_workers: The number of workers to use for data loading.

ckpt_save_also_last: in addition to the best epoch, also save the last epoch's ckpt file in the same folder.
ckpt_verbose: print whenever a ckpt file is saved.
ckpt_weights_only: save only weights of model in ckpt for storage optimization. This prevents resume_training_from_ckpt.
ckpt_monitor: metric to be monitored for saving ckpt files. By default val_loss.
ckpt_monitor_mode: whether to use the min or max of ckpt_monitor for saving a ckpt file.
ckpt_earlystopping_patience: stop training if there is no improvement after the defined number of epochs. Default is 30.

cp_csv_and_conf_to_output: Makes a copy of paths csv and config file to the output directory.
enable_progress_bar: If set to True, a progress bar will be displayed during training and inference.
progress_rate: The rate at which progress will be displayed.
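The three norm_type options can be illustrated with a small sketch. This is not the library's implementation: the helper names are hypothetical, bands are plain Python lists, and "scaling" is assumed here to mean per-band min-max scaling, which is only one possible reading of "linear interpolation in the range [0,1]".

```python
def scale_band(band):
    """Min-max scale one band to [0, 1] (assumed meaning of 'scaling')."""
    lo, hi = min(band), max(band)
    if hi == lo:
        return [0.0 for _ in band]  # constant band: avoid division by zero
    return [(v - lo) / (hi - lo) for v in band]

def standardize_band(band, mean, std):
    """Center-reduce one band with a provided mean and standard deviation."""
    return [(v - mean) / std for v in band]

def normalize(bands, norm_type, norm_means=None, norm_stds=None):
    """Apply the selected normalization to a list of bands."""
    if norm_type == "without":
        return [list(b) for b in bands]
    if norm_type == "scaling":
        return [scale_band(b) for b in bands]
    if norm_type == "custom":
        if norm_means is None or norm_stds is None:
            raise ValueError("custom normalization needs norm_means and norm_stds")
        if not (len(bands) == len(norm_means) == len(norm_stds)):
            raise ValueError("one mean and one std are required per band")
        return [
            standardize_band(b, m, s)
            for b, m, s in zip(bands, norm_means, norm_stds)
        ]
    raise ValueError(f"unknown norm_type: {norm_type!r}")

scaled = normalize([[0, 5, 10]], "scaling")
centered = normalize([[2.0, 4.0]], "custom", norm_means=[3.0], norm_stds=[1.0])
```

Note that with "custom", norm_means and norm_stds need one entry per band listed in channels.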



Configuration for zone_detect 📄

The pipeline is configured using a YAML file (flair-1-config-detect.yaml).

output_path: path to output result.
output_name: name of resulting raster.

input_img_path : path to georeferenced raster.
bands : bands to be used in your raster file.

img_pixels_detection : size in pixels of inferred patches, default is 512.
margin : margin between patches for overlapping detection. For example, 128 means that a patch center is computed every 128*resolution step.
output_type : type of output: either "class_prob", which outputs integers between 0 and 255 representing the model output, or "argmax", which outputs a single band with the index of the class.
n_classes : number of classes.

model_weights : path to your model weights or checkpoint.
model_provider: the library providing models, either HuggingFace or SegmentationModelsPytorch.
org_model: to be used if model_provider is HuggingFace in the form HFOrganization_Modelname, e.g., "openmmlab/upernet-swin-small".
encoder_decoder: to be used if model_provider is SegmentationModelsPytorch in the form encodername_decoder_name, e.g., "resnet34_unet".

batch_size : size of batch in dataloader, default is 2.
use_gpu : boolean, whether to use the GPU or the CPU for inference, default is true.
num_worker : number of workers used by the dataloader. On Linux, do not set it higher than 2, because paved detection can have concurrency issues compared with traditional detection; on Mac and Windows, set it to 0 (due to a problem with the GDAL implementation).

write_dataframe : whether to write the dataframe of raster slicing to a file.

norm_type: Normalization to be applied: scaling (linear interpolation in the range [0,1]) or custom (center-reduced with provided means and standard deviations).
norm_means: If custom, means for each input band.
norm_stds: If custom, standard deviation for each input band.
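The difference between the two output_type values can be sketched for a single pixel. This is an illustration only: `to_class_prob` and `to_argmax` are hypothetical names, the model output is assumed to be a list of per-class probabilities in [0, 1], and rounding to the nearest integer is an assumption about how the 0 to 255 encoding is obtained.

```python
def to_class_prob(probs):
    """Encode per-class probabilities in [0, 1] as integers in 0..255.

    One value per class, i.e. one output band per class.
    """
    return [int(round(p * 255)) for p in probs]

def to_argmax(probs):
    """Return the index of the most probable class (a single output band)."""
    return max(range(len(probs)), key=probs.__getitem__)

# Probabilities for one pixel over 3 classes, for illustration only.
pixel_probs = [0.0, 0.2, 0.8]
encoded = to_class_prob(pixel_probs)
winner = to_argmax(pixel_probs)
```

"class_prob" keeps the confidence for every class at the cost of n_classes bands, whereas "argmax" is more compact but discards that information.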



Baseline results

| Model | mIoU |
| --- | --- |
| baseline U-Net (ResNet34) | 0.5443±0.0014 |
| baseline U-Net (ResNet34) + metadata + augmentation | 0.5570±0.0027 |

The baseline U-Net with ResNet34 backbone obtains the following confusion matrix:


Baseline confusion matrix of the test dataset normalized by rows.

Reference

Please include a citation to the following article if you use the FLAIR #1 dataset:

@article{garioud2022flair1,
  doi = {10.13140/RG.2.2.30183.73128/1},
  url = {https://arxiv.org/pdf/2211.12979.pdf},
  author = {Garioud, Anatol and Peillet, Stéphane and Bookjans, Eva and Giordano, Sébastien and Wattrelos, Boris},
  title = {FLAIR #1: semantic segmentation and domain adaptation dataset},
  publisher = {arXiv},
  year = {2022}
}

Acknowledgment

This work was performed using HPC/AI resources from GENCI-IDRIS (Grant 2022-A0131013803).

Dataset license

The "OPEN LICENCE 2.0/LICENCE OUVERTE" is a license created by the French government specifically to facilitate the dissemination of open data by public administrations. An English version of this license is available on the official GitHub page.

As stated by the license:

Applicable legislation

This licence is governed by French law.

Compatibility of this licence

This licence has been designed to be compatible with any free licence that at least requires an acknowledgement of authorship, and specifically with the previous version of this licence as well as with the following licences: United Kingdom’s “Open Government Licence” (OGL), Creative Commons’ “Creative Commons Attribution” (CC-BY) and Open Knowledge Foundation’s “Open Data Commons Attribution” (ODC-BY).