Content-Adaptive Downsampling in Convolutional Neural Networks


This is the official repository accompanying the CVPR Workshop paper:

R. Hesse, S. Schaub-Meyer, and S. Roth. Content-Adaptive Downsampling in Convolutional Neural Networks. CVPRW, The 6th Efficient Deep Learning for Computer Vision (ECV) Workshop, 2023.

Paper | Preprint (arXiv) | Video | Poster | Supplemental


Semantic Segmentation (Sec. 4.2)

Pretrained Models

| Model | mIoU (Cityscapes) | Download |
|---|---|---|
| ResNet101+DeepLabv3 (OS=16) | 0.762 | best_deeplabv3_resnet101_cityscapes_os16_seed1.pth |
| ResNet101+DeepLabv3 (OS=8) | 0.776 | best_deeplabv3_resnet101_cityscapes_os8_seed1.pth |
| ResNet101+DeepLabv3 edge (OS=8->16) | 0.773 | best_deeplabv3_batch_ap_resnet101_cityscapes_os8_modeedges_os16till8_seed2_trimapwidth11_threshold0.15.pth |
| ResNet101+DeepLabv3 learned (OS=8->16) | 0.775 | best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth |

Available architectures

Specify the model architecture with '--model ARCH_NAME' and set the output stride with '--output_stride OUTPUT_STRIDE'. The example runs below use ResNet101+DeepLabv3.

Reproduce

1. Install the required packages

conda create --name adaptive_downsampling --file requirements.txt
conda activate adaptive_downsampling

2. Download Cityscapes and extract it to 'datasets/cityscapes'

/datasets
  /cityscapes
    /gtFine
    /leftImg8bit
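
Before starting a long training run, it can help to confirm that the dataset was extracted into the layout above. The following is a minimal sketch, not part of the repository: the helper name `missing_cityscapes_dirs` is hypothetical, and it assumes that the two sub-folders shown above are the only ones that need to exist.

```python
from pathlib import Path

# Sub-folders named in the layout above.
REQUIRED_SUBDIRS = ("gtFine", "leftImg8bit")


def missing_cityscapes_dirs(data_root):
    """Return the required sub-folders that are absent under data_root."""
    root = Path(data_root)
    return [name for name in REQUIRED_SUBDIRS if not (root / name).is_dir()]


if __name__ == "__main__":
    # Same path as passed to --data_root in the commands of this README.
    missing = missing_cityscapes_dirs("/datasets/cityscapes")
    if missing:
        print("Missing folders:", ", ".join(missing))
    else:
        print("Cityscapes layout looks complete.")
```

The check only looks at folder names; it does not verify that the images and annotations inside them are complete.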

3. Train your models on Cityscapes

For baseline models in Sec. 4.2:

python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 16 --data_root /datasets/cityscapes --random_seed 0

python main.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 8 --data_root /datasets/cityscapes --random_seed 0

For content-adaptive downsampling models in Sec. 4.2:

Adaptive downsampling with edge mask:

python main.py --model deeplabv3_batch_ap_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 8 --data_root /datasets/cityscapes --trimap_width 11 --pooling_mask_mode edges_os16till8 --pooling_mask_edge_detection_treshold [0.15, 0.35, 0.95] --random_seed 0 --exp_name trimapwidth11_threshold[0.15, 0.35, 0.95]
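
The bracketed values in the command above (and in several commands further down) appear to list alternatives, with one run per value; this reading is an assumption, supported by the pretrained file name ending in `threshold0.15`. Under that assumption, the sketch below expands such a template into one concrete command per value. The helper `expand_choices` is hypothetical and not part of the repository.

```python
import re

# Matches a bracketed list such as [8,16] or [0.15, 0.35, 0.95].
BRACKET_LIST = re.compile(r"\[([^\]]+)\]")


def expand_choices(template):
    """Return one command per value of the bracketed list in template.

    Every occurrence of the same list is replaced by the same value.
    Assumes the template contains at most one distinct list.
    """
    lists = {m.group(0) for m in BRACKET_LIST.finditer(template)}
    if not lists:
        return [template]
    if len(lists) > 1:
        raise ValueError("expected a single distinct bracketed list")
    literal = lists.pop()
    values = [v.strip() for v in literal[1:-1].split(",")]
    return [template.replace(literal, v) for v in values]


template = (
    "python main.py --model deeplabv3_batch_ap_resnet101 --dataset cityscapes "
    "--gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 8 "
    "--data_root /datasets/cityscapes --trimap_width 11 "
    "--pooling_mask_mode edges_os16till8 "
    "--pooling_mask_edge_detection_treshold [0.15, 0.35, 0.95] "
    "--random_seed 0 --exp_name trimapwidth11_threshold[0.15, 0.35, 0.95]"
)
for command in expand_choices(template):
    print(command)
```

Replacing every occurrence with the same value keeps the experiment name in sync with the threshold that is actually used.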

Adaptive downsampling with learned mask:

python main_e2e_train.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --data_root /datasets/cityscapes --random_seed 0 --exp_name default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared --val_interval 100 --tau 1 --low_res_active 0.5

For evaluation:

python main_e2e_eval.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --crop_size 768 --data_root /datasets/cityscapes --random_seed 0 --tau 1 --ckpt ./best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth

4. Evaluate your models

To evaluate a trained model, rerun the corresponding training command (main.py) with the additional parameters --test_only and --ckpt.
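
As an illustration, the sketch below turns a training command into an evaluation command. It assumes that `--test_only` is a plain switch without a value and that `--ckpt` takes the path to a checkpoint file, as in the `main_e2e_eval.py` call above; the helper `to_eval_command` is hypothetical, and the checkpoint path is a placeholder.

```python
import shlex


def to_eval_command(train_command, ckpt_path):
    """Append --test_only and --ckpt to a main.py training command."""
    args = shlex.split(train_command)
    args += ["--test_only", "--ckpt", ckpt_path]
    return shlex.join(args)


train = (
    "python main.py --model deeplabv3_resnet101 --dataset cityscapes "
    "--gpu_id 0 --lr 0.1 --crop_size 768 --batch_size 8 --output_stride 16 "
    "--data_root /datasets/cityscapes --random_seed 0"
)
# Placeholder path: point this at your own .pth checkpoint.
print(to_eval_command(train, "./best_deeplabv3_resnet101_cityscapes_os16_seed1.pth"))
```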

5. Get number of multiply-adds

Regular downsampling

python main_flops.py --model deeplabv3_resnet101 --dataset cityscapes --gpu_id 0 --output_stride [8,16] --data_root /datasets/cityscapes

Adaptive downsampling edge mask

python main_flops.py --model deeplabv3_ap_resnet101 --dataset cityscapes --gpu_id 0 --output_stride 8 --output_stride_from_trained 8 --data_root /datasets/cityscapes --pooling_mask_mode edges_os16till8 --trimap_width 11 --pooling_mask_edge_detection_treshold [0.15, 0.35, 0.95]

Adaptive downsampling learned mask

python main_e2e_flops.py --model deeplabv3_ad_resnet101 --dataset cityscapes --gpu_id 0 --crop_size 768 --data_root /datasets/cityscapes --random_seed 0 --ckpt ./best_deeplabv3_ad_resnet101_cityscapes_modeend2end_seed0_default_tau1.0_lowresactive0.5_w_downsample_shared_andbatchnorm_shared.pth
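
For intuition about why the output stride dominates the cost: a dense k x k convolution that produces an H_out x W_out x C_out feature map needs H_out * W_out * C_out * C_in * k^2 multiply-adds (groups = 1, bias ignored). The sketch below is a back-of-the-envelope calculation under those assumptions, not what `main_flops.py` computes; the layer sizes (512 channels, 3 x 3 kernel) are illustrative.

```python
def conv_multiply_adds(height, width, in_channels, out_channels,
                       kernel_size, output_stride):
    """Multiply-adds of one dense k x k convolution applied to a feature map
    that is output_stride times smaller than the input image."""
    out_h = height // output_stride
    out_w = width // output_stride
    return out_h * out_w * out_channels * in_channels * kernel_size ** 2


# 768 x 768 crop as in the training commands; channel counts are illustrative.
macs_os8 = conv_multiply_adds(768, 768, 512, 512, 3, output_stride=8)
macs_os16 = conv_multiply_adds(768, 768, 512, 512, 3, output_stride=16)
print(macs_os8 / macs_os16)  # 4.0
```

Halving the spatial resolution in both dimensions cuts the cost of every layer at that resolution by a factor of four, which is the saving that adaptive downsampling applies only to the regions processed at the coarser stride.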

Keypoints (Sec. 4.3)

This code is built on top of the official implementation of the following paper:

"D2-Net: A Trainable CNN for Joint Detection and Description of Local Features".
M. Dusmanu, I. Rocco, T. Pajdla, M. Pollefeys, J. Sivic, A. Torii, and T. Sattler. CVPR 2019.

Paper on arXiv, Project page

Downloading the models and datasets

For instructions on downloading the dataset, please see the 'hpatches_sequences' folder.

The model weights can be downloaded by running:

mkdir models
wget https://dusmanu.com/files/d2-net/d2_tf.pth -O models/d2_tf.pth

Install the required packages

See ../segmentation.

Additionally, install OpenCV:

pip install opencv-python

Feature extraction

extract_features.py can be used to extract D2 features for a given list of images.

Regular downsampling:

python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_d2net_os[1,2,4,8]_512kpts --output_stride [1,2,4,8] --nr_keypoints 512

Adaptive downsampling (example for dilations 25 51 51):

python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_apd2net_os1_512kpts_dils_25_51_51 --output_stride 1 --nr_keypoints 512 --des APD2Net --dilations 25 51 51

Adaptive downsampling (example for dilations 0 0 31):

python extract_features.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --model_file models/d2_tf.pth --output_extension .sift_apd2net_os4_512kpts_dils_0_0_31 --output_stride 4 --nr_keypoints 512 --des APD2Net --dilations 0 0 31

After extracting features, evaluate them by running hpatches_sequences/HPatches-Sequences-Matching-Benchmark.ipynb (add the methods that you want to evaluate to the notebook).

Estimate multiply-adds

Regular downsampling:

python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride [1,2,4,8] --nr_keypoints 512

Adaptive downsampling (example for dilations 25 51 51):

python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride 1 --nr_keypoints 512 --des APD2Net --dilations 25 51 51

Adaptive downsampling (example for dilations 0 0 31):

python eval_flops.py --gpu_id 0 --image_list_file image_list_hpatches_sequences.txt --output_stride 4 --nr_keypoints 512 --des APD2Net --dilations 0 0 31

Acknowledgments

We would like to thank the contributors of the following repositories, parts of whose publicly available code we use:

Citation

If you find our work helpful, please consider citing:

@inproceedings{Hesse:2023:CAD,
  title     = {Content-Adaptive Downsampling in Convolutional Neural Networks},
  author    = {Hesse, Robin and Schaub-Meyer, Simone and Roth, Stefan},
  booktitle = {IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), The 6$^\text{th}$ Efficient Deep Learning for Computer Vision (ECV) Workshop},
  year      = {2023}
}