[Paper] [Slide] [Explore Drone-view Data] [Explore Satellite-view Data] [Explore Street-view Data] [Video Sample] [Introduction in Chinese]
Download [University-1652] upon request (I usually reply within 5 minutes). You may use the request template.
This repository contains the dataset link and the code for our paper University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization, ACM Multimedia 2020. The official paper link is https://dl.acm.org/doi/10.1145/3394171.3413896. We collect 1652 buildings from 72 universities around the world. Thank you for your kind attention.
Task 1: Drone-view target localization. (Drone -> Satellite) Given one drone-view image or video, the task aims to find the most similar satellite-view image to localize the target building in the satellite view.
Task 2: Drone navigation. (Satellite -> Drone) Given one satellite-view image, the drone intends to find the most relevant place (drone-view images) that it has passed by. According to its flight history, the drone could be navigated back to the target place.
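Both tasks can be read as cross-view retrieval: extract a feature for the query image, extract features for the gallery images of the other view, and rank the gallery by similarity. The following is a minimal sketch of that ranking step, assuming the features are already available as plain lists of floats. The function names and toy vectors are hypothetical, and cosine similarity is an assumption here, not necessarily the exact metric used by the baseline code.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank_gallery(query_feature, gallery_features):
    """Return gallery indices sorted from most to least similar to the query."""
    scores = [cosine_similarity(query_feature, g) for g in gallery_features]
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


# Toy example (hypothetical 3-D features): one drone-view query against
# three satellite-view gallery images.
drone_query = [1.0, 0.0, 0.0]
satellite_gallery = [
    [0.0, 1.0, 0.0],  # unrelated building
    [0.9, 0.1, 0.0],  # closest to the query
    [0.5, 0.5, 0.0],  # partially similar building
]
ranking = rank_gallery(drone_query, satellite_gallery)
print(ranking)  # [1, 2, 0]
```

For Task 2 the roles are swapped: the satellite-view feature is the query and the drone-view features form the gallery.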
12 Jan 2024 We are holding a workshop at ACM ICMR 2024 on Multimedia Object Re-ID. You are welcome to share your insights. See you in Phuket, Thailand!😃 The workshop link is https://www.zdzheng.xyz/MORE2024/ . The submission deadline is 15 April 2024.
2023 Workshop and Special Session
We host a special session at the IEEE Intelligent Transportation Systems Conference (ITSC), covering object re-identification and point cloud topics. The paper deadline is May 15, 2023, and the paper notification date is June 30, 2023. Please select the session code "w7r4a" during submission. More details can be found at the Special Session Website.
We are organizing a special issue of Remote Sensing (IF=5.3), open from now until 16 Dec 2023 (extended from 16 June 2023). You are welcome to submit your manuscript at (https://www.mdpi.com/journal/remotesensing/special_issues/EMPK490239), but please keep the open-access fee in mind.
We are holding a workshop at ACM Multimedia 2023 on Aerial-view Imaging. [Call for papers] [Introduction in Chinese]
We also provide a challenging cross-view geo-localization dataset, called University160k, and the workshop audience may consider participating in the competition. The motivation is to simulate the real-world geo-localization scenario, in which we usually face an extremely large satellite-view pool. In particular, University160k extends the current University-1652 dataset with an extra 167,486 satellite-view gallery distractors. We have released University160k on the challenge page and made a public leaderboard. (More details are at https://codalab.lisn.upsaclay.fr/competitions/12672)
The dataset split is as follows:
Split | #imgs | #buildings | #universities |
---|---|---|---|
Training | 50,218 | 701 | 33 |
Query_drone | 37,855 | 701 | 39 |
Query_satellite | 701 | 701 | 39 |
Query_ground | 2,579 | 701 | 39 |
Gallery_drone | 51,355 | 951 | 39 |
Gallery_satellite | 951 | 951 | 39 |
Gallery_ground | 2,921 | 793 | 39 |
More detailed file structure:
├── University-1652/
│   ├── readme.txt
│   ├── train/
│       ├── drone/                   /* drone-view training images
│           ├── 0001
│           ├── 0002
│           ...
│       ├── street/                  /* street-view training images
│       ├── satellite/               /* satellite-view training images
│       ├── google/                  /* noisy street-view training images (collected from Google Image)
│   ├── test/
│       ├── query_drone/
│       ├── gallery_drone/
│       ├── query_street/
│       ├── gallery_street/
│       ├── query_satellite/
│       ├── gallery_satellite/
│       ├── 4K_drone/
We note that there is no overlap between the 33 universities of the training set and the 39 universities of the test set.
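After unpacking the dataset, a quick sanity check is to count the building folders and images in each split and compare them with the table above. The sketch below assumes every split follows the `<split>/<building_id>/<image files>` layout shown for `train/drone/`; that layout is an assumption for the test folders, `4K_drone/` is left out because its layout is not described here, and the helper names are hypothetical.

```python
import os


def count_split(split_dir):
    """Count building folders and image files under one split directory.

    Assumes the layout <split_dir>/<building_id>/<image files>,
    as in train/drone/0001/.
    """
    if not os.path.isdir(split_dir):
        return 0, 0
    n_buildings = 0
    n_images = 0
    for entry in sorted(os.listdir(split_dir)):
        building_dir = os.path.join(split_dir, entry)
        if not os.path.isdir(building_dir):
            continue
        n_buildings += 1
        n_images += sum(
            1 for name in os.listdir(building_dir)
            if os.path.isfile(os.path.join(building_dir, name))
        )
    return n_buildings, n_images


def summarize(root):
    """Return {relative split path: (buildings, images)} for the listed splits."""
    splits = [
        "train/drone", "train/street", "train/satellite", "train/google",
        "test/query_drone", "test/gallery_drone",
        "test/query_street", "test/gallery_street",
        "test/query_satellite", "test/gallery_satellite",
    ]
    return {s: count_split(os.path.join(root, *s.split("/"))) for s in splits}
```

Calling `summarize("./University-1652")` (the path is a placeholder for wherever the dataset was extracted) returns one `(buildings, images)` pair per split; missing folders are reported as `(0, 0)`.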
26 Jan 2023 The 1652 Building Name List is available Here.
10 Jul 2022 Rainy? Night? Foggy? Snow? You may check our new paper "Multiple-environment Self-adaptive Network for Aerial-view Geo-localization" at https://github.com/wtyhub/MuseNet (accepted by Pattern Recognition'24).
1 Dec 2021 Fixed the issue caused by the latest torchvision, which does not allow empty subfolders. Note that some buildings do not have google images.
3 March 2021 GeM Pooling is added. You may enable it with `--pool gem`.
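For readers unfamiliar with GeM (generalized-mean) pooling, the usual definition pools the activations of one channel as `(mean(x^p))^(1/p)`. The pure-Python sketch below only illustrates that formula; it is not the code behind `--pool gem`, and the function name, the default `p=3.0`, and the `eps` clamp are assumptions.

```python
def gem_pool(activations, p=3.0, eps=1e-6):
    """Generalized-mean (GeM) pooling of one channel's spatial activations.

    p = 1 gives average pooling; a large p approaches max pooling.
    Values are clamped to eps so that fractional powers stay defined.
    """
    if not activations:
        raise ValueError("activations must be non-empty")
    clamped = [max(v, eps) for v in activations]
    mean_pow = sum(v ** p for v in clamped) / len(clamped)
    return mean_pow ** (1.0 / p)


channel = [0.0, 1.0, 2.0, 3.0]  # toy flattened H*W activations
print(gem_pool(channel, p=1.0))   # close to the arithmetic mean (1.5)
print(gem_pool(channel, p=3.0))   # between the mean and the maximum
print(gem_pool(channel, p=50.0))  # close to the maximum (3.0)
```

The exponent `p` therefore interpolates between average and max pooling, which is why GeM is often used as a drop-in replacement for either.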
21 January 2021 The GPU-Re-Ranking, a GNN-based real-time post-processing code, is available Here.
21 August 2020 The transfer learning code for Oxford and Paris is available Here.
27 July 2020 The metadata of the 1652 buildings, such as latitude and longitude, is now available at Google Drive. (You could use Google Earth Pro to open the kml file or use vim to check the values.)
We also provide the spiral flight tour file at Google Drive. (You could open the kml file via Google Earth Pro to enable the flight camera.)
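Besides Google Earth Pro or vim, the latitude and longitude can also be read programmatically. The sketch below uses only the Python standard library and assumes the file follows standard KML 2.2 with `Placemark` elements containing a `Point`; the actual structure of the released kml file may differ, and the sample placemark is a made-up placeholder, not real building metadata.

```python
import xml.etree.ElementTree as ET

KML_URI = "http://www.opengis.net/kml/2.2"
KML_NS = {"kml": KML_URI}


def read_placemarks(kml_text):
    """Return a list of (name, latitude, longitude) from KML Point placemarks."""
    root = ET.fromstring(kml_text)
    results = []
    for placemark in root.iter("{%s}Placemark" % KML_URI):
        name = placemark.findtext("kml:name", default="", namespaces=KML_NS)
        coords = placemark.findtext(
            "kml:Point/kml:coordinates", default="", namespaces=KML_NS
        ).strip()
        if not coords:
            continue  # skip placemarks without a Point geometry
        # KML stores coordinates as "longitude,latitude[,altitude]".
        lon, lat = (float(v) for v in coords.split(",")[:2])
        results.append((name, lat, lon))
    return results


# Made-up placeholder placemark, only to show the expected structure.
SAMPLE_KML = """<kml xmlns="http://www.opengis.net/kml/2.2">
  <Document>
    <Placemark>
      <name>0001</name>
      <Point><coordinates>10.0,20.0,0</coordinates></Point>
    </Placemark>
  </Document>
</kml>"""

print(read_placemarks(SAMPLE_KML))  # [('0001', 20.0, 10.0)]
```

Note the coordinate order: KML lists longitude first, so the helper swaps the values to return latitude before longitude.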
26 July 2020 The paper was accepted by ACM Multimedia 2020.
12 July 2020 I made the baseline of triplet loss (with soft margin) on University-1652 publicly available Here.
12 March 2020 I added the state-of-the-art page for geo-localization and a tutorial, which will be updated soon.
We currently support:
- Float16 to save GPU memory based on apex
- Multiple Query Evaluation
- Re-Ranking
- Random Erasing
- ResNet/VGG-16
- Visualize Training Curves
- Visualize Ranking Result
- Linear Warm-up
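As an illustration of the linear warm-up item above, one common form ramps a scale factor linearly from a small starting value to 1.0 over the first training iterations and multiplies the learning rate (or the loss) by it. This is a hedged sketch of that idea, not the schedule implemented in this repository; the function name, the starting value `0.1`, and the base learning rate are assumptions.

```python
def warmup_factor(iteration, warmup_iterations, start=0.1):
    """Scale factor that rises linearly from `start` to 1.0 during warm-up."""
    if warmup_iterations <= 0 or iteration >= warmup_iterations:
        return 1.0
    return start + (1.0 - start) * iteration / warmup_iterations


base_lr = 0.01  # hypothetical base learning rate
for it in (0, 50, 100, 200):
    # Effective learning rate at this iteration.
    print(it, base_lr * warmup_factor(it, warmup_iterations=100))
```

After `warmup_iterations` the factor stays at 1.0, so the regular learning-rate schedule takes over unchanged.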
Prerequisites:
- Python 3.6+
- GPU Memory >= 8G
- NumPy > 1.12.1
- PyTorch 0.3+
- [Optional] apex (for float16)
Installation:
- Install PyTorch from http://pytorch.org/
- Install Torchvision from source (please check its README), or install it directly with Anaconda; either works.
git clone https://github.com/pytorch/vision
cd vision
python setup.py install
- [Optional] Install apex from source (you may skip this step).
git clone https://github.com/NVIDIA/apex.git
cd apex
python setup.py install --cuda_ext --cpp_ext
Download [University-1652] upon request. You may use the request template.
For CVUSA, I follow the training/test split in (https://github.com/Liumouliu/OriCNN).
python train.py --name three_view_long_share_d0.75_256_s1_google --extra --views 3 --droprate 0.75 --share --stride 1 --h 256 --w 256 --fp16;
python test.py --name three_view_long_share_d0.75_256_s1_google
Default setting: Drone -> Satellite. To try other evaluation settings, change these lines: https://github.com/layumi/University1652-Baseline/blob/master/test.py#L217-L225
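Each evaluation setting amounts to choosing one query folder and one gallery folder under `test/`. The sketch below pairs the two tasks with the folder names from the file structure; the setting labels and the helper are hypothetical and are not options of `test.py`.

```python
# Query/gallery folder pairs under test/, named after the file structure.
# The setting keys are hypothetical labels, not options of test.py.
EVAL_SETTINGS = {
    "drone->satellite": ("query_drone", "gallery_satellite"),  # Task 1
    "satellite->drone": ("query_satellite", "gallery_drone"),  # Task 2
}


def eval_folders(setting, test_root="test"):
    """Return (query_dir, gallery_dir) for a retrieval direction."""
    if setting not in EVAL_SETTINGS:
        raise KeyError(f"unknown setting: {setting!r}")
    query, gallery = EVAL_SETTINGS[setting]
    return f"{test_root}/{query}", f"{test_root}/{gallery}"


print(eval_folders("drone->satellite"))
# ('test/query_drone', 'test/gallery_satellite')
```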
python train_no_street.py --name two_view_long_no_street_share_d0.75_256_s1 --share --views 3 --droprate 0.75 --stride 1 --h 256 --w 256 --fp16;
python test.py --name two_view_long_no_street_share_d0.75_256_s1
This uses three views but sets the loss weight on street images to zero.
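The effect of a zero loss weight can be sketched as a weighted sum of per-view losses: the street branch is still computed, but it contributes nothing to the total that is back-propagated. The function name and the loss values below are hypothetical, and the actual weighting in `train_no_street.py` may be implemented differently.

```python
def combine_view_losses(losses, weights):
    """Weighted sum of per-view losses; a zero weight removes that view's term."""
    missing = set(losses) - set(weights)
    if missing:
        raise KeyError(f"no weight given for views: {sorted(missing)}")
    return sum(weights[view] * value for view, value in losses.items())


# Hypothetical per-view loss values for one batch.
losses = {"satellite": 1.2, "street": 0.8, "drone": 1.0}
weights = {"satellite": 1.0, "street": 0.0, "drone": 1.0}
total = combine_view_losses(losses, weights)
print(total)  # only the satellite and drone terms remain
```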
python prepare_cvusa.py
python train_cvusa.py --name usa_vgg_noshare_warm5_lr2 --warm 5 --lr 0.02 --use_vgg16 --h 256 --w 256 --fp16 --batchsize 16;
python test_cvusa.py --name usa_vgg_noshare_warm5_lr2
python test.py --name three_view_long_share_d0.75_256_s1_google # after test
python demo.py --query_index 0 # which image you want to query in the query set
It will save an image named `show.png` containing the top-10 retrieval results in the folder.
You can download the trained model from GoogleDrive or OneDrive. After downloading, please put the model folders under `./model/`.
The following paper uses and reports the results of the baseline model. You may cite it in your paper.
@article{zheng2020university,
title={University-1652: A Multi-view Multi-source Benchmark for Drone-based Geo-localization},
author={Zheng, Zhedong and Wei, Yunchao and Yang, Yi},
journal={ACM Multimedia},
year={2020}
}
@inproceedings{zheng2023uavm,
title={UAVM'23: 2023 Workshop on UAVs in Multimedia: Capturing the World from a New Perspective},
author={Zheng, Zhedong and Shi, Yujiao and Wang, Tingyu and Liu, Jun and Fang, Jianwu and Wei, Yunchao and Chua, Tat-seng},
booktitle={Proceedings of the 31st ACM International Conference on Multimedia},
pages={9715--9717},
year={2023}
}
Instance loss is defined in
@article{zheng2017dual,
title={Dual-Path Convolutional Image-Text Embeddings with Instance Loss},
author={Zheng, Zhedong and Zheng, Liang and Garrett, Michael and Yang, Yi and Xu, Mingliang and Shen, Yi-Dong},
journal={ACM Transactions on Multimedia Computing, Communications, and Applications (TOMM)},
doi={10.1145/3383184},
volume={16},
number={2},
pages={1--23},
year={2020},
publisher={ACM New York, NY, USA}
}