FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Renshan Zhang¹, Rui Shao¹†, Gongwei Chen¹, Weili Guan¹, Kaiwen Zhou², Liqiang Nie¹†

¹Harbin Institute of Technology, Shenzhen
²Huawei Noah's Ark Lab
†Corresponding author

If you find this work useful for your research, please cite our paper and star our repo.

Updates

[11/2024] 🔥 Details will be released. Stay tuned.

Introduction

This is the GitHub repository for FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers. In this work, we propose FALCON, a model that introduces a novel visual register technique to address both visual redundancy and fragmentation in the high-resolution visual encoding of multimodal large language models (MLLMs).

The framework of the proposed FALCON model:

🔥 Details will be released. Stay tuned.

Citation

If you find this work useful for your research, please cite our paper:
