QueryOTR: Outpainting by Queries #5

Open

Dongwoo-Im opened this issue Nov 20, 2023 · 0 comments

Labels

ECCV Image Outpainting

github : https://github.com/Kaiseem/QueryOTR

QueryOTR = Query Outpainting TRansformer

CNNs cannot capture long-range dependencies, so they are poorly suited to outpainting. -> ViT

The image outpainting task is formulated as patch-wise seq2seq autoregression.
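
To make the patch-wise sequence view concrete, here is a minimal NumPy sketch of turning an image into a token sequence and back. The helper names (`patchify`, `unpatchify`) and the sizes (128 input, 192 output, patch 16) are illustrative assumptions, not values taken from this note.

```python
import numpy as np

def patchify(image: np.ndarray, patch: int) -> np.ndarray:
    """Split an (H, W, C) image into a row-major sequence of flattened patches."""
    h, w, c = image.shape
    assert h % patch == 0 and w % patch == 0, "image size must be divisible by patch size"
    gh, gw = h // patch, w // patch
    x = image.reshape(gh, patch, gw, patch, c)
    x = x.transpose(0, 2, 1, 3, 4)  # (gh, gw, patch, patch, c)
    return x.reshape(gh * gw, patch * patch * c)

def unpatchify(seq: np.ndarray, patch: int, gh: int, gw: int, c: int) -> np.ndarray:
    """Inverse of patchify: rebuild the (H, W, C) image from the patch sequence."""
    x = seq.reshape(gh, gw, patch, patch, c).transpose(0, 2, 1, 3, 4)
    return x.reshape(gh * patch, gw * patch, c)

# Assumed, illustrative sizes: the input sequence is the known centre,
# the output sequence additionally contains the new border patches.
in_size, out_size, patch = 128, 192, 16
n_in = (in_size // patch) ** 2    # tokens the encoder sees
n_out = (out_size // patch) ** 2  # tokens in the outpainted image
n_new = n_out - n_in              # border tokens that must be generated

image = np.arange(8 * 8 * 3).reshape(8, 8, 3)
seq = patchify(image, 4)
restored = unpatchify(seq, 4, 2, 2, 3)
```

Under these assumptions the model maps a sequence of `n_in` known patches to `n_new` border patches.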

hybrid ViT-based encoder-decoder framework (MAE based Generator)

Pipeline : pretrained encoder - QEM - decoder - PSM
QEM = query expansion module
- The existing tokens are used as keys and values during decoding.
- The queries are the expanded patches, obtained from random noise + residual blocks.
PSM = patch smoothing module
- Averaging is performed to reduce the discrepancy between the original image and the extended region.
  - The original MAE uses a linear mapping to convert tokens to pixel level,
  - whereas QueryOTR uses a ConvTranspose2d module, which allows a mapping that jointly considers multiple tokens.
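
The query / key-value roles above can be sketched with single-head cross-attention in NumPy. This is only an illustration: the toy residual block, the weight matrices, and all dimensions are assumptions, and the real QEM and decoder are more elaborate.

```python
import numpy as np

def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    x = x - x.max(axis=axis, keepdims=True)  # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def cross_attention(queries, tokens, w_q, w_k, w_v):
    """Queries attend over encoder tokens, which supply both keys and values."""
    q = queries @ w_q
    k = tokens @ w_k
    v = tokens @ w_v
    scores = q @ k.T / np.sqrt(q.shape[-1])
    attn = softmax(scores, axis=-1)
    return attn @ v, attn

rng = np.random.default_rng(0)
dim, n_tokens, n_queries = 32, 64, 80  # hypothetical sizes

tokens = rng.normal(size=(n_tokens, dim))  # stand-in for encoder output tokens

# Toy stand-in for the QEM: random noise passed through one residual block.
noise = rng.normal(size=(n_queries, dim))
w_res = rng.normal(size=(dim, dim)) * 0.1
queries = noise + np.maximum(noise @ w_res, 0.0)

w_q, w_k, w_v = (rng.normal(size=(dim, dim)) / np.sqrt(dim) for _ in range(3))
out, attn = cross_attention(queries, tokens, w_q, w_k, w_v)
```

Each row of `attn` is a distribution over the known tokens, so every expanded patch is decoded as a weighted mix of information from the original image.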

QEM : convergence becomes faster and performance improves. (noise = sampling) (DC = deform conv)
PSM : both the generated results and the metrics improve. (per-patch norm : from MAE ?)
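
The averaging idea behind PSM can be pictured as follows, assuming that neighbouring patch predictions overlap (which happens when a transposed convolution uses a kernel larger than its stride). The function `smooth_merge` is a hypothetical helper for illustration, not the module from the paper.

```python
import numpy as np

def smooth_merge(patches: np.ndarray, grid: tuple, stride: int) -> np.ndarray:
    """Place (k, k) patches on a grid with the given stride and average overlaps."""
    gh, gw = grid
    k = patches.shape[-1]
    assert k >= stride, "patches must at least touch so every pixel is covered"
    h = (gh - 1) * stride + k
    w = (gw - 1) * stride + k
    acc = np.zeros((h, w))  # sum of all contributions per pixel
    cnt = np.zeros((h, w))  # number of patches covering each pixel
    for idx, p in enumerate(patches):
        r, c = divmod(idx, gw)
        top, left = r * stride, c * stride
        acc[top:top + k, left:left + k] += p
        cnt[top:top + k, left:left + k] += 1
    return acc / cnt

# Two 6x6 patches side by side with stride 4: columns 4 and 5 overlap.
left_patch = np.zeros((6, 6))
right_patch = np.full((6, 6), 2.0)
merged = smooth_merge(np.stack([left_patch, right_patch]), grid=(1, 2), stride=4)
```

In the overlapping columns the two predictions are blended, which softens the seam that independent per-patch linear mappings would leave.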

objective

patch-wise reconstruction loss = MAE recon loss
- only the recon loss is used during the warmup stage
perceptual loss : (multi-scale) VGG-19 network pretrained on ImageNet
adversarial loss : (multi-scale) (CNN) PatchGAN discriminator (least squares loss -> hinge loss)
- discriminator regularization : DiffAugment + Spectral normalization
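
As a rough sketch of the objective above, the standard hinge and least-squares GAN losses and a warmup-gated total loss look like this. The loss weights `w_perc` and `w_adv` and the `warmup_steps` argument are hypothetical; the note does not give their values.

```python
import numpy as np

def d_hinge(real_logits: np.ndarray, fake_logits: np.ndarray) -> float:
    """Hinge loss for the discriminator."""
    return float(np.mean(np.maximum(0.0, 1.0 - real_logits))
                 + np.mean(np.maximum(0.0, 1.0 + fake_logits)))

def g_hinge(fake_logits: np.ndarray) -> float:
    """Hinge-GAN loss for the generator."""
    return float(-np.mean(fake_logits))

def d_lsgan(real_logits: np.ndarray, fake_logits: np.ndarray) -> float:
    """Least-squares loss for the discriminator, shown for comparison."""
    return float(0.5 * (np.mean((real_logits - 1.0) ** 2) + np.mean(fake_logits ** 2)))

def generator_loss(step, recon, perceptual, adversarial,
                   warmup_steps, w_perc, w_adv):
    """Recon loss only during warmup, then add perceptual and adversarial terms."""
    if step < warmup_steps:
        return recon
    return recon + w_perc * perceptual + w_adv * adversarial

real = np.array([2.0, 0.5])
fake = np.array([-2.0, 0.0])
loss_d = d_hinge(real, fake)
loss_g = g_hinge(fake)
```

The hinge discriminator loss stops penalizing samples that are already classified with a margin of 1, whereas the least-squares variant keeps pulling logits toward fixed targets.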

Without the pretrained encoder, the performance gap is reportedly not large, but the convergence speed differs.

x1, x2, x3 are not generated in one shot; the outpainting step is applied repeatedly.
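
Reading x1, x2, x3 as one, two, and three successive outpainting steps (an interpretation, not stated explicitly in the note), the repeated procedure can be sketched as a loop that feeds each output back in. `outpaint_once` is a hypothetical stand-in that just replicates edge pixels; a real model would generate the border instead, and the `border` width is an assumed parameter.

```python
import numpy as np

def outpaint_once(image: np.ndarray, border: int) -> np.ndarray:
    """Hypothetical stand-in for one model forward pass: pad by edge replication."""
    return np.pad(image, ((border, border), (border, border), (0, 0)), mode="edge")

def outpaint_iteratively(image: np.ndarray, steps: int, border: int):
    """Apply the single-step outpainter repeatedly, recording the size after each step."""
    sizes = [image.shape[:2]]
    for _ in range(steps):
        image = outpaint_once(image, border)  # previous output becomes the next input
        sizes.append(image.shape[:2])
    return image, sizes

start = np.arange(8 * 8 * 3, dtype=float).reshape(8, 8, 3)
result, sizes = outpaint_iteratively(start, steps=3, border=2)
```

Because each step conditions on the previous output, errors in early borders can propagate outward, which is why multi-step results are usually reported separately from single-step ones.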

The version above evaluates the other models following the QueryOTR protocol.
In other words, QueryOTR performs well in terms of generation quality on the outer region, which is the essence of outpainting.

