Some Questions and Comments #152

tom99763 · 2022-11-07T00:33:38Z

Have you considered using a vector-quantized AE (VQVAE) instead of the CNN feature map in future work? I think the results could be surprising, given its feature compression and sampleable properties for image-to-image translation tasks.
It seems that the input-output pixel correlation strongly affects the translation result early in training (multimodal translation or Animal-to-Human translation). Instead of predicting everything at once, a two-stage model (first contour, then texture) may improve the result.

Thank you

taesungp · 2022-12-20T21:37:36Z

Hello, thanks for the suggestions.

I think incorporating VQVAE can be a good direction, particularly for saving compute.
A two-stage model may help, especially if we go to higher resolution. But two-stage approaches are also more cumbersome to train.

tom99763 changed the title ~~Some Questions~~ Some Questions and Comments Nov 7, 2022
