Does it make sense to try an image loss in stage 2? #122

Open
AlexzQQQ opened this issue Jan 9, 2025 · 3 comments

Comments

@AlexzQQQ

AlexzQQQ commented Jan 9, 2025

I tried using gumbel-softmax to turn the latent predictions into images so I could apply several image-space losses (perceptual loss, L1, adversarial) in stage 2 (the transformer training phase), aiming at tasks that cross-entropy loss might not suit well. None of them seemed to work. I wonder if my idea was wrong. Thanks for your excellent work!!
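Roughly what I mean, as a minimal sketch (assuming a standard PyTorch VQ-VAE setup; `transformer_logits`, `codebook`, and `vqvae_decoder` are placeholder names, not code from this repo):

```python
import torch
import torch.nn.functional as F

def decode_with_gumbel(transformer_logits, codebook, vqvae_decoder, H, W,
                       tau=1.0, hard=True):
    """Differentiably decode stage-2 token predictions into an image.

    transformer_logits: (B, H*W, K) logits over K codebook entries
    codebook:           (K, D) pretrained VQ-VAE codebook embeddings
    vqvae_decoder:      pretrained decoder taking (B, D, H, W) latents
    """
    # Approximate, differentiable sampling of one code per latent position.
    weights = F.gumbel_softmax(transformer_logits, tau=tau, hard=hard)  # (B, H*W, K)
    # Weighted mixture of codebook embeddings (exactly one-hot if hard=True).
    latents = weights @ codebook                                        # (B, H*W, D)
    latents = latents.transpose(1, 2).reshape(latents.size(0), -1, H, W)
    return vqvae_decoder(latents)

# Image-space losses on the decoded prediction, e.g.:
# recon = decode_with_gumbel(logits, codebook, decoder, H, W)
# loss = F.l1_loss(recon, target_image) + perceptual_loss(recon, target_image)
```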

@AlexzQQQ
Author

AlexzQQQ commented Jan 9, 2025

Also, I find that using gumbel-softmax(hard=False) sometimes gives better results during training, but is hard=False a bad setting? If the codebook is limited, would mixed tokens give better performance? I don't have the background to answer this myself and couldn't find useful research. Thanks for your excellent work again!!
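To make the question concrete, a toy sketch of the two settings (placeholder shapes and names):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 16, 512)      # toy (B, H*W, K) transformer predictions
codebook = torch.randn(512, 64)       # toy (K, D) VQ-VAE codebook

# hard=True: straight-through one-hot, so each position uses exactly one code
# at forward time (matches discrete VQ-VAE inference); gradients flow through
# the underlying soft probabilities.
hard_latents = F.gumbel_softmax(logits, tau=1.0, hard=True) @ codebook

# hard=False: each latent becomes a convex mixture of several codebook entries,
# something the pretrained decoder never sees with a purely discrete codebook.
soft_latents = F.gumbel_softmax(logits, tau=1.0, hard=False) @ codebook
```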

@jack111331

> Also, I find that using gumbel-softmax(hard=False) sometimes gives better results during training, but is hard=False a bad setting? If the codebook is limited, would mixed tokens give better performance? I don't have the background to answer this myself and couldn't find useful research. Thanks for your excellent work again!!

@AlexzQQQ
Have you ever tried mixing only the two highest predicted probabilities within each token to form each latent embedding? I mean, mask out everything except the top-2 probabilities, apply gumbel-softmax to just those top-2 probabilities to produce the latent embedding, and decode it with the VQ-VAE (a rough sketch follows at the end of this comment).
This setup narrows down your first question: whether the performance drops because a massive number of codes is being mixed.
As for your second question, what are your evaluation results with gumbel-softmax(hard=False), and what temperature do you use? gumbel-softmax(hard=False) incorporates multiple token probabilities when producing a latent embedding, and it's hard to tell whether that's a good move without any evaluation results.

Although mixing token probabilities is advantageous in theory, given that each code represents a distinct feature, it's worth exploring whether the pretrained VQ-VAE can fully exploit the richer latent representation produced by the mixed token probabilities, or whether it will simply collapse eventually during stage-2 training.
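A rough sketch of the top-2 masking I have in mind (placeholder names; assumes logits of shape (B, H*W, K) and a (K, D) codebook):

```python
import torch
import torch.nn.functional as F

def top2_gumbel_latents(logits, codebook, tau=1.0, hard=False):
    """Mix only the top-2 predicted codes per latent position.

    logits:   (B, H*W, K) predicted logits over the codebook
    codebook: (K, D) pretrained VQ-VAE codebook embeddings
    """
    # Keep the two highest logits per position and mask the rest to -inf,
    # so gumbel-softmax assigns zero weight to all non-top-2 codes.
    top2_vals, top2_idx = logits.topk(2, dim=-1)
    masked = torch.full_like(logits, float('-inf'))
    masked.scatter_(-1, top2_idx, top2_vals)
    weights = F.gumbel_softmax(masked, tau=tau, hard=hard)   # (B, H*W, K)
    return weights @ codebook            # (B, H*W, D), feed to the VQ-VAE decoder
```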

@AlexzQQQ
Author

@jack111331 Thanks for your reply. I will try what you suggested in a few weeks, as work is keeping me busy right now. In my opinion, the mixed tokens will be useful if the loss is not only cross-entropy.
