Does it make sense to try an image loss in stage 2? #122

Open
AlexzQQQ opened this issue Jan 9, 2025 · 3 comments

Comments

@AlexzQQQ

AlexzQQQ commented Jan 9, 2025

I tried using gumbel-softmax to turn the latent predictions into images so I could apply several image-space losses (perceptual loss, L1, adversarial) in stage 2 (the transformer training phase), aiming at tasks that cross-entropy loss might not suit well. None of them seemed to work. I wonder if my idea was wrong. Thanks for your excellent work!!
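Roughly what I mean, as a minimal sketch (assuming a standard PyTorch VQ-VAE setup; `transformer_logits`, `codebook`, and `vqvae_decoder` are placeholder names, not code from this repo):

```python
import torch
import torch.nn.functional as F

def decode_with_gumbel(transformer_logits, codebook, vqvae_decoder, H, W,
                       tau=1.0, hard=True):
    """Differentiably decode stage-2 token predictions into an image.

    transformer_logits: (B, H*W, K) logits over K codebook entries
    codebook:           (K, D) pretrained VQ-VAE codebook embeddings
    vqvae_decoder:      pretrained decoder taking (B, D, H, W) latents
    """
    # Approximate, differentiable sampling of one code per latent position.
    weights = F.gumbel_softmax(transformer_logits, tau=tau, hard=hard)  # (B, H*W, K)
    # Weighted mixture of codebook embeddings (exactly one-hot if hard=True).
    latents = weights @ codebook                                        # (B, H*W, D)
    latents = latents.transpose(1, 2).reshape(latents.size(0), -1, H, W)
    return vqvae_decoder(latents)

# Image-space losses on the decoded prediction, e.g.:
# recon = decode_with_gumbel(logits, codebook, decoder, H, W)
# loss = F.l1_loss(recon, target_image) + perceptual_loss(recon, target_image)
```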

@AlexzQQQ
Author

AlexzQQQ commented Jan 9, 2025

Also, I find that using gumbel-softmax(hard=False) sometimes gives better results during training, but is hard=False a bad setting? If the codebook is limited, would mixed tokens give better performance? I don't have the background to answer this myself and couldn't find useful research. Thanks for your excellent work again!!
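To make the question concrete, a toy sketch of the two settings (placeholder shapes and names):

```python
import torch
import torch.nn.functional as F

logits = torch.randn(2, 16, 512)      # toy (B, H*W, K) transformer predictions
codebook = torch.randn(512, 64)       # toy (K, D) VQ-VAE codebook

# hard=True: straight-through one-hot, so each position uses exactly one code
# at forward time (matches discrete VQ-VAE inference); gradients flow through
# the underlying soft probabilities.
hard_latents = F.gumbel_softmax(logits, tau=1.0, hard=True) @ codebook

# hard=False: each latent becomes a convex mixture of several codebook entries,
# something the pretrained decoder never sees with a purely discrete codebook.
soft_latents = F.gumbel_softmax(logits, tau=1.0, hard=False) @ codebook
```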

@jack111331

> Also, I find that using gumbel-softmax(hard=False) sometimes gives better results during training, but is hard=False a bad setting? If the codebook is limited, would mixed tokens give better performance? I don't have the background to answer this myself and couldn't find useful research. Thanks for your excellent work again!!

@AlexzQQQ
Have you ever tried mixing only the two highest predicted probabilities within each token to form each latent embedding? I mean, mask out everything except the top-2 probabilities, apply gumbel-softmax to just those top-2 probabilities to produce the latent embedding, and decode it with the VQ-VAE (a rough sketch follows at the end of this comment).
This setup narrows down your first question: whether the performance drops because a massive number of codes is being mixed.
As for your second question, what are your evaluation results with gumbel-softmax(hard=False), and what temperature do you use? gumbel-softmax(hard=False) incorporates multiple token probabilities when producing a latent embedding, and it's hard to tell whether that's a good move without any evaluation results.

Although mixing token probabilities is advantageous in theory, given that each code represents a distinct feature, it's worth exploring whether the pretrained VQ-VAE can fully exploit the richer latent representation produced by the mixed token probabilities, or whether it will simply collapse eventually during stage-2 training.
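A rough sketch of the top-2 masking I have in mind (placeholder names; assumes logits of shape (B, H*W, K) and a (K, D) codebook):

```python
import torch
import torch.nn.functional as F

def top2_gumbel_latents(logits, codebook, tau=1.0, hard=False):
    """Mix only the top-2 predicted codes per latent position.

    logits:   (B, H*W, K) predicted logits over the codebook
    codebook: (K, D) pretrained VQ-VAE codebook embeddings
    """
    # Keep the two highest logits per position and mask the rest to -inf,
    # so gumbel-softmax assigns zero weight to all non-top-2 codes.
    top2_vals, top2_idx = logits.topk(2, dim=-1)
    masked = torch.full_like(logits, float('-inf'))
    masked.scatter_(-1, top2_idx, top2_vals)
    weights = F.gumbel_softmax(masked, tau=tau, hard=hard)   # (B, H*W, K)
    return weights @ codebook            # (B, H*W, D), feed to the VQ-VAE decoder
```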

@AlexzQQQ
Author

@jack111331 Thanks for your reply. I will try what you suggested in a few weeks, as work is keeping me busy right now. In my opinion, the mixed tokens will be useful if the loss is not only cross-entropy.
