Hi, thanks for your brilliant work and your kind open-source efforts. I'm puzzled by two questions and hoping for your reply.
Could you explain the differences between the public pre-trained weights (in the red box)? Are they related to Tab. 4 in the PixArt-alpha paper?
In Sec 2.2, you state that the whole training is divided into THREE stages. As for Stage I (pixel dependency learning), I believe you initialize the model with DiT weights to boost efficiency, which you call reparameterization. In other words, no training should be needed in Stage I. However, Table 4 (red box) still lists 300K steps of ImageNet training. Am I missing some detail? And what is the training objective of those 300K ImageNet steps: text2img or cls2img?
Thank you so much.