
Relation to the original XLNet implementation? #7843

Closed

vochicong opened this issue Nov 21, 2019 · 2 comments

Comments

@vochicong

Hi @saberkun, @zihangdai, @graykode, @bzantium

The original zihangdai/xlnet repository hasn't received any updates recently. Should we assume that the XLNet implementation here, official/nlp/xlnet, will replace the original one?

By the way, several PRs there remain unmerged, including my latest, zihangdai/xlnet#247, which enables pre-training XLNet on Cloud TPU and TPU Pods.

@saberkun
Member

Hi, this is a new implementation in TF2 and includes a few updates from the authors directly.
This one will not replace the authors' repo.
Yes, this version supports pretraining on TPU Pods, but we still need to fill in the documentation.
Thanks
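
Until that documentation lands, here is a minimal sketch of the generic TF2 TPU setup that pod-slice pretraining typically relies on. It uses only the standard tf.distribute APIs; the actual entry point and flags in official/nlp/xlnet may differ, and `build_xlnet_model` is a hypothetical placeholder.

```python
import tensorflow as tf

# Resolve and initialize the TPU (or TPU Pod slice); "my-tpu" is a hypothetical Cloud TPU name.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy (TF 2.3+) replicates the training step across all cores of the pod slice.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created here (the XLNet model, optimizer, etc.) are placed for TPU training.
    model = build_xlnet_model()  # hypothetical placeholder for the actual model constructor
```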

@LifeIsStrange

@saberkun
@allenwang28
What are the pros and cons of your XLNet implementation vs. the one in https://huggingface.co/transformers/model_doc/xlnet.html?
Did either of the two improve accuracy or efficiency over the original implementation from the paper?
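
For reference, here is a minimal sketch of loading the Hugging Face transformers XLNet implementation (assuming the public xlnet-base-cased checkpoint, the PyTorch backend, and a recent transformers version); the Model Garden version is instead built on TF2, per the comment above.

```python
# Requires: pip install transformers torch
from transformers import XLNetModel, XLNetTokenizer

# Load the public pretrained checkpoint and its SentencePiece tokenizer.
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")

# Encode a sentence and run a forward pass.
inputs = tokenizer("XLNet uses permutation language modeling.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```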

Off topic:
It's 2020, but the fact is that nearly all researchers have focused on improving BERT (RoBERTa, ALBERT, SpanBERT), which is nice, but it is absurd that no one has really attempted to improve upon the number-one language model (XLNet).
It is such low-hanging fruit that could improve accuracy across NLP tasks!
E.g., researchers could probably transpose the SpanBERT idea into a SpanXLNet.
Also, the trend toward bigger models (11B parameters for T5, 175B for GPT-3) has not yet been applied to XLNet.
Finally, the biggest absurdity I want to point out is that XLNet hasn't even been tried on some fundamental NLP tasks such as coreference resolution...
Why can so many researchers spend years of effort on a niche idea, yet not spend a few weeks applying XLNet to coreference resolution and publishing the result as a new SOTA?

Also, the XLNet SOTA results should be re-run with Swish / RAdam / gradient centralization / Ranger (https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer).
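
To make that suggestion concrete, here is a rough TF2 sketch of such an optimizer/activation swap, assuming tensorflow-addons is installed. RAdam wrapped in Lookahead is a common approximation of Ranger (gradient centralization is not included here), and Swish is a built-in Keras activation in TF 2.2+; the tiny model below is just an illustrative placeholder, not XLNet.

```python
# Requires: pip install tensorflow tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa

# RAdam wrapped in Lookahead approximates the Ranger optimizer.
radam = tfa.optimizers.RectifiedAdam(learning_rate=1e-4)
ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)

# Toy model using the Swish activation; a real experiment would plug the optimizer
# and activation into the XLNet training setup instead.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation=tf.keras.activations.swish, input_shape=(128,)),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer=ranger,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```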

I sincerely hope that you, the TensorFlow engineers, can see this blind spot in the research direction and correct it. You would be covered in glory for the resulting empirical gains.
