
Relation to the original XLNet implementation? #7843

Closed

vochicong opened this issue Nov 21, 2019 · 2 comments

Comments

@vochicong

Hi @saberkun, @zihangdai, @graykode, @bzantium

The original zihangdai/xlnet repository hasn't received any updates recently. Should we assume that the XLNet implementation here, official/nlp/xlnet, will replace the original one?

By the way, several PRs there remain unmerged, including my latest, zihangdai/xlnet#247, which enables pre-training XLNet on Cloud TPU and TPU Pods.

@saberkun
Member

Hi, this is a new implementation in TF2 and includes a few updates from the authors directly.
This one will not replace the authors' repo.
Yes, this version supports pretraining on TPU Pods, but we still need to fill in the documentation.
Thanks
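
Until that documentation lands, here is a minimal sketch of the generic TF2 TPU setup that pod-slice pretraining typically relies on. It uses only the standard tf.distribute APIs; the actual entry point and flags in official/nlp/xlnet may differ, and `build_xlnet_model` is a hypothetical placeholder.

```python
import tensorflow as tf

# Resolve and initialize the TPU (or TPU Pod slice); "my-tpu" is a hypothetical Cloud TPU name.
resolver = tf.distribute.cluster_resolver.TPUClusterResolver(tpu="my-tpu")
tf.config.experimental_connect_to_cluster(resolver)
tf.tpu.experimental.initialize_tpu_system(resolver)

# TPUStrategy (TF 2.3+) replicates the training step across all cores of the pod slice.
strategy = tf.distribute.TPUStrategy(resolver)

with strategy.scope():
    # Variables created here (the XLNet model, optimizer, etc.) are placed for TPU training.
    model = build_xlnet_model()  # hypothetical placeholder for the actual model constructor
```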

@LifeIsStrange

@saberkun
@allenwang28
What are the pros and cons of your XLNet implementation vs. the one in https://huggingface.co/transformers/model_doc/xlnet.html?
Did either of the two improve accuracy or efficiency over the original implementation from the paper?
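
For reference, here is a minimal sketch of loading the Hugging Face transformers XLNet implementation (assuming the public xlnet-base-cased checkpoint, the PyTorch backend, and a recent transformers version); the Model Garden version is instead built on TF2, per the comment above.

```python
# Requires: pip install transformers torch
from transformers import XLNetModel, XLNetTokenizer

# Load the public pretrained checkpoint and its SentencePiece tokenizer.
tokenizer = XLNetTokenizer.from_pretrained("xlnet-base-cased")
model = XLNetModel.from_pretrained("xlnet-base-cased")

# Encode a sentence and run a forward pass.
inputs = tokenizer("XLNet uses permutation language modeling.", return_tensors="pt")
outputs = model(**inputs)
print(outputs.last_hidden_state.shape)  # (batch, seq_len, hidden_size)
```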

Off topic:
It's 2020, but the fact is that nearly all researchers have focused on improving BERT (RoBERTa, ALBERT, SpanBERT), which is nice, but it is absurd that no one has really attempted to improve upon the number-one language model (XLNet).
It is such low-hanging fruit that could improve accuracy across NLP tasks!
E.g., researchers could probably transpose the SpanBERT idea into a SpanXLNet.
Also, the trend toward bigger models (11B parameters for T5, 175B for GPT-3) has not yet been applied to XLNet.
Finally, the biggest absurdity I want to point out is that XLNet hasn't even been tried on some fundamental NLP tasks such as coreference resolution...
Why can so many researchers spend years of effort on a niche idea, yet not spend a few weeks applying XLNet to coreference resolution and publishing the result as a new SOTA?

Also, the XLNet SOTA results should be re-run with Swish / RAdam / gradient centralization / Ranger (https://github.com/lessw2020/Ranger-Deep-Learning-Optimizer).
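
To make that suggestion concrete, here is a rough TF2 sketch of such an optimizer/activation swap, assuming tensorflow-addons is installed. RAdam wrapped in Lookahead is a common approximation of Ranger (gradient centralization is not included here), and Swish is a built-in Keras activation in TF 2.2+; the tiny model below is just an illustrative placeholder, not XLNet.

```python
# Requires: pip install tensorflow tensorflow-addons
import tensorflow as tf
import tensorflow_addons as tfa

# RAdam wrapped in Lookahead approximates the Ranger optimizer.
radam = tfa.optimizers.RectifiedAdam(learning_rate=1e-4)
ranger = tfa.optimizers.Lookahead(radam, sync_period=6, slow_step_size=0.5)

# Toy model using the Swish activation; a real experiment would plug the optimizer
# and activation into the XLNet training setup instead.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(1024, activation=tf.keras.activations.swish, input_shape=(128,)),
    tf.keras.layers.Dense(2),
])
model.compile(
    optimizer=ranger,
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
)
```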

I sincerely hope that you, the TensorFlow engineers, can see this blind spot in the research direction and correct it. You would be covered in glory for the resulting empirical gains.
