Hello Accelerate team,

I'm looking to pretrain on a v4-32 TPU pod, using an HF dataset and the HF Trainer. I have no problems running on a single TPU.

I already found issue #501 and the answer at https://github.com/huggingface/accelerate/issues/501, but it's two years old. I successfully installed accelerate and xla on all workers; however, at step 2 it looks like we need the file xla_dist.py, which no longer exists in the xla/master branch. What are the steps to train on TPU pods, then? Thanks in advance!
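For reference, I installed the packages on every worker at once with a gcloud command along the lines of the following (the TPU VM name and zone are placeholders for my actual setup):

```shell
# Run the same install on all hosts of the pod slice in one shot.
# "my-tpu-pod" and the zone are placeholders for the real TPU VM name/zone.
gcloud compute tpus tpu-vm ssh my-tpu-pod \
  --zone=us-central2-b \
  --worker=all \
  --command="pip install accelerate torch torch_xla"
```

This completes without errors on all workers, so the environment itself seems fine; it's only the launch step that is unclear now that xla_dist.py is gone.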