
Optimal way to train on a TPU Pod #3325

Open · DuyguA opened this issue Jan 6, 2025 · 1 comment
DuyguA commented Jan 6, 2025

Hello accelerate team,
I'm looking to pretrain on a v4-32 TPU Pod, using an HF dataset and the HF Trainer. I have no problems running on a single TPU.

I already found issue #501 and its answer (https://github.com/huggingface/accelerate/issues/501), but it's two years old. I successfully installed accelerate and xla on all workers; however, step 2 of that answer relies on the file xla_dist.py, which no longer exists on the xla master branch. What are the current steps to train on TPU Pods? Thanks in advance!
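For context, here is the kind of pod-wide launch I've been trying instead of xla_dist.py, assuming the gcloud `--worker=all` SSH pattern from the TPU VM docs is the intended replacement; `TPU_NAME`, `ZONE`, and `train.py` are placeholders for my setup, not a confirmed recipe:

```bash
# Sketch of a pod-wide launch without xla_dist.py, assuming the
# gcloud "--worker=all" pattern is the intended replacement.
# TPU_NAME, ZONE, and train.py are placeholders for my setup.
gcloud compute tpus tpu-vm ssh $TPU_NAME \
  --zone=$ZONE \
  --worker=all \
  --command="PJRT_DEVICE=TPU accelerate launch train.py"
```

Is something like this the recommended path now, or is there an accelerate-native way to launch on all pod workers?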

DuyguA (Author) commented Jan 6, 2025

cc @muellerzr
