
Optimal way to train on a TPU Pod #3325

Open · DuyguA opened this issue Jan 6, 2025 · 1 comment
DuyguA commented Jan 6, 2025

Hello accelerate team,
I'm looking to pretrain on a v4-32 TPU Pod, using an HF dataset and the HF Trainer. I have no problems running on a single TPU.

I already found issue #501 and its answer (https://github.com/huggingface/accelerate/issues/501), but it's two years old. I successfully installed accelerate and xla on all workers; however, step 2 of that answer relies on the file xla_dist.py, which no longer exists on the xla master branch. What are the current steps to train on TPU Pods? Thanks in advance!
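For context, here is the kind of pod-wide launch I've been trying instead of xla_dist.py, assuming the gcloud `--worker=all` SSH pattern from the TPU VM docs is the intended replacement; `TPU_NAME`, `ZONE`, and `train.py` are placeholders for my setup, not a confirmed recipe:

```bash
# Sketch of a pod-wide launch without xla_dist.py, assuming the
# gcloud "--worker=all" pattern is the intended replacement.
# TPU_NAME, ZONE, and train.py are placeholders for my setup.
gcloud compute tpus tpu-vm ssh $TPU_NAME \
  --zone=$ZONE \
  --worker=all \
  --command="PJRT_DEVICE=TPU accelerate launch train.py"
```

Is something like this the recommended path now, or is there an accelerate-native way to launch on all pod workers?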

DuyguA (Author) commented Jan 6, 2025

cc @muellerzr
