Model loading #25

Merged
merged 14 commits into from
Sep 24, 2024

tigranfah
Member

@tigranfah tigranfah commented Sep 20, 2024

A few key things are implemented:

  • slightly modify the JSON processing code: the JSON string is now explicitly parsed inside a try/except block, because the previous implementation raised an exception when it could not parse the JSON string
  • tune the config files to maximize training throughput and to reproduce the chemlactica 125m and 1.3b training runs; add gradient clipping after each forward pass to prevent overflows
  • add a debug config for the chemlactica model
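
For readers unfamiliar with the first item, here is a minimal sketch of parsing a JSON string under a try/except block. The function name `safe_parse_json`, the logger, and the choice to return `None` on failure are assumptions made for illustration; the PR's actual code is not shown in this conversation and may handle the failure differently.

```python
import json
import logging
from typing import Any, Optional

logger = logging.getLogger(__name__)


def safe_parse_json(json_str: str) -> Optional[Any]:
    """Return the parsed object, or None if json_str is not valid JSON.

    Hypothetical helper: the name and the None fallback are illustrative only.
    """
    try:
        return json.loads(json_str)
    except (json.JSONDecodeError, TypeError) as exc:
        # JSONDecodeError covers malformed text; TypeError covers non-string input.
        logger.warning("Skipping unparsable JSON string: %s", exc)
        return None
```

With a helper like this, a caller can skip a bad record instead of letting one malformed string abort the whole data-loading loop.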

torchtitan/logging.py (outdated, resolved)
torchtitan/metrics.py (resolved)
train.py (resolved)
Collaborator

@philippguevorguian philippguevorguian left a comment

lgtm

@tigranfah tigranfah merged commit 7eb6c33 into main Sep 24, 2024
0 of 4 checks passed

2 participants