
Add linear from unifold for a clean-path residual transformer #38

Open: wants to merge 8 commits into base: main
Conversation

@hypnopump commented on Dec 12, 2023

Transformer layers were not following clean-path residual principles (the model should start as the identity). This PR borrows the Linear module from UniFold and ensures that all transformer residual branches are initialized to zero at the start of training.
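For reference, a minimal sketch of the zero-init idea, assuming a UniFold-style Linear with an `init` argument (the names and the `"final"` init string are illustrative, not necessarily the exact UniFold API):

```python
import torch.nn as nn


class Linear(nn.Linear):
    """nn.Linear with a configurable initialization scheme."""

    def __init__(self, in_features: int, out_features: int,
                 bias: bool = True, init: str = "default"):
        super().__init__(in_features, out_features, bias=bias)
        if init == "final":
            # Zero the projection so a residual branch ending in this
            # layer contributes nothing at initialization and the
            # surrounding block starts as the identity.
            nn.init.zeros_(self.weight)
            if bias:
                nn.init.zeros_(self.bias)
```

The intent is that the last projection of each attention/FFN residual branch uses this zero initialization, so the whole stack starts at the identity.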

Bonus: the random seed is now set on all GPUs simultaneously, preventing non-reproducible runs in multi-GPU setups.
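A minimal sketch of what seeding every device can look like in PyTorch (an illustrative helper, not the exact change in this PR):

```python
import random

import numpy as np
import torch


def set_seed(seed: int) -> None:
    """Seed Python, NumPy and PyTorch on every visible GPU."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    # Seeds all CUDA devices, not just the current one, so every rank
    # in a multi-GPU run draws from the same random stream.
    torch.cuda.manual_seed_all(seed)
```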

New models train faster now (blue is new, red is old):
[Screenshot: training curves, new vs. old model]

@hypnopump marked this pull request as ready for review on December 12, 2023, 18:19
@hypnopump (Author) commented on Dec 16, 2023

The previous update_freq parameter performed gradient accumulation, but did not divide the loss by the number of accumulated steps. It now behaves like gradient_accumulation, so the same learning rate can be used for a given effective_batch_size regardless of the number of GPUs and per-GPU memory consumption.
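A minimal, self-contained sketch of the loss scaling under gradient accumulation (illustrative values, not the actual trainer code in this repo):

```python
import torch
import torch.nn as nn

model = nn.Linear(16, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=1e-2)
accumulation_steps = 4  # illustrative value
data = [(torch.randn(8, 16), torch.randn(8, 1)) for _ in range(8)]

optimizer.zero_grad()
for step, (x, y) in enumerate(data):
    loss = nn.functional.mse_loss(model(x), y)
    # Divide by the number of accumulated micro-batches so the summed
    # gradients match those of one large batch and the same learning
    # rate remains valid for any accumulation setting.
    (loss / accumulation_steps).backward()
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()
```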

To be fair, this could be a separate PR. I could implement it like this if it's wanted, @guolinke.

See the expected behaviour from Hugging Face's accelerate:
[Screenshot: gradient accumulation example from the accelerate documentation]

Together with the multi-GPU seed, merging this PR will improve the robustness of the training parameters and the ability to dynamically adjust the effective batch size (num_gpus * batch_size * accumulation_steps) given compute availability.
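As a purely illustrative example: with 4 GPUs, a per-GPU batch size of 8 and 4 accumulation steps, the effective batch size is 4 * 8 * 4 = 128; halving the GPU count to 2 and doubling the accumulation steps to 8 keeps the same effective batch size.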
