This repository has been archived by the owner on Dec 6, 2024. It is now read-only.
feat: add more max_length constraint for resource limit machines #41
+15
−9
Hi @simonJJJ, I am glad to see your update for M1/M2 support, thanks. I can now close my previous PR #39.
This PR adds some features that help resource-limited machines:
- max_length can now be set when the pipeline is created. A MacBook Air M1 does not have enough RAM for the original setting (the length used in the original model training setup is too long), so the GPU runs out of memory easily. A lower length setting also minimizes the compute space needed for the KV cache.
- MEM_SIZE and SCRATCH_SIZE are adjusted so that they stay reasonable when max_length is modified.

Here are my experiments for this PR.
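As a rough illustration of why a lower max_length reduces memory pressure, the sketch below estimates the size of a key/value cache from the context length. It is not the sizing code from this PR: `ModelShape`, `kv_cache_bytes`, and the example shape (32 layers, hidden size 4096, 2-byte elements) are assumptions chosen for illustration.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical model shape, used only for this estimate (not read from the PR).
struct ModelShape {
    std::size_t num_layers;   // number of transformer layers
    std::size_t hidden_size;  // width of each cached key/value vector
};

// Estimated KV cache size in bytes: one key tensor and one value tensor per
// layer, each holding max_length x hidden_size elements.
// bytes_per_element defaults to 2, assuming 16-bit cache entries.
constexpr std::uint64_t kv_cache_bytes(ModelShape shape,
                                       std::size_t max_length,
                                       std::size_t bytes_per_element = 2) {
    return 2ull * shape.num_layers * max_length * shape.hidden_size *
           bytes_per_element;
}

// Convert a byte count to MiB for easier reading.
constexpr double to_mib(std::uint64_t bytes) {
    return static_cast<double>(bytes) / (1024.0 * 1024.0);
}
```

Under these assumptions, `kv_cache_bytes(ModelShape{32, 4096}, 128)` is 64 MiB, while the same shape with a length of 2048 needs 1 GiB. The cache grows linearly with max_length, which is why buffers tied to it should shrink when the length is lowered.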
Experiment Settings
./build/bin/main -m qwen7b-ggml.bin -l 128 -v --tiktoken ~/Project/llm/Qwen-7B-Chat/qwen.tiktoken -p hello
Time Spent (Output Time, Lower Is Better)