Misc. bug: Docker Image llama-quantize Segmentation fault #11196
Was b4435 / 017cc5f still working correctly?
Hey Johannes,

```
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4435 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4435' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
6s 08:57:34
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4434 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4434' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
5s 08:57:48
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4433 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4433' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
6s 08:57:58
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4432 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4432' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
6s 08:58:08
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4431 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4431' locally
full-b4431: Pulling from ggerganov/llama.cpp
6414378b6477: Already exists
79d927991872: Pull complete
a3a25759bec8: Pull complete
20d3c56b53dd: Pull complete
4f4fb700ef54: Pull complete
f41f15e9e25c: Pull complete
Digest: sha256:095e1c8579e6bd70605755454920b4856c5c203b3c3fe9b0f5f5a4f0e747e5fd
Status: Downloaded newer image for ghcr.io/ggerganov/llama.cpp:full-b4431
main: build = 4431 (dc7cef9f)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bert
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Bge Small En v1.5
llama_model_loader: - kv 3: general.version str = v1.5
llama_model_loader: - kv 4: general.finetune str = en
llama_model_loader: - kv 5: general.basename str = bge
llama_model_loader: - kv 6: general.size_label str = small
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,5] = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv 9: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 10: bert.block_count u32 = 12
llama_model_loader: - kv 11: bert.context_length u32 = 512
llama_model_loader: - kv 12: bert.embedding_length u32 = 384
llama_model_loader: - kv 13: bert.feed_forward_length u32 = 1536
llama_model_loader: - kv 14: bert.attention.head_count u32 = 12
llama_model_loader: - kv 15: bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 16: general.file_type u32 = 0
llama_model_loader: - kv 17: bert.attention.causal bool = false
llama_model_loader: - kv 18: bert.pooling_type u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.model str = bert
llama_model_loader: - kv 21: tokenizer.ggml.pre str = jina-v2-en
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 25: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 27: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 28: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - type f32: 197 tensors
Segmentation fault (core dumped)
❯ docker run --rm -it \
-v ./models:/models \
ghcr.io/ggerganov/llama.cpp:full-b4436 \
--quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
main: build = 4436 (53ff6b9b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv 0: general.architecture str = bert
llama_model_loader: - kv 1: general.type str = model
llama_model_loader: - kv 2: general.name str = Bge Small En v1.5
llama_model_loader: - kv 3: general.version str = v1.5
llama_model_loader: - kv 4: general.finetune str = en
llama_model_loader: - kv 5: general.basename str = bge
llama_model_loader: - kv 6: general.size_label str = small
llama_model_loader: - kv 7: general.license str = mit
llama_model_loader: - kv 8: general.tags arr[str,5] = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv 9: general.languages arr[str,1] = ["en"]
llama_model_loader: - kv 10: bert.block_count u32 = 12
llama_model_loader: - kv 11: bert.context_length u32 = 512
llama_model_loader: - kv 12: bert.embedding_length u32 = 384
llama_model_loader: - kv 13: bert.feed_forward_length u32 = 1536
llama_model_loader: - kv 14: bert.attention.head_count u32 = 12
llama_model_loader: - kv 15: bert.attention.layer_norm_epsilon f32 = 0.000000
llama_model_loader: - kv 16: general.file_type u32 = 0
llama_model_loader: - kv 17: bert.attention.causal bool = false
llama_model_loader: - kv 18: bert.pooling_type u32 = 2
llama_model_loader: - kv 19: tokenizer.ggml.token_type_count u32 = 2
llama_model_loader: - kv 20: tokenizer.ggml.model str = bert
llama_model_loader: - kv 21: tokenizer.ggml.pre str = jina-v2-en
llama_model_loader: - kv 22: tokenizer.ggml.tokens arr[str,30522] = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv 23: tokenizer.ggml.token_type arr[i32,30522] = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv 24: tokenizer.ggml.unknown_token_id u32 = 100
llama_model_loader: - kv 25: tokenizer.ggml.seperator_token_id u32 = 102
llama_model_loader: - kv 26: tokenizer.ggml.padding_token_id u32 = 0
llama_model_loader: - kv 27: tokenizer.ggml.cls_token_id u32 = 101
llama_model_loader: - kv 28: tokenizer.ggml.mask_token_id u32 = 103
llama_model_loader: - kv 29: general.quantization_version u32 = 2
llama_model_loader: - type f32: 197 tensors
Segmentation fault (core dumped)
```
Name and Version
```
root@f7545b6b4f65:/app# ./llama-cli --version
load_backend: loaded CPU backend from ./libggml-cpu-alderlake.so
version: 4460 (ba8a1f9)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
```
Operating systems
Linux, Other? (Please let us know in description)
Which llama.cpp modules do you know to be affected?
llama-quantize
Command line
```
❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
```
Problem description & steps to reproduce
Just try to quantize a model and you'll get the segfault; the command above reproduces it.
First Bad Commit
No response
Relevant log output
No response