
Misc. bug: Docker Image llama-quantize Segmentation fault #11196

Open
aria3ppp opened this issue Jan 11, 2025 · 2 comments

@aria3ppp

Name and Version

root@f7545b6b4f65:/app# ./llama-cli --version
load_backend: loaded CPU backend from ./libggml-cpu-alderlake.so
version: 4460 (ba8a1f9)
built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu

Operating systems

Linux, Other? (Please let us know in description)

Which llama.cpp modules do you know to be affected?

llama-quantize

Command line

❯ docker run --rm -it \                                                                                                                                          
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M

Problem description & steps to reproduce

Just try to quantize a model and you'll get the segfault:

❯ docker run --rm -it \                                    
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
main: build = 4460 (ba8a1f9c)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Bge Small En v1.5
llama_model_loader: - kv   3:                            general.version str              = v1.5
llama_model_loader: - kv   4:                           general.finetune str              = en
llama_model_loader: - kv   5:                           general.basename str              = bge
llama_model_loader: - kv   6:                         general.size_label str              = small
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,5]       = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  10:                           bert.block_count u32              = 12
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 384
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 1536
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 12
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = jina-v2-en
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  25:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  26:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  27:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  28:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  197 tensors
Segmentation fault (core dumped)

First Bad Commit

No response

Relevant log output

No response

@JohannesGaessler
Collaborator

Was b4435 / 017cc5f still working correctly?

@aria3ppp
Author

aria3ppp commented Jan 12, 2025

Was b4435 / 017cc5f still working correctly?

Hey Johannes,
do you mean b4431? I could not find b4435:

❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4435 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4435' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4434 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4434' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4433 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4433' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4432 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4432' locally
docker: Error response from daemon: manifest unknown.
See 'docker run --help'.
❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4431 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
Unable to find image 'ghcr.io/ggerganov/llama.cpp:full-b4431' locally
full-b4431: Pulling from ggerganov/llama.cpp
6414378b6477: Already exists 
79d927991872: Pull complete 
a3a25759bec8: Pull complete 
20d3c56b53dd: Pull complete 
4f4fb700ef54: Pull complete 
f41f15e9e25c: Pull complete 
Digest: sha256:095e1c8579e6bd70605755454920b4856c5c203b3c3fe9b0f5f5a4f0e747e5fd
Status: Downloaded newer image for ghcr.io/ggerganov/llama.cpp:full-b4431
main: build = 4431 (dc7cef9f)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Bge Small En v1.5
llama_model_loader: - kv   3:                            general.version str              = v1.5
llama_model_loader: - kv   4:                           general.finetune str              = en
llama_model_loader: - kv   5:                           general.basename str              = bge
llama_model_loader: - kv   6:                         general.size_label str              = small
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,5]       = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  10:                           bert.block_count u32              = 12
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 384
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 1536
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 12
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = jina-v2-en
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  25:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  26:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  27:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  28:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  197 tensors
Segmentation fault (core dumped)

❯ docker run --rm -it \
  -v ./models:/models \
  ghcr.io/ggerganov/llama.cpp:full-b4436 \
  --quantize /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf Q4_K_M
main: build = 4436 (53ff6b9b)
main: built with cc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0 for x86_64-linux-gnu
main: quantizing '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf' to '/models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-Q4_K_M.gguf' as Q4_K_M
llama_model_loader: loaded meta data with 30 key-value pairs and 197 tensors from /models/BAAI/bge-small-en-v1.5/bge-small-en-v1.5-f32.gguf (version GGUF V3 (latest))
llama_model_loader: Dumping metadata keys/values. Note: KV overrides do not apply in this output.
llama_model_loader: - kv   0:                       general.architecture str              = bert
llama_model_loader: - kv   1:                               general.type str              = model
llama_model_loader: - kv   2:                               general.name str              = Bge Small En v1.5
llama_model_loader: - kv   3:                            general.version str              = v1.5
llama_model_loader: - kv   4:                           general.finetune str              = en
llama_model_loader: - kv   5:                           general.basename str              = bge
llama_model_loader: - kv   6:                         general.size_label str              = small
llama_model_loader: - kv   7:                            general.license str              = mit
llama_model_loader: - kv   8:                               general.tags arr[str,5]       = ["sentence-transformers", "feature-ex...
llama_model_loader: - kv   9:                          general.languages arr[str,1]       = ["en"]
llama_model_loader: - kv  10:                           bert.block_count u32              = 12
llama_model_loader: - kv  11:                        bert.context_length u32              = 512
llama_model_loader: - kv  12:                      bert.embedding_length u32              = 384
llama_model_loader: - kv  13:                   bert.feed_forward_length u32              = 1536
llama_model_loader: - kv  14:                  bert.attention.head_count u32              = 12
llama_model_loader: - kv  15:          bert.attention.layer_norm_epsilon f32              = 0.000000
llama_model_loader: - kv  16:                          general.file_type u32              = 0
llama_model_loader: - kv  17:                      bert.attention.causal bool             = false
llama_model_loader: - kv  18:                          bert.pooling_type u32              = 2
llama_model_loader: - kv  19:            tokenizer.ggml.token_type_count u32              = 2
llama_model_loader: - kv  20:                       tokenizer.ggml.model str              = bert
llama_model_loader: - kv  21:                         tokenizer.ggml.pre str              = jina-v2-en
llama_model_loader: - kv  22:                      tokenizer.ggml.tokens arr[str,30522]   = ["[PAD]", "[unused0]", "[unused1]", "...
llama_model_loader: - kv  23:                  tokenizer.ggml.token_type arr[i32,30522]   = [3, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, ...
llama_model_loader: - kv  24:            tokenizer.ggml.unknown_token_id u32              = 100
llama_model_loader: - kv  25:          tokenizer.ggml.seperator_token_id u32              = 102
llama_model_loader: - kv  26:            tokenizer.ggml.padding_token_id u32              = 0
llama_model_loader: - kv  27:                tokenizer.ggml.cls_token_id u32              = 101
llama_model_loader: - kv  28:               tokenizer.ggml.mask_token_id u32              = 103
llama_model_loader: - kv  29:               general.quantization_version u32              = 2
llama_model_loader: - type  f32:  197 tensors
Segmentation fault (core dumped)
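The manual tag-by-tag probing above (b4435 down to b4431) could be scripted. A minimal sketch, not part of llama.cpp: `probe_tags` is a hypothetical helper that relies only on `docker pull` exiting non-zero when a tag's manifest is missing from ghcr.io.

```shell
# Sketch: walk candidate build tags newest-to-oldest and print the first one
# whose image is actually published on ghcr.io. probe_tags is a hypothetical
# helper name, not something the llama.cpp repo ships.
probe_tags() {
    for build in "$@"; do
        # `docker pull` exits non-zero with "manifest unknown" for missing tags
        if docker pull "ghcr.io/ggerganov/llama.cpp:full-${build}" >/dev/null 2>&1; then
            echo "${build}"
            return 0
        fi
    done
    echo "no published tag found" >&2
    return 1
}

# Example: probe_tags b4435 b4434 b4433 b4432 b4431
# then run the --quantize command from the report against the tag it prints.
```

This only finds the newest pullable image; it does not replace a proper `git bisect` of the segfault itself, which still has to be done per-build.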
