Eval bug: How to load clip_model_load to CUDA #11250

Open
zzc98 opened this issue Jan 15, 2025 · 2 comments

zzc98 commented Jan 15, 2025

Name and Version

version: 4393 (d79d8f3)
built with x86_64-conda-linux-gnu-cc (conda-forge gcc 14.2.0-1) 14.2.0 for x86_64-conda-linux-gnu

Operating systems

Linux

GGML backends

CUDA

Hardware

NVIDIA GeForce RTX 4090

Models

Qwen2-VL-7B-Instruct-Q5_K_M.gguf

Problem description & steps to reproduce

I use the following command:

llama-qwen2vl-cli -m Qwen2-VL-7B-Instruct-Q5_K_M.gguf --mmproj mmproj-Qwen2-VL-7B-Instruct-f16.gguf -p "Describe this picture" --image demo.jpeg

and observe:

clip_model_load: CLIP using CPU backend

How can I make clip_model_load use the CUDA backend?

First Bad Commit

No response

Relevant log output

llm_load_print_meta: max token length = 256
llm_load_tensors: offloading 28 repeating layers to GPU
llm_load_tensors: offloading output layer to GPU
llm_load_tensors: offloaded 29/29 layers to GPU

clip_model_load: - type  f32:  325 tensors
clip_model_load: - type  f16:  196 tensors
clip_model_load: CLIP using CPU backend
clip_model_load: text_encoder:   0
clip_model_load: vision_encoder: 1

danbev (Collaborator) commented Jan 15, 2025

I think you are missing the --n-gpu-layers option, which offloads model layers to the GPU:

-ngl N, --n-gpu-layers N: When compiled with GPU support, this option allows offloading some layers to the GPU for computation. Generally results in increased performance.
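A minimal sketch for checking a saved run log: it reports how many LLM layers were offloaded and which backend CLIP picked, so you can see whether a flag such as -ngl changed the LLM offload, the CLIP backend, or both. It only matches the two message formats quoted in this issue (`offloaded N/N layers to GPU` and `CLIP using <X> backend`); other llama.cpp versions may word them differently. The helper names `llm_offload` and `clip_backend` are hypothetical, not part of llama.cpp.

```shell
#!/usr/bin/env bash
# Hypothetical helpers: parse a llama.cpp run log captured to a file.
# They assume the log wording shown in this issue and may need adjusting
# for other versions.

# Prints "N/M" from a line like "llm_load_tensors: offloaded 29/29 layers to GPU".
llm_offload() {
  sed -n 's|.*offloaded \([0-9]*/[0-9]*\) layers to GPU.*|\1|p' "$1"
}

# Prints the backend name from a line like "clip_model_load: CLIP using CPU backend".
clip_backend() {
  sed -n 's|.*CLIP using \([A-Za-z0-9]*\) backend.*|\1|p' "$1"
}

# Example usage (flags as in the command above; -ngl 99 is an assumed value).
# Both stdout and stderr are captured so the log lines land in run.log:
#   llama-qwen2vl-cli -m Qwen2-VL-7B-Instruct-Q5_K_M.gguf \
#     --mmproj mmproj-Qwen2-VL-7B-Instruct-f16.gguf \
#     -p "Describe this picture" --image demo.jpeg -ngl 99 > run.log 2>&1
#   llm_offload run.log
#   clip_backend run.log
```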

ngxson (Collaborator) commented Jan 15, 2025

GPU backend support is intentionally disabled for CLIP. I'm not sure why, but it is probably because a kernel is missing (so it will crash if you force it to load onto the GPU).
