
Qwen2-VL-72B-Instruct-GPTQ-Int4 model runs extremely slowly #627

Open
LYXFOREVER opened this issue Dec 31, 2024 · 2 comments

Comments

@LYXFOREVER

I am running the program on a server with eight A100s. Even for a single question, the model takes tens of minutes or even hours to respond. I checked GPU usage with nvidia-smi: the memory usage shows the model loaded successfully, but the GPU core utilization fluctuates. I suspected an inter-GPU communication problem, so I restricted the program to a single card, but it is still extremely slow:
[screenshot: nvidia-smi output]
It does not seem to fully occupy even one card, yet it still runs extremely slowly. It has been running for tens of minutes now with no response. Why is this, and how can I fix it?

The code I am using is the example code from Hugging Face; I only changed the part that fetches the image over the network to load my own local image instead.
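For reference, the only change from the Hugging Face example is the image source. Assuming the standard Qwen2-VL message format (as consumed by `qwen_vl_utils.process_vision_info`), a local file can be referenced with a `file://` URI; the path below is hypothetical:

```python
# Qwen2-VL message format: a local image can be passed as a file:// URI
# in the "image" field instead of an http(s) URL (path is hypothetical).
local_image_path = "/data/screenshots/example.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{local_image_path}"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

print(messages[0]["content"][0]["image"])  # file:///data/screenshots/example.png
```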

@LYXFOREVER
Author

I previously ran the OS-Atlas model on this server. It is based on the Qwen2-VL 7B model, and it ran very well and very fast: a question was usually answered within a few seconds. After switching to the 72B model it became like this. Is the model simply too large?
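Model size alone does not fully explain a jump from seconds to tens of minutes. A rough back-of-envelope sketch (assuming decode is memory-bandwidth bound, so each generated token must stream all weight bytes through the GPU; the bandwidth figure is an approximate A100 spec):

```python
# Back-of-envelope: weight footprint and an idealized tokens/s upper bound.
# Assumption: decoding is memory-bandwidth bound, so per-token latency is
# roughly (weight bytes) / (HBM bandwidth). Real kernels are slower.

def weight_gb(params_billion, bits_per_param):
    """Weight footprint in GB for a model of the given size and precision."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

w7 = weight_gb(7, 16)    # 7B model in bf16  -> 14 GB
w72 = weight_gb(72, 4)   # 72B model in int4 -> 36 GB

bw = 2000  # GB/s, approximate A100 HBM bandwidth
print(f"7B bf16:  {w7:.0f} GB weights, ideal ceiling ~{bw / w7:.0f} tok/s")
print(f"72B int4: {w72:.0f} GB weights, ideal ceiling ~{bw / w72:.0f} tok/s")
```

By this sketch the 72B Int4 model should be only a few times slower per token, not hundreds of times. Tens-of-minutes latency usually points to something else, e.g. the weights plus activations not fitting in one GPU's memory, so `device_map="auto"` offloads layers to CPU, or the GPTQ dequantization kernels falling back to a slow path. Checking whether any layers landed on `cpu` or `disk` in the model's device map is a reasonable first step.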

@Dineshkumar-Anandan-ZS0367

Dineshkumar-Anandan-ZS0367 commented Jan 6, 2025

For complex images, you can prompt it to extract a table or all of the data from the image. Even that takes 2-3 minutes (~190 seconds) on an Nvidia A100 machine.

I am facing the same issue, even with the chunked_prefill parameter enabled.
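For context, `enable_chunked_prefill` is a vLLM engine argument, not a plain `transformers` option, so it only applies when serving the model through vLLM. A minimal config sketch, assuming a vLLM deployment (the `tensor_parallel_size` value is illustrative; this fragment needs the model weights and GPUs, so it is not runnable as-is):

```python
# Config fragment: serving the GPTQ-Int4 model with vLLM.
# enable_chunked_prefill splits long prefills into chunks to cap latency
# spikes; tensor_parallel_size shards the weights across GPUs.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4",
    tensor_parallel_size=4,        # illustrative: shard over 4 GPUs
    enable_chunked_prefill=True,   # chunk long prompt prefills
)
```

Chunked prefill mainly smooths latency under concurrent load; it would not by itself explain, or fix, multi-minute single-request latency.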
