
Qwen2-VL-72B-Instruct-GPTQ-Int4 model runs extremely slowly #627

Open
LYXFOREVER opened this issue Dec 31, 2024 · 2 comments

Comments

@LYXFOREVER

I am running the program on a server with eight A100s. Even for a single question, the model takes tens of minutes or even hours to respond. I checked GPU usage with nvidia-smi: the memory usage shows the model loaded successfully, but the GPU core utilization fluctuates. I suspected an inter-GPU communication problem, so I restricted the program to a single card, but it is still extremely slow:
[screenshot: nvidia-smi output]
It does not seem to fully occupy even one card, yet it still runs extremely slowly. It has been running for tens of minutes now with no response. Why is this, and how can I fix it?

The code I am using is the example code from Hugging Face; I only changed the part that fetches the image over the network to load my own local image instead.
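For reference, the only change from the Hugging Face example is the image source. Assuming the standard Qwen2-VL message format (as consumed by `qwen_vl_utils.process_vision_info`), a local file can be referenced with a `file://` URI; the path below is hypothetical:

```python
# Qwen2-VL message format: a local image can be passed as a file:// URI
# in the "image" field instead of an http(s) URL (path is hypothetical).
local_image_path = "/data/screenshots/example.png"

messages = [
    {
        "role": "user",
        "content": [
            {"type": "image", "image": f"file://{local_image_path}"},
            {"type": "text", "text": "Describe this image."},
        ],
    }
]

print(messages[0]["content"][0]["image"])  # file:///data/screenshots/example.png
```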

@LYXFOREVER
Author

I previously ran the OS-Atlas model on this server. It is based on the Qwen2-VL 7B model, and it ran very well and very fast: a question was usually answered within a few seconds. After switching to the 72B model it became like this. Is the model simply too large?
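Model size alone does not fully explain a jump from seconds to tens of minutes. A rough back-of-envelope sketch (assuming decode is memory-bandwidth bound, so each generated token must stream all weight bytes through the GPU; the bandwidth figure is an approximate A100 spec):

```python
# Back-of-envelope: weight footprint and an idealized tokens/s upper bound.
# Assumption: decoding is memory-bandwidth bound, so per-token latency is
# roughly (weight bytes) / (HBM bandwidth). Real kernels are slower.

def weight_gb(params_billion, bits_per_param):
    """Weight footprint in GB for a model of the given size and precision."""
    return params_billion * 1e9 * bits_per_param / 8 / 1e9

w7 = weight_gb(7, 16)    # 7B model in bf16  -> 14 GB
w72 = weight_gb(72, 4)   # 72B model in int4 -> 36 GB

bw = 2000  # GB/s, approximate A100 HBM bandwidth
print(f"7B bf16:  {w7:.0f} GB weights, ideal ceiling ~{bw / w7:.0f} tok/s")
print(f"72B int4: {w72:.0f} GB weights, ideal ceiling ~{bw / w72:.0f} tok/s")
```

By this sketch the 72B Int4 model should be only a few times slower per token, not hundreds of times. Tens-of-minutes latency usually points to something else, e.g. the weights plus activations not fitting in one GPU's memory, so `device_map="auto"` offloads layers to CPU, or the GPTQ dequantization kernels falling back to a slow path. Checking whether any layers landed on `cpu` or `disk` in the model's device map is a reasonable first step.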

@Dineshkumar-Anandan-ZS0367

Dineshkumar-Anandan-ZS0367 commented Jan 6, 2025

For complex images, you can prompt it to extract a table or all of the data from the image. Even that takes 2-3 minutes (~190 seconds) on an Nvidia A100 machine.

I am facing the same issue, even with the chunked_prefill parameter enabled.
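For context, `enable_chunked_prefill` is a vLLM engine argument, not a plain `transformers` option, so it only applies when serving the model through vLLM. A minimal config sketch, assuming a vLLM deployment (the `tensor_parallel_size` value is illustrative; this fragment needs the model weights and GPUs, so it is not runnable as-is):

```python
# Config fragment: serving the GPTQ-Int4 model with vLLM.
# enable_chunked_prefill splits long prefills into chunks to cap latency
# spikes; tensor_parallel_size shards the weights across GPUs.
from vllm import LLM

llm = LLM(
    model="Qwen/Qwen2-VL-72B-Instruct-GPTQ-Int4",
    tensor_parallel_size=4,        # illustrative: shard over 4 GPUs
    enable_chunked_prefill=True,   # chunk long prompt prefills
)
```

Chunked prefill mainly smooths latency under concurrent load; it would not by itself explain, or fix, multi-minute single-request latency.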
