[Docs] inference DeepSeek-V3 with LMDeploy #2960
Comments
What's your testing GPU? Can this run on an 8*A100-80GB machine?
8*H200
I have 6 DGX servers with 8*H100 each. How can I run it across multiple machines?
Sorry, LMDeploy doesn't support pipeline parallelism yet.
Is FP8 supported in LMDeploy now? The code snippet above mentions deepseek-ai/DeepSeek-V3-FP8.
PR #2967
I wonder if we can run an AWQ-quantized version of that big model.
With 8 * H200 processing a request, how many tokens can be generated per second?
Also want to know this.
When trying an online deployment using the command below, I get this error. It looks like the DeepSeek-V3-FP8 model doesn't exist on the HF hub (https://huggingface.co/api/models):
RuntimeError: Can not found rewrite for auto_map: DeepseekV3ForCausalLM
Use this instead: deepseek-ai/DeepSeek-V3
How can I deploy it on multiple nodes (A100)?
Deploying DSV3 on multiple nodes isn't supported yet.
📚 The doc issue
LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.
Installation
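The installation step below is a sketch; at the time of writing, DeepSeek-V3 support lived on the `support-dsv3` branch (linked at the end of this issue), so a standard release install may not be sufficient:

```shell
# Standard install; DeepSeek-V3 support may require building LMDeploy
# from the support-dsv3 branch instead (see the link at the end of this issue).
pip install lmdeploy
```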
Offline Inference Pipeline
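A minimal sketch of offline batch inference with LMDeploy's `pipeline` API and the PyTorch backend. The `run_offline` helper, its default model name, and `tp=8` are assumptions for an 8-GPU machine, not the issue's exact snippet; the import is deferred so the sketch loads without GPUs attached.

```python
# Hypothetical helper sketching LMDeploy's offline pipeline for DeepSeek-V3.
def run_offline(prompts, model="deepseek-ai/DeepSeek-V3-FP8", tp=8):
    """Run a batch of prompts through LMDeploy's PyTorch engine.

    Deferred import: lmdeploy and 8 suitable GPUs are only needed at call time.
    """
    from lmdeploy import pipeline, PytorchEngineConfig

    pipe = pipeline(model, backend_config=PytorchEngineConfig(tp=tp))
    return pipe(prompts)

# Example (requires GPUs with enough memory for the FP8 weights):
# responses = run_offline(["Briefly introduce DeepSeek-V3."])
```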
Online Serving
```shell
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch
```
To access the service, install the official OpenAI Python package (`pip install openai`). Below is an example demonstrating how to use the `v1/chat/completions` entrypoint.
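A minimal client sketch against the server's `v1/chat/completions` endpoint using the official `openai` package. The `chat` helper is an illustration: the host, port (23333 is LMDeploy's default `api_server` port), and placeholder API key are assumptions, and the import is deferred so the sketch loads without the server running.

```python
# Hypothetical client sketch for the LMDeploy OpenAI-compatible server.
def chat(prompt, base_url="http://0.0.0.0:23333/v1",
         model="deepseek-ai/DeepSeek-V3-FP8"):
    """Send one user message to the v1/chat/completions endpoint."""
    from openai import OpenAI  # deferred: only needed when actually calling

    client = OpenAI(api_key="EMPTY", base_url=base_url)  # local server ignores the key
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Example (requires the api_server above to be running):
# print(chat("Briefly introduce DeepSeek-V3."))
```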
For more information, please refer to the following link: https://github.com/InternLM/lmdeploy/tree/support-dsv3
Suggest a potential alternative/fix
No response