
[Docs] inference DeepSeek-V3 with LMDeploy #2960

Open
haswelliris opened this issue Dec 26, 2024 · 14 comments

Comments

@haswelliris

📚 The doc issue

LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.

Installation

git clone -b support-dsv3 https://github.com/InternLM/lmdeploy.git
cd lmdeploy
pip install -e .

Offline Inference Pipeline

from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline("deepseek-ai/DeepSeek-V3-FP8", backend_config=PytorchEngineConfig(tp=8))
    messages_list = [
        [{"role": "user", "content": "Who are you?"}],
        [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V3 adopts innovative architectures to guarantee economical training and efficient inference."}],
        [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    ]
    output = pipe(messages_list)
    print(output)
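The pipeline call returns one response object per conversation in the batch. In recent LMDeploy releases each item exposes a .text attribute with the generated string; the Response class below is a hypothetical stand-in used only so the extraction loop can run without a GPU — it is not LMDeploy's actual type.

```python
from dataclasses import dataclass

# Hypothetical stand-in for LMDeploy's per-request response object.
# Real pipeline output items expose `.text` in the same way (an
# assumption here; check the installed LMDeploy version).
@dataclass
class Response:
    text: str

def collect_texts(outputs):
    """Gather the generated text from a batch of pipeline responses."""
    return [item.text for item in outputs]

# Simulated batch output, mirroring the three-prompt batch above.
outputs = [
    Response(text="I am DeepSeek-V3."),
    Response(text="DeepSeek-V3 采用创新架构。"),
    Response(text="// quicksort in C++ ..."),
]
print(collect_texts(outputs))
```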

Online Serving

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

To access the service, you can use the official OpenAI Python package (pip install openai). Below is an example demonstrating how to call the v1/chat/completions endpoint.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://0.0.0.0:23333/v1",
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "Write a piece of quicksort code in C++."}
    ],
    temperature=0.8,
    top_p=0.8,
)
print(response)
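Under the hood, the client above issues an HTTP POST to v1/chat/completions with a JSON body. Sketching that body with only the standard library makes the request shape explicit; the field names follow the OpenAI chat-completions schema, and the model name shown is just a placeholder for whatever client.models.list() returns.

```python
import json

# JSON body the OpenAI client sends to the api_server's
# v1/chat/completions endpoint (OpenAI chat-completions schema).
payload = {
    "model": "deepseek-ai/DeepSeek-V3",  # placeholder; use the served model id
    "messages": [
        {"role": "user", "content": "Write a piece of quicksort code in C++."}
    ],
    "temperature": 0.8,
    "top_p": 0.8,
}

body = json.dumps(payload)
print(body)
```

The same body can be sent with any HTTP client (e.g. curl or requests) against http://0.0.0.0:23333/v1/chat/completions if you prefer not to depend on the openai package.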

For more information, please refer to the following link: https://github.com/InternLM/lmdeploy/tree/support-dsv3

Suggest a potential alternative/fix

No response

@lvhan028 lvhan028 pinned this issue Dec 26, 2024
@DragonFive

What GPUs did you test on? Can this run on an 8*A100-80GB machine?

@lvhan028
Collaborator

8*H200

@shuson

shuson commented Dec 27, 2024

I have 6 DGX servers with 8*H100 each. How can I make it run across multiple machines?

@lvhan028
Collaborator

Sorry, LMDeploy doesn't support pipeline parallelism yet.

@Tushar-ml

Tushar-ml commented Dec 27, 2024

Is FP8 supported in LMDeploy now? The code snippet above mentions deepseek-ai/DeepSeek-V3-FP8.
@lvhan028

@lvhan028
Collaborator

PR #2967. It hasn't been merged into main yet.

@QwertyJack
Contributor

I wonder if we can run an AWQ quant version of that big model.

@tracyCzf

With 8*H200 processing a request, how many tokens can be generated per second?

8*H200

@bb33bb

bb33bb commented Dec 31, 2024

H200

I'd also like to know this, please.

@shashank-sensehq

When trying online deployment with the command below:

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

I get this error
404 Client Error: Not Found for url: https://huggingface.co/api/models/deepseek-ai/DeepSeek-V3-FP8/revision/main

Looks like DeepSeek-V3-FP8 model doesn't exist in the HF hub (https://huggingface.co/api/models)

@haizeiwanglf

RuntimeError: Can not found rewrite for auto_map: DeepseekV3ForCausalLM

@QwertyJack
Contributor

When trying online deployment with the command below:

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

I get this error 404 Client Error: Not Found for url: https://huggingface.co/api/models/deepseek-ai/DeepSeek-V3-FP8/revision/main

Looks like DeepSeek-V3-FP8 model doesn't exist in the HF hub (https://huggingface.co/api/models)

Use this: deepseek-ai/DeepSeek-V3

@janelu9

janelu9 commented Jan 9, 2025

How can I deploy it on multiple nodes (A100)?

@lvhan028
Collaborator

lvhan028 commented Jan 9, 2025

Deploying DSV3 on multiple nodes isn't supported yet.
