
[Docs] inference DeepSeek-V3 with LMDeploy #2960

Open
haswelliris opened this issue Dec 26, 2024 · 14 comments

Comments

@haswelliris

📚 The doc issue

LMDeploy, a flexible and high-performance inference and serving framework tailored for large language models, now supports DeepSeek-V3. It offers both offline pipeline processing and online deployment capabilities, seamlessly integrating with PyTorch-based workflows.

Installation

git clone -b support-dsv3 https://github.com/InternLM/lmdeploy.git
cd lmdeploy
pip install -e .

Offline Inference Pipeline

from lmdeploy import pipeline, PytorchEngineConfig

if __name__ == "__main__":
    pipe = pipeline("deepseek-ai/DeepSeek-V3-FP8", backend_config=PytorchEngineConfig(tp=8))
    messages_list = [
        [{"role": "user", "content": "Who are you?"}],
        [{"role": "user", "content": "Translate the following content into Chinese directly: DeepSeek-V3 adopts innovative architectures to guarantee economical training and efficient inference."}],
        [{"role": "user", "content": "Write a piece of quicksort code in C++."}],
    ]
    output = pipe(messages_list)
    print(output)
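The pipeline call returns one response object per conversation in the batch. In recent LMDeploy releases each item exposes a .text attribute with the generated string; the Response class below is a hypothetical stand-in used only so the extraction loop can run without a GPU — it is not LMDeploy's actual type.

```python
from dataclasses import dataclass

# Hypothetical stand-in for LMDeploy's per-request response object.
# Real pipeline output items expose `.text` in the same way (an
# assumption here; check the installed LMDeploy version).
@dataclass
class Response:
    text: str

def collect_texts(outputs):
    """Gather the generated text from a batch of pipeline responses."""
    return [item.text for item in outputs]

# Simulated batch output, mirroring the three-prompt batch above.
outputs = [
    Response(text="I am DeepSeek-V3."),
    Response(text="DeepSeek-V3 采用创新架构。"),
    Response(text="// quicksort in C++ ..."),
]
print(collect_texts(outputs))
```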

Online Serving

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

To access the service, you can use the official OpenAI Python package (pip install openai). Below is an example demonstrating how to call the v1/chat/completions endpoint.

from openai import OpenAI

client = OpenAI(
    api_key="YOUR_API_KEY",
    base_url="http://0.0.0.0:23333/v1",
)
model_name = client.models.list().data[0].id
response = client.chat.completions.create(
    model=model_name,
    messages=[
        {"role": "user", "content": "Write a piece of quicksort code in C++."}
    ],
    temperature=0.8,
    top_p=0.8,
)
print(response)
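Under the hood, the client above issues an HTTP POST to v1/chat/completions with a JSON body. Sketching that body with only the standard library makes the request shape explicit; the field names follow the OpenAI chat-completions schema, and the model name shown is just a placeholder for whatever client.models.list() returns.

```python
import json

# JSON body the OpenAI client sends to the api_server's
# v1/chat/completions endpoint (OpenAI chat-completions schema).
payload = {
    "model": "deepseek-ai/DeepSeek-V3",  # placeholder; use the served model id
    "messages": [
        {"role": "user", "content": "Write a piece of quicksort code in C++."}
    ],
    "temperature": 0.8,
    "top_p": 0.8,
}

body = json.dumps(payload)
print(body)
```

The same body can be sent with any HTTP client (e.g. curl or requests) against http://0.0.0.0:23333/v1/chat/completions if you prefer not to depend on the openai package.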

For more information, please refer to the following link: https://github.com/InternLM/lmdeploy/tree/support-dsv3

Suggest a potential alternative/fix

No response

@lvhan028 lvhan028 pinned this issue Dec 26, 2024
@DragonFive

What GPUs did you test on? Can this run on an 8*A100-80GB machine?

@lvhan028
Collaborator

8*H200

@shuson

shuson commented Dec 27, 2024

I have 6 DGX servers with 8*H100 each. How can I make it run across multiple machines?

@lvhan028
Collaborator

Sorry, LMDeploy doesn't support pipeline parallelism yet.

@Tushar-ml

Tushar-ml commented Dec 27, 2024

Is FP8 supported in LMDeploy now? The code snippet above mentions deepseek-ai/DeepSeek-V3-FP8.
@lvhan028

@lvhan028
Collaborator

PR #2967. It hasn't been merged into main yet.

@QwertyJack
Contributor

I wonder if we can run an AWQ quant version of that big model.

@tracyCzf

With 8*H200 processing a request, how many tokens can be generated per second?

8*H200

@bb33bb

bb33bb commented Dec 31, 2024

H200

I'd also like to know this, please.

@shashank-sensehq

When trying online deployment with the command below:

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

I get this error
404 Client Error: Not Found for url: https://huggingface.co/api/models/deepseek-ai/DeepSeek-V3-FP8/revision/main

Looks like DeepSeek-V3-FP8 model doesn't exist in the HF hub (https://huggingface.co/api/models)

@haizeiwanglf

RuntimeError: Can not found rewrite for auto_map: DeepseekV3ForCausalLM

@QwertyJack
Contributor

When trying online deployment with the command below:

# run
lmdeploy serve api_server deepseek-ai/DeepSeek-V3-FP8 --tp 8 --backend pytorch

I get this error 404 Client Error: Not Found for url: https://huggingface.co/api/models/deepseek-ai/DeepSeek-V3-FP8/revision/main

Looks like DeepSeek-V3-FP8 model doesn't exist in the HF hub (https://huggingface.co/api/models)

Use this: deepseek-ai/DeepSeek-V3

@janelu9

janelu9 commented Jan 9, 2025

How can I deploy it on multiple nodes (A100)?

@lvhan028
Collaborator

lvhan028 commented Jan 9, 2025

Deploying DSV3 on multiple nodes isn't supported yet.
