
Docling serve at scale #10

Open
aniketmaurya opened this issue Nov 19, 2024 · 4 comments

@aniketmaurya

Docling is a great project! I got to know about it from Spacy-layout.

This is powered by vanilla FastAPI, which is fine but doesn't scale well and lacks features like dynamic batching and autoscaling. I would suggest using a library specialized for serving ML-based APIs, such as LitServe or RayServe.
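For illustration, a minimal LitServe wrapper around Docling could look roughly like the sketch below. The `DocumentConverter` usage follows Docling's documented interface; the request shape, response fields, and port are assumptions made up for the example, not part of docling-serve.

```python
import litserve as ls
from docling.document_converter import DocumentConverter


class DoclingAPI(ls.LitAPI):
    def setup(self, device):
        # Load the converter (and its models) once per worker process.
        self.converter = DocumentConverter()

    def decode_request(self, request):
        # Assumed request shape: {"source": "https://example.com/paper.pdf"}
        return request["source"]

    def predict(self, source):
        # Convert the document; with batching enabled this would receive a list.
        return self.converter.convert(source)

    def encode_response(self, result):
        return {"markdown": result.document.export_to_markdown()}


if __name__ == "__main__":
    server = ls.LitServer(DoclingAPI(), accelerator="auto")
    server.run(port=8000)
```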

@gsogol

gsogol commented Nov 19, 2024

Also, to add to the question above regarding scaling: how do you scale this to hundreds of requests per second? If you're running in the cloud, do you spin up multiple containers?

@aniketmaurya
Author

aniketmaurya commented Nov 19, 2024

Single container: The benchmark shows a BERT-Large model with automatic batching and multiprocessing. A single model process can run prediction on a batch of 16-32 requests to increase throughput. Additionally, if GPU memory allows, it can spin up extra processes to handle more requests. The requests are load-balanced at the process level via the uvicorn socket.

And yes, in the cloud you can also spin up multiple containers for further scale.
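As a rough sketch of those single-container knobs, the server from the earlier snippet could be configured as shown below. The batch size, timeout, and worker count are illustrative values, not benchmarked numbers for Docling.

```python
import litserve as ls

# Illustrative settings only: batch up to 16 requests that arrive within 50 ms,
# and run 2 model processes per device if GPU memory allows. With batching
# enabled, predict() receives a list of inputs and should return a list of results.
server = ls.LitServer(
    DoclingAPI(),
    accelerator="auto",
    max_batch_size=16,
    batch_timeout=0.05,
    workers_per_device=2,
)
server.run(port=8000)
```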

@vishnoianil
Collaborator


Agreed, we need both options to scale the APIs. Thanks for raising the issue and capturing the details.

@aniketmaurya
Author

Thank you @vishnoianil! If the maintainers are okay with this, I can send a PR.
