Docling serve at scale #10
Comments
Also, to add to the question above regarding scaling: how do you scale this to handle hundreds of requests per second? If you're running in the cloud, do you spin up multiple containers?
Single container: the benchmark shows a BERT-Large model with automatic batching and multiprocessing. A single model process can run prediction on a batch of 16-32 requests to increase throughput. Additionally, if GPU memory allows, it can spin up extra processes to handle more requests. The requests are load balanced at the process level via the uvicorn socket. And yes, in the cloud you can also spin up multiple containers for further scale.
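Not part of the original comment, but as a rough illustration of the process-level load balancing described above, here is a minimal sketch of launching several uvicorn worker processes that share a single socket. The `app.main:app` module path is hypothetical; substitute the real FastAPI application.

```python
# Minimal sketch (assumed module path): multiple uvicorn worker processes
# bound to one socket, so incoming requests are load balanced across them
# at the process level.
import uvicorn

if __name__ == "__main__":
    uvicorn.run(
        "app.main:app",   # hypothetical FastAPI application import string
        host="0.0.0.0",
        port=8000,
        workers=4,        # one model process per worker; size to available GPU/CPU memory
    )
```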
Agree, we need both options to scale the APIs. Thanks for raising the issue and capturing the details.
Thank you @vishnoianil! If the maintainers are okay with this, I can send a PR.
Docling is a great project! Got to know about this from Spacy-layout.
This is powered by vanilla FastAPI, which is good but won't scale on its own and lacks features like dynamic batching and autoscaling. I would suggest using a library specialized for serving ML-based APIs, such as LitServe or RayServe. A rough sketch of what that could look like is below.
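For concreteness, here is a minimal sketch of a LitServe endpoint with dynamic batching. The class name and the placeholder conversion logic are illustrative, not part of docling-serve, and the exact batching options may differ across LitServe versions.

```python
# Sketch only: a batched LitServe API. The conversion step is a placeholder
# standing in for the actual document-processing call.
import litserve as ls


class DocAPI(ls.LitAPI):
    def setup(self, device):
        # Load the model/converter once per worker process.
        self.convert = lambda source: {"status": "converted", "source": source}

    def decode_request(self, request):
        # Pull the field we care about out of the JSON payload.
        return request["source"]

    def predict(self, sources):
        # With batching enabled, `sources` is a list of decoded requests
        # collected within the batch window.
        return [self.convert(s) for s in sources]

    def encode_response(self, output):
        # LitServe unbatches by default, so this receives one result per request.
        return output


if __name__ == "__main__":
    server = ls.LitServer(
        DocAPI(),
        accelerator="auto",
        max_batch_size=16,   # batch up to 16 concurrent requests
        batch_timeout=0.05,  # or flush after 50 ms, whichever comes first
    )
    server.run(port=8000)
```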