
Supported Languages #41

Answered by dolfim-ibm
Juhong-Namgung asked this question in Q&A
Aug 22, 2024 · 5 comments · 2 replies

We should distinguish between 1) programmatic documents and 2) scanned documents.

In the first case, Docling is language-independent; we have tested Asian languages with good success.
In the second case, we depend on the underlying OCR engine. At the moment we have a binding for EasyOCR, which supports 80+ languages. The EasyOCR website lists the language codes to pass as parameters.
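
To illustrate what passing language codes to EasyOCR looks like, here is a minimal sketch that calls EasyOCR directly rather than through Docling. The helper names `pick_languages` and `read_scanned_page`, and the small `KNOWN_CODES` subset, are hypothetical and only for illustration; the EasyOCR website remains the authoritative list of codes.

```python
from typing import List

# Hand-picked subset of EasyOCR language codes (assumption: not the full list).
KNOWN_CODES = {"en", "ko", "ja", "ch_sim", "ch_tra", "de", "fr"}


def pick_languages(requested: List[str]) -> List[str]:
    """Return the requested codes without duplicates, preserving order.

    Raises ValueError for any code outside the illustrative subset above.
    """
    unknown = [code for code in requested if code not in KNOWN_CODES]
    if unknown:
        raise ValueError(f"Unknown language codes: {unknown}")
    return list(dict.fromkeys(requested))


def read_scanned_page(image_path: str, languages: List[str]):
    """Run EasyOCR on one image and return its raw detections."""
    # Imported lazily so pick_languages() can be used without EasyOCR installed.
    import easyocr

    reader = easyocr.Reader(pick_languages(languages))
    # Each detection is a (bounding box, text, confidence) tuple.
    return reader.readtext(image_path)
```

For a scanned Korean page you would call something like `read_scanned_page("page.png", ["ko", "en"])`, where `page.png` is a placeholder path. EasyOCR may reject some language combinations, so check its documentation before mixing scripts.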

We are currently extending Docling with a simpler way to change the OCR backend and customize its parameters. For the moment, changing the config requires you to create a new ModelPipeline object.

Answer selected by Juhong-Namgung
