When I convert `*.xhtml` files, the only output is image placeholders:

```
<!-- image -->
<!-- image -->
<!-- image -->
```
Logs:
```
2025-01-14 12:02:29 - docling.backend.html_backend - DEBUG - html_backend.py:26 - __init__() - About to init HTML backend...
2025-01-14 12:02:29 - charset_normalizer - DEBUG - api.py:461 - from_bytes() - Encoding detection: utf_8 is most likely the one.
2025-01-14 12:02:30 - docling.document_converter - INFO - document_converter.py:238 - _convert() - Going to convert document batch...
2025-01-14 12:02:30 - docling.pipeline.base_pipeline - INFO - base_pipeline.py:37 - execute() - Processing document airbus.xhtml
2025-01-14 12:02:30 - docling.backend.html_backend - DEBUG - html_backend.py:77 - convert() - Trying to convert HTML...
2025-01-14 12:02:30 - docling.document_converter - INFO - document_converter.py:253 - _convert() - Finished converting document airbus.xhtml in 0.80 s
```
Pipeline options:
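```python
from docling.backend.docling_parse_v2_backend import DoclingParseV2DocumentBackend
from docling.backend.html_backend import HTMLDocumentBackend
from docling.datamodel.base_models import InputFormat
from docling.datamodel.document import ConversionResult
from docling.datamodel.pipeline_options import PdfPipelineOptions, TableFormerMode
from docling.document_converter import (
    DocumentConverter,
    HTMLFormatOption,
    PdfFormatOption,
)


# Wrapper name and parameters are illustrative: in my code, `params`,
# `ocr_options` and the image scale come from the surrounding application.
def convert_document(params, ocr_options, image_resolution_scale: float) -> ConversionResult:
    pipeline_options = PdfPipelineOptions(
        ocr_options=ocr_options,
        do_table_structure=True,
    )
    # Use the slower but more accurate TableFormer model for tables
    pipeline_options.table_structure_options.mode = TableFormerMode.ACCURATE
    pipeline_options.images_scale = image_resolution_scale
    pipeline_options.generate_picture_images = True

    converter = DocumentConverter(
        format_options={
            InputFormat.PDF: PdfFormatOption(
                pipeline_options=pipeline_options,
                backend=DoclingParseV2DocumentBackend,
            ),
            # HTML (and XHTML) inputs reuse the same options object
            InputFormat.HTML: HTMLFormatOption(
                pipeline_options=pipeline_options,
                backend=HTMLDocumentBackend,
            ),
        },
    )
    conversion_result: ConversionResult = converter.convert(source=params.file_path)
    return conversion_result
```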
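To narrow down whether the text is being dropped by the backend or simply lives in elements it may not read, a quick stdlib-only look at the file can help. This is a hypothetical helper (`TextTagCounter` and `count_text_tags` are not part of docling) and it does not reproduce the backend's logic: it only counts, per tag name, how many text chunks sit directly inside that tag, plus the number of `<img>` elements. If most of the text shows up under `div` or `span` rather than `p` or headings, that would hint (an unverified assumption) that the HTML backend is not picking those containers up.

```python
import sys
from collections import Counter
from html.parser import HTMLParser

# Elements that never have a closing tag in plain HTML, so they are not pushed.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class TextTagCounter(HTMLParser):
    """Hypothetical diagnostic: counts text chunks per innermost enclosing tag."""

    def __init__(self) -> None:
        super().__init__()
        self.stack: list[str] = []
        self.text_by_tag: Counter = Counter()
        self.images = 0

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            self.images += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # XHTML self-closing tags such as <img/> or <br/> open nothing
        if tag == "img":
            self.images += 1

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unbalanced markup
        if tag in self.stack:
            while self.stack and self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        if data.strip() and self.stack and self.stack[-1] not in ("script", "style"):
            self.text_by_tag[self.stack[-1]] += 1


def count_text_tags(markup: str) -> tuple[Counter, int]:
    """Return (text chunks per tag name, number of <img> elements)."""
    parser = TextTagCounter()
    parser.feed(markup)
    parser.close()
    return parser.text_by_tag, parser.images


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], encoding="utf-8") as fh:
        by_tag, images = count_text_tags(fh.read())
    print(f"<img> elements: {images}")
    for tag, n in by_tag.most_common(10):
        print(f"{tag:>16}: {n}")
```

Saved as, say, `count_text_tags.py`, it would be run as `python count_text_tags.py airbus.xhtml`. Tag names are lowercased by `html.parser`, so namespaced elements appear as e.g. `ix:nonnumeric`.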