TypeError when processing DOCX file with lists #738

Open
Je88e opened this issue Jan 14, 2025 · 0 comments
Labels
bug Something isn't working

Bug

When converting a DOCX file that contains lists, Docling raises a TypeError in the handle_text_elements method of MsWordDocumentBackend. The error occurs when the method closes a list and compares key (an int) with self.level_at_new_list, which is None at that point.

Specific error:

TypeError: '>=' not supported between instances of 'int' and 'NoneType'

The issue appears to be in the following code section:

        elif numid is None and self.prev_numid() is not None:  # Close list
            for key, val in self.parents.items():
                if key >= self.level_at_new_list:
                    self.parents[key] = None
            self.level = self.level_at_new_list - 1
            self.level_at_new_list = None
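
The failure mode can be reproduced in isolation, without Docling or a DOCX file. The sketch below is only an illustration: the parents dict, its values, and the close_list_unguarded helper are hypothetical stand-ins, not the backend's real objects. It mirrors the comparison from the excerpt above and captures the resulting message.

```python
# Stand-in state (hypothetical, not Docling's real objects): a parents dict
# keyed by nesting level, and a level_at_new_list that was never set.
parents = {0: "body", 1: "list", 2: "item"}
level_at_new_list = None


def close_list_unguarded(parents, level_at_new_list):
    """Mirror the comparison used in the close-list branch."""
    for key in parents:
        if key >= level_at_new_list:  # int >= None raises TypeError
            parents[key] = None


try:
    close_list_unguarded(parents, level_at_new_list)
    error_message = None
except TypeError as exc:
    error_message = str(exc)

print(error_message)
# '>=' not supported between instances of 'int' and 'NoneType'
```

The exception is raised on the first key, so none of the parents entries are reset before the conversion aborts.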

Steps to reproduce

  1. Try to convert the DOCX file using Docling:
from docling.document_converter import DocumentConverter

converter = DocumentConverter()
result = converter.convert("word_sample_v1.docx")
  2. The conversion fails with the TypeError mentioned above.

Docling version

docling 2.15.1

Python version

Python 3.12

The bug appears to be in the list-handling logic: self.level_at_new_list is not initialized or maintained correctly for certain list structures in DOCX files. The close-list branch should handle the case where self.level_at_new_list is None.
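
One possible shape for such a guard is sketched below. This is not the maintainers' fix: ListState is a hypothetical stand-in for the backend's list-tracking fields, and leaving the state untouched when no list start was recorded is an assumption about the desired behavior.

```python
from typing import Any, Optional


class ListState:
    """Hypothetical stand-in for the backend's list-tracking fields."""

    def __init__(self) -> None:
        self.parents: dict[int, Optional[Any]] = {0: "body", 1: "list", 2: "item"}
        self.level: int = 2
        self.level_at_new_list: Optional[int] = None

    def close_list(self) -> bool:
        """Reset list parents; return False if no list start was recorded."""
        if self.level_at_new_list is None:
            # Assumption: with no recorded list start there is nothing to
            # close, so skip the comparison instead of comparing with None.
            return False
        for key in self.parents:
            if key >= self.level_at_new_list:
                self.parents[key] = None
        self.level = self.level_at_new_list - 1
        self.level_at_new_list = None
        return True


state = ListState()
no_list = state.close_list()   # level_at_new_list is None: guarded, no error
state.level_at_new_list = 1
closed = state.close_list()    # normal path: levels 1 and 2 are reset
print(no_list, closed, state.parents, state.level)
```

The guard only avoids the crash. Whether level_at_new_list should have been set earlier for the affected list structure is a separate question that the sample file would need to answer.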

Je88e added the bug label Jan 14, 2025