Hey, thanks for the awesome doc toolkit.
I tried to run the demo with
pdf_path = "tests/test_files/direct_extract/single_column.pdf"
and got the following error:
2024-11-02 17:47:58,569 - rapid_layout - INFO: pp_layout_cdla contains ['text', 'title', 'figure', 'figure_caption', 'table', 'table_caption', 'header', 'footer', 'reference', 'equation']
  0%| | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jakit/simonas/open-source/RapidDoc/demo.py", line 13, in <module>
    result = pdf_parser(pdf_path)
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakit/simonas/open-source/RapidDoc/rapid_doc/main.py", line 74, in __call__
    txt_boxes, txts = self.run_direct_extract(i, img_width)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakit/simonas/open-source/RapidDoc/rapid_doc/main.py", line 105, in run_direct_extract
    txt_boxes, txts = self.pdf_extracter.extract_page_text(page_num, img_width)
    ^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
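For anyone landing here: Python raises this ValueError when the right-hand side yields more items than the left-hand side unpacks. Below is a minimal sketch of that situation. The `extract_page_text` function is a hypothetical stand-in that returns three values; what RapidDoc's `pdf_extracter.extract_page_text` actually returns is not shown in this issue, so the three-value shape is an assumption.

```python
def extract_page_text(page_num, img_width):
    """Hypothetical stand-in, NOT RapidDoc's implementation.

    Assumed to return three values so the two-name unpack fails.
    """
    boxes = [[0, 0, 10, 10]]
    texts = ["hello"]
    extra = {"page": page_num, "width": img_width}
    return boxes, texts, extra


def unpack_two():
    """Unpack into two names, as main.py line 105 does, and report the error."""
    try:
        txt_boxes, txts = extract_page_text(0, 800)
    except ValueError as exc:
        return str(exc)
    return "ok"


def unpack_tolerant():
    """Keep the first two values and collect anything extra in a list."""
    txt_boxes, txts, *rest = extract_page_text(0, 800)
    return txt_boxes, txts, rest


error_message = unpack_two()
print(error_message)
```

The starred unpack in `unpack_tolerant` only hides the mismatch; if the extra return value is meaningful, the caller and the callee should probably be brought back in sync instead.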
Same question here.
I checked the document used in the demo and found that the scanned-PDF check in rapid_doc\main.py gives inconsistent results, so I commented out the following snippet:
    if self.is_extract(page):
        img_width = img.shape[1]
        txt_boxes, txts = self.run_direct_extract(i, img_width)
    else:
        tt_boxes, txts = self.run_ocr_extract(img)
Whatever the document type, I now always run tt_boxes, txts = self.run_ocr_extract(img). Tables are still not recognized, but the paragraphs now come out correctly.
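The workaround above skips direct extraction for every PDF. A gentler variant, sketched here under assumptions, tries direct extraction first and falls back to OCR only when the two-value unpack fails. The `direct` and `ocr` callables are hypothetical stand-ins for `run_direct_extract` and `run_ocr_extract`; this is not a patch against RapidDoc's real code.

```python
from typing import Callable, List, Tuple

Result = Tuple[List[list], List[str]]


def extract_with_fallback(
    direct: Callable[[], tuple],
    ocr: Callable[[], Result],
) -> Tuple[Result, str]:
    """Try direct extraction; fall back to OCR if the result does not unpack into two values."""
    try:
        txt_boxes, txts = direct()
        return (txt_boxes, txts), "direct"
    except ValueError:
        # Direct extraction returned an unexpected number of values.
        return ocr(), "ocr"


def broken_direct():
    # Hypothetical: returns three values, which triggers the ValueError.
    return [[0, 0, 1, 1]], ["direct text"], "unexpected"


def working_direct():
    # Hypothetical: returns the expected two values.
    return [[0, 0, 1, 1]], ["direct text"]


def fake_ocr():
    # Hypothetical OCR stand-in.
    return [[0, 0, 2, 2]], ["ocr text"]


result_broken, source_broken = extract_with_fallback(broken_direct, fake_ocr)
result_ok, source_ok = extract_with_fallback(working_direct, fake_ocr)
print(source_broken, source_ok)
```

This keeps the faster direct path for PDFs where it works, at the cost of masking the underlying mismatch, so it is a stopgap rather than a fix.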