error in loading other document #2

simjak · 2024-11-02T15:49:28Z

Hey, thanks for the awesome doc toolkit.

I tried to run the demo with pdf_path = "tests/test_files/direct_extract/single_column.pdf"

and got the following error:

2024-11-02 17:47:58,569 - rapid_layout - INFO: pp_layout_cdla contains ['text', 'title', 'figure', 'figure_caption', 'table', 'table_caption', 'header', 'footer', 'reference', 'equation']
  0%|                                                                                                                                                     | 0/1 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/Users/jakit/simonas/open-source/RapidDoc/demo.py", line 13, in <module>
    result = pdf_parser(pdf_path)
             ^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakit/simonas/open-source/RapidDoc/rapid_doc/main.py", line 74, in __call__
    txt_boxes, txts = self.run_direct_extract(i, img_width)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/jakit/simonas/open-source/RapidDoc/rapid_doc/main.py", line 105, in run_direct_extract
    txt_boxes, txts = self.pdf_extracter.extract_page_text(page_num, img_width)
    ^^^^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)
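
For context, Python raises this ValueError whenever the right-hand side of a two-target assignment yields more than two items. Below is a minimal sketch of that failure mode. `fake_extract_page_text` is a hypothetical stand-in, not RapidDoc's function; what `extract_page_text` actually returns is not shown in this thread.

```python
def fake_extract_page_text(page_num, img_width):
    # Hypothetical stand-in: returns three items, while the caller expects two.
    boxes = [[0, 0, 10, 10]]
    texts = ["hello"]
    extra = {"page": page_num, "width": img_width}
    return boxes, texts, extra


def try_unpack():
    """Attempt the same two-target unpacking as the line in the traceback."""
    try:
        txt_boxes, txts = fake_extract_page_text(0, 100)
    except ValueError as exc:
        # Unpacking three items into two names fails here.
        return str(exc)
    return "ok"


message = try_unpack()
print(message)
```

Printing the actual return value of `extract_page_text` for the failing page would show how many items it yields and which branch produces them.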


liang-xian · 2024-11-18T08:26:07Z

Same question here.

CY202227 · 2025-01-10T02:54:36Z

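I checked the document used in the demo and found that the check in rapid_doc\main.py for whether a page is a scanned one gives inconsistent results, so I commented out the following snippet:

    if self.is_extract(page):
        img_width = img.shape[1]
        txt_boxes, txts = self.run_direct_extract(i, img_width)
    else:
        tt_boxes, txts = self.run_ocr_extract(img)

Now, whatever the document type, it always runs tt_boxes, txts = self.run_ocr_extract(img)
Tables are still not recognized, but the paragraphs all come out correctly.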
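
A gentler variant of this workaround, sketched here as an assumption rather than as anything RapidDoc ships, is to keep direct extraction and fall back to OCR only when the result cannot be unpacked into a (boxes, texts) pair. The helper `extract_with_fallback` and both stub functions are hypothetical names made up for illustration.

```python
def extract_with_fallback(direct_extract, ocr_extract, page_index, img, img_width):
    """Try direct extraction first; use OCR if unpacking the result fails."""
    try:
        txt_boxes, txts = direct_extract(page_index, img_width)
    except ValueError:
        # Note: this also catches ValueErrors raised inside direct_extract itself.
        txt_boxes, txts = ocr_extract(img)
    return txt_boxes, txts


# Hypothetical stubs standing in for the two extraction paths.
def broken_direct_extract(page_index, img_width):
    return [[0, 0, 1, 1]], ["direct"], "unexpected third value"


def stub_ocr_extract(img):
    return [[0, 0, 2, 2]], ["ocr"]


boxes, texts = extract_with_fallback(
    broken_direct_extract, stub_ocr_extract, page_index=0, img=None, img_width=100
)
print(boxes, texts)
```

Unlike commenting out the branch, this keeps the faster direct path for pages where it works, at the cost of hiding the underlying mismatch instead of fixing it.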
