Describe the bug
The first and second pages of the output contain the same text. Pages 4 and 5 also contain the same text as each other.
To Reproduce
Steps to reproduce the behavior:
1. Execute the code: pdftotree.parse(r"\PATH\TO\AIMCO-2019.pdf", html_path=r"\PATH\TO\output.html", visualize=False)
2. Check the hOCR output.
Expected behavior
Each page of the output file should contain its own text and tables.
Error Logs/Screenshots
Environment (please complete the following information):
OS: Windows 10, version 20H2
pdftotree Version: v0.5.0
pdfminer.six Version: 20211012
Additional context
If this behavior is expected, would it be possible to have a variable that keeps track of the text and tables already extracted? (I am not very experienced in programming.)
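To make the duplication easier to confirm, here is a minimal sketch that checks the hOCR output for pages with identical text, using only the Python standard library. It is not part of pdftotree. The names PageTextCollector and find_duplicate_pages are hypothetical, and the sketch assumes that each page in the output starts with an element whose class contains "ocr_page" (the hOCR convention) and that pages are not nested inside each other.

```python
import hashlib
from collections import defaultdict
from html.parser import HTMLParser


class PageTextCollector(HTMLParser):
    """Collect the visible text of each page in an hOCR document.

    Assumption: every page starts with an element whose class list
    contains "ocr_page", and pages are siblings (never nested).
    """

    def __init__(self):
        super().__init__()
        self.pages = []  # one list of text fragments per page

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if "ocr_page" in classes:
            # A new page begins; later text belongs to it.
            self.pages.append([])

    def handle_data(self, data):
        # Text that appears before the first page marker is ignored.
        text = data.strip()
        if self.pages and text:
            self.pages[-1].append(text)


def find_duplicate_pages(html):
    """Return groups of 1-based page numbers whose text is identical."""
    parser = PageTextCollector()
    parser.feed(html)
    parser.close()

    seen = defaultdict(list)  # text digest -> page numbers with that text
    for number, fragments in enumerate(parser.pages, start=1):
        if not fragments:
            continue  # skip empty pages so they are not reported as duplicates
        joined = " ".join(fragments).encode("utf-8")
        seen[hashlib.sha256(joined).hexdigest()].append(number)
    return [numbers for numbers in seen.values() if len(numbers) > 1]
```

To use it, read the file written to html_path as a string and pass it to find_duplicate_pages. If the behavior described above is present, a result along the lines of [[1, 2], [4, 5]] would be expected. Keeping only a digest per page, rather than the full text, is the "variable that keeps track of what was already extracted" idea in its simplest form.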