Inner HTML elements are processed after the tail text #333

Closed
HiromuHota opened this issue Oct 15, 2019 · 2 comments · Fixed by #520

Comments

@HiromuHota
Contributor

Describe the bug

In tests/data/html_simple/md.html, the fragment "<em>italics and later <strong>bold</strong></em>. Even" is processed in the following order:

  1. italics and later
  2. . Even
  3. bold

This is illustrated in #12 too.

To Reproduce
Steps to reproduce the behavior:

  1. Run tests/parser/test_parser.py::test_parse_md_details and set a breakpoint

Expected behavior

The sentence should be processed in the following order:

  1. italics and later
  2. bold
  3. . Even

Environment (please complete the following information):

  • Fonduer Version: [0.7.0 and master(3d5392c)]

Additional context

Why does this happen?

For the node <em>italics and later <strong>bold</strong></em>, node.text ("italics and later ") and node.tail (". Even") are both processed first, and only then is the child node <strong>bold</strong> processed.
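
The difference between the two orders can be sketched in a few lines. This is not Fonduer's parser code: it uses the standard library's xml.etree.ElementTree, which has the same text/tail attributes, and the function names texts_tail_first and texts_document_order are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def texts_tail_first(node):
    """Order reported in this issue: node.text, node.tail, then the children."""
    out = []
    if node.text:
        out.append(node.text)
    if node.tail:
        out.append(node.tail)
    for child in node:
        out.extend(texts_tail_first(child))
    return out


def texts_document_order(node):
    """Expected order: node.text, then the children, then node.tail."""
    out = []
    if node.text:
        out.append(node.text)
    for child in node:
        out.extend(texts_document_order(child))
    if node.tail:
        out.append(node.tail)
    return out


# The fragment from md.html, wrapped in a <p> so that it parses as one tree.
root = ET.fromstring(
    "<p><em>italics and later <strong>bold</strong></em>. Even</p>"
)
em = root[0]  # the <em> element; its tail is ". Even"

print(texts_tail_first(em))      # tail emitted before the inner <strong>
print(texts_document_order(em))  # tail emitted after the inner <strong>
```

Emitting the tail only after recursing into the children keeps the extracted text in the order it appears in the HTML source.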

@senwu
Collaborator

senwu commented Jun 20, 2020

So, the expected order should be (1) node.text, (2) text in the inner node, and (3) node.tail?

@HiromuHota
Contributor Author

I think so.
This is especially true when you want to align words between HTML and PDF (related to #12).
I don't think this is a major issue, though.
