Identify and encode footnotes / endnotes #44

Open
tfmorris opened this issue Mar 24, 2016 · 0 comments

@tfmorris
Collaborator

Some texts have footnotes at the bottom of the page which, at a minimum, need to be identified and moved out of line with the body text. Better still would be to split them into individual notes and associate each one with its reference callout.
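
A minimal sketch of the identify/split/associate steps, assuming plain OCR text lines for one page and a deliberately naive heuristic: the footnote region starts at the first line that begins with a traditional reference symbol, and each note starts with its marker. `CALLOUT_SYMBOLS`, `Footnote`, `split_page` and `link_callouts` are hypothetical names, not part of this project's code.

```python
import re
from dataclasses import dataclass, field

# Hypothetical marker set: traditional footnote symbols only (no numerals/letters).
CALLOUT_SYMBOLS = "*†‡§‖¶"
NOTE_START = re.compile(r"^\s*([" + re.escape(CALLOUT_SYMBOLS) + r"])\s*(.*)$")


@dataclass
class Footnote:
    marker: str
    text: str
    callouts: list = field(default_factory=list)  # (line_no, column) in the body


def split_page(lines):
    """Split a page's OCR lines into (body_lines, footnotes).

    Naive heuristic: everything from the first line that starts with a marker
    onward is treated as the footnote region. A body line that happens to start
    with a marker would be misclassified.
    """
    start = next((i for i, line in enumerate(lines) if NOTE_START.match(line)), len(lines))
    body, note_lines = lines[:start], lines[start:]

    notes = []
    for line in note_lines:
        m = NOTE_START.match(line)
        if m:
            # A marker at the start of a line begins a new note.
            notes.append(Footnote(marker=m.group(1), text=m.group(2).strip()))
        elif notes and line.strip():
            # Otherwise it is a continuation of the note being built.
            notes[-1].text += " " + line.strip()
    return body, notes


def link_callouts(body, notes):
    """Record where each note's marker appears in the body text."""
    by_marker = {note.marker: note for note in notes}
    for line_no, line in enumerate(body):
        for col, ch in enumerate(line):
            if ch in by_marker:
                by_marker[ch].callouts.append((line_no, col))
    return notes
```

Symbols in the body that match no note on the page are simply ignored, which is one cheap way to avoid treating stray OCR noise as callouts.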

A relatively complex example can be seen on this page: http://access.bl.uk/item/viewer/lsidyv2b04fe9b#ark:/81055/vdc_0000000006D1.0x000011

It uses symbols for the references (which the OCR misrecognizes) and is formatted into a variable number of columns (2 or 3, depending on the line). The misrecognized typographical symbols in the body of the text amount to noise added to it.
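
One possible way to handle both problems in the footnote area, sketched under two loud assumptions: each OCR row of the footnote block holds 1 to 3 side-by-side notes concatenated left to right, and the marker misreadings are known in advance. The `OCR_MARKER_FIXES` table below is made up for illustration; a real one would have to be compiled from this collection's OCR output. It is only meant for footnote rows, since in body text a mapping like `+` to `†` would corrupt ordinary content.

```python
import re
from typing import List

MARKERS = "*†‡§‖¶"

# Hypothetical misreadings of footnote markers (illustration only).
OCR_MARKER_FIXES = {"+": "†", "$": "§", "||": "‖"}

# A known misreading standing alone as a token: at the start of the row or
# after whitespace, and followed by whitespace. Longest keys are tried first.
_MISREAD = re.compile(
    r"(?:^|(?<=\s))("
    + "|".join(re.escape(k) for k in sorted(OCR_MARKER_FIXES, key=len, reverse=True))
    + r")(?=\s)"
)

# Whitespace immediately before a marker that itself begins a new note.
_NOTE_BOUNDARY = re.compile(r"\s+(?=[" + re.escape(MARKERS) + r"]\s)")


def canonicalize_markers(row: str) -> str:
    """Replace standalone misread markers with the intended symbol."""
    return _MISREAD.sub(lambda m: OCR_MARKER_FIXES[m.group(1)], row)


def split_row(row: str) -> List[str]:
    """Split one footnote row containing side-by-side notes into separate notes."""
    row = canonicalize_markers(row.strip())
    return [part for part in _NOTE_BOUNDARY.split(row) if part]
```

Because the split is driven by marker positions rather than a fixed column count, rows with 2 and 3 columns are handled the same way. Notes that wrap onto the next row within their column are not handled by this sketch.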
