Md syntax compare #356
This script finds all Markdown files, associates each English original with its translations, analyzes the HTML tag structure of each, and detects where a translation differs from the English original. Use `--lang de,fr,...` to restrict the operation to a specific subset of languages. Use `--include-regex` to scan only for matching tags. Use `--exclude-regex` to remove tags that are often irrelevant (e.g. `/p#`). These regexes are applied to the string representation of each HTML tag, which uses the following schema: `/enclosing/tag/names#attr1=val1#attr2=val2#...`. To match a specific tag, use the regex `/tagname#`.
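A minimal sketch of the include/exclude filtering described above, assuming both regexes are applied with `re.search` to each tag-path string and that exclusion is checked after inclusion. The function name `filter_tag_paths` and the sample paths are hypothetical; the exact root prefix of real paths depends on how the script's parser wraps each document.

```python
import re
from typing import Iterable, List, Optional


def filter_tag_paths(
    paths: Iterable[str],
    include_regex: Optional[str] = None,
    exclude_regex: Optional[str] = None,
) -> List[str]:
    """Keep paths matching include_regex (if given), then drop those matching exclude_regex."""
    include = re.compile(include_regex) if include_regex else None
    exclude = re.compile(exclude_regex) if exclude_regex else None
    kept = []
    for path in paths:
        if include is not None and not include.search(path):
            continue  # not one of the tags we were asked to scan
        if exclude is not None and exclude.search(path):
            continue  # a tag we were asked to ignore, e.g. plain paragraphs
        kept.append(path)
    return kept


# Hypothetical tag paths in the /enclosing/tag/names#attr=val schema.
paths = [
    "/body/p#",
    "/body/p/a#href=https://example.com",
    "/body/h2#id=intro",
]
print(filter_tag_paths(paths, exclude_regex=r"/p#"))
print(filter_tag_paths(paths, include_regex=r"/a#"))
```

Note that `/p#` excludes the paragraph itself but keeps `/body/p/a#...`, because the `#` anchors the match to the innermost tag name.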
1. sort imports
2. eliminate spurious newlines from diffs
3. return non-zero if diffs found
4. only report on languages where diffs are found
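Items 3 and 4 could look roughly like the following sketch. `report_diffs` is a hypothetical name, and its input (a mapping from language code to a list of diff lines) is an assumption about how the script organizes its results.

```python
from typing import Dict, List


def report_diffs(diffs_by_lang: Dict[str, List[str]]) -> int:
    """Print diffs only for languages that have any; return a process exit code."""
    found = False
    for lang, diff_lines in sorted(diffs_by_lang.items()):
        if not diff_lines:
            continue  # this language matches the English structure: stay quiet
        found = True
        print(f"== {lang} ==")
        for line in diff_lines:
            print(line)
    # Non-zero signals "differences found", e.g. to fail a CI step.
    return 1 if found else 0


exit_code = report_diffs({"de": [], "fr": ["-/body/p/strong#", "+/body/p#"]})
# A command-line entry point would then call sys.exit(exit_code).
```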
1. Added `--show-tag-summaries`, which shows summaries of the tags that diverge.
2. Exclude the `alt=` attribute when building the string representation, since its value is bound to differ between languages.
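One way a tag summary might be computed, sketched under the assumption that diffs are produced by `difflib.unified_diff` over tag-path strings: count the innermost tag name on every added or removed line. `tag_name` and `summarize_divergent_tags` are hypothetical helpers, not the code in this PR.

```python
from collections import Counter
from typing import Iterable


def tag_name(path: str) -> str:
    """Return the innermost tag of a '/a/b/c#attr=val' path, e.g. 'c'."""
    # Split on the first '#' so attribute values containing '/' or '#' are ignored.
    return path.split("#", 1)[0].rsplit("/", 1)[-1]


def summarize_divergent_tags(diff_lines: Iterable[str]) -> Counter:
    """Count innermost tag names on the added/removed lines of a unified diff."""
    counts: Counter = Counter()
    for line in diff_lines:
        # Skip the '---'/'+++' file headers and '@@' hunk markers.
        if line.startswith(("---", "+++", "@@")):
            continue
        if line.startswith(("+", "-")):
            counts[tag_name(line[1:])] += 1
    return counts


diff = [
    "--- en/page.md",
    "+++ fr/page.md",
    "@@ -1,3 +1,2 @@",
    " /body/h2#id=intro",
    "-/body/p/strong#",
    "+/body/p/em#",
    "-/body/p/strong#",
]
print(summarize_divergent_tags(diff).most_common())
```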
This pull request is being automatically deployed with now-deployment. Built with commit 3c9d259 ✅ Preview: https://guide-preview-2ftn8934r.now.sh
I guess the script is OK. I'm not really good at Python, but I didn't see anything nasty 👍.
I'd like to have some instructions on how to install the dependencies to be able to run this. I guess that'll come with the PR that actually runs this?
Noted. I think it might make sense to document how to use this script on our wiki page, along with instructions on how to install the dependencies. If it gets built into an action, dependency installation will come with that too.
It seems that Docker is the standard way to wrap tools in this project, so I will investigate whether I can use it here.
This script compares the structural content (element tree) of the Markdown files: it takes each English (original) file and its translations, renders them to HTML, and produces diffs for any structural differences found.
Markdown files are transformed into HTML and parsed into an element tree (using lxml). Each document is then traversed and turned into a list of string representations of the form `/path/to/tag#attr1=value1#attr2=value2`. These lists are then compared and diffed for each (English, translation) pair.
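The traversal and diff steps can be sketched with only the standard library. This is not the PR's implementation: it swaps lxml for `xml.etree.ElementTree` (so it assumes well-formed XHTML input), skips the Markdown-to-HTML step by starting from already-rendered HTML, sorts attributes by name, and leaves out `alt` as the PR does. `tag_paths` and `structural_diff` are hypothetical names.

```python
import difflib
import xml.etree.ElementTree as ET
from typing import List

# Attributes left out because they are expected to differ between languages.
IGNORED_ATTRS = {"alt"}


def tag_paths(html_fragment: str) -> List[str]:
    """Flatten an HTML fragment into '/enclosing/tags#attr=val#...' strings, in document order."""
    # Assumption: the fragment is well-formed XML; lxml's HTML parser is more forgiving.
    root = ET.fromstring(f"<body>{html_fragment}</body>")
    paths: List[str] = []

    def walk(elem: ET.Element, prefix: str) -> None:
        path = f"{prefix}/{elem.tag}"
        attrs = "#".join(
            f"{name}={value}"
            for name, value in sorted(elem.attrib.items())
            if name not in IGNORED_ATTRS
        )
        paths.append(f"{path}#{attrs}")  # a bare '/p#' when there are no attributes
        for child in elem:
            walk(child, path)

    walk(root, "")
    return paths


def structural_diff(english_html: str, translated_html: str) -> List[str]:
    """Unified diff of the two tag-path lists (empty when the structures match)."""
    return list(
        difflib.unified_diff(
            tag_paths(english_html),
            tag_paths(translated_html),
            fromfile="en",
            tofile="translation",
            lineterm="",  # no trailing newlines, so printing adds no blank lines
        )
    )


en = '<p>Run <code>npm install</code> first.</p><img src="a.png" alt="Diagram"/>'
de = '<p>Führe zuerst <strong>npm install</strong> aus.</p><img src="a.png" alt="Diagramm"/>'
for line in structural_diff(en, de):
    print(line)
```

Here the translated `alt` text produces no diff, while the `<code>` that became `<strong>` does.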
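Flags for this script are:
- `--lang de,fr,...`: restrict the operation to a specific subset of languages.
- `--include-regex`: scan only for tags whose string representation matches.
- `--exclude-regex`: remove tags that are often irrelevant (e.g. `/p#`).
- `--show-tag-summaries`: show summaries of the tags that diverge.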
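A hypothetical `argparse` declaration of the flags mentioned in this PR. The option names come from the PR description; the parsing details (splitting `--lang` on commas, `None` meaning "all languages") are assumptions.

```python
import argparse
from typing import List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(
        description="Compare the HTML tag structure of translated Markdown files with the English originals."
    )
    parser.add_argument(
        "--lang",
        # Assumption: "de,fr" becomes ["de", "fr"]; omitting the flag means all languages.
        type=lambda s: [code.strip() for code in s.split(",") if code.strip()],
        default=None,
        help="comma-separated language codes to check, e.g. de,fr",
    )
    parser.add_argument("--include-regex", default=None, help="only keep tag paths matching this regex")
    parser.add_argument("--exclude-regex", default=None, help="drop tag paths matching this regex, e.g. /p#")
    parser.add_argument("--show-tag-summaries", action="store_true", help="summarize which tags diverge")
    return parser.parse_args(argv)


args = parse_args(["--lang", "de,fr", "--exclude-regex", "/p#"])
print(args)
```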
This script can be used to find Markdown styling issues in translations and to detect files whose style has drifted between the original content and its translations. The underlying issue is described in #354.