fix: Generalise re_replacement_seq to deal with all cases #136

Merged (2 commits) on Jan 8, 2025

@saattrupdan commented Jan 7, 2025

This PR is similar to #90: it generalises the regex to cover all previously handled cases and, hopefully, all future ones as well.

The new special cases not covered by the previous approach are the �? and �, tokens used by Salamandra models. Since all of these special tokens (new and old) consist of one or more � symbols with an optional single-character prefix and/or suffix, we can simplify and generalise the pattern to r"^.?�+.?$".
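
For illustration, here is a minimal sketch of how the generalised pattern behaves. The helper name `is_replacement_seq` and the sample tokens other than �? and �, are assumptions made for this example and are not taken from the codebase; `\ufffd` is the escape for the � replacement character.

```python
import re

# The pattern quoted above, with the replacement character written as \ufffd.
RE_REPLACEMENT_SEQ = re.compile(r"^.?\ufffd+.?$")


def is_replacement_seq(token: str) -> bool:
    """Hypothetical helper: True if the token is one or more replacement
    characters with at most one extra character on each side."""
    return RE_REPLACEMENT_SEQ.match(token) is not None


# Sample tokens (illustrative, apart from the two Salamandra-style ones).
samples = [
    "\ufffd",         # a single replacement character
    "\ufffd\ufffd",   # several in a row
    " \ufffd",        # single-character prefix
    "\ufffd?",        # Salamandra-style, single-character suffix
    "\ufffd,",        # Salamandra-style, single-character suffix
    "ab\ufffd",       # two-character prefix: should not match
    "abc",            # no replacement character: should not match
]

results = {token: is_replacement_seq(token) for token in samples}
for token, matched in results.items():
    print(repr(token), matched)
```

Anchoring with `^` and `$` and allowing only `.?` on each side keeps ordinary text that merely contains a � from being treated as one of these special tokens.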

Tagging @torymur here, since you reviewed the previous PR.

@torymur added the bug label Jan 8, 2025

@torymur left a comment

Thank you @saattrupdan for improvements on this one!

@torymur merged commit cad6344 into dottxt-ai:main Jan 8, 2025
8 checks passed
@saattrupdan deleted the fix/re-replacement-seq branch January 8, 2025 17:29
Labels: bug