Fix unicode.SimpleFold(r) #30
Comments
Lowercase 'k' is \u006B; uppercase 'K' is \u004B.
Simply mapping [a-z] to [A-Z] should work for most simple ASCII-only text documents.
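A minimal sketch of that ASCII-only mapping, written in Go for illustration. The helper name `asciiSimpleFold` is hypothetical and not part of this project or of Go's `unicode` package; it only swaps case within [a-z] and [A-Z] and returns every other rune unchanged, so it would never produce U+212A.

```go
package main

import "fmt"

// asciiSimpleFold is a hypothetical helper (not an existing API).
// It swaps case for ASCII letters only and leaves all other runes as-is.
func asciiSimpleFold(r rune) rune {
	switch {
	case 'a' <= r && r <= 'z':
		return r - ('a' - 'A') // lowercase to uppercase
	case 'A' <= r && r <= 'Z':
		return r + ('a' - 'A') // uppercase to lowercase
	}
	return r // non-ASCII-letter input, including U+212A, is untouched
}

func main() {
	for _, r := range []rune{'k', 'K', '\u212A', '1'} {
		fmt.Printf("%U -> %U\n", r, asciiSimpleFold(r))
	}
}
```

The trade-off is that text containing non-ASCII letters gets no folding at all, so this only suits the ASCII-only case described above.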
The Unicode 6.0 spec has this to say about U+212A (KELVIN SIGN):
Three letterlike symbols have been given canonical equivalence to regular letters: U+2126 OHM SIGN, U+212A KELVIN SIGN, and U+212B ANGSTROM SIGN. In all three instances, the regular letter should be used. If text is normalized according to Unicode Standard Annex #15, “Unicode Normalization Forms,” these three characters will be replaced by their regular equivalents.
In other words, you shouldn't really be using U+212A; you should be using U+004B (LATIN CAPITAL LETTER K) instead, and if you normalize your Unicode text, U+212A will be replaced with U+004B.
Right now
unicode.SimpleFold('k') == '\u212A'
(\u212A is 'K', the Kelvin sign.)
This is not intuitive for simple ASCII folding.
We expect, and the fix should ensure, that
unicode.SimpleFold('k') == 'K'
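For reference, here is a small sketch against Go's standard `unicode` package, whose `SimpleFold` this issue is modelled on. It walks the folding orbit that starts at 'k'. The helper `foldOrbit` is a hypothetical name introduced only for this illustration. In Go the orbit visits U+212A before it reaches 'K', which matches the behaviour reported here.

```go
package main

import (
	"fmt"
	"unicode"
)

// foldOrbit (hypothetical helper) collects every rune reached from r by
// repeatedly applying unicode.SimpleFold until the orbit wraps back to r.
func foldOrbit(r rune) []rune {
	orbit := []rune{r}
	for next := unicode.SimpleFold(r); next != r; next = unicode.SimpleFold(next) {
		orbit = append(orbit, next)
	}
	return orbit
}

func main() {
	// Print each member of the orbit with its code point.
	for _, r := range foldOrbit('k') {
		fmt.Printf("%U %q\n", r, r)
	}
}
```

Whether a port should reproduce this orbit exactly or special-case ASCII input is the design question this issue raises.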