Update collation list up to MySQL 8.0.26 #1410
utf8_general50_ci and ucs2_general50_ci seem to have been removed.
Would it hurt to leave them in? I'm not sure whether removing them might affect users connecting to a 5.0 server.
I think utf8mb3 should be aliased to utf8 for now. Aliasing it to cesu-8 will hurt performance significantly.
MySQL utf8 strings are encoded as cesu-8; "real" utf8 is named utf8mb4 in MySQL. utf8mb3 is the same as utf8 (that is, cesu-8).
Plus, add three more mappings for utf8mb4: 307, 308, 309.
I think that is the future. Currently, it is still utf8mb3 (https://dev.mysql.com/doc/refman/8.0/en/charset-unicode-utf8.html). Also, this seems to clarify the differences between cesu8 and utf8mb3: https://www.wikiwand.com/en/Talk:UTF-8
@@ -104,7 +104,7 @@ module.exports = [
'eucjpms',
'eucjpms',
'cp1250',
'utf8',
This is utf16_unicode_ci. I don't think it should be encoded as utf-8 at all.
Let's tackle utf8 / utf8mb3 / cesu-8 separately. Thanks for the link, interesting read!
I just tried to store 😂 into utf8_unicode_ci and it said
Is this good or bad? :) (i.e., is this a positive test for 5117e4f?)
I think it's good. It shows that utf8mb3 cannot store non-BMP characters.
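To make the BMP point concrete, here is a minimal sketch of a check for characters outside the Basic Multilingual Plane. `hasNonBmp` is a hypothetical helper written for illustration, not part of this library. It relies only on the fact that non-BMP code points (above U+FFFF) need 4 bytes in UTF-8, while utf8mb3 stores at most 3 bytes per character.

```javascript
// Hypothetical helper (not part of the driver): returns true when the
// string contains at least one code point above U+FFFF, i.e. outside the BMP.
function hasNonBmp(str) {
  // The `u` flag makes the regex match whole code points, not UTF-16 units.
  return /[\u{10000}-\u{10FFFF}]/u.test(str);
}

const samples = ['hello', 'привет', '😂', 'a💩b'];
const results = samples.map(hasNonBmp);

// A non-BMP character takes 4 bytes in UTF-8, one more than utf8mb3 allows.
const emojiByteLength = Buffer.byteLength('😂', 'utf8');

console.log(results, emojiByteLength);
```

A check like this could be used in a test to decide whether a value is expected to be rejected by a utf8mb3 column.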
I believe I was able to insert the non-BMP character 💩 using the MySQL utf8 encoding (see #374 (comment)), so I'm not sure whether that confirms "mysql utf8 = cesu-8", utf8mb3 = same as cesu-8 but only the BMP is supported, utf8mb4 = "normal modern utf8".
If you do it that way, you are converting 💩 into 6 invalid bytes (2 UTF-8 characters) in MySQL. MySQL may not be able to understand that character. It may be able to pass it in and out as if it were two characters, though. However, I believe that if you try to insert 💩 using the .query() method, you won't be able to do that, right?
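The "6 bytes" figure can be reproduced with a small sketch. `encodeCesu8` below is a hypothetical, hand-written encoder for illustration only (it is not how the driver or its encoding library does it). It encodes each UTF-16 code unit separately, so a surrogate pair becomes two 3-byte sequences, whereas standard UTF-8 encodes the same character in 4 bytes.

```javascript
// Hypothetical CESU-8 encoder for illustration: every UTF-16 code unit
// (including each half of a surrogate pair) is encoded on its own.
function encodeCesu8(str) {
  const bytes = [];
  for (let i = 0; i < str.length; i++) {
    const unit = str.charCodeAt(i);
    if (unit < 0x80) {
      bytes.push(unit);
    } else if (unit < 0x800) {
      bytes.push(0xc0 | (unit >> 6), 0x80 | (unit & 0x3f));
    } else {
      bytes.push(
        0xe0 | (unit >> 12),
        0x80 | ((unit >> 6) & 0x3f),
        0x80 | (unit & 0x3f)
      );
    }
  }
  return Buffer.from(bytes);
}

const poo = '\u{1F4A9}'; // 💩, a non-BMP character

// Standard UTF-8: one 4-byte sequence.
const utf8Hex = Buffer.from(poo, 'utf8').toString('hex');
// CESU-8: two 3-byte sequences, one per surrogate.
const cesu8Hex = encodeCesu8(poo).toString('hex');

console.log(utf8Hex);  // f09f92a9
console.log(cesu8Hex); // eda0bdedb2a9
```

For strings that contain only BMP characters, this encoder produces the same bytes as standard UTF-8, since no surrogate pairs are involved.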
unit test:
Can you try to insert 💩 into a column with UTF8_GENERAL_CI?
@sidorares re: column name, I think we should always use utf8 to decode the column name, regardless of the characterSet returned in Packet.FieldDefinition. We don't need to use cesu8 to decode it, as column names should contain only BMP characters. So whether you use cesu8 or utf8, there won't be any difference.
Can a column name use other encodings (win1251 or koi8 for Cyrillic, as an example)? If yes, I'm not sure it's worth adding an exception / custom logic for the perf benefits.
Let me check.
Any update on this?
@ahmedbodi I think the discussion got sidetracked a bit into a more complex issue. The updated collation list LGTM. Merging now.
I noticed that some collations are missing so I pulled up the latest collation list from MySQL by