Coding With Fun

What's the difference between char and utf8 char?


Asked by Kyler Powers on Nov 30, 2021 FAQ



Indeed, the name char suggests a whole character, whereas in a UTF-8 string a char may hold just one byte of a multibyte character.
And,
In the character string, ü appears as a single character with code point 0xFC. In the UTF-8 version, that code point is encoded as the two octets 0xC3 0xBC. Because this version is just a string of octets, Perl counts it as one character longer.
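The same distinction can be reproduced outside Perl. The following is a minimal Python sketch, not taken from the original answer, that counts ü once as a character and once as UTF-8 octets; the variable names are chosen purely for illustration.

```python
# "ü" is the single code point U+00FC.
text = "\u00fc"

# Encoding to UTF-8 turns that one code point into two octets.
encoded = text.encode("utf-8")

print(len(text))                  # length in characters (code points)
print(len(encoded))               # length in bytes (octets)
print([hex(b) for b in encoded])  # the individual octets

# Treating each octet as its own character (here via Latin-1) mimics
# counting the raw octet string: the result is one character longer.
as_octet_chars = encoded.decode("latin-1")
print(len(as_octet_chars))
```

Counting the decoded string gives the number of characters, while counting the encoded bytes gives the number of octets, which is the gap the passage describes.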
Similarly, the MySQL character set named utf8 (utf8mb3) uses a maximum of three bytes per character and contains only BMP characters. As of MySQL 5.5.3, the utf8mb4 character set uses a maximum of four bytes per character and supports supplementary characters.
Keeping this in consideration,
While the utf8 charset is able to store most Chinese, Japanese, and Korean characters (which are in the Basic Multilingual Plane), it may still not be able to store all the characters that you want. For example, with a utf8 charset, it is not possible to insert a supplementary character such as 'GRINNING FACE' (U+1F600) 😀, which needs four bytes in UTF-8, but this is possible with the utf8mb4 charset. A BMP character such as 'SNOWMAN' (U+2603) ☃ needs only three bytes and fits in either charset.
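To make the three-byte limit concrete, here is a small Python sketch that checks how many UTF-8 bytes each character needs. The helper names max_utf8_bytes and fits_in_utf8mb3 are hypothetical, and the check only models the byte limit described above; it does not talk to MySQL.

```python
def max_utf8_bytes(text: str) -> int:
    """Return the largest UTF-8 byte length of any single character in text."""
    return max((len(ch.encode("utf-8")) for ch in text), default=0)


def fits_in_utf8mb3(text: str) -> bool:
    # Hypothetical helper: a charset limited to three bytes per character
    # can only hold code points in the BMP (U+0000 to U+FFFF).
    return max_utf8_bytes(text) <= 3


# A Latin letter, a CJK ideograph in the BMP, and a supplementary emoji.
samples = ["\u00fc", "\u4e2d", "\U0001F600"]
for ch in samples:
    size = len(ch.encode("utf-8"))
    print(f"U+{ord(ch):04X} needs {size} byte(s), fits in utf8mb3: {fits_in_utf8mb3(ch)}")
```

A string that fails this check would need a utf8mb4 column to be stored without loss.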
Furthermore,
When a file contains only ASCII characters, its UTF-16 encoding is roughly twice as big as its UTF-8 encoding. The main advantage of UTF-8 is that it is backwards compatible with ASCII: the ASCII character set is fixed width, using one byte per character, and UTF-8 encodes those characters with the same single bytes.
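The size comparison is easy to check. This is a minimal Python sketch using an arbitrary ASCII-only sample string; it uses the "utf-16-le" codec so that no byte order mark is added to the UTF-16 result.

```python
# ASCII-only sample text (an arbitrary choice for illustration).
text = "Hello, world! " * 100

utf8_bytes = text.encode("utf-8")
utf16_bytes = text.encode("utf-16-le")  # little-endian, without a BOM
ascii_bytes = text.encode("ascii")

print(len(utf8_bytes))            # one byte per ASCII character
print(len(utf16_bytes))           # two bytes per ASCII character
print(utf8_bytes == ascii_bytes)  # UTF-8 output is identical to ASCII
```

With the plain "utf-16" codec Python prepends a two-byte BOM, which is one reason the ratio for a real file is "roughly" rather than exactly two.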