Coding With Fun

What's the difference between UTF-8 and UTF-16?


Asked by Sunny Montoya on Dec 14, 2021 FAQ



The difference is not that UTF-16 covers Chinese characters and UTF-8 doesn't; both can encode every Unicode character. UTF-16 uses one or two 16-bit code units (2 or 4 bytes) per character, while UTF-8 uses 1, 2, 3, or at most 4 bytes, depending on the character, so an ASCII character still takes just 1 byte. The Wikipedia articles on UTF-8 and UTF-16 are a good place to start for the idea behind them.
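To see the size difference concretely, here is a minimal Python sketch that counts the bytes each encoding needs for a few sample characters. The helper name `encoded_lengths` and the sample characters are just illustrative choices; `utf-16-le` is used so that no byte order mark is added to the count.

```python
def encoded_lengths(ch):
    """Return (UTF-8 byte count, UTF-16 byte count) for one character."""
    # "utf-16-le" is used so no BOM is prepended to the result.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))

# ASCII letter, accented Latin letter, Chinese character, emoji
samples = ["A", "\u00e9", "\u4e2d", "\U0001F600"]

for ch in samples:
    u8, u16 = encoded_lengths(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {u8} bytes, UTF-16 = {u16} bytes")
```

Note that the Chinese character is actually smaller in UTF-16 (2 bytes) than in UTF-8 (3 bytes), while the emoji needs 4 bytes in both.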
Besides, compared with ANSI (in practice, usually a Windows code page such as Windows-1252):
1. UTF-8 is a widely used encoding, while ANSI is an obsolete encoding scheme.
2. ANSI typically uses a single byte per character, while UTF-8 is a multibyte encoding scheme.
3. UTF-8 can represent a wide variety of characters, while ANSI is pretty limited.
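As a rough illustration of these points, the sketch below assumes "ANSI" means Windows-1252, which Python exposes as the `cp1252` codec. The helper `can_encode` is a hypothetical name introduced only for this example.

```python
def can_encode(text, encoding):
    """Return True if every character in text is representable in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

word = "caf\u00e9"  # "café"

# Windows-1252 stores each of these characters in exactly one byte.
print(len(word.encode("cp1252")))  # 4
# UTF-8 needs two bytes for the accented letter.
print(len(word.encode("utf-8")))   # 5

# A Chinese character is outside Windows-1252 but fine in UTF-8.
print(can_encode("\u4e2d", "cp1252"))  # False
print(can_encode("\u4e2d", "utf-8"))   # True
```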
Furthermore, UTF-8 is a compromise character encoding: it can be as compact as ASCII (if the file is just plain English text) but can also contain any Unicode character (with some increase in file size). UTF stands for Unicode Transformation Format.
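A small sketch of that trade-off, with a hypothetical helper `utf8_overhead` that reports how many extra bytes UTF-8 needs beyond one byte per character:

```python
def utf8_overhead(text):
    """Extra bytes UTF-8 needs beyond one byte per code point."""
    return len(text.encode("utf-8")) - len(text)

plain = "hello"
accented = "h\u00e9llo"  # "héllo"

# Plain English text is byte-for-byte identical in ASCII and UTF-8.
print(plain.encode("ascii") == plain.encode("utf-8"))  # True
print(utf8_overhead(plain))     # 0
# One non-ASCII letter costs one extra byte here.
print(utf8_overhead(accented))  # 1
```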
Regarding byte order: since UTF-8 is interpreted as a sequence of bytes, there is no endianness problem as there is for encoding forms that use 16-bit or 32-bit code units. Where a BOM (byte order mark) is used with UTF-8, it serves only as an encoding signature to distinguish UTF-8 from other encodings; it has nothing to do with byte order.
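The byte order point can be checked with Python's standard codecs. This is only a sketch using one sample character (U+4E2D); `utf-8-sig` is Python's name for UTF-8 with a leading BOM signature.

```python
import codecs

ch = "\u4e2d"  # U+4E2D

# UTF-16 has two possible byte orders for the same 16-bit code unit.
utf16_le = ch.encode("utf-16-le")  # b'\x2d\x4e' (low byte first)
utf16_be = ch.encode("utf-16-be")  # b'\x4e\x2d' (high byte first)

# UTF-8 has exactly one byte sequence, whatever the machine's endianness.
utf8 = ch.encode("utf-8")          # b'\xe4\xb8\xad'

# With a BOM, UTF-8 just gains a fixed 3-byte signature in front.
utf8_with_bom = ch.encode("utf-8-sig")

print(utf16_le, utf16_be, utf8, utf8_with_bom)

# Decoding with "utf-8-sig" strips the signature again.
roundtrip = utf8_with_bom.decode("utf-8-sig")
```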
UTF-8 is a variable-width character encoding capable of encoding all 1,112,064 valid code points in Unicode using one to four 8-bit bytes. The encoding is defined by the Unicode standard, and was originally designed by Ken Thompson and Rob Pike. The name is derived from Unicode (or Universal Coded Character Set) Transformation Format – 8-bit.
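The figure 1,112,064 follows from the Unicode code space (U+0000 through U+10FFFF) minus the 2,048 surrogate code points (U+D800 through U+DFFF), which are reserved for UTF-16 and cannot be encoded on their own. The sketch below reproduces that arithmetic and the one-to-four byte ranges; `utf8_length` is a hypothetical helper written for this example, not a library function.

```python
MAX_CODE_POINT = 0x10FFFF
SURROGATES = range(0xD800, 0xE000)  # U+D800..U+DFFF, 2,048 code points

total_code_points = MAX_CODE_POINT + 1                     # 1,114,112
valid_code_points = total_code_points - len(SURROGATES)    # 1,112,064

def utf8_length(cp):
    """Number of bytes UTF-8 uses for code point cp (surrogates excluded)."""
    if cp in SURROGATES or not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError("not an encodable code point")
    if cp <= 0x7F:
        return 1
    if cp <= 0x7FF:
        return 2
    if cp <= 0xFFFF:
        return 3
    return 4

print(valid_code_points)
# Cross-check the range boundaries against Python's own encoder.
for cp in (0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, 0x10FFFF):
    assert utf8_length(cp) == len(chr(cp).encode("utf-8"))
```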