Coding With Fun

What is the difference between utf-8 and utf-16?


Asked by Evangeline Delacruz on Dec 14, 2021 FAQ



UTF-8 and UTF-16 both encode the same set of Unicode characters. Both are variable-length encodings that need up to 32 bits per character. The difference is that UTF-8 encodes the most common characters, including English letters and digits, using 8 bits each, whereas UTF-16 uses at least 16 bits for every character.
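As a rough illustration of the size difference, the following sketch measures how many bytes a few sample characters take in each encoding. The sample characters and the helper name encoded_sizes are arbitrary choices made for this example, not part of the original answer.

```python
# Compare how many bytes each encoding needs for a few sample characters.
# The samples and the helper name are illustrative assumptions.
samples = ["A", "7", "é", "€", "😀"]


def encoded_sizes(ch):
    """Return (utf8_bytes, utf16_bytes) for a single character."""
    # "utf-16-le" is used so that no byte order mark is prepended.
    return len(ch.encode("utf-8")), len(ch.encode("utf-16-le"))


for ch in samples:
    utf8_len, utf16_len = encoded_sizes(ch)
    print(f"U+{ord(ch):04X}: UTF-8 = {utf8_len} bytes, UTF-16 = {utf16_len} bytes")
```

ASCII characters come out at one byte in UTF-8 and two in UTF-16, while a character outside the Basic Multilingual Plane takes four bytes in both.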
Furthermore,
ISO-8859-1 uses a single byte to represent each character in the range 0x80 to 0xFF, whereas UTF-8 uses two bytes for each character in that range. (Below 0x80, in the ASCII range, both use a single byte.) ISO-8859-1 does not support any character mappings above the value 0xFF, whereas UTF-8 continues with encodings of 2, 3, and 4 bytes.
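A small sketch of this comparison, using Python's built-in codecs. The helper name compare_latin1_utf8 and the sample characters are assumptions made for illustration.

```python
def compare_latin1_utf8(ch):
    """Return (iso_8859_1_bytes, utf8_bytes) for one character.

    The first element is None when ISO-8859-1 cannot represent the character.
    """
    try:
        latin1 = ch.encode("iso-8859-1")
    except UnicodeEncodeError:
        # Code points above 0xFF have no mapping in ISO-8859-1.
        latin1 = None
    return latin1, ch.encode("utf-8")


for ch in ["A", "é", "€"]:
    print(f"U+{ord(ch):04X}", compare_latin1_utf8(ch))
```

Here "é" (U+00E9) is one byte in ISO-8859-1 and two bytes in UTF-8, while "€" (U+20AC) cannot be encoded in ISO-8859-1 at all.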
UCS-2 is a fixed-width encoding that uses two bytes for each character, meaning it can represent at most 2^16 characters, or slightly over 65 thousand (65,536). UTF-16, on the other hand, is a variable-width encoding scheme that uses a minimum of 2 bytes and a maximum of 4 bytes for each character. This lets UTF-16 represent any character in Unicode while using minimal space for the most commonly used characters.
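To make the 2-byte versus 4-byte cases concrete, here is a minimal sketch of how UTF-16 maps a code point to one or two 16-bit code units (a surrogate pair). The function name utf16_code_units is hypothetical; the arithmetic follows the standard UTF-16 surrogate scheme.

```python
def utf16_code_units(code_point):
    """Return the list of 16-bit code units UTF-16 uses for a code point."""
    if not 0 <= code_point <= 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
        raise ValueError("not a valid Unicode scalar value")
    if code_point < 0x10000:
        # Fits in a single 16-bit unit; this is all UCS-2 can represent.
        return [code_point]
    # Above 0xFFFF: split the 20-bit offset into a high and a low surrogate.
    offset = code_point - 0x10000
    high = 0xD800 + (offset >> 10)
    low = 0xDC00 + (offset & 0x3FF)
    return [high, low]


print([hex(u) for u in utf16_code_units(ord("€"))])
print([hex(u) for u in utf16_code_units(ord("😀"))])
```

Code points up to 0xFFFF yield one unit (2 bytes); anything above yields two units (4 bytes), which is the part UCS-2 lacks.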
In fact,
UTF-16 represents text as a sequence of 16-bit elements (code units), but a single textual character may consist of more than one element. std::wstring is just a collection of such elements and is a class primarily concerned with their storage. The element type of a wstring, wchar_t, is 16 bits wide on some platforms (such as Windows) but 32 bits on others (such as most Unix-like systems).
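The gap between "elements" and "characters" can be sketched in Python by unpacking UTF-16 data into its 16-bit units and comparing the counts. The helper name utf16_units and the sample string are assumptions for illustration; this mirrors what a container of 16-bit elements would hold, not std::wstring itself.

```python
import struct


def utf16_units(text):
    """Return the 16-bit code units of text when encoded as UTF-16."""
    data = text.encode("utf-16-le")  # little-endian, no byte order mark
    return list(struct.unpack(f"<{len(data) // 2}H", data))


text = "a😀"
units = utf16_units(text)
print("code points:", len(text))
print("16-bit elements:", len(units))
```

A length reported in 16-bit elements can therefore be larger than the number of characters a reader would count.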
Indeed,
UTF-8 is a variable-width character encoding used for electronic communication. Defined by the Unicode Standard, the name is derived from Unicode (or Universal Coded Character Set) Transformation Format - 8-bit. UTF-8 is capable of encoding all 1,112,064 valid character code points in Unicode using one to four one-byte (8-bit) code units.
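The figure of 1,112,064 can be checked with a little arithmetic: the Unicode code space runs from 0 to 0x10FFFF, minus the 2,048 surrogate code points reserved for UTF-16. The sketch below also shows which code point ranges need one, two, three, or four UTF-8 code units. The constant and function names are made up for this example, and utf8_length assumes its input is a valid, non-surrogate code point.

```python
TOTAL_CODE_POINTS = 0x10FFFF + 1          # 17 planes of 65,536 code points
SURROGATES = 0xDFFF - 0xD800 + 1          # reserved for UTF-16, not encodable
VALID_CODE_POINTS = TOTAL_CODE_POINTS - SURROGATES


def utf8_length(code_point):
    """Number of one-byte code units UTF-8 uses for a valid code point."""
    if code_point < 0x80:
        return 1
    if code_point < 0x800:
        return 2
    if code_point < 0x10000:
        return 3
    return 4


print(VALID_CODE_POINTS)
for ch in ["A", "é", "€", "😀"]:
    print(f"U+{ord(ch):04X} -> {utf8_length(ord(ch))} byte(s)")
```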