Coding With Fun

The concept of Code Page and its set values


May 22, 2021 DOS Command learning manual



The concept of a code page in CMD and the values it can be set to:


Chcp

Displays the number of the active console code page, or changes the console's active code page. Used without parameters, chcp displays the number of the active console code page.

Syntax

chcp [nnn]

Parameters

nnn : Specifies the code page. The following table lists each supported code page and its country/region or language:

Code page   Country/region or language
437         United States
850         Multilingual (Latin I)
852         Slavic (Latin II)
855         Cyrillic (Russian)
857         Turkish
860         Portuguese
861         Icelandic
863         Canadian-French
865         Nordic
866         Russian
869         Modern Greek
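As a rough illustration of using chcp from a script, the sketch below reads the active code page from Python. The helper names parse_chcp_output and active_code_page are hypothetical, and the parser assumes the number is the last run of digits in the output (the English output looks like "Active code page: 437"; localized wording may differ). Outside Windows the lookup simply returns None.

```python
import re
import subprocess
import sys
from typing import Optional


def parse_chcp_output(text: str) -> Optional[int]:
    """Return the last number found in chcp's output, or None if there is none."""
    match = re.search(r"(\d+)\D*$", text.strip())
    return int(match.group(1)) if match else None


def active_code_page() -> Optional[int]:
    """Run chcp without parameters and return the active console code page.

    chcp is a Windows console command, so this returns None elsewhere.
    """
    if sys.platform != "win32":
        return None
    result = subprocess.run("chcp", shell=True, capture_output=True, text=True)
    return parse_chcp_output(result.stdout)


print(parse_chcp_output("Active code page: 437"))
print(active_code_page())
```

Changing the code page works the same way from the console itself: type chcp followed by the number, for example chcp 850.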

What a code page is and how to change it in Windows cmd



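If your cmd does not display Chinese characters (or other characters) correctly, change the code page with chcp. The parameter nnn is the code page number, which is usually three digits but can be longer. The Simplified Chinese code page is 936; the Western European code page is 1252.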
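To see why the wrong code page produces unreadable output, the following sketch stores two Chinese characters as code page 936 bytes and then interprets the same bytes under other code pages. It uses Python's built-in codecs as a stand-in for the console; view_under is a hypothetical helper name.

```python
# Simplified Chinese text as it would be stored under code page 936.
text = "中文"
raw = text.encode("cp936")


def view_under(data: bytes, code_page: str) -> str:
    """Decode bytes the way a console set to code_page would interpret them."""
    return data.decode(code_page, errors="replace")


print("bytes:", raw.hex(" "))
for page in ("cp936", "cp437", "cp1252"):
    # Only cp936 gives back the original text; the others show mojibake.
    print(f"{page:>7}: {view_under(raw, page)}")
```

The bytes never change; only the table used to interpret them does, which is what chcp switches.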



Code page history:




1. Definition and history of Codepage

A character code is the internal code used to represent a character. Internal codes are used whenever text is entered and stored, and they fall into two categories:


Single-byte code - Single-byte character sets (SBCS), which can encode up to 256 characters.

Double-byte code - Double-byte character sets (DBCS), which can in theory encode up to 65,536 (about 65,000) characters. They are mainly used for East Asian scripts with large character sets.
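The difference between the two categories can be checked with Python's built-in codecs. This is only a sketch: cp437 stands in for a single-byte code page and cp936 for a double-byte one, and byte_width is a hypothetical helper name.

```python
def byte_width(char: str, code_page: str) -> int:
    """Number of bytes one character occupies in the given code page."""
    return len(char.encode(code_page))


# A single-byte code page has exactly one character per byte value.
sbcs_chars = bytes(range(256)).decode("cp437")
print(len(sbcs_chars))

# A double-byte code page keeps ASCII at one byte and uses two bytes
# for characters from the large character set.
print(byte_width("A", "cp936"))
print(byte_width("中", "cp936"))
```

In practice the Windows double-byte code pages are mixed-width: ASCII text still takes one byte per character.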

A codepage is a selected list of characters in a specific order. For the early single-byte languages, the order of internal codes in the codepage lets the system look up the internal code that corresponds to a keyboard input value. For double-byte codes, the codepage provides the mapping table between the multibyte encoding and Unicode, so that characters stored as Unicode can be converted to the corresponding internal codes and back again. The corresponding functions in the Linux kernel are utf8_mbtowc and utf8_wctomb.
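The mapping between a multibyte codepage and Unicode can be sketched with Python's codecs, which play the role of the mapping table here. This is not the Linux kernel code mentioned above; transcode is a hypothetical helper that uses Unicode as the pivot between two codepages.

```python
def transcode(data: bytes, source_page: str, target_page: str) -> bytes:
    """Convert bytes between codepages, going through Unicode in the middle."""
    return data.decode(source_page).encode(target_page)


gbk_bytes = "中".encode("cp936")      # internal code under codepage 936
char = gbk_bytes.decode("cp936")      # multibyte -> Unicode
print(gbk_bytes.hex(), f"U+{ord(char):04X}")

# Unicode -> a different codepage (950, Traditional Chinese)
big5_bytes = transcode(gbk_bytes, "cp936", "cp950")
print(big5_bytes.hex())
```

The same character has different internal codes in each codepage but a single Unicode code point, which is why Unicode works as the intermediate form.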

Until 1980, there were still no international standards such as ISO-8859 or Unicode to define how to extend US-ASCII for users in non-English-speaking countries. Many IT vendors invented their own encodings and identified them with hard-to-remember numbers:




For example, 936 represents Chinese Simplified, and 950 represents Chinese Traditional.




1.1 CJK Codepage

Unlike the Extended Unix Code (EUC) encodings, all of the following Far Eastern codepages use the C1 control codes (0x80 to 0x9F) as first bytes and ASCII values as second bytes, which lets them hold up to tens of thousands of double-byte characters. As a result, only byte values of 0x3F or less are certain to represent ASCII characters in these encodings.
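A well-known consequence of this layout can be shown with a single Shift-JIS character. The sketch below, using Python's cp932 codec, looks at the katakana "ソ", whose second byte has the same value as the ASCII backslash; is_c1 is a hypothetical helper name.

```python
def is_c1(byte: int) -> bool:
    """True if the byte value falls in the C1 control range 0x80 to 0x9F."""
    return 0x80 <= byte <= 0x9F


raw = "ソ".encode("cp932")
lead, trail = raw[0], raw[1]
print(hex(lead), is_c1(lead))         # first byte comes from the C1 range
print(hex(trail), chr(trail))         # second byte is an ASCII value

# Splitting the raw bytes on a backslash cuts the character in half,
# while splitting the decoded text leaves it intact.
print(raw.split(b"\\"))
print(raw.decode("cp932").split("\\"))
```

This is why byte-level processing of these codepages has to track first and second bytes instead of treating every ASCII-range byte as an ASCII character.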


CP932


Shift-JIS covers the Japanese character sets JIS X 0201 (one byte per character) and JIS X 0208 (two bytes per character). The JIS X 0201 katakana are therefore encoded as single-byte, half-width characters, and the remaining 60 byte values are used as the first byte of the 7,076 kanji and 648 other full-width characters. Unlike EUC-JP, Shift-JIS does not contain the 5,802 characters defined in JIS X 0212.


CP936


GBK extends the EUC-CN encoding (the encoding of GB 2312-80, which contains 6,763 Chinese characters) to the 20,902 Chinese characters defined in Unicode (GB 13000.1-93). It is used in mainland China for Chinese Simplified (zh_CN).


CP949


Unified Hangul Code (UHC) is a superset of the Korean EUC-KR encoding (KS C 5601-1992, which contains 2,350 Korean syllables and 4,888 Chinese characters). It adds 8,822 further Korean syllables (in the C1 range).


CP950


This is the Big5 encoding (13,072 traditional Chinese characters, zh_TW), used in place of EUC-TW (CNS 11643-1992) for Chinese Traditional. Definitions of all of these codepages can be found in Ken Lunde's CJK.INF or in the Unicode mapping tables.


Note: Microsoft uses the four codepages above, so accessing Microsoft file systems requires using one of them.
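The extension described for CP936 can be checked with Python's codecs, where gb2312 corresponds to the EUC-CN encoding and cp936 to GBK. The following is a minimal sketch; can_encode is a hypothetical helper, and the two sample characters were chosen for illustration.

```python
def can_encode(char: str, codec: str) -> bool:
    """True if the codec has a code for the character."""
    try:
        char.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


simplified = "体"     # part of GB 2312-80
traditional = "體"    # only reachable through the GBK extension

for char in (simplified, traditional):
    print(char, can_encode(char, "gb2312"), can_encode(char, "cp936"))

# Characters already in GB 2312 keep the same bytes under GBK.
print(simplified.encode("gb2312") == simplified.encode("cp936"))
```

Because the original codes are unchanged, text written in EUC-CN remains readable under CP936.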



1.2 IBM's Far Eastern language Codepage

IBM's Codepage is divided into SBCS and DBCS:


IBM SBCS Codepage



37 (US)

290 (Japanese)

833 (Korean)

836 (Chinese Simplified)

891 (Korean)

897 (Japanese)

903 (Chinese Simplified)

904 (Chinese Traditional)

IBM DBCS Codepage


300 (Japanese)

301 (Japanese)

834 (Korean)

835 (Chinese Traditional)

837 (Chinese Simplified)

926 (Korean)

927 (Chinese Traditional)

928 (Chinese Simplified)

Combining a codepage from SBCS with a codepage from DBCS gives an MBCS codepage:

IBM MBCS Codepage


930 (Japanese) (Codepage 300 plus 290)

932 (Japanese) (Codepage 301 plus 897)

933 (Korean) (Codepage 834 plus 833)

934 (Korean) (Codepage 926 plus 891)

938 (Chinese Traditional) (Codepage 927 plus 904)

936 (Chinese Simplified) (Codepage 928 plus 903)

5031 (Chinese Simplified) (Codepage 837 plus 836)

5033 (Chinese Traditional) (Codepage 835 plus 37)

The EBCDIC encoding format is used
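To give a feel for how EBCDIC differs from ASCII, the sketch below encodes a few characters with Python's cp037 codec, which corresponds to codepage 37 (US) from the SBCS list. Python's standard library does not ship codecs for the IBM Far Eastern codepages listed above, so only the single-byte US page is shown.

```python
# Compare the byte assigned to each character in ASCII and in EBCDIC (cp037).
for ch in ("A", "a", "0", " "):
    ascii_byte = ch.encode("ascii").hex()
    ebcdic_byte = ch.encode("cp037").hex()
    print(f"{ch!r}: ASCII {ascii_byte}  EBCDIC {ebcdic_byte}")

# Text converted to EBCDIC decodes back without loss through the same codec.
message = "CODEPAGE 37"
print(message.encode("cp037").decode("cp037") == message)
```

Even plain letters and digits have different byte values, so EBCDIC data is unreadable when interpreted with an ASCII-based codepage.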


Thus, Microsoft's CJK Codepage comes from IBM's Codepage.