New methods for strings in ES6

May 08, 2021 12:00 ES6

1. String.fromCodePoint()

ES5 provides String.fromCharCode(), which returns the character corresponding to a Unicode code point, but this method cannot handle code points greater than 0xFFFF.

  String.fromCharCode(0x20BB7)
  // "ஷ"

In the above code, String.fromCharCode() cannot handle a code point greater than 0xFFFF, so 0x20BB7 overflows: the leading digit 2 is discarded, and the method returns the character for code point U+0BB7 instead of the character for code point U+20BB7.

ES6 provides String.fromCodePoint(), which handles code points greater than 0xFFFF and makes up for this shortcoming of String.fromCharCode(). In effect, it is the inverse of the codePointAt() method described below.

  String.fromCodePoint(0x20BB7)
  // "𠮷"
  String.fromCodePoint(0x78, 0x1f680, 0x79) === 'x\uD83D\uDE80y'
  // true

In the above code, if the String.fromCodePoint method has more than one parameter, they are combined and returned as a single string.

Note that fromCodePoint() is defined on the String object itself, while codePointAt() is defined on string instances.
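The difference between the two methods can be checked directly. The following is a minimal sketch; the variable names are illustrative and not part of the original text.

```javascript
// Code point of the character used in the examples above.
const cp = 0x20BB7;

// fromCharCode() keeps only the low 16 bits of its argument (0x0BB7).
const truncated = String.fromCharCode(cp);

// fromCodePoint() produces the full character as a surrogate pair.
const full = String.fromCodePoint(cp);

console.log(truncated === '\u0BB7');      // true
console.log(full === '\uD842\uDFB7');     // true
console.log(full.codePointAt(0) === cp);  // true: the two methods round-trip

// Values above 0x10FFFF are not valid code points and are rejected.
let rejected = false;
try {
  String.fromCodePoint(0x110000);
} catch (err) {
  rejected = err instanceof RangeError;
}
console.log(rejected); // true
```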

2. String.raw()

ES6 also provides a raw() method on the native String object. It returns a string in which every backslash is escaped (that is, another backslash is added in front of it), and it is typically used to process template strings.

  String.raw`Hi\n${2+3}!`
  // actually returns "Hi\\n5!"; what is displayed is the escaped form "Hi\n5!"
  String.raw`Hi\u000A!`;
  // actually returns "Hi\\u000A!"; what is displayed is the escaped form "Hi\u000A!"

If a backslash in the original string is already escaped, String.raw() escapes it again.

  String.raw`Hi\\n`
  // returns "Hi\\\\n"
  String.raw`Hi\\n` === "Hi\\\\n" // true

String.raw() can serve as the basic way to process template strings: it substitutes all the variables and escapes the backslashes, so that the result can be used as an ordinary string in the next step.
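Two situations where this is handy are Windows-style paths and regular expression sources, both of which are full of backslashes. The sketch below is only an illustration; the path and the pattern are made-up examples.

```javascript
// Without String.raw, each backslash has to be doubled in the literal.
const escaped = 'C:\\new\\table.txt';

// With String.raw, the backslashes are kept exactly as typed.
const rawPath = String.raw`C:\new\table.txt`;

console.log(rawPath === escaped); // true

// Substitutions are still evaluated; only escape sequences are left alone.
const digits = 3;
const pattern = new RegExp(String.raw`^\d{${digits}}-\d+$`);

console.log(pattern.source);           // ^\d{3}-\d+$
console.log(pattern.test('123-4567')); // true
console.log(pattern.test('12-4567'));  // false
```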

String.raw() is essentially an ordinary function that happens to be used as a tag function for template strings. When it is called as an ordinary function, its first argument should be an object with a raw property, whose value is an array corresponding to the parsed pieces of the template string.

  // `foo${1 + 2}bar`
  // is equivalent to
  String.raw({ raw: ['foo', 'bar'] }, 1 + 2) // "foo3bar"

In the above code, the first argument of String.raw() is an object whose raw property is equivalent to the array obtained by parsing the original template string.

As a function, String.raw() is implemented roughly as follows.

  String.raw = function (strings, ...values) {
    let output = '';
    let index;
    for (index = 0; index < values.length; index++) {
      output += strings.raw[index] + values[index];
    }

    output += strings.raw[index];
    return output;
  }
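To see what a tag function actually receives, it can help to write a small one by hand. The sketch below uses a hypothetical tag named describeTag (not part of any standard API) that simply records its arguments, and then feeds the raw pieces back into String.raw() as an ordinary function call.

```javascript
// Hypothetical tag function that records what the engine passes to a tag.
function describeTag(strings, ...values) {
  return {
    cooked: Array.from(strings),   // escape sequences already processed
    raw: Array.from(strings.raw),  // text exactly as typed in the source
    values,                        // results of the ${...} expressions
  };
}

const parts = describeTag`Hi\n${2 + 3}!`;

console.log(parts.cooked[0].length); // 3: "H", "i", and a newline
console.log(parts.raw[0].length);    // 4: "H", "i", a backslash, and "n"
console.log(parts.values[0]);        // 5

// Calling String.raw() as an ordinary function with the raw pieces
// gives the same result as using it as a tag.
const rebuilt = String.raw({ raw: parts.raw }, ...parts.values);
console.log(rebuilt === String.raw`Hi\n${2 + 3}!`); // true
```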

3. Instance method: codePointAt()

Internally, JavaScript stores characters in UTF-16 format, with each character fixed at 2 bytes. Characters that need 4 bytes of storage (those with Unicode code points greater than 0xFFFF) are treated by JavaScript as two characters.

  var s = "𠮷";
  s.length // 2
  s.charAt(0) // ''
  s.charAt(1) // ''
  s.charCodeAt(0) // 55362
  s.charCodeAt(1) // 57271

In the code above, the code point of the Chinese character "𠮷" (note that this is not the character "吉" in "吉祥", meaning auspicious) is 0x20BB7, and its UTF-16 encoding is 0xD842 0xDFB7 (decimal 55362 57271), which needs 4 bytes of storage. JavaScript does not handle such 4-byte characters correctly: the string length is misreported as 2, charAt() cannot read the whole character, and charCodeAt() can only return the values of the first two bytes and the last two bytes separately.

ES6 provides the codePointAt() method, which correctly handles characters stored in 4 bytes and returns a character's code point.

  let s = '𠮷a';
  s.codePointAt(0) // 134071
  s.codePointAt(1) // 57271
  s.codePointAt(2) // 97

The argument of codePointAt() is the position of the character in the string (starting at 0). In the code above, JavaScript treats "𠮷a" as three characters. codePointAt() correctly recognizes "𠮷" at the first position and returns its decimal code point 134071 (20BB7 in hexadecimal). At the second position (the last two bytes of "𠮷") and the third position (the character "a"), codePointAt() returns the same result as charCodeAt().

In summary, codePointAt() correctly returns the code point of a character stored in 4 bytes (a UTF-16 surrogate pair). For ordinary characters stored in two bytes, it returns the same result as charCodeAt().

codePointAt() returns the code point as a decimal value. To get the hexadecimal value, convert it with toString(16).

  let s = '𠮷a';
  s.codePointAt(0).toString(16) // "20bb7"
  s.codePointAt(2).toString(16) // "61"

As you may have noticed, the argument of codePointAt() is still not quite right. For example, in the code above, the correct position of the character a in string s should be 1, but 2 must be passed to codePointAt(). One way to solve this problem is to use a for...of loop, because it correctly recognizes 4-byte UTF-16 characters.

  let s = '𠮷a';
  for (let ch of s) {
    console.log(ch.codePointAt(0).toString(16));
  }
  // 20bb7
  // 61

Another approach is to expand the string with the spread operator (...).

  let arr = [...'𠮷a']; // arr.length === 2
  arr.forEach(
    ch => console.log(ch.codePointAt(0).toString(16))
  );
  // 20bb7
  // 61

codePointAt() is the easiest way to test whether a character is made up of two bytes or four bytes.

  function is32Bit(c) {
    return c.codePointAt(0) > 0xFFFF;
  }

  is32Bit("𠮷") // true
  is32Bit("a") // false
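Building on the iteration techniques above, one can sketch helpers that count and index by code point rather than by UTF-16 code unit. The names codePointLength and codePointCharAt are hypothetical and chosen only for this illustration.

```javascript
// Hypothetical helper: number of code points rather than UTF-16 code units.
function codePointLength(str) {
  return Array.from(str).length;
}

// Hypothetical helper: the character at a given code point index.
function codePointCharAt(str, index) {
  return Array.from(str)[index];
}

const s = '\u{20BB7}a'; // the same string as in the examples above

console.log(s.length);              // 3
console.log(codePointLength(s));    // 2
console.log(codePointCharAt(s, 1)); // a
```

Array.from() iterates a string the same way for...of does, so each 4-byte character ends up in a single array element. Converting the whole string on every call is simple but not efficient for long strings.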

4. Instance method: normalize()

Many European languages have tone marks and accent marks. To represent them, Unicode provides two methods. One is to provide the accented character directly, such as Ǒ (\u01D1). The other is to provide combining characters, that is, a base character combined with an accent mark, so that two characters form one, such as O (\u004F) and ˇ (\u030C) combining into Ǒ (\u004F\u030C).

The two representations are visually and semantically equivalent, but JavaScript does not recognize this.

  '\u01D1'==='\u004F\u030C' //false
  '\u01D1'.length // 1
  '\u004F\u030C'.length // 2

The code above shows that JavaScript treats the combined form as two characters, so the two representations are not considered equal.

ES6 provides the normalize() method on string instances to unify the different representations of a character into the same form, which is called Unicode normalization.

  '\u01D1'.normalize() === '\u004F\u030C'.normalize()
  // true

The normalize() method accepts one parameter that specifies the normalization form. Its four possible values are as follows.

  • NFC, the default, stands for Normalization Form Canonical Composition and returns the composed character built from multiple simple characters. "Canonical equivalence" means equivalence both visually and semantically.
  • NFD stands for Normalization Form Canonical Decomposition. Under canonical equivalence, it returns the multiple simple characters into which a composed character decomposes.
  • NFKC stands for Normalization Form Compatibility Composition and returns the composed character. "Compatibility equivalence" means the forms are semantically equivalent but visually different, such as "囍" and "喜喜". (This is only an illustration; the normalize method does not recognize Chinese characters.)
  • NFKD stands for Normalization Form Compatibility Decomposition. Under compatibility equivalence, it returns the multiple simple characters into which a composed character decomposes.

  '\u004F\u030C'.normalize('NFC').length // 1
  '\u004F\u030C'.normalize('NFD').length // 2

The above code shows that the NFC parameter returns the composed form of the character, and the NFD parameter returns the decomposed form.

However, the normalize method currently does not recognize compositions of three or more characters. In that case, the only option is still to use regular expressions and check against Unicode code point ranges.
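A typical use of normalization is comparing text that may arrive in either form. The sketch below assumes a hypothetical helper named sameText, and uses the Latin ligature "ﬁ" (U+FB01) as an example of compatibility equivalence that normalize() does handle.

```javascript
// Hypothetical helper: compare two strings after canonical normalization.
function sameText(a, b) {
  return a.normalize('NFC') === b.normalize('NFC');
}

const composed = '\u01D1';         // Ǒ as a single character
const decomposed = '\u004F\u030C'; // O followed by a combining caron

console.log(composed === decomposed);         // false
console.log(sameText(composed, decomposed));  // true

// Compatibility forms go further: the ligature U+FB01 becomes "fi".
const ligature = '\uFB01';
console.log(ligature.normalize('NFC') === ligature); // true: canonical forms keep it
console.log(ligature.normalize('NFKC') === 'fi');    // true
```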

5. Instance methods: includes(), startsWith(), endsWith()

Traditionally, JavaScript had only the indexOf() method for determining whether one string is contained in another. ES6 provides three new methods.

  • includes(): Returns a Boolean value that indicates whether the argument string was found.
  • startsWith(): Returns a Boolean value that indicates whether the argument string is at the head of the original string.
  • endsWith(): Returns a Boolean value that indicates whether the argument string is at the end of the original string.

  let s = 'Hello world!';
  s.startsWith('Hello') // true
  s.endsWith('!') // true
  s.includes('o') // true

All three methods support a second parameter that indicates where the search starts.

  let s = 'Hello world!';
  s.startsWith('world', 6) // true
  s.endsWith('Hello', 5) // true
  s.includes('Hello', 6) // false

The above code shows that, with a second parameter n, endsWith() behaves differently from the other two methods. It applies to the first n characters, while the other two methods apply from position n to the end of the string.
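One way to keep the two behaviours straight is to rewrite each call with slice(). The following sketch is only an illustration of the rule just described, using the same string as above.

```javascript
const s = 'Hello world!';

// startsWith(x, n) and includes(x, n) behave like searching s.slice(n).
console.log(s.startsWith('world', 6) === s.slice(6).startsWith('world')); // true
console.log(s.includes('Hello', 6) === s.slice(6).includes('Hello'));     // true

// endsWith(x, n) behaves like searching s.slice(0, n).
console.log(s.endsWith('Hello', 5) === s.slice(0, 5).endsWith('Hello'));  // true

console.log(s.slice(6));    // world!
console.log(s.slice(0, 5)); // Hello
```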

6. Instance method: repeat()

The repeat() method returns a new string in which the original string is repeated n times.

  'x'.repeat(3) // "xxx"
  'hello'.repeat(2) // "hellohello"
  'na'.repeat(0) // ""

If the argument is a decimal, it is rounded down to an integer.

  'na'.repeat(2.9) // "nana"

If the argument of repeat() is negative or Infinity, an error is thrown.

  'na'.repeat(Infinity)
  // RangeError
  'na'.repeat(-1)
  // RangeError

However, if the argument is a decimal between 0 and -1, it is equivalent to 0, because the rounding happens first: a decimal between 0 and -1 becomes -0, which repeat() treats as 0.

  'na'.repeat(-0.9) // ""

An argument of NaN is equivalent to 0.

  'na'.repeat(NaN) // ""

If the argument of repeat() is a string, it is converted to a number first.

  'na'.repeat('na') // ""
  'na'.repeat('3') // "nanana"
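As a small practical illustration, repeat() is convenient for building indentation or separator lines. The helper name indent and its default unit of two spaces are assumptions made for this sketch.

```javascript
// Hypothetical helper: indent every line of a block by `level` steps.
function indent(text, level, unit = '  ') {
  const prefix = unit.repeat(level);
  return text
    .split('\n')
    .map(line => prefix + line)
    .join('\n');
}

const block = indent('first\nsecond', 2);
console.log(block === '    first\n    second'); // true

// A level of 0 repeats the unit zero times, so nothing is added.
console.log(indent('first', 0) === 'first'); // true

// A separator line of ten dashes.
console.log('-'.repeat(10)); // ----------
```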

7. Instance methods: padStart(), padEnd()

ES2017 introduces string padding. If a string is shorter than the specified length, it is padded at the head or the tail. padStart() pads at the head, and padEnd() pads at the tail.

  'x'.padStart(5, 'ab') // 'ababx'
  'x'.padStart(4, 'ab') // 'abax'
  'x'.padEnd(5, 'ab') // 'xabab'
  'x'.padEnd(4, 'ab') // 'xaba'

In the above code, padStart() and padEnd() each accept two parameters: the first is the maximum length up to which the string is padded, and the second is the string used for padding.

If the length of the original string is equal to or greater than the maximum length, no padding takes place and the original string is returned.

  'xxx'.padStart(2, 'ab') // 'xxx'
  'xxx'.padEnd(2, 'ab') // 'xxx'

If the combined length of the padding string and the original string exceeds the maximum length, the padding string is truncated to fit.

  'abc'.padStart(10, '0123456789')
  // '0123456abc'

If the second argument is omitted, spaces are used for padding by default.

  'x'.padStart(4) // '   x'
  'x'.padEnd(4) // 'x   '

A common use of padStart() is to pad numbers to a fixed number of digits. The following code generates 10-digit numeric strings.

  '1'.padStart(10, '0') // "0000000001"
  '12'.padStart(10, '0') // "0000000012"
  '123456'.padStart(10, '0') // "0000123456"

Another use is to hint at the expected string format.

  '12'.padStart(10, 'YYYY-MM-DD') // "YYYY-MM-12"
  '09-12'.padStart(10, 'YYYY-MM-DD') // "YYYY-09-12"
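The two methods combine naturally when formatting fixed-width output. The sketch below assumes a hypothetical formatClock helper and a made-up list of rows; neither comes from the original text.

```javascript
// Hypothetical helper: zero-pad each part of a clock time to two digits.
function formatClock(hours, minutes, seconds) {
  return [hours, minutes, seconds]
    .map(n => String(n).padStart(2, '0'))
    .join(':');
}

console.log(formatClock(9, 5, 30)); // 09:05:30

// Right-align the first column and left-align the second one.
const rows = [['id', 'name'], ['7', 'Ann'], ['42', 'Bob']];
const lines = rows.map(([id, name]) => id.padStart(4) + ' ' + name.padEnd(6) + '|');

console.log(lines.join('\n'));
//   id name  |
//    7 Ann   |
//   42 Bob   |
```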

8. Instance methods: trimStart(), trimEnd()

ES2019 adds the trimStart() and trimEnd() methods to string instances. They behave like trim(): trimStart() removes whitespace from the head of the string, and trimEnd() removes whitespace from the tail. Both return a new string and do not modify the original string.

  const s = '  abc  ';
  s.trim() // "abc"
  s.trimStart() // "abc  "
  s.trimEnd() // "  abc"

In the above code, trimStart() removes only the spaces at the head and keeps the spaces at the tail. trimEnd() behaves in a similar way.

Besides the space character, these two methods also remove other invisible whitespace at the head (or tail) of the string, such as tab characters and line breaks.

Browsers also implement two additional methods: trimLeft() is an alias for trimStart(), and trimRight() is an alias for trimEnd().
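The handling of tabs and line breaks can be checked with a short sketch like the following; the sample string is made up for illustration.

```javascript
// A string with a tab, a newline and spaces at the head,
// and a space plus a Windows-style line break at the tail.
const s = '\t\n  abc \r\n';

console.log(s.trimStart() === 'abc \r\n');  // true
console.log(s.trimEnd() === '\t\n  abc');   // true
console.log(s.trim() === 'abc');            // true

// The original string is left untouched.
console.log(s.length); // 10

// The aliases give the same results as the standard names.
console.log(s.trimLeft() === s.trimStart()); // true
console.log(s.trimRight() === s.trimEnd());  // true
```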

9. Instance method: matchAll()

The matchAll() method returns all matches of a regular expression in the current string, as detailed in the chapter on the regular expression extensions.
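As a brief preview, the following sketch shows the basic shape of a matchAll() call. The sample text and pattern are made up; note that the regular expression needs the g flag, otherwise matchAll() throws a TypeError.

```javascript
const text = 'a1 b22 c333';

// matchAll() returns an iterator, so spread it into an array to inspect it.
const matches = [...text.matchAll(/([a-z])(\d+)/g)];

console.log(matches.length);              // 3
console.log(matches.map(m => m[0]));      // the full matches: a1, b22, c333
console.log(matches.map(m => m.index));   // their positions: 0, 3, 7
console.log(matches[1][1], matches[1][2]); // capture groups of the second match: b 22
```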