Understanding ES6 in Depth: Strings and Regular Expressions

Date: 2019-11-08


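A string can contain two kinds of characters: BMP characters, each represented by a single 16-bit code unit, and supplementary-plane characters, each represented by two code units (32 bits in total).
In ES5, all string operations work on 16-bit code units.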

codePointAt

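codePointAt takes the position of a code unit (not the position of a character) as its argument and returns the code point at that position in the string, as an integer.
In other words, for characters in the BMP, codePointAt returns the same value as charCodeAt; for non-BMP characters the two return values differ.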

let txt='𠮷a'
console.log(txt.charCodeAt(0))//55362  -- returns only the first code unit, at position 0
console.log(txt.charCodeAt(1))//57271
console.log(txt.charCodeAt(2))//97

console.log(txt.codePointAt(0))//134071  -- returns the full code point, even though it spans two code units
console.log(txt.codePointAt(1))//57271
console.log(txt.codePointAt(2))//97

Detecting how many code units a character occupies

function is32Bit(c) {
  return c.codePointAt(0)>0xFFFF
}
console.log(is32Bit('𠮷'))//true
console.log(is32Bit('a'))//false

String.fromCodePoint

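The codePointAt method retrieves the code point of a character in a string; in the other direction, String.fromCodePoint produces a character from a given code point.

You can think of String.fromCodePoint as a more complete version of String.fromCharCode. For every character in the BMP the two methods give the same result; the results can differ only when a non-BMP code point is passed as the argument.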
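The difference between the two methods can be seen with the code point 134071 (0x20BB7) from the codePointAt example. This is a minimal sketch; the variable names are only illustrative.

```javascript
// 134071 (0x20BB7) lies outside the BMP.
const codePoint = 134071

const viaCodePoint = String.fromCodePoint(codePoint)
const viaCharCode = String.fromCharCode(codePoint)

console.log(viaCodePoint.length)          // 2  -- a surrogate pair
console.log(viaCodePoint.codePointAt(0))  // 134071
console.log(viaCharCode.length)           // 1  -- the value was truncated to 16 bits
console.log(viaCharCode.charCodeAt(0))    // 2999 (0x0BB7)

// For a BMP code point both methods agree.
console.log(String.fromCodePoint(97) === String.fromCharCode(97)) // true
```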

normalize

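When sorting or comparing different characters, it is possible that they are equivalent. Unicode defines two kinds of equivalence:
1. Canonical equivalence means the two code point sequences are interchangeable in every respect.
2. Compatibility means the two code point sequences look different but can be used interchangeably in certain situations.
Remember: always normalize strings to the same form before comparing them.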
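The two kinds of equivalence can be illustrated with a short sketch. The sample characters (an accented "é" and the "ﬁ" ligature) are chosen here for illustration and do not come from the original text.

```javascript
// Canonical equivalence: "é" as one code point vs. "e" + combining acute accent.
const composed = '\u00E9'
const decomposed = 'e\u0301'

console.log(composed === decomposed)                                    // false
console.log(composed.normalize('NFC') === decomposed.normalize('NFC'))  // true
console.log(composed.normalize('NFD').length)                           // 2

// Compatibility: the "ﬁ" ligature (U+FB01) vs. the two letters "fi".
const ligature = '\uFB01'
console.log(ligature.normalize('NFC') === 'fi')   // false
console.log(ligature.normalize('NFKC') === 'fi')  // true
```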

let values = ["test", "demo", "compare", "sort"]
let normalized = values.map(function (txt) {
  return txt.normalize()
})
normalized.sort(function (first, second) {
  if (first < second) return -1
  else if (first === second) return 0
  else return 1
})

Alternatively, the code above can be written like this:

let values = ["test", "demo", "compare", "sort"]
values.sort(function (first, second) {
  // let firstNormalized = first.normalize(),
  //   secondNormalized = second.normalize();  // this form works too; the form below is equivalent
  let firstNormalized = first.normalize('NFC'),
  secondNormalized = second.normalize('NFC');
  if (firstNormalized < secondNormalized) return -1
  else if (firstNormalized === secondNormalized) return 0
  else return 1
})

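The Unicode normalization forms, which are the values normalize accepts, are:
NFC (the default): canonical decomposition followed by canonical composition
NFD: canonical decomposition
NFKC: compatibility decomposition followed by canonical composition
NFKD: compatibility decomposition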

The regular expression u flag

When a regular expression has the u flag, it switches from operating on code units to operating on characters.

let txt = '𠮷'
console.log(txt.length)//2
console.log(/^.$/.test(txt))//false
console.log(/^.$/u.test(txt))//true

This behavior can be used to find the true length of a string, counted in code points:

function codePointLength(txt) {
  let result = txt.match(/[\s\S]/gu)
  return result ? result.length : 0
}
console.log(codePointLength('abc'))//3
console.log(codePointLength('𠮷ab'))//3
console.log('𠮷ab'.length)//4

Detecting whether the engine supports the u flag

function hasRegExpU(params) {
  try {
    var pattern = new RegExp(".", "u")
    return true
  } catch (ex) {
    return false
  }
}

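If your code still has to run on older JS engines, always use the RegExp constructor when you use the u flag. This avoids a syntax error, and it lets you detect the u flag and use it selectively without aborting the program.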
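One way to apply this advice is to combine feature detection with a fallback. The sketch below is an assumption about how that could look, not code from the original: supportsUFlag and safeCodePointLength are hypothetical names, and the fallback pattern simply matches a surrogate pair before any single code unit. It uses var so that it also parses on older engines.

```javascript
// Hypothetical helper: same detection idea as above, via the RegExp constructor.
function supportsUFlag() {
  try {
    new RegExp('.', 'u')
    return true
  } catch (ex) {
    return false
  }
}

// Hypothetical helper: counts code points with or without u-flag support.
function safeCodePointLength(text) {
  var pattern = supportsUFlag()
    // Built with the constructor, so older engines never parse a /.../u literal.
    ? new RegExp('[\\s\\S]', 'gu')
    // Fallback: treat a high + low surrogate pair as one match.
    : /[\uD800-\uDBFF][\uDC00-\uDFFF]|[\s\S]/g
  var matches = text.match(pattern)
  return matches ? matches.length : 0
}

console.log(safeCodePointLength('abc'))             // 3
console.log(safeCodePointLength('\uD842\uDFB7ab'))  // 3
console.log(safeCodePointLength(''))                // 0
```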

Identifying substrings

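includes checks whether the string contains the given text
startsWith checks whether the string begins with the given text
endsWith checks whether the string ends with the given text
All three methods accept two arguments: 1. the text to search for; 2. an optional index. When the second argument is given, includes and startsWith begin matching at that index, whereas endsWith treats it as the end position and matches as if the string were only that many characters long.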

let msg = 'Hello world!'
console.log(msg.startsWith('Hello'))//true
console.log(msg.endsWith('!'))//true
console.log(msg.includes('o'))//true

console.log(msg.startsWith('o'))//false
console.log(msg.endsWith('world!'))//true
console.log(msg.includes('x'))//false

console.log(msg.startsWith('o', 4))//true -- matching starts at the o in Hello
console.log(msg.endsWith('o', 8))//true -- the first 8 characters are 'Hello wo', which ends with o
console.log(msg.includes('o', 8))//false -- matching starts at the r in world

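indexOf and lastIndexOf find the position of a substring. If you pass them a regular expression, they convert it to a string and search for that string. The three methods above (includes, startsWith and endsWith) instead throw an error when given a regular expression rather than a string.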
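The contrast can be checked with a small sketch; the sample string is made up for illustration.

```javascript
const sample = 'a/b/ and ab'

// indexOf converts the regular expression to the string "/b/" and searches for it.
const position = sample.indexOf(/b/)
console.log(position) // 1

// includes refuses a regular expression and throws a TypeError.
let threwTypeError = false
try {
  sample.includes(/b/)
} catch (ex) {
  threwTypeError = ex instanceof TypeError
}
console.log(threwTypeError) // true
```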

repeat accepts a number that specifies how many times to repeat the string, and returns a new string made of the original repeated that many times.

console.log('X'.repeat(5))//XXXXX
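A typical use of repeat is building indentation. The following is a small sketch; the indent helper and its two-spaces-per-level convention are assumptions made for the example.

```javascript
// Hypothetical helper: prefixes a line with two spaces per indentation level.
function indent(line, level) {
  return ' '.repeat(level * 2) + line
}

console.log(indent('return x', 0)) // "return x"
console.log(indent('return x', 2)) // "    return x"
console.log('ab'.repeat(0) === '') // true -- zero repetitions give an empty string
```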

The regular expression y flag

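The y (sticky) flag tells the search to begin matching at the position given by the regular expression's lastIndex property. If there is no match at exactly that position, the regular expression stops and does not look any further.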

let text = 'hello1 hello2 hello3',
  pattern = /hello\d\s?/,
  result = pattern.exec(text),
  globalPattern = /hello\d\s?/g,
  globalResult = globalPattern.exec(text),
  stickyPattern = /hello\d\s?/y,
  stickyResult = stickyPattern.exec(text);

console.log(result[0])//"hello1 "
console.log(globalResult[0])//"hello1 "
console.log(stickyResult[0])//"hello1 "
pattern.lastIndex = 1
globalPattern.lastIndex = 1
stickyPattern.lastIndex = 1
result = pattern.exec(text)
globalResult = globalPattern.exec(text)
stickyResult = stickyPattern.exec(text)
console.log(result[0])//"hello1 "  -- no flag: lastIndex is ignored
console.log(globalResult[0])//"hello2 "  -- g: searches forward from index 1
console.log(stickyResult[0])//throws a TypeError: stickyResult is null because nothing matches at index 1

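Two points to keep in mind about the sticky flag:
1. lastIndex comes into play when you call methods on the regular expression object itself, such as exec and test.
2. When a sticky regular expression contains ^ and lastIndex is 0, being sticky makes no difference. If lastIndex is not 0 (and the m flag is not set), the expression will never match.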
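The second point can be checked with a short sketch; the pattern and sample text are made up for illustration.

```javascript
const anchored = /^hello/y
const sampleText = 'hello hello'

// lastIndex is 0: behaves exactly like the non-sticky /^hello/.
anchored.lastIndex = 0
const atStart = anchored.test(sampleText)
console.log(atStart) // true

// lastIndex is 6, where the second "hello" begins, but ^ can only match
// at the start of the string, so the sticky match fails.
anchored.lastIndex = 6
const afterStart = anchored.test(sampleText)
console.log(afterStart) // false
console.log(anchored.lastIndex) // 0 -- a failed sticky match resets lastIndex
```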

let pattern=/hello\d/y
console.log(pattern.sticky)//true

Copying regular expressions

In ES5, the only way to copy a regular expression was this:

var re1 = /ab/i,
  re2 = new RegExp(re1);

ES5 did not let you give the copy of re1 different flags. ES6 adds this capability:

var re1 = /ab/i,
  re2 = new RegExp(re1, "g")
console.log(re1)//   /ab/i
console.log(re2)//   /ab/g
console.log(re1.test('ab'))//true
console.log(re2.test('ab'))//true
console.log(re1.test('AB'))//true
console.log(re2.test('AB'))//false

The flags property of regular expressions

let re=/ab/g
console.log(re.source)// ab
console.log(re.flags)// g   -- property added in ES6
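Before the flags property existed, the flags had to be extracted from the regular expression's string form. The sketch below shows one way to do that; getFlags is a hypothetical helper name, and the approach relies on toString returning the pattern followed by the flags after the last slash.

```javascript
// Hypothetical ES5-style helper: reads the flags from the text after the last "/".
function getFlags(re) {
  var text = re.toString()
  return text.substring(text.lastIndexOf('/') + 1)
}

var sampleRe = /ab/gi
console.log(getFlags(sampleRe)) // "gi"
console.log(getFlags(/ab/))     // ""
```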

Template literals

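Before ES6, the only way to write a string across several lines was to put a backslash at the end of a line, right before the line break, so that the string continues on the next line.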

var message='Multiline\
string'

This has a problem, though: the console prints the string on a single line, so the \n newline character has to be inserted by hand.
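A sketch of that manual workaround: the \n produces the actual line break, and the trailing backslash only continues the string literal onto the next source line. The variable names are illustrative.

```javascript
// Backslash continuation alone: no newline ends up in the string.
var joined = 'Multiline\
string'
console.log(joined) // Multilinestring

// With a manual \n before the continuation, the output spans two lines.
var withNewline = 'Multiline\n\
string'
console.log(withNewline === 'Multiline\nstring') // true
```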

In ES6, backticks create multi-line strings. All whitespace inside the backticks is part of the string, so be careful with indentation.
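The effect of indentation can be seen by comparing string lengths. This is a minimal sketch with made-up variable names.

```javascript
// The line break inside the backticks becomes a newline character in the string.
const plain = `Multiline
string`
console.log(plain.length) // 16  -- 9 + 1 (newline) + 6

// Indenting the second line adds those four spaces to the string itself.
const indented = `Multiline
    string`
console.log(indented.length) // 20
```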

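A placeholder can contain any JS expression and can access every variable that is accessible in the scope where the template literal is defined. Trying to embed an undeclared variable always throws an error.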

let count = 10,
  price = 2.5,
  message = `${count} items cost $ ${(count * price).toFixed(2)}.`
  //10 items cost $ 25.00.
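The error case can be sketched as follows. The name missingValue is deliberately never declared, and render is a hypothetical helper used only to trigger the error. Note that a variable that is declared but holds undefined does not throw; it is simply converted to the text "undefined".

```javascript
// missingValue is intentionally undeclared, so evaluating the placeholder throws.
function render() {
  return `${missingValue} items`
}

let errorName = null
try {
  render()
} catch (ex) {
  errorName = ex.name
}
console.log(errorName) // ReferenceError

// A declared variable whose value is undefined is converted to a string instead.
let declared
console.log(`${declared} items`) // undefined items
```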

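Placeholders can also be nested: a template literal can appear inside a placeholder, and what follows the inner ${ must itself be wrapped in backticks.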

let name = 'angela',
  message = `Hello,${`
    my name is ${name}`
  }`
console.log(message)

String.raw: accessing the raw string before escape sequences are processed

let msg1 = `Multiline\nstring`,
  msg2 = String.raw`Multiline\nstring`
console.log(msg1)//Multiline
//string
console.log(msg2)//Multiline\nstring
