Close Reading: Regular Expressions in ES2018

Date: 2019-11-21

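1. Introduction

This week's close reading covers the article regexp-features-regular-expressions.html.

The article introduces several important regular expression features added in ES2018: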

Lookbehind assertions
Named capture groups
s (dotAll) flag: . matches any character
Unicode property escapes

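2. Overview

Are you still reading matches by index? Is [\w\W] the only way you know to match any character? Regular expressions now have simpler ways to write these, and they keep getting easier to use, so it is time to update what we know about them.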

2.1. Lookbehind assertions

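Assertions come in four kinds, the Cartesian product of positive/negative and lookahead/lookbehind. Before ES2018 only lookahead assertions were supported; lookbehind assertions are now supported as well.

Here is what the four kinds mean:

Positive lookahead (?=...) asserts that the text that follows matches the pattern.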

const re = /Item(?= 10)/;

console.log(re.exec("Item"));
// → null

console.log(re.exec("Item5"));
// → null

console.log(re.exec("Item 5"));
// → null

console.log(re.exec("Item 10"));
// → ["Item", index: 0, input: "Item 10", groups: undefined]

Negative lookahead (?!...) asserts that the text that follows does not match the pattern.

const re = /Red(?!head)/;

console.log(re.exec("Redhead"));
// → null

console.log(re.exec("Redberry"));
// → ["Red", index: 0, input: "Redberry", groups: undefined]

console.log(re.exec("Redjay"));
// → ["Red", index: 0, input: "Redjay", groups: undefined]

console.log(re.exec("Red"));
// → ["Red", index: 0, input: "Red", groups: undefined]

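ES2018 adds support for two new kinds of assertion:

Positive lookbehind (?<=...) asserts that the text immediately before matches the pattern.

With lookahead, the matched text comes first and the assertion pattern follows it; with lookbehind, the assertion pattern comes first and the matched text follows it. Lookahead checks what the match is followed by, lookbehind checks what it is preceded by.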

const re = /(?<=€)\d+(\.\d*)?/;

console.log(re.exec("199"));
// → null

console.log(re.exec("$199"));
// → null

console.log(re.exec("€199"));
// → ["199", undefined, index: 1, input: "€199", groups: undefined]

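Negative lookbehind (?<!...) asserts that the text immediately before does not match the pattern.

Note: the example below requires that " meters" is not preceded by three digits.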

const re = /(?<!\d{3}) meters/;

console.log(re.exec("10 meters"));
// → [" meters", index: 2, input: "10 meters", groups: undefined]

console.log(re.exec("100 meters"));
// → null

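The article gives a slightly more complex example that combines a positive lookbehind with a negative lookbehind:

Note: the example below requires that " meters" is preceded by two digits, and that those digits are not 35.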

const re = /(?<=\d{2})(?<!35) meters/;

console.log(re.exec("35 meters"));
// → null

console.log(re.exec("meters"));
// → null

console.log(re.exec("4 meters"));
// → null

console.log(re.exec("14 meters"));
// → [" meters", index: 2, input: "14 meters", groups: undefined]
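
To see the four assertions side by side, they can all be run against one input. This is only a minimal sketch: the sample string and the variable names are made up for illustration and do not come from the original article.

```javascript
// Sample input chosen for illustration only.
const text = "€199 $250 €20";

// Positive lookbehind: digits preceded by "€".
const euroAmounts = text.match(/(?<=€)\d+/g);

// Negative lookbehind: digit runs not preceded by "€" or by another digit.
const otherAmounts = text.match(/(?<![€\d])\d+/g);

// Positive lookahead: a currency symbol followed by three digits.
const bigSymbols = text.match(/[€$](?=\d{3})/g);

// Negative lookahead: a currency symbol NOT followed by three digits.
const smallSymbols = text.match(/[€$](?!\d{3})/g);

console.log(euroAmounts); // → ["199", "20"]
console.log(otherAmounts); // → ["250"]
console.log(bigSymbols); // → ["€", "$"]
console.log(smallSymbols); // → ["€"]
```

In every case the asserted text is checked but not consumed, which is why only the digits (or only the symbols) appear in the results.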

2.2. Named Capture Groups

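Named capture groups let you name what a regular expression captures, which is more readable than numeric indexes.

The syntax is (?<name>...):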

const re = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;
const match = re.exec("2020-03-04");
// Named groups are exposed on match.groups, so no index is needed
const { year, month, day } = match.groups;

console.log(match[0]); // → 2020-03-04
console.log(year); // → 2020
console.log(month); // → 03
console.log(day); // → 04
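
Named groups can also be referenced when replacing. The sketch below assumes the same date pattern and reorders the parts, once with the `$<name>` replacement syntax and once with a replacer function, which receives the groups object as its last argument. The variable names are illustrative.

```javascript
const datePattern = /(?<year>\d{4})-(?<month>\d{2})-(?<day>\d{2})/;

// $<name> in the replacement string refers to a named group.
const viaString = "2020-03-04".replace(datePattern, "$<day>/$<month>/$<year>");

// With a replacer function, the last argument is the groups object.
const viaFunction = "2020-03-04".replace(datePattern, (...args) => {
  const { year, month, day } = args[args.length - 1];
  return `${day}/${month}/${year}`;
});

console.log(viaString); // → 04/03/2020
console.log(viaFunction); // → 04/03/2020
```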

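Inside a regular expression you can also refer back to an earlier capture group with a numeric backreference such as \1, for example:

To explain: \1 stands for the text that (\w\w) matched, not for the pattern (\w\w) itself. So once (\w\w) has matched 'ab', \1 matches only 'ab'.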

console.log(/(\w\w)\1/.test("abab")); // → true

// if the last two letters are not the same
// as the first two, the match will fail
console.log(/(\w\w)\1/.test("abcd")); // → false

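A named capture group can be referenced with the \k<name> syntax instead of a numeric backreference like \1:

Numeric and named references can be used together.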

const re = /\b(?<dup>\w+)\s+\k<dup>\b/;

const match = re.exec("I'm not lazy, I'm on on energy saving mode");

console.log(match.index); // → 18
console.log(match[0]); // → on on
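
As a small illustration of mixing the two styles, the sketch below uses a made-up pattern for four-letter palindromes: the outer letter is referenced by name and the inner letter by number. A named group still gets a number, so it is reachable both ways.

```javascript
// Hypothetical pattern for illustration: matches words like "abba".
const palindrome = /\b(?<outer>\w)(\w)\2\k<outer>\b/;

const found = palindrome.exec("an abba song");

console.log(found[0]); // → abba
console.log(found.index); // → 3
console.log(found.groups.outer); // → a
console.log(found[1]); // → a (the named group is also group 1)
console.log(found[2]); // → b
```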

2.3. s (dotAll) Flag

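Although . in a regular expression matches almost any character, it cannot match line terminators. Clever developers worked around this with [\w\W].

Still, this was a design flaw. ES2018 adds the /s mode, in which . is equivalent to [\w\W]: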

console.log(/./s.test("\n")); // → true
console.log(/./s.test("\r")); // → true
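
The difference is easiest to see on a pattern that has to cross a line break. The following sketch compares the same pattern with and without the s flag; the input string is made up for illustration.

```javascript
// Illustrative input containing a line break.
const multiline = "first\nsecond";

const withoutS = /first.second/.test(multiline);
const withS = /first.second/s.test(multiline);

// The flag is also visible as a property on the regex instance.
const flagVisible = /./s.dotAll;

console.log(withoutS); // → false
console.log(withS); // → true
console.log(flagVisible); // → true
```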

2.4. Unicode Property Escapes

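Regular expressions gained a more powerful way to match Unicode. In /u mode, \p{Number} matches all numeric characters:

The u flag lets a regular expression recognise Unicode characters above 0xFFFF.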

const regex = /^\p{Number}+$/u;
regex.test("²³¹¼½¾"); // true
regex.test("㉛㉜㉝"); // true
regex.test("ⅠⅡⅢⅣⅤⅥⅦⅧⅨⅩⅪⅫ"); // true

\p{Alphabetic} matches every character with the Alphabetic property, including Chinese characters, Latin letters and so on:

const str = "漢";

console.log(/\p{Alphabetic}/u.test(str)); // → true

// the \w shorthand cannot match 漢
console.log(/\w/u.test(str)); // → false

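Finally there is a simple way to match Chinese characters, although \p{Alphabetic} covers far more than Chinese characters alone.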
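
If the goal is to match Chinese characters specifically, the Script property is a tighter fit than Alphabetic. The sketch below assumes \p{Script=Han} is what you want (it also covers Han characters used in Japanese and Korean text); the sample strings are illustrative.

```javascript
// Matches strings made up only of Han-script characters.
const hanOnly = /^\p{Script=Han}+$/u;

const allHan = hanOnly.test("漢字");
const latin = hanOnly.test("abc");

// Pull the Han runs out of mixed text.
const runs = "ES2018 正則 regex".match(/\p{Script=Han}+/gu);

console.log(allHan); // → true
console.log(latin); // → false
console.log(runs); // → ["正則"]
```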

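2.5. Compatibility table

See the original article for the compatibility table. Overall, at the time of writing only Chrome and Safari support these features, while Firefox and Edge do not, so large projects will have to wait a few more years before using them.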

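3. Close reading

The four features listed in the article were added to regular expressions in ES2018. But as the compatibility table shows, they are mostly not usable yet, so let us instead review the regular expression improvements in ES6 and look for where they connect with the ES2018 changes.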

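3.1. RegExp constructor improvement

When the first argument of the RegExp constructor is a regular expression, a second argument specifying the flags is now allowed (ES5 throws an error):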

new RegExp(/book(?=s)/giu, "iu");

A minor improvement, since the constructor is rarely used this way.
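
For what it is worth, the second argument replaces the flags of the original regex rather than adding to them. A minimal sketch, with illustrative variable names:

```javascript
const original = /book(?=s)/giu;

// The flags in the second argument replace the original flags.
const copy = new RegExp(original, "iu");

console.log(copy.source); // → book(?=s)
console.log(copy.flags); // → iu
console.log(copy.global); // → false
console.log(original.flags); // → giu (the original is left unchanged)
```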

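3.2. String methods that take regular expressions

The string methods match(), replace(), search() and split() now delegate internally to methods on the RegExp instance. For example,

String.prototype.match calls RegExp.prototype[Symbol.match].

In other words, matching logically belongs to the regex instance, yet strings can still call it directly for convenience. At execution time the call is forwarded to the regex instance, which makes the logic more uniform.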

For example:

"abc".match(/abc/g);
// internally, this is equivalent to
/abc/g[Symbol.match]("abc");
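
One consequence of this delegation is that any object with a Symbol.match method can be passed to match(). The sketch below uses a made-up matcher object to show the call being forwarded; it is only an illustration and not something from the original article.

```javascript
// Hypothetical matcher object: not a RegExp, but it implements Symbol.match.
const containsAbc = {
  [Symbol.match](input) {
    return input.includes("abc") ? ["abc (custom)"] : null;
  },
};

// String.prototype.match forwards to the object's Symbol.match method.
const viaString = "xxabcxx".match(containsAbc);

// Calling the regex method directly gives the same result as "abcabc".match(/abc/g).
const viaRegex = /abc/g[Symbol.match]("abcabc");

console.log(viaString); // → ["abc (custom)"]
console.log(viaRegex); // → ["abc", "abc"]
```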

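3.3. The u flag

The Unicode property escapes from the overview are an enhancement to the u flag, and the u flag itself was added in ES6.

The u flag means "Unicode mode" and is used to correctly handle Unicode characters above \uFFFF.

The u flag also changes the following regular expression behaviours: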

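The dot normally matches a single UTF-16 code unit; in u mode it matches a character above 0xFFFF as one character.
\u{61} changes meaning from "the letter u repeated 61 times" to "the character with code point U+0061", which is the letter a.
Quantifiers apply correctly to characters that take more than one code unit.
\S correctly matches characters above 0xFFFF.
In u mode combined with the i flag, [a-z] also matches characters that have a different code point but a very similar glyph, such as \u212A, the other K (the Kelvin sign).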
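
Each of the behaviours above can be checked by running the same pattern with and without the u flag. The following is a minimal sketch; the test character 𠮷 (U+20BB7) is simply one convenient character above 0xFFFF, and the property names are illustrative.

```javascript
const astral = "𠮷"; // U+20BB7, stored as two UTF-16 code units

const checks = {
  // The dot sees two code units without u, one character with u.
  dotWithoutU: /^.$/.test(astral),
  dotWithU: /^.$/u.test(astral),
  // \u{61} means "u" repeated 61 times without u, the letter "a" with u.
  braceWithoutU: /\u{61}/.test("a"),
  braceWithU: /\u{61}/u.test("a"),
  // The quantifier applies to the last code unit without u.
  quantWithoutU: /𠮷{2}/.test("𠮷𠮷"),
  quantWithU: /𠮷{2}/u.test("𠮷𠮷"),
  // \S matches one code unit without u, the whole character with u.
  nonSpaceWithoutU: /^\S$/.test(astral),
  nonSpaceWithU: /^\S$/u.test(astral),
  // Case-insensitive matching of the Kelvin sign needs both i and u.
  kelvinWithoutU: /[a-z]/i.test("\u212A"),
  kelvinWithU: /[a-z]/iu.test("\u212A"),
};

console.log(checks);
// Every "...WithoutU" entry is false and every "...WithU" entry is true.
```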

Basically, in u mode every Unicode character is interpreted correctly, and ES2018 adds new u-mode character sets for matching common categories of characters, such as \p{Number} to match ¼.

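3.4. The y flag

The y flag is the "sticky" flag.

y is similar to g in that both match globally, meaning each match continues from where the previous successful match ended. The difference is that with y the next match must start exactly at that position, whereas g may skip ahead to find one.

For example: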

/a+/g.exec("aaa_aa_a"); // ["aaa"]
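
The difference only shows up on the second call, so here is a sketch that runs both flags twice on the same input. Variable names are illustrative.

```javascript
const input = "aaa_aa_a";
const globalRe = /a+/g;
const stickyRe = /a+/y;

// First call: both match "aaa" at index 0 and set lastIndex to 3.
const g1 = globalRe.exec(input)[0];
const y1 = stickyRe.exec(input)[0];

// Second call: g skips over "_" and finds "aa"; y must match at index 3,
// where the character is "_", so it fails.
const g2 = globalRe.exec(input)[0];
const y2 = stickyRe.exec(input);

console.log(g1, g2); // → aaa aa
console.log(y1, y2); // → aaa null
console.log(stickyRe.lastIndex); // → 0 (reset after the failed match)
```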

3.5. flags

The flags property returns the flags of a regular expression:

const regex = /[a-z]*/gu;

regex.flags; // 'gu'
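
One place where flags is handy is building a variant of an existing regex. The helper below is hypothetical (it is not part of any library) and simply combines source and flags to add a flag if it is missing.

```javascript
// Hypothetical helper: returns a regex with the extra flag added.
function withFlag(re, flag) {
  return re.flags.includes(flag) ? re : new RegExp(re.source, re.flags + flag);
}

const base = /[a-z]*/gu;
const stickyCopy = withFlag(base, "y");

console.log(base.flags); // → gu
console.log(stickyCopy.flags); // → guy
console.log(withFlag(base, "g") === base); // → true (flag already present)
```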

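4. Summary

Using the article regexp-features-regular-expressions as a starting point, this week's close reading walked through the regular expression features added in ES2018, and then followed the thread back to the regular expression enhancements in ES6.

If this branching-out style of learning suits you, you may want to go one step further and review all the features introduced in ES6. I strongly recommend Ruan Yifeng's book ECMAScript 6 入門.

The features introduced in ES2018 are still very new, but you should be as fluent with ES6 features as you are with ES3.

If people around you are still surprised by ES6 features, please share this article with them, so that they do not degrade into "a JS beginner with nothing left but project experience".

Discussion: 精讀《正則 ES2018》 · Issue #127 · dt-fe/weekly

If you would like to take part, join the discussion in the issue above. There is a new topic every week, published on the weekend or on Monday. 前端精讀: helping you filter for reliable content.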
