Regular Expressions (RegExp) in JavaScript

Date: 2019-11-12


1. The RegExp object

1.1 Creating RegExp instances

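Regular expression flags:
   g: global mode; the pattern is applied to the whole string instead of stopping at the first match;
   i: case-insensitive mode; letter case is ignored when matching;
   m: multiline mode; ^ and $ also match at the start and end of each line.
Every metacharacter must be escaped with a backslash when it is meant literally. The metacharacters are: ( ) [ ] { } \ ^ $ | ? * + .

A regular expression can be created in two ways:

var expression1 = /pattern/flags;
var expression2 = new RegExp("pattern", "flags");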


var exp1 = /[bc]at/i;
var exp2 = new RegExp("[bc]at","i");   // equivalent to exp1
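
The two forms differ in how escapes are written. The following is a small sketch (the names literal and built are made up for this example): in a string passed to the constructor each backslash has to be doubled, because the string literal consumes one level of escaping before the pattern is compiled.

```javascript
// Match the literal text "[bc]at", so the brackets must be escaped.
var literal = /\[bc\]at/i;
var built = new RegExp("\\[bc\\]at", "i"); // "\\[" in the string becomes \[ in the pattern

var sameSource = literal.source === built.source;
var matchesLiteralText = literal.test("[BC]at");
var matchesCat = built.test("cat");

console.log(sameSource);          // true
console.log(matchesLiteralText);  // true
console.log(matchesCat);          // false
```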

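1.2 RegExp instance properties

Every RegExp instance has the following properties:

global: Boolean, indicates whether the g flag is set;

ignoreCase: Boolean, indicates whether the i flag is set;

lastIndex: integer, the character position at which the search for the next match starts, counted from 0;

multiline: Boolean, indicates whether the m flag is set;

source: the string representation of the regular expression, returned in literal form rather than as the string pattern passed to the constructor, e.g. "[bc]at".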
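
A quick way to see these properties is to read them off an instance. This is only an illustrative sketch; the variable re is an assumption of the example.

```javascript
var re = /[bc]at/gi;

console.log(re.global);     // true
console.log(re.ignoreCase); // true
console.log(re.multiline);  // false
console.log(re.source);     // "[bc]at"
console.log(re.lastIndex);  // 0

re.exec("cat and bat");     // finds "cat" at index 0
console.log(re.lastIndex);  // 3, the next search starts right after "cat"
```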

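1.3 Regular expression syntax reference

Character	Description
\	Marks the next character as a special character, a literal character, a backreference, or an octal escape. For example, 'n' matches the character "n", while '\n' matches a newline. The sequence '\\' matches "\" and '\(' matches "(".
^	Matches the position at the start of the input string. If the multiline flag (m) is set, ^ also matches the position right after a '\n' or '\r'.
$	Matches the position at the end of the input string. If the multiline flag (m) is set, $ also matches the position right before a '\n' or '\r'.
*	Matches the preceding subexpression zero or more times. For example, 'zo*' matches "z" as well as "zoo". * is equivalent to {0,}.
+	Matches the preceding subexpression one or more times. For example, 'zo+' matches "zo" and "zoo", but not "z". + is equivalent to {1,}.
?	Matches the preceding subexpression zero or one time. For example, 'do(es)?' matches "do" or "does". ? is equivalent to {0,1}.
{n}	n is a non-negative integer. Matches exactly n times. For example, 'o{2}' does not match the 'o' in "Bob", but matches the two o's in "food".
{n,}	n is a non-negative integer. Matches at least n times. For example, 'o{2,}' does not match the 'o' in "Bob", but matches all the o's in "foooood". 'o{1,}' is equivalent to 'o+', and 'o{0,}' is equivalent to 'o*'.
{n,m}	m and n are non-negative integers with n <= m. Matches at least n and at most m times. For example, 'o{1,3}' matches the first three o's in "fooooood". 'o{0,1}' is equivalent to 'o?'. Note that there must be no space between the comma and the numbers.
?	When this character immediately follows any other quantifier (*, +, ?, {n}, {n,}, {n,m}), the match is non-greedy. A non-greedy pattern matches as little of the searched string as possible, while the default greedy pattern matches as much as possible. For example, on the string "oooo", 'o+?' matches a single "o", while 'o+' matches all the o's.
.	Matches any single character except a line terminator such as "\n". To match any character including '\n', use a pattern such as '[\s\S]'.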
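(pattern)	Matches pattern and captures the match. In JavaScript the captured text is available as items 1, 2, ... of the array returned by exec() or match(), and through the legacy RegExp.$1 to RegExp.$9 properties. To match literal parentheses, use '\(' or '\)'.
(?:pattern)	Matches pattern without capturing the result, i.e. a non-capturing group that is not stored for later use. This is useful when combining parts of a pattern with the "or" character (|). For example, 'industr(?:y|ies)' is a more compact expression than 'industry|industries'.
(?=pattern)	Positive lookahead: matches the search string at any position where a string matching pattern begins. This is a non-capturing match, i.e. the match is not stored for later use. For example, 'Windows (?=95|98|NT|2000)' matches the "Windows" in "Windows 2000" but not the "Windows" in "Windows 3.1". Lookahead does not consume characters: after a match, the search for the next match starts right after the last match, not after the characters that made up the lookahead.
(?!pattern)	Negative lookahead: matches the search string at any position where a string not matching pattern begins. This is a non-capturing match, i.e. the match is not stored for later use. For example, 'Windows (?!95|98|NT|2000)' matches the "Windows" in "Windows 3.1" but not the "Windows" in "Windows 2000". Lookahead does not consume characters: after a match, the search for the next match starts right after the last match, not after the characters that made up the lookahead.
x|y	Matches x or y. For example, 'z|food' matches "z" or "food". '(z|f)ood' matches "zood" or "food".
[xyz]	Character set. Matches any one of the enclosed characters. For example, '[abc]' matches the 'a' in "plain".
[^xyz]	Negated character set. Matches any character not enclosed. For example, '[^abc]' matches the 'p' in "plain".
[a-z]	Character range. Matches any character in the given range. For example, '[a-z]' matches any lowercase letter from 'a' to 'z'.
[^a-z]	Negated character range. Matches any character not in the given range. For example, '[^a-z]' matches any character outside the range 'a' to 'z'.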
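\b	Matches a word boundary, i.e. the position between a word character and a non-word character such as a space, or the start or end of the string. For example, 'er\b' matches the 'er' in "never" but not the 'er' in "verb".
\B	Matches a non-word boundary. 'er\B' matches the 'er' in "verb" but not the 'er' in "never".
\cx	Matches the control character indicated by x. For example, \cM matches Control-M, i.e. a carriage return. x must be one of A-Z or a-z; otherwise c is treated as a literal 'c' character.
\d	Matches a digit character. Equivalent to [0-9].
\D	Matches a non-digit character. Equivalent to [^0-9].
\f	Matches a form feed. Equivalent to \x0c and \cL.
\n	Matches a newline. Equivalent to \x0a and \cJ.
\r	Matches a carriage return. Equivalent to \x0d and \cM.
\s	Matches any whitespace character, including space, tab, form feed and so on. Roughly equivalent to [ \f\n\r\t\v]; in JavaScript it also covers other Unicode whitespace characters.
\S	Matches any non-whitespace character. Roughly equivalent to [^ \f\n\r\t\v].
\t	Matches a tab. Equivalent to \x09 and \cI.
\v	Matches a vertical tab. Equivalent to \x0b and \cK.
\w	Matches any word character, including the underscore. Equivalent to '[A-Za-z0-9_]'.
\W	Matches any non-word character. Equivalent to '[^A-Za-z0-9_]'.
\xn	Matches n, where n is a hexadecimal escape value. The hexadecimal escape must be exactly two digits long. For example, '\x41' matches "A". '\x041' is equivalent to '\x04' followed by "1". This allows ASCII codes to be used in regular expressions.
\num	Matches num, where num is a positive integer: a reference back to a captured match. For example, '(.)\1' matches two consecutive identical characters.
\n	Identifies either an octal escape or a backreference. If \n is preceded by at least n captured subexpressions, n is a backreference. Otherwise, if n is an octal digit (0-7), n is an octal escape.
\nm	Identifies either an octal escape or a backreference. If \nm is preceded by at least nm captured subexpressions, nm is a backreference. If \nm is preceded by at least n captures, n is a backreference followed by the literal m. If neither condition holds and n and m are both octal digits (0-7), \nm matches the octal escape value nm.
\nml	If n is an octal digit (0-3) and m and l are both octal digits (0-7), matches the octal escape value nml.
\un	Matches n, where n is a Unicode character written as four hexadecimal digits. For example, \u00A9 matches the copyright symbol (©).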
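
A few of the table entries are easier to follow when run. The snippet below is a sketch that reuses the table's own sample strings where possible; all variable names are made up for the example.

```javascript
// Greedy vs. non-greedy quantifiers on the same input.
var greedy = "oooo".match(/o+/)[0];   // "oooo"
var lazy = "oooo".match(/o+?/)[0];    // "o"

// Positive lookahead: the version number is checked but not consumed.
var lookahead = /Windows (?=95|98|NT|2000)/;
var hit = lookahead.exec("Windows 2000")[0];  // "Windows " (the version is not part of the match)
var miss = lookahead.test("Windows 3.1");     // false

// Backreference: \1 must repeat whatever group 1 captured.
var doubled = /(.)\1/.exec("hello")[0];       // "ll"

// "." does not match a newline, the class [\s\S] does.
var dotNewline = /a.b/.test("a\nb");          // false
var anyNewline = /a[\s\S]b/.test("a\nb");     // true

console.log(greedy, lazy, JSON.stringify(hit), miss, doubled, dotNewline, anyNewline);
```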

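1.4 RegExp instance methods

1) exec()

This method is designed for capture groups. exec() takes one argument, the string to apply the pattern to, and returns an array with information about the first match, or null when there is no match. The returned array is an Array instance but carries two extra properties, index and input: index is the position of the match in the string, and input is the string the expression was applied to. The first array item is the text matched by the whole pattern, and the remaining items are the texts matched by the capture groups (if the pattern has no capture groups, the array has only one item).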

    var text = "mom and dad and baby";
    var pattern = /mom( and dad( and baby)?)?/gi;
    
    var matches = pattern.exec(text);
    console.log(matches);// 0: "mom and dad and baby", 1: " and dad and baby", 2: " and baby", groups: undefined, index: 0, input: "mom and dad and baby", length: 3
    console.log(pattern.lastIndex); // 20

    matches = pattern.exec(text);
    console.log(matches);    //null
    console.log(pattern.lastIndex); // 0

    text = "vat ,bat, sat, fat";
    var pattern1 = /.at/;    // no g flag: every call starts from index 0
    
    var matches = pattern1.exec(text);
    console.log(matches.index);  //0
    console.log(matches);  //0: "vat", groups: undefined, index: 0, input: "vat ,bat, sat, fat", length: 1
    console.log(pattern1.lastIndex); //0
    
    matches = pattern1.exec(text);
    console.log(matches.index);  //0
    console.log(matches);  //0: "vat", groups: undefined, index: 0, input: "vat ,bat, sat, fat", length: 1
    console.log(pattern1.lastIndex); //0
    
    var pattern2 = /.at/g;   // g flag: each call resumes from lastIndex
    
    matches = pattern2.exec(text);
    console.log(matches.index);  //0
    console.log(matches);  //0: "vat", groups: undefined, index: 0, input: "vat ,bat, sat, fat", length: 1
    console.log(pattern2.lastIndex);  //3
    
    matches = pattern2.exec(text);
    console.log(matches.index);  //5
    console.log(matches);  //0: "bat", groups: undefined, index: 5, input: "vat ,bat, sat, fat", length: 1
    console.log(pattern2.lastIndex);  //8
    
    matches = pattern2.exec(text);
    console.log(matches.index);  //10
    console.log(matches);  //0: "sat", groups: undefined, index: 10, input: "vat ,bat, sat, fat", length: 1
    console.log(pattern2.lastIndex);  //13

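Even when the pattern has the global flag (g), exec() returns only one match per call. Without the global flag, repeated exec() calls on the same string always return information about the first match. With the global flag, each call to exec() continues searching the string for the next match, starting from lastIndex.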
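
Because a global pattern resumes from lastIndex, calling exec() in a loop until it returns null visits every match along with its capture groups. A minimal sketch (the capture group and the found array are additions for illustration):

```javascript
var text = "vat ,bat, sat, fat";
var pattern = /(.)at/g;   // the g flag is required, otherwise this loop never ends
var found = [];
var m;

while ((m = pattern.exec(text)) !== null) {
    // m[0] is the whole match, m[1] the first capture group
    found.push(m[0] + "@" + m.index + " group=" + m[1]);
}

console.log(found.join("; "));
// vat@0 group=v; bat@5 group=b; sat@10 group=s; fat@15 group=f
console.log(pattern.lastIndex); // 0, reset after the failed final call
```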

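2) test()

This method takes a string argument and returns true if the pattern matches it, false otherwise. It is handy when you only need to know whether the target string matches a pattern, not what the matched text is.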

var text = "000-00-000";
var pattern = /\d{3}-\d{2}-\d{3}/;
if(pattern.test(text)){    //true
    console.log("the pattern was matched!"); // this statement runs
}
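
One thing worth keeping in mind: like exec(), test() honours lastIndex when the pattern has the g flag, so repeated calls on the same string can alternate between true and false. The sketch below illustrates this; the variable names are made up for the example.

```javascript
var stateful = /\d{3}/g;
var first = stateful.test("123");   // true, lastIndex becomes 3
var second = stateful.test("123");  // false, the search started at index 3; lastIndex resets to 0
var third = stateful.test("123");   // true again

var stateless = /\d{3}/;            // without g, lastIndex stays 0
var a = stateless.test("123");      // true
var b = stateless.test("123");      // true

console.log(first, second, third, a, b); // true false true true true
```

For plain yes/no checks, a pattern without the g flag avoids this behaviour.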

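1.5 Static properties of the RegExp object

Static properties belong to the built-in RegExp constructor itself. They are read directly from RegExp, without creating an instance.
Access format: RegExp.attribute

All of the properties below are described using this snippet as the running example:

var desc = 'Hello,everyone.My name is gtshen';

var reg = /na(.?)/g;

reg.test(desc);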

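- input
  Purpose: returns the string that the most recent match was run against.
  Example: console.log('input:'+RegExp.input) // -> 'Hello,everyone.My name is gtshen'
  Short name: RegExp.$_
  Note: not supported in older versions of Opera.

- lastMatch
  Purpose: the text of the most recent match.
  Example: console.log('lastMatch:'+RegExp.lastMatch) // -> 'nam'
  Short name: RegExp['$&']
  Note: not supported in older versions of Opera.

- lastParen
  Purpose: the last capture group of the most recent match.
  Example: console.log('lastParen:'+RegExp.lastParen) // -> 'm'
  Short name: RegExp['$+']
  Note: not supported in older versions of Opera.

- leftContext
  Purpose: the part of the input string that precedes the most recent match.
  Example: console.log('leftContext:'+RegExp.leftContext) // -> 'Hello,everyone.My '
  Short name: RegExp['$`']

- rightContext
  Purpose: the part of the input string that follows the most recent match.
  Example: console.log('rightContext:'+RegExp.rightContext) // -> 'e is gtshen'
  Short name: RegExp['$\'']

- multiline
  Purpose: whether multiline mode is on. The value is a Boolean: true if on, false if off.
  Example: console.log('multiline:'+RegExp.multiline);
  Short name: RegExp['$*']
  Note: not supported in IE; the property is non-standard and may be undefined in other engines as well.

- $1 - $9
  Purpose: return the values of capture groups 1 through 9.
  Example: console.log('$1:'+ RegExp.$1) // -> 'm'

* Note: these properties are read from the RegExp constructor itself, and their values reflect the most recent successful match performed anywhere in the running program.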
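
Since the static properties are shared, program-wide state, the same values can also be derived from the array that exec() returns. The following sketch reuses the sample string from this section; the variable names on the left simply mirror the static property names and are otherwise arbitrary.

```javascript
var desc = 'Hello,everyone.My name is gtshen';
var reg = /na(.?)/g;
var m = reg.exec(desc);

var input = m.input;                                   // the string that was searched
var lastMatch = m[0];                                  // "nam"
var firstGroup = m[1];                                 // "m"  (what RegExp.$1 would hold)
var leftContext = desc.slice(0, m.index);              // "Hello,everyone.My "
var rightContext = desc.slice(m.index + m[0].length);  // "e is gtshen"

console.log(lastMatch, firstGroup);
console.log(JSON.stringify(leftContext), JSON.stringify(rightContext));
```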

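2. Regular expressions in String methods

1) match()

match() takes a regular expression as its argument. When the expression does not have the global flag g, the result is the same as that of RegExp's exec(); when it has the g flag, match() returns a plain array containing all matches.

Format: str.match(pattern)
Purpose:
  match is functionally very similar to the exec method of regex objects.
  match matches the string str against the rule pattern. On success it returns an array holding information about the matched text; if nothing matches it returns null.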

var text = "mom and dad and baby";
var pattern = /mom( and dad( and baby)?)?/i;

var matches = text.match(pattern);
console.log(matches);// 0: "mom and dad and baby", 1: " and dad and baby", 2: " and baby", groups: undefined, index: 0, input: "mom and dad and baby", length: 3
console.log(pattern.lastIndex); // 0 (without the g flag, lastIndex is not updated)

var pattern = /mom( and dad( and baby)?)?/gi;
matches = text.match(pattern);
console.log(matches);// ["mom and dad and baby"]
console.log(pattern.lastIndex); // 0

text = "vat ,bat, sat, fat";
pattern = /.at/;

// same as pattern.exec(text)
matches = text.match(pattern);
console.log(matches);  //0: "vat", groups: undefined, index: 0, input: "vat ,bat, sat, fat", length: 1
console.log(pattern.lastIndex); //0


// when the pattern has the global g flag, match returns a plain array containing all matches
pattern = /.at/g;
matches = text.match(pattern);
console.log(matches);  //["vat", "bat", "sat", "fat"]
console.log(pattern.lastIndex); //0
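
Because match() returns null rather than an empty array when nothing matches, code that uses the result usually guards against null. A small sketch, where countMatches is a hypothetical helper written for this example and is meant to be called with a g-flagged pattern:

```javascript
// Hypothetical helper: counts the matches of a global pattern in text.
function countMatches(text, pattern) {
    var all = text.match(pattern);   // array of matched strings, or null
    return all === null ? 0 : all.length;
}

var sample = "vat ,bat, sat, fat";
console.log(countMatches(sample, /.at/g)); // 4
console.log(countMatches(sample, /dog/g)); // 0
```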

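2) search()

This method takes a regular expression or a string as its argument and searches from the start of the string.

Format: str.search(pattern)

Purpose: searches the string according to the rule pattern. If a match is found it returns the index of the first character of that match in the original string, otherwise -1. It resembles indexOf, except that indexOf does not support regular expressions.

The method ignores the global flag g and lastIndex, so it always starts from the beginning of the string and stops at the first match; calling it repeatedly returns the same result.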

var text = "vat ,bat, sat, fat";
var pattern = /at/;

var pos = text.search(pattern);
console.log(pos);  // 1

console.log( text.search("t") ); // 2
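
When a string is passed to search(), it is converted to a regular expression first, so metacharacters in it keep their special meaning. The sketch below contrasts this with indexOf; the sample string is made up for the example.

```javascript
var s = "a.b";

var viaSearchString = s.search(".");   // 0: "." is treated as a regex and matches any character
var viaIndexOf = s.indexOf(".");       // 1: indexOf looks for the literal "."
var viaEscaped = s.search(/\./);       // 1: an escaped dot matches only the literal "."
var notFound = s.search(/x/);          // -1: no match

console.log(viaSearchString, viaIndexOf, viaEscaped, notFound); // 0 1 1 -1
```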

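3) replace()

Format: str.replace(pattern, given)

Purpose: replaces str, or parts of it, with the content given, according to the rule pattern. pattern can be a substring of str or a regular expression.

pattern can be a string or a regular expression; given, the replacement content, can be a string or a callback function.

replace() returns a modified copy of the original string; the original string itself is not changed.

If pattern is a string, only the first matching substring is replaced. To replace every matching substring, pattern has to be a regular expression with the global flag g.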

var text = "cat ,bat, sat, fat";
var pattern = "at";

var result = text.replace(pattern,"ond");
console.log(result);  // "cond ,bat, sat, fat"

pattern = /at/g;
result = text.replace(pattern,"ond"); 
console.log(result); //"cond ,bond, sond, fond"

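var str = 'The farthest distance in the world is not the distance between life and death, but standing in front of you while you do not know that I love you.',
 st = ['death','love','distance'],   // words to mask
 pattern = '',
 alt = '',
 str1 = str;   // start from the original so that the replacements accumulate

for(var i=0;i<st.length;i++){

    pattern = new RegExp(st[i],'g');
    // build a mask of '*' characters as long as the word
    for(var j=0;j<st[i].length;j++){
        alt+='*';
    }

    str1 = str1.replace(pattern,alt);
    alt = '';
}

console.log(str); // unchanged: The farthest distance in the world is not the distance between life and death, but standing in front of you while you do not know that I love you.
console.log(str1); // The farthest ******** in the world is not the ******** between life and *****, but standing in front of you while you do not know that I **** you.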

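When given is a callback function

With no capture groups in the pattern,

its format is: function(match, index, input){}.

With one or more capture groups,

its format is: function(match, $1...$9, index, input){}.

match: the text of the current match.
$1 - $9: when groups are present, the contents of capture groups 1 to 9 for the current match.
index: the index, in the original string, of the first character of the current match.
input: the original string being matched.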

function htmlEscape(text){
    return text.replace(/[<>"&]/g,function(match,index,input){
        switch(match){
            case "<": return "&lt;";
            case ">": return "&gt;";
            case "&": return "&amp;";
            case "\"": return "&quot;";
        }
    });
}
console.log(htmlEscape("<p class=\"greeting\">Hello world!</p>"));// &lt;p class=&quot;greeting&quot;&gt;Hello world!&lt;/p&gt;

var str = '2016/10/29';
var pattern = /(\d+)(\/)/g;
var data = str.replace(pattern,function(result,$1,$2){
    return $1+'.';
});
console.log(data); //2016.10.29
console.log(str.replace(pattern,"$1.")); //2016.10.29
console.log(str.replace(/\//g,".")); //2016.10.29
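
To make the callback's argument order concrete, here is a sketch with two capture groups. The parameter names letter and digits and the seen array are choices made for this example; only their positions matter.

```javascript
var seen = [];
var out = "a1b22".replace(/([a-z])(\d+)/g, function (match, letter, digits, index, input) {
    // match: whole match, letter/digits: groups 1 and 2, index: position in input
    seen.push(match + "@" + index);
    return letter.toUpperCase() + "(" + digits + ")";
});

console.log(out);             // A(1)B(22)
console.log(seen.join(", ")); // a1@0, b22@2

// A replacement string can reference groups as $1, $2, ... as well.
var swapped = "2016/10/29".replace(/(\d+)\/(\d+)\/(\d+)/, "$3.$2.$1");
console.log(swapped);         // 29.10.2016
```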

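4) split()

Format: str.split(pattern, length)
Purpose:
  Splits the string into an array according to the rule pattern. The resulting array does not include the text that served as the separator (unless the pattern contains capture groups).
  When pattern is the empty string '', the string is split into its individual characters; when pattern is omitted, the result is an array containing the whole string.
  The pattern argument can be a regular expression, or a plain character or string.
  The length argument sets the maximum length of the resulting array (the number of elements). If omitted, the whole string is split.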

var str = 'hellow world!';
console.log(str.split('')); //  ["h", "e", "l", "l", "o", "w", " ", "w", "o", "r", "l", "d", "!"]
console.log(str.split('',5)); // ["h", "e", "l", "l", "o"]
console.log(str.split(/o/g)); //  ["hell", "w w", "rld!"]
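
A regular expression separator is most useful when the delimiters vary. The following sketch uses made-up input strings: it splits on commas or semicolons with optional surrounding whitespace, and shows that captured separators are kept in the result.

```javascript
var fields = "a, b ;c".split(/\s*[,;]\s*/);      // separators are dropped
var limited = "a, b ;c".split(/\s*[,;]\s*/, 2);  // at most 2 elements
var withSeps = "a1b2c".split(/(\d)/);            // captured separators are included

console.log(JSON.stringify(fields));   // ["a","b","c"]
console.log(JSON.stringify(limited));  // ["a","b"]
console.log(JSON.stringify(withSeps)); // ["a","1","b","2","c"]
```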