Java Regular Expressions

Date: 2019-11-17

Tags: java, regular expressions. Category: Java

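1. Java regular expressions: java.util.regex

Matcher (the matcher class): the engine object that actually performs match operations on an input sequence.

Pattern (the pattern class): a compiled representation of the regular expression to search for.

Usage 1: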

  Pattern p = Pattern.compile("a*b");
  Matcher m = p.matcher("aaaaab");
  boolean b = m.matches();

Usage 2:

 boolean b = Pattern.matches("a*b", "aaaaab");
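
The two usages give the same answer. The difference is that Pattern.matches compiles the expression on every call, so a pattern that is used repeatedly is usually compiled once and kept. The sketch below illustrates this; the class name UsageDemo and its helper methods are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UsageDemo {
    // Compiled once and reused; Pattern instances are immutable and thread-safe.
    private static final Pattern A_STAR_B = Pattern.compile("a*b");

    // Usage 1: explicit Pattern and Matcher.
    static boolean viaMatcher(String input) {
        Matcher m = A_STAR_B.matcher(input);
        return m.matches();
    }

    // Usage 2: one-shot convenience method, which recompiles the pattern on each call.
    static boolean viaStaticMatches(String input) {
        return Pattern.matches("a*b", input);
    }

    public static void main(String[] args) {
        String[] inputs = {"aaaaab", "b", "ba"};
        for (String s : inputs) {
            System.out.println(s + ": " + viaMatcher(s) + " / " + viaStaticMatches(s));
        }
    }
}
```

Both helpers require the whole input to match, so "ba" is rejected even though it contains a "b".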

Regular-expression constructs:

  x     The character x
  \\    The backslash character
  \t    The tab character ('\u0009')
  \n    The newline (line feed) character ('\u000A')
  \r    The carriage-return character ('\u000D')
  \f    The form-feed character ('\u000C')

  [abc]          a, b, or c (simple class)
  [^abc]         Any character except a, b, or c (negation)
  [a-zA-Z]       a through z or A through Z, inclusive (range)
  [a-d[m-p]]     a through d, or m through p: [a-dm-p] (union)
  [a-e&&[def]]   d or e (intersection): only characters that are both in a-e and in [def]; f lies outside a-e
  [a-z&&[^bc]]   a through z, except for b and c: [ad-z] (subtraction)
  [a-z&&[^m-p]]  a through z, and not m through p: [a-lq-z] (subtraction)
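
Union, intersection and subtraction are easiest to check one character at a time. The following sketch does that; CharClassDemo and inClass are names chosen for this example only.

```java
import java.util.regex.Pattern;

class CharClassDemo {
    // True if the one-character string c belongs to the given character class.
    static boolean inClass(String charClass, String c) {
        return Pattern.matches(charClass, c);
    }

    public static void main(String[] args) {
        System.out.println(inClass("[a-d[m-p]]", "n"));     // union: n lies in m-p
        System.out.println(inClass("[a-e&&[def]]", "d"));   // intersection: d is in both sets
        System.out.println(inClass("[a-e&&[def]]", "f"));   // f is in [def] but not in a-e
        System.out.println(inClass("[a-z&&[^bc]]", "b"));   // subtraction removes b
        System.out.println(inClass("[a-z&&[^m-p]]", "q"));  // q is outside the removed range
    }
}
```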

  .    Any character (matches line terminators only when the DOTALL flag is set)
  \d   A digit: [0-9]
  \D   A non-digit: [^0-9]
  \s   A whitespace character: [ \t\n\x0B\f\r]
  \S   A non-whitespace character: [^\s]
  \w   A word character: [a-zA-Z_0-9]
  \W   A non-word character: [^\w]
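
A short sketch of the predefined classes in use, including the behaviour of the dot with and without DOTALL. The class and method names are illustrative assumptions, not part of java.util.regex.

```java
import java.util.regex.Pattern;

class PredefinedClassDemo {
    static boolean isDigits(String s) {
        return Pattern.matches("\\d+", s);
    }

    static boolean isWord(String s) {
        return Pattern.matches("\\w+", s);
    }

    static boolean isWhitespaceOnly(String s) {
        return Pattern.matches("\\s+", s);
    }

    // The dot matches a line terminator only when DOTALL is enabled.
    static boolean dotMatchesNewline(boolean dotAll) {
        Pattern p = dotAll ? Pattern.compile(".", Pattern.DOTALL) : Pattern.compile(".");
        return p.matcher("\n").matches();
    }

    public static void main(String[] args) {
        System.out.println(isDigits("2019"));          // digits only
        System.out.println(isWord("java_8"));          // letters, digits and underscore
        System.out.println(isWord("java-8"));          // the hyphen is not a word character
        System.out.println(isWhitespaceOnly(" \t\n"));
        System.out.println(dotMatchesNewline(false) + " " + dotMatchesNewline(true));
    }
}
```

Note that every backslash of the regular expression is doubled inside the Java string literal.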

 Boundary matchers
  ^       The beginning of a line, e.g. ^abc matches input that starts with abc
  $       The end of a line, e.g. abc$ matches input that ends with abc

 Greedy quantifiers (match as much as possible)
  X?      X, once or not at all
  X*      X, zero or more times
  X+      X, one or more times
  X{n}    X, exactly n times
  X{n,}   X, at least n times
  X{n,m}  X, at least n but not more than m times
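
The anchors and a few of the quantifiers can be tried with the sketch below. QuantifierDemo and its methods are hypothetical names; the anchor helpers use find() so that ^ and $ do the anchoring rather than matches().

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // find() searches anywhere in the input, so only ^ ties the match to the start.
    static boolean startsWithAbc(String s) {
        return Pattern.compile("^abc").matcher(s).find();
    }

    // Likewise, only $ ties the match to the end.
    static boolean endsWithAbc(String s) {
        return Pattern.compile("abc$").matcher(s).find();
    }

    // X? : the u may appear once or not at all.
    static boolean isColor(String s) {
        return Pattern.matches("colou?r", s);
    }

    // X{n,m} : at least two and at most four digits.
    static boolean isTwoToFourDigits(String s) {
        return Pattern.matches("\\d{2,4}", s);
    }

    public static void main(String[] args) {
        System.out.println(startsWithAbc("abcdef") + " " + startsWithAbc("xabc"));
        System.out.println(endsWithAbc("xyzabc") + " " + endsWithAbc("abcx"));
        System.out.println(isColor("color") + " " + isColor("colour"));
        System.out.println(isTwoToFourDigits("123") + " " + isTwoToFourDigits("12345"));
    }
}
```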

 Reluctant quantifiers (match as little as possible)
  X??      X, once or not at all
  X*?      X, zero or more times
  X+?      X, one or more times
  X{n}?    X, exactly n times
  X{n,}?   X, at least n times
  X{n,m}?  X, at least n but not more than m times

 Possessive quantifiers (match as much as possible and never give characters back)
  X?+      X, once or not at all
  X*+      X, zero or more times
  X++      X, one or more times
  X{n}+    X, exactly n times
  X{n,}+   X, at least n times
  X{n,m}+  X, at least n but not more than m times
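
The three quantifier families differ only in how they treat backtracking, which shows up when the same input is searched with .*foo, .*?foo and .*+foo. A minimal sketch, with GreedinessDemo and firstMatch as illustrative names:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedinessDemo {
    // Returns the first substring matched by regex, or null if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        String input = "xfooxxxxxxfoo";
        // Greedy: .* takes everything, then backs off just enough for the last foo.
        System.out.println(firstMatch(".*foo", input));   // xfooxxxxxxfoo
        // Reluctant: .*? takes as little as possible, so the first foo ends the match.
        System.out.println(firstMatch(".*?foo", input));  // xfoo
        // Possessive: .*+ takes everything and never backs off, so foo cannot match.
        System.out.println(firstMatch(".*+foo", input));  // null
    }
}
```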

Special constructs (named-capturing and non-capturing)
  (?<name>X)           X, as a named-capturing group
  (?:X)                X, as a non-capturing group
  (?idmsuxU-idmsuxU)   Nothing, but turns match flags i d m s u x U on - off
  (?idmsux-idmsux:X)   X, as a non-capturing group with the given flags i d m s u x on - off
  (?=X)                X, via zero-width positive lookahead
  (?!X)                X, via zero-width negative lookahead
  (?<=X)               X, via zero-width positive lookbehind
  (?<!X)               X, via zero-width negative lookbehind
  (?>X)                X, as an independent, non-capturing group

Back references
  \n                   Whatever the nth capturing group matched (n is a group number, e.g. \1)
  \k<name>             Whatever the named-capturing group "name" matched
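
A few of these constructs are sketched below: a named group, a positive lookahead, an inline flag, and both kinds of back reference. The class, the helper methods and the group names year, month and w are assumptions made for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SpecialConstructDemo {
    // Named-capturing groups: returns the "year" group of the first yyyy-MM text, or null.
    static String yearOf(String text) {
        Matcher m = Pattern.compile("(?<year>\\d{4})-(?<month>\\d{2})").matcher(text);
        return m.find() ? m.group("year") : null;
    }

    // Positive lookahead: digits followed by "px"; the "px" is checked but not consumed.
    static String pixels(String text) {
        Matcher m = Pattern.compile("\\d+(?=px)").matcher(text);
        return m.find() ? m.group() : null;
    }

    // Inline flag (?i) switches on case-insensitive matching.
    static boolean isJava(String text) {
        return Pattern.matches("(?i)java", text);
    }

    // Numbered back reference \1: the same word character twice in a row.
    static String firstDoubledChar(String text) {
        Matcher m = Pattern.compile("(\\w)\\1").matcher(text);
        return m.find() ? m.group() : null;
    }

    // Named back reference \k<w>: the same word twice, separated by one space.
    static boolean isDoubledWord(String text) {
        return Pattern.matches("(?<w>\\w+) \\k<w>", text);
    }

    public static void main(String[] args) {
        System.out.println(yearOf("Date: 2019-11-17"));
        System.out.println(pixels("width: 100px"));
        System.out.println(isJava("JAVA"));
        System.out.println(firstDoubledChar("hello"));
        System.out.println(isDoubledWord("the the"));
    }
}
```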

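According to the Java Language Specification, backslashes in string literals in Java source code are interpreted as Unicode escapes or other character escapes. A string literal that holds a regular expression therefore needs two backslashes for every backslash of the expression, which protects the expression from being interpreted by the Java bytecode compiler.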

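Matching the text \string
At first sight the regular expression would simply be \string.
In a regular expression, however, the backslash introduces escaped constructs and also quotes characters that would otherwise be interpreted as unescaped constructs.
Thus the expression \\ matches a single backslash and \{ matches a left brace, so the regular expression has to be \\string.
In Java source, the regular expression \\string must in turn be written as the string literal "\\\\string", because each backslash in a string literal has to be escaped as well; only then does the compiler accept the literal and produce the intended expression.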

   String num = "\\string";                      // the text \string (one backslash)
   System.out.println(num);
   Pattern pn = Pattern.compile("\\\\string");   // the regular expression \\string
   System.out.println(pn.toString());
   System.out.println(pn.matcher(num).matches());

   Output:

   \string
   \\string
   true

   String num = "stresss";
   System.out.println(num);
   Pattern pn = Pattern.compile("s*tres{2,4}");  // zero or more s, then "tre", then two to four s
   System.out.println(pn.toString());
   System.out.println(pn.matcher(num).matches());

   Output:

   stresss
   s*tres{2,4}
   true

Groups

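A group is a part of a regular expression that is enclosed in parentheses and can be referred to later in the expression. Group 0 denotes the whole expression, group 1 the first parenthesized group, and so on. A(B(C))D therefore contains three groups: group 0 is ABCD, group 1 is BC, and group 2 is C.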
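
The numbering can be checked directly on A(B(C))D. In this sketch, GroupDemo and groups are illustrative names; the helper returns group 0 followed by every capturing group.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupDemo {
    // Returns groups 0..groupCount() for a full match, or an empty array if there is no match.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.matches()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1];
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        // Groups are numbered by the position of their opening parenthesis.
        System.out.println(Arrays.toString(groups("A(B(C))D", "ABCD")));  // [ABCD, BC, C]
    }
}
```

The array has three entries although groupCount() reports 2, because group 0 is not included in that count.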

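You can work with groups through the following Matcher methods:

public int groupCount() returns the number of capturing groups in this matcher's pattern. Group 0 is not included in the count.

public String group() returns the input subsequence matched by group 0 (the entire match) in the previous match operation, for example find().

public String group(int i) returns the input subsequence captured by the given group during the previous match operation. If the match succeeded but the group did not match any part of the input, it returns null.

public int start(int group) returns the start index of the subsequence captured by the given group during the previous match operation.

public int end(int group) returns the end position of the subsequence captured by the given group during the previous match operation, that is, the index of its last character plus one.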
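
The following sketch combines group(int), start(int) and end(int), and also shows the null result for a group that took no part in the match. GroupPositionDemo and describe are hypothetical names used only here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupPositionDemo {
    // Describes one group of the first match as "text@start-end",
    // or "null" if the group did not participate in the match.
    static String describe(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "no match";
        }
        String text = m.group(group);
        return text == null ? "null" : text + "@" + m.start(group) + "-" + m.end(group);
    }

    public static void main(String[] args) {
        String input = "tel 555-1234";
        System.out.println(describe("(\\d+)-(\\d+)", input, 0));  // 555-1234@4-12
        System.out.println(describe("(\\d+)-(\\d+)", input, 1));  // 555@4-7
        System.out.println(describe("(\\d+)-(\\d+)", input, 2));  // 1234@8-12
        // In (a)|(b) only one alternative can match, so the other group is null.
        System.out.println(describe("(a)|(b)", "b", 1));          // null
    }
}
```

The end index is exclusive, so input.substring(m.start(g), m.end(g)) yields the same text as m.group(g).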

public Matcher appendReplacement(StringBuffer sb,String replacement)

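Implements a non-terminal append-and-replace step.

This method performs the following actions:

  1. It reads characters from the input sequence, starting at the append position, and appends them to the given string buffer. It stops after reading the last character preceding the previous match, that is, the character at index start() - 1.
  2. It appends the given replacement string to the string buffer.
  3. It sets the append position of this matcher to the index of the last character matched, plus one, that is, to end().

The replacement string may contain references to subsequences captured during the previous match: each occurrence of $g is replaced by the result of evaluating group(g). The first number after the $ is always treated as part of the group reference. Subsequent numbers are incorporated into g if they would form a legal group reference. Only the digits '0' through '9' are considered as potential components of the group reference. For example, if the second group matched the string "foo", then passing the replacement string "$2bar" causes "foobar" to be appended to the string buffer. A dollar sign ($) may be included as a literal in the replacement string by preceding it with a backslash (\$).

Note that backslashes (\) and dollar signs ($) in the replacement string may cause the result to differ from that of a literal replacement string. Dollar signs may be treated as references to captured subsequences as described above, and backslashes are used to escape literal characters in the replacement string.

This method is intended to be used in a loop together with the appendTail and find methods.

For example, the following code writes one dog two dogs in the yard to the standard output stream:

 Pattern p = Pattern.compile("cat");
 Matcher m = p.matcher("one cat two cats in the yard");
 StringBuffer sb = new StringBuffer();
 while (m.find()) {
     m.appendReplacement(sb, "dog");   // copies the text before each match, then "dog"
 }
 m.appendTail(sb);                     // copies the text after the last match
 System.out.println(sb.toString());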
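
Group references and the escaped dollar sign in a replacement string can be tried with a small helper built on the find, appendReplacement and appendTail loop. This is a sketch under the assumption that a plain StringBuffer result is wanted; ReplaceDemo and replaceEach are names invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ReplaceDemo {
    // Replaces every match of regex in input, interpreting $g and \$ in the replacement.
    static String replaceEach(String regex, String input, String replacement) {
        Matcher m = Pattern.compile(regex).matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Copies the unmatched text before this match, then the expanded replacement.
            m.appendReplacement(sb, replacement);
        }
        // Copies whatever follows the last match.
        m.appendTail(sb);
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(replaceEach("cat", "one cat two cats in the yard", "dog"));
        // $2 and $1 refer to the second and first capturing group.
        System.out.println(replaceEach("(\\d+)-(\\d+)", "10-20 and 30-40", "$2-$1"));
        // \$ is a literal dollar sign; the $1 after it is a group reference.
        System.out.println(replaceEach("USD (\\d+)", "USD 5 and USD 12", "\\$$1"));
    }
}
```

When the replacement text comes from outside the program and should be taken literally, Matcher.quoteReplacement can be applied to it first.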

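 public boolean       matches()                    Attempts to match the entire input against the pattern
 public StringBuffer  appendTail(StringBuffer sb)  Appends the remaining input after the last match to sb
 public int           start()                      Returns the start index of the previous match
 public int           end()                        Returns the index after the last character of the previous match
 public boolean       find()                       Finds the next match, starting after the end of the previous successful match
 public String        group()                      Returns the input subsequence matched by the previous match
 public boolean       lookingAt()                  Attempts to match the pattern at the beginning of the input, without requiring the entire input to match
 public Matcher       reset()                      Resets this matcher, discarding the state left by earlier operations
 public Matcher       reset(CharSequence input)    Resets this matcher and gives it a new input sequence to match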
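
The difference between matches, lookingAt and find, and the effect of reset, can be seen on a single input. A minimal sketch; MatcherMethodsDemo and findAll are illustrative names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatcherMethodsDemo {
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Collects every match that find() reports from the matcher's current position.
    static List<String> findAll(Matcher m) {
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        Matcher m = DIGITS.matcher("123abc456");
        System.out.println(m.matches());    // false: the whole input is not digits
        System.out.println(m.lookingAt());  // true: the input starts with digits
        System.out.println(m.group() + " " + m.start() + "-" + m.end());  // 123 0-3

        // find() would continue after the lookingAt() match, so reset first.
        m.reset();
        System.out.println(findAll(m));     // [123, 456]

        // The same matcher can be pointed at new input.
        m.reset("7 and 42");
        System.out.println(findAll(m));     // [7, 42]
    }
}
```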