Java regular expressions

Date: 2019-11-24



1. The Pattern class

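A Pattern object is a compiled representation of a regular expression. The Pattern class has no public constructor; to create a Pattern object you call one of its public static compile methods, which return a Pattern object. These methods take the regular expression as their first argument.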
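A minimal sketch of the compile-once, match-many workflow. The class PatternBasics and its helper isAllDigits are hypothetical names used only for illustration. The regex is compiled a single time into a constant and reused; Pattern.matches is shown as the one-off shortcut that compiles the expression on every call.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class, not part of any library.
class PatternBasics {
    // Compiled once; a Pattern is immutable and can be reused for many inputs.
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // Returns true if the whole input consists of digits.
    static boolean isAllDigits(String input) {
        Matcher m = DIGITS.matcher(input);
        return m.matches();
    }

    public static void main(String[] args) {
        System.out.println(DIGITS.pattern());              // the source regex: \d+
        System.out.println(isAllDigits("2019"));           // true
        System.out.println(isAllDigits("QT3000"));         // false
        // One-off convenience method: compiles the regex again on every call.
        System.out.println(Pattern.matches("\\d+", "42")); // true
    }
}
```

When the same expression is used repeatedly (for example inside a loop), keeping the compiled Pattern avoids recompiling it each time.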

2. The Matcher class

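A Matcher object is the engine that interprets a pattern and performs match operations on an input string. Like Pattern, Matcher has no public constructor; you obtain a Matcher object by calling the matcher method of a Pattern object.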
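A short sketch of how the two classes divide the work, assuming one shared Pattern and several inputs (MatcherReuse and countWords are hypothetical names). A Pattern is immutable and safe to share between threads, while a Matcher holds match state and is not thread-safe; within one thread a Matcher can be pointed at new input with reset(CharSequence).

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class MatcherReuse {
    static final Pattern WORD = Pattern.compile("[a-z]+");

    // Counts runs of lowercase letters in each input, reusing a single Matcher.
    static int[] countWords(String... inputs) {
        int[] counts = new int[inputs.length];
        Matcher m = WORD.matcher("");   // start with an empty input
        for (int i = 0; i < inputs.length; i++) {
            m.reset(inputs[i]);         // point the same engine at the next input
            while (m.find()) {
                counts[i]++;
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        int[] counts = countWords("one two", "three", "4 five six");
        System.out.println(Arrays.toString(counts)); // [2, 1, 2]
    }
}
```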

3. PatternSyntaxException

PatternSyntaxException is an unchecked exception that indicates a syntax error in a regular-expression pattern.
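Because the exception is unchecked, compile() can be called without a try block, but user-supplied patterns are usually worth validating. A minimal sketch (SyntaxErrorDemo and check are hypothetical names) using the exception's getDescription(), getIndex() and getPattern() accessors; getIndex() is only an approximate position and may be -1.

```java
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// Hypothetical example class.
class SyntaxErrorDemo {
    // Returns null if the regex compiles, otherwise a short error summary.
    static String check(String regex) {
        try {
            Pattern.compile(regex);
            return null;
        } catch (PatternSyntaxException e) {
            // getDescription(): what went wrong; getIndex(): approximate position (-1 if unknown)
            return e.getDescription() + " at index " + e.getIndex() + " in " + e.getPattern();
        }
    }

    public static void main(String[] args) {
        System.out.println(check("(\\d+)")); // null: the pattern is valid
        System.out.println(check("(\\d+"));  // error summary: the group is never closed
    }
}
```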

4. Capturing groups

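A capturing group is a way of treating several characters as a single unit. It is created by placing the characters to be grouped inside a pair of parentheses.

For example, the regular expression (dog) creates a single group containing "d", "o", and "g".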

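Capturing groups are numbered by counting their opening parentheses from left to right. For example, the expression ((A)(B(C))) contains four such groups:

1	((A)(B(C)))
2	(A)
3	(B(C))
4	(C)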

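To find out how many groups an expression has, call the groupCount method of the Matcher object. It returns an int giving the number of capturing groups in the matcher's pattern. There is also a special group, group(0), which always represents the entire match; it is not included in the value returned by groupCount.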
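The numbering rule can be checked directly. This sketch (GroupNumbering and groups are hypothetical names) matches ((A)(B(C))) against the input "ABC" and collects group(0) through group(groupCount()):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class GroupNumbering {
    // Returns group(0)..group(groupCount()) of the first match of regex in input.
    static String[] groups(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return new String[0];
        }
        String[] result = new String[m.groupCount() + 1]; // +1 for group(0)
        for (int i = 0; i <= m.groupCount(); i++) {
            result[i] = m.group(i);
        }
        return result;
    }

    public static void main(String[] args) {
        String[] g = groups("((A)(B(C)))", "ABC");
        for (int i = 0; i < g.length; i++) {
            System.out.println("group(" + i + ") = " + g[i]);
        }
        // group(0) = ABC, group(1) = ABC, group(2) = A, group(3) = BC, group(4) = C
    }
}
```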

The following example prints the whole match and each captured group:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args) {

        // Search the string for the given pattern
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(\\D*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        } else {
            System.out.println("NO MATCH");
        }
    }
}

Compiling and running the example above produces:

Found value: This order was placed for QT3000! OK?
Found value: This order was placed for QT
Found value: 3000
Found value: ! OK?

5. Methods of the Matcher class

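Index methods return index values that show exactly where in the input string a match was found:

No.	Method	Description
1	public int start()	Returns the start index of the previous match.
2	public int start(int group)	Returns the start index of the subsequence captured by the given group during the previous match operation.
3	public int end()	Returns the offset after the last character matched.
4	public int end(int group)	Returns the offset after the last character of the subsequence captured by the given group during the previous match operation.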

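Study methods examine the input string and return a boolean indicating whether the pattern (that is, the regular expression) is found:

No.	Method	Description
1	public boolean lookingAt()	Attempts to match the input sequence, starting at the beginning of the region, against the pattern.
2	public boolean find()	Attempts to find the next subsequence of the input sequence that matches the pattern.
3	public boolean find(int start)	Resets this matcher and then attempts to find the next subsequence of the input sequence that matches the pattern, starting at the specified index.
4	public boolean matches()	Attempts to match the entire region against the pattern.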
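A small sketch of find() and find(int) (FindDemo, allSpans and firstMatchFrom are hypothetical names). Repeated find() calls walk through the input from left to right, each one resuming after the previous match, while find(int) resets the matcher and starts searching at the given index:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class FindDemo {
    // Collects "[start,end)" for every match of regex in input.
    static List<String> allSpans(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        List<String> spans = new ArrayList<>();
        while (m.find()) { // each call resumes after the previous match
            spans.add("[" + m.start() + "," + m.end() + ")");
        }
        return spans;
    }

    // Start index of the first match at or after 'from', or -1 if there is none.
    static int firstMatchFrom(String regex, String input, int from) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find(from) ? m.start() : -1;
    }

    public static void main(String[] args) {
        System.out.println(allSpans("\\d+", "a12b345"));           // [[1,3), [4,7)]
        System.out.println(firstMatchFrom("cat", "cat concat", 1)); // 7
    }
}
```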

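Replacement methods replace text in the input string:

No.	Method	Description
1	public Matcher appendReplacement(StringBuffer sb, String replacement)	Implements a non-terminal append-and-replace step.
2	public StringBuffer appendTail(StringBuffer sb)	Implements a terminal append-and-replace step.
3	public String replaceAll(String replacement)	Replaces every subsequence of the input sequence that matches the pattern with the given replacement string.
4	public String replaceFirst(String replacement)	Replaces the first subsequence of the input sequence that matches the pattern with the given replacement string.
5	public static String quoteReplacement(String s)	Returns a literal replacement string for the specified string. The result can be passed to appendReplacement (or replaceAll) and is inserted as plain text, so any $ or \ characters in s are not treated as group references or escapes.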
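In a replacement string, $n refers to captured group n and \ is an escape character. A minimal sketch of both sides of that rule (ReplacementDemo, swap and insertPrice are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class ReplacementDemo {
    // "$2" and "$1" in the replacement refer to captured groups 2 and 1.
    static String swap(String input) {
        return Pattern.compile("(\\d+)-(\\d+)").matcher(input).replaceAll("$2-$1");
    }

    // A literal '$' or '\' in a replacement must be escaped; quoteReplacement does that.
    static String insertPrice(String template, String price) {
        Matcher m = Pattern.compile("PRICE").matcher(template);
        return m.replaceAll(Matcher.quoteReplacement(price));
    }

    public static void main(String[] args) {
        System.out.println(swap("10-20, 3-4"));                  // 20-10, 4-3
        System.out.println(insertPrice("cost: PRICE", "$5.00")); // cost: $5.00
    }
}
```

Without quoteReplacement, "$5.00" would be read as a reference to group 5, which this pattern does not have, and replaceAll would throw an IndexOutOfBoundsException.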

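The following example prints the group count and the start and end positions of the match and of each group:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches
{
    public static void main(String[] args) {

        // Search the string for the given pattern
        String line = "This order was placed for QT3000! OK?";
        String pattern = "(\\D*)(\\d+)(.*)";

        // Create a Pattern object
        Pattern r = Pattern.compile(pattern);

        // Now create a Matcher object
        Matcher m = r.matcher(line);
        if (m.find()) {
            System.out.println(m.groupCount()); // number of capturing groups: 3
            System.out.println(m.start());      // start index of the whole match
            System.out.println(m.end());        // index just after the last matched character
            // Groups are numbered from 1: print the first and last character of each group
            for (int i = 1; i <= m.groupCount(); i++) {
                System.out.println(line.charAt(m.start(i)));
                System.out.println(line.charAt(m.end(i) - 1));
            }
            System.out.println("Found value: " + m.group(0));
            System.out.println("Found value: " + m.group(1));
            System.out.println("Found value: " + m.group(2));
            System.out.println("Found value: " + m.group(3));
        } else {
            System.out.println("NO MATCH");
        }
    }
}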

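start(int group) returns the start index of the subsequence captured by the given group during the previous match operation, and end(int group) returns the index of that subsequence's last character plus 1; in other words, end is exclusive.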
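Because end is exclusive, input.substring(m.start(i), m.end(i)) yields exactly m.group(i). A minimal sketch (GroupOffsets and sliceGroup are hypothetical names); note that start(i) returns -1 when the overall match succeeded but group i did not take part in it, so the sketch guards against that case:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class GroupOffsets {
    // Rebuilds the text of a group from start(group)/end(group);
    // returns null if there is no match or the group did not participate.
    static String sliceGroup(String regex, String input, int group) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find() || m.start(group) == -1) {
            return null;
        }
        // end(group) is exclusive, so this substring equals m.group(group)
        return input.substring(m.start(group), m.end(group));
    }

    public static void main(String[] args) {
        String line = "This order was placed for QT3000! OK?";
        String regex = "(\\D*)(\\d+)(.*)";
        for (int i = 0; i <= 3; i++) {
            System.out.println(i + ": " + sliceGroup(regex, line, i));
        }
        // The optional group (b) does not participate when matching "a"
        System.out.println(sliceGroup("(a)(b)?", "a", 2)); // null
    }
}
```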

The next example compares lookingAt() with matches():

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches2 {
    private static final String REGEX = "foo";
    private static final String INPUT = "fooooooooooooooooo";
    private static final String INPUT2 = "ooooofoooooooooooo";
    private static Pattern pattern;
    private static Matcher matcher;
    private static Matcher matcher2;

    public static void main(String[] args) {
        pattern = Pattern.compile(REGEX);
        matcher = pattern.matcher(INPUT);
        matcher2 = pattern.matcher(INPUT2);

        System.out.println("Current REGEX is: " + REGEX);
        System.out.println("Current INPUT is: " + INPUT);
        System.out.println("Current INPUT2 is: " + INPUT2);

        System.out.println("lookingAt(): " + matcher.lookingAt());
        System.out.println("matches(): " + matcher.matches());
        System.out.println("lookingAt(): " + matcher2.lookingAt());
    }
}

Compiling and running the example above produces:

Current REGEX is: foo
Current INPUT is: fooooooooooooooooo
Current INPUT2 is: ooooofoooooooooooo
lookingAt(): true
matches(): false
lookingAt(): false

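Both matches and lookingAt try to match an input sequence against a pattern. The difference is that matches requires the entire input sequence to match, while lookingAt does not.

Although lookingAt does not require the whole input to match, the match must begin at the first character.

Both methods always start matching at the beginning of the input string (more precisely, at the beginning of the matcher's region).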
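Putting find() next to the two methods makes the difference concrete. This sketch (ThreeWayMatch and compare are hypothetical names) reuses the inputs from the example above; lookingAt() and matches() always start from the beginning of the region, regardless of any earlier find() call:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class ThreeWayMatch {
    // Reports find / lookingAt / matches for the same regex and input.
    static String compare(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        boolean found = m.find();        // a match anywhere in the input
        boolean atStart = m.lookingAt(); // a match that begins at the start of the region
        boolean whole = m.matches();     // a match that covers the entire region
        return found + " " + atStart + " " + whole;
    }

    public static void main(String[] args) {
        System.out.println(compare("foo", "ooooofoooooooooooo")); // true false false
        System.out.println(compare("foo", "fooooooooooooooooo")); // true true false
        System.out.println(compare("foo", "foo"));                // true true true
    }
}
```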

The following example uses replaceAll():

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches3 {
    private static String REGEX = "dog";
    private static String INPUT = "The dog says meow. " +
                                  "All dogs say meow.";
    private static String REPLACE = "cat";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // get a matcher object
        Matcher m = p.matcher(INPUT);
        INPUT = m.replaceAll(REPLACE);
        System.out.println(INPUT);
    }
}

Compiling and running the example above produces:

The cat says meow. All cats say meow.
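For comparison, a sketch of replaceFirst() on the same input, together with String.replaceAll, which is a shortcut for Pattern.compile(regex).matcher(str).replaceAll(replacement) and therefore compiles the regex on each call. ReplaceFirstDemo and its helpers are hypothetical names.

```java
import java.util.regex.Pattern;

// Hypothetical example class.
class ReplaceFirstDemo {
    static final String INPUT = "The dog says meow. All dogs say meow.";

    // Only the first match is rewritten.
    static String firstOnly(String input) {
        return Pattern.compile("dog").matcher(input).replaceFirst("cat");
    }

    // Convenience form on String; every match is rewritten.
    static String viaString(String input) {
        return input.replaceAll("dog", "cat");
    }

    public static void main(String[] args) {
        System.out.println(firstOnly(INPUT)); // The cat says meow. All dogs say meow.
        System.out.println(viaString(INPUT)); // The cat says meow. All cats say meow.
    }
}
```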

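The following example uses appendReplacement() and appendTail():

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexMatches4 {
    private static String REGEX = "a*b";
    private static String INPUT = "aabfooaabfooabfoobkkk";
    private static String REPLACE = "-";

    public static void main(String[] args) {
        Pattern p = Pattern.compile(REGEX);
        // Get a Matcher object
        Matcher m = p.matcher(INPUT);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            // Append the text before this match, then the replacement
            m.appendReplacement(sb, REPLACE);
        }
        // Append whatever follows the last match
        m.appendTail(sb);
        System.out.println(sb.toString());
    }
}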

Compiling and running the example above produces:

-foo-foo-foo-kkk
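The appendReplacement/appendTail loop is most useful when the replacement has to be computed from each match. A minimal sketch, assuming the goal is to double every integer in a string (DoubleNumbers and doubleAll are hypothetical names); the computed text is passed through quoteReplacement so that a $ or \ in it could never be misread:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class DoubleNumbers {
    // Replaces every integer in the input with twice its value.
    static String doubleAll(String input) {
        Matcher m = Pattern.compile("\\d+").matcher(input);
        StringBuffer sb = new StringBuffer();
        while (m.find()) {
            long doubled = Long.parseLong(m.group()) * 2;
            m.appendReplacement(sb, Matcher.quoteReplacement(Long.toString(doubled)));
        }
        m.appendTail(sb); // copy the rest of the input after the last match
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(doubleAll("a1b22c333")); // a2b44c666
    }
}
```

On Java 9 and later, Matcher also provides replaceAll(Function<MatchResult, String>), which expresses the same loop more compactly; the StringBuffer form shown here also works on Java 8.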

6. Matching modes (flags) of Pattern

compile() has an overload that takes a second argument controlling how the regular expression matches: public static Pattern compile(String regex, int flags)
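The flags are int bit masks, so several can be combined with bitwise OR, and most of them also have an inline form written inside the regex, such as (?i) for CASE_INSENSITIVE and (?m) for MULTILINE. A short sketch (CombinedFlags and count are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class CombinedFlags {
    // Counts the matches of p in input.
    static int count(Pattern p, String input) {
        Matcher m = p.matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String text = "Hello\nHELLO\nhello";
        // Two flags combined with bitwise OR
        Pattern viaApi = Pattern.compile("^hello", Pattern.MULTILINE | Pattern.CASE_INSENSITIVE);
        // The same two flags written inline
        Pattern viaInline = Pattern.compile("(?im)^hello");
        System.out.println(count(viaApi, text));                    // 3
        System.out.println(count(viaInline, text));                 // 3
        System.out.println(count(Pattern.compile("^hello"), text)); // 0
    }
}
```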

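Multiline mode

Without the MULTILINE flag, ^ and $ match only at the beginning and end of the entire input sequence. With MULTILINE, ^ also matches just after a line terminator and $ just before a line terminator inside the input.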

import java.util.regex.*;

/**
 * Multiline mode
 */
public class ReFlags_MULTILINE {

    public static void main(String[] args) {

        // Note the line terminators (\r\n) inside the string
        String str = "hello world\r\n" + "hello java\r\n" + "hello java";

        System.out.println("===========Match at start of string (non-multiline mode)===========");
        Pattern p = Pattern.compile("^hello");
        Matcher m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + "   position: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("===========Match at start of string (multiline mode)===========");
        p = Pattern.compile("^hello", Pattern.MULTILINE);
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + "   position: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("===========Match at end of string (non-multiline mode)===========");
        p = Pattern.compile("java$");
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + "   position: [" + m.start() + "," + m.end() + "]");
        }

        System.out.println("===========Match at end of string (multiline mode)===========");
        p = Pattern.compile("java$", Pattern.MULTILINE);
        m = p.matcher(str);
        while (m.find()) {
            System.out.println(m.group() + "   position: [" + m.start() + "," + m.end() + "]");
        }
    }
}

Output

===========Match at start of string (non-multiline mode)===========
hello   position: [0,5]
===========Match at start of string (multiline mode)===========
hello   position: [0,5]
hello   position: [13,18]
hello   position: [25,30]
===========Match at end of string (non-multiline mode)===========
java   position: [31,35]
===========Match at end of string (multiline mode)===========
java   position: [19,23]
java   position: [31,35]
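When a pattern runs in MULTILINE mode but one anchor should still refer to the whole input, the anchors \A (start of input) and \z (end of input) can be used; they are not affected by the flag. A sketch on the same string as above (InputAnchors and count are hypothetical names):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class InputAnchors {
    // Counts the matches of regex in input.
    static int count(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int n = 0;
        while (m.find()) {
            n++;
        }
        return n;
    }

    public static void main(String[] args) {
        String str = "hello world\r\n" + "hello java\r\n" + "hello java";
        System.out.println(count("(?m)^hello", str));   // 3: ^ matches at every line start
        System.out.println(count("(?m)\\Ahello", str)); // 1: \A only at the start of input
        System.out.println(count("(?m)java$", str));    // 2: $ before each line terminator and at the end
        System.out.println(count("(?m)java\\z", str));  // 1: \z only at the end of input
    }
}
```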

Case-insensitive mode

With CASE_INSENSITIVE (or the inline flag (?i)), letters in the pattern match regardless of case.

import java.util.regex.Pattern;

public class ReFlags_CASE_INSENSITIVE {
    public static void main(String[] args) {

        // A number with optional sign and decimal part, followed by a C or F unit.
        // "\\." matches a literal decimal point; an unescaped "." would match any character.

        System.out.println("===========Case-insensitive via API flag===========");
        String tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        Pattern p = Pattern.compile(tempRegex, Pattern.CASE_INSENSITIVE);

        System.out.println("-3.33c   " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C   " + p.matcher("-3.33C").matches());

        System.out.println("===========Case-sensitive (no flag)===========");
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        p = Pattern.compile(tempRegex);

        System.out.println("-3.33c   " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C   " + p.matcher("-3.33C").matches());

        System.out.println("===========Case-insensitive via inline (?i)===========");
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*(?i)[CF]";
        p = Pattern.compile(tempRegex);

        System.out.println("-3.33c   " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C   " + p.matcher("-3.33C").matches());

        System.out.println("===========Case-sensitive (no inline flag)===========");
        tempRegex = "[+-]?(\\d)+(\\.(\\d)*)?(\\s)*[CF]";
        p = Pattern.compile(tempRegex);

        System.out.println("-3.33c   " + p.matcher("-3.33c").matches());
        System.out.println("-3.33C   " + p.matcher("-3.33C").matches());
    }
}

Output

===========Case-insensitive via API flag===========
-3.33c   true
-3.33C   true
===========Case-sensitive (no flag)===========
-3.33c   false
-3.33C   true
===========Case-insensitive via inline (?i)===========
-3.33c   true
-3.33C   true
===========Case-sensitive (no inline flag)===========
-3.33c   false
-3.33C   true
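One caveat: by default CASE_INSENSITIVE folds only US-ASCII letters. For non-ASCII letters such as é/É, the UNICODE_CASE flag has to be added as well. A sketch (UnicodeCaseDemo and matches are hypothetical names; the characters are written as \u escapes to avoid source-encoding issues):

```java
import java.util.regex.Pattern;

// Hypothetical example class.
class UnicodeCaseDemo {
    static boolean matches(String regex, int flags, String input) {
        return Pattern.compile(regex, flags).matcher(input).matches();
    }

    public static void main(String[] args) {
        String lower = "\u00e9"; // é
        String upper = "\u00c9"; // É
        // CASE_INSENSITIVE alone: ASCII-only case folding
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE, upper)); // false
        // Adding UNICODE_CASE makes the folding Unicode-aware
        System.out.println(matches(lower, Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE, upper)); // true
    }
}
```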

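Comments mode

With the COMMENTS flag, whitespace in the pattern is ignored, and everything from a # to the end of the line is treated as a comment.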

import java.util.regex.Pattern;

public class ReFlags_COMMENTS {

    public static void main(String[] args) {

        System.out.println("===========Comments enabled via API flag===========");
        String comments = "    (\\d)+#this is comments.";
        Pattern p = Pattern.compile(comments, Pattern.COMMENTS);
        System.out.println("1234   " + p.matcher("1234").matches());

        System.out.println("===========Comments not enabled===========");
        comments = "    (\\d)+#this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234   " + p.matcher("1234").matches());

        System.out.println("===========Comments enabled via inline (?x)===========");
        comments = "(?x)    (\\d)+#this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234   " + p.matcher("1234").matches());

        System.out.println("===========Comments not enabled===========");
        comments = "    (\\d)+#this is comments.";
        p = Pattern.compile(comments);
        System.out.println("1234   " + p.matcher("1234").matches());

    }
}

Output

===========Comments enabled via API flag===========
1234   true
===========Comments not enabled===========
1234   false
===========Comments enabled via inline (?x)===========
1234   true
===========Comments not enabled===========
1234   false

---------------------
Author: B8613A
Source: CSDN
Original: https://blog.csdn.net/liupeifeng3514/article/details/80030360?utm_source=copy
Copyright notice: This is an original article by the blogger; please include a link to the original post when reposting.

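As the output shows, the leading whitespace and the comment running from # to the end of the line are both ignored. The inline flag that enables comments mode is (?x).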
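The mode pays off for longer expressions that can be spread over several commented lines. A sketch assuming a simple YYYY-MM-DD date format (CommentedRegex, DATE and year are hypothetical names). Each comment runs to the end of its line, so every line of the pattern ends with "\n"; whitespace that should actually be matched has to be written as an escape such as \\s, because literal spaces in the pattern are ignored.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical example class.
class CommentedRegex {
    static final Pattern DATE = Pattern.compile(
            "(\\d{4})   # year\n" +
            "-(\\d{2})  # month\n" +
            "-(\\d{2})  # day\n",
            Pattern.COMMENTS);

    // Returns the year part of a YYYY-MM-DD string, or null if it does not match.
    static String year(String input) {
        Matcher m = DATE.matcher(input);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(year("2019-11-24"));  // 2019
        System.out.println(year("2019 -11-24")); // null: the input space is not in the pattern
    }
}
```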

DOTALL mode

By default the dot (.) matches any character except a line terminator. With DOTALL enabled, the dot matches line terminators as well.

import java.util.regex.Pattern;

public class ReFlags_DOTALL {

    public static void main(String[] args) {

        System.out.println("===========DOTALL enabled via API flag===========");
        String dotall = "<xml>(.)*</xml>";
        Pattern p = Pattern.compile(dotall, Pattern.DOTALL);
        System.out.println("<xml>\\r\\n</xml>   " + p.matcher("<xml>\r\n</xml>").matches());

        System.out.println("===========DOTALL not enabled===========");
        dotall = "<xml>(.)*</xml>";
        p = Pattern.compile(dotall);
        System.out.println("<xml>\\r\\n</xml>   " + p.matcher("<xml>\r\n</xml>").matches());

        System.out.println("===========DOTALL enabled via inline (?s)===========");
        dotall = "(?s)<xml>(.)*</xml>";
        p = Pattern.compile(dotall);
        System.out.println("<xml>\\r\\n</xml>   " + p.matcher("<xml>\r\n</xml>").matches());

        System.out.println("===========DOTALL not enabled===========");
        dotall = "<xml>(.)*</xml>";
        p = Pattern.compile(dotall);
        System.out.println("<xml>\\r\\n</xml>   " + p.matcher("<xml>\r\n</xml>").matches());

    }
}

Output

===========DOTALL enabled via API flag===========
<xml>\r\n</xml>   true
===========DOTALL not enabled===========
<xml>\r\n</xml>   false
===========DOTALL enabled via inline (?s)===========
<xml>\r\n</xml>   true
===========DOTALL not enabled===========
<xml>\r\n</xml>   false
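Two alternatives when DOTALL should not apply to the whole pattern: a character class that contains every character, such as [\s\S], or an inline flag scoped to a non-capturing group, (?s:...). A sketch (DotAlternatives and m are hypothetical names):

```java
import java.util.regex.Pattern;

// Hypothetical example class.
class DotAlternatives {
    static boolean m(String regex, String input) {
        return Pattern.compile(regex).matcher(input).matches();
    }

    public static void main(String[] args) {
        String xml = "<xml>\r\n</xml>";
        // Without DOTALL, '.' does not match \r or \n
        System.out.println(m("<xml>.*</xml>", xml));        // false
        // [\s\S] matches any character, line terminators included
        System.out.println(m("<xml>[\\s\\S]*</xml>", xml)); // true
        // DOTALL switched on only inside this group
        System.out.println(m("<xml>(?s:.*)</xml>", xml));   // true
    }
}
```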

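Literal mode

With the LITERAL flag, all metacharacters and escape sequences in the pattern lose their special meaning and are treated as ordinary characters.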

import java.util.regex.Pattern;

public class ReFlags_LITERAL {

    public static void main(String[] args) {

        System.out.println(Pattern.compile("\\d", Pattern.LITERAL).matcher("\\d").matches()); // true
        System.out.println(Pattern.compile("\\d", Pattern.LITERAL).matcher("2").matches());   // false

        System.out.println(Pattern.compile("(\\d)+", Pattern.LITERAL).matcher("1234").matches()); // false
        System.out.println(Pattern.compile("(\\d)+").matcher("1234").matches());                  // true

        System.out.println(Pattern.compile("(\\d){2,3}", Pattern.LITERAL).matcher("(\\d){2,3}").matches()); // true
    }
}
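LITERAL applies to the entire pattern. When only part of a regex should be literal, Pattern.quote wraps the text in \Q...\E so it can be concatenated with ordinary regex syntax. A sketch (QuoteDemo and matchesAssignment are hypothetical names):

```java
import java.util.regex.Pattern;

// Hypothetical example class.
class QuoteDemo {
    // Matches "x=<literal>" followed by optional whitespace and ';', with <literal> taken verbatim.
    static boolean matchesAssignment(String literal, String input) {
        Pattern p = Pattern.compile("x=" + Pattern.quote(literal) + "\\s*;");
        return p.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(Pattern.quote("1+1"));                // \Q1+1\E
        System.out.println(matchesAssignment("1+1", "x=1+1 ;")); // true
        System.out.println(matchesAssignment("1+1", "x=11;"));   // false: '+' is not a quantifier here
        // String.split takes a regex, so a literal "." has to be quoted
        System.out.println("a.b.c".split(Pattern.quote(".")).length); // 3
    }
}
```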