Regular Expressions

  • Matching a word in a sentence
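import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {
    private static Pattern pattern = Pattern.compile("Ben");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("Hello,my name is Ben.");
        boolean result = matcher.find();
        if (result) {
            // groupCount() is the number of capturing groups in the pattern (0 here)
            System.out.println(matcher.groupCount());
            // group(0) is the whole match; group(i) is the i-th capturing group
            for (int i = 0; i <= matcher.groupCount(); i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}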

Result

0
Ben
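
The 0 printed above is the number of capturing groups in the pattern, not the number of matches, and group 0 always refers to the whole match. As a rough illustration, the sketch below compares the original pattern with a variant that contains two capturing groups. The class name GroupCountDemo, the helper describeFirstMatch and the pattern (B)(en) are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupCountDemo {
    // Returns "groupCount|group(0)|group(1)|..." for the first match, or "" if there is none.
    static String describeFirstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        if (!m.find()) {
            return "";
        }
        StringBuilder sb = new StringBuilder().append(m.groupCount());
        for (int i = 0; i <= m.groupCount(); i++) {
            sb.append('|').append(m.group(i));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // No parentheses in the pattern: zero capturing groups, only group(0).
        System.out.println(describeFirstMatch("Ben", "Hello,my name is Ben."));     // 0|Ben
        // Two pairs of parentheses: two capturing groups in addition to group(0).
        System.out.println(describeFirstMatch("(B)(en)", "Hello,my name is Ben.")); // 2|Ben|B|en
    }
}
```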

  • The single-character wildcard .
public class PatternTest {
    private static Pattern pattern = Pattern.compile("Be.");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("Hello,my name is Ben.");
        boolean result = matcher.find();
        if (result) {
            System.out.println(matcher.groupCount());
            for (int i = 0;i <= matcher.groupCount();i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}

Result

0
Ben

  • Matching . itself
public class PatternTest {
    private static Pattern pattern = Pattern.compile("Be.\\.");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("Hello,my name is Ben.");
        boolean result = matcher.find();
        if (result) {
            System.out.println(matcher.groupCount());
            for (int i = 0;i <= matcher.groupCount();i++) {
                System.out.println(matcher.group(i));
            }
        }
    }
}

Result

0
Ben.
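
An unescaped . would also produce Ben. on this input, because the wildcard happens to match the final full stop. The difference only shows up on other input. The following sketch, with a made-up class name DotEscapeDemo, a hypothetical helper findAll and an invented sample string, compares the two forms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class DotEscapeDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "Ben! Ben.";
        // Two wildcards: the last one accepts '!' as well as '.'.
        System.out.println(findAll("Be..", input));    // [Ben!, Ben.]
        // Escaped dot: only a literal '.' is accepted in the last position.
        System.out.println(findAll("Be.\\.", input));  // [Ben.]
    }
}
```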

  • Matching one of a set of characters

Suppose the pattern below currently matches three values:

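import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PatternTest {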
    private static Pattern pattern = Pattern.compile(".e.");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("Hello,my name is Ben.");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

Hel
me
Ben

Now suppose I only want Hel and Ben, not the middle match (m, e and the space that follows):

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[HB]e.");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("Hello,my name is Ben.");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

Hel
Ben

  • Matching digits

Suppose we have the string "x1.xml s2.xml f3.xml dd.xml d5.xml" and want to match the .xml names that start with s or d.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd].\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
dd.xml
d5.xml

Now the requirement changes: only .xml names whose second character is a digit should match, so the pattern becomes:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][0123456789]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
d5.xml

Of course, [0123456789] can be abbreviated to [0-9].

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][0-9]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
d5.xml

And [0-9] can in turn be written as \d.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\d\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
d5.xml

  • Matching character ranges

Changing the task again: now only .xml names whose second character is a letter should match.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][a-z]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

dd.xml

  • Matching letter and digit ranges together
public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][0-9a-z]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
dd.xml
d5.xml

For this input, [0-9a-z] can also be replaced by \w.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\w\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
dd.xml
d5.xml

Note that \w covers not only letters and digits but also the underscore _.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\w\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
dd.xml
d5.xml
s_.xml

So when the underscore _ must not match, do not use \w; use [0-9a-zA-Z] instead (this class also includes uppercase letters).
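
The difference between \w and an explicit class is easy to check side by side. The sketch below is illustrative only: the class name WordClassDemo, the helper findAll and the extra sample name sA.xml are not part of the original examples.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WordClassDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "s2.xml s_.xml sA.xml s#.xml";
        // \w accepts digits, letters of either case, and the underscore.
        System.out.println(findAll("[sd]\\w\\.xml", input));          // [s2.xml, s_.xml, sA.xml]
        // The explicit class drops the underscore but keeps both letter cases.
        System.out.println(findAll("[sd][0-9a-zA-Z]\\.xml", input));  // [s2.xml, sA.xml]
        // Without A-Z, uppercase letters are rejected too.
        System.out.println(findAll("[sd][0-9a-z]\\.xml", input));     // [s2.xml]
    }
}
```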

  • Negated matching

Given the string "x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml", match the .xml names that start with s or d and whose second character is not a letter.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][^a-z]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s2.xml
d5.xml
s#.xml

Likewise, we can require that the second character is not a digit.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][^0-9]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

dd.xml
s#.xml

[^0-9] can also be written as \D.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\D\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

dd.xml
s#.xml

What if I want neither a letter nor a digit?

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][^0-9a-z]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

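Result

s#.xml

For this input, [^0-9a-z] can also be written as \W. (Only the leading ^ negates a character class; a ^ anywhere else inside the brackets is a literal caret.)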

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\W\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s#.xml

Note that \W excludes not only letters and digits but also the underscore _.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd]\\W\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

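Result

s#.xml

So to exclude only letters and digits while still allowing the underscore _, keep using an explicit class such as [^0-9a-zA-Z] (which also excludes uppercase letters).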

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[sd][^0-9a-z]\\.xml");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("x1.xml s2.xml f3.xml dd.xml d5.xml s#.xml s_.xml");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

s#.xml
s_.xml
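
One detail of negated classes is worth checking: inside the brackets only the first ^ negates, so a class written as [^0-9^a-z] also excludes the caret character itself. The sketch below shows the effect. The class name CaretInClassDemo, the helper findAll and the sample name s^.xml are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CaretInClassDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "s^.xml s#.xml s5.xml";
        // The second ^ is a literal member of the negated class, so "s^.xml" is rejected.
        System.out.println(findAll("[sd][^0-9^a-z]\\.xml", input)); // [s#.xml]
        // A single leading ^ negates digits and lowercase letters only.
        System.out.println(findAll("[sd][^0-9a-z]\\.xml", input));  // [s^.xml, s#.xml]
    }
}
```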

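  • Matching [ ] themselves

[, ] and . are all regex metacharacters, so they cannot be matched literally as written; they have to be escaped.

Take the JavaScript snippet "var myArray = new Array();if (myArray[0] == 0) {". We want to match the array index [0]. If we write it like this: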

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[0]");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0) {");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

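Result

0
0

this only matches the digit 0 (twice), not [0] itself, because [0] is read as a character class containing the single character 0. The pattern has to be changed as follows: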

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\[0\\]");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0) {");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

[0]

To match every single-digit array index, put a digit class between the escaped brackets:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\[[0-9]\\]");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("var myArray = new Array();if (myArray[0] == 0)" +
                " { myArray[1] = 1;");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

[0]
[1]
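
When the text to match is entirely literal, an alternative to escaping each metacharacter by hand is the standard library method Pattern.quote, which wraps a string so that it is matched verbatim. The sketch below uses it next to the hand-escaped form. The class name LiteralBracketDemo, the helper findAll and the sample input are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LiteralBracketDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "myArray[0] == 0; myArray[12] = 1";
        // Pattern.quote treats "[0]" as plain text, so the bare 0 is not matched.
        System.out.println(findAll(Pattern.quote("[0]"), input)); // [[0]]
        // Hand-escaped brackets around one digit: [12] has two digits and is skipped.
        System.out.println(findAll("\\[[0-9]\\]", input));        // [[0]]
    }
}
```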

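  • Matching the \ character

Likewise, \ is itself a regex metacharacter, so it must be escaped as \\ in the regex, and each of those backslashes must be doubled again inside a Java string literal.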

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\\\");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("\\home\\ben\\sales\\");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

\
\
\
\
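
Two layers of escaping are involved: the Java compiler turns each \\ in a string literal into one backslash, and the regex engine then turns each pair of backslashes into one literal backslash. The sketch below makes the layers visible by printing string lengths. The class name BackslashDemo, the helper countMatches and the sample path are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BackslashDemo {
    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String regex = "\\\\";                 // four backslashes in source, two characters at run time
        String path = "\\home\\ben\\sales\\";  // the 16-character text \home\ben\sales\
        System.out.println(regex.length());            // 2
        System.out.println(path.length());             // 16
        // The two-character regex matches one literal backslash at a time.
        System.out.println(countMatches(regex, path)); // 4
    }
}
```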

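  • Matching whitespace characters

Whitespace here does not mean the ordinary space character, but a few special control characters:

Metacharacter  Description
[\b]           Backspace (moves back and deletes one character)
\f             Form feed
\n             Line feed (newline)
\r             Carriage return
\t             Tab
\v             Vertical tab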
public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\r\\n\\tand");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("you are right\r\n\tand good");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
        System.out.println("you are right\r\n\tand good");
    }
}

Result


    and
you are right
    and good

\s stands for any one of these whitespace characters (and, in Java, for the ordinary space as well).

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\s\\s\\sand");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("you are right\r\n\tand good");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
        System.out.println("you are right\r\n\tand good");
    }
}

Result


    and
you are right
    and good

\S stands for any single non-whitespace character (whitespace here includes the ordinary space).

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\Snd");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("you are right\r\n\tand good");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
        System.out.println("you are right\r\n\tand good");
    }
}

Result

and
you are right
    and good
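
Since \s and \S are exact complements, every character of a string is matched by one of them and never by both. The sketch below counts both kinds in the sample text and also confirms that the ordinary space counts as whitespace. The class name WhitespaceDemo and the helper countMatches are made up for this example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDemo {
    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String text = "you are right\r\n\tand good";
        // Three spaces plus carriage return, line feed and tab.
        System.out.println(countMatches("\\s", text)); // 6
        // Every remaining character is non-whitespace.
        System.out.println(countMatches("\\S", text)); // 18
        System.out.println(text.length());             // 24
        // The ordinary space is matched by \s as well.
        System.out.println(Pattern.matches("\\s", " ")); // true
    }
}
```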

  • Matching characters by their hexadecimal or octal ASCII value

Matching a by its hexadecimal value 0x61:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\x61..");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("you are 10 years");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

are
ars

Matching a by its octal value 0o141 (written \0141 in a Java regex, with a leading 0):

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\0141..");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("you are 10 years");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

are
ars
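
A quick way to confirm that these numeric escapes really denote the letter a is to test them against single characters with Pattern.matches, which requires the whole input to match. The sketch below also includes the four-digit Unicode escape form that java.util.regex accepts. The class name NumericEscapeDemo and the helper isLetterA are invented for illustration.

```java
import java.util.regex.Pattern;

class NumericEscapeDemo {
    // True if the regex matches exactly the one-character string "a" and rejects "b".
    static boolean isLetterA(String regex) {
        return Pattern.matches(regex, "a") && !Pattern.matches(regex, "b");
    }

    public static void main(String[] args) {
        System.out.println(isLetterA("\\x61"));   // hexadecimal 61 is decimal 97
        System.out.println(isLetterA("\\0141"));  // octal 141 is decimal 97
        System.out.println(isLetterA("\\u0061")); // four-digit Unicode escape handled by the regex engine
        System.out.println((int) 'a');            // 97
    }
}
```

All three calls print true, since each escape names the same code point, 97.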

  • Matching one or more characters

For example, matching an e-mail address:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\w+@\\w+\\.\\w+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("my e-mail is boot@123.com");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

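Result

boot@123.com

Here \\w+ matches one or more characters drawn from digits, letters and the underscore _. The + is itself a metacharacter, so matching a literal + requires the escape \+ (written \\+ in a Java string literal).

But what happens if the e-mail address is changed to ben.boot@123.ben.com?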

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\w+@\\w+\\.\\w+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

boot@123.ben

That is not the e-mail address we wanted, so the regular expression has to be adjusted:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[\\w.]+@[\\w.]+\\.\\w+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

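Result

ben.boot@123.ben.com

[\\w.]+ matches one or more characters that are letters, digits, underscores or ., and it is equivalent to [\\w\\.]+, because a . inside a character class is already literal.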

public class PatternTest {
    private static Pattern pattern = Pattern.compile("[\\w\\.]+@[\\w\\.]+\\.\\w+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("my e-mail is ben.boot@123.ben.com");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

ben.boot@123.ben.com

  • Matching zero or more characters

Given the string "@Mr.Li @@Mr.Li Mr.Li", I want to match all three forms. If I write:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("@+[\\w.]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("@Mr.Li @@Mr.Li Mr.Li");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

@Mr.Li
@@Mr.Li

Clearly this only matches the first two; the one without @ is missed. The fix:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("@*[\\w.]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("@Mr.Li @@Mr.Li Mr.Li");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

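Result

@Mr.Li
@@Mr.Li
Mr.Li

As the result shows, * accepts any number of occurrences, including none at all, whereas + requires at least one.

  • Matching zero or one character

Now I want to match two URLs, one http and one https: "http://www.baidu.com/ https://www.baidu.com/".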

public class PatternTest {
    private static Pattern pattern = Pattern.compile("https*://[\\w./]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

http://www.baidu.com/
https://www.baidu.com/

This does match both, but suppose the string also contains httpssssss://www.baidu.com/, which I do not want:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("https*://[\\w./]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/ httpssssss://www.baidu.com/");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

http://www.baidu.com/
https://www.baidu.com/
httpssssss://www.baidu.com/

The fix:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("https?://[\\w./]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("http://www.baidu.com/ https://www.baidu.com/ httpssssss://www.baidu.com/");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

http://www.baidu.com/
https://www.baidu.com/

As the result shows, ? matches zero or one occurrence, whereas * matches zero or more.
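
The three quantifiers seen so far can be compared directly by testing whole strings with Pattern.matches. The sketch below checks http, https and httpss against each one. The class name QuantifierDemo and the helper matchesEach are hypothetical names chosen for this example.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class QuantifierDemo {
    // Tests each input as a whole against the regex and returns the outcomes in order.
    static boolean[] matchesEach(String regex, String... inputs) {
        boolean[] results = new boolean[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            results[i] = Pattern.matches(regex, inputs[i]);
        }
        return results;
    }

    public static void main(String[] args) {
        String[] inputs = {"http", "https", "httpss"};
        // ? : the final s may appear zero times or once.
        System.out.println(Arrays.toString(matchesEach("https?", inputs))); // [true, true, false]
        // * : the final s may appear any number of times, including zero.
        System.out.println(Arrays.toString(matchesEach("https*", inputs))); // [true, true, true]
        // + : the final s must appear at least once.
        System.out.println(Arrays.toString(matchesEach("https+", inputs))); // [false, true, true]
    }
}
```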

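  • Repetition counts and word boundaries

As we all know, an RGB color value is a 6-digit hexadecimal number. Take the string "#336633 #FFFFFF #1123FD335D ".

I want the first two RGB values; the third is not what we need.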

public class PatternTest {
    private static Pattern pattern = Pattern.compile("#[0-9a-zA-Z]+");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("#336633 #FFFFFF #1123FD335D");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

#336633
#FFFFFF
#1123FD335D

Clearly + also pulls in the third value. The fix:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("#[0-9a-zA-Z]{6}\\b");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("#336633 #FFFFFF #1123FD335D");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

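Result

#336633
#FFFFFF

Note that without the trailing \b, #1123FD would also be matched; \b denotes a word boundary. #[0-9a-zA-Z]{6} means a # followed by exactly six characters from the set of letters and digits (a strict hexadecimal check would use [0-9a-fA-F] instead).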
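
To see what the \b and the choice of character class each contribute, the following sketch runs three variants of the pattern over the same input. It is only an illustration: the class name RgbDemo, the helper findAll and the extra token #GGGGGG are assumptions, and the [0-9a-fA-F] class is a stricter variant than the one used above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RgbDemo {
    // Collects every non-overlapping match of regex in input.
    static List<String> findAll(String regex, String input) {
        List<String> matches = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            matches.add(m.group());
        }
        return matches;
    }

    public static void main(String[] args) {
        String input = "#336633 #FFFFFF #1123FD335D #GGGGGG";
        // No \b: the first six characters of the overlong value slip through.
        System.out.println(findAll("#[0-9a-fA-F]{6}", input));    // [#336633, #FFFFFF, #1123FD]
        // Any letter allowed: #GGGGGG is accepted although G is not a hex digit.
        System.out.println(findAll("#[0-9a-zA-Z]{6}\\b", input)); // [#336633, #FFFFFF, #GGGGGG]
        // Hex digits only, exactly six, ending at a word boundary.
        System.out.println(findAll("#[0-9a-fA-F]{6}\\b", input)); // [#336633, #FFFFFF]
    }
}
```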

  • Setting a range for the repetition count

Consider matching dates where the year must have 2 to 4 digits, given the formats "4/8/03 10-6-2004 2/2/2 01-01-01".

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\d{1,2}[-/]\\d{1,2}[-/]\\d{2,4}");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("4/8/03 10-6-2004 2/2/2 01-01-01");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

4/8/03
10-6-2004
01-01-01

Here \\d{1,2} means one or two digits and \\d{2,4} means two to four digits. Note that the lower bound in {} may be 0, so ? is equivalent to {0,1}.
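
Both claims can be checked with whole-string matching: the bounds of {2,4} are inclusive, and {0,1} behaves exactly like ?. The sketch below is a small check under those assumptions. The class name IntervalDemo, the helper matchesEach and the sample inputs are made up for illustration.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

class IntervalDemo {
    // Tests each input as a whole against the regex and returns the outcomes in order.
    static boolean[] matchesEach(String regex, String... inputs) {
        boolean[] results = new boolean[inputs.length];
        for (int i = 0; i < inputs.length; i++) {
            results[i] = Pattern.matches(regex, inputs[i]);
        }
        return results;
    }

    public static void main(String[] args) {
        // One digit is too few, five digits are too many; 2 and 4 are both allowed.
        System.out.println(Arrays.toString(
                matchesEach("\\d{2,4}", "2", "03", "2004", "20045"))); // [false, true, true, false]
        // {0,1} and ? give identical results on the same inputs.
        System.out.println(Arrays.toString(
                matchesEach("https{0,1}", "http", "https", "httpss"))); // [true, true, false]
        System.out.println(Arrays.toString(
                matchesEach("https?", "http", "https", "httpss")));     // [true, true, false]
    }
}
```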

  • Repeating at least n times

Given a list of dollar amounts, match those of at least one hundred dollars: "$496.80 $1290.43 $24.25 $7.61 $414.32 $21.00".

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\$\\d{3,}\\.\\d{2}");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("$496.80 $1290.43 $24.25 $7.61 $414.32 $21.00");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

$496.80
$1290.43
$414.32

Here \\d{3,} requires at least 3 digits, with no upper limit.

  • Preventing over-matching

An HTML file contains "<B>I like you</B> and <B>I love you</B>", and I want to match each <B>...</B> element.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("<B>.*</B>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("<B>I like you</B> and <B>I love you</B>");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

<B>I like you</B> and <B>I love you</B>

The match swallowed the and as well: the first <B> was paired with the last </B>. What we intended was pairwise matching, without the and in the middle. The fix:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("<B>.*?</B>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("<B>I like you</B> and <B>I love you</B>");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

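Result

<B>I like you</B>
<B>I love you</B>

The reason is that + and * are greedy metacharacters: they match as much as possible rather than stopping as soon as they can. Each has a lazy counterpart, formed by appending ? to the greedy version.

Greedy metacharacter   Lazy metacharacter
*                      *?
+                      +?
{n,}                   {n,}?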
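
The greedy and lazy forms are easiest to compare when only the text between the tags is extracted. The sketch below wraps the quantified part in a capturing group and reads it back with group(1). The capturing group, the class name LazyDemo and the helper innerTexts are additions made for illustration and are not part of the examples above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LazyDemo {
    // Returns the text captured by group 1 for every match of regex in input.
    static List<String> innerTexts(String regex, String input) {
        List<String> texts = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            texts.add(m.group(1));
        }
        return texts;
    }

    public static void main(String[] args) {
        String html = "<B>I like you</B> and <B>I love you</B>";
        // Greedy: .* runs to the last </B>, producing one oversized capture.
        System.out.println(innerTexts("<B>(.*)</B>", html));  // [I like you</B> and <B>I love you]
        // Lazy: .*? stops at the nearest </B>, producing one capture per element.
        System.out.println(innerTexts("<B>(.*?)</B>", html)); // [I like you, I love you]
    }
}
```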
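  • Non-word boundaries

We saw above that \b denotes a word boundary, but a standalone - does not form a word, so there is no word boundary on either side of it: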

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\b-\\b");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("passkey color - coded");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Running this prints nothing. To match the standalone -, change the pattern as follows:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\\B-\\B");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("passkey color - coded");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

-

So \B matches at positions that are not word boundaries.
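
\b and \B are exact opposites at every position, so a hyphen that sits directly between two words behaves the other way round from one surrounded by spaces. The sketch below counts matches for both spellings. The class name BoundaryDemo, the helper countMatches and the sample string color-coded are assumptions for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Counts the non-overlapping matches of regex in input.
    static int countMatches(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hyphen between spaces: both neighbours are non-word characters, so no boundary exists.
        System.out.println(countMatches("\\b-\\b", "color - coded")); // 0
        System.out.println(countMatches("\\B-\\B", "color - coded")); // 1
        // Hyphen between letters: a word boundary lies on each side of it.
        System.out.println(countMatches("\\b-\\b", "color-coded"));   // 1
        System.out.println(countMatches("\\B-\\B", "color-coded"));   // 0
    }
}
```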

  • String boundaries

Now we want to check whether a file's content is a valid MyBatis mapper XML file:

"<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
        "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
        "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">"

If we only check it like this:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("<\\?xml.*\\?>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("<?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

<?xml version="1.0" encoding="UTF-8" ?>

What if some arbitrary characters are added at the front of the file content?

public class PatternTest {
    private static Pattern pattern = Pattern.compile("<\\?xml.*\\?>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("This is bad,real bad! <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

<?xml version="1.0" encoding="UTF-8" ?>

The declaration is still found, but the structure of the XML file is broken and it is no longer a valid XML file. The check is changed as follows:

public class PatternTest {
    private static Pattern pattern = Pattern.compile("^\\s*<\\?xml.*\\?>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("This is bad,real bad! <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

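Running this prints nothing, which shows that the content is not a valid XML file. With only leading whitespace in front of the declaration, the check succeeds: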

public class PatternTest {
    private static Pattern pattern = Pattern.compile("^\\s*<\\?xml.*\\?>");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

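Only when <?xml version="1.0" encoding="UTF-8" ?> sits at the start of the file is this a valid XML file; a few leading whitespace characters are still acceptable.

So ^ here serves as the start-of-string anchor.

Naturally there is a corresponding end anchor: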

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\"http:.*\\.dtd\">\\s*$");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

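Result

"http://mybatis.org/dtd/mybatis-3-mapper.dtd">

Here \\s*$ handles the end of the string: $ anchors the match to the end, and \\s* tolerates trailing whitespace.

If other (non-whitespace) characters are appended at the end, the match fails: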

public class PatternTest {
    private static Pattern pattern = Pattern.compile("\"http:.*\\.dtd\">\\s*$");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher(" <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n" +
                "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n" +
                "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">This is bad,really bad");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

This time nothing is printed.
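
The start and end checks can be combined into one helper. The sketch below is only a surface check built from the two anchored patterns in this section, not real XML validation. The class name MapperXmlCheck and the method looksLikeMapperXml are hypothetical, and the end pattern is shortened to the .dtd"> tail.

```java
import java.util.regex.Pattern;

class MapperXmlCheck {
    // The XML declaration must be the first non-whitespace content of the input.
    private static final Pattern STARTS_WITH_DECLARATION = Pattern.compile("^\\s*<\\?xml.*\\?>");
    // The DTD reference must be the last non-whitespace content of the input.
    private static final Pattern ENDS_WITH_DTD = Pattern.compile("\\.dtd\">\\s*$");

    static boolean looksLikeMapperXml(String content) {
        return STARTS_WITH_DECLARATION.matcher(content).find()
                && ENDS_WITH_DTD.matcher(content).find();
    }

    public static void main(String[] args) {
        String good = " <?xml version=\"1.0\" encoding=\"UTF-8\" ?>\n"
                + "<!DOCTYPE mapper PUBLIC \"-//ibatis.apache.org//DTD Mapper 3.0//EN\"\n"
                + "\t\t\"http://mybatis.org/dtd/mybatis-3-mapper.dtd\">";
        System.out.println(looksLikeMapperXml(good));                   // true
        // Text before the declaration breaks the ^ anchor.
        System.out.println(looksLikeMapperXml("This is bad! " + good)); // false
        // Text after the DTD reference breaks the $ anchor.
        System.out.println(looksLikeMapperXml(good + "This is bad"));   // false
    }
}
```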

  • Multiline mode

Now I want to match every // comment in a piece of code, together with the whitespace in front of it.

public class PatternTest {
    private static Pattern pattern = Pattern.compile("(?m)^\\s*//.*$");

    public static void main(String[] args) {
        Matcher matcher = pattern.matcher("//這是一個開頭\n" +
                "    public void print() {\n" +
                "        System.out.println(\"I am in Boot ClassLoader\");\n" +
                "    }\n" +
                "    //這是一個結尾");
        List<String> list = new ArrayList<>();
        while (matcher.find()) {
            list.add(matcher.group());
        }
        list.stream().forEach(System.out::println);
    }
}

Result

//這是一個開頭
    //這是一個結尾

With (?m) in effect, ^ matches at the start of every line and $ matches at the end of every line.
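
The inline flag (?m) has the same effect as passing Pattern.MULTILINE to Pattern.compile. When a pattern needs the true start of the input even in multiline mode, java.util.regex also offers \A, which is unaffected by the flag. The sketch below compares the three cases. The class name MultilineAnchorDemo, the helper countMatches and the sample text are made up for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MultilineAnchorDemo {
    // Counts the non-overlapping matches of a compiled pattern in input.
    static int countMatches(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        String code = "//first\ncode();\n//last";
        // Default mode: ^ matches only at the very start of the input.
        System.out.println(countMatches(Pattern.compile("^//"), code));                      // 1
        // Multiline mode: ^ matches at the start of every line.
        System.out.println(countMatches(Pattern.compile("^//", Pattern.MULTILINE), code));   // 2
        // \A ignores the flag and still matches only at the start of the input.
        System.out.println(countMatches(Pattern.compile("\\A//", Pattern.MULTILINE), code)); // 1
    }
}
```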
