A regular expression is a set of rules that you define and that Linux tools can use to filter text.

[root@node1 ~]# echo "this is a cat" | sed -n '/cat/p'
this is a cat
[root@node1 ~]# echo "this is a cat" | gawk '/cat/{print $0}'
this is a cat
Regular expression matching is very picky. In particular, remember that regular expressions are case-sensitive.
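To make the case-sensitivity point concrete, here is a minimal sketch that runs the same input line against two patterns differing only in case. The variable names `lower` and `upper` are made up for illustration.

```shell
# Same input, two patterns that differ only in letter case.
lower=$(echo "this is a cat" | sed -n '/cat/p')   # matches
upper=$(echo "this is a cat" | sed -n '/Cat/p')   # no match, prints nothing
echo "lower: [$lower]"
echo "upper: [$upper]"
```

The second pair of brackets comes out empty because `Cat` never appears in the input.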
The special characters that regular expressions recognize include:
.*[]^${}\+?|()
To use one of these special characters as a literal text character, you must escape it, normally with a backslash (\).

[root@node1 ~]# echo "this is a $" | sed -n '/\$/p'
this is a $
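Escaping matters most when the unescaped character would still match something. The sketch below uses the dot, one of the special characters listed above, with made-up sample input (`3.14` and `3x14`).

```shell
# Unescaped, the dot is special, so "3x14" matches as well.
loose=$(printf '3.14\n3x14\n' | sed -n '/3.14/p')
# Escaped, the dot only matches a literal period.
strict=$(printf '3.14\n3x14\n' | sed -n '/3\.14/p')
echo "loose:"
echo "$loose"
echo "strict:"
echo "$strict"
```

The unescaped pattern returns both lines, while the escaped one returns only `3.14`.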
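Two special characters can anchor a pattern to the beginning or the end of a line in the data stream.
The caret (^) defines a pattern that starts at the beginning of a text line in the data stream.
The dollar sign ($) defines the end-of-line anchor.

[root@node1 ~]# echo "this is a cat" | sed -n '/^this/p'
this is a cat
[root@node1 ~]# echo "this is a cat" | sed -n '/cat$/p'
this is a cat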
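The examples above only show anchors that succeed. As a rough sketch of the opposite case, the following checks what happens when the anchored text sits in the wrong position. The sample sentences and variable names are invented for illustration.

```shell
start_hit=$(echo "cat is here" | sed -n '/^cat/p')     # "cat" starts the line: match
start_miss=$(echo "this is a cat" | sed -n '/^cat/p')  # "cat" is not at the start: no match
end_miss=$(echo "cat is here" | sed -n '/cat$/p')      # "cat" is not at the end: no match
echo "start_hit:  [$start_hit]"
echo "start_miss: [$start_miss]"
echo "end_miss:   [$end_miss]"
```

Only the first command prints the line back. The other two come out empty.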
In some situations the two anchors can be combined.
1. For example, to find lines that contain only a specific piece of text:

[root@node1 ljy]# more test.txt
this is a dog
what
how
this is a cat
is a dog
[root@node1 ljy]# sed -n '/^is a dog$/p' test.txt
is a dog
2. Putting the two anchors together with nothing between them filters out blank lines directly:

[root@node1 ljy]# more test.txt
this is a dog
what
how
this is a cat
is a dog
[root@node1 ljy]# sed '/^$/d' test.txt
this is a dog
what
how
this is a cat
is a dog
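Both uses of the combined anchors can be tried without a file. This is a small sketch that builds hypothetical input with `printf`, including two empty lines, so the effect of `/^$/d` is visible.

```shell
# Hypothetical sample input containing two empty lines.
cleaned=$(printf 'this is a dog\n\nwhat\n\nis a dog\n' | sed '/^$/d')
echo "$cleaned"

# Only the line that is exactly "is a dog" passes both anchors.
exact=$(printf 'this is a dog\nis a dog\n' | sed -n '/^is a dog$/p')
echo "$exact"
```

Note that `^$` only matches lines that are truly empty. A line holding just spaces is not removed by this pattern.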
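The dot matches any single character except a newline. It must match a character, so a line where nothing precedes "at" does not match.

[root@node1 ljy]# more test.txt
this is a dog
what
how
this is a cat
is a dog
at
[root@node1 ljy]# sed -n '/.at/p' test.txt
what
this is a cat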
To restrict which specific characters may match at a position, use a character class. A character class is defined with square brackets.

[root@node1 ljy]# more test.txt
this is a dog
this is a Dog
this is a DoG
this is a cat
[root@node1 ljy]# sed -n '/[dD]og/p' test.txt
this is a dog
this is a Dog
[root@node1 ljy]# sed -n '/[dD]o[gG]/p' test.txt
this is a dog
this is a Dog
this is a DoG
To exclude particular characters, put a caret at the start of the character class, just inside the opening bracket.

[root@node1 ljy]# sed -n '/[dD]o[gG]/p' test.txt
this is a dog
this is a Dog
this is a DoG
[root@node1 ljy]# sed -n '/[^D]og/p' test.txt
this is a dog
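A negated class still has to consume one character. It means "some character other than these", not "nothing". The sketch below uses made-up input to show this.

```shell
# "Dog" is rejected by [^D], and the bare "og" has no character
# in front of it for the class to match.
result=$(printf 'Dog\ndog\nfog\nog\n' | sed -n '/[^D]og/p')
echo "$result"
```

Only `dog` and `fog` are printed.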
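Inside a character class, a dash between two characters defines a range, and the regular expression matches any character within that range.

[root@node1 ljy]# more test.txt
123123
1231
121222222
412345341613
vsdvs
qwer12344123
12345
34211
444444
[root@node1 ljy]# sed -n '/^[0-9][0-9][0-9][0-9][0-9]$/p' test.txt
12345
34211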
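Ranges work for letters as well as digits. As a hedged sketch, the pattern below accepts exactly one lowercase letter from a to f followed by one digit. Setting `LC_ALL=C` is an assumption added here so that the letter range follows plain ASCII order regardless of the system locale. The sample values are invented.

```shell
# One letter in a-f, then one digit, and nothing else on the line.
hex=$(printf 'a1\nZ9\n42\nb7x\n' | LC_ALL=C sed -n '/^[a-f][0-9]$/p')
echo "$hex"
```

Only `a1` passes. `Z9` is uppercase, `42` starts with a digit, and `b7x` has a third character.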
The question mark means the preceding character may appear zero times or one time, and no more.

[root@node1 ljy]# echo "bat" | gawk '/ba?t/{print $0}'
bat
[root@node1 ljy]# echo "baat" | gawk '/ba?t/{print $0}'
[root@node1 ljy]# echo "bt" | gawk '/ba?t/{print $0}'
bt
The question mark can also be used together with a character class:

[root@node1 ljy]# echo "bt" | gawk '/b[ae]?t/{print $0}'
bt
[root@node1 ljy]# echo "bat" | gawk '/b[ae]?t/{print $0}'
bat
[root@node1 ljy]# echo "bet" | gawk '/b[ae]?t/{print $0}'
bet
[root@node1 ljy]# echo "baat" | gawk '/b[ae]?t/{print $0}'
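A common practical use of the question mark is accepting two spellings of a word. This is a small sketch with invented input. It calls `awk` rather than `gawk` on the assumption that some awk implementation is installed under that name.

```shell
# The "u" is optional, but at most one is allowed.
optional=$(printf 'color\ncolour\ncolouur\n' | awk '/colou?r/{print $0}')
echo "$optional"
```

`color` and `colour` are printed. `colouur` is not, because it has two `u` characters.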
The plus sign means the preceding character may appear one or more times, but it must appear at least once.

[root@node1 ljy]# echo "baat" | gawk '/b[ae]+t/{print $0}'
baat
[root@node1 ljy]# echo "bt" | gawk '/b[ae]+t/{print $0}'
[root@node1 ljy]# echo "bt" | gawk '/ba+t/{print $0}'
[root@node1 ljy]# echo "bat" | gawk '/ba+t/{print $0}'
bat
[root@node1 ljy]# echo "baat" | gawk '/ba+t/{print $0}'
baat
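These quantifier examples use gawk because it reads patterns as extended regular expressions. In sed's default basic syntax, a plain `+` is just a literal plus sign. The sketch below assumes a sed that accepts the `-E` option for extended syntax, as GNU sed and BSD sed do.

```shell
bre=$(echo "baat" | sed -n '/ba+t/p')      # basic syntax: "+" is literal, no match
ere=$(echo "baat" | sed -n -E '/ba+t/p')   # extended syntax: "+" means one or more
echo "bre: [$bre]"
echo "ere: [$ere]"
```

Only the `-E` run prints `baat`.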
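Braces in ERE (extended regular expressions) let you set a lower and an upper limit on how many times a pattern may repeat.
{m,n} means the preceding element appears at least m times and at most n times.

[root@node1 ljy]# echo "baat" | gawk '/b[ae]{1,2}t/{print $0}'
baat
[root@node1 ljy]# echo "baaat" | gawk '/b[ae]{1,2}t/{print $0}'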
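To see both limits at once, the following sketch feeds four candidate lines through a single interval pattern. It uses `sed -E` (an assumption: a sed with extended-syntax support) and anchors the pattern so each line is judged as a whole.

```shell
# Between one and two "a" characters are allowed between "b" and "t".
interval=$(printf 'bt\nbat\nbaat\nbaaat\n' | sed -n -E '/^ba{1,2}t$/p')
echo "$interval"
```

`bt` has too few and `baaat` has too many, so only `bat` and `baat` are printed.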
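The pipe symbol (|) specifies alternative patterns in a logical OR fashion: the match succeeds if any one of the alternatives matches.
Parts of a regular expression can also be grouped with parentheses.

[root@node1 ljy]# echo "bat" | gawk '/b(a|e)t/{print $0}'
bat
[root@node1 ljy]# echo "baat" | gawk '/b(a|e)t/{print $0}'
[root@node1 ljy]# echo "bet" | gawk '/b(a|e)t/{print $0}'
bet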
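Grouping becomes more useful when a quantifier is applied to the whole group. As a rough sketch with made-up input, the first pattern below treats `urday` as one optional unit, and the second combines alternation with an optional trailing `s`. `awk` is assumed to be available under that name.

```shell
# The group "(urday)" is optional as a whole.
days=$(printf 'Sat\nSaturday\nSatur\n' | awk '/^Sat(urday)?$/{print $0}')
echo "$days"

# Either word, optionally followed by "s".
pets=$(printf 'cat\ndogs\nbird\n' | awk '/^(cat|dog)s?$/{print $0}')
echo "$pets"
```

The first command prints `Sat` and `Saturday`. The second prints `cat` and `dogs`.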