Regular Expressions and the Three Musketeers (grep, sed, awk)

Lesson 12: Regular Expressions

1. Introduction to regular expressions

2. grep

3. sed

4. awk

5. Extras


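1. Introduction to regular expressions

A regular expression (RE) is a pattern of characters used to match specified text during a search.

Metacharacters are characters that stand for something other than their literal meaning. They are interpreted by whichever program performs the pattern matching, such as vi, grep, sed, and awk.

Basic metacharacters recognized by all pattern-matching tools on UNIX/Linux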

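Metacharacter  Function                                         Example     What it matches
^              Beginning-of-line anchor                         /^love/     All lines beginning with love
$              End-of-line anchor                               /love$/     All lines ending with love
.              Matches any single character                     /l..e/      Lines containing an l, followed by two characters, followed by an e
*              Matches zero or more of the preceding character  / *love/    Lines containing love preceded by zero or more spaces
[]             Matches any one character in a set               /[Ll]ove/   Lines containing love or Love
[x-y]          Matches one character in the given range         /[A-Z]ove/  An uppercase letter followed by ove
[^]            Matches one character not in the set             /[^A-Z]/    Any single character not in the range A-Z
\              Escapes a metacharacter                          /love\./    Lines containing love followed by a literal period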

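Extended metacharacters, supported by many UNIX/Linux programs that use RE metacharacters (not necessarily by every pattern-matching tool)

Metacharacter  Function                               Example              What it matches
\<             Beginning-of-word anchor               /\<love/             Lines containing a word that begins with love
\>             End-of-word anchor                     /love\>/             Lines containing a word that ends with love
\(..\)         Tags matched characters for later use  /\(love\)able \1er/  Up to nine tags can be used; the leftmost one in the pattern is the first. In the example, love is saved as tag 1 and referenced as \1
x\{m\}, x\{m,\}, x\{m,n\}  Repetition of character x: exactly m times, at least m times, or at least m but no more than n times  o\{5,10\}  Lines containing 5 to 10 consecutive o's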

Sample file for the basic metacharacters

// demonstrated with grep
root@lanquark:~/unixshellbysample/chap03# cat picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the
groove.

Simple regular expression search

root@lanquark:~/demo# grep 'love' picnic 
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

Beginning-of-line anchor (^)

root@lanquark:~/demo# grep '^love' picnic 
love, how much I adore you. Do you know

End-of-line anchor ($)

root@lanquark:~/demo# grep 'love$' picnic 
clover. Did you see them?  I can only hope love

Any single character (.)

root@lanquark:~/demo# grep 'l.ve' picnic 
I had a lovely time on our little picnic.
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the

Zero or more of the preceding character (*)

root@lanquark:~/demo# grep 'o*ve' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love
is forever. I live for you. It's hard to get back in the
groove.

A set of characters ([ ])

root@lanquark:~/demo# grep '[Ll]ove' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

A range of characters ([ - ])

root@lanquark:~/demo# grep 'ove[a-z]' picnic 
I had a lovely time on our little picnic.
Lovers were all around us. It is springtime. Oh
I lost my gloves somewhere out in that field of
clover. Did you see them?  I can only hope love

Characters not in the set ([^ ])

root@lanquark:~/demo# grep 'ove[^a-zA-Z0-9]' picnic 
love, how much I adore you. Do you know
the extent of my love? Oh, by the way, I think
groove.

Sample file for the extended metacharacters

// demonstrated with grep or sed
root@lanquark:~/unixshellbysample/chap03# cat textfile 
Unusual occurrences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurrences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.

Beginning-of-word anchor (\<) and end-of-word anchor (\>)

root@lanquark:~/demo# grep '\<fourth\>' textfile 
Patty won fourth place in the 50 yard dash square and fair.

Remembering patterns with \( and \)

// replace occurrence with occurence, or Occurrence with Occurence
root@lanquark:~/unixshellbysample/chap03# sed 's#\([Oo]ccur\)rence#\1ence#' textfile 
Unusual occurences happened at the fair.
Patty won fourth place in the 50 yard dash square and fair.
Occurences like this are rare.
The winning ticket is 55222.
The ticket I got is 54333 and Dee got 55544.
Guy fell down while running around the south bend in his last event.


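2. grep

The name grep comes from "globally search for a regular expression and print": it searches its input for a pattern and prints the matching lines.

grep does not modify or change the input file in any way.

Command format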

grep word filename

root@lanquark:~# grep hjm /etc/passwd
hjm:x:5000:5000:hjm:/home/hjm:/bin/bash

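Regular expression metacharacters used by grep

Metacharacter  Function                                         Example        What it matches
^              Beginning-of-line anchor                         '^love'        All lines beginning with love
$              End-of-line anchor                               'love$'        All lines ending with love
.              Matches any single character                     'l..e'         Lines containing an l, followed by two characters, followed by an e
*              Matches zero or more of the preceding character  ' *love'       Lines containing love preceded by zero or more spaces
[ ]            Matches any one character in a set               '[Ll]ove'      Lines containing love or Love
[^]            Matches one character not in the set             '[^A-K]'       Any single character not in the range A-K
\              Escapes a metacharacter                          'love\.'       Lines containing love followed by a literal period
\<             Beginning-of-word anchor                         '\<love'       Lines containing a word that begins with love
\>             End-of-word anchor                               'love\>'       Lines containing a word that ends with love
\(..\)         Tags matched characters for later use            '\(love\)ing'  Up to nine tags can be used; the leftmost one in the pattern is the first. In the example, love is saved as tag 1 and referenced as \1
x\{m\}, x\{m,\}, x\{m,n\}  Repetition of character x: exactly m times, at least m times, or at least m but no more than n times  o\{5,10\}  Lines containing 5 to 10 consecutive o's

// sample file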
root@lanquark:~/demo# cat datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Print all lines containing NW

root@lanquark:~/demo# grep NW datafile 
northwest   NW  Charles Main        3.0 .98 3   34

Print lines beginning with the letter n

root@lanquark:~/demo# grep '^n' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9

Print lines ending with the digit 4

root@lanquark:~/demo# grep '4$' datafile 
northwest   NW  Charles Main        3.0 .98 3   34

Print lines beginning with the letter w or e

root@lanquark:~/demo# grep '^[we]' datafile 
western     WE  Sharon Gray     5.3 .97 5   23
eastern     EA  TB Savage       4.4 .84 5   20

Print all lines that contain at least one non-digit character

root@lanquark:~/demo# grep '[^0-9]' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Print all lines containing an s, followed by zero or more consecutive s characters and a space

root@lanquark:~/demo# grep 'ss* ' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
southwest   SW  Lewis Dalsass       2.7 .8  2   18

Print lines containing at least nine consecutive lowercase letters

root@lanquark:~/demo# grep '[a-z]\{9\}' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southeast   SE  Patricia Hemenway   4.0 .7  4   17
northeast   NE  AM Main Jr.     5.1 .94 3   13

Print lines containing a 3 followed by a period and a digit, then any number of characters, then another 3

root@lanquark:~/demo# grep '\(3\)\.[0-9].*\1' datafile 
northwest   NW  Charles Main        3.0 .98 3   34

Print all lines containing a word that begins with north

root@lanquark:~/demo# grep '\<north' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
root@lanquark:~/demo# grep '\<north\>' datafile 
north       NO  Margot Weber        4.5 .89 5    9

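Common grep options

Option  Function
-c      Print a count of the matching lines instead of the lines themselves
-i      Ignore case when comparing characters
-l      List only the names of the files that contain a match
-n      Prefix each matching line with its line number within the file
-v      Invert the match: print only the lines that do not match
-w      Match the expression as a whole word, as if it were surrounded by \< and \>
-A NUM  Also print NUM lines of context after each matching line
-B NUM  Also print NUM lines of context before each matching line
-C NUM  Also print NUM lines of context before and after each matching line
-R      Recursively read and process all files under each listed directory, including its subdirectories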

Sample file

root@lanquark:~/demo# cat datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

-c prints the number of lines beginning with south

root@lanquark:~/demo# grep -c '^south' datafile 
3

-i ignores case

root@lanquark:~/demo# grep -i 'pat' datafile 
southeast   SE  Patricia Hemenway   4.0 .7  4   17

-l prints only the names of the files containing the pattern, not the matching text

root@lanquark:~/demo# grep -l 'SE' *
datafile
temp

-n prefixes each matching line with its line number

root@lanquark:~/demo# grep -n '^south' datafile 
3:southwest SW  Lewis Dalsass       2.7 .8  2   18
4:southern  SO  Suan Chin       5.1 .95 4   15
5:southeast     SE  Patricia Hemenway   4.0 .7  4   17

-v inverts the match

root@lanquark:~/demo# grep -v 'Suan Chin' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

-w finds the pattern only when it appears as a whole word, not as part of a word

root@lanquark:~/demo# grep -w 'north' datafile 
north       NO  Margot Weber        4.5 .89 5    9

-A 2 also prints the two lines after each matching line

root@lanquark:~/demo# grep -A 2 'NE' datafile 
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

-B 2 also prints the two lines before each matching line

root@lanquark:~/demo# grep -B 2 'NE' datafile 
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13

-C 2 also prints the two lines before and the two lines after each matching line

root@lanquark:~/demo# grep -C 2 'NE' datafile 
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

-R searches for the pattern recursively

root@lanquark:~/demo# grep -R 'central' *
datafile:central        CT  Ann Stephens        5.7 .94 5   13
test.dir/datafile:central       CT  Ann Stephens        5.7 .94 5   13

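grep exit status

grep is very useful in scripts because it always returns an exit status: 0 means the pattern was found, 1 means it was not found, and 2 means an error occurred, such as the file to search not being found.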
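
The exit status can drive a script directly. The following is only a minimal sketch and not part of the original lesson: check_pattern is a hypothetical helper, the sample file is created on the fly, and -q (quiet) is used so that nothing but the status matters.

```shell
#!/bin/sh
# Hypothetical helper: classify grep's exit status for a pattern and a file.
# -q suppresses normal output; stderr is discarded so only $? is inspected.
check_pattern() {
    grep -q "$1" "$2" 2>/dev/null
    case $? in
        0) echo "found" ;;
        1) echo "not found" ;;
        *) echo "error" ;;
    esac
}

# Throwaway sample data (assumed for this illustration).
tmpdir=$(mktemp -d)
printf 'north NO Margot Weber\ncentral CT Ann Stephens\n' > "$tmpdir/sample"

r_found=$(check_pattern 'north' "$tmpdir/sample")
r_missing=$(check_pattern 'west' "$tmpdir/sample")
r_error=$(check_pattern 'north' "$tmpdir/no_such_file")

echo "$r_found / $r_missing / $r_error"
rm -rf "$tmpdir"
```

Because the helper looks only at $?, the same test can be placed in an if or case statement without parsing any of grep's output.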

grep can read its input from files or from a pipe

// list only the regular files in the directory
root@lanquark:~/demo# ls -l | grep '^-'
-rw-r--r--  1 root root     5 Jun  1 04:30 1111
-rw-r--r--  1 root root  1066 May 31 20:56 1.txt
-rw-r--r--  1 root root   351 Jun  4 23:04 datafile
-rw-r--r--  1 root root    18 Jun  4 21:54 id.txt
-rw-r--r--  1 root root   876 May 31 21:05 ipconfig.txt
-rw-r--r--  1 root root   338 Jun  4 23:09 picnic
-rw-r--r--+ 1 root root 18065 May 24 21:00 temp
-rw-r--r--  1 root root     0 Jun  1 04:25 test1.txt
-rw-r--r--  1 root root   277 Jun  4 23:17 textfile
-rw-r--r--+ 1 root root   572 Jun  1 04:29 tt.txt


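Extended grep: egrep

Invocation: egrep or grep -E

Regular expression metacharacters used by egrep

Metacharacter  Function                                         Example                      What it matches
^              Beginning-of-line anchor                         '^love'                      All lines beginning with love
$              End-of-line anchor                               'love$'                      All lines ending with love
.              Matches any single character                     'l..e'                       Lines containing an l, followed by two characters, followed by an e
*              Matches zero or more of the preceding character  ' *love'                     Lines containing love preceded by zero or more spaces
[ ]            Matches any one character in a set               '[Ll]ove'                    Lines containing love or Love
[^]            Matches one character not in the set             '[^A-K]'                     Any single character not in the range A-K
+              Matches one or more of the preceding character   '[a-z]+ove'                  One or more lowercase letters followed by ove
?              Matches zero or one of the preceding character   'lo?ve'                      An l followed by zero or one o, followed by ve
a|b            Matches either a or b                            'love|hate'                  Either of the two expressions love or hate
()             Groups characters                                'love(able|ly)' and '(ov)+'  loveable or lovely; one or more occurrences of ov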

Sample file

root@lanquark:~/demo# cat datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Print lines containing NW or EA

root@lanquark:~/demo# egrep 'NW|EA' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
eastern     EA  TB Savage       4.4 .84 5   20

Print all lines containing one or more 3s

root@lanquark:~/demo# egrep '3+' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
northeast   NE  AM Main Jr.     5.1 .94 3   13
central     CT  Ann Stephens        5.7 .94 5   13

Print all lines containing a 2, followed by zero or one period, followed by a digit

root@lanquark:~/demo# egrep '2\.?[0-9]' datafile 
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
eastern     EA  TB Savage       4.4 .84 5   20

Print lines containing one or more consecutive occurrences of the pattern no

root@lanquark:~/demo# egrep '(no)+' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9

Print all lines containing the letter S followed by h or u

root@lanquark:~/demo# egrep 'S(h|u)' datafile 
western     WE  Sharon Gray     5.3 .97 5   23
southern    SO  Suan Chin       5.1 .95 4   15


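3. sed

sed is a non-interactive stream editor. By default it does not modify the original file.

sed processes a file (or other input) one line at a time and sends its output to the screen. The line currently being processed is held in a temporary buffer called the pattern space. Once sed has finished processing the line in the pattern space, it sends that line to the screen, removes it from the pattern space, and reads the next line in.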
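
One way to see that sed works on a copy of each line rather than on the file itself is to compare the file before and after a run. This is just a sketch under stated assumptions: the file name regions and its two lines are made up for the illustration.

```shell
#!/bin/sh
# Throwaway input file (name and contents are assumptions for this demo).
tmpdir=$(mktemp -d)
printf 'northwest NW\nwestern WE\n' > "$tmpdir/regions"

# The substitution is applied in the pattern space and written to stdout.
edited=$(sed 's/west/north/' "$tmpdir/regions")

# The file on disk still holds the original text.
original=$(cat "$tmpdir/regions")

echo "sed output:"
echo "$edited"
echo "file on disk:"
echo "$original"

# To keep the edited text, redirect the output to a new file.
sed 's/west/north/' "$tmpdir/regions" > "$tmpdir/regions.new"
saved=$(cat "$tmpdir/regions.new")
rm -rf "$tmpdir"
```

Redirecting to a new file, as in the last step, leaves the original untouched. GNU sed also offers an -i option for editing in place, which this lesson does not cover.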

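sed commands and options

Command  Function
a\       Append one or more lines of text after the current line
c\       Change (replace) the text of the current line with new text
d        Delete lines
i\       Insert text before the current line
h        Copy the contents of the pattern space to the hold buffer
H        Append the contents of the pattern space to the hold buffer
g        Copy the contents of the hold buffer into the pattern space, overwriting what was there
G        Append the contents of the hold buffer to the pattern space
l        List non-printing characters
p        Print lines
n        Read the next input line and continue processing it with the next command rather than the first one
q        Quit sed
r        Read lines from a file
!        Apply the command to all lines except the selected ones
s        Substitute one string for another
x        Exchange the contents of the hold buffer and the pattern space
y        Translate one character into another (y cannot be used with regular expressions)

Substitution flags (used with s)

Flag  Function
g     Replace all occurrences within the line
p     Print the line
w     Write the line to a file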

sed options

Option  Function
-e      Allow multiple edits
-f      Specify the name of a sed script file
-n      Suppress the default output
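
The -f option is not demonstrated elsewhere in this lesson, so here is a small hedged sketch. The script name edit.sed and the shortened data file are assumptions made for the illustration; the two edits are the same kind used in the -e example.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Shortened sample data (assumed for this demo).
cat > "$tmpdir/datafile" <<'EOF'
northwest NW Charles Main
western WE Sharon Gray
southwest SW Lewis Dalsass
southern SO Suan Chin
EOF

# A sed script file: one command per line, # starts a comment line.
cat > "$tmpdir/edit.sed" <<'EOF'
# delete the first line, then rename one person
1d
s/Lewis/Joseph/
EOF

# Run the commands from the script file ...
sed_f_out=$(sed -f "$tmpdir/edit.sed" "$tmpdir/datafile")

# ... and the same edits given inline with -e, for comparison.
sed_e_out=$(sed -e '1d' -e 's/Lewis/Joseph/' "$tmpdir/datafile")

echo "$sed_f_out"
rm -rf "$tmpdir"
```

A script file is mostly a convenience: it avoids shell quoting problems and lets a longer list of edits be reused.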

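sed metacharacters

Metacharacter  Function                                         Example                What it matches
^              Beginning-of-line anchor                         /^love/                All lines beginning with love
$              End-of-line anchor                               /love$/                All lines ending with love
.              Matches any single character                     /l..e/                 Lines containing an l, followed by two characters, followed by an e
*              Matches zero or more of the preceding character  / *love/               Lines containing love preceded by zero or more spaces
[ ]            Matches any one character in a set               /[Ll]ove/              Lines containing love or Love
[^]            Matches one character not in the set             /[^A-KM-Z]ove/         Lines containing ove where the character just before ove is not in the range A-K or M-Z
\(..\)         Saves matched characters                         s/\(love\)able/\1er/   Marks the pattern between the delimiters and saves it as tag 1, which can later be referenced as \1. Up to nine tags can be defined, numbered from the left
&              Saves the search string for use in the replacement  s/love/aa&aa/       & stands for the search string; love is replaced by itself with aa added on each side, so love becomes aaloveaa
\<             Beginning-of-word anchor                         /\<love/               Lines containing a word that begins with love
\>             End-of-word anchor                               /love\>/               Lines containing a word that ends with love
x\{m\}         Exactly m consecutive x's                        /o\{5\}/               Five consecutive o's
x\{m,\}        At least m x's                                   /o\{5,\}/              At least five consecutive o's
x\{m,n\}       At least m but no more than n x's                /o\{5,10\}/            At least 5 and at most 10 consecutive o's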

Sample file

root@lanquark:~/demo# cat datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

The print command: p

root@lanquark:~/demo# sed '/north/p' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Suppressing the default output: -n

root@lanquark:~/demo# sed -n '/north/p' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9

Delete: the d command

// delete line 3
root@lanquark:~/demo# sed '3d' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

// delete from line 3 through the last line
root@lanquark:~/demo# sed '3,$d' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23

// delete the last line
root@lanquark:~/demo# sed '$d' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9

// delete lines containing the pattern north
root@lanquark:~/demo# sed '/north/d' datafile 
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
central     CT  Ann Stephens        5.7 .94 5   13

Substitution: the s command

// replace west with north; g makes the replacement global within each line
root@lanquark:~/demo# sed 's#west#north#g' datafile 
northnorth  NW  Charles Main        3.0 .98 3   34
northern        WE  Sharon Gray     5.3 .97 5   23
southnorth  SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

// & stands for the matched text
root@lanquark:~/demo# sed 's#[0-9][0-9]$#&.5#' datafile 
northwest   NW  Charles Main        3.0 .98 3   34.5
western     WE  Sharon Gray     5.3 .97 5   23.5
southwest   SW  Lewis Dalsass       2.7 .8  2   18.5
southern    SO  Suan Chin       5.1 .95 4   15.5
southeast   SE  Patricia Hemenway   4.0 .7  4   17.5
eastern     EA  TB Savage       4.4 .84 5   20.5
northeast   NE  AM Main Jr.     5.1 .94 3   13.5
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13.5

// suppress the default output so that only the changed lines are printed
root@lanquark:~/demo# sed -n 's#Hemenway#Jones#gp' datafile 
southeast   SE  Patricia Jones  4.0 .7  4   17

// save matched characters with \( and \)
root@lanquark:~/demo# sed -n 's#\(Mar\)got#\1iance#p' datafile 
north       NO  Mariance Weber      4.5 .89 5    9

Specifying a range of lines: the comma

// a range delimited by two regular expressions
root@lanquark:~/demo# sed -n '/west/,/east/p' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17

// a range delimited by a line number and a regular expression
root@lanquark:~/demo# sed -n '5,/^northeast/p' datafile 
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13

// a range delimited by line numbers
root@lanquark:~/demo# sed -n '1,4p' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15

Multiple edits: the -e option

root@lanquark:~/demo# sed -e '1,3d' -e 's#Hemenway#Jones#' datafile 
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Jones  4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Reading from a file: the r command

root@lanquark:~/demo# cat newfile 
    ______________________________________
    | *** SUAN HAS LEFT THE COMPANY ***  |
    |____________________________________|

root@lanquark:~/demo# sed '/Suan/r newfile' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
    ______________________________________
    | *** SUAN HAS LEFT THE COMPANY ***  |
    |____________________________________|
    southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Writing to a file: the w command

root@lanquark:~/demo# sed -n '/north/w newfile1' datafile 
root@lanquark:~/demo# cat newfile1
northwest   NW  Charles Main        3.0 .98 3   34
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9

Append: the a command

root@lanquark:~/demo# sed '/^north/a\--->THE NORTH SALES DISTRICT HAS MOVED<---' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
--->THE NORTH SALES DISTRICT HAS MOVED<---
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
--->THE NORTH SALES DISTRICT HAS MOVED<---
north       NO  Margot Weber        4.5 .89 5    9
--->THE NORTH SALES DISTRICT HAS MOVED<---
central     CT  Ann Stephens        5.7 .94 5   13

Insert: the i command

root@lanquark:~/demo# sed '/eastern/i\--->NEW ENGLIST REGION<---' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
--->NEW ENGLIST REGION<---
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Change: the c command

root@lanquark:~/demo# sed '/eastern/c\THE EASTERN REGION HAS TEMPORARILY CLOSED' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
THE EASTERN REGION HAS TEMPORARILY CLOSED
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Getting the next line: the n command

root@lanquark:~/demo# sed -n '/eastern/{n;s#AM#Archie#p;}' datafile 
northeast   NE  Archie Main Jr.     5.1 .94 3   13

Transform: the y command

root@lanquark:~/demo# sed '1,3y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' datafile 
NORTHWEST   NW  CHARLES MAIN        3.0 .98 3   34
WESTERN     WE  SHARON GRAY     5.3 .97 5   23
SOUTHWEST   SW  LEWIS DALSASS       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13

Quit: the q command

// print through line 5, then quit
root@lanquark:~/demo# sed '5q' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17

// when the pattern matches, substitute first and then quit
root@lanquark:~/demo# sed '/Lewis/{s#Lewis#Joseph#;q;}' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Joseph Dalsass      2.7 .8  2   18

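Holding and getting: the h and g commands

// the northeast line is printed twice: h saves it to the hold buffer and G appends it after the last line (→ marks the affected lines)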
root@lanquark:~/demo# sed -e '/northeast/h' -e '$G' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
→northeast  NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13
→northeast  NE  AM Main Jr.     5.1 .94 3   13

// the WE line is printed only once: it is deleted from its original position and appended after the CT line
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{G;}' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
central     CT  Ann Stephens        5.7 .94 5   13
→western        WE  Sharon Gray     5.3 .97 5   23

// g overwrites: the CT line is replaced by the held WE line
root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{g;}' datafile 
northwest   NW  Charles Main        3.0 .98 3   34
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
north       NO  Margot Weber        4.5 .89 5    9
western     WE  Sharon Gray     5.3 .97 5   23

Holding and exchanging

// x exchanges the pattern space with the hold buffer
root@lanquark:~/demo# sed -e '/Patricia/h' -e /Margot/x datafile 
northwest   NW  Charles Main        3.0 .98 3   34
western     WE  Sharon Gray     5.3 .97 5   23
southwest   SW  Lewis Dalsass       2.7 .8  2   18
southern    SO  Suan Chin       5.1 .95 4   15
southeast   SE  Patricia Hemenway   4.0 .7  4   17
eastern     EA  TB Savage       4.4 .84 5   20
northeast   NE  AM Main Jr.     5.1 .94 3   13
→southeast  SE  Patricia Hemenway   4.0 .7  4   17
central     CT  Ann Stephens        5.7 .94 5   13


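4. awk

awk is a UNIX programming language for processing data and generating reports; gawk is the GNU version, the one commonly used on Linux.

Format of awk: an awk instruction consists of a pattern, an action, or a combination of a pattern and an action.

awk can take its input from files, pipes, or standard input.

1. Input from a file

Format: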
awk 'pattern' filename
awk '{action}' filename
awk 'pattern{action}' filename

// sample file
[root@lanquark demo]# cat employees
Tom Jones   4424    5/12/66 54335
Mary Adams  5346    11/4/63 28765
Sally Chang 1654    7/22/54 65000
Billy Black 1683    9/23/44 33650

// pattern only
[root@lanquark demo]# awk '/Mary/' employees
Mary Adams  5346    11/4/63 28765

// action only
[root@lanquark demo]# awk '{print $1}' employees
Tom
Mary
Sally
Billy

// pattern and action combined
[root@lanquark demo]# awk '/Sally/{print $1,$2}' employees
Sally Chang

2. Input from a command

Format:

command | awk 'pattern'
command | awk '{action}'
command | awk 'pattern{action}'

// pattern only
[root@lanquark demo]# cat employees | awk '/Mary/'
Mary Adams  5346    11/4/63 28765

// pattern and action
[root@lanquark demo]# cat employees | awk '/Mary/{print $1,$2}'
Mary Adams

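awk regular expression metacharacters

Metacharacter  Description
^              Matches at the beginning of a line
$              Matches at the end of a line
.              Matches any single character
*              Matches zero or more of the preceding character
+              Matches one or more of the preceding character
?              Matches zero or one of the preceding character
[ABC]          Matches any one character in the set (A, B, or C)
[^ABC]         Matches any one character not in the set (A, B, or C)
[A-Z]          Matches any one character in the range A to Z
A|B            Matches A or B
(AB)+          Matches one or more occurrences of AB, such as AB, ABAB, ABABAB
\*             Matches a literal asterisk
&              Used in a replacement string to stand for what the search string matched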

Sample file

[root@lanquark demo]# cat datafile1
northwest   NW  Joel Craig  3.0 .98 3   4
western WE  Sharon Kelly    5.3 .97 5   23
southwest   SW  Chris Foster    2.7 .8  2   18
southern    SO  May Chin    5.1 .95 4   15
southeast   SE  Derek Johnson   4.0 .7  4   17
eastern EA  Susan Beal  4.4 .84 5   20
northeast   NE  TJ Nichols  5.1 .94 3   13
north   NO  Val Shultz  4.5 .89 5   9
central CT  Sheri Watson    5.7 .94 5   13

Simple pattern matching

[root@lanquark demo]#  awk '/west/' datafile1
northwest   NW  Joel Craig  3.0 .98 3   4
western WE  Sharon Kelly    5.3 .97 5   23
southwest   SW  Chris Foster    2.7 .8  2   18

Matching the beginning of a line (^)

[root@lanquark demo]# awk '/^north/' datafile1
northwest   NW  Joel Craig  3.0 .98 3   4
northeast   NE  TJ Nichols  5.1 .94 3   13
north   NO  Val Shultz  4.5 .89 5   9

Matching the pattern no or so (|)

[root@lanquark demo]# awk '/^(no|so)/' datafile1
northwest   NW  Joel Craig  3.0 .98 3   4
southwest   SW  Chris Foster    2.7 .8  2   18
southern    SO  May Chin    5.1 .95 4   15
southeast   SE  Derek Johnson   4.0 .7  4   17
northeast   NE  TJ Nichols  5.1 .94 3   13
north   NO  Val Shultz  4.5 .89 5   9

Simple actions

[root@lanquark demo]# awk '{print $3,$2}' datafile1
Joel NW
Sharon WE
Chris SW
May SO
Derek SE
Susan EA
TJ NE
Val NO
Sheri CT

[root@lanquark demo]# awk '{print "number of fields:",NF}' datafile1
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8
number of fields: 8

Regular expressions combining a pattern and an action

[root@lanquark demo]# awk '/northeast/{print $3,$2}' datafile1
TJ NE

[root@lanquark demo]# awk '/^[ns]/{print $1}' datafile1
northwest
southwest
southern
southeast
northeast
north

The match operator (~)

[root@lanquark demo]# awk '$5~/\.[7-9]+/' datafile
southwest   SW  Lewis Dalsass       2.7 .8  2   18
central     CT  Ann Stephens        5.7 .94 5   13

Input field separator (-F)

// no separator specified: the default is whitespace (spaces and tabs)
[root@lanquark demo]# head -n 5 /etc/passwd | awk '{print $1}'
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin

// set the separator to a colon
[root@lanquark demo]# head -n 5 /etc/passwd | awk -F: '{print $1}'
root
bin
daemon
adm
lp

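Comparison expressions

Relational operators

Operator  Meaning                                  Example
<         Less than                                x < y
<=        Less than or equal to                    x <= y
==        Equal to                                 x == y
!=        Not equal to                             x != y
>=        Greater than or equal to                 x >= y
>         Greater than                             x > y
~         Matches the regular expression           x ~ /y/
!~        Does not match the regular expression    x !~ /y/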

Sample file

[root@lanquark demo]# cat employees 
Tom Jones   4424    5/12/66 54335
Mary Adams  5346    11/4/63 28765
Sally Chang 1654    7/22/54 65000
Billy Black 1683    9/23/44 33650
[root@lanquark demo]# awk '$3 == 5346' employees 
Mary Adams  5346    11/4/63 28765

[root@lanquark demo]# awk '$3>5000{print $1}' employees 
Mary

[root@lanquark demo]# awk '$2~/Adam/' employees 
Mary Adams  5346    11/4/63 28765

[root@lanquark demo]# awk '$2!~/Adam/' employees 
Tom Jones   4424    5/12/66 54335
Sally Chang 1654    7/22/54 65000
Billy Black 1683    9/23/44 33650

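Arithmetic

Arithmetic operators

Operator  Meaning         Example
+         Addition        x + y
-         Subtraction     x - y
*         Multiplication  x * y
/         Division        x / y
%         Modulus         x % y
^         Exponentiation  x ^ y
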
[root@lanquark demo]# cat emp.data 
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

[root@lanquark demo]# awk '$3>0{print $2*$3}' emp.data 
40
100
121
76.5

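Logical operators and compound patterns

Operator  Meaning      Example
&&        Logical AND  a&&b
||        Logical OR   a||b
!         Logical NOT  !a

[root@lanquark demo]# cat emp.data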
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18

[root@lanquark demo]# awk '$3>10 && $3<22' emp.data
Mark 5.00 20
Susie 4.25 18

Assignment operators

[root@lanquark demo]# awk '$3=="Chris"{$3="Christian";print}' datafile1
southwest SW Christian Foster 2.7 .8 2 18

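Built-in variables

Variable     Meaning
ARGC         Number of command-line arguments
ARGIND       Index in ARGV of the file currently being processed
ARGV         Array of command-line arguments
CONVFMT      Conversion format for numbers, %.6g by default
ENVIRON      Array containing the values of the current shell environment variables
ERRNO        Description of the system error when a redirection for getline fails, a getline read fails, or a call to close fails
FIELDWIDTHS  Whitespace-separated list of field widths, used instead of FS to split fixed-width records
FILENAME     Name of the current input file
FNR          Record number within the current file
FS           Input field separator, a space by default
IGNORECASE   When nonzero, regular expression and string matching ignore case
NF           Number of fields in the current record
NR           Number of records read so far
OFMT         Output format for numbers
OFS          Output field separator
ORS          Output record separator
RLENGTH      Length of the string matched by the match function
RS           Input record separator
RSTART       Offset of the string matched by the match function
RT           Record terminator; gawk sets RT to the input text that matched the character or regex specified by RS
SUBSEP       Array subscript separator
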
[root@lanquark demo]# cat employees2
Tom Jones:4424:5/12/66:54335
Mary Adams:5346:11/4/63:28765
Sally Chang:1654:7/22/54:65000
Billy Black:1683:9/23/44:33650

[root@lanquark demo]# awk -F: '$1=="Mary Adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765

[root@lanquark demo]# awk -F: 'BEGIN{IGNORECASE=1};$1=="mary adams"{print NR,$1,$2,$NF}' employees2
2 Mary Adams 5346 28765
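
The difference between NR and FNR only shows up when awk reads more than one file. As a hedged sketch, the two small files a and b below are invented for the illustration; each output line lists NR, FNR, NF, and the first field.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Two tiny colon-separated files (assumed sample data).
printf 'Tom Jones:4424\nMary Adams:5346\n' > "$tmpdir/a"
printf 'Sally Chang:1654\n' > "$tmpdir/b"

# NR keeps counting across files; FNR restarts at 1 for each file;
# NF is the number of colon-separated fields in the current record.
awk_vars=$(awk -F: '{ print NR, FNR, NF, $1 }' "$tmpdir/a" "$tmpdir/b")

echo "$awk_vars"
rm -rf "$tmpdir"
```

The third line of output comes from the second file, so its NR is 3 while its FNR is back to 1.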

BEGIN pattern

[root@lanquark demo]# awk 'BEGIN{FS=":";OFS="\t";ORS="\n\n"}{print $1,$2,$3}' employees2
Tom Jones   4424    5/12/66

Mary Adams  5346    11/4/63

Sally Chang 1654    7/22/54

Billy Black 1683    9/23/44


[root@lanquark demo]# awk 'BEGIN{print "Make Year"}'
Make Year

END pattern

[root@lanquark demo]# awk 'END{print "The number of records is",NR}' employees2
The number of records is 4

[root@lanquark demo]# awk '/Mary/{count++}END{print "Mary was found",count,"times"}' employees2
Mary was found 1 times
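
END is also the natural place to print totals accumulated while reading the input. The sketch below reuses the emp.data contents shown earlier (name, hourly rate, hours worked); the variable names total and n and the wording of the report line are assumptions for the illustration.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)

# Same contents as the emp.data sample file used above.
cat > "$tmpdir/emp.data" <<'EOF'
Beth 4.00 0
Dan 3.75 0
Kathy 4.00 10
Mark 5.00 20
Mary 5.50 22
Susie 4.25 18
EOF

# For every employee with hours > 0, add rate * hours to a running total;
# the END block runs once, after the last record has been read.
pay_report=$(awk '$3 > 0 { total += $2 * $3; n++ }
                  END    { print n, "employees worked; total pay", total }' "$tmpdir/emp.data")

echo "$pay_report"
rm -rf "$tmpdir"
```

Uninitialized awk variables start out as zero (or the empty string), which is why total and n need no BEGIN block here.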

Redirection and pipes

Output redirection (> truncates the file, >> appends without truncating)

[root@lanquark demo]# awk '$1=="Tom"{print $1}' employees2
Tom
[root@lanquark demo]# awk '$1=="Tom"{print $1>"passing_file"}' employees2
[root@lanquark demo]# cat passing_file 
Tom
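
The session above only shows >. A minimal sketch of the difference between > and >> follows; the file names names and first_names are made up, and the output path is passed in through -v as the awk variable out.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'Tom Jones\nMary Adams\n' > "$tmpdir/names"

out="$tmpdir/first_names"
echo 'old content' > "$out"

# > truncates the file when awk first opens it, so the old content is lost.
awk -v out="$out" '{ print $1 > out }' "$tmpdir/names"
after_truncate=$(cat "$out")

# >> opens the file for appending, so the earlier lines are kept.
awk -v out="$out" '{ print $1 >> out }' "$tmpdir/names"
after_append=$(cat "$out")

echo "$after_append"
rm -rf "$tmpdir"
```

Within a single awk run the file is opened only once, so repeated print ... > out statements do not overwrite each other; the truncation happens only at the first write.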

Input redirection (getline)

[root@lanquark demo]# awk 'BEGIN{"date"|getline d;print d}'
Tue Jun  5 22:53:24 EDT 2018

[root@lanquark demo]# awk 'BEGIN{"date" | getline d;split(d,mon);print mon[2]}' 
Jun

[root@lanquark demo]# awk 'BEGIN{while("ls" | getline) print}'
1111
1.txt
datafile
datafile1
emp.data
employees
employees2
id.txt
ipconfig.txt
lab5.data
names
newfile
newfile1
passing_file
picnic
temp
test1.txt
test.dir
textfile
tt.txt

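Pipes

If a pipe has been opened in awk, it must be closed before another one can be opened. The command on the right-hand side of the pipe symbol is enclosed in double quotes.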

[root@lanquark demo]# cat names 
john smith 
alice cheba 
george goldberg 
susan goldberg 
tony tram 
barbara nguyen 
elizabeth lone 
dan savage 
eliza goldberg 
john goldenrod
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba

// closing the pipe
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{print "game over"}' names
game over
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
[root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{close("sort -r +1 -2 +0 -1");print "game over"}' names
tony tram
john smith
dan savage
barbara nguyen
elizabeth lone
john goldenrod
susan goldberg
george goldberg
eliza goldberg
alice cheba
game over
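
The +1 -2 +0 -1 key syntax used with sort above is an obsolete form that newer versions of sort may reject. As a hedged sketch, the same idea (reverse order by the second field, then by the first) can be written with -k. The shortened name list and the cmd variable are assumptions made for this illustration, and LC_ALL=C is set only to make the ordering predictable.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
printf 'john smith\nalice cheba\nsusan goldberg\ngeorge goldberg\n' > "$tmpdir/names"

# cmd holds the sort command so that print and close() use the identical string.
# close(cmd) waits for sort to finish, so its output appears before "game over".
sorted=$(awk -v cmd='LC_ALL=C sort -r -k2,2 -k1,1' '
    { print $1, $2 | cmd }
    END { close(cmd); print "game over" }
' "$tmpdir/names")

echo "$sorted"
rm -rf "$tmpdir"
```

Keeping the command in one variable avoids a subtle mistake: close only closes the pipe if it is given exactly the same string that was used to open it.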


5. Extras

Recursive filtering:

For example, to find the lines containing eval in all *.php files under the data directory

grep -r --include="*.php" 'eval' /data/
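
To try this without touching a real /data tree, the following sketch builds a throwaway directory first. The directory layout and file contents are invented for the illustration, --include is assumed to be available (it is a GNU grep option), and -l is added so that only file names are printed.

```shell
#!/bin/sh
tmpdir=$(mktemp -d)
mkdir -p "$tmpdir/data/sub"

# One PHP file with eval, one PHP file without, and a non-PHP file with eval.
printf '<?php eval($code); ?>\n' > "$tmpdir/data/sub/a.php"
printf '<?php echo "ok"; ?>\n'   > "$tmpdir/data/b.php"
printf 'eval in a text file\n'   > "$tmpdir/data/notes.txt"

# -r descends into subdirectories; --include limits the search to *.php files.
matches=$(grep -rl --include='*.php' 'eval' "$tmpdir/data")
expected="$tmpdir/data/sub/a.php"

echo "$matches"
rm -rf "$tmpdir"
```

Only a.php is reported: notes.txt contains eval but is filtered out by --include, and b.php is searched but has no match.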

Exercises

http://www.apelearn.com/study_v2/chapter14.html
