1. Introduction to regular expressions
2. grep
3. sed
4. awk
5. Further usage
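A regular expression (RE) is a character pattern used to match specified characters during a search.
Metacharacters are characters that stand for something other than their literal meaning. The metacharacters of a regular expression are interpreted by the program that performs the pattern matching, such as vi, grep, sed, and awk.
Basic metacharacters recognized by all pattern-matching tools on UNIX/Linux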
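Metacharacter | Function | Example | Matches |
---|---|---|---|
^ | Beginning-of-line anchor | /^love/ | Lines that begin with love |
$ | End-of-line anchor | /love$/ | Lines that end with love |
. | Matches any single character | /l..e/ | Lines containing an l, followed by any two characters, followed by an e |
* | Matches zero or more occurrences of the character before the * | / *love/ | Lines containing love preceded by zero or more spaces |
[] | Matches any one character in the set | /[Ll]ove/ | Lines containing love or Love |
[x-y] | Matches one character in the given range | /[A-Z]ove/ | An uppercase letter followed by ove |
[^] | Matches one character that is not in the set | /[^A-Z]/ | Any single character outside the range A-Z |
\ | Escapes a metacharacter | /love\./ | Lines containing love followed by a literal period |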
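Additional metacharacters supported by many UNIX/Linux programs that use RE metacharacters (not necessarily by every pattern-matching tool)
Metacharacter | Function | Example | Matches |
---|---|---|---|
\< | Beginning-of-word anchor | /\<love/ | Lines containing a word that begins with love |
\> | End-of-word anchor | /love\>/ | Lines containing a word that ends with love |
\(..\) | Tags the matched characters for later use | /\(love\)able \1er/ | Up to nine tags may be used; the leftmost tag in the pattern is the first. In this example the pattern love is saved as tag 1 and referenced as \1 |
x\{m\}, x\{m,\}, or x\{m,n\} | Repetition of character x: exactly m times, at least m times, or at least m but no more than n times | o\{5,10\} | Lines containing 5 to 10 consecutive o's |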
Sample file for the basic metacharacters

    // demonstrated with grep
    root@lanquark:~/unixshellbysample/chap03# cat picnic
    I had a lovely time on our little picnic.
    Lovers were all around us. It is springtime. Oh
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love
    is forever. I live for you. It's hard to get back in the
    groove.

Simple regular expression search

    root@lanquark:~/demo# grep 'love' picnic
    I had a lovely time on our little picnic.
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love

Beginning-of-line anchor (^)

    root@lanquark:~/demo# grep '^love' picnic
    love, how much I adore you. Do you know

End-of-line anchor ($)

    root@lanquark:~/demo# grep 'love$' picnic
    clover. Did you see them? I can only hope love

Any single character (.)

    root@lanquark:~/demo# grep 'l.ve' picnic
    I had a lovely time on our little picnic.
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love
    is forever. I live for you. It's hard to get back in the

Zero or more of the preceding character (*)

    root@lanquark:~/demo# grep 'o*ve' picnic
    I had a lovely time on our little picnic.
    Lovers were all around us. It is springtime. Oh
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love
    is forever. I live for you. It's hard to get back in the
    groove.

A set of characters ([ ])

    root@lanquark:~/demo# grep '[Ll]ove' picnic
    I had a lovely time on our little picnic.
    Lovers were all around us. It is springtime. Oh
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love

A range of characters ([ - ])

    root@lanquark:~/demo# grep 'ove[a-z]' picnic
    I had a lovely time on our little picnic.
    Lovers were all around us. It is springtime. Oh
    I lost my gloves somewhere out in that field of
    clover. Did you see them? I can only hope love

Characters not in the set ([^ ])

    root@lanquark:~/demo# grep 'ove[^a-zA-Z0-9]' picnic
    love, how much I adore you. Do you know
    the extent of my love? Oh, by the way, I think
    groove.

Sample file for the additional metacharacters

    // demonstrated with grep or sed
    root@lanquark:~/unixshellbysample/chap03# cat textfile
    Unusual occurrences happened at the fair.
    Patty won fourth place in the 50 yard dash square and fair.
    Occurrences like this are rare.
    The winning ticket is 55222.
    The ticket I got is 54333 and Dee got 55544.
    Guy fell down while running around the south bend in his last event.

Beginning-of-word anchor (\<) and end-of-word anchor (\>)

    root@lanquark:~/demo# grep '\<fourth\>' textfile
    Patty won fourth place in the 50 yard dash square and fair.

Saving a pattern with \( and \)

    // replace occurrence with occurence, or Occurrence with Occurence
    root@lanquark:~/unixshellbysample/chap03# sed 's#\([Oo]ccur\)rence#\1ence#' textfile
    Unusual occurences happened at the fair.
    Patty won fourth place in the 50 yard dash square and fair.
    Occurences like this are rare.
    The winning ticket is 55222.
    The ticket I got is 54333 and Dee got 55544.
    Guy fell down while running around the south bend in his last event.

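The name grep stands for "globally search for a regular expression and print" the matching lines.
grep does not modify or alter the input file in any way.
Command format
grep word filename

    root@lanquark:~# grep hjm /etc/passwd
    hjm:x:5000:5000:hjm:/home/hjm:/bin/bash
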
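Regular expression metacharacters used by grep
Metacharacter | Function | Example | Matches |
---|---|---|---|
^ | Beginning-of-line anchor | '^love' | Lines that begin with love |
$ | End-of-line anchor | 'love$' | Lines that end with love |
. | Matches any single character | 'l..e' | Lines containing an l, followed by any two characters, followed by an e |
* | Matches zero or more occurrences of the character before the * | ' *love' | Lines containing love preceded by zero or more spaces |
[ ] | Matches any one character in the set | '[Ll]ove' | Lines containing love or Love |
[^] | Matches one character that is not in the set | '[^A-K]' | Any single character outside the range A-K |
\ | Escapes a metacharacter | 'love\.' | Lines containing love followed by a literal period |
\< | Beginning-of-word anchor | '\<love' | Lines containing a word that begins with love |
\> | End-of-word anchor | 'love\>' | Lines containing a word that ends with love |
\(..\) | Tags the matched characters for later use | '\(love\)ing' | Up to nine tags may be used; the leftmost tag in the pattern is the first. In this example the pattern love is saved as tag 1 and referenced as \1 |
x\{m\}, x\{m,\}, or x\{m,n\} | Repetition of character x: exactly m times, at least m times, or at least m but no more than n times | o\{5,10\} | Lines containing 5 to 10 consecutive o's |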

    // sample file
    root@lanquark:~/demo# cat datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Print all lines containing NW

    root@lanquark:~/demo# grep NW datafile
    northwest NW Charles Main 3.0 .98 3 34

Print lines that begin with the letter n

    root@lanquark:~/demo# grep '^n' datafile
    northwest NW Charles Main 3.0 .98 3 34
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9

Print lines that end with the digit 4

    root@lanquark:~/demo# grep '4$' datafile
    northwest NW Charles Main 3.0 .98 3 34

Print lines that begin with the letter w or e

    root@lanquark:~/demo# grep '^[we]' datafile
    western WE Sharon Gray 5.3 .97 5 23
    eastern EA TB Savage 4.4 .84 5 20

Print all lines containing a non-digit character

    root@lanquark:~/demo# grep '[^0-9]' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Print all lines containing an s, followed by zero or more consecutive s's and a space.

    root@lanquark:~/demo# grep 'ss* ' datafile
    northwest NW Charles Main 3.0 .98 3 34
    southwest SW Lewis Dalsass 2.7 .8 2 18

Print lines containing at least nine consecutive lowercase letters

    root@lanquark:~/demo# grep '[a-z]\{9\}' datafile
    northwest NW Charles Main 3.0 .98 3 34
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southeast SE Patricia Hemenway 4.0 .7 4 17
    northeast NE AM Main Jr. 5.1 .94 3 13

Print lines containing a 3 followed by a period and a digit, then any number of characters, then another 3

    root@lanquark:~/demo# grep '\(3\)\.[0-9].*\1' datafile
    northwest NW Charles Main 3.0 .98 3 34

Print all lines containing a word that begins with north

    root@lanquark:~/demo# grep '\<north' datafile
    northwest NW Charles Main 3.0 .98 3 34
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    root@lanquark:~/demo# grep '\<north\>' datafile
    north NO Margot Weber 4.5 .89 5 9

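Commonly used grep options
Option | Function |
---|---|
-c | Displays the number of matching lines instead of the lines themselves |
-i | Ignores case when comparing characters |
-l | Lists only the names of the files that contain matching lines |
-n | Prefixes each line with its line number within the file |
-v | Inverts the search and displays only the lines that do not match |
-w | Searches for the expression as a word, as if it were surrounded by \< and \> |
-A NUM | Also prints NUM lines after each matching line (for example, -A 2 prints the two lines that follow) |
-B NUM | Also prints NUM lines before each matching line (for example, -B 2 prints the two lines that precede) |
-C NUM | Also prints NUM lines before and after each matching line |
-R | Recursively reads and processes all files in each listed directory, including all of its subdirectories |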
Sample file

    root@lanquark:~/demo# cat datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

The -c option prints the number of lines that begin with south

    root@lanquark:~/demo# grep -c '^south' datafile
    3

The -i option ignores case

    root@lanquark:~/demo# grep -i 'pat' datafile
    southeast SE Patricia Hemenway 4.0 .7 4 17

The -l option shows only the names of the files that contain the pattern, without printing the text

    root@lanquark:~/demo# grep -l 'SE' *
    datafile
    temp

The -n option prefixes each line where the pattern is found with its line number

    root@lanquark:~/demo# grep -n '^south' datafile
    3:southwest SW Lewis Dalsass 2.7 .8 2 18
    4:southern SO Suan Chin 5.1 .95 4 15
    5:southeast SE Patricia Hemenway 4.0 .7 4 17

The -v option inverts the match

    root@lanquark:~/demo# grep -v 'Suan Chin' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

The -w option finds the pattern only when it appears as a whole word, not as part of a word.

    root@lanquark:~/demo# grep -w 'north' datafile
    north NO Margot Weber 4.5 .89 5 9

The -A option (here -A 2) prints the two lines after each matching line

    root@lanquark:~/demo# grep -A 2 'NE' datafile
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

The -B option (here -B 2) prints the two lines before each matching line

    root@lanquark:~/demo# grep -B 2 'NE' datafile
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13

The -C option (here -C 2) prints the two lines before and after each matching line

    root@lanquark:~/demo# grep -C 2 'NE' datafile
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

The -R option searches for the pattern recursively

    root@lanquark:~/demo# grep -R 'central' *
    datafile:central CT Ann Stephens 5.7 .94 5 13
    test.dir/datafile:central CT Ann Stephens 5.7 .94 5 13

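Exit status of grep
grep is very useful in scripts because it always returns an exit status. An exit status of 0 means the pattern was found, 1 means the pattern was not found, and 2 means an error occurred, for example the file to be searched could not be found.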
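Since a script usually cares about the exit status rather than the matched text, a minimal sketch of branching on it may help. The temporary file and its two lines are made-up sample data, and `-q` (suppress normal output) is used here so that only the status matters.

```shell
# Made-up sample data in a temporary file (illustrative only).
tmp=$(mktemp)
printf 'northwest NW\nsouthern SO\n' > "$tmp"

# Exit status 0: the pattern was found, so the "then" branch runs.
if grep -q 'north' "$tmp"; then
    found=yes
else
    found=no
fi

# Exit status 1: the pattern does not occur in the file.
status_no_match=0
grep -q 'central' "$tmp" || status_no_match=$?

# Exit status 2: an error, here a file that does not exist.
status_error=0
grep -q 'north' "$tmp.missing" 2>/dev/null || status_error=$?

echo "found=$found no_match=$status_no_match error=$status_error"
rm -f "$tmp"
```

Capturing the status with `|| var=$?` keeps the sketch usable in scripts that run under `set -e`.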
grep can take its input from files or from a pipe

    // list only the regular files in the directory
    root@lanquark:~/demo# ls -l | grep '^-'
    -rw-r--r-- 1 root root 5 Jun 1 04:30 1111
    -rw-r--r-- 1 root root 1066 May 31 20:56 1.txt
    -rw-r--r-- 1 root root 351 Jun 4 23:04 datafile
    -rw-r--r-- 1 root root 18 Jun 4 21:54 id.txt
    -rw-r--r-- 1 root root 876 May 31 21:05 ipconfig.txt
    -rw-r--r-- 1 root root 338 Jun 4 23:09 picnic
    -rw-r--r--+ 1 root root 18065 May 24 21:00 temp
    -rw-r--r-- 1 root root 0 Jun 1 04:25 test1.txt
    -rw-r--r-- 1 root root 277 Jun 4 23:17 textfile
    -rw-r--r--+ 1 root root 572 Jun 1 04:29 tt.txt

Extended grep: egrep
Invocation: egrep or grep -E
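To see what the extended syntax changes, here is a small sketch that runs the same pattern with and without `-E`. The three-line temporary file is made-up sample data; the counts come from `grep -c`.

```shell
# Made-up sample data in a temporary file (illustrative only).
tmp=$(mktemp)
printf 'northwest NW\neastern EA\ncentral CT\n' > "$tmp"

# Extended syntax: | separates two alternatives, so two lines match.
ere_count=$(grep -E -c 'NW|EA' "$tmp")

# Basic syntax: | is an ordinary character, so no line matches.
# grep -c still prints 0; "|| true" only ignores the exit status 1.
bre_count=$(grep -c 'NW|EA' "$tmp" || true)

echo "grep -E: $ere_count, grep: $bre_count"
rm -f "$tmp"
```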
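Regular expression metacharacters used by egrep
Metacharacter | Function | Example | Matches |
---|---|---|---|
^ | Beginning-of-line anchor | '^love' | Lines that begin with love |
$ | End-of-line anchor | 'love$' | Lines that end with love |
. | Matches any single character | 'l..e' | Lines containing an l, followed by any two characters, followed by an e |
* | Matches zero or more occurrences of the character before the * | ' *love' | Lines containing love preceded by zero or more spaces |
[ ] | Matches any one character in the set | '[Ll]ove' | Lines containing love or Love |
[^] | Matches one character that is not in the set | '[^A-K]' | Any single character outside the range A-K |
+ | Matches one or more occurrences of the character before the + | '[a-z]+ove' | One or more lowercase letters followed by ove |
? | Matches zero or one occurrence of the preceding character | 'lo?ve' | An l followed by zero or one o, then ve |
a\|b | Matches either a or b | 'love\|hate' | Either of the two expressions love or hate |
() | Groups characters | 'love(able\|ly)' and '(ov)+' | loveable or lovely; one or more occurrences of ov |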
Sample file

    root@lanquark:~/demo# cat datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Print lines containing NW or EA

    root@lanquark:~/demo# egrep 'NW|EA' datafile
    northwest NW Charles Main 3.0 .98 3 34
    eastern EA TB Savage 4.4 .84 5 20

Print all lines containing one or more 3s

    root@lanquark:~/demo# egrep '3+' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    northeast NE AM Main Jr. 5.1 .94 3 13
    central CT Ann Stephens 5.7 .94 5 13

Print all lines containing a 2, followed by zero or one period, followed by a digit.

    root@lanquark:~/demo# egrep '2\.?[0-9]' datafile
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    eastern EA TB Savage 4.4 .84 5 20

Print lines containing one or more consecutive occurrences of the pattern no

    root@lanquark:~/demo# egrep '(no)+' datafile
    northwest NW Charles Main 3.0 .98 3 34
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9

Print all lines containing the letter S followed by h or u

    root@lanquark:~/demo# egrep 'S(h|u)' datafile
    western WE Sharon Gray 5.3 .97 5 23
    southern SO Suan Chin 5.1 .95 4 15

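sed is a streamlined, non-interactive stream editor. By default it does not modify the original file.
sed processes a file (or other input) one line at a time and sends its output to the screen. It stores the line currently being processed in a temporary buffer called the pattern space. When sed has finished processing the line in the pattern space, it sends that line to the screen, removes it from the pattern space, and reads the next line into the pattern space.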
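The point that sed leaves the original file untouched can be checked with a short sketch. The temporary file and its two lines are made-up sample data; the edited text exists only on standard output unless it is redirected to another file.

```shell
# Made-up sample data in a temporary file (illustrative only).
tmp=$(mktemp)
printf 'northwest NW\nwestern WE\n' > "$tmp"

before=$(cat "$tmp")
# sed writes the edited lines to standard output only.
edited=$(sed 's/west/north/' "$tmp")
after=$(cat "$tmp")

# To keep the result, redirect it to a different file.
sed 's/west/north/' "$tmp" > "$tmp.new"
saved=$(cat "$tmp.new")

echo "$edited"
rm -f "$tmp" "$tmp.new"
```

Redirecting to the same file that sed is reading would truncate it before sed reads it, which is why the sketch writes to a second file.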
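sed commands and options
Command | Function |
---|---|
a\ | Appends one or more lines of text after the current line |
c\ | Changes (replaces) the text of the current line with new text |
d | Deletes lines |
i\ | Inserts text before the current line |
h | Copies the contents of the pattern space to the hold buffer |
H | Appends the contents of the pattern space to the hold buffer |
g | Copies the contents of the hold buffer into the pattern space, overwriting what was there |
G | Appends the contents of the hold buffer to the pattern space, after what was there |
l | Lists nonprinting characters |
p | Prints lines |
n | Reads the next input line and continues processing it with the next command rather than the first command |
q | Quits or exits sed |
r | Reads lines from a file |
w | Writes lines to a file |
! | Applies the command to all lines except the selected ones |
s | Substitutes one string for another |
x | Exchanges the contents of the hold buffer and the pattern space |
y | Translates one character to another (regular expressions cannot be used with y) |
Substitution flags (used with s) | |
g | Replaces every occurrence on the line (global substitution) |
p | Prints the line |
w | Writes the line to a file |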
sed options
Option | Function |
---|---|
-e | Allows multiple edits |
-f | Specifies the name of a sed script file |
-n | Suppresses the default output |
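The `-f` option is easiest to understand with a tiny script file. The following is a sketch under stated assumptions: both temporary files and their contents are made up for illustration.

```shell
# Made-up data file and sed script, both temporary (illustrative only).
data=$(mktemp)
script=$(mktemp)
printf 'northwest NW\nsouthern SO\nnorth NO\n' > "$data"

# One sed command per line; lines starting with # are comments.
cat > "$script" <<'EOF'
# delete the first line
1d
# replace the first "north" on each remaining line
s/north/NORTH/
EOF

# -f reads the editing commands from the script file.
result=$(sed -f "$script" "$data")
echo "$result"
rm -f "$data" "$script"
```

Keeping the commands in a file avoids shell quoting problems once a sed program grows beyond one or two commands.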
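sed metacharacters
Metacharacter | Function | Example | Matches |
---|---|---|---|
^ | Beginning-of-line anchor | /^love/ | Lines that begin with love |
$ | End-of-line anchor | /love$/ | Lines that end with love |
. | Matches any single character | /l..e/ | Lines containing an l, followed by any two characters, followed by an e |
* | Matches zero or more occurrences of the character before the * | / *love/ | Lines containing love preceded by zero or more spaces |
[ ] | Matches any one character in the set | /[Ll]ove/ | Lines containing love or Love |
[^] | Matches one character that is not in the set | /[^A-KM-Z]ove/ | Lines containing ove where the character before ove is not in the range A-K or M-Z |
\(..\) | Saves the matched characters | s/\(love\)able/\1er/ | Tags the pattern between the markers and saves it as tag 1, which can later be referenced as \1. Up to nine tags can be defined, numbered from the left. |
& | Saves the search string so it can be referenced in the replacement string | s/love/aa&aa/ | & stands for the search string; love is replaced by itself with aa added on each side, so love becomes aaloveaa |
\< | Beginning-of-word anchor | /\<love/ | Lines containing a word that begins with love |
\> | End-of-word anchor | /love\>/ | Lines containing a word that ends with love |
x\{m\} | Exactly m consecutive x's | /o\{5\}/ | Five consecutive o's |
x\{m,\} | At least m x's | /o\{5,\}/ | At least five consecutive o's |
x\{m,n\} | At least m x's, but no more than n | /o\{5,10\}/ | At least 5 and at most 10 o's |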
Sample file

    root@lanquark:~/demo# cat datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Printing: the p command

    root@lanquark:~/demo# sed '/north/p' datafile
    northwest NW Charles Main 3.0 .98 3 34
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Suppressing the default output: -n

    root@lanquark:~/demo# sed -n '/north/p' datafile
    northwest NW Charles Main 3.0 .98 3 34
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9

Deleting: the d command

    // delete line 3
    root@lanquark:~/demo# sed '3d' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13
    // delete from line 3 through the last line
    root@lanquark:~/demo# sed '3,$d' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    // delete the last line
    root@lanquark:~/demo# sed '$d' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    // delete the lines containing the pattern north
    root@lanquark:~/demo# sed '/north/d' datafile
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    central CT Ann Stephens 5.7 .94 5 13

Substituting: the s command

    // replace west with north; g means replace every occurrence on the line
    root@lanquark:~/demo# sed 's#west#north#g' datafile
    northnorth NW Charles Main 3.0 .98 3 34
    northern WE Sharon Gray 5.3 .97 5 23
    southnorth SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13
    // & stands for the matched text
    root@lanquark:~/demo# sed 's#[0-9][0-9]$#&.5#' datafile
    northwest NW Charles Main 3.0 .98 3 34.5
    western WE Sharon Gray 5.3 .97 5 23.5
    southwest SW Lewis Dalsass 2.7 .8 2 18.5
    southern SO Suan Chin 5.1 .95 4 15.5
    southeast SE Patricia Hemenway 4.0 .7 4 17.5
    eastern EA TB Savage 4.4 .84 5 20.5
    northeast NE AM Main Jr. 5.1 .94 3 13.5
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13.5
    // suppress the default output so that only changed lines are printed
    root@lanquark:~/demo# sed -n 's#Hemenway#Jones#gp' datafile
    southeast SE Patricia Jones 4.0 .7 4 17
    // save the matched characters with \( \)
    root@lanquark:~/demo# sed -n 's#\(Mar\)got#\1iance#p' datafile
    north NO Mariance Weber 4.5 .89 5 9

Specifying a range of lines: the comma

    // the range is delimited by two regular expressions
    root@lanquark:~/demo# sed -n '/west/,/east/p' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    // the range is delimited by a line number and a regular expression
    root@lanquark:~/demo# sed -n '5,/^northeast/p' datafile
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    // the range is delimited by line numbers
    root@lanquark:~/demo# sed -n '1,4p' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15

Multiple edits: the -e option

    root@lanquark:~/demo# sed -e '1,3d' -e 's#Hemenway#Jones#' datafile
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Jones 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Reading from a file: the r command

    root@lanquark:~/demo# cat newfile
    ______________________________________
    | *** SUAN HAS LEFT THE COMPANY *** |
    |____________________________________|
    root@lanquark:~/demo# sed '/Suan/r newfile' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    ______________________________________
    | *** SUAN HAS LEFT THE COMPANY *** |
    |____________________________________|
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Writing to a file: the w command

    root@lanquark:~/demo# sed -n '/north/w newfile1' datafile
    root@lanquark:~/demo# cat newfile1
    northwest NW Charles Main 3.0 .98 3 34
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9

Appending: the a command

    root@lanquark:~/demo# sed '/^north/a\--->THE NORTH SALES DISTRICT HAS MOVED<---' datafile
    northwest NW Charles Main 3.0 .98 3 34
    --->THE NORTH SALES DISTRICT HAS MOVED<---
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    --->THE NORTH SALES DISTRICT HAS MOVED<---
    north NO Margot Weber 4.5 .89 5 9
    --->THE NORTH SALES DISTRICT HAS MOVED<---
    central CT Ann Stephens 5.7 .94 5 13

Inserting: the i command

    root@lanquark:~/demo# sed '/eastern/i\--->NEW ENGLIST REGION<---' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    --->NEW ENGLIST REGION<---
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Changing: the c command

    root@lanquark:~/demo# sed '/eastern/c\THE EASTERN REGION HAS TEMPORARILY CLOSED' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    THE EASTERN REGION HAS TEMPORARILY CLOSED
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Getting the next line: the n command

    root@lanquark:~/demo# sed -n '/eastern/{n;s#AM#Archie#p;}' datafile
    northeast NE Archie Main Jr. 5.1 .94 3 13

Translating: the y command

    root@lanquark:~/demo# sed '1,3y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/' datafile
    NORTHWEST NW CHARLES MAIN 3.0 .98 3 34
    WESTERN WE SHARON GRAY 5.3 .97 5 23
    SOUTHWEST SW LEWIS DALSASS 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13

Quitting: the q command

    // quit after printing line 5
    root@lanquark:~/demo# sed '5q' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    // when the pattern matches, substitute first and then quit
    root@lanquark:~/demo# sed '/Lewis/{s#Lewis#Joseph#;q;}' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Joseph Dalsass 2.7 .8 2 18

Holding and getting: the h and g commands

    // the northeast line is printed twice; G appends the hold buffer after the last line
    root@lanquark:~/demo# sed -e '/northeast/h' -e '$G' datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13
    northeast NE AM Main Jr. 5.1 .94 3 13
    // the WE line is printed only once: it is held, deleted, and appended after the CT line
    root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{G;}' datafile
    northwest NW Charles Main 3.0 .98 3 34
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    central CT Ann Stephens 5.7 .94 5 13
    western WE Sharon Gray 5.3 .97 5 23
    // g overwrites: the CT line is replaced by the held WE line
    root@lanquark:~/demo# sed -e '/WE/{h;d;}' -e '/CT/{g;}' datafile
    northwest NW Charles Main 3.0 .98 3 34
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    north NO Margot Weber 4.5 .89 5 9
    western WE Sharon Gray 5.3 .97 5 23

Holding and exchanging: the h and x commands

    // x exchanges the pattern space with the hold buffer, so the Margot line is replaced by the held Patricia line
    root@lanquark:~/demo# sed -e '/Patricia/h' -e /Margot/x datafile
    northwest NW Charles Main 3.0 .98 3 34
    western WE Sharon Gray 5.3 .97 5 23
    southwest SW Lewis Dalsass 2.7 .8 2 18
    southern SO Suan Chin 5.1 .95 4 15
    southeast SE Patricia Hemenway 4.0 .7 4 17
    eastern EA TB Savage 4.4 .84 5 20
    northeast NE AM Main Jr. 5.1 .94 3 13
    southeast SE Patricia Hemenway 4.0 .7 4 17
    central CT Ann Stephens 5.7 .94 5 13

awk is a UNIX programming language for processing data and generating reports; gawk is the GNU version.
awk format: an awk instruction consists of a pattern, an action, or a combination of a pattern and an action.
awk can take its input from a file, from a pipe, or from standard input.
1. Input from a file
Format:
awk 'pattern' filename
awk '{action}' filename
awk 'pattern{action}' filename

    // sample file
    [root@lanquark demo]# cat employees
    Tom Jones 4424 5/12/66 54335
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 65000
    Billy Black 1683 9/23/44 33650
    // pattern only
    [root@lanquark demo]# awk '/Mary/' employees
    Mary Adams 5346 11/4/63 28765
    // action only
    [root@lanquark demo]# awk '{print $1}' employees
    Tom
    Mary
    Sally
    Billy
    // pattern and action combined
    [root@lanquark demo]# awk '/Sally/{print $1,$2}' employees
    Sally Chang

2. Input from a command
Format:
command | awk 'pattern'
command | awk '{action}'
command | awk 'pattern{action}'

    // pattern only
    [root@lanquark demo]# cat employees | awk '/Mary/'
    Mary Adams 5346 11/4/63 28765
    // pattern and action
    [root@lanquark demo]# cat employees | awk '/Mary/{print $1,$2}'
    Mary Adams

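Regular expression metacharacters used by awk
Metacharacter | Description |
---|---|
^ | Matches at the beginning of the line |
$ | Matches at the end of the line |
. | Matches any single character |
* | Matches zero or more of the preceding character |
+ | Matches one or more of the preceding character |
? | Matches zero or one of the preceding character |
[ABC] | Matches any one character in the set A, B, C |
[^ABC] | Matches any one character not in the set A, B, C |
[A-Z] | Matches any one character in the range A to Z |
A\|B | Matches A or B |
(AB)+ | Matches one or more occurrences of AB, such as AB, ABAB, ABABAB |
\* | Matches a literal asterisk |
& | Used in the replacement string to stand for the text matched by the search string |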
Sample file

    [root@lanquark demo]# cat datafile1
    northwest NW Joel Craig 3.0 .98 3 4
    western WE Sharon Kelly 5.3 .97 5 23
    southwest SW Chris Foster 2.7 .8 2 18
    southern SO May Chin 5.1 .95 4 15
    southeast SE Derek Johnson 4.0 .7 4 17
    eastern EA Susan Beal 4.4 .84 5 20
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9
    central CT Sheri Watson 5.7 .94 5 13

Simple pattern matching

    [root@lanquark demo]# awk '/west/' datafile1
    northwest NW Joel Craig 3.0 .98 3 4
    western WE Sharon Kelly 5.3 .97 5 23
    southwest SW Chris Foster 2.7 .8 2 18

Matching at the beginning of the line (^)

    [root@lanquark demo]# awk '/^north/' datafile1
    northwest NW Joel Craig 3.0 .98 3 4
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9

Matching the pattern no or so (|)

    [root@lanquark demo]# awk '/^(no|so)/' datafile1
    northwest NW Joel Craig 3.0 .98 3 4
    southwest SW Chris Foster 2.7 .8 2 18
    southern SO May Chin 5.1 .95 4 15
    southeast SE Derek Johnson 4.0 .7 4 17
    northeast NE TJ Nichols 5.1 .94 3 13
    north NO Val Shultz 4.5 .89 5 9

Simple actions

    [root@lanquark demo]# awk '{print $3,$2}' datafile1
    Joel NW
    Sharon WE
    Chris SW
    May SO
    Derek SE
    Susan EA
    TJ NE
    Val NO
    Sheri CT
    [root@lanquark demo]# awk '{print "number of fields:",NF}' datafile1
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8
    number of fields: 8

Regular expression patterns combined with actions

    [root@lanquark demo]# awk '/northeast/{print $3,$2}' datafile1
    TJ NE
    [root@lanquark demo]# awk '/^[ns]/{print $1}' datafile1
    northwest
    southwest
    southern
    southeast
    northeast
    north

The match operator (~)

    [root@lanquark demo]# awk '$5~/\.[7-9]+/' datafile
    southwest SW Lewis Dalsass 2.7 .8 2 18
    central CT Ann Stephens 5.7 .94 5 13

Input field separator (-F)

    // no separator specified; by default fields are split on whitespace (spaces and tabs)
    [root@lanquark demo]# head -n 5 /etc/passwd | awk '{print $1}'
    root:x:0:0:root:/root:/bin/bash
    bin:x:1:1:bin:/bin:/sbin/nologin
    daemon:x:2:2:daemon:/sbin:/sbin/nologin
    adm:x:3:4:adm:/var/adm:/sbin/nologin
    lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
    // use a colon as the separator
    [root@lanquark demo]# head -n 5 /etc/passwd | awk -F: '{print $1}'
    root
    bin
    daemon
    adm
    lp

Comparison expressions
Relational operators
Operator | Meaning | Example |
---|---|---|
< | Less than | x < y |
<= | Less than or equal to | x <= y |
== | Equal to | x == y |
!= | Not equal to | x != y |
>= | Greater than or equal to | x >= y |
> | Greater than | x > y |
~ | Matches the regular expression | x ~ /y/ |
!~ | Does not match the regular expression | x !~ /y/ |
Sample file

    [root@lanquark demo]# cat employees
    Tom Jones 4424 5/12/66 54335
    Mary Adams 5346 11/4/63 28765
    Sally Chang 1654 7/22/54 65000
    Billy Black 1683 9/23/44 33650

    [root@lanquark demo]# awk '$3 == 5346' employees
    Mary Adams 5346 11/4/63 28765
    [root@lanquark demo]# awk '$3>5000{print $1}' employees
    Mary
    [root@lanquark demo]# awk '$2~/Adam/' employees
    Mary Adams 5346 11/4/63 28765
    [root@lanquark demo]# awk '$2!~/Adam/' employees
    Tom Jones 4424 5/12/66 54335
    Sally Chang 1654 7/22/54 65000
    Billy Black 1683 9/23/44 33650

Arithmetic
Arithmetic operators
Operator | Meaning | Example |
---|---|---|
+ | Addition | x + y |
- | Subtraction | x - y |
* | Multiplication | x * y |
/ | Division | x / y |
% | Modulus | x % y |
^ | Exponentiation | x ^ y |

    [root@lanquark demo]# cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18
    [root@lanquark demo]# awk '$3>0{print $2*$3}' emp.data
    40
    100
    121
    76.5

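The modulus, exponentiation, and division operators from the table can be tried without any input file by putting the expression in a BEGIN block. This is a quick sketch with made-up numbers.

```shell
# BEGIN runs before any input is read, so no file is needed.
# 7 % 3 is the remainder, 2 ^ 10 is a power, and 7 / 2 is floating-point division.
result=$(awk 'BEGIN { print 7 % 3, 2 ^ 10, 7 / 2 }')
echo "$result"    # prints: 1 1024 3.5
```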
Logical operators and compound patterns
Operator | Meaning | Example |
---|---|---|
&& | Logical AND | a && b |
\|\| | Logical OR | a \|\| b |
! | Logical NOT | !a |

    [root@lanquark demo]# cat emp.data
    Beth 4.00 0
    Dan 3.75 0
    Kathy 4.00 10
    Mark 5.00 20
    Mary 5.50 22
    Susie 4.25 18
    [root@lanquark demo]# awk '$3>10 && $3<22' emp.data
    Mark 5.00 20
    Susie 4.25 18

Assignment operators

    [root@lanquark demo]# awk '$3=="Chris"{$3="Christian";print}' datafile1
    southwest SW Christian Foster 2.7 .8 2 18

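Built-in variables
Variable | Meaning |
---|---|
ARGC | Number of command-line arguments |
ARGIND | Index in ARGV of the file currently being processed (gawk only) |
ARGV | Array of command-line arguments |
CONVFMT | Conversion format for numbers; the default is %.6g |
ENVIRON | Array containing the values of the current shell environment variables |
ERRNO | Description of the system error that occurs when a redirection fails while reading with getline or while using close (gawk only) |
FIELDWIDTHS | Whitespace-separated list of field widths, used instead of FS to split fixed-width input (gawk only) |
FILENAME | Name of the current input file |
FNR | Record number within the current file |
FS | Input field separator; the default is a single space, which splits fields on runs of whitespace |
IGNORECASE | When nonzero, regular expression matching and string comparisons ignore case (gawk only) |
NF | Number of fields in the current record |
NR | Number of records read so far |
OFMT | Output format for numbers |
OFS | Output field separator |
ORS | Output record separator |
RLENGTH | Length of the string matched by the match function |
RS | Input record separator |
RSTART | Offset of the string matched by the match function |
RT | Record terminator; gawk sets RT to the input text that matched the character or regex specified by RS (gawk only) |
SUBSEP | Array subscript separator |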

    [root@lanquark demo]# cat employees2
    Tom Jones:4424:5/12/66:54335
    Mary Adams:5346:11/4/63:28765
    Sally Chang:1654:7/22/54:65000
    Billy Black:1683:9/23/44:33650
    [root@lanquark demo]# awk -F: '$1=="Mary Adams"{print NR,$1,$2,$NF}' employees2
    2 Mary Adams 5346 28765
    [root@lanquark demo]# awk -F: 'BEGIN{IGNORECASE=1};$1=="mary adams"{print NR,$1,$2,$NF}' employees2
    2 Mary Adams 5346 28765

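The difference between NR and FNR only shows up when awk reads more than one file. A minimal sketch, using two made-up temporary files:

```shell
# Two made-up input files (illustrative only).
f1=$(mktemp)
f2=$(mktemp)
printf 'a\nb\n' > "$f1"
printf 'c\nd\ne\n' > "$f2"

# NR keeps counting across files; FNR restarts at 1 for each file.
out=$(awk '{ print NR, FNR, $1 }' "$f1" "$f2")
echo "$out"
rm -f "$f1" "$f2"
```

The last line of output is `5 3 e`: the fifth record overall, but the third record of the second file.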
The BEGIN pattern

    [root@lanquark demo]# awk 'BEGIN{FS=":";OFS="\t";ORS="\n\n"}{print $1,$2,$3}' employees2
    Tom Jones 4424 5/12/66

    Mary Adams 5346 11/4/63

    Sally Chang 1654 7/22/54

    Billy Black 1683 9/23/44

    [root@lanquark demo]# awk 'BEGIN{print "Make Year"}'
    Make Year

The END pattern

    [root@lanquark demo]# awk 'END{print "The number of records is",NR}' employees2
    The number of records is 4
    [root@lanquark demo]# awk '/Mary/{count++}END{print "Mary was found",count,"times"}' employees2
    Mary was found 1 times

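A common use of END is to report a total that was accumulated while reading the records. The following sketch assumes colon-separated lines shaped like those in employees2; the temporary file is made up, and treating the last field as a salary is an assumption for illustration.

```shell
# Made-up colon-separated sample data (illustrative only).
tmp=$(mktemp)
printf 'Tom Jones:4424:5/12/66:54335\nMary Adams:5346:11/4/63:28765\n' > "$tmp"

# The main action adds the last field of every record to total;
# END runs once after the last record and prints the summary.
summary=$(awk -F: '{ total += $NF }
    END { printf "%d records, total %d, average %.1f\n", NR, total, total / NR }' "$tmp")

echo "$summary"    # prints: 2 records, total 83100, average 41550.0
rm -f "$tmp"
```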
Redirection and pipes
Output redirection (> truncates the file; >> appends without truncating)

    [root@lanquark demo]# awk '$1=="Tom"{print $1}' employees2
    Tom
    [root@lanquark demo]# awk '$1=="Tom"{print $1>"passing_file"}' employees2
    [root@lanquark demo]# cat passing_file
    Tom

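The difference between `>` and `>>` inside awk can be seen by running two commands against the same output file. This is a sketch with a made-up temporary directory; the output path is passed in with `-v` as the variable `out`.

```shell
# Made-up input in a temporary directory (illustrative only).
dir=$(mktemp -d)
printf 'Tom\nMary\n' > "$dir/names"
out="$dir/out"

# > truncates the file when awk first opens it; later prints in the
# same run go to the same open file, so both lines are written.
awk -v out="$out" '{ print $1 > out }' "$dir/names"

# >> opens the file for appending, so the two existing lines are kept.
awk -v out="$out" '{ print $1 >> out }' "$dir/names"

lines=$(awk 'END { print NR }' "$out")
echo "$lines"    # prints: 4
rm -rf "$dir"
```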
Input redirection (getline)

    [root@lanquark demo]# awk 'BEGIN{"date"|getline d;print d}'
    Tue Jun 5 22:53:24 EDT 2018
    [root@lanquark demo]# awk 'BEGIN{"date" | getline d;split(d,mon);print mon[2]}'
    Jun
    [root@lanquark demo]# awk 'BEGIN{while("ls" | getline) print}'
    1111
    1.txt
    datafile
    datafile1
    emp.data
    employees
    employees2
    id.txt
    ipconfig.txt
    lab5.data
    names
    newfile
    newfile1
    passing_file
    picnic
    temp
    test1.txt
    test.dir
    textfile
    tt.txt

Pipes
If a pipe has been opened in awk, it must be closed before another pipe can be opened. The command to the right of the pipe symbol is enclosed in double quotes.

    [root@lanquark demo]# cat names
    john smith
    alice cheba
    george goldberg
    susan goldberg
    tony tram
    barbara nguyen
    elizabeth lone
    dan savage
    eliza goldberg
    john goldenrod
    // +1 -2 +0 -1 is the obsolete sort key syntax; the modern equivalent is -k2,2 -k1,1
    [root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}' names
    tony tram
    john smith
    dan savage
    barbara nguyen
    elizabeth lone
    john goldenrod
    susan goldberg
    george goldberg
    eliza goldberg
    alice cheba
    // without close(), the END output appears before the sorted output
    [root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{print "game over"}' names
    game over
    tony tram
    john smith
    dan savage
    barbara nguyen
    elizabeth lone
    john goldenrod
    susan goldberg
    george goldberg
    eliza goldberg
    alice cheba
    // closing the pipe in END makes sort finish and print before "game over"
    [root@lanquark demo]# awk '{print $1,$2 | "sort -r +1 -2 +0 -1"}END{close("sort -r +1 -2 +0 -1");print "game over"}' names
    tony tram
    john smith
    dan savage
    barbara nguyen
    elizabeth lone
    john goldenrod
    susan goldberg
    george goldberg
    eliza goldberg
    alice cheba
    game over

Recursive filtering:
For example, to find the lines containing eval in all *.php files under the data directory:
grep -r --include="*.php" 'eval' /data/
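The effect of `--include` can be checked in a throwaway directory. This sketch assumes a grep that supports `--include` (GNU grep does); the directory layout and file contents are made up, and `-l` is added so that only the matching file names are printed.

```shell
# Made-up directory tree (illustrative only).
dir=$(mktemp -d)
mkdir -p "$dir/sub"
printf '<?php eval($x); ?>\n' > "$dir/sub/a.php"
printf 'eval in a text file\n' > "$dir/b.txt"

# -r descends into subdirectories; --include limits the search to
# files whose names match the glob, so b.txt is skipped.
matches=$(grep -rl --include='*.php' 'eval' "$dir")
echo "$matches"
```

Quoting the glob keeps the shell from expanding `*.php` before grep sees it.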
Exercises
http://www.apelearn.com/study_v2/chapter14.html