Learning grep

Date: 2019-11-10

Tags: grep, learning

grep: global search regular expression (RE) and print out the line. It searches for a regular expression and prints every line that matches.

When grep searches data for a string, it selects data one whole line at a time.

1. Definition

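1) grep is a powerful text search tool: it searches text using regular expressions and prints the matching lines. The Unix grep family consists of grep, egrep, and fgrep, and egrep and fgrep differ only slightly from grep. egrep is an extension of grep that supports more regular expression metacharacters. fgrep stands for fixed grep or fast grep: it treats every character in the pattern as a literal, so regular expression metacharacters stand for themselves and lose their special meaning. Linux uses the GNU version of grep, which is more capable and provides the egrep and fgrep behavior through the -E and -F command-line options (-G selects the default basic regular expressions).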

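2) grep works by searching one or more files for a pattern. If the pattern contains spaces, it must be quoted; every argument after the pattern is treated as a file name. The results are written to the screen (standard output), and the original files are not modified.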

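3) grep is useful in shell scripts because it reports the outcome of the search through its exit status: 0 if the pattern was found, 1 if it was not found, and 2 if an error occurred, for example when a file to be searched does not exist. These return values can be used to automate text-processing tasks.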
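As a minimal sketch of using these return values in a script, the block below branches on the exit status. The helper name check_pattern and the temporary sample file are assumptions made for illustration; they do not come from the original article.

```shell
# Hypothetical helper: report how a grep search ended, based on its exit status.
check_pattern() {
    # $1 = pattern, $2 = file; normal output and error messages are discarded
    grep "$1" "$2" >/dev/null 2>&1
    case $? in
        0) echo "found" ;;
        1) echo "not found" ;;
        *) echo "error" ;;
    esac
}

tmp=$(mktemp)                        # scratch file used only for this demo
printf 'a1\nb2\nc3\n' > "$tmp"

check_pattern 'b' "$tmp"             # pattern is present
check_pattern 'z' "$tmp"             # pattern is absent
check_pattern 'b' "$tmp.missing"     # file does not exist

rm -f "$tmp"
```

Testing the status with case keeps the "no match" and "error" outcomes apart, which a plain if/else would merge.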

2. Syntax: grep [options] pattern [file...]

1) -A NUM, --after-context=NUM: in addition to each matching line, print the NUM lines that follow it.

Example 1:

ompmsc35 chuntaoh> cat test1

a1

b2

ompmsc35 chuntaoh> grep -A 1 'b' test1

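2) -a, --text

grep is meant for searching text files. If the target is a binary file and a match is found, grep prints only the message "Binary file <name> matches" and stops.

With -a, the binary file is searched as if it were text; this is equivalent to --binary-files=text.

Example 2: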

ompmsc35 chuntaoh> grep 'redistribute' /bin/mv

Binary file /bin/mv matches

ompmsc35 chuntaoh> grep -a 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

Example 3:

(1) Pick a binary file, for example /usr/bin/[:

ompmsc35 chuntaoh> file /usr/bin/[

/usr/bin/[: ELF 64-bit LSB executable, AMD x86-64, version 1 (SYSV), for GNU/Linux 2.6.18, dynamically linked (uses shared libs), stripped

(2) Inspect the binary file with strings:

ompmsc35 chuntaoh> strings /usr/bin/[

/lib64/ld-linux-x86-64.so.2

__gmon_start__

libc.so.6

setlocale

mbrtowc

optind

fflush_unlocked

dcgettext

error

__lxstat

iswprint

(3) Use grep -a:

ompmsc35 chuntaoh> grep -a shell /usr/bin/[

3) -B NUM, --before-context=NUM

The counterpart of -A NUM: in addition to each matching line, the NUM lines before it are printed.

Example 3 (-B):

ompmsc35 chuntaoh> cat test1

ompmsc35 chuntaoh> grep -B 1 'b' test1

4) -b, --byte-offset: print the byte offset of each matching line, that is, how many bytes of the file precede it.

Example 4:

ompmsc35 chuntaoh> cat test1

ompmsc35 chuntaoh> grep -b 'a' test1 # 0 bytes precede this line

0:a1

ompmsc35 chuntaoh> grep -b '1' test1

0:a1

ompmsc35 chuntaoh> grep -b 'b' test1 # 3 bytes precede this line, because \n also counts as one byte

3:b2

ompmsc35 chuntaoh> grep -b '2' test1

3:b2

ompmsc35 chuntaoh> od -N4 test1 -t c # -N4 reads only 4 bytes; -t c prints them as characters, which shows that \n counts as one byte

0000000 a 1 \n b

0000004

ompmsc35 chuntaoh> od -N6 test1 -t c

0000000 a 1 \n b 2 \n

0000006
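The relationship between the -b offset and the bytes of the preceding lines can be checked directly. This is a small sketch on an assumed three-line sample file (not the author's test1); the variable names are illustrative.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'a1\nb2\nc3\n' > "$tmp"

offset_line=$(grep -b 'c' "$tmp")      # has the form OFFSET:LINE
offset=${offset_line%%:*}              # keep only the part before the first colon

# Bytes in the two lines before the match, newlines included.
preceding=$(( $(head -n 2 "$tmp" | wc -c) ))

echo "offset=$offset preceding=$preceding"
rm -f "$tmp"
```

Both numbers come out equal because each of the two preceding lines contributes two characters plus one newline.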

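5) -C NUM, -NUM, --context=NUM

Print NUM lines of context before and after each matching line. Older GNU grep versions documented a default of 2 for NUM; current versions require NUM to be given explicitly, which is why the default of two lines cannot be relied on.

Example 5: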

ompmsc35 chuntaoh> grep -C 2 '3' test1

ompmsc35 chuntaoh> grep -2 '3' test1

ompmsc35 chuntaoh> grep --context=2 '3' test1
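To compare the three context options side by side, here is a short sketch. The five-line sample file is an assumption for illustration and is not the author's test1.

```shell
tmp=$(mktemp)                        # scratch file used only for this demo
printf 'a1\nb2\nc3\nd4\ne5\n' > "$tmp"

after=$(grep -A 1 'c' "$tmp")        # matching line plus 1 line after it
before=$(grep -B 1 'c' "$tmp")       # matching line plus 1 line before it
around=$(grep -C 1 'c' "$tmp")       # 1 line of context on each side

printf '%s\n---\n%s\n---\n%s\n' "$after" "$before" "$around"
rm -f "$tmp"
```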

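6) -c, --count: count the lines that match the search string. The matching lines themselves are not printed; only the number of matching lines is shown.

With -v (--invert-match) added, the number of non-matching lines is shown instead.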

ompmsc35 chuntaoh> grep -c '3' test1

ompmsc35 chuntaoh> grep -c '3' test1 -v
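Because -c counts lines rather than individual occurrences, a line containing the pattern twice still counts once. The sketch below illustrates this on an assumed sample file; the variable names are illustrative.

```shell
tmp=$(mktemp)                            # scratch file used only for this demo
printf 'a1\nb2\nc3\n33\n' > "$tmp"       # the last line contains "3" twice

matching=$(grep -c '3' "$tmp")           # lines that contain 3
non_matching=$(grep -c -v '3' "$tmp")    # lines that do not contain 3
total=$(( $(wc -l < "$tmp") ))           # all lines in the file

echo "matching=$matching non_matching=$non_matching total=$total"
rm -f "$tmp"
```

The two counts always add up to the total number of lines, since every line either matches or does not.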

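7) -d ACTION, --directories=ACTION

If an input file is a directory, use ACTION to process it.
The default ACTION is read, which means the directory is treated like an ordinary file;
if ACTION is skip, the directory is skipped by grep;
if ACTION is recurse, grep reads all files under the directory, which is equivalent to -r.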

ompmsc35 chuntaoh> cp test1 ./dir/test1

ompmsc35 chuntaoh> grep -d recurse '3' dir

dir/test1:c3

ompmsc35 chuntaoh> grep -r '3' dir

dir/test1:c3

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir

/home/chuntaoh/dir/test1:c3

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir -d skip # the directory is skipped, no output

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir -d read # the directory is treated as an ordinary file, no output

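8) -E, --extended-regexp: interpret the pattern as an extended regular expression. Equivalent to egrep.

ompmsc35 chuntaoh> grep '3|4' test1 # in a basic regular expression a bare | is a literal character, so this is not alternation

ompmsc35 chuntaoh> grep '3\|4' test1 # with the backslash, \| works as alternation (a GNU extension)

ompmsc35 chuntaoh> egrep '3|4' test1 # egrep accepts a bare | as alternation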
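The difference between the basic and extended interpretation of | can be seen on a file that also contains a literal "3|4" line. This is a sketch on assumed sample data, using grep -E rather than the egrep command.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'a1\nb2\nc3\nd4\n3|4\n' > "$tmp"

literal=$(grep '3|4' "$tmp")           # basic regex: | is an ordinary character
extended=$(grep -E '3|4' "$tmp")       # extended regex: | means "3 or 4"

printf 'basic:\n%s\nextended:\n%s\n' "$literal" "$extended"
rm -f "$tmp"
```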

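9) -e PATTERN, --regexp=PATTERN

Give the pattern explicitly. The option can be repeated to specify several patterns, in which case every line matching any one of them is printed.

It is commonly used to protect a pattern that begins with -.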

ompmsc35 chuntaoh> cat test1

-c3

ompmsc35 chuntaoh> grep '-c' test1 # no output: quoting does not stop grep from parsing -c as an option

ompmsc35 chuntaoh> grep -e -c test1

-c3

Example 6: -e with several patterns prints every line that matches any one of them

ompmsc35 chuntaoh> cat test

one

two

three

four

five

six

ompmsc35 chuntaoh> grep -e t -e f test

two

three

four

five

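Every line containing the character t or the character f is printed. A bracket expression gives the same result (quoted so that the shell does not expand it as a file name pattern):

ompmsc35 chuntaoh> grep '[tf]' test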

two

three

four

five

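10) -f FILE, --file=FILE

Write the patterns to a file beforehand, one pattern per line, and then search using that file. An empty file contains no patterns and therefore matches nothing.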

ompmsc35 chuntaoh> cat test1

-c3

ompmsc35 chuntaoh> cat reg

ompmsc35 chuntaoh> grep -f reg test1

-c3
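The -e and -f options can express the same set of patterns. The sketch below uses the six-word sample from Example 6 and a hypothetical pattern file created on the fly; the file and variable names are assumptions for illustration.

```shell
data=$(mktemp)                         # file to be searched
patterns=$(mktemp)                     # pattern file, one pattern per line
printf 'one\ntwo\nthree\nfour\nfive\nsix\n' > "$data"
printf 't\nf\n' > "$patterns"

from_file=$(grep -f "$patterns" "$data")   # patterns read from the file
from_args=$(grep -e t -e f "$data")        # same patterns given with -e

printf '%s\n' "$from_file"
rm -f "$data" "$patterns"
```

A pattern file is easier to maintain than a long chain of -e options once the list grows beyond a few entries.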

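11) -G, --basic-regexp: interpret the pattern as a basic regular expression. (This is the default.)

12) -H, --with-filename: prefix each matching line with the name of the file it came from; if a path was given, the path is shown.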

ompmsc35 chuntaoh> grep -H 'c' /home/chuntaoh/test1

/home/chuntaoh/test1:-c3

ompmsc35 chuntaoh> pwd

/home/chuntaoh

ompmsc35 chuntaoh> grep -H 'c' test1

test1:-c3

13) -h, --no-filename: the counterpart of -H; file names are not printed in the output.

ompmsc35 chuntaoh> grep -h 'c' /home/chuntaoh/test1

-c3

ompmsc35 chuntaoh> grep -h 'c' test1

-c3

14) --help: print a brief help message.

15) -I: treat a binary file as if it contained no match; the same as --binary-files=without-match.

ompmsc35 chuntaoh> grep -a 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

ompmsc35 chuntaoh> grep -I 'redistribute' /bin/mv

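16) --binary-files=TYPE
The default TYPE is binary.

With the default, searching a binary file has only two possible results:
1. If there is a match: "Binary file <name> matches" is printed.
2. If there is no match: nothing is printed.
If TYPE is without-match, grep assumes that a binary file contains no match; this is the same as -I.
If TYPE is text, grep treats a binary file as text; this is the same as -a.
Warning: with --binary-files=text, unwanted binary output may be produced if the output is a terminal.

Default search: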

ompmsc35 chuntaoh> grep 'redistribute' /bin/mv

Binary file /bin/mv matches

ompmsc35 chuntaoh> grep --binary-files=text 'redistribute' /bin/mv

This is free software. You may redistribute copies of it under the terms of

17) -i, --ignore-case: ignore case distinctions in both the pattern and the files being searched.

ompmsc35 chuntaoh> grep -i 'C' test1

-c3

18) -L, --files-without-match: suppress the normal output and print only the names of files that contain no match.

ompmsc35 chuntaoh> grep -L 'c' test1 test2

test2

19) -l, --files-with-matches: suppress the normal output and print only the names of files that contain a match.

ompmsc35 chuntaoh> grep -l 'c' test1 test2

test1
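The -l and -L options split a set of files into two complementary lists. A minimal sketch follows, with two throwaway files whose names mirror the transcript but whose contents are assumed.

```shell
dir=$(mktemp -d)                       # scratch directory used only for this demo
printf '%s\n' '-c3' > "$dir/test1"     # contains the letter c
printf '%s\n' 'a1'  > "$dir/test2"     # does not contain the letter c

# Run inside the directory so that the bare file names are printed.
with_match=$(cd "$dir" && grep -l 'c' test1 test2)
without_match=$(cd "$dir" && grep -L 'c' test1 test2)

echo "with match: $with_match"
echo "without match: $without_match"
rm -rf "$dir"
```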

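20) --mmap

If possible, use the mmap system call to read the input instead of the default read system call.

In some situations --mmap gives better performance. However, it can cause undefined behavior (including a core dump) if a file shrinks while grep is operating on it, or if an I/O error occurs. Newer GNU grep releases treat this option as obsolete, so it should not be relied on.

21) -n, --line-number: prefix each output line with its line number.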

ompmsc35 chuntaoh> grep -n '3' test1

3:-c3

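22) -q, --quiet, --silent: suppress all normal output. See also -s or --no-messages.

grep -q is convenient as the condition of an if statement.

-q: quiet mode; nothing is written to standard output. If a match is found, grep exits immediately with status 0.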

# cat a.txt

nihao

nihaooo

hello

# if grep -q hello a.txt ; then echo yes;else echo no; fi

yes

# if grep -q word a.txt; then echo yes; else echo no; fi

no

23) -R, -r, --recursive: recursively read all files under each directory; equivalent to -d recurse.

ompmsc35 chuntaoh> grep -r 'c3' /home/chuntaoh/dir

/home/chuntaoh/dir/test1:c3

-r and -R give the same result here:

ompmsc35 chuntaoh> grep -R 'goface' /home/chuntaoh

/home/chuntaoh/goface.txt:goface

/home/chuntaoh/goface.txt:gofaceme

ompmsc35 chuntaoh> grep -r 'goface' /home/chuntaoh

/home/chuntaoh/goface.txt:goface

/home/chuntaoh/goface.txt:gofaceme

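24) -s, --no-messages: suppress error messages about files that do not exist or cannot be read.

Note: unlike GNU grep, traditional grep did not conform to POSIX.2, because it lacked the -q option and its -s option behaved like GNU grep's -q.
Shell scripts that must be portable to traditional grep should avoid both -q and -s and redirect the output to /dev/null instead. POSIX defines the functionality that UNIX and UNIX-like systems are required to provide.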

ompmsc35 chuntaoh> grep 'c3' test1 test2 test3

test1:-c3

grep: test3: No such file or directory

ompmsc35 chuntaoh> grep -s 'c3' test1 test2 test3

test1:-c3
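The portable idiom mentioned in the note, redirecting to /dev/null instead of using -q or -s, can be sketched as follows. The sample file reuses the a.txt contents from the -q example; the variable names are illustrative.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'nihao\nnihaooo\nhello\n' > "$tmp"

# Portable form: no -q or -s; output and error messages are discarded by redirection.
if grep 'hello' "$tmp" >/dev/null 2>&1; then portable=yes; else portable=no; fi

# Form that relies on -q.
if grep -q 'hello' "$tmp"; then quiet=yes; else quiet=no; fi

echo "portable=$portable quiet=$quiet"
rm -f "$tmp"
```

Both forms rely only on the exit status, so they can be swapped without changing the surrounding script logic.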

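25) -V, --version: print the version number of grep. (Older manual pages describe the output as going to standard error; current GNU grep writes it to standard output.)

The version number should be included in any bug report about grep.

26) -v, --invert-match: print every line except those that match the pattern.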

ompmsc35 chuntaoh> grep -v 'c3' test1

27) -w, --word-regexp: match whole words only rather than substrings. For example, a search for "is" does not match "this".

ompmsc35 chuntaoh> cat goface.txt

goface

gofaceme

ompmsc35 chuntaoh> grep 'goface' goface.txt

goface

gofaceme

ompmsc35 chuntaoh> grep -w 'goface' goface.txt

goface

28) -x, --line-regexp: the pattern must match an entire line; only lines that match the pattern exactly are printed.

ompmsc35 chuntaoh> cat test1

bb2

-c3

ompmsc35 chuntaoh> grep -x 'b2' test1

ompmsc35 chuntaoh> grep -x 'bb2' test1

bb2
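The following sketch contrasts a plain substring search with -w and -x. The sample lines extend the goface.txt example with one assumed extra line ("go face") so that the word match has something to find.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'goface\ngofaceme\ngo face\n' > "$tmp"

plain=$(grep 'face' "$tmp")            # substring match: all three lines
word=$(grep -w 'face' "$tmp")          # face must stand alone as a word
line=$(grep -x 'goface' "$tmp")        # the whole line must be exactly goface

printf 'plain:\n%s\nword:\n%s\nline:\n%s\n' "$plain" "$word" "$line"
rm -f "$tmp"
```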

3. grep regular expression metacharacters (basic set)

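^

Anchors the start of a line. For example, '^grep' matches every line that begins with grep.

$

Anchors the end of a line. For example, 'grep$' matches every line that ends with grep.

.

Matches any single character other than a newline. For example, 'gr.p' matches gr followed by any one character and then p.

*

Matches zero or more of the preceding character. For example, ' *grep' matches lines with zero or more spaces followed by grep. Used together, .* matches any sequence of characters.

[]

Matches one character from the given set. For example, '[Gg]rep' matches Grep and grep.

[^]

Matches one character that is not in the given set. For example, '[^A-FH-Z]rep' matches a character other than A-F and H-Z, followed by rep.

\(..\)

Marks the matched characters. For example, in '\(love\)', love is saved as group 1 and can be referenced later as \1.

\<

Anchors the start of a word. For example, '\<grep' matches lines containing a word that begins with grep.

\>

Anchors the end of a word. For example, 'grep\>' matches lines containing a word that ends with grep.

x\{m\}

Repeats the character x exactly m times. For example, 'o\{5\}' matches lines containing 5 consecutive o's.

x\{m,\}

Repeats the character x at least m times. For example, 'o\{5,\}' matches lines containing at least 5 consecutive o's.

x\{m,n\}

Repeats the character x at least m times and at most n times. For example, 'o\{5,10\}' matches lines containing 5 to 10 consecutive o's.

\w

Matches one word character, that is, [A-Za-z0-9_]. For example, 'G\w*p' matches G followed by zero or more word characters and then p.

\W

The inverse of \w: matches one non-word character, such as a period. \W* matches zero or more of them.

\b

Word boundary. For example, '\bgrep\b' matches grep only as a whole word, that is, with no word character directly before or after it.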
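A few of the basic metacharacters can be tried out with the sketch below: line anchors, an interval, and a back-reference to a marked group. The sample lines and variable names are assumptions chosen for illustration.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'grep\negrep\ngrep tool\ngrooooop\nabab\n' > "$tmp"

starts=$(grep '^grep' "$tmp")          # lines that begin with grep
ends=$(grep 'grep$' "$tmp")            # lines that end with grep
repeat=$(grep 'o\{5\}' "$tmp")         # lines with 5 consecutive o's
backref=$(grep '\(ab\)\1' "$tmp")      # group 1 (ab) followed by itself

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$starts" "$ends" "$repeat" "$backref"
rm -f "$tmp"
```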

4. Extended metacharacter set for egrep and grep -E

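+

Matches one or more of the preceding character. For example, '[a-z]+able' matches one or more lowercase letters followed by able, such as loveable, enable, and disable.

?

Matches zero or one of the preceding character. For example, 'gr?p' matches g followed by an optional r and then p, that is, gp or grp.

a|b|c

Matches a or b or c. For example, grep|sed matches grep or sed.

()

Grouping. For example, 'love(able|rs)' matches loveable or lovers, and '(ov)+' matches one or more occurrences of ov.

x{m}, x{m,}, x{m,n}

Same as x\{m\}, x\{m,\}, x\{m,n\} in the basic set.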
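The extended metacharacters can be exercised with grep -E as in the sketch below. The word list and variable names are assumptions for illustration only.

```shell
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'enable\nable\ngrp\ngrep\ngreep\nlovers\nlovely\n' > "$tmp"

plus=$(grep -E '[a-z]+able' "$tmp")        # at least one letter before able
optional=$(grep -E '^gre?p$' "$tmp")       # the e is optional: grp or grep
group=$(grep -E 'love(able|rs)' "$tmp")    # loveable or lovers
interval=$(grep -E '^gre{2}p$' "$tmp")     # exactly two e's: greep

printf '%s\n---\n%s\n---\n%s\n---\n%s\n' "$plus" "$optional" "$group" "$interval"
rm -f "$tmp"
```

Note that the bare word able is not selected by the first pattern, because + requires at least one preceding letter.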

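5. POSIX character classes

To keep behavior consistent across the character encodings used in different countries, POSIX (Portable Operating System Interface) adds special character classes; for example, [:alnum:] is another way of writing A-Za-z0-9. A class must be placed inside a bracket expression to form a regular expression, as in [[:alnum:]], which is equivalent to [A-Za-z0-9]. On Linux, every grep variant except fgrep supports the POSIX character classes.

fgrep treats every character in the pattern literally: regular expression metacharacters stand for themselves and have no special meaning.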

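Class         Equivalent     Description
[[:upper:]]   [A-Z]          uppercase letters
[[:lower:]]   [a-z]          lowercase letters
[[:alpha:]]   [a-zA-Z]       letters
[[:alnum:]]   [0-9a-zA-Z]    letters and digits
[[:digit:]]   [0-9]          digits
[[:space:]]   space, tab, etc.   all whitespace characters (newline, space, tab)
[[:graph:]]                  visible characters (neither spaces nor control characters)
[[:cntrl:]]                  control characters
[[:print:]]                  printable characters (visible characters plus the space)
[[:punct:]]                  punctuation characters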
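As a short sketch of the character classes in use, the block below selects lines by class. The sample lines are assumed, and the locale is pinned to C so that the classes cover only ASCII characters.

```shell
LC_ALL=C; export LC_ALL                # assumption: fix the locale for predictable classes
tmp=$(mktemp)                          # scratch file used only for this demo
printf 'abc\nABC\n123\na b\n' > "$tmp"

digits=$(grep '[[:digit:]]' "$tmp")            # lines containing a digit
uppers=$(grep '[[:upper:]]' "$tmp")            # lines containing an uppercase letter
spaces=$(grep '[[:space:]]' "$tmp")            # lines containing whitespace
only_lower=$(grep -x '[[:lower:]]*' "$tmp")    # lines made only of lowercase letters

printf '%s\n%s\n%s\n%s\n' "$digits" "$uppers" "$spaces" "$only_lower"
rm -f "$tmp"
```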