The Linux Text-Processing Trio: grep, sed, and awk Study Notes

Date: 2019-11-17

grep

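grep (the name comes from the ed command g/re/p: globally search for a regular expression and print the matching lines) searches text with a regular expression and prints every line that matches. Its power lies in the regular expressions.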

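Using extended regular expressions

grep -E ""
or
egrep ""

egrep is a deprecated alias for grep -E; prefer grep -E in new scripts.

-P (--perl-regexp): interpret the pattern as a Perl-compatible regular expression (PCRE). This is a GNU grep option.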

Bug bounty tip: a one-line crt.sh subdomain discovery command. The pattern (?<=<TD>).*(?=</TD>) matches the content between <TD> and </TD>.

curl -fsSL -H "User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:69.0) Gecko/20100101 Firefox/69.0" "https://crt.sh/?CN=%25.target.com" | sort -n | uniq -c | grep -o -P '(?<=<TD>).*(?=</TD>)' | sed -e '/white-space:normal/d'
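
To try the lookaround pattern without touching the network, the sketch below runs it against a made-up table row. The host names are hypothetical, and -P assumes a GNU grep built with PCRE support. It also shows why a lazy quantifier may be safer when several cells share one line.

```shell
# Hypothetical sample row; this is not real crt.sh output.
row='<TD>one.example.com</TD><TD>two.example.com</TD>'

# Greedy .* runs to the last </TD> on the line, so both cells merge into one match.
greedy=$(printf '%s\n' "$row" | grep -o -P '(?<=<TD>).*(?=</TD>)')

# Lazy .*? stops at the first </TD>, giving one match per cell.
lazy=$(printf '%s\n' "$row" | grep -o -P '(?<=<TD>).*?(?=</TD>)')

printf 'greedy: %s\n' "$greedy"
printf 'lazy:\n%s\n' "$lazy"
```

The greedy form is fine when each cell sits on its own line, which is what the one-liner above relies on.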

Search for a string in one or more files. Use -v to print every line that does not match.

mark@mark-Pc:~$ grep 'root' /etc/passwd
root:x:0:0:root:/root:/bin/bash
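
A minimal sketch of -v, using a temporary file with made-up entries in place of /etc/passwd:

```shell
# Hypothetical sample data standing in for /etc/passwd.
sample=$(mktemp)
printf 'root:x:0:0\ndaemon:x:1:1\nbin:x:2:2\n' > "$sample"

# -v inverts the match: print every line that does NOT contain "root".
others=$(grep -v 'root' "$sample")
printf '%s\n' "$others"

rm -f "$sample"
```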

-l: print only the names of files that contain a match

mark@mark-Pc:~$ grep -l 'root' /etc/passwd
/etc/passwd
mark@mark-Pc:~$ grep -l 'root' ./1.txt 
mark@mark-Pc:~$

-o: print only the matched part of each line, one match per output line

mark@mark-Pc:~$ grep -o 'root' /etc/passwd
root
root
root

-c: count the lines that contain a match (lines, not individual matches)

mark@mark-Pc:~$ grep -c 'root' /etc/passwd
1
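
Because -c counts lines rather than occurrences, the root line above yields 1 even though it contains "root" three times. One way to count occurrences, sketched here on the same line, is to combine -o with wc -l:

```shell
line='root:x:0:0:root:/root:/bin/bash'

# -c counts matching lines: one input line, so the count is 1.
lines=$(printf '%s\n' "$line" | grep -c 'root')

# -o prints each match on its own line, so wc -l counts occurrences.
# tr strips the padding that some wc implementations add.
matches=$(printf '%s\n' "$line" | grep -o 'root' | wc -l | tr -d ' ')

echo "lines=$lines matches=$matches"
```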

-n: prefix each matching line with its line number

mark@mark-Pc:~$ grep -n 'root' /etc/passwd
1:root:x:0:0:root:/root:/bin/bash

-r: search recursively through a directory tree; -i ignores case

mark@mark-Pc:~$ grep "root" ./shell -r -i -n 
./shell/test.sh:3:    echo 'root'
./shell/test.sh:5:    echo 'not root'

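Use --include to search only files whose names match a glob, or --exclude to skip them

# search every php and html file under the current directory for the string `target`
grep -r "target" . --include='*.php' --include='*.html'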
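
The following sketch checks both options against a throwaway directory. The file names are hypothetical, and --include/--exclude are assumed to be available (they are in GNU grep). Quoting the globs keeps the shell from expanding them before grep sees them.

```shell
# Build a scratch directory with three files that all contain "target".
dir=$(mktemp -d)
echo 'target here' > "$dir/a.php"
echo 'target here' > "$dir/b.html"
echo 'target here' > "$dir/c.txt"

# Only php and html files are searched.
included=$(cd "$dir" && grep -rl 'target' . --include='*.php' --include='*.html' | sort)

# Everything except txt files is searched.
excluded=$(cd "$dir" && grep -rl 'target' . --exclude='*.txt' | sort)

printf '%s\n' "$included"
rm -rf "$dir"
```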

sed

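sed (stream editor) edits text programmatically, applying a script of commands to each line of input.

-e: add an editing script. Repeat -e to apply several commands in one pass.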

sed -e "" -e ""
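
A small sketch of chaining two -e scripts; the commands run in order on each line:

```shell
# First replace "name" with "mark", then capitalize "hello".
result=$(echo 'hello name' | sed -e 's/name/mark/' -e 's/hello/Hello/')
echo "$result"

# The same two commands can also be written as one script separated by ";".
same=$(echo 'hello name' | sed 's/name/mark/; s/hello/Hello/')
```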

Substitution with the s command

mark@mark-Pc:~$ cat 1.txt 
hello name
mark@mark-Pc:~$ sed "s/name/mark/g" 1.txt 
hello mark
mark@mark-Pc:~$ sed "s/l/L/" 1.txt 
heLlo name
mark@mark-Pc:~$ sed "s/l/L/g" 1.txt 
heLLo name

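The s command replaces name with mark. The g flag replaces every match on a line; without it, only the first match on each line is replaced. None of these commands changes the file: sed writes the result to standard output, which can be redirected to a new file.

-i: edit the file in place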

mark@mark-Pc:~$ cat 1.txt 
hello name
mark@mark-Pc:~$ sed -i "s/name/mark/" 1.txt 
mark@mark-Pc:~$ cat 1.txt 
hello mark
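
Since -i overwrites the file, it can be worth keeping a backup. A minimal sketch, assuming a sed that accepts a suffix attached to -i (GNU sed does); the temporary file is a stand-in for 1.txt:

```shell
f=$(mktemp)
echo 'hello name' > "$f"

# -i.bak edits the file in place and saves the original as <file>.bak.
sed -i.bak 's/name/mark/' "$f"

edited=$(cat "$f")
backup=$(cat "$f.bak")
echo "edited: $edited"
echo "backup: $backup"

rm -f "$f" "$f.bak"
```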

Restricting a command to a range of lines

mark@mark-Pc:~$ nl pets.txt 
     1  This is my cat
     2    my cat's name is betty
     3  This is my dog
     4    my dog's name is frank
     5  This is my fish
     6    my fish's name is george
     7  This is my goat
     8    my goat's name is adam
mark@mark-Pc:~$ sed "1,3s/T/t/g" pets.txt 
this is my cat
  my cat's name is betty
this is my dog
  my dog's name is frank
This is my fish
  my fish's name is george
This is my goat
  my goat's name is adam

The s command replaces T with t; the address 1,3 limits it to lines 1 through 3.

mark@mark-Pc:~$ cat pets.txt 
This is my cat,my cat's name is betty
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam
mark@mark-Pc:~$ sed "s/m/M/3g" pets.txt 
This is my cat,my cat's naMe is betty
This is my dog,my dog's naMe is frank
This is my fish,my fish's naMe is george
This is my goat,my goat's naMe is adaM

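The s command replaces m with M; the 3g flag replaces the third match on each line and every match after it. Combining a number with g is GNU sed behavior.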
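
The difference between a bare number and a number combined with g can be seen on a short string. This sketch assumes GNU sed for the 2g form:

```shell
text='a a a a'

# A bare number replaces only that occurrence.
second_only=$(echo "$text" | sed 's/a/X/2')

# Number plus g replaces that occurrence and all later ones (GNU sed).
from_second=$(echo "$text" | sed 's/a/X/2g')

echo "$second_only"
echo "$from_second"
```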

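-n suppresses automatic printing, so only lines that a command prints explicitly are output. The p command prints the pattern space (the current line).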

mark@mark-Pc:~$ cat -n pets.txt 
     1  This is my cat,my cat's name is betty
     2  This is my dog,my dog's name is frank
     3  This is my fish,my fish's name is george
     4  This is my goat,my goat's name is adam
mark@mark-Pc:~$ nl pets.txt | sed -n "1,2p"
     1  This is my cat,my cat's name is betty
     2  This is my dog,my dog's name is frank
mark@mark-Pc:~$ nl pets.txt | sed -n "s/cat/Cat/p"
     1  This is my Cat,my cat's name is betty

The first command prints lines 1 and 2; the second prints only the lines on which the substitution succeeded.
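
To see why -n matters, compare p with and without it. In this sketch (sample lines shortened from pets.txt), omitting -n prints every line once automatically and the matching line a second time:

```shell
pets='This is my cat
This is my dog
This is my fish'

# With -n, only the line selected by /dog/ is printed (grep-like behavior).
only_dog=$(printf '%s\n' "$pets" | sed -n '/dog/p')

# Without -n, all 3 lines are printed and the match is printed twice: 4 lines.
line_total=$(printf '%s\n' "$pets" | sed '/dog/p' | wc -l | tr -d ' ')

echo "$only_dog"
echo "$line_total"
```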

a (append) and i (insert): a adds text after the current line, i adds text before it.

mark@mark-Pc:~$ nl pets.txt 
     1  This is my cat,my cat's name is betty
     2  This is my dog,my dog's name is frank
     3  This is my fish,my fish's name is george
     4  This is my goat,my goat's name is adam
# insert
mark@mark-Pc:~$ sed "1 i insert test" pets.txt 
insert test
This is my cat,my cat's name is betty
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam
# append
mark@mark-Pc:~$ sed "1 a append test" pets.txt 
This is my cat,my cat's name is betty
append test
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam
# append after every line that matches a pattern
mark@mark-Pc:~$ sed "/cat/a match append test" pets.txt 
This is my cat,my cat's name is betty
match append test
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam
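
The one-line forms above ("1 a text") are a GNU sed convenience. A more portable sketch puts a backslash after the command and the text on the next line:

```shell
# a\ followed by a newline: the next line is the text to append after line 1.
appended=$(printf 'one\ntwo\n' | sed '1a\
after one')

# i\ works the same way but places the text before line 1.
inserted=$(printf 'one\ntwo\n' | sed '1i\
before one')

printf '%s\n' "$appended"
printf '%s\n' "$inserted"
```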

d(delete)

# delete blank lines
sed '/^$/d'
# delete line 2
sed '2d'
# delete from line 2 to the end
sed '2,$d'
# delete the last line
sed '$d'

c (change): replace each matching line with new text

mark@mark-Pc:~$ sed "/cat/c change test" pets.txt 
change test
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam

& stands for the matched string

mark@mark-Pc:~$ cat pets.txt 
This is my cat,my cat's name is betty
This is my dog,my dog's name is frank
This is my fish,my fish's name is george
This is my goat,my goat's name is adam
mark@mark-Pc:~$ sed "s/my/[&]/g" pets.txt 
This is [my] cat,[my] cat's name is betty
This is [my] dog,[my] dog's name is frank
This is [my] fish,[my] fish's name is george
This is [my] goat,[my] goat's name is adam
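
While & refers to the whole match, parts of a match can be reused through numbered groups: \( and \) capture a group, and \1, \2 refer back to it. A short sketch on a made-up string:

```shell
# Capture the text before and after the comma, then swap the two groups.
swapped=$(echo 'cat,betty' | sed 's/\(.*\),\(.*\)/\2,\1/')
echo "$swapped"

# & and groups can be mixed: wrap the whole match, reuse the first group.
tagged=$(echo 'cat,betty' | sed 's/\(.*\),.*/[&] pet=\1/')
echo "$tagged"
```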

awk

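awk is a powerful text-analysis tool. It has many built-in features, such as arrays and functions, which it shares with C. Flexibility is its greatest strength.

Basic structure of an awk script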

awk 'BEGIN{ print "start" } pattern{ commands } END{ print "end" }' file

An awk script usually has three parts: a BEGIN block, a main block that can be guarded by a pattern, and an END block. All three parts are optional.
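
A minimal sketch of the three parts working together, counting the lines that match a pattern in made-up input:

```shell
# BEGIN runs once before input, /cat/ { ... } runs for matching lines,
# END runs once after the last line.
count=$(printf 'cat\ndog\ncat\n' | awk 'BEGIN { n = 0 } /cat/ { n++ } END { print n }')
echo "$count"

# A pattern with no action prints the matching lines.
dogs=$(printf 'cat\ndog\ncat\n' | awk '/dog/')
echo "$dogs"
```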

A simple example

mark@mark-Pc:~$ cat 0.txt 
test
mark@mark-Pc:~$ awk '{print}' 0.txt 
test
mark@mark-Pc:~$ awk 'BEGIN{ print "Start" } { print } END{ print "End" }' 0.txt 
Start
test
End

With no arguments, print outputs the current line.

Specifying the field separator

The user information file

/etc/passwd
root:x:0:0:root:/root:/bin/bash
account:password:UID:GID:GECOS:directory:shell
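user name : password : user ID : group ID : user description : home directory : login shell
Note: an account with no password can typically log in only locally; remote logins are refused.

Print the user name, UID, and home directory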

mark@mark-Pc:~$ cat 3.txt 
mark:x:1000:1000:mark,,,:/home/mark:/bin/bash
mark@mark-Pc:~$ awk  'BEGIN{FS=":"} {print $1,$3,$6}' 3.txt 
mark 1000 /home/mark
mark@mark-Pc:~$ awk -F':' '{print $1,$3,$6}' 3.txt 
mark 1000 /home/mark
mark@mark-Pc:~$ awk -F: '{print $1,$3,$6}' 3.txt 
mark 1000 /home/mark
mark@mark-Pc:~$ awk  -F: '{print $1,$3,$6}' OFS="\t" 3.txt
mark    1000    /home/mark
mark@mark-Pc:~$

Variables

mark@mark-Pc:~$ echo 'this is a test' | awk '{print $NF}'
test
mark@mark-Pc:~$ awk -F ':' '{print NR,$1}' /etc/passwd
1 root
2 daemon
3 bin
4 sys
5 sync

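NF is the number of fields in the current line, so $NF is the last field. NR is the number of the line currently being processed.

Built-in variables

$n The n-th field of the current record; $1 is the first field, $2 the second.
$0 The full text of the current record.
ARGC Number of command-line arguments.
ARGIND Index in ARGV of the file currently being processed (gawk only).
ARGV Array of command-line arguments.
CONVFMT Conversion format for numbers (default %.6g).
ENVIRON Associative array of environment variables.
ERRNO Description of the last system error (gawk only).
FIELDWIDTHS Space-separated list of field widths (gawk only).
FILENAME Name of the current input file.
FNR Like NR, but counted within the current file.
FS Input field separator (default is a single space, which splits on runs of blanks and tabs).
IGNORECASE If true, matching ignores case (gawk only).
NF Number of fields in the current record.
NR Number of records read so far; with the default RS, the current line number.
OFMT Output format for numbers (default %.6g).
OFS Output field separator (default is a single space).
ORS Output record separator (default is a newline).
RS Input record separator (default is a newline).
RSTART Position of the first character of the string matched by match().
RLENGTH Length of the string matched by match().
SUBSEP Array subscript separator (default "\034").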
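
The difference between NR and FNR only shows up with more than one input file. A small sketch with two temporary files (their contents are made up):

```shell
a=$(mktemp)
b=$(mktemp)
printf 'x\ny\n' > "$a"
printf 'z\n' > "$b"

# NR keeps counting across files; FNR restarts at 1 for each file.
# NF is 1 here because every line holds a single field.
counters=$(awk '{ print NR, FNR, NF }' "$a" "$b")
printf '%s\n' "$counters"

rm -f "$a" "$b"
```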

Functions

mark@mark-Pc:~$ echo 'this is a test' | awk -F ':' '{ print toupper($1) }'
THIS IS A TEST

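The toupper() function converts a string to uppercase. The input here contains no colon, so with -F ':' the whole line ends up in $1.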

Other built-in functions: https://www.gnu.org/software/gawk/manual/html_node/Built_002din.html#Built_002din
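
As a brief illustration of a few other standard string functions, this sketch applies length(), substr(), and toupper() to the same sample sentence:

```shell
# length($0): number of characters in the line
# substr($0, 1, 4): 4 characters starting at position 1 (awk strings are 1-based)
# toupper($NF): the last field in uppercase
summary=$(echo 'this is a test' | awk '{ print length($0), substr($0, 1, 4), toupper($NF) }')
echo "$summary"
```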

Conditionals

# list users whose UID is 0
awk -F: '{if ($3==0) print $1}' /etc/passwd
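
The same check can be written as a pattern in front of the action, which is often considered the more idiomatic awk form. A sketch on two made-up passwd entries:

```shell
# Hypothetical passwd-style lines; only root has UID 0.
entries='root:x:0:0:root:/root:/bin/bash
mark:x:1000:1000:mark,,,:/home/mark:/bin/bash'

# The condition acts as the pattern; the action runs only when it is true.
admins=$(printf '%s\n' "$entries" | awk -F: '$3 == 0 { print $1 }')
echo "$admins"
```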

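awk summary
awk is not just a utility; it is also a programming language. The book "awk Programming" is devoted to awk programming. A program is structured roughly as follows.

#!/usr/bin/awk -f
# before any input is read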
BEGIN {
    math = 0
    english = 0
    computer = 0
 
    printf "NAME    NO.   MATH  ENGLISH  COMPUTER   TOTAL\n"
    printf "---------------------------------------------\n"
}
# for each input line
{
    math+=$3
    english+=$4
    computer+=$5
    printf "%-6s %-6s %4d %8d %8d %8d\n", $1, $2, $3,$4,$5, $3+$4+$5
}
# after all input has been read
END {
    printf "---------------------------------------------\n"
    printf "  TOTAL:%10d %8d %8d \n", math, english, computer
    printf "AVERAGE:%10.2f %8.2f %8.2f\n", math/NR, english/NR, computer/NR
}
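
A program like this is normally saved to a file and run with awk -f, for example awk -f cal.awk scores.txt (both file names are hypothetical). The sketch below runs a condensed inline version of the same accumulate-then-report idea on made-up scores, assuming columns of name, number, math, english, and computer:

```shell
# Hypothetical score file: name, student number, math, english, computer.
scores=$(mktemp)
printf 'Alice 1001 78 84 77\nBob 1002 66 78 45\n' > "$scores"

# Accumulate column sums on every line; report totals and averages in END.
report=$(awk '
{ math += $3; english += $4; computer += $5 }
END {
    printf "TOTAL: %d %d %d\n", math, english, computer
    printf "AVERAGE: %.2f %.2f %.2f\n", math/NR, english/NR, computer/NR
}' "$scores")

printf '%s\n' "$report"
rm -f "$scores"
```

Uninitialized awk variables start out as zero in numeric context, so the explicit assignments in the BEGIN block of the full program are optional.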

References

https://man.linuxde.net/
https://coolshell.cn/articles/9104.html
http://www.ruanyifeng.com/blog/2018/11/awk.html