Regular Expressions: grep, sed, awk

Date: 2019-11-12


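What is a regular expression

A regular expression is a string that describes a pattern of characters.
A good grasp of regular expressions is a great help when writing shell scripts.
Most programming languages support regular expressions, and the underlying principles are the same.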

grep

grep [-cinvABCE] 'word' filename
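-c print the number of matching lines
-i ignore case
-n show line numbers
-v invert the match (print the lines that do not match)
-A followed by a number n: print each matching line and the n lines after it
-B followed by a number n: print each matching line and the n lines before it
-C followed by a number n: print each matching line and the n lines before and after it
-E use extended regular expressions (same as egrep); plain grep already supports basic regular expressions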
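
The options above can be tried on a small throwaway file. This is only a sketch: the sample file and its five lines are made up for illustration and are not part of the original notes.

```shell
# Hypothetical sample data, created in a temporary file.
sample=$(mktemp)
printf '%s\n' 'alpha' 'Root user' 'beta' 'root shell' 'gamma' > "$sample"

count=$(grep -c 'root' "$sample")        # number of lines containing "root"
count_ci=$(grep -ci 'root' "$sample")    # same count, ignoring case
numbered=$(grep -n 'root' "$sample")     # matching lines prefixed with their line number
inverted=$(grep -vi 'root' "$sample")    # lines that do NOT contain "root" in any case
after=$(grep -A1 'root' "$sample")       # each matching line plus the one line after it

printf 'count=%s count_ci=%s\n' "$count" "$count_ci"
printf '%s\n' "$numbered" "$inverted" "$after"
rm -f "$sample"
```

Note that -c counts lines, not individual matches: a line that contains the word twice is still counted once.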

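Examples:
grep -n 'root' /etc/passwd             # lines containing "root", with line numbers
grep -nv 'nologin' /etc/passwd         # lines not containing "nologin", with line numbers
grep '[0-9]' /etc/passwd               # lines containing a digit
grep -v '[0-9]' /etc/passwd            # lines containing no digit
grep -v '^#' test.txt                  # drop lines starting with "#"
grep -v '^#' test.txt | grep -v '^$'   # drop comment lines and empty lines
grep '^[^a-zA-Z]' test.txt             # lines whose first character is not a letter
grep 'r.o' test.txt                    # "r", any single character, then "o"
grep 'oo*' test.txt                    # "o" followed by zero or more "o"
grep '.*' test.txt                     # matches every line, including empty ones
grep 'o\{2\}' /etc/passwd              # "oo", written as an interval in basic regex syntax
egrep 'o{2}' /etc/passwd               # the same interval in extended regex syntax
egrep 'o+' /etc/passwd                 # one or more "o"
egrep 'oo?' /etc/passwd                # "o" followed by an optional second "o"
egrep 'root|nologin' /etc/passwd       # "root" or "nologin"
egrep '(oo){2}' /etc/passwd            # "oo" repeated twice, i.e. "oooo"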
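
The difference between the basic syntax used by plain grep and the extended syntax used by egrep (grep -E) is easiest to see by counting matches on known input. A minimal sketch, assuming a made-up word list rather than the real /etc/passwd:

```shell
# Hypothetical word list with one to four consecutive "o" characters.
sample=$(mktemp)
printf '%s\n' 'rot' 'root' 'rooot' 'roooot' 'nologin' > "$sample"

bre=$(grep -c 'o\{2\}' "$sample")          # basic syntax: interval braces need backslashes
ere=$(grep -cE 'o{2}' "$sample")           # extended syntax: no backslashes
plus=$(grep -cE 'o+' "$sample")            # every line with at least one "o"
group=$(grep -E '(oo){2}' "$sample")       # the group "oo" twice in a row
alt=$(grep -cE 'root|nologin' "$sample")   # either alternative

printf 'bre=%s ere=%s plus=%s alt=%s group=%s\n' "$bre" "$ere" "$plus" "$alt" "$group"
rm -f "$sample"
```

Both interval forms select the same three lines; only the quoting of the braces differs.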

sed

sed is a stream editor. It is one of the most important text-processing tools and works very well together with regular expressions.

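Command  Function

a\  append one or more lines after the current line. With multiple lines, every line except the last must end with "\"
c\  replace the text of the current line with the new text that follows. With multiple lines, every line except the last must end with "\"
i\  insert text before the current line. With multiple lines, every line except the last must end with "\"
d   delete the line
h   copy the pattern space to the hold space
H   append the pattern space to the hold space
g   copy the hold space to the pattern space, overwriting its contents
G   append the hold space to the pattern space, after its existing contents
l   list the line, showing non-printing characters
p   print the line
n   read the next input line and continue with the next command instead of starting again from the first command
q   quit sed
r   read lines from a file and append them after the current line
!   apply the command to all lines other than the selected ones
s   substitute one string for another
g   as a flag of s: replace every match on the line, not only the first
w   write the selected lines to a file
x   exchange the contents of the hold space and the pattern space
y   translate characters one-to-one into other characters (regular expressions cannot be used with y)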
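
A few of the less obvious commands in the table (a\, the hold-space commands G and h, and y) are sketched below. The three-line input is invented for the example, and the line-reversal one-liner is a well-known sed idiom, not something taken from the original notes.

```shell
# Hypothetical three-line input file.
sample=$(mktemp)
printf '%s\n' 'one' 'two' 'three' > "$sample"

# a\ appends the following text after the addressed line (line 2 here).
appended=$(sed '2a\
added' "$sample")

# 1!G appends the hold space to every line except the first, h copies the
# result back into the hold space, and $p prints it once on the last line,
# so the input comes out in reverse order.
reversed=$(sed -n '1!G;h;$p' "$sample")

# y maps characters one-to-one (o->O, t->T, e->E); it takes no regex.
mapped=$(sed 'y/ote/OTE/' "$sample")

printf '%s\n' "$appended" "$reversed" "$mapped"
rm -f "$sample"
```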

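Common commands and options

p print to the screen
d delete the matching lines
-n suppress the default output of every line; print only what is requested
-e allow several edit expressions
-i edit the file in place
-r use extended regular expressions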
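
How -n, -e and -i change the behaviour is easier to see side by side. A small sketch on an invented three-line file; the .bak suffix attached to -i is an addition here (it keeps a backup copy) and is not used in the original notes.

```shell
# Hypothetical working file.
work=$(mktemp)
printf '%s\n' 'root:x:0' 'boot:x:1' 'daemon:x:2' > "$work"

# Without -n every line is printed once, and p prints the matches a second time.
default_lines=$(sed '/oot/p' "$work" | wc -l | tr -d ' ')
# With -n only the lines selected by p are printed.
only_matches=$(sed -n '/oot/p' "$work")
# -e chains several expressions in a single pass.
chained=$(sed -n -e '1p' -e '/daemon/p' "$work")
# -i rewrites the file itself; the attached suffix keeps the old version as a backup.
sed -i.bak 's/ot/to/g' "$work"
edited=$(cat "$work")

printf '%s\n' "$default_lines" "$only_matches" "$chained" "$edited"
rm -f "$work" "$work.bak"
```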

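Examples

sed '1'd test.txt                      # delete line 1
sed '1,3'd test.txt                    # delete lines 1 to 3
sed '/oot/'d test.txt                  # delete lines containing "oot"
sed '1,2s/ot/to/g' test.txt            # on lines 1 and 2, replace every "ot" with "to"
sed 's/[0-9]//g' test.txt              # remove all digits
sed 's/[a-zA-Z]//g' test.txt           # remove all letters
sed -r 's/(rot)(.*)(bash)/\3\2\1/' test.txt            # swap "rot" and "bash", keeping the text between them
sed -r 's/([^:]+):(.*):([^:]+)/\3:\2:\1/' test.txt     # swap the first and last colon-separated fields
sed 's/^.*$/123&/' test.txt            # prefix every line with "123" (& stands for the matched text)
sed -i 's/ot/to/g' test.txt            # make the replacement in the file itself
sed -n '5'p test.txt                   # print line 5 only
sed -n '1,5'p test.txt                 # print lines 1 to 5
sed -n '1,$'p test.txt                 # print from line 1 to the last line
sed -n '/root/'p test.txt              # print lines containing "root"
sed -n '/^1/'p test.txt                # print lines starting with "1"
sed -n '/in$/'p test.txt               # print lines ending with "in"
sed -n '/r..o/'p test.txt              # print lines with "r", any two characters, then "o"
sed -e '1'p -e '/111/'p -n test.txt    # print line 1 and every line containing "111"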
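
The two substitutions that use back-references and & are worth tracing on a single known line. This sketch feeds one typical passwd-style line through a pipe instead of reading test.txt, and uses -E, which GNU sed accepts as an equivalent of -r.

```shell
# One sample line in /etc/passwd format (assumed for illustration).
line='root:x:0:0:root:/root:/bin/bash'

# \1 is the first field, \3 the last field, \2 everything in between.
swapped=$(printf '%s\n' "$line" | sed -E 's/([^:]+):(.*):([^:]+)/\3:\2:\1/')

# & is replaced by the whole matched text, here the entire line.
prefixed=$(printf '%s\n' "$line" | sed 's/^.*$/123&/')

printf '%s\n' "$swapped" "$prefixed"
```

Because .* is greedy, the middle group swallows as much as it can, which is why the third group ends up being only the last field.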

AWK

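Like sed, awk processes its input one line at a time; it can also split each line into fields.

Command-line format
 awk [options] 'command' files
    command consists of two parts:
    1. a pattern, which can be a regular expression or a logical expression
    2. { awk commands }, the code block enclosed in braces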
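
Either part of the command may be left out, which gives three common shapes. A minimal sketch on an invented three-record file:

```shell
# Hypothetical colon-separated data.
data=$(mktemp)
printf '%s\n' 'root:x:0:0' 'bin:x:1:1' 'alice:x:1001:1001' > "$data"

# Pattern only: matching records are printed unchanged.
pattern_only=$(awk -F ':' '$3 >= 1000' "$data")
# Action only: the block runs for every record.
action_only=$(awk -F ':' '{ print $1 }' "$data")
# Pattern and action: the block runs only for matching records.
both=$(awk -F ':' '/^root/ { print $1, $3 }' "$data")

printf '%s\n' "$pattern_only" "$action_only" "$both"
rm -f "$data"
```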

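awk built-in variables and their meaning

$0	the current record (the whole line)
$1 to $n	the nth field (column) of the current record
FILENAME	name of the input file
FS	field separator of the input (Field Separator)
RS	record separator of the input, i.e. what separates one line from the next (Record Separator)
NF	number of fields in the current record (Number of Fields)
NR	number of the current record (its line number)
OFS	output field separator
ORS	output record separator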
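
The variables above can be observed directly. In this sketch the input is generated with printf, and FS and OFS are set in a BEGIN block so that they already apply to the first record; both choices are assumptions made for the example.

```shell
# NR is the record number, NF the field count, $NF the last field.
report=$(printf '%s\n' 'root:x:0' 'bin:x:1' |
  awk 'BEGIN { FS = ":"; OFS = "#" } { print NR, NF, $1, $NF }')

# ORS replaces the newline that print normally adds after each record.
joined=$(printf '%s\n' 'a' 'b' 'c' | awk 'BEGIN { ORS = "," } { print }')

printf '%s\n' "$report" "$joined"
```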

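awk functions
 
Function	Meaning
length(str)	returns the number of characters in str
int(num)	returns the integer part of num
index(str1, str2)	returns the position of str2 in str1, or 0 if it is not found
split(str, arr, separator)	splits str into the array arr using separator and returns the number of elements
printf(fmt, args)	formats args according to fmt and prints the result
sprintf(fmt, args)	formats args according to fmt and returns the formatted string
substr(str, pos, len)	returns the substring of str that starts at pos and is len characters long
tolower(str)	returns a copy of str converted to lower case
toupper(str)	returns a copy of str converted to upper case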
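
Most of the functions in the table can be exercised in one BEGIN block, which needs no input file. The string "Hello:World" and the variable names are made up for this sketch.

```shell
result=$(awk 'BEGIN {
  s = "Hello:World"
  n = split(s, parts, ":")                  # n is the number of pieces
  printf "%d %s %s\n", n, parts[1], parts[2]
  print length(s), index(s, "World"), substr(s, 1, 5)
  print toupper(parts[1]), tolower(parts[2]), int(3.9)
  padded = sprintf("%05d", 42)              # sprintf returns instead of printing
  print padded
}')

printf '%s\n' "$result"
```

Positions in index and substr start at 1, not 0, so a return value of 0 from index can safely mean "not found".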

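Examples

head -n2 test.txt | awk -F ':' '{print $1}'      # first field of the first two lines
head -n2 test.txt | gawk -F ':' '{print $0}'     # the whole line ($0)
awk -F ':' '{print $1,$2,$3}'  test.txt          # fields 1 to 3, separated by the default OFS (a space)
awk -F ':' '{print $1"#"$2"#"$3}'  test.txt      # fields 1 to 3, joined with "#"
awk '/oo/' test.txt                              # lines matching "oo"
awk -F ':' '/root/ {print $1,$3} /test/ {print $1,$3}' test.txt   # several pattern/action pairs
awk -F ':' '$3=="0"' /etc/passwd                 # third field equal to "0"
awk -F ':' '$3>="500"'  /etc/passwd              # quoted value: compared as a string
awk -F ':' '$3>=500'  /etc/passwd                # unquoted value: compared as a number
awk -F ':' '$7!="/sbin/nologin"' /etc/passwd     # seventh field is not /sbin/nologin
awk -F ':' '$3<$4' /etc/passwd                   # third field smaller than the fourth
awk -F ':' '$3>"5" && $3<"7"' /etc/passwd        # both conditions must hold (string comparison)
awk -F ':' '$3>1000 || $7=="/bin/bash"' /etc/passwd   # either condition may hold
head -5 /etc/passwd | awk -F ':' '{OFS="#"} {print $1,$3,$4}'    # set the output separator to "#"
awk -F ':' '{OFS="#"}{if($3>1000){print $1,$2,$3,$4}}' /etc/passwd   # if statement inside the action
head -n3 /etc/passwd | awk -F ':' '{print NF}'     # print the number of fields
head -n3 /etc/passwd | awk -F ':' '{print NR}'     # print the line number
awk -F ':' 'NR>20' /etc/passwd   # print lines whose number is greater than 20
awk -F ':' 'NR<20 && $1 ~ /roo/'  /etc/passwd   # print lines numbered below 20 whose first field matches roo
head -n3 1.txt | awk -F ':' '$1="root"'  # a single = assigns "root" to the first field
awk -F ':' '{tot=tot+$3};{print tot}' 1.txt      # running total of the third field, printed after every line
awk -F ':' '{if ($1=="root"){print $0}}' 1.txt   # print lines whose first field is "root"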
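
Two of the examples above are easy to misread: the quoted and unquoted forms of the $3 comparison select different lines, and the tot example prints a running total rather than one final sum. The sketch below shows both on invented data; the END block is an addition that is not in the original examples.

```shell
# Hypothetical records with user IDs 0, 1, 60 and 1001.
data=$(mktemp)
printf '%s\n' 'root:x:0' 'bin:x:1' 'games:x:60' 'alice:x:1001' > "$data"

# "500" in quotes forces a character-by-character string comparison,
# so "60" sorts after "500" while "1001" sorts before it.
as_string=$(awk -F ':' '$3 >= "500"' "$data")
# Without quotes the comparison is numeric.
as_number=$(awk -F ':' '$3 >= 500' "$data")

# Printing inside the main block gives a running total, one value per line.
running=$(awk -F ':' '{ tot = tot + $3; print tot }' "$data")
# Printing in END gives only the final sum.
final=$(awk -F ':' '{ tot += $3 } END { print tot }' "$data")

printf '%s\n' "$as_string" "$as_number" "$running" "$final"
rm -f "$data"
```

For numeric fields such as user IDs the unquoted form is almost always the one that is wanted.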