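2018-02-08 19:27:57
I keep forgetting command syntax, and rereading my notes every time is not practical; what I cannot remember I should look up in the manual. This summary therefore follows the structure of the manual, in the hope that these examples come back to you the next time you open the man page.
sed is a stream editor: it reads one line of text at a time into the pattern space, processes it, and by default prints the contents of the pattern space when processing finishes. Reading the next line replaces the contents of the pattern space, so keeping several lines around requires the help of the hold space.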
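The SYNOPSIS section describes how sed is invoked:
sed [OPTION]... {script-only-if-no-other-script} [input-file]...
(1) OPTION: options to the sed command. They adjust how sed runs; they are not where the editing commands themselves are defined.
-n, --quiet, --silent
suppress automatic printing of pattern space
Stops sed from printing the pattern space automatically. Each cycle, sed reads one line into the pattern space (explained further in the advanced part), processes it, and then prints the pattern space. Sometimes we want finer control over what is printed, so we suppress the default output and use the p (print) command in the script instead.
-e script, --expression=script
add the script to the commands to be executed
A single script given without -e works as is, which is why omitting -e does not raise an error, and several commands can be written in one script separated by semicolons. -e is needed when the script is supplied in several pieces: each -e adds one more piece.
-f script-file, --file=script-file
add the contents of script-file to the commands to be executed
Reads the commands from a file instead of the command line. The file can contain several commands, one per line, with no trailing semicolon required.
-r, --regexp-extended
use extended regular expressions in the script.
Regular expressions in the script are basic regular expressions by default; specify -r to have them interpreted as extended regular expressions.
(2) script-only-if-no-other-script: where the editing commands are given, i.e. the commands applied to the data stream (referred to as script below). It should be wrapped in single quotes.
(3) input-file: the file or files to process. If no input file is given, sed reads from standard input (STDIN).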
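As a quick illustration of the options above, the sketch below pipes two made-up lines through sed. The input text and the variable names are assumptions for the example, and -r is assumed to be available (it is in GNU sed).

```shell
# Two separate -e scripts are applied in order to every line.
both=$(printf 'one\ntwo\n' | sed -e 's/one/1/' -e 's/two/2/')
echo "$both"

# -n suppresses the automatic print; only lines hit by an explicit p are shown.
only_match=$(printf 'one\ntwo\n' | sed -n '/two/p')
echo "$only_match"

# -r switches to extended regular expressions, so + needs no backslash.
squeezed=$(echo 'aaab' | sed -r 's/a+/a/')
echo "$squeezed"
```

Since no input file is named, sed reads from standard input in all three calls.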
The following sections go through the basic usage of script, in roughly six parts.
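s/regexp/replacement/
Attempt to match regexp against the pattern space. If successful, replace that portion matched with replacement. The replacement may contain the special character & to refer to that portion of the pattern space which matched, and the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
This is the basic syntax of the substitute command: sed checks whether the line read into the pattern space matches regexp and, if it does, replaces the matched part with replacement. By default only the first match on each line is replaced; a second match on the same line is left alone. Hence the extended form:
s/regexp/replacement/flags
flags are substitution flags:
number: replace only the Nth match on the line
g: replace every match
p: print the pattern space if a substitution was made
w file: write the result of the substitution to file
The & and \1 to \9 replacements mentioned above are covered at the end.
# By default only the first occurrence is replaced
[root@localhost ~]# echo "one one one" | sed -n 's/one/two/p'
two one one
# Use the g and p flags together, with sed's default output suppressed
[root@localhost ~]# echo "one one one" | sed -n 's/one/two/gp'
two two two
# Use a number with the p flag to replace the second match on each line, with default output suppressed
[root@localhost ~]# echo "one one one" | sed -n 's/one/two/2p'
one two one
# Process two lines. Without suppressing sed's default output, lines that the pattern does not match are printed too, so the p flag and sed's default output are clearly different things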
[root@localhost ~]# echo -e "one one\ntwo two" | sed 's/one/two/'
two one
two two
[root@localhost ~]# echo -e "one one\ntwo two" | sed -n 's/one/two/p'
two one
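Two further details of the s command can be sketched in the same style. The sample strings and variable names are made up for the example, and the combined 2g flag is GNU sed behaviour, so treat that part as an assumption about the sed in use.

```shell
# Any character can delimit the three parts of s; # avoids escaping the slashes.
path=$(echo '/usr/local/bin' | sed 's#/usr/local#/opt#')
echo "$path"

# GNU sed: a number combined with g replaces the 2nd match and every later one.
from_second=$(echo 'one one one' | sed 's/one/two/2g')
echo "$from_second"
```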
By default every input line enters the pattern space and is processed. To process only particular lines, use the Addresses that sed provides. Lines that are not selected are still read into the pattern space, but the command is not applied to them.
Note that an address can be placed before any single command or command group to specify which lines it applies to.
Addresses
Sed commands can be given with no addresses, in which case the command will be executed for all input lines; with one address, in which case the command will only be executed for input lines which match that address; or with two addresses, in which case the command will be executed for all input lines which match the inclusive range of lines starting from the first address and continuing to the second address. Three things to note about address ranges: the syntax is addr1,addr2 (i.e., the addresses are separated by a comma); the line which addr1 matched will always be accepted, even if addr2 selects an earlier line; and if addr2 is a regexp, it will not be tested against the line that addr1 matched. After the address (or address-range), and before the command, a ! may be inserted, which specifies that the command shall only be executed if the address (or address-range) does not match. The following address types are supported:
number
Match only the specified line number.
first~step
Match every step'th line starting with line first. For example, "sed -n 1~2p" will print all the odd-numbered lines in the input stream, and the address 2~5 will match every fifth line, starting with the second. first can be zero; in this case, sed operates as if it were equal to step. (This is an extension.)
1. A stepping form: first is the first line processed and step is the stride. With 2~5, the first line processed is line 2 and the next is line 7.
$
Match the last line.
2. $ matches only the last line.
/regexp/
Match lines matching the regular expression regexp.
3. Text filtering: the command that follows runs only on lines that contain a match.
\cregexpc
Match lines matching the regular expression regexp. The c may be any character.
4. Wrapping the regular expression in / is awkward when the expression itself contains /, so any character can be chosen as the delimiter. The opening delimiter is escaped with \; the closing one is not.
GNU sed also supports some special 2-address forms:
All three forms below are closed (inclusive) ranges.
0,addr2
Start out in "matched first address" state, until addr2 is found. This is similar to 1,addr2, except that if addr2 matches the very first line of input the 0,addr2 form will be at the end of its range, whereas the 1,addr2 form will still be at the beginning of its range. This works only when addr2 is a regular expression.
5. Like 1,addr2, except that addr2 (which must be a regular expression here) is also tested against the first line, so the range can end on line 1.
addr1,+N
Will match addr1 and the N lines following addr1.
6. Process the line matching addr1 and the N lines after it.
addr1,~N
Will match addr1 and the lines following addr1 until the next line whose input line number is a multiple of N.
7. Process from the line matching addr1 up to the next line whose line number is a multiple of N.
# File contents
[root@localhost ~]# cat file
this is a test line one
this is a test line two
this is a test line three
this is a test line four
this is a test line five
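# Addressing form 1: start at line 1 and step by 2
[root@localhost ~]# sed -n '1~2p' file
this is a test line one
this is a test line three
this is a test line five
# Addressing form 2: print the last line
[root@localhost ~]# sed -n '$p' file
this is a test line five
# Addressing form 3: select lines by regular expression
[root@localhost ~]# sed -n '/three/p' file
this is a test line three
# Addressing form 4: choose the character that delimits the regular expression
[root@localhost ~]# sed -n '\!three!p' file
this is a test line three
# Addressing form 5: print from line 1 to line 3
[root@localhost ~]# sed -n '1,3p' file
this is a test line one
this is a test line two
this is a test line three
# Addressing form 6: starting at line 1, print that line and the one line after it
[root@localhost ~]# sed -n '1,+1p' file
this is a test line one
this is a test line two
# Addressing form 7: from line 1 to the next line whose number is a multiple of 3 (here the same output as form 5)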
[root@localhost ~]# sed -n '1,~3p' file
this is a test line one
this is a test line two
this is a test line three
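The ! negation and the 0,addr2 form described in the manual excerpt have no example above, so here is a minimal sketch of both. The three-line input and the variable names are made up; 0,/re/ is a GNU sed form, so that part assumes GNU sed.

```shell
# ! negates the address: delete every line except line 2.
only_second=$(printf 'a\nb\nc\n' | sed '2!d')
echo "$only_second"

# 1,/a/ does not test /a/ against line 1, so the range runs on to the second "a".
from_one=$(printf 'a\nb\na\n' | sed -n '1,/a/p')
echo "$from_one"

# 0,/a/ may end on line 1, so only the first line is selected.
from_zero=$(printf 'a\nb\na\n' | sed -n '0,/a/p')
echo "$from_zero"
```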
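First, let us see why command grouping is needed.
# We want to replace this with that and line with text on lines 1 to 3 only. Unless the address is written before every single command, there is no guarantee that each command applies only to lines 1 to 3.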
[root@localhost ~]# sed '1,3s/this/that/g; s/line/text/g' file
that is a test text one
that is a test text two
that is a test text three
this is a test text four
this is a test text five
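This is where command grouping comes in. Here is the official description:
{ Begin a block of commands (end with a }).
} The closing bracket of a { } block.
The explanation is short, and using it is just as simple: to run several commands on the lines selected by one address, enclose them in braces.
# Example (very simple -_-):
[root@localhost ~]# sed '1,~3{s/this/that/g; s/line/text/g}' file
that is a test text one
that is a test text two
that is a test text three
this is a test line four
this is a test line five
# With this example done, back to the single commands
With substitution and addressing covered, deletion is also straightforward. There are two delete commands; for now only d (delete) is introduced, and D is left to the advanced part.
d Delete pattern space. Start next cycle.
Deletes the contents of the pattern space and starts the next cycle with the next input line. The deletion is complete: nothing is left in the pattern space to print.
# Deletion practice
# Delete lines 1 to 3
[root@localhost ~]# sed -n '1,3d; p' file
this is a test line four
this is a test line five
# Two pattern addresses are used here, separated by a comma: one starts the deletion and three ends it
[root@localhost ~]# sed '/one/,/three/d' file
this is a test line four
this is a test line five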
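Since lines can be deleted, text can of course be inserted and appended as well: i (insert) inserts and a (append) appends.
(1) The i command adds a new line before the specified line
(2) The a command adds a new line after the specified line
Command format:
# First, the manual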
i \
text Insert text, which has each embedded newline preceded by a backslash.
a \
text Append text, which has each embedded newline preceded by a backslash.
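# In summary, the usage is
sed '[address]command\new line'
# address is the line address and is optional. Without an address, the text is inserted or appended at every input line. The address can be a line number or a text pattern. POSIX allows only a single address for these commands; GNU sed documents accepting an address range as an extension.
# Examples:
# Insert a line before line 3.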
[root@localhost ~]# sed '3i\this is a insert line' file
this is a test line one
this is a test line two
this is a insert line
this is a test line three
this is a test line four
this is a test line five
# Append a line after line 3.
[root@localhost ~]# sed '3a\this is a append line' file
this is a test line one
this is a test line two
this is a test line three
this is a append line
this is a test line four
this is a test line five
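# Append several lines after line 3. When appending or inserting on the command line, every line of text except the last must end with a backslash.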
[root@localhost ~]# sed '3a\this is a append line 1\
> this is a append line 2' file
this is a test line one
this is a test line two
this is a test line three
this is a append line 1
this is a append line 2
this is a test line four
this is a test line five
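To make the role of the address concrete, the sketch below runs a and i on a made-up two-line input, once with no address and once with a pattern address. It uses the same one-line `a\text` form as the examples above, which is assumed to be the GNU sed spelling; the variable names are only for illustration.

```shell
# No address: the text is appended after every input line.
every=$(printf 'a\nb\n' | sed 'a\---')
echo "$every"

# A pattern address limits the insert to the lines that match.
before_b=$(printf 'a\nb\n' | sed '/b/i\>>>')
echo "$before_b"
```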
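The c (change) command is used the same way as insert and append. It gets its own section because of one difference: with an address range, the whole range is replaced as a unit by the new text.
First, the manual:
c \
text Replace the selected lines with text, which has each embedded newline preceded by a backslash.
The addressed lines are replaced by text; each embedded newline in text must be preceded by a backslash.
# Example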
[root@localhost ~]# sed '3c\this is a change line' file
this is a test line one
this is a test line two
this is a change line
this is a test line four
this is a test line five
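The range behaviour mentioned above is not shown in the example, so here is a minimal sketch with a made-up four-line input. It assumes the one-line `c\text` form used above is accepted by the sed in use (it is in GNU sed).

```shell
# Lines 2 and 3 are replaced as a unit: the new text appears once, not twice.
changed=$(printf '1\n2\n3\n4\n' | sed '2,3c\changed')
echo "$changed"
```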
y/source/dest/
Transliterate the characters in the pattern space which appear in source to the corresponding character in dest.
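The transliterate command (y) converts every character on the addressed line that appears in source into the character at the same position in dest; source and dest act as a character-to-character map and must have the same length.
# Example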
[root@localhost ~]# echo "123456" | sed -n 'y/123/456/; p'
456456
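A common use of this character map is case conversion. The sketch below is one way to write it; the input string and variable name are made up. Note that y does not expand ranges such as a-z, so both alphabets are spelled out in full.

```shell
# Each lowercase letter maps to the uppercase letter at the same position.
upper=$(echo 'hello sed' | sed 'y/abcdefghijklmnopqrstuvwxyz/ABCDEFGHIJKLMNOPQRSTUVWXYZ/')
echo "$upper"
```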
p prints the line; P is covered in the advanced part
= prints the line number
l (lowercase L) prints the line, with non-printable characters shown as escape sequences
p Print the current pattern space.
= Print the current line number.
l List out the current line in a "visually unambiguous" form.
l width
List out the current line in a "visually unambiguous" form, breaking it at width characters. This is a GNU extension.
Compared with plain l, this form additionally wraps the output every width characters; see the last example in this section.
# Example
[root@localhost ~]# echo -e "one\ntwo" | sed -n '$p'
two
[root@localhost ~]# echo -e "one\ntwo" | sed -n '=; p'
1
one
2
two
[root@localhost ~]# echo -e "one\ttwo" | sed -n 'l'
one\ttwo$
[root@localhost ~]# echo -e "one\ntwo" | sed -n 'l'
one$
two$
[root@localhost ~]# echo -e "one\ntwo" | sed -n 'p; l 2'
one
o\
n\
e$
two
t\
w\
o$
r filename
Append text read from filename.
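Reads the contents of the file and appends them after the addressed line. In effect it is the a command with the new text taken from the file named filename. filename can be a relative or an absolute path, and the file must be readable. As with a, POSIX allows only a single address; GNU sed documents accepting a range as an extension.
Usage:
[address]r filename
w filename
Write the current pattern space to filename.
For each line in the pattern space, if the address matches, the line is written to the file. filename works as for the r command.
Usage:
[address]w filename
# Example: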
[root@localhost ~]# cat file
this is a test line one
this is a test line two
[root@localhost ~]# cat file1
this file1 content
[root@localhost ~]# sed '1r file1' file
this is a test line one
this file1 content
this is a test line two
[root@localhost ~]# sed '1,2w file2' file
this is a test line one
this is a test line two
[root@localhost ~]# cat file2
this is a test line one
this is a test line two
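Before going further, two concepts: the pattern space and the hold space.
pattern space: the buffer that holds the current input line. In the basic part, every cycle read exactly one line into the pattern space; below we will meet cases where the pattern space holds several lines.
hold space: an auxiliary buffer that exchanges data with the pattern space through single-letter commands. Editing commands cannot operate on the hold space directly.
Normally sed reads a line into the pattern space and executes the commands in the script one by one until they are all done. The line is then printed and the pattern space is cleared. This repeats until the whole file has been processed.
Why are multiline commands needed?
The commands seen so far read one line of the file into the pattern space per cycle, so the pattern space only ever holds one line, and addressing and matching work on one line at a time, as below.
Suppose a file contains: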
this is a test line one
this is a test line two
this is a test line three
this is a test line four
this is a test line five
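Now suppose we want to find the two words one two, but they may sit on different lines. An ordinary sed command cannot match both. For this case sed provides multiline commands: several lines are read into the pattern space to form a multiline group, and operating on the group solves the problem.
What is a multiline group?
Multiline group: several lines of text read into the pattern space together. A single-line command operates on the pattern space from the outside and treats it as one line. A multiline command operates on the inside of the pattern space and treats its contents as separate lines, still delimited by \n.
Now let us go through the remaining commands in the manual one by one.
n N Read/append the next line of input into the pattern space.
Lowercase n (single-line command) replaces the pattern space with the next input line and continues with the commands after n; it does not jump back to the start of the script.
Uppercase N (multiline command) appends the next line to the pattern space, forming a multiline group with the current line; the lines in the group are still separated by \n.
# The sample file used in the examples below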
[root@localhost ~]# cat file
this is a test line one
this is a test line two
this is a test line three
this is a test line four
this is a test line five
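The difference between n and N:
1. n: examples and summary
[root@localhost ~]# sed '{n; s/this is a test line//g; p}' file
this is a test line one
 two
 two
this is a test line three
 four
 four
this is a test line five
[root@localhost ~]# sed -n '{n; s/this is a test line//g; p}' file
 two
 four
From the above:
(1) Reading line 1 triggers n. sed's default output prints line 1, the next line is read into the pattern space, and the remaining commands run on that line
(2) Line 2 has already been consumed by n, so it does not start a cycle of its own and does not trigger n; only the commands after n run on it
(3) The same steps repeat from line 3 on
(4) After n reads the next line, execution continues from the command after n rather than from the start of the script
2. N: examples and summary
[root@localhost ~]# sed -n '{N; s/this is a test line//1; p}' file
 one
this is a test line two
 three
this is a test line four
[root@localhost ~]# sed -n '{N; s/this is a test line//2; p}' file
this is a test line one
 two
this is a test line three
 four
[root@localhost ~]# sed '{N; s/this is a test line//g; p}' file
 one
 two
 one
 two
 three
 four
 three
 four
this is a test line five
[root@localhost ~]# sed '{N; s/this is a test line//1; p}' file
 one
this is a test line two
 one
this is a test line two
 three
this is a test line four
 three
this is a test line four
this is a test line five
From the above:
(1) Reading line 1 triggers N, which appends line 2 to the pattern space. The two lines are then handled as one; p prints the pattern space, and sed's default output prints it again
(2) After N appends the next line, the input position advances past it: once line 2 has been appended, the next line read is line 3
n and N have one thing in common: when there is no next line to read, the commands after n or N are not run on the current line (GNU sed prints the pattern space, unless -n is given, and exits).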
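Because N leaves an embedded newline in the pattern space, a substitution can match across the line boundary. The sketch below shows two small cases with made-up input and variable names; an even number of input lines is used so that N always has a next line to read.

```shell
# Join every pair of lines: N appends the next line, s replaces the newline.
pairs=$(printf 'a\nb\nc\nd\n' | sed 'N; s/\n/ /')
echo "$pairs"

# Match a phrase that is split across two lines.
phrase=$(printf 'line one\ntwo here\n' | sed 'N; s/one\ntwo/ONE TWO/')
echo "$phrase"
```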
D Delete up to the first embedded newline in the pattern space. Start next cycle, but skip reading from the input if there is still data in the pattern space.
D deletes the first line of the pattern space: everything up to and including the first newline.
# Example:
[root@localhost ~]# cat file1
first line
last line
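# Suppose we want to delete the blank line immediately before the first line and the blank line immediately after it (the blank lines of file1 are not visible in the listing above)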
[root@localhost ~]# sed -n '/^$/{N; /first/D}; /first/{p; n; d}; p' file1
first line
last line
Single-line d: deletes the blank line after the target line (find the target line first, then delete the blank line that follows).
Multiline D: deletes the line before the target line. Match the line to be deleted, append the following line to the pattern space with N, then delete the first line. Pay special attention to one powerful property unique to D: it forces sed back to the start of the script and reruns the commands on the same pattern space without reading a new input line. Combined with N, this lets a script sweep through the pattern space in a loop.
P Print up to the first embedded newline of the current pattern space.
Prints the first line of the pattern space.
[root@localhost ~]# cat file1
first line
last line
[root@localhost ~]# sed -n 'N; p' file1
first line
last line
[root@localhost ~]# sed -n 'N; P' file1
first line
!
The exclamation mark negates an address: the command runs only on lines that the address does not match.
[root@localhost ~]# cat file1
first line
secod line
third line
last line
[root@localhost ~]# sed -n 'N; /first/!P' file1
third line
h H Copy/append pattern space to hold space.
g G Copy/append hold space to pattern space.
x Exchange the contents of the hold and pattern spaces.
Two examples follow to practice working with the two spaces
1. Copying and appending between the pattern space and the hold space
[root@localhost ~]# cat file
this is a test line one
this is a test line two
this is a test line three
this is a test line four
this is a test line five
[root@localhost ~]# sed -n '/two/{h; p; n; p; g; p}' file
this is a test line two
this is a test line three
this is a test line two
[root@localhost ~]# sed -n '/two/{h; p; n; H; p; n; p; x; p}' file
this is a test line two
this is a test line three
this is a test line four
this is a test line two
this is a test line three
2. Reversing the lines of a file
[root@localhost ~]# cat file
this is a test line one
this is a test line two
this is a test line three
this is a test line four
this is a test line five
[root@localhost ~]# sed -n '{1!G; h; $p}' file
this is a test line five
this is a test line four
this is a test line three
this is a test line two
this is a test line one
[root@localhost ~]# sed '{1!G; h; $!d}' file
this is a test line five
this is a test line four
this is a test line three
this is a test line two
this is a test line one
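Another small use of the hold space is remembering the previous line. The sketch below prints the line that comes just before a match; the four-line input and the variable name are made up for the example.

```shell
# h at the end of every cycle stores the current line in the hold space.
# On a match, x swaps in the stored previous line, p prints it, x swaps back.
prev=$(printf 'one\ntwo\nthree\nfour\n' | sed -n '/three/{x;p;x}; h')
echo "$prev"
```

If the match falls on the first input line, the hold space is still empty, so an empty line is printed.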
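b label
Branch to label; if label is omitted, branch to end of script.
[address]b label
address determines which lines trigger the branch, and label names the place to jump to. Without a label, the branch goes straight to the end of the script.
: label
Label for b and t commands.
A label starts with a colon and marks the position in the script that b, t, or T jumps to. A label can be at most 7 characters long in some implementations.
t label
If a s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script.
The test command works like the branch command, but whether it jumps depends on the result of a substitution: if an s command has made a successful substitution it branches to the label, otherwise it does not.
[address]t label
address is an ordinary address, as with b, and label names the place to jump to. Without a label, the branch goes to the end of the script.
T label
If no s/// has done a successful substitution since the last input line was read and since the last t or T command, then branch to label; if label is omitted, branch to end of script. This is a GNU extension.
The inverse of t: it branches only when no substitution has succeeded; otherwise it behaves like t.
Some examples follow
# With b, a condition is needed in front of the branch, otherwise the loop never ends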
[root@localhost ~]# echo "this,is,a,test." | sed -n '{
> : start
> s/,/ /1p
> /,/b start}'
this is,a,test.
this is a,test.
this is a test.
# With t there is no such concern
[root@localhost ~]# echo "this,is,a,test." | sed -n '{
> : start
> s/,/ /1p
> t start}'
this is,a,test.
this is a,test.
this is a test.
# Testing the failure branch
[root@localhost ~]# echo "this,is,a,test." | sed -n '{
> : success
> s/inexistence//
> T error
> a\replace fail
> : error
> s/,/ /1p
> t success}'
this is,a,test.
this is a,test.
this is a test.
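Branching combines naturally with N and with t. The sketch below shows one loop of each kind; the label names again and squeeze, the inputs, and the variable names are all made up. Each command is passed in its own -e so that the label ends at the end of its script piece.

```shell
# b loop: keep appending lines with N until the last line, then join them.
joined=$(printf 'a\nb\nc\n' | sed -e ':again' -e 'N' -e '$!b again' -e 's/\n/,/g')
echo "$joined"

# t loop: repeat the substitution until it no longer matches.
single=$(echo 'a    b' | sed -e ':squeeze' -e 's/  / /' -e 't squeeze')
echo "$single"
```

The `$!` address in front of b is the condition that ends the first loop; the second loop ends by itself because t only branches after a successful substitution.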
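Pattern replacement is an extension of the s command. First, the description in the s command and the situations in which it is needed.
The replacement may contain the special character & to refer to that portion of the pattern space which matched, and
the special escapes \1 through \9 to refer to the corresponding matching sub-expressions in the regexp.
For example, to put double quotes around both cat and hat: replacing one literal word at a time works, but a wildcard pattern cannot be handled that way, because the replacement does not know which word matched. With & it becomes simple.
[root@localhost ~]# echo "The cat sleeps in his hat" | sed 's/cat/"cat"/g'
The "cat" sleeps in his hat
& stands for the text matched by the pattern in the substitute command. Whatever text the pattern matches, & can be used in the replacement to refer to it.
[root@localhost ~]# echo "The cat sleeps in his hat" | sed 's/.at/"&"/g'
The "cat" sleeps in his "hat"
The sed editor uses parentheses to define a sub-pattern, and \1 to \9 in the replacement to refer to the sub-patterns by position: the first is \1, and so on. With basic regular expressions, the parentheses in the pattern must be escaped with a backslash to mark them as grouping characters rather than literal parentheses.
Two examples:
# Replace the text between cat and hat with in this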
[root@localhost ~]# echo "The cat sleeps in his hat" | sed 's/\(cat\).*\(hat\)/\1 in this \2/'
The cat in this hat
# Insert text between sub-patterns (here a comma between groups of three digits)
[root@localhost ~]# echo "123456789" | sed '{
> : start
> s/\(.*[0-9]\)\([0-9]\{3\}\)/\1,\2/
> t start
> }'
123,456,789
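Back-references can also reorder the matched pieces. The sketch below swaps two words, first with escaped parentheses and then with -r, where the parentheses group without a backslash; the input string and variable names are made up, and -r is assumed to be available as in GNU sed.

```shell
# Basic regular expressions: grouping parentheses are escaped.
swapped=$(echo 'hello world' | sed 's/\(hello\) \(world\)/\2 \1/')
echo "$swapped"

# Extended regular expressions (-r): plain parentheses group.
swapped_ere=$(echo 'hello world' | sed -r 's/(hello) (world)/\2 \1/')
echo "$swapped_ere"
```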
q [exit-code]
Immediately quit the sed script without processing any more input, except that if auto-print is not disabled the current pattern space will be printed. The exit code argument is a GNU extension.
The pattern space is printed before exiting
Q [exit-code]
Immediately quit the sed script without processing any more input. This is a GNU extension.
Exits immediately without printing
# Show the last ten lines of the data
[root@localhost ~]# sed '{
> : start
> $q; N; 11,$D
> b start
> }' file
7
8
9
10
11
12
13
14
15
16
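# Quit when the last line is recognized. While the pattern space holds no more than ten lines, keep appending lines to it. Once it holds more than ten, check whether the last line has been reached; if not, keep appending and delete the first line each time (D deletes the first line of the pattern space and returns to the start of the commands)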
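The difference between q and Q is easiest to see on a short input. The sketch below is a minimal comparison with made-up input and variable names; Q is a GNU extension, so the second call assumes GNU sed.

```shell
# q prints the current pattern space before quitting: lines 1 and 2 appear.
first_two=$(printf '1\n2\n3\n4\n5\n' | sed '2q')
echo "$first_two"

# Q quits without printing the current line: only line 1 appears.
before_two=$(printf '1\n2\n3\n4\n5\n' | sed '2Q')
echo "$before_two"
```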