gawk

Date: 2019-12-11



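The sed editor is a handy tool for modifying text files automatically, but it has its limits. Often you need a more advanced tool for processing the data in a file, one that provides a programming-like environment for modifying and reorganizing that data. This is exactly what gawk can do.

The gawk program is the GNU version of the original awk program from Unix. gawk takes stream editing a step further by providing a programming language instead of just editor commands. Within the gawk programming language, you can:
 define variables to store data;
 use arithmetic and string operators to operate on data;
 use structured programming concepts (such as if-then statements and loops) to add logic to your data processing;
 generate formatted reports by extracting data elements from the data file and rearranging or reformatting them.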

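gawk command format
The basic format of the gawk program is:
gawk options program file
A gawk program script is delimited by a pair of braces. Because the gawk command line assumes the script is a single text string, you must also enclose the script in single quotes:
gawk '{print "Hello World!"}'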

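-F fs         Specifies the field separator that delimits data fields in a line
-f file       Reads the program from the specified file
-v var=value  Defines a variable in the gawk program and assigns it an initial value
-mf N         Specifies the maximum number of fields to process in the data file
-mr N         Specifies the maximum record size in the data file
-W keyword    Specifies the compatibility mode or warning level for gawk
The -mf and -mr options come from the Bell Labs version of awk; gawk itself has no such predefined limits.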

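Using data field variables
One of gawk's main features is its ability to manipulate data in text files. It does this by automatically assigning a variable to each data element in a line. Fields are separated by whitespace unless a different field separator is set. By default, gawk assigns the following variables to the data fields it detects in a line of text:
 $0 represents the entire line of text;
 $1 represents the first data field in the line;
 $2 represents the second data field in the line;
 $n represents the nth data field in the line.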

gawk -F: '{print $1}' /etc/passwd
# Set the field separator to ":" and print the first field of each line

Using multiple commands in the program script

[root@localhost advanced_shell_script]# echo "my name is tom" | gawk '{$4="robin"; print $0}'
my name is robin
# Separate multiple commands with a semicolon (;). $0 is the entire line. The string literal robin must be enclosed in double quotes.

Reading the program from a file

[root@localhost gawk]# cat test1.gawk 
{print $1 "'s home directory is" $6}
[root@localhost gawk]# gawk -F: -f test1.gawk /etc/passwd
root's home directory is/root
bin's home directory is/bin
daemon's home directory is/sbin
adm's home directory is/var/adm

Multi-line commands

[root@localhost gawk]# cat test1.gawk 
{print $1 "'s home directory is" $6}
{print $1}
[root@localhost gawk]# gawk -F: -f test1.gawk /etc/passwd
root's home directory is/root
root
bin's home directory is/bin
bin

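Or store the text in a variable. An assignment placed outside the braces acts as a pattern that is always true here, so gawk also prints the whole record before running the action: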

[root@localhost gawk]# cat test1.gawk 
text="'s home directory is"
{print $1  text $6}
[root@localhost gawk]# gawk -F: -f test1.gawk /etc/passwd
root:x:0:0:root:/root:/bin/bash
root's home directory is/root

Running scripts before processing data

[root@localhost gawk]# gawk 'BEGIN {print "The data file contents"}{print $0}' /etc/fstab 
The data file contents 
#
# /etc/fstab
# Created by anaconda on Tue Feb 26 18:42:10 2019
#
# Accessible filesystems, by reference, are maintained under '/dev/disk'

Running scripts after processing data

[root@localhost gawk]# gawk 'BEGIN {print "hehe"} 
> {print $0}
> END {print "haha"}' data1
hehe
test text
haha
[root@localhost gawk]#

Using patterns

Regular expressions

 gawk 'BEGIN{FS=","} /11/{print $1}' data1

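The matching operator

The matching operator lets you restrict a regular expression to a specific data field in the record. The matching operator is the tilde (~). You specify the data field variable, the matching operator, and the regular expression to match.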

[root@localhost gawk]# gawk 'BEGIN{FS=" "}$1 ~ /^123/{print $1}' data1
123

[root@localhost gawk]# gawk -F: '$1 ~ /^zhengyue/{print $1,$NF}' /etc/passwd
zhengyue /bin/bash
[root@localhost gawk]#

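Mathematical expressions

You can use any of the usual mathematical comparison expressions:
 x == y: value x is equal to y.
 x <= y: value x is less than or equal to y.
 x < y: value x is less than y.
 x >= y: value x is greater than or equal to y.
 x > y: value x is greater than y.
Expressions also work on text data, but be careful. Unlike a regular expression, a comparison expression must match exactly: the data has to equal the pattern in full.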

List the users in the passwd file whose primary group is root (GID 0):

[root@localhost gawk]# gawk -F: '$4 == 0{print $0}' /etc/passwd
root:x:0:0:root:/root:/bin/bash
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
operator:x:11:0:operator:/root:/sbin/nologin
[root@localhost gawk]#

gawk structured commands

The if statement

if (condition) 
 statement1

[root@localhost gawk]# cat data3
10
20
30
[root@localhost gawk]# gawk '{
if ($1>10)
{
x = $1 * 2
print x
}
}' data3
40
60
[root@localhost gawk]#

[root@localhost gawk]# cat data3
10
20
30
[root@localhost gawk]# gawk '{
if ($1>10)
{
x = $1 * 2
print x
} else
{
x = $1 / 2
print x
}}' data3
5
40
60
[root@localhost gawk]#

The while loop

[root@localhost gawk]# cat data4
130 120 135 
160 113 140 
145 170 215
[root@localhost gawk]# gawk '{
total = 0
i = 1
while (i < 4)
{
total += $i
i++
}
avg = total / 3
print "Average:", avg
}' data4
Average: 128.333
Average: 137.667
Average: 176.667
[root@localhost gawk]#

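The while statement iterates over the data fields in the record, adding each value to the total variable and incrementing the counter variable i.

When the counter reaches 4, the while condition becomes FALSE, the loop ends, and the script moves on to the next statement.

That statement calculates and prints the average. This process repeats for every record in the data file.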

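The do-while statement

The do-while statement is similar to the while statement, but it executes the statements before checking the condition. Here is the format of the do-while statement: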
do 
{ 
 statements 
} while (condition)

[root@localhost gawk]# cat data4
130 120 135 
160 113 140 
145 170 215
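[root@localhost gawk]# gawk '{         # do-while runs the loop body once before testing the condition; if the test passes it repeats the body, otherwise it continues with the next statement
total = 0                              # print total runs outside the loop body; think of do-while as a while loop whose body always runs at least once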
i = 1
do
{
total += $i
i++
} while (total < 150)
print total }' data4
250
160
315
[root@localhost gawk]#

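The for statement

The for statement is a common way of looping in many programming languages. The gawk programming language supports C-style for loops:

for (variable assignment; condition; iteration process)
Combining several functions into one statement helps simplify the loop.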

[root@localhost gawk]# cat data4
130 120 135 
160 113 140 
145 170 215
[root@localhost gawk]#  gawk '{ 
 total = 0 
 for (i = 1; i < 4; i++) 
 { 
 total += $i 
 } 
 avg = total / 3 
 print "Average:",avg 
 }' data4
Average: 128.333
Average: 137.667
Average: 176.667
[root@localhost gawk]#

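Built-in functions

Mathematical functions

gawk mathematical functions
Function          Description
atan2(x, y)       The arctangent of x / y, returned in radians
cos(x)            The cosine of x, with x specified in radians
exp(x)            The exponential of x (e raised to the power x)
int(x)            The integer part of x, truncated toward zero
log(x)            The natural logarithm of x
rand( )           A random floating-point value that is at least 0 and less than 1
sin(x)            The sine of x, with x specified in radians
sqrt(x)           The square root of x
srand(x)          Sets the seed value for generating random numbers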

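String functions

gawk string functions
Function               Description
asort(s [,d])          Sorts array s by its element values. The indexes are replaced with sequential numbers indicating the new sort order. If d is specified, the sorted array is stored in array d instead
asorti(s [,d])         Sorts array s by its index values. The resulting array holds the index values as its element values, with sequential number indexes indicating the sort order. If d is specified, the sorted array is stored in array d
gensub(r, s, h [, t])  Searches either the variable $0 or the target string t (if supplied) for matches of the regular expression r. If h is a string beginning with g or G, every match is replaced with s. If h is a number, it selects which occurrence of r to replace. The modified string is returned and the original is left unchanged
gsub(r, s [,t])        Searches either the variable $0 or the target string t (if supplied) for matches of the regular expression r. If found, all matches are replaced with the string s
index(s, t)            Returns the index of the string t in the string s, or 0 if it is not found
length([s])            Returns the length of the string s, or the length of $0 if s is not specified
match(s, r [,a])       Returns the index in the string s where the regular expression r occurs. If array a is specified, it stores the part of s that matches the regular expression
split(s, a [,r])       Splits s into array a using either the FS character or the regular expression r (if supplied). Returns the number of fields
sprintf(format, variables)  Returns a string similar to the output of printf, using the format and variables supplied
sub(r, s [,t])         Searches either the variable $0 or the target string t for a match of the regular expression r. If found, the first match is replaced with the string s
substr(s, i [,n])      Returns the substring of s that starts at index i and is n characters long. If n is not supplied, the rest of s is returned
tolower(s)             Converts all characters in s to lowercase
toupper(s)             Converts all characters in s to uppercase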

[root@localhost gawk]# gawk 'BEGIN{x = "testing"; print toupper(x); print length(x)}'   # typical usage
TESTING
7

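Time functions

gawk time functions
Function                       Description
mktime(datespec)               Converts a date specified in the format YYYY MM DD HH MM SS [DST] into a timestamp value
strftime(format [,timestamp])  Formats either the current time-of-day timestamp or timestamp (if supplied) into a formatted date, using the format of the shell date command
systime( )                     Returns the timestamp for the current time of day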

[root@localhost gawk]# gawk 'BEGIN{
date = systime()
print date
day = strftime("%A , %B ,%d , %Y",date)
print day
}'
1555921121
星期一 , 四月 ,22 , 2019
[root@localhost gawk]#
