【Original】Shell operations: read, cat, and here document

Date: 2019-11-17

Tags: original, shell, read, cat, document; Category: Unix


This post summarizes what I learned about three topics:

Reading lines with read
here document
Uses of here document

【read】

Running man read on Linux (read is a Bash builtin, so this shows the Bash builtins documentation) gives the following:

read [-ers] [-a aname] [-d delim] [-i text] [-n nchars] [-N nchars] [-p prompt] [-t timeout] [-u fd] [name ...]
One line is read from the standard input, or from the file descriptor fd supplied as an argument to the -u option,
and the first word is assigned to the first name, the second word to the second name, and so on, with leftover words and
their intervening separators assigned to the last name. If there are fewer words read from the input stream than names,
the remaining names are assigned empty values. The characters in IFS are used to split the line into words. The backslash
character (\) may be used to remove any special meaning for the next character read and for line continuation. Options,
if supplied, have the following meanings:
Note: put another way, the characters in IFS act as word separators and do not end up in the resulting words. By default IFS contains space, tab, and newline.
-a aname
The words are assigned to sequential indices of the array variable aname, starting at 0. aname is unset before
any new values are assigned. Other name arguments are ignored.

-d delim
The first character of delim is used to terminate the input line, rather than newline.
-e
If the standard input is coming from a terminal, readline (see READLINE above) is used to obtain the line.
Readline uses the current (or default, if line editing was not previously active) editing settings.
-i text
If readline is being used to read the line, text is placed into the editing buffer before editing begins.
-n nchars
read returns after reading nchars characters rather than waiting for a complete line of input, but honor a delimiter
if fewer than nchars characters are read before the delimiter.
-N nchars
read returns after reading exactly nchars characters rather than waiting for a complete line of input, unless EOF
is encountered or read times out.
Delimiter characters encountered in the input are not treated specially and do not cause read to return until
nchars characters are read.
-p prompt
Display prompt on standard error, without a trailing newline, before attempting to read any input. The prompt is
displayed only if input is coming from a terminal.
-r Backslash does not act as an escape character. The backslash is considered to be part of the line.
In particular, a backslash-newline pair may not be used as a line continuation.
-s Silent mode. If input is coming from a terminal, characters are not echoed.
-t timeout
Cause read to time out and return failure if a complete line of input is not read within timeout seconds. timeout
may be a decimal number with a fractional portion following the decimal point. This option is only effective if read
is reading input from a terminal, pipe, or other special file; it has no effect when reading from regular files. If
timeout is 0, read returns success if input is available on the specified file descriptor, failure otherwise. The exit
status is greater than 128 if the timeout is exceeded.
-u fd
Read input from file descriptor fd.

If no names are supplied, the line read is assigned to the variable REPLY. The return code is zero, unless end-of-file
is encountered, read times out (in which case the return code is greater than 128), or an invalid file descriptor is supplied
as the argument to -u.
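The word-assignment rules described above can be seen in a small sketch. It is only an illustration and not part of the original tests: the sample input lines and the variable names (first, rest, user, and so on) are made up here, and a here document is used simply as a convenient way to hand read a single line.

```shell
# Words are split on IFS; leftover words and their separators go to the last name.
read -r first second rest <<EOF
alpha beta gamma delta
EOF
printf 'first=%s second=%s rest=%s\n' "$first" "$second" "$rest"

# Fewer words than names: the remaining names are assigned empty values.
read -r a b c <<EOF
one two
EOF
printf 'a=%s b=%s c=[%s]\n' "$a" "$b" "$c"

# A different IFS for this one read: split a colon-separated sample record.
IFS=: read -r user pass uid remainder <<EOF
root:x:0:0:root:/root:/bin/bash
EOF
printf 'user=%s uid=%s remainder=%s\n' "$user" "$uid" "$remainder"
```

The last name acts as a catch-all, which is why remainder keeps its colons.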

【read tests】

The test file is as follows:

[root@Betty Shell]# vi file

 -module( unique_name_test      )    . 
-compile(export_all). 

%% @spec (Nibble::integer()) -> char() 
%% @doc Returns the character code corresponding to Nibble. 
%% 
%% Nibble must be >=0 and =<16. 
hex_digit(0) -> $0; 
hex_digit(1) -> $1; 
hex_digit(2) -> $2; 
hex_digit(3) -> $3; 
hex_digit(4) -> $4; 
hex_digit(5) -> $5; 
hex_digit(6) -> $6; 
hex_digit(7) -> $7; 
hex_digit(8) -> $8; 
hex_digit(9) -> $9; 
hex_digit(10) -> $A; 
hex_digit(11) -> $B; 
hex_digit(12) -> $C; 
hex_digit(13) -> $D; 
hex_digit(14) -> $E; 
hex_digit(15) -> $F.

Test 1: read the first line of a file into a variable

[root@Betty Shell]# read -r line < file      
[root@Betty Shell]# echo $line          
-module( unique_name_test ) .

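This command uses the Bash builtin read and the input redirection operator <. read reads one line from standard input and stores it in the variable line. The -r option makes read take the input raw, so backslashes are not treated as escape characters. The redirection < file opens file for reading and makes it the standard input of read.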

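Remember that read uses the characters in IFS (Internal Field Separator) to split the line into words, and in doing so it strips leading and trailing IFS whitespace. (The common claim that read deletes every character that appears in IFS is not quite accurate: the separators between leftover words are kept when those words are assigned to the last name.) By default IFS contains space, tab, and newline, which means leading and trailing spaces and tabs are removed. To keep them, set IFS to the empty string: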

[root@Betty Shell]# IFS= read -r line < file 
[root@Betty Shell]# echo $line               
-module( unique_name_test ) .

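The IFS assignment affects only this one command. It ensures that the raw first line, including its leading and trailing whitespace, is stored in line. Note that the unquoted echo $line above still prints the line with its whitespace collapsed, because the shell word-splits the unquoted expansion before echo sees it; use echo "$line" to see the preserved whitespace.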
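The effect of an empty IFS is easier to see when the result is printed inside brackets with a quoted expansion. The following is a minimal sketch; the sample string and the variable names (sample, stripped, kept, old_ifs) are invented for the illustration.

```shell
sample='   padded line   '
old_ifs=$IFS

# Default IFS: leading and trailing whitespace is stripped.
read -r stripped <<EOF
$sample
EOF

# Empty IFS for this one command only: the line is stored untouched.
IFS= read -r kept <<EOF
$sample
EOF

printf '[%s]\n' "$stripped"
printf '[%s]\n' "$kept"

# The temporary assignment did not leak into the rest of the shell.
[ "$IFS" = "$old_ifs" ] && printf 'IFS unchanged\n'
```

Quoting the expansion in printf matters just as much as IFS= does; an unquoted $kept would be word-split again and the padding would be lost on output.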

Test 2: read every line of a file in turn

[root@Betty Shell]# while read -r line; do
> echo "test $line";
> done < file
test -module( unique_name_test  )    .
test
test -compile(export_all).
test
test
test %% @spec (Nibble::integer()) -> char()
test %% @doc Returns the character code corresponding to Nibble.
test %%
test %% Nibble must be >=0 and =<16.
test hex_digit(0) -> $0;
test hex_digit(1) -> $1;
test hex_digit(2) -> $2;
test hex_digit(3) -> $3;
test hex_digit(4) -> $4;
test hex_digit(5) -> $5;
test hex_digit(6) -> $6;
test hex_digit(7) -> $7;
test hex_digit(8) -> $8;
test hex_digit(9) -> $9;
test hex_digit(10) -> $A;
test hex_digit(11) -> $B;
test hex_digit(12) -> $C;
test hex_digit(13) -> $D;
test hex_digit(14) -> $E;
test hex_digit(15) -> $F.
test
test
[root@Betty Shell]#

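This is a correct way to read a file's contents: put read in the condition of a while loop. When read reaches end of file (EOF), it returns a non-zero exit status, so the loop condition fails and the loop ends.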

===

I was long skeptical of the claim that read returns a non-zero value at end of file, because this idiom is so common:

while read -r line; do echo $line; done < file

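By ordinary programming intuition, a while loop keeps running while its condition is non-zero, so this looks like a contradiction.
So I ran the following experiment to check: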

[root@Betty workspace]# touch abc.txt
[root@Betty workspace]# cat abc.txt 
[root@Betty workspace]# read -r line < abc.txt  
[root@Betty workspace]# echo $?
1
[root@Betty workspace]# echo "1" >> abc.txt
[root@Betty workspace]# cat abc.txt            
1
[root@Betty workspace]# read -r line < abc.txt 
[root@Betty workspace]# echo $?
0

The result confirms that read returns 1 at end of file and 0 when it reads a line.
Finally, let's check how while evaluates its condition:

while list; do list; done
    The while command continuously executes the do list as long as the last command in list returns an exit status 
of zero. The exit status of the while commands is the exit status of the last do list command executed, or zero if 
none was executed.

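Ha, to borrow Kudo Shinichi's line, "there is only one truth": in the shell an exit status of 0 means success, so while keeps looping for as long as read succeeds.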
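The same experiment can be written as a short script. This is a sketch under the assumption that mktemp is available; the variable names (status_empty, status_line, count) are made up for the illustration.

```shell
tmp=$(mktemp)

# Empty file: read hits end of file immediately and fails.
: > "$tmp"
read -r line < "$tmp"
status_empty=$?

# One complete line: read succeeds.
printf 'one\n' > "$tmp"
read -r line < "$tmp"
status_line=$?

printf 'empty=%s line=%s\n' "$status_empty" "$status_line"

# while runs its body as long as read's exit status is zero.
count=0
while read -r line; do
    count=$((count + 1))
done < "$tmp"
printf 'loop iterations=%s\n' "$count"

rm -f "$tmp"
```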

===

Remember that read strips leading and trailing whitespace, so if you want to keep it, set IFS to the empty string:

[root@Betty Shell]# while IFS= read -r line; do  
> echo "test $line"; 
> done < file 
test  -module( unique_name_test )    .    
test  
test -compile(export_all). 
test  
test  
test %% @spec (Nibble::integer()) -> char() 
test %% @doc Returns the character code corresponding to Nibble. 
test %% 
test %% Nibble must be >=0 and =<16. 
test hex_digit(0) -> $0; 
test hex_digit(1) -> $1; 
test hex_digit(2) -> $2; 
test hex_digit(3) -> $3; 
test hex_digit(4) -> $4; 
test hex_digit(5) -> $5; 
test hex_digit(6) -> $6; 
test hex_digit(7) -> $7; 
test hex_digit(8) -> $8; 
test hex_digit(9) -> $9; 
test hex_digit(10) -> $A; 
test hex_digit(11) -> $B; 
test hex_digit(12) -> $C; 
test hex_digit(13) -> $D; 
test hex_digit(14) -> $E; 
test hex_digit(15) -> $F. 
test  
test  
[root@Betty Shell]#
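One detail follows from the exit-status rule: if the last line of a file has no trailing newline, read still stores it but returns non-zero, so a plain while read loop skips it. A common workaround, which is not part of the original tests, is to also test whether the variable is non-empty. The sketch below assumes mktemp is available and uses made-up sample data.

```shell
f=$(mktemp)
# Three lines; the last one deliberately lacks a trailing newline.
printf '  first\nsecond  \nlast without newline' > "$f"

n=0
last=
# "|| [ -n "$line" ]" lets the body run once more for the unterminated line.
while IFS= read -r line || [ -n "$line" ]; do
    n=$((n + 1))
    last=$line
done < "$f"
rm -f "$f"

printf 'lines=%s last=[%s]\n' "$n" "$last"
```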

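In the examples above, < file always comes at the end. If you would rather not put it last, you can pipe the file's contents into the while loop instead (note that in Bash the loop then runs in a subshell by default, so variables set inside it are lost when the loop ends):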

[root@Betty Shell]# cat file | while IFS= read -r line; do  
> echo "test $line"; 
> done 
test  -module( unique_name_test )    .    
test  
test -compile(export_all). 
test  
test  
test %% @spec (Nibble::integer()) -> char() 
test %% @doc Returns the character code corresponding to Nibble. 
test %% 
test %% Nibble must be >=0 and =<16. 
test hex_digit(0) -> $0; 
test hex_digit(1) -> $1; 
test hex_digit(2) -> $2; 
test hex_digit(3) -> $3; 
test hex_digit(4) -> $4; 
test hex_digit(5) -> $5; 
test hex_digit(6) -> $6; 
test hex_digit(7) -> $7; 
test hex_digit(8) -> $8; 
test hex_digit(9) -> $9; 
test hex_digit(10) -> $A; 
test hex_digit(11) -> $B; 
test hex_digit(12) -> $C; 
test hex_digit(13) -> $D; 
test hex_digit(14) -> $E; 
test hex_digit(15) -> $F. 
test  
test  
[root@Betty Shell]#
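The two forms print the same lines, but they differ in where the loop runs. In Bash with default settings (and in dash), a loop on the receiving end of a pipe runs in a subshell, so counters or other variables set inside it are not visible afterwards. The sketch below assumes mktemp is available; the variable names are invented for the illustration.

```shell
data=$(mktemp)
printf 'a\nb\nc\n' > "$data"

# Piped form: the loop runs in a subshell, so the outer counter is untouched.
piped=0
cat "$data" | while IFS= read -r line; do
    piped=$((piped + 1))
done

# Redirected form: the loop runs in the current shell.
redirected=0
while IFS= read -r line; do
    redirected=$((redirected + 1))
done < "$data"

rm -f "$data"
printf 'piped=%s redirected=%s\n' "$piped" "$redirected"
```

Shells such as ksh and zsh run the last pipeline element in the current shell, so the piped counter may differ there.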

Test 3: read the first three fields of the first line into variables

[root@Betty Shell]# head -1 file | while read -r field1 field2 field3 throwaway; do echo "field1 = $field1"; echo "field2 = $field2"; echo "field3 = $field3"; done
field1 = -module(
field2 = unique_name_test
field3 = )

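If you give read several variable names, it splits the input into fields and assigns them in order: the first field to the first variable, the second to the second, and so on, with all remaining fields going to the last variable. That is why the example above adds a throwaway variable; without it, when a line has more than three fields, the third variable would receive all the remaining fields.
For convenience, you can simply use _ in place of throwaway: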

[root@Betty Shell]# head -1 file | while read -r field1 field2 field3 _; do echo "field1 = $field1"; echo "field2 = $field2"; echo "field3 = $field3"; done
field1 = -module(
field2 = unique_name_test
field3 = )

Alternatively, if your file really has only three fields per line, you can omit _.
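The throwaway variable is most useful when lines have a varying number of fields. Here is a minimal sketch with invented sample data (an item name, a quantity, and optional free-form notes) that sums the second field and ignores everything after it.

```shell
total=0
# item and qty take the first two fields; _ swallows any remaining words.
while read -r item qty _; do
    total=$((total + qty))
done <<EOF
apples 3 fresh from the market
pears 4
plums 5 slightly bruised
EOF

printf 'total=%s\n' "$total"
```

Without the trailing _, qty would receive "3 fresh from the market" on the first line and the arithmetic would fail.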

【cat and <<】

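cat is a Linux command that writes text to standard output and is usually used to view a file's contents; cat > file can be used to write keyboard input to a file.
EOF is short for "end of file" and conventionally marks the end of the text.
Combining the two as cat <<EOF (whitespace between << and EOF does not matter) produces multi-line output without a series of echo commands. In principle, any other word can be used in place of EOF here.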

Test - 1

cat > test.cfg <<EOF 
log_facility=daemon 
pid_file=/var/run/nrpe.pid 
EOF

Test - 2

cat > test.cfg <<ABC 
log_facility=daemon 
pid_file=/var/run/nrpe.pid 
ABC

Test - 3

cat <<ABC > test.cfg 
log_facility=daemon 
pid_file=/var/run/nrpe.pid 
ABC
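The three tests above differ only in the delimiter word and in where the output redirection sits. A sketch that can be run without touching the current directory might look like the following; it assumes mktemp is available, and the SETTINGS delimiter and the debug=0 line are hypothetical additions used only to show appending with >>.

```shell
cfg=$(mktemp)

# Redirection before the here document operator, delimiter EOF.
cat > "$cfg" <<EOF
log_facility=daemon
pid_file=/var/run/nrpe.pid
EOF

# Redirection after it, a different delimiter word, appending this time.
cat <<SETTINGS >> "$cfg"
debug=0
SETTINGS

content=$(cat "$cfg")
lines=$(wc -l < "$cfg" | tr -d ' ')
rm -f "$cfg"

printf 'wrote %s lines:\n%s\n' "$lines" "$content"
```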

【Here document】

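Some books translate here document as 內嵌文檔 ("embedded document").
A here document is also called here-document, heredoc, hereis, here-string, or here-script (in Bash, though, a here string, <<<, is a separate feature).
The term originally referred to a file literal or input stream literal; it later came to cover multiline string literals as well.
A here document preserves the line breaks and other whitespace (including indentation) in the text.
Here documents originated in the Unix shell and were later adopted by various other shells.
Here-document-style strings exist in many high-level languages, notably Perl and languages influenced by Perl, such as PHP and Ruby.
Whether a here document denotes a file or a string, some languages treat it as a format string and allow variable substitution and command substitution inside it.
The most common syntax, which originated in the Unix shell, uses "<< delimiter" (the delimiter is often EOF or END) to mark the start of the multi-line text, followed by the text starting on the next line, and ends with the same delimiter on a line by itself. The syntax has this form because here documents are mainly used as stream literals whose content is redirected to the standard input of the preceding command; that is, it mimics the input redirection syntax, where < means "take input from the following file".
Other languages often use a very similar syntax, but the details and the actual behavior can differ considerably.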

Usage in the Unix shell

In the following example, text is passed to the tr command using a here document. This could be in a shell file, or entered interactively at a prompt.

[root@Betty Shell]# tr a-z A-Z << END_TEXT 
> one two three 
> uno dos tres 
> END_TEXT

This yields the output:

ONE TWO THREE 
UNO DOS TRES

Both <<END_TEXT and << END_TEXT are correct here.

Appending - to << causes leading tabs to be ignored. This lets here documents in shell scripts be indented without changing their content.
Note: to type a literal TAB at the shell prompt, press CTRL-V and then TAB.

[root@Betty Shell]# tr a-z A-Z <<- END_TEXT 
>(Ctrl-V + TAB)one two three 
>(Ctrl-V + TAB)uno dos tres 
>(Ctrl-V + TAB)END_TEXT

This yields the same output, notably not indented:

ONE TWO THREE 
UNO DOS TRES

（
Supplementary test:

[root@Betty Shell]# tr a-z A-Z << END_TEXT  
>(Ctrl-V + TAB)one two three 
>(Ctrl-V + TAB)uno dos tres 
>(Ctrl-V + TAB)END_TEXT 
> END_TEXT

This yields:

(Ctrl-V + TAB)ONE TWO THREE 
(Ctrl-V + TAB)UNO DOS TRES 
(Ctrl-V + TAB)END_TEXT

）
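Because literal tabs are easy to lose when copying text, the difference between << and <<- can also be reproduced by generating the script with printf, where \t yields a real tab. This is only a sketch: it pipes two tiny generated scripts into sh, and the variable names (stripped, kept, tab) are invented for the illustration.

```shell
# <<- strips leading tabs from the body and from the delimiter line.
stripped=$(printf 'cat <<-END\n\tone two three\n\tuno dos tres\n\tEND\n' | sh)

# Plain << keeps the tabs, so the tab-indented END is just text and a bare END is needed.
kept=$(printf 'cat <<END\n\tone two three\n\tEND\nEND\n' | sh)

tab=$(printf '\t')
printf 'with <<- :\n%s\n' "$stripped"
# Make the surviving tabs visible in the second result.
printf 'with <<  :\n%s\n' "$kept" | sed "s/$tab/<TAB>/g"
```

Only tabs are stripped by <<-; leading spaces would be kept in both cases.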
By default, variables are interpolated and commands enclosed in backticks (the ` character) are evaluated.

[root@Betty Shell]# cat << EOF 
> Working dir $PWD 
> EOF

yields:

Working dir /root/workspace/CODE_TEST/Shell

This behavior can be disabled by quoting any part of the label, for example by enclosing EOF in single or double quotes:

[root@Betty Shell]# cat << "EOF" 
> Working dir $PWD 
> EOF

yields:

Working dir $PWD

（
Supplementary test:

[root@Betty Shell]# cat << "E"OF 
Working dir $PWD 
EOF

This yields:

Working dir $PWD

）
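The two behaviors can be compared side by side by capturing each here document in a variable. A minimal sketch, with a made-up variable name and sample text:

```shell
name=world

# Unquoted delimiter: $name is expanded while the here document is read.
expanded=$(cat <<EOF
hello $name
EOF
)

# Quoted delimiter: the body is taken literally.
literal=$(cat <<'EOF'
hello $name
EOF
)

printf '%s\n' "$expanded"
printf '%s\n' "$literal"
```

The closing parenthesis of the command substitution sits on its own line because the delimiter must be alone on its line.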
(The source goes on to introduce here strings; that part is skipped here.)

Description in the man page

Here Documents
This type of redirection instructs the shell to read input from the current source until a line containing only delimiter (with no trailing blanks) is seen. All of the lines read up to that point are then used as the standard input for a command.

The format of here-documents is:

<<[-]word
here-document
delimiter

No parameter expansion, command substitution, arithmetic expansion, or pathname expansion is performed on word. If any characters in word are quoted, the delimiter is the result of quote removal on word, and the lines in the here-document are not expanded. If word is unquoted, all lines of the here-document are subjected to parameter expansion, command substitution, and arithmetic expansion. In the latter case, the character sequence \<newline> is ignored, and \ must be used to quote the characters \, $, and `.
(In other words, when word is unquoted, prefix \, $, and ` with \ wherever you want them to appear literally in the output.)

If the redirection operator is <<-, then all leading tab characters are stripped from input lines and the line containing delimiter. This allows here-documents within shell scripts to be indented in a natural fashion.

Here Strings
A variant of here documents, the format is:

<<<word

The word is expanded and supplied to the command on its standard input.
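A here string is handy when the input is a single short value. Since <<< is a Bash extension rather than POSIX sh syntax, the sketch below invokes bash explicitly; it assumes bash is on the PATH, and the sample strings are made up.

```shell
# The word after <<< is expanded and fed to the command's standard input.
upper=$(bash -c 'tr a-z A-Z <<< "one two three"')

# It combines naturally with read to split a value into variables.
pair=$(bash -c 'read -r a b <<< "x y"; printf "%s-%s" "$a" "$b"')

printf '%s\n' "$upper"
printf '%s\n' "$pair"
```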

===

Discussion on Stack Overflow

Stack Overflow has the following discussion of cat << EOF in bash.

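The cat <<EOF syntax is very useful when working with multi-line strings in Bash, for example when passing a multi-line string to a variable, a file, or a pipe.
Example 1: pass a multi-line string to a variable (I extended the test from the original answer here)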

[root@Betty Shell]# sql=$(cat <<EOF 
> SELECT foo, bar FROM db 
> WHERE foo='baz' 
> EOF 
> ) 
[root@Betty Shell]#  
[root@Betty Shell]# echo $sql     
SELECT foo, bar FROM db WHERE foo='baz' 
[root@Betty Shell]#  
[root@Betty Shell]# echo -e $sql                     # -e: enable interpretation of backslash escapes
SELECT foo, bar FROM db WHERE foo='baz' 
[root@Betty Shell]#  
[root@Betty Shell]# echo -E $sql                     # -E: disable interpretation of backslash escapes (default)
SELECT foo, bar FROM db WHERE foo='baz' 
[root@Betty Shell]#  
[root@Betty Shell]# echo "$sql" 
SELECT foo, bar FROM db 
WHERE foo='baz' 
[root@Betty Shell]#  
[root@Betty Shell]# echo -e "$sql"                   # -e: enable interpretation of backslash escapes
SELECT foo, bar FROM db 
WHERE foo='baz' 
[root@Betty Shell]#  
[root@Betty Shell]# echo -E "$sql"                   # -E: disable interpretation of backslash escapes (default)
SELECT foo, bar FROM db 
WHERE foo='baz' 
[root@Betty Shell]#

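After this, the variable $sql holds a string containing a newline, which you can inspect with echo -e "$sql".
(That conclusion differs from my own experiment. According to the original answer, the input contains \n characters that are interpreted only with -e. My results show instead that "$sql" in double quotes is always printed with its newline, while unquoted $sql is printed on a single line. So what is the difference between "$sql" and $sql in the shell? Unquoted, the expansion undergoes word splitting, so the newline acts as just another separator and echo joins the resulting words with single spaces; quoted, the value is passed to echo as one argument with the newline intact.)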
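The difference between the quoted and unquoted expansion becomes concrete when you count how many arguments the command receives. The helper count_args below is a hypothetical function written only for this sketch.

```shell
# Print the number of arguments received.
count_args() {
    printf '%s\n' "$#"
}

sql=$(cat <<EOF
SELECT foo, bar FROM db
WHERE foo='baz'
EOF
)

# Unquoted: split on spaces and the newline into separate words.
unquoted=$(count_args $sql)
# Quoted: a single argument that still contains the newline.
quoted=$(count_args "$sql")

printf 'unquoted=%s quoted=%s\n' "$unquoted" "$quoted"
```

echo prints its arguments separated by single spaces, which is why the unquoted form appears on one line.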

Example 2: pass a multi-line string to a file

[root@Betty Shell]# cat <<EOF > print.sh 
> #!/bin/bash 
> echo \$PWD 
> echo $PWD 
> EOF

The print.sh file now contains:

[root@Betty Shell]# cat print.sh  
#!/bin/bash 
echo $PWD                              -- escaped with \, so not expanded
echo /root/workspace/CODE_TEST/Shell   -- $PWD expanded when the here document was read
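The distinction between expanding at write time and at run time is easier to verify with a variable you control than with $PWD. The following sketch assumes mktemp is available; the variable stage and its values are invented for the illustration.

```shell
stage=build
gen=$(mktemp)

# \$stage survives as a literal $stage in the generated script;
# $stage is replaced with its current value while the file is written.
cat <<EOF > "$gen"
printf 'at run time: %s\n' "\$stage"
printf 'at write time: %s\n' "$stage"
EOF

# Run the generated script with a different value of stage.
output=$(stage=deploy sh "$gen")
rm -f "$gen"

printf '%s\n' "$output"
```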

Example 3: pass a multi-line string to a command/pipe

[root@Betty Shell]# cat <<EOF | grep 'b' | tee b.txt | grep 'r' 
> foo 
> bar 
> baz 
> EOF 
bar

This command prints only bar to standard output, but it also creates b.txt, which contains the two lines bar and baz.

===

In the examples above, "EOF" is used as the "Here Tag". Put simply, "<< Here" tells the shell that a multi-line string follows and that it ends at "Here". You can replace "Here" with anything you like, but EOF or STOP is customary.

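Some rules about Here tags:

The tag can be any string, in upper or lower case, though most people use upper case by convention.
If the (closing) tag shares its line with other characters, it does not act as a tag and is simply part of the string. The tag must be alone on its line to be recognized.
The closing tag must have no leading or trailing whitespace on its line; otherwise it is treated as part of the string.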

[root@Betty Shell]# cat >> test <<HERE 
> Hello world HERE               <--- Not the end of string 
>   HERE                         <-- Leading space, so not end of string 
> HERE                           <-- Now we have the end of the string 
[root@Betty Shell]# cat test  
Hello world HERE 
  HERE
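The same rules can be checked without creating a file by capturing the here document in a variable. A minimal sketch; the variable names body and lines are made up.

```shell
body=$(cat <<HERE
Hello world HERE
  HERE
HERE
)

# Only the third line ended the document; the first two are ordinary text.
lines=$(printf '%s\n' "$body" | wc -l | tr -d ' ')
printf 'captured %s lines:\n%s\n' "$lines" "$body"
```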

References

1. Linux man pages
2. The story behind bash read, part 2: read -r
3. On cat > file and cat > file <<EOF
4. Here document
5. how does ` cat << EOF` work in bash?
