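This is my first attempt at translating a foreign-language article. Understanding a text and translating it well are two different things. What you are reading is version 3.0.
Thanks to 依雲 for the explainer on 信雅達 (faithfulness, expressiveness, elegance) and the many annotations, to 依雲 and 傳奇老師 for the final proofreading, and to H 老師 for sharing the article.
If you find anything in this article translated poorly, corrections are welcome. Thank you (///▽///)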
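What's the simplest Unix command you know?
There's `echo`, which prints a string to stdout, and `true`, which always terminates with an exit code of 0.
Among the rows of simple Unix commands, there's also `yes`. If you run `yes` without arguments, you get an infinite stream of y's, separated by newlines: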
y
y
y
y
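(...you get the idea)
What seems pointless at first turns out to be quite helpful:
yes | sh boring_installation.sh
Ever installed a program that required you to type "y" and hit Enter to keep going? `yes` to the rescue! It will dutifully carry out this task, so you can keep watching Pootie Tang (a musical comedy).
Here's a basic version of `yes` in... uhm... BASIC: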
10 PRINT "y"
20 GOTO 10
And here's the same thing in Python:
while True:
    print("y")
Simple, eh? Not so quick! It turns out that this program is quite slow.
python yes.py | pv -r > /dev/null
[4.17MiB/s]
Compare that with the built-in version on my Mac:
yes | pv -r > /dev/null
[34.2MiB/s]
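So I tried to write a quicker version in Rust. Here's my first attempt:
use std::env;
fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    loop {
        println!("{}", expletive);
    }
}
Some explanations:
The string we print in a loop is the first command-line argument and is named `expletive`. I learned this word from the `yes` manpage.
`unwrap_or` gets the `expletive` from the arguments. If no argument was given, "y" is used as the default.
`into()` converts the default from a string slice into an owned string on the heap.
Let's test it: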
cargo run --release | pv -r > /dev/null
Compiling yes v0.1.0
Finished release [optimized] target(s) in 1.0 secs
Running `target/release/yes`
[2.35MiB/s]
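Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.
Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:
main(argc, argv)
char **argv;
{
    for (;;)
        printf("%s\n", argc>1? argv[1]: "y");
}
No magic here.
Compare that to the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. It is quite fast: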
# brew install coreutils
gyes | pv -r > /dev/null
[854MiB/s]
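The important part is at the end:
/* Repeatedly output the buffer until there is a write error; then fail. */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
    continue;
Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named `BUFSIZ`, which gets chosen on each system so as to make I/O efficient ([see here](https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html)). On my system it was defined as 1024 bytes. I actually got better performance with 8192 bytes.
I've extended my Rust program: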
use std::env;
use std::io::{self, BufWriter, Write};
const BUFSIZE: usize = 8192;
fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}
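The important part is that the buffer size is a multiple of four, to ensure memory alignment.
Running that gave me 51.3MiB/s. That is faster than the version that ships with my system, but still far slower than the results from a Reddit post I found, where the author talks about 10.2GiB/s.
Once again, the Rust community did not disappoint.
As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, which breaks the 3GB/s mark on my machine: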
use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;
use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;
pub fn to_bytes(os_str: OsString) -> Vec<u8> {
    use std::os::unix::ffi::OsStringExt;
    os_str.into_vec()
}
fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
    if output.len() > buffer.len() / 2 {
        return output;
    }
    let mut buffer_size = output.len();
    buffer[..buffer_size].clone_from_slice(output);
    while buffer_size < buffer.len() / 2 {
        let (left, right) = buffer.split_at_mut(buffer_size);
        right[..buffer_size].clone_from_slice(left);
        buffer_size *= 2;
    }
    &buffer[..buffer_size]
}
fn write(output: &[u8]) {
    let stdout = io::stdout();
    let mut locked = stdout.lock();
    let mut buffer = [0u8; BUFFER_CAPACITY];
    let filled = fill_up_buffer(&mut buffer, output);
    while locked.write_all(filled).is_ok() {}
}
fn main() {
    write(&env::args_os().nth(1).map(to_bytes).map_or(
        Cow::Borrowed(
            &b"y\n"[..],
        ),
        |mut arg| {
            arg.push(b'\n');
            Cow::Owned(arg)
        },
    ));
    process::exit(1);
}
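Now that's a whole different ballgame!
The only thing I could contribute was removing an unnecessary `mut`.
The seemingly trivial `yes` program turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun, and it makes me appreciate the nifty tricks that make our computers fast.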
`yes` Unix Command
What's the simplest Unix command you know? There's `echo`, which prints a string to stdout, and `true`, which always terminates with an exit code of 0.
Among the rows of simple Unix commands, there's also `yes`. If you run it without arguments, you get an infinite stream of y's, separated by newlines:
y
y
y
y
(...you get the idea)
What seems to be pointless in the beginning turns out to be pretty helpful:
yes | sh boring_installation.sh
Ever installed a program that required you to type "y" and hit enter to keep going? `yes` to the rescue! It will carefully fulfill this duty, so you can keep watching Pootie Tang.
Here's a basic version in... uhm... BASIC.
10 PRINT "y"
20 GOTO 10
And here's the same thing in Python:
while True:
    print("y")
Simple, eh? Not so quick! Turns out that program is quite slow.
python yes.py | pv -r > /dev/null
[4.17MiB/s]
Compare that with the built-in version on my Mac:
yes | pv -r > /dev/null
[34.2MiB/s]
So I tried to write a quicker version in Rust. Here's my first attempt:
use std::env;
fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    loop {
        println!("{}", expletive);
    }
}
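Some explanations:
The string we want to print in a loop is the first command-line argument and is named `expletive`. I learned this word from the `yes` manpage.
I use `unwrap_or` to get the `expletive` from the arguments. If no argument is given, "y" is used as the default.
The default gets converted from a string slice into an owned string on the heap using `into()`.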
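The argument handling of this first attempt can be tried in isolation. What follows is a minimal sketch, not part of the original program: `expletive_from` is a hypothetical helper that accepts any iterator of strings in place of `env::args()`, so the default can be checked without starting a process. It uses `unwrap_or_else`, the lazily evaluated sibling of `unwrap_or`, which is a choice made for this sketch only.

```rust
/// Hypothetical helper (not in the original program): picks the string to
/// repeat from an argument list shaped like `env::args()`.
fn expletive_from<I: Iterator<Item = String>>(mut args: I) -> String {
    // Item 0 is the program name, so the first real argument is `nth(1)`.
    // `"y".into()` turns the `&str` literal into an owned `String`.
    args.nth(1).unwrap_or_else(|| "y".into())
}

fn main() {
    // With no argument on the command line this prints the default.
    let expletive = expletive_from(std::env::args());
    println!("{}", expletive);
}
```

Taking an iterator rather than calling `env::args()` inside the helper is only there to make the behaviour easy to check; the original one-liner does the same thing inline.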
Let's test it.
cargo run --release | pv -r > /dev/null
Compiling yes v0.1.0
Finished release [optimized] target(s) in 1.0 secs
Running `target/release/yes`
[2.35MiB/s]
Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.
Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:
main(argc, argv)
char **argv;
{
    for (;;)
        printf("%s\n", argc>1? argv[1]: "y");
}
No magic here.
Compare that to the 128-line version from GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. It's quite fast:
# brew install coreutils
gyes | pv -r > /dev/null
[854MiB/s]
The important part is at the end:
/* Repeatedly output the buffer until there is a write error; then fail. */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
    continue;
Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named `BUFSIZ`, which gets chosen on each system so as to make I/O efficient ([see here](https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html)). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.
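To get a feel for why a buffer helps, one can count how often the underlying writer is actually called. The sketch below is illustrative only: `CountingSink`, `unbuffered` and `buffered` are hypothetical names introduced here, and the sink stands in for stdout so that no system calls are involved. It assumes that each call reaching the sink would correspond to one write to the operating system in the real program.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for stdout that counts calls and bytes.
struct CountingSink {
    calls: usize,
    bytes: usize,
}

impl Write for CountingSink {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        self.bytes += buf.len();
        Ok(buf.len()) // pretend everything was written
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes "y\n" `lines` times straight to the sink; returns (calls, bytes).
fn unbuffered(lines: usize) -> (usize, usize) {
    let mut sink = CountingSink { calls: 0, bytes: 0 };
    for _ in 0..lines {
        sink.write_all(b"y\n").unwrap();
    }
    (sink.calls, sink.bytes)
}

/// Same output, but collected in a `BufWriter` of `capacity` bytes first.
fn buffered(lines: usize, capacity: usize) -> (usize, usize) {
    let sink = CountingSink { calls: 0, bytes: 0 };
    let mut writer = BufWriter::with_capacity(capacity, sink);
    for _ in 0..lines {
        writer.write_all(b"y\n").unwrap();
    }
    writer.flush().unwrap(); // push out whatever is still buffered
    let sink = writer.get_ref();
    (sink.calls, sink.bytes)
}

fn main() {
    let (direct_calls, _) = unbuffered(8192);
    let (buffered_calls, _) = buffered(8192, 8192);
    println!("unbuffered: {} calls, buffered: {} calls", direct_calls, buffered_calls);
}
```

Both variants deliver the same bytes; the buffered one simply hands them over in a few large chunks instead of one tiny chunk per line.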
I've extended my Rust program:
use std::env;
use std::io::{self, BufWriter, Write};
const BUFSIZE: usize = 8192;
fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}
The important part is that the buffer size is a multiple of four, to ensure memory alignment.
Running that gave me 51.3MiB/s. That is faster than the version that comes with my system, but still way slower than the results from this Reddit post that I found, where the author talks about 10.2GiB/s.
#### Update
Once again, the Rust community did not disappoint. As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, which breaks the 3GB/s mark on my machine:
use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;
use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;
pub fn to_bytes(os_str: OsString) -> Vec<u8> {
    use std::os::unix::ffi::OsStringExt;
    os_str.into_vec()
}
fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
    if output.len() > buffer.len() / 2 {
        return output;
    }
    let mut buffer_size = output.len();
    buffer[..buffer_size].clone_from_slice(output);
    while buffer_size < buffer.len() / 2 {
        let (left, right) = buffer.split_at_mut(buffer_size);
        right[..buffer_size].clone_from_slice(left);
        buffer_size *= 2;
    }
    &buffer[..buffer_size]
}
fn write(output: &[u8]) {
    let stdout = io::stdout();
    let mut locked = stdout.lock();
    let mut buffer = [0u8; BUFFER_CAPACITY];
    let filled = fill_up_buffer(&mut buffer, output);
    while locked.write_all(filled).is_ok() {}
}
fn main() {
    write(&env::args_os().nth(1).map(to_bytes).map_or(
        Cow::Borrowed(
            &b"y\n"[..],
        ),
        |mut arg| {
            arg.push(b'\n');
            Cow::Owned(arg)
        },
    ));
    process::exit(1);
}
Now that's a whole different ballgame!
It uses `std::ffi::OsString` and `std::borrow::Cow` to avoid unnecessary allocations.
The only thing that I could contribute was removing an unnecessary `mut`. 😅
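The buffer-filling step in the optimized code is worth a closer look: it copies the already filled part of the buffer onto the free part, so the filled region doubles on every pass. The following is a simplified sketch of that idea, not the code from the discussion. `repeat_to_fill` is a hypothetical helper that returns an owned `Vec<u8>` instead of borrowing a stack buffer, and the guard against an empty pattern is an addition of this sketch.

```rust
/// Hypothetical helper: builds a block of repeated copies of `pattern`
/// inside a buffer of `capacity` bytes by doubling the filled region.
fn repeat_to_fill(pattern: &[u8], capacity: usize) -> Vec<u8> {
    // A pattern longer than half the buffer cannot be doubled even once;
    // an empty pattern would never grow. Return both unchanged.
    if pattern.is_empty() || pattern.len() > capacity / 2 {
        return pattern.to_vec();
    }
    let mut buffer = vec![0u8; capacity];
    let mut filled = pattern.len();
    buffer[..filled].copy_from_slice(pattern);
    // Same loop condition as in the code above: stop once at least half
    // of the buffer holds data.
    while filled < capacity / 2 {
        let (left, right) = buffer.split_at_mut(filled);
        right[..filled].copy_from_slice(left);
        filled *= 2;
    }
    buffer.truncate(filled);
    buffer
}

fn main() {
    let block = repeat_to_fill(b"y\n", 64 * 1024);
    println!("prefilled {} bytes", block.len());
}
```

Because the block is built once and then written over and over, the hot loop no longer formats anything; it only hands the same bytes to the writer.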
The trivial program `yes` turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks that make our computers fast.