Translation | A Little Story About the `yes` Unix Command

Read the original: A Little Story About the `yes` Unix Command

Preface: my first, nerve-racking translation

This is my first attempt at translating a foreign-language article. Reading a text and actually translating it are two different things, so what you are looking at is version 3.0…

Thanks to 依雲 for the lessons on faithful, readable translation and for the generous annotations, to 依雲 and 傳奇老師 for the final round of proofreading, and to H 老師 for sharing the article.

If you spot anything that is translated poorly, corrections are very welcome. Thank you! (///▽///)


The translation begins here.

What's the simplest Unix command you know?

There's echo, which prints a string to standard output, and true, which does nothing except terminate with an exit code of 0.

Among the pile of simple Unix commands there's also yes. If you run yes without arguments, you get an endless stream of y characters, each on its own line:

y
y
y
y
(...you get the idea)

What looks pointless at first turns out to be pretty useful:

yes | sh boring_installation.sh

Ever installed a program that required you to type "y" and hit Enter to keep going? yes to the rescue! It will dutifully handle that chore for you, so you can keep watching Pootie Tang (a musical comedy film).
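To make the pattern concrete, here is a hypothetical, minimal "installer" prompt in Rust; the program and its messages are invented purely for illustration. Piping yes into it (for example, yes | ./installer) answers the prompt automatically:

use std::io::{self, BufRead, Write};

fn main() {
    // Ask for confirmation, just like an interactive install script would.
    print!("Continue installation? [y/N] ");
    io::stdout().flush().unwrap();

    // Read one answer from stdin; when you run `yes | ./installer`,
    // this line simply receives "y".
    let stdin = io::stdin();
    let mut answer = String::new();
    stdin.lock().read_line(&mut answer).unwrap();

    if answer.trim() == "y" {
        println!("Installing...");
    } else {
        println!("Aborted.");
        std::process::exit(1);
    }
}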

Writing yes

Here's a basic version of yes in... uhm... BASIC:

10 PRINT "y"
20 GOTO 10

And here's the same thing in Python:

while True:
    print("y")

Looks simple, eh? Not so fast! It turns out this program is quite slow.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a faster version in Rust. Here's my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

  • The string we want to print in the loop is named expletive and is taken from the first command-line argument. I learned the word expletive from the yes manpage.
  • unwrap_or supplies the expletive: if no argument was passed, we fall back to "y" as the default.
  • into() converts that default from a string slice (&str) into an owned String on the heap (a tiny standalone sketch follows this list).
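As a tiny standalone sketch, separate from the program above and with a first_arg variable introduced only for illustration, this is roughly what those two calls do:

use std::env;

fn main() {
    // env::args().nth(1) is an Option<String>: Some(arg) if an argument was
    // passed on the command line, None otherwise.
    let first_arg: Option<String> = env::args().nth(1);

    // "y" is a &str literal; into() turns it into an owned String so that the
    // default has the same type as a real argument.
    let expletive: String = first_arg.unwrap_or("y".into());

    println!("{}", expletive);
}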

Let's test it:

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s] 

Whoops, that doesn't look any better; it's even slower than the Python version! That caught my attention, so I went looking for the source code of a C implementation.

Here's the very first C version of the program, written by Ken Thompson and released with Version 7 Unix on January 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

No magic here.

Compare that with the 128-line version from GNU coreutils, which is mirrored on GitHub. Even after 25 years it is still under active development; the last code change happened about a year ago. And it is fast:

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

Finally, the highlight, which sits at the very end of the file:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! So they simply use a buffer to make the writes faster. The buffer size is given by the constant BUFSIZ, which each system chooses so that its I/O is efficient (further reading: https://www.gnu.org/software/libc/manual/html_node/Controlling-Buffering.html). On my system it is 1024 bytes; in practice I got better throughput with 8192 bytes.
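For intuition, here is a rough Rust sketch of that same idea, not the coreutils code itself: fill a buffer with whole "y\n" lines once, then write that buffer repeatedly until a write fails. The BUFSIZE value below is just an assumption for this sketch.

use std::io::{self, Write};

// Assumed buffer size for this sketch; coreutils derives its size from BUFSIZ.
const BUFSIZE: usize = 8192;

fn main() {
    // Fill the buffer once with as many whole "y\n" lines as fit.
    let line = b"y\n";
    let mut buf = Vec::with_capacity(BUFSIZE);
    while buf.len() + line.len() <= BUFSIZE {
        buf.extend_from_slice(line);
    }

    // Write the whole buffer over and over until a write fails,
    // for example when the reading end of the pipe goes away.
    let stdout = io::stdout();
    let mut out = stdout.lock();
    while out.write_all(&buf).is_ok() {}
}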

Alright, let's look at my improved Rust version:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
  loop {
    writeln!(writer, "{}", expletive).unwrap();
  }
}

The important detail is that the buffer size is a multiple of four, to ensure memory alignment.

Running that gave me 51.3MiB/s, much faster than the version that ships with my system, but still far slower than the 10.2GiB/s reported in a Reddit post I came across.

Update

Once again, the Rust community did not disappoint.

As soon as this post hit the Rust subreddit, user nwydo pointed to an earlier discussion on exactly this topic. Here is their optimized code, which breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that's a whole different ballgame!

  • We prepare a pre-filled string buffer once and reuse it on every iteration.
  • Stdout is protected by a lock, so instead of acquiring and releasing it over and over, we acquire it once and hold it for the whole run.
  • We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations (a small standalone sketch follows this list).
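As a small standalone sketch of that Cow pattern (the output_bytes helper below is made up for illustration and is not part of the code above): it either borrows the static default "y\n" without allocating, or takes ownership of the caller-supplied bytes.

use std::borrow::Cow;

// Hypothetical helper: return either the borrowed static default "y\n"
// or an owned, newline-terminated copy of the user-supplied argument.
fn output_bytes(arg: Option<Vec<u8>>) -> Cow<'static, [u8]> {
    match arg {
        // No argument: borrow the static default, no allocation needed.
        None => Cow::Borrowed(&b"y\n"[..]),
        // An argument was given: reuse its allocation and append the newline.
        Some(mut bytes) => {
            bytes.push(b'\n');
            Cow::Owned(bytes)
        }
    }
}

fn main() {
    assert_eq!(&output_bytes(None)[..], b"y\n");
    assert_eq!(&output_bytes(Some(b"no".to_vec()))[..], b"no\n");
}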

The only thing I could contribute was removing an unnecessary mut.

Here is a summary of what I learned from this experience:

The seemingly trivial yes program is not so trivial after all: it uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun, and it makes me appreciate the nifty little tricks that make our computers fast.


The original article:

A Little Story About the yes Unix Command

What's the simplest Unix command you know? There's echo, which prints a string to stdout, and true, which always terminates with an exit code of 0.

Among the rows of simple Unix commands, there's also yes. If you run it without arguments, you get an infinite stream of y's, separated by a newline:

y
y
y
y
(...you get the idea)

What seems to be pointless in the beginning turns out to be pretty helpful:

yes | sh boring_installation.sh

Ever installed a program, which required you to type "y" and hit enter to keep going? yes to the rescue! It will carefully fulfill this duty, so you can keep watching Pootie Tang.

Writing yes

Here's a basic version in... uhm... BASIC.

10 PRINT "y"
20 GOTO 10

And here's the same thing in Python:

while True:
    print("y")

Simple, eh? Not so quick! Turns out, that program is quite slow.

python yes.py | pv -r > /dev/null
[4.17MiB/s]

Compare that with the built-in version on my Mac:

yes | pv -r > /dev/null
[34.2MiB/s]

So I tried to write a quicker version in Rust. Here's my first attempt:

use std::env;

fn main() {
  let expletive = env::args().nth(1).unwrap_or("y".into());
  loop {
    println!("{}", expletive);
  }
}

Some explanations:

  • The string we want to print in a loop is the first command line parameter and is named expletive. I learned this word from the yes manpage.
  • I use unwrap_or to get the expletive from the parameters. In case the parameter is not set, we use "y" as a default.
  • The default parameter gets converted from a string slice (&str) into an owned string on the heap (String) using into().

Let's test it.

cargo run --release | pv -r > /dev/null
   Compiling yes v0.1.0
    Finished release [optimized] target(s) in 1.0 secs
     Running `target/release/yes`
[2.35MiB/s] 

Whoops, that doesn't look any better. It's even slower than the Python version! That caught my attention, so I looked around for the source code of a C implementation.

Here's the very first version of the program, released with Version 7 Unix and famously authored by Ken Thompson on Jan 10, 1979:

main(argc, argv)
char **argv;
{
  for (;;)
    printf("%s\n", argc>1? argv[1]: "y");
}

No magic here.

Compare that to the 128-line-version from the GNU coreutils, which is mirrored on GitHub. After 25 years, it is still under active development! The last code change happened around a year ago. That's quite fast:

# brew install coreutils
gyes | pv -r > /dev/null 
[854MiB/s]

The important part is at the end:

/* Repeatedly output the buffer until there is a write error; then fail.  */
while (full_write (STDOUT_FILENO, buf, bufused) == bufused)
  continue;

Aha! So they simply use a buffer to make write operations faster. The buffer size is defined by a constant named BUFSIZ, which gets chosen on each system so as to make I/O efficient (see here). On my system, that was defined as 1024 bytes. I actually had better performance with 8192 bytes.

I've extended my Rust program:

use std::env;
use std::io::{self, BufWriter, Write};

const BUFSIZE: usize = 8192;

fn main() {
    let expletive = env::args().nth(1).unwrap_or("y".into());
    let mut writer = BufWriter::with_capacity(BUFSIZE, io::stdout());
    loop {
        writeln!(writer, "{}", expletive).unwrap();
    }
}

The important part is that the buffer size is a multiple of four, to ensure memory alignment.

Running that gave me 51.3MiB/s. Faster than the version that comes with my system, but still way slower than the results from this Reddit post that I found, where the author talks about 10.2GiB/s.

Update

Once again, the Rust community did not disappoint. As soon as this post hit the Rust subreddit, user nwydo pointed out a previous discussion on the same topic. Here's their optimized code, which breaks the 3GB/s mark on my machine:

use std::env;
use std::io::{self, Write};
use std::process;
use std::borrow::Cow;

use std::ffi::OsString;
pub const BUFFER_CAPACITY: usize = 64 * 1024;

pub fn to_bytes(os_str: OsString) -> Vec<u8> {
  use std::os::unix::ffi::OsStringExt;
  os_str.into_vec()
}

fn fill_up_buffer<'a>(buffer: &'a mut [u8], output: &'a [u8]) -> &'a [u8] {
  if output.len() > buffer.len() / 2 {
    return output;
  }

  let mut buffer_size = output.len();
  buffer[..buffer_size].clone_from_slice(output);

  while buffer_size < buffer.len() / 2 {
    let (left, right) = buffer.split_at_mut(buffer_size);
    right[..buffer_size].clone_from_slice(left);
    buffer_size *= 2;
  }

  &buffer[..buffer_size]
}

fn write(output: &[u8]) {
  let stdout = io::stdout();
  let mut locked = stdout.lock();
  let mut buffer = [0u8; BUFFER_CAPACITY];

  let filled = fill_up_buffer(&mut buffer, output);
  while locked.write_all(filled).is_ok() {}
}

fn main() {
  write(&env::args_os().nth(1).map(to_bytes).map_or(
    Cow::Borrowed(
      &b"y\n"[..],
    ),
    |mut arg| {
      arg.push(b'\n');
      Cow::Owned(arg)
    },
  ));
  process::exit(1);
}

Now that's a whole different ballgame!

  • We prepare a filled string buffer, which will be reused for each loop.
  • Stdout is protected by a lock. So, instead of constantly acquiring and releasing it, we keep it all the time.
  • We use the platform-native std::ffi::OsString and std::borrow::Cow to avoid unnecessary allocations.

The only thing that I could contribute was removing an unnecessary mut. 😅

Lessons learned

The trivial program yes turns out not to be so trivial after all. It uses output buffering and memory alignment to improve performance. Re-implementing Unix tools is fun and makes me appreciate the nifty tricks which make our computers fast.
