Standard I/O and Buffering

Date: 2019-11-12


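Chapter 5 of APUE covers the standard I/O library. It is called "standard" because it is supported not only by UNIX but is also implemented on many other systems: it is part of ISO C. The scanf and printf we used all the time in the C Programming course of our first year belong to it. The main difference between standard I/O and the file I/O of Chapter 3 is buffering. read and write are direct system calls with no buffer in between, whereas scanf and printf do not call the kernel directly: they maintain a buffer in user space and call read or write to fill or drain that buffer at appropriate times.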
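To make that difference concrete, here is a minimal sketch, assuming a POSIX system, that writes the same short string once through fwrite and once through the write(2) system call, and uses fstat to see how many bytes the kernel has actually received. The helper names (kernel_size, stdio_write_demo, raw_write_demo) are hypothetical. The expectation that nothing reaches the kernel before fflush assumes the temporary file gets the default full buffering and that the string is shorter than the buffer.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Bytes the kernel has actually received for the file behind fd. */
static long kernel_size(int fd)
{
    struct stat st;

    if (fstat(fd, &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write msg through the FILE* layer and report the kernel-visible size
 * before and after fflush(). Returns 0 on success, -1 if tmpfile() fails. */
int stdio_write_demo(const char *msg, long *before_flush, long *after_flush)
{
    FILE *fp = tmpfile();                 /* regular file: fully buffered */

    if (fp == NULL)
        return -1;
    fwrite(msg, 1, strlen(msg), fp);      /* copied into the user-space buffer */
    *before_flush = kernel_size(fileno(fp));
    fflush(fp);                           /* now the library calls write(2) */
    *after_flush = kernel_size(fileno(fp));
    fclose(fp);
    return 0;
}

/* Write msg with the write(2) system call: no user-space buffer involved. */
long raw_write_demo(const char *msg)
{
    FILE *fp = tmpfile();                 /* used only to obtain a temporary fd */
    long size;

    if (fp == NULL)
        return -1;
    if (write(fileno(fp), msg, strlen(msg)) < 0) {
        fclose(fp);
        return -1;
    }
    size = kernel_size(fileno(fp));       /* visible at once, no flush needed */
    fclose(fp);
    return size;
}
```

Calling stdio_write_demo("hello", &before, &after) should leave before at 0 and after at 5, while raw_write_demo("hello") should return 5 directly.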

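The first thing we run into is FILE, at once familiar and unfamiliar: I had used it all along without ever knowing what it actually looks like. Here is its definition on Linux (glibc):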

struct _IO_FILE {
  int _flags;           /* High-order word is _IO_MAGIC; rest is flags. */
#define _IO_file_flags _flags

  /* The following pointers correspond to the C++ streambuf protocol. */
  /* Note: Tk uses the _IO_read_ptr and _IO_read_end fields directly. */
  char* _IO_read_ptr;   /* Current read pointer */
  char* _IO_read_end;   /* End of get area. */
  char* _IO_read_base;  /* Start of putback+get area. */
  char* _IO_write_base; /* Start of put area. */
  char* _IO_write_ptr;  /* Current put pointer. */
  char* _IO_write_end;  /* End of put area. */
  char* _IO_buf_base;   /* Start of reserve area. */
  char* _IO_buf_end;    /* End of reserve area. */
  /* The following fields are used to support backing up and undo. */
  char *_IO_save_base;   /* Pointer to start of non-current get area. */
  char *_IO_backup_base; /* Pointer to first valid character of backup area */
  char *_IO_save_end;    /* Pointer to end of non-current get area. */

  struct _IO_marker *_markers;

  struct _IO_FILE *_chain;

  int _fileno;
#if 0
  int _blksize;
#else
  int _flags2;
#endif
  _IO_off_t _old_offset; /* This used to be _offset but it's too small. */

#define __HAVE_COLUMN /* temporary */
  /* 1+column number of pbase(); 0 is unknown. */
  unsigned short _cur_column;
  signed char _vtable_offset;
  char _shortbuf[1];

  /* char* _save_gptr; char* _save_egptr; */

  _IO_lock_t *_lock;
  /* ... further fields omitted in this excerpt ... */
};

A few of these fields help explain buffering:

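char* _IO_read_base;  // start of the read buffer
char* _IO_read_end;   // end of the read buffer
char* _IO_read_ptr;   // current position in the read buffer
char* _IO_write_base; // start of the write buffer
char* _IO_write_end;  // end of the write buffer
char* _IO_write_ptr;  // current position in the write buffer
char* _IO_buf_base;   // start of the buffer
char* _IO_buf_end;    // end of the buffer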

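The following program shows that these three "buffers" are in fact one and the same buffer, which the library allocates automatically on the first I/O operation and eventually frees automatically as well: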

/* test.c: inspect the buffer-related members of FILE.
   These are glibc's internal fields, so this only compiles against glibc. */
#include <stdio.h>

void test(void)
{
    printf("before reading\n");
    printf("read buffer base %p\n", (void *)stdin->_IO_read_base);
    printf("read buffer end %p\n", (void *)stdin->_IO_read_end);
    printf("read buffer current %p\n", (void *)stdin->_IO_read_ptr);
    printf("write buffer base %p\n", (void *)stdin->_IO_write_base);
    printf("write buffer end %p\n", (void *)stdin->_IO_write_end);
    printf("write buffer current %p\n", (void *)stdin->_IO_write_ptr);
    printf("buf buffer base %p\n", (void *)stdin->_IO_buf_base);
    printf("buf buffer end %p\n", (void *)stdin->_IO_buf_end);

    fgetc(stdin);   /* first I/O on stdin: the library allocates its buffer here */
    //fputc('a', stdout);

    printf("after reading\n");
    printf("read buffer base %p\n", (void *)stdin->_IO_read_base);
    printf("read buffer end %p\n", (void *)stdin->_IO_read_end);
    printf("read buffer current %p\n", (void *)stdin->_IO_read_ptr);
    printf("write buffer base %p\n", (void *)stdin->_IO_write_base);
    printf("write buffer end %p\n", (void *)stdin->_IO_write_end);
    printf("write buffer current %p\n", (void *)stdin->_IO_write_ptr);
    printf("buf buffer base %p\n", (void *)stdin->_IO_buf_base);
    printf("buf buffer end %p\n", (void *)stdin->_IO_buf_end);
}

int main(void)
{
    test();
    return 0;
}
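If you would rather not read glibc's private members directly, glibc also provides the non-standard __fbufsize() in <stdio_ext.h>, which reports the size of a stream's buffer. The sketch below (the helper name is hypothetical) checks a temporary file before and after its first output. On glibc the first value should be 0, matching what test.c shows; other C libraries may allocate the buffer earlier, so treat this as an illustration, not a guarantee.

```c
#include <stdio.h>
#include <stdio_ext.h>   /* non-standard extension header (glibc) */

/* Report the size of a temporary file's buffer before and after its first
 * output operation. Returns 0 on success, -1 if tmpfile() fails. */
int buffer_size_around_first_io(size_t *before, size_t *after)
{
    FILE *fp = tmpfile();

    if (fp == NULL)
        return -1;
    *before = __fbufsize(fp);   /* glibc: 0, no buffer has been allocated yet */
    fputc('a', fp);             /* first I/O: the library allocates the buffer */
    *after = __fbufsize(fp);    /* size chosen by the library */
    fclose(fp);                 /* flushes, then frees the library-owned buffer */
    return 0;
}
```

On glibc the size chosen after the first I/O is derived from BUFSIZ and the file's st_blksize, which is consistent with the fixed buffer size the author observes below.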

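Standard I/O on UNIX provides three kinds of buffering: fully buffered, line buffered, and unbuffered. This is the hardest part of the chapter to understand, so let us look at how each of them handles the FILE structure: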

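1. Full buffering usually applies to disk files. Standard I/O reads or writes as much data as possible into the buffer, and the buffer is flushed when it fills up or when it is flushed explicitly. Taking test.c above as an example, redirect stdin to a disk file and watch how the members of FILE change: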

$ gcc test.c && ./a.out < datafile   # datafile should preferably contain several lines

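The output shows that full buffering reads as much data into the buffer as it can: even though I asked for only one character, standard I/O has already read the rest of the data (up to stdin->_IO_buf_end - stdin->_IO_buf_base bytes) into the buffer. Further reads therefore do not need to go to the disk at all; they simply take data out of the buffer.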

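With full buffering, on reads _IO_read_base points to the start of the buffer, _IO_read_end points one past the last character read into the buffer from disk, and _IO_read_ptr points one past the last character the user has consumed. On writes, _IO_write_base points to the start of the buffer, _IO_write_end points one past the last byte of the buffer, and _IO_write_ptr points one past the last character the user has written.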
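The read-ahead of full buffering can also be observed without touching FILE internals: read a single character with fgetc, then ask the kernel where its file offset is with lseek(fd, 0, SEEK_CUR). A minimal sketch, assuming POSIX and a file smaller than the stdio buffer; the function name is hypothetical:

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Write len bytes to a temporary file, read back ONE character with fgetc(),
 * and return the kernel's file offset afterwards (-1 on failure). */
long offset_after_one_fgetc(size_t len)
{
    FILE *fp = tmpfile();            /* regular file: fully buffered */
    long kernel_off;

    if (fp == NULL)
        return -1;
    for (size_t i = 0; i < len; i++)
        fputc('x', fp);
    rewind(fp);                      /* flushes the data, seeks back to 0 */
    if (fgetc(fp) == EOF) {
        fclose(fp);
        return -1;
    }
    /* The stream position is now 1, but the library filled its buffer with
     * one read(2), so the kernel offset covers everything read returned. */
    kernel_off = (long)lseek(fileno(fp), 0, SEEK_CUR);
    fclose(fp);
    return kernel_off;
}
```

For a 100-byte file this should return 100 even though the program consumed only one character; ftell(fp) at the same point would still report 1, because it accounts for the bytes still waiting in the buffer.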

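2. Line buffering usually applies to terminals, such as standard input and standard output. The buffer is flushed immediately when any of the following three conditions holds:

(1) a '\n' is written;

(2) the buffer is full;

(3) as stressed in the last paragraph of p.135 of the book: whenever input is requested from the kernel through an unbuffered or a line-buffered input stream, all line-buffered output streams are flushed. (Modern glibc is narrower here: in this situation it flushes only stdout.)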
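Condition (1) can be checked on a regular file by forcing line buffering with setvbuf(fp, NULL, _IOLBF, BUFSIZ) and measuring with fstat how much has reached the kernel. This is a sketch under POSIX assumptions; the helper names are made up for illustration, and note that setvbuf must be called before any other operation on the stream.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Bytes the kernel has received for the file behind fp. */
static long kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Put text into a line-buffered stream and return how many bytes the
 * kernel has received before the stream is closed (-1 on failure). */
long line_buffered_visible(const char *text)
{
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    /* A regular file is fully buffered by default, so switch to line
     * buffering; setvbuf must come before any other operation on fp. */
    if (setvbuf(fp, NULL, _IOLBF, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    fputs(text, fp);
    n = kernel_size(fp);
    fclose(fp);
    return n;
}
```

line_buffered_visible("abc") should return 0, while line_buffered_visible("abc\n") should return 4.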

The following program involves line buffering, yet many people, me included, were baffled the first time they saw it:

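/* print.c: copy what is read from stdin to stdout */
#include <stdio.h>

void print(void)
{
    int c;   /* int, not char, so that EOF can be told apart from real bytes */

    for (; (c = getchar()) != EOF; putchar(c))
        ;
}

int main(void)
{
    print();
    return 0;
}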

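A common misunderstanding goes like this: if you simply follow the program step by step, it should read one character and then print one character. And since stdin and stdout are both line buffered, condition (3) above says that after each character is read, all line-buffered streams, stdout included, are flushed immediately. So shouldn't the output appear character by character?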

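In practice, after we type a character at the terminal the program prints nothing; output appears only when we finish the line and press Enter, and then the whole line we just typed (including the newline) is printed at once. Two layers of buffering are responsible. First, the terminal driver hands input to read only one line at a time (this is what the Google Groups post quoted later in this article explains), so the first getchar does not return until Enter is pressed. Second, line buffering: on the first getchar the library allocates a buffer for stdin and reads the whole line (up to stdin->_IO_buf_end - stdin->_IO_buf_base bytes) into it, and the following getchar calls take characters straight from that buffer instead of calling read on the kernel, so condition (3) does not apply to them. Likewise, on the first putchar the library allocates a buffer for stdout and stores the character there.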

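After Enter is pressed, the loop copies the line into stdout's buffer character by character, and when putchar writes the final '\n', condition (1) is met and stdout is flushed: the characters between stdout->_IO_write_base and stdout->_IO_write_ptr are written to the terminal with write, and stdout->_IO_write_ptr is reset to stdout->_IO_write_base. By then the loop has consumed everything in stdin's buffer (stdin->_IO_read_ptr == stdin->_IO_read_end), so the next getchar has to read from the kernel again. This explains why the program prints one line for every line typed.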

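Condition (2) causes a flush as well (test.c shows that the line buffer has a fixed size; on my system it is 1024 bytes). Try it: copy any text longer than 1024 bytes and paste it into the terminal to see whether it is flushed automatically. On my system it is, exactly as expected.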
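Condition (2) can be seen without a terminal by giving a line-buffered stream a deliberately small buffer and writing more than it holds without any '\n'. The sketch also follows the rule discussed below for setbuf and setvbuf: the user-supplied buffer has static storage duration, so it stays valid until fclose. The 64-byte size and the helper names are arbitrary illustrative choices, and exactly how many bytes have gone out when the buffer fills is up to the C library.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* User-supplied buffer with static storage duration: it outlives any
 * function that might return while the stream is still open. */
static char small_buf[64];

/* Bytes the kernel has received for the file behind fp. */
static long kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write count 'x' characters (no newline) to a line-buffered stream that
 * uses small_buf; return the bytes the kernel has received (-1 on failure). */
long visible_after_chars(size_t count)
{
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    if (setvbuf(fp, small_buf, _IOLBF, sizeof small_buf) != 0) {
        fclose(fp);
        return -1;
    }
    for (size_t i = 0; i < count; i++)
        fputc('x', fp);
    n = kernel_size(fp);
    fclose(fp);               /* flushes; small_buf may be reused afterwards */
    return n;
}
```

A handful of characters should stay in the buffer (result 0), while a few hundred should force at least one flush even though no newline was written.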

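With line buffering, on reads _IO_read_base points to the start of the buffer, _IO_read_end points one past the last character read in from the kernel, and _IO_read_ptr points one past the last character the user has consumed. On writes, _IO_write_base points to the start of the buffer, _IO_write_end also points to the start of the buffer, and _IO_write_ptr points one past the last character the user has written. (Because _IO_write_end never lies beyond _IO_write_ptr, every character written goes through the library's overflow routine, which is where the check for '\n' happens.)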

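One more point about line buffering: a newline stored into the buffer by hand does not make the library flush immediately, as the following example shows: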

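/* check.c: is stdout flushed when a '\n' is put into the buffer by hand?
   Relies on glibc's internal FILE fields; run it with stdout on a terminal
   so that stdout is line buffered. */
#include <stdio.h>

void check(void)
{
    char str[5] = "abc\n";

    fputc(str[0], stdout);
    fprintf(stderr, "%p\n", (void *)stdout->_IO_write_ptr);
    fputc(str[1], stdout);
    fprintf(stderr, "%p\n", (void *)stdout->_IO_write_ptr);
    fputc(str[2], stdout);
    fprintf(stderr, "%p\n", (void *)stdout->_IO_write_ptr);

    *stdout->_IO_write_ptr++ = '\n';  // a '\n' can be added by hand, but it causes no flush: the library never checks it
    fputc(str[3], stdout);            // this '\n' goes through fputc: the buffer is flushed, _IO_write_ptr reset to _IO_write_base
    fprintf(stderr, "%p\n", (void *)stdout->_IO_write_ptr);
}

int main(void)
{
    check();
    return 0;
}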

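3. Unbuffered mode usually applies to standard error. "Unbuffered" does not mean the buffer size is 0 but 1 (glibc uses the one-byte _shortbuf member for it). You can see this by changing test.c to use stderr and replacing fgetc(stdin) with a write such as fputc('a', stderr). Every read or write on an unbuffered stream causes an immediate flush.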
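To compare the three modes side by side, here is a sketch (POSIX assumed, helper names hypothetical) that writes a single non-newline character under each mode and reports how many bytes the kernel has seen. Only the unbuffered stream should pass the byte on immediately.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <sys/stat.h>

/* Bytes the kernel has received for the file behind fp. */
static long kernel_size(FILE *fp)
{
    struct stat st;

    if (fstat(fileno(fp), &st) != 0)
        return -1;
    return (long)st.st_size;
}

/* Write one non-newline character with the given buffering mode
 * (_IOFBF, _IOLBF or _IONBF) and return the bytes the kernel has seen. */
long visible_after_one_char(int mode)
{
    FILE *fp = tmpfile();
    long n;

    if (fp == NULL)
        return -1;
    /* For _IONBF no buffer is used, so the size argument does not matter. */
    if (setvbuf(fp, NULL, mode, BUFSIZ) != 0) {
        fclose(fp);
        return -1;
    }
    fputc('a', fp);
    n = kernel_size(fp);
    fclose(fp);
    return n;
}
```

With _IOFBF and _IOLBF the result should be 0, since 'a' is not a newline; with _IONBF it should be 1.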

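ISO C requires that:

(1) standard input and standard output are fully buffered if and only if they do not refer to an interactive device;

(2) standard error is never fully buffered.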

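Most systems use the following defaults:

(1) standard error is unbuffered;

(2) streams that refer to a terminal device are line buffered; all other streams are fully buffered.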
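These defaults can be restated as a small predicate. The hypothetical helper below only applies the rules above using isatty(); it does not ask the C library which mode a stream is actually in, and a program may already have changed that mode with setvbuf.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Predict a stream's default buffering from the rules most systems follow.
 * Returns _IONBF, _IOLBF or _IOFBF, the same constants setvbuf takes.
 * This only applies the rules; it does not query the library. */
int default_mode(FILE *fp)
{
    if (fp == stderr)
        return _IONBF;               /* standard error: unbuffered */
    if (isatty(fileno(fp)))
        return _IOLBF;               /* refers to a terminal: line buffered */
    return _IOFBF;                   /* everything else: fully buffered */
}
```

Because the result uses setvbuf's own constants, a regular file opened with tmpfile() should yield _IOFBF and stderr should yield _IONBF.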

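setbuf and setvbuf look simple, but a lot is hidden in them that the book does not spell out. For example, if you supply your own buffer, how should it be stored? If it is an ordinary local variable, it is released when the function returns, and the stream is left pointing at a buffer that no longer exists. Then there is unbuffered mode: if a stream is set to unbuffered with setbuf, can you print each character as soon as it is typed? This confused me for a long time. I wanted a truly unbuffered mode, like the input and output in the assembly programs I once wrote, where no keystroke is buffered at all, but my experiments showed it does not work that way: there was still buffering. Why? Here is a passage from a post on Google Groups: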

setbuf() has to do with the delivery of bytes between the C library FILE* management layer and the OS I/O layer.

Calls to fread(), fgets(), fgetc(), and getchar() work within whatever FILE* buffered data is available, and when that data is exhausted, the calls request that the FILE* buffer be refilled by the system I/O layer.

When full buffering is turned on, that refill operation results in the FILE* layer requesting that the operating system hand it a full buffer's worth of data; when buffering is turned off, that refill operation results in the FILE* layer requesting that the operating system return a single character.

Your error is in assuming that the operating system layer in question is dealing with raw bytes directly from the terminal. That is not the case. Instead, the relevant operating system layer is dealing with bytes returned by the terminal device driver -- and the device driver does not pass those bytes up to the operating system layer until the device driver is ready to do so.

As I indicated before, setting an input stream to be unbuffered does NOT tell the operating system to tell the device driver to go into any kind of "raw" single-character mode. There are system-specific calls such as ioctl() and tcsetattr() that control what the device driver will do.

In Unix-type systems, the terminal device driver by default works on a line at a time, not passing the line onward until it detects a sequence that indicates end-of-line. When the Unix-type 'line disciplines' are in effect, you can edit the line in various ways before allowing it to be passed to the operating system.

For example, you might type cad and then realize you mistyped and so press the deletion key and type an r; if you were to do so, and then press return, it would be the word car that was passed to the next layer, *not* the series of keys cad<delete>r.

The device driver buffers the input to allow you to edit it, and setting your input stream to unbuffered in your program does NOT affect that device driver buffering.

If you want to do single-character I/O and you will worry about things like inline editing yourself in your program, then you will need to use system-specific calls to enable that I/O mode.

Before you head down that path, you should keep in mind that you cannot handle mouse-highlight and copy and paste operations just by looking at the key presses themselves: you have to work with the graphical layer to do that, and that can get very messy.

Because of that, character-by-character I/O is probably best reserved for interaction with non-graphical devices such as modems and serial ports. If you -really- want character-by-character I/O, such as because you are programming a graphical game, then it is probably best to find a pre-written library that handles the dirty work for you.
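Following the post's advice, character-at-a-time input from a terminal requires changing the terminal driver's settings rather than the stdio buffering mode. The sketch below uses the POSIX termios calls tcgetattr and tcsetattr (the function names enter_char_mode and echo_until_q are hypothetical) to turn off canonical mode and driver echo, then echoes keystrokes one by one until 'q'. It only has an effect when standard input is a terminal, and a real program should also restore the saved settings on signals and abnormal exits.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <termios.h>
#include <unistd.h>

/* Switch the terminal on fd to non-canonical mode with echo off, so that
 * read(2) returns after every keystroke. The old settings are stored in
 * *saved for later restoration. Returns 0 on success, -1 if fd is not a
 * terminal or the driver rejects the change. */
int enter_char_mode(int fd, struct termios *saved)
{
    struct termios t;

    if (tcgetattr(fd, saved) != 0)
        return -1;
    t = *saved;
    t.c_lflag &= ~(tcflag_t)(ICANON | ECHO); /* no line editing, no driver echo */
    t.c_cc[VMIN] = 1;                        /* read() returns after 1 byte */
    t.c_cc[VTIME] = 0;                       /* no inter-byte timeout */
    return tcsetattr(fd, TCSANOW, &t) == 0 ? 0 : -1;
}

/* Echo keystrokes one at a time until 'q' or end of input. */
void echo_until_q(void)
{
    struct termios saved;
    int c;

    if (enter_char_mode(STDIN_FILENO, &saved) != 0)
        return;                              /* not a terminal: nothing to do */
    while ((c = getchar()) != EOF && c != 'q') {
        putchar(c);
        fflush(stdout);                      /* don't wait for '\n' */
    }
    tcsetattr(STDIN_FILENO, TCSANOW, &saved); /* restore the driver settings */
}
```

Note that the stdio buffering of stdin is left alone: once the driver delivers each byte as it arrives, the read issued by getchar simply returns one byte at a time.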

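Setting a stream to unbuffered in a program does not mean the operating system stops buffering (raw mode); the terminal device driver buffers as well. A user-supplied buffer should be a global or static variable, or otherwise stay valid until the stream is closed. Finally, when reading a file we usually test whether the return value equals EOF to detect the end, so why do we also need feof? The common explanation is that EOF is a valid character in binary files, so a read loop could stop before the file really ends, and that feof fixes this by judging the end from the file length. That explanation is not quite right. getc returns each byte as an unsigned char converted to int, so a 0xFF byte comes back as 255 and never equals EOF (a negative int); the premature stop only happens when the result is stored in a char. The real reason is that EOF is returned both at end of file and on a read error: feof() and ferror() test the stream's end-of-file and error indicators so the two cases can be told apart, and feof() only becomes true after a read has actually hit the end of the file.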
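A sketch of the idiom: keep the result of getc in an int, and after the loop use feof and ferror to find out why it stopped. The function name is hypothetical.

```c
#include <stdio.h>

/* Count the bytes in fp. c must be an int: getc() returns each byte as an
 * unsigned char converted to int (0..UCHAR_MAX), and returns EOF, a negative
 * value, only at end of file or on a read error. */
long count_bytes(FILE *fp, int *hit_eof, int *hit_error)
{
    long n = 0;
    int c;

    while ((c = getc(fp)) != EOF)
        n++;
    /* EOF alone does not say why the loop stopped; the stream flags do. */
    *hit_eof = feof(fp) != 0;
    *hit_error = ferror(fp) != 0;
    return n;
}
```

For a file containing the three bytes 0x41, 0xFF, 0x42, count_bytes should return 3 with the end-of-file flag set and the error flag clear. Storing c in a char instead could make the loop stop at the 0xFF byte on platforms where char is signed.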
