A Brief Analysis of IO Streams in Java

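Java's IO class library is large, and when I first learned it I found the classes hard to tell apart. In the end, though, Java streams come down to three underlying mechanisms: plain byte streams, buffered byte streams, and conversion streams. The most basic is the plain byte stream, which reads bytes from the disk into memory. In practice, however, some special requirements came up, so the Java language designers introduced buffered byte streams and conversion streams. Every Java IO class handles IO using one or more of these three kinds of streams. After introducing the three concepts, this article looks at some of the Java IO classes.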

1. Plain byte streams

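Take FileInputStream as an example. FileInputStream can read from the disk in two ways: one byte at a time, or one byte array at a time. The actual IO time depends on the size of the byte array. Figure 1 shows the sample code, and Figure 3 shows how the read/write time of the Figure 1 code changes with the array size. As the array grows, the read/write time drops, mainly because less time is spent on disk seeks and rotational latency. When disk access is slow, the time spent reading and writing memory is negligible by comparison. Disk access time consists of seek time, rotational latency, and transfer time. Transfer time is negligible compared with seek time plus rotational latency, which together are called the positioning time here [1]. Locating the first byte of a block is expensive, while the remaining bytes of that block cost almost nothing. As the array grows from 32 to 1024*16 bytes, the number of times the first byte of a block has to be located drops linearly. The positioning time therefore drops linearly too, and so does the total IO time.

As the array grows further, from 1024*8 to 1024*1024*16 bytes, the positioning time is already so low that it is negligible compared with the transfer time, and the IO time levels off. When the array grows even larger, the read/write time starts to rise again; I have not yet found the reason. For large arrays (more than 1024*1024*4 bytes), read(byte[]) also spends time on something other than the actual reading and writing. The test code is in Figure 2, and the measurements are in the table attached to Figure 3. I do not yet understand this mechanism; explaining it may require a closer look at the JVM's underlying implementation. When computing the read/write times in Figure 3, this extra time should be subtracted.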

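import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class Demo01_Copy {

    public static void main(String[] args) throws IOException {
        File src = new File("e:\\foxit_Offline_FoxitInst.exe");
        File dest = new File("e:\\ithema\\foxit_Offline_FoxitInst.exe");

        byte[] bytes = new byte[1024 * 128]; // change the array size to see how the IO time changes
        long time1 = System.currentTimeMillis();
        copyFile2(src, dest, bytes);
        long time2 = System.currentTimeMillis();
        System.out.println(time2 - time1);
    }

    // Copies src to dest one byte array at a time; try-with-resources closes both streams even on error
    public static void copyFile2(File src, File dest, byte[] bytes) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream os = new FileOutputStream(dest)) {
            int len;
            // read(byte[]) returns the number of bytes read, or -1 at end of file
            while ((len = in.read(bytes)) != -1) {
                os.write(bytes, 0, len);
            }
        }
    }
}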

Figure 1: Copying a file one byte array at a time with FileInputStream

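import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class Demo02_Copy {
    public static void main(String[] args) throws IOException {
        // 1.txt is an empty file, so the time measured here is overhead, not actual reading/writing
        File src = new File("e:\\1.txt");
        File dest = new File("e:\\ithema\\1.txt");

        byte[] bytes = new byte[1024 * 128]; // change the array size to see how the overhead changes
        long time1 = System.currentTimeMillis();
        copyFile2(src, dest, bytes);
        long time2 = System.currentTimeMillis();
        System.out.println(time2 - time1);
    }

    // Same copy loop as Figure 1
    public static void copyFile2(File src, File dest, byte[] bytes) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream os = new FileOutputStream(dest)) {
            int len;
            while ((len = in.read(bytes)) != -1) {
                os.write(bytes, 0, len);
            }
        }
    }
}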

Figure 2: Measuring the time spent on things other than disk and memory reads/writes (1.txt is an empty file)

Figure 3: Total read/write time as the byte array size changes (the line chart is drawn from the blue-shaded cells of the table)
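To reproduce a curve like Figure 3 on your own machine, you can run the Figure 1 loop over a range of array sizes. The harness below is a sketch, not the author's test code. The class name BufferSizeSweep, the 8 MB random temp file, and the factor-of-4 size steps are assumptions made for illustration. In the spirit of Figure 2, each size is also run against an empty temp file as a baseline. Single-run timings are noisy because JIT warm-up and the OS page cache both affect them, so treat the numbers as indicative only.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Random;

public class BufferSizeSweep {

    // Copies src to dest with the Figure 1 loop and the given array size; returns bytes copied
    public static long copy(Path src, Path dest, int bufferSize) throws IOException {
        byte[] buf = new byte[bufferSize];
        long total = 0;
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dest.toFile())) {
            int len;
            while ((len = in.read(buf)) != -1) {
                out.write(buf, 0, len);
                total += len;
            }
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        // Assumed test data: 8 MB of random bytes, plus an empty file for the baseline
        Path src = Files.createTempFile("sweep-src", ".bin");
        Path empty = Files.createTempFile("sweep-empty", ".bin");
        Path dest = Files.createTempFile("sweep-dest", ".bin");
        byte[] data = new byte[8 * 1024 * 1024];
        new Random(42).nextBytes(data);
        Files.write(src, data);

        for (int size = 32; size <= 16 * 1024 * 1024; size *= 4) {
            long t0 = System.nanoTime();
            copy(src, dest, size);
            long t1 = System.nanoTime();
            copy(empty, dest, size); // same calls, no data: overhead only (cf. Figure 2)
            long t2 = System.nanoTime();
            System.out.printf("%,10d bytes: copy %8.1f ms, empty-file baseline %7.3f ms%n",
                    size, (t1 - t0) / 1e6, (t2 - t1) / 1e6);
        }
        Files.delete(src);
        Files.delete(empty);
        Files.delete(dest);
    }
}
```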

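When the array size grows from 32 to 1024*16 bytes, the IO time drops linearly. This follows from how FileInputStream implements read(byte[]), whose source is shown in Figure 4. read(byte b[]) itself is not native: it delegates to the native method readBytes(), which reads a whole array-sized block in one call. As a result, only the first byte of the block pays the long positioning cost, and the remaining bytes of the block are nearly free. The read() method, by contrast, reads one byte at a time, and every byte requires a fresh disk positioning.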
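One concrete way to see the difference between read() and read(byte[]) is to count how many calls each needs. Every call on a FileInputStream goes through a native method, and on typical platforms each one is a separate read system call. The sketch below wraps an in-memory ByteArrayInputStream in a hypothetical CountingInputStream. Using memory instead of a file is an assumption: no disk is involved, so the sketch shows call counts only, not timings.

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;

public class ReadCallCounter {

    // Hypothetical wrapper that counts how often each read method is called
    static class CountingInputStream extends FilterInputStream {
        long singleByteCalls = 0;
        long arrayCalls = 0;

        CountingInputStream(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            singleByteCalls++;
            return super.read();
        }

        // FilterInputStream.read(byte[]) forwards to this method, so array reads are counted here
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            arrayCalls++;
            return super.read(b, off, len);
        }
    }

    public static long callsOneByteAtATime(byte[] data) throws IOException {
        try (CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data))) {
            while (in.read() != -1) {
                // discard the byte; only the number of calls matters
            }
            return in.singleByteCalls;
        }
    }

    public static long callsWithArray(byte[] data, int arraySize) throws IOException {
        try (CountingInputStream in = new CountingInputStream(new ByteArrayInputStream(data))) {
            byte[] buf = new byte[arraySize];
            while (in.read(buf) != -1) {
                // discard the block
            }
            return in.arrayCalls;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] data = new byte[10_000];
        System.out.println("read():       " + callsOneByteAtATime(data) + " calls");
        System.out.println("read(byte[]): " + callsWithArray(data, 1024) + " calls");
    }
}
```

For 10,000 bytes it prints 10001 calls for read(): one per byte plus the final call that returns -1. It prints 11 calls for read(byte[]) with a 1024-byte array: nine full blocks, one partial block, and the final -1.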

public class FileInputStream extends InputStream
{
    public int read(byte b[]) throws IOException {
        return readBytes(b, 0, b.length);
    }
    
     /**
     * Reads a subarray as a sequence of bytes.
     * @param b the data to be written
     * @param off the start offset in the data
     * @param len the number of bytes that are written
     * @exception IOException If an I/O error has occurred.
     */
    private native int readBytes(byte b[], int off, int len) throws IOException;
}

Figure 4: Source of FileInputStream.read(byte[])

2. Buffered byte streams

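Suppose you want to write a program that counts the lines in a text file. One approach is to call read() to read one byte at a time from the disk into memory and check whether it is the newline character '\n' [2]. This approach has been shown to be inefficient.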

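A better approach is to use a buffered byte stream. It first reads a buffer-sized block of bytes from the disk into a read buffer in memory, then hands out the bytes in that buffer one at a time, and the program checks each of them for '\n'. Figure 5 shows the source of BufferedInputStream. It reads a buffer-sized block from the disk into the buffer and then returns the buffered bytes one by one. Once the buffer has been consumed, it calls fill() to refill it. The default buffer size of BufferedInputStream is 8192 bytes. Figure 6 compares the buffered byte stream with the plain byte stream: the copy with buffered streams takes only 8 ms, while the plain byte stream without a buffer takes 567 ms. Figure 7 illustrates how the buffered byte stream in Figure 6 reads and writes the file.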

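public class BufferedInputStream extends FilterInputStream {
    private static int DEFAULT_BUFFER_SIZE = 8192;
    // (excerpt: other fields such as buf, count, pos, markpos and marklimit are omitted)

    public synchronized int read() throws IOException {
        // When every byte in the buffer has been read, call fill() to read a block from the disk into the buffer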
        if (pos >= count) {
            fill();
            if (pos >= count)
                return -1;
        }
        // Return the next byte from the buffer (as a value 0-255)
        return getBufIfOpen()[pos++] & 0xff;
    }

    private void fill() throws IOException {
        byte[] buffer = getBufIfOpen();
        // markpos is initialized to -1 (no mark set)
        if (markpos < 0)
            pos = 0;            /* no mark: throw away the buffer */
        else if (pos >= buffer.length)  /* no room left in buffer */
            if (markpos > 0) {  /* can throw away early part of the buffer */
                int sz = pos - markpos;
                System.arraycopy(buffer, markpos, buffer, 0, sz);
                pos = sz;
                markpos = 0;
            } else if (buffer.length >= marklimit) {
                markpos = -1;   /* buffer got too big, invalidate mark */
                pos = 0;        /* drop buffer contents */
            } else if (buffer.length >= MAX_BUFFER_SIZE) {
                throw new OutOfMemoryError("Required array size too large");
            } else {            /* grow buffer */
                int nsz = (pos <= MAX_BUFFER_SIZE - pos) ?
                        pos * 2 : MAX_BUFFER_SIZE;
                if (nsz > marklimit)
                    nsz = marklimit;
                byte nbuf[] = new byte[nsz];
                System.arraycopy(buffer, 0, nbuf, 0, pos);
                if (!bufUpdater.compareAndSet(this, buffer, nbuf)) {
                    // Can't replace buf if there was an async close.
                    // Note: This would need to be changed if fill()
                    // is ever made accessible to multiple threads.
                    // But for now, the only way CAS can fail is via close.
                    // assert buf == null;
                    throw new IOException("Stream closed");
                }
                buffer = nbuf;
            }
        count = pos;
        // Read up to one buffer's worth of bytes from the underlying stream (the disk) into the buffer
        int n = getInIfOpen().read(buffer, pos, buffer.length - pos);
        if (n > 0)
            count = n + pos;
    }
}

Figure 5: Source of BufferedInputStream.read() and fill()
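The core of read() and fill() in Figure 5 can be reduced to a few lines. The following is a stripped-down sketch, not the JDK class. SimpleBufferedInput is a hypothetical name, and the class has no mark/reset support and is not synchronized. It also counts its refills so that the effect of the buffer is visible.

```java
import java.io.ByteArrayInputStream;
import java.io.Closeable;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical simplified buffered stream: no mark/reset, not thread-safe
public class SimpleBufferedInput implements Closeable {
    private final InputStream in;
    private final byte[] buf;
    private int pos = 0;    // index of the next byte to return
    private int count = 0;  // number of valid bytes currently in buf
    private int fills = 0;  // number of bulk reads from the underlying stream

    public SimpleBufferedInput(InputStream in, int size) {
        if (size <= 0) throw new IllegalArgumentException("size must be > 0");
        this.in = in;
        this.buf = new byte[size];
    }

    // One bulk read refills the whole buffer (like fill() in Figure 5, minus the mark handling)
    private void fill() throws IOException {
        pos = 0;
        count = 0;
        fills++;
        int n = in.read(buf, 0, buf.length);
        if (n > 0) count = n;
    }

    // Returns the next byte as 0-255, or -1 at end of stream (like read() in Figure 5)
    public int read() throws IOException {
        if (pos >= count) {
            fill();
            if (pos >= count) return -1;
        }
        return buf[pos++] & 0xff;
    }

    public int fillCount() {
        return fills;
    }

    @Override
    public void close() throws IOException {
        in.close();
    }

    // Counts lines as "newlines + 1", the same convention as Figure 6
    public static int countLines(InputStream raw, int bufferSize) throws IOException {
        try (SimpleBufferedInput in = new SimpleBufferedInput(raw, bufferSize)) {
            int lines = 1;
            int b;
            while ((b = in.read()) != -1) {
                if (b == '\n') lines++;
            }
            return lines;
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] text = "a\nb\nc".getBytes(StandardCharsets.US_ASCII);
        System.out.println(countLines(new ByteArrayInputStream(text), 8192)); // 3
    }
}
```

With an 8192-byte buffer, reading 20,000 bytes triggers four fill() calls. Two read full blocks, one reads a partial block of 3616 bytes, and the last one hits end of stream.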

import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class Demo03_Copy {

    public static void main(String[] args) throws IOException {
        File src = new File("e:\\settings.xml");
        File dest = new File("e:\\ithema\\settings.xml");

        long time1 = System.currentTimeMillis();
        // Time: 567 ms
        //copyFile1(src, dest);
        // Time: 8 ms
        copyFile3(src, dest);
        long time2 = System.currentTimeMillis();
        System.out.println(time2 - time1);
    }

    // Plain byte stream: every read()/write() goes straight to the file
    public static void copyFile1(File src, File dest) throws IOException {
        try (InputStream in = new FileInputStream(src);
             OutputStream os = new FileOutputStream(dest)) {
            int b;
            int lineSum = 1;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lineSum++;
                }
                os.write(b);
            }
            System.out.println("lineSum:" + lineSum);
        }
    }

    // Buffered byte stream: read()/write() work on an in-memory buffer (8192 bytes by default)
    public static void copyFile3(File src, File dest) throws IOException {
        try (InputStream in = new BufferedInputStream(new FileInputStream(src));
             OutputStream os = new BufferedOutputStream(new FileOutputStream(dest))) {
            int b;
            int lineSum = 1;
            while ((b = in.read()) != -1) {
                if (b == '\n') {
                    lineSum++;
                }
                os.write(b);
            }
            System.out.println("lineSum:" + lineSum);
        }
    }
}

Figure 6: Read/write efficiency of the buffered byte stream vs. the plain byte stream

Figure 7: How the buffered byte stream in Figure 6 reads and writes the file

3. Conversion streams

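A conversion stream converts between bytes and characters using a specified encoding. A conversion stream that read from the disk one byte at a time would be very slow, so conversion streams are generally built on top of byte buffering. Figure 8 shows how InputStreamReader is used, and Figure 9 shows the underlying execution flow of that code. Figure 10 is an annotated walk-through of the InputStreamReader source, and Figure 11 shows the key decoding code. Every Java char occupies a fixed 2 bytes (one UTF-16 code unit). The same character encoded by the conversion stream as UTF-8 takes 1 to 4 bytes. A character outside the Basic Multilingual Plane needs two chars (a surrogate pair), which is why the 4-byte branch in Figure 11 writes both a high and a low surrogate.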
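The 1-to-4-byte range is easy to check directly. The small sketch below encodes one character of each UTF-8 length with String.getBytes(StandardCharsets.UTF_8) and compares the byte count with the number of Java chars. The class name and sample characters are arbitrary choices. Unicode escapes keep the source file independent of its own encoding, and printing code points avoids console encoding issues.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Lengths {

    // Number of bytes the string occupies when encoded as UTF-8
    public static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        // 'A' (U+0041), 'é' (U+00E9), '中' (U+4E2D), and U+1F600 written as a surrogate pair
        String[] samples = {"A", "\u00E9", "\u4E2D", "\uD83D\uDE00"};
        for (String s : samples) {
            System.out.printf("U+%04X: %d UTF-8 byte(s), %d char(s)%n",
                    s.codePointAt(0), utf8Length(s), s.length());
        }
    }
}
```

It prints 1, 2, 3 and 4 UTF-8 bytes for the four samples. Each of the first three is a single char, while U+1F600 needs two chars.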

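import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Demo01_InputStreamReader {
    public static void main(String[] args) throws IOException {
        readUTF();
    }

    // Read one character at a time, decoding the file as UTF-8
    public static void readUTF() throws IOException {
        try (InputStreamReader isr = new InputStreamReader(new FileInputStream("e:\\2.txt"), "UTF-8")) {
            int ch;
            // read() returns one char (0-65535), or -1 at end of stream
            while ((ch = isr.read()) != -1) {
                System.out.println((char) ch);
            }
        }
    }
}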

Figure 8: Reading one character at a time with the conversion stream InputStreamReader

Figure 9: Underlying flow of InputStreamReader.read() (the bytes in the file can be inspected by reading it with FileInputStream)

Figure 10: Annotated walk-through of the InputStreamReader.read() source
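A reader pulls bytes in blocks, so a multi-byte character can be split across two reads. The xflow(...) branches in Figure 11 deal with this case: when fewer bytes remain than the sequence needs, the decoder reports underflow and leaves the incomplete bytes unconsumed. The sketch below imitates that pattern with the public CharsetDecoder API. It is a simplified illustration, not the JDK's internal StreamDecoder, and the class name ChunkedDecode and the caller-chosen chunk size are assumptions.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.StandardCharsets;

public class ChunkedDecode {

    // Decodes UTF-8 data that "arrives" chunkSize bytes at a time
    public static String decodeInChunks(byte[] data, int chunkSize) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder(); // reports malformed input by default
        ByteBuffer in = ByteBuffer.allocate(chunkSize + 3);   // one chunk plus up to 3 leftover bytes
        CharBuffer out = CharBuffer.allocate(data.length + 1); // UTF-8 never yields more chars than bytes
        for (int off = 0; off < data.length; off += chunkSize) {
            in.put(data, off, Math.min(chunkSize, data.length - off)); // append after any leftovers
            in.flip();
            CoderResult r = decoder.decode(in, out, false);
            if (r.isError()) r.throwException();
            // An incomplete trailing sequence stays unconsumed (UNDERFLOW); keep it for the next chunk
            in.compact();
        }
        in.flip();
        CoderResult r = decoder.decode(in, out, true); // no more input: dangling bytes are now an error
        if (r.isError()) r.throwException();
        decoder.flush(out);
        out.flip();
        return out.toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        byte[] data = "\u4E2D\u6587".getBytes(StandardCharsets.UTF_8); // 6 bytes: E4 B8 AD E6 96 87
        // With 2-byte chunks, each 3-byte character is split across two chunks
        System.out.println(decodeInChunks(data, 2).equals("\u4E2D\u6587")); // true
    }
}
```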

class UTF_8 extends Unicode{
    private CoderResult decodeArrayLoop(ByteBuffer paramByteBuffer, CharBuffer paramCharBuffer)
    {
      byte[] arrayOfByte = paramByteBuffer.array();
      int i = paramByteBuffer.arrayOffset() + paramByteBuffer.position();
      int j = paramByteBuffer.arrayOffset() + paramByteBuffer.limit();
      char[] arrayOfChar = paramCharBuffer.array();
      int k = paramCharBuffer.arrayOffset() + paramCharBuffer.position();
      int m = paramCharBuffer.arrayOffset() + paramCharBuffer.limit();
      int n = k + Math.min(j - i, m - k);
      // Fast path: ASCII bytes (0xxxxxxx, non-negative as signed bytes) are copied straight to chars
      while ((k < n) && (arrayOfByte[i] >= 0))
        arrayOfChar[(k++)] = (char)arrayOfByte[(i++)];
      while (i < j)
      {
        int i1 = arrayOfByte[i];
        if (i1 >= 0)
        {
          if (k >= m)
            return xflow(paramByteBuffer, i, j, paramCharBuffer, k, 1);
          arrayOfChar[(k++)] = (char)i1;
          i++;
        }
        else
        {
          int i2;
          // 2-byte sequence: lead byte 110xxxxx (0xC2-0xDF; 0xC0/0xC1 would be overlong)
          if ((i1 >> 5 == -2) && ((i1 & 0x1E) != 0))
          {
            if ((j - i < 2) || (k >= m))
              return xflow(paramByteBuffer, i, j, paramCharBuffer, k, 2);
            i2 = arrayOfByte[(i + 1)];
            if (isNotContinuation(i2))
              return malformedForLength(paramByteBuffer, i, paramCharBuffer, k, 1);
            arrayOfChar[(k++)] = (char)(i1 << 6 ^ i2 ^ 0xF80);
            i += 2;
          }
          else
          {
            int i3;
            int i4;
            // 3-byte sequence: lead byte 1110xxxx (0xE0-0xEF)
            if (i1 >> 4 == -2)
            {
              i2 = j - i;
              if ((i2 < 3) || (k >= m))
              {
                if ((i2 > 1) && (isMalformed3_2(i1, arrayOfByte[(i + 1)])))
                  return malformedForLength(paramByteBuffer, i, paramCharBuffer, k, 1);
                return xflow(paramByteBuffer, i, j, paramCharBuffer, k, 3);
              }
              i3 = arrayOfByte[(i + 1)];
              i4 = arrayOfByte[(i + 2)];
              if (isMalformed3(i1, i3, i4))
                return malformed(paramByteBuffer, i, paramCharBuffer, k, 3);
              char c = (char)(i1 << 12 ^ i3 << 6 ^ (i4 ^ 0xFFFE1F80));
              if (Character.isSurrogate(c))
                return malformedForLength(paramByteBuffer, i, paramCharBuffer, k, 3);
              arrayOfChar[(k++)] = c;
              i += 3;
            }
            // 4-byte sequence: lead byte 11110xxx, decoded into a surrogate pair (two chars)
            else if (i1 >> 3 == -2)
            {
              i2 = j - i;
              if ((i2 < 4) || (m - k < 2))
              {
                i1 &= 255;
                if ((i1 > 244) || ((i2 > 1) && (isMalformed4_2(i1, arrayOfByte[(i + 1)] & 0xFF))))
                  return malformedForLength(paramByteBuffer, i, paramCharBuffer, k, 1);
                if ((i2 > 2) && (isMalformed4_3(arrayOfByte[(i + 2)])))
                  return malformedForLength(paramByteBuffer, i, paramCharBuffer, k, 2);
                return xflow(paramByteBuffer, i, j, paramCharBuffer, k, 4);
              }
              i3 = arrayOfByte[(i + 1)];
              i4 = arrayOfByte[(i + 2)];
              int i5 = arrayOfByte[(i + 3)];
              int i6 = i1 << 18 ^ i3 << 12 ^ i4 << 6 ^ (i5 ^ 0x381F80);
              if ((isMalformed4(i3, i4, i5)) || (!Character.isSupplementaryCodePoint(i6)))
                return malformed(paramByteBuffer, i, paramCharBuffer, k, 4);
              arrayOfChar[(k++)] = Character.highSurrogate(i6);
              arrayOfChar[(k++)] = Character.lowSurrogate(i6);
              i += 4;
            }
            else
            {
              return malformed(paramByteBuffer, i, paramCharBuffer, k, 1);
            }
          }
        }
      }
      return xflow(paramByteBuffer, i, j, paramCharBuffer, k, 0);
    }
}

Figure 11: UTF_8.decodeArrayLoop(), which decodes bytes into chars
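The XOR expressions in Figure 11 are hard to read because the bytes there are sign-extended ints. The sketch below puts the usual mask-and-shift form next to the XOR form copied from Figure 11, for 2-byte and 3-byte sequences; the class and method names are made up for illustration. The constants 0xF80 and 0xFFFE1F80 are exactly the XOR of the sign-extension and prefix bits carried by the shifted lead and continuation bytes. They therefore cancel out and leave only the payload bits.

```java
public class Utf8BitDecode {

    // Readable form: drop the prefix bits and concatenate the payload bits
    public static char decode2(byte b0, byte b1) {
        return (char) (((b0 & 0x1F) << 6) | (b1 & 0x3F));
    }

    public static char decode3(byte b0, byte b1, byte b2) {
        return (char) (((b0 & 0x0F) << 12) | ((b1 & 0x3F) << 6) | (b2 & 0x3F));
    }

    // XOR form from Figure 11; assigning a byte to an int sign-extends it, as in the JDK code
    public static char decode2Xor(byte b0, byte b1) {
        int i1 = b0, i2 = b1;
        return (char) (i1 << 6 ^ i2 ^ 0xF80);
    }

    public static char decode3Xor(byte b0, byte b1, byte b2) {
        int i1 = b0, i3 = b1, i4 = b2;
        return (char) (i1 << 12 ^ i3 << 6 ^ (i4 ^ 0xFFFE1F80));
    }

    public static void main(String[] args) {
        byte[] e = {(byte) 0xC3, (byte) 0xA9};                // U+00E9
        byte[] zh = {(byte) 0xE4, (byte) 0xB8, (byte) 0xAD};  // U+4E2D
        System.out.printf("%04X %04X%n", (int) decode2(e[0], e[1]), (int) decode2Xor(e[0], e[1]));
        System.out.printf("%04X %04X%n",
                (int) decode3(zh[0], zh[1], zh[2]), (int) decode3Xor(zh[0], zh[1], zh[2]));
    }
}
```

The two lines print 00E9 00E9 and 4E2D 4E2D, showing that both forms agree.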

4. Commonly used IO classes: FileReader and BufferedReader

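FileReader(String fileName) is equivalent to InputStreamReader(new FileInputStream(String fileName)); both decode with the platform default charset, as Figure 12 shows. See Section 3 for the implementation. BufferedReader is implemented differently from FileReader, and Figure 13 compares their performance. Figure 14 shows BufferedReader in use, for comparison with the InputStreamReader(new FileInputStream(String fileName)) usage in Figure 8. Figure 15 shows the underlying execution flow of the Figure 14 code, and Figure 16 is an annotated walk-through of the BufferedReader.read() source. Both BufferedReader and FileReader call CharsetDecoder.decode() to convert bytes into characters. The difference is that BufferedReader converts 8192 characters at a time (Figure 15), whereas FileReader converts only 2 characters at a time (Figure 9). However, both have an 8192-byte byte buffer, so the efficiency difference between BufferedReader and FileReader is small.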
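FileReader uses the platform default charset, and constructors that take a Charset were only added in Java 11. A UTF-8 file can therefore be misread on a system whose default charset differs. The minimal sketch below assumes UTF-8 input. It wraps an InputStreamReader with an explicit charset in a BufferedReader and counts lines with readLine(); the class and method names are hypothetical.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StringReader;
import java.nio.charset.StandardCharsets;

public class ReaderLineCount {

    // Counts lines with readLine(); BufferedReader uses an 8192-char buffer by default
    public static int countLines(Reader source) throws IOException {
        try (BufferedReader br = new BufferedReader(source)) {
            int lines = 0;
            while (br.readLine() != null) {
                lines++;
            }
            return lines;
        }
    }

    // Assumes the file is UTF-8; the charset is named explicitly instead of relying on the default
    public static int countLinesUtf8(String path) throws IOException {
        return countLines(new InputStreamReader(new FileInputStream(path), StandardCharsets.UTF_8));
    }

    public static void main(String[] args) throws IOException {
        System.out.println(countLines(new StringReader("one\ntwo\nthree\n"))); // 3
        // e.g. countLinesUtf8("e:\\2.txt") for the test file used in Figure 8
    }
}
```

Note that readLine() does not count a trailing newline as an extra empty line. The sample text therefore gives 3, whereas the "newlines + 1" convention of Figure 6 would give 4.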

public class FileReader extends InputStreamReader {
    public FileReader(String fileName) throws FileNotFoundException {
        super(new FileInputStream(fileName));
    }
}

Figure 12: The FileReader(String fileName) constructor

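import java.io.BufferedReader;
import java.io.BufferedWriter;
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;

public class Demo01_Copy {

    public static void main(String[] args) throws IOException {
        File src = new File("e:\\foxit_Offline_FoxitInst.exe");
        File dest = new File("e:\\ithema\\foxit_Offline_FoxitInst.exe");

        long time1 = System.currentTimeMillis();
        // Time: 3801 ms
        //copyFile5(src, dest);
        // Time: 2938 ms
        copyFile6(src, dest);
        long time2 = System.currentTimeMillis();
        System.out.println(time2 - time1);
    }

    // Note: Reader/Writer decode and re-encode text, so copying a binary file such as an .exe
    // this way will generally corrupt it; here the file serves only as input for the timing test.

    // FileReader/FileWriter: one char per read()/write() call
    public static void copyFile5(File src, File dest) throws IOException {
        try (FileReader fr = new FileReader(src);
             FileWriter fw = new FileWriter(dest)) {
            int ch;
            while ((ch = fr.read()) != -1) {
                fw.write(ch);
            }
        }
    }

    // BufferedReader/BufferedWriter: adds an 8192-char buffer on each side
    public static void copyFile6(File src, File dest) throws IOException {
        try (BufferedReader br = new BufferedReader(new FileReader(src));
             BufferedWriter bw = new BufferedWriter(new FileWriter(dest))) {
            int ch;
            while ((ch = br.read()) != -1) {
                bw.write(ch);
            }
        }
    }
}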

Figure 13: Performance comparison of FileReader and BufferedReader

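import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class Demo01_BufferedReader {
    public static void main(String[] args) throws IOException {
        readUTF();
    }

    // Read one character at a time through BufferedReader, decoding the file as UTF-8
    public static void readUTF() throws IOException {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(new FileInputStream("e:\\2.txt"), "UTF-8"))) {
            int ch;
            while ((ch = br.read()) != -1) {
                System.out.println((char) ch);
            }
        }
    }
}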

Figure 14: Reading one character at a time with BufferedReader (compare with Figure 8)

Figure 15: Underlying flow of BufferedReader.read() (compare with Figure 9)

Figure 16: Annotated walk-through of the BufferedReader.read() source (compare with Figure 10)

5. Summary

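The plain byte stream is the foundation and the simplest, most efficient kind of stream. If there is no special requirement and the goal is simply to read and write files efficiently, choose a suitable byte array size and read one array-sized block from the disk at a time. This is the most efficient approach.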
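On Java 9 and later, the standard library provides this array-at-a-time copy ready-made. The default implementation of InputStream.transferTo(OutputStream) reads into an internal byte array and writes it out, like the Figure 1 loop, though some stream classes override it with faster paths. Files.copy copies directly between paths. The brief sketch below uses temp files and the class name BulkCopy as assumptions for illustration; with transferTo, the JDK rather than the caller chooses the buffer size.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class BulkCopy {

    // Java 9+: transferTo runs the read-block/write-block loop internally and returns the byte count
    public static long copyWithTransferTo(InputStream in, OutputStream out) throws IOException {
        return in.transferTo(out);
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical temp files stand in for the article's test files
        Path src = Files.createTempFile("bulk-src", ".bin");
        Path dest = Files.createTempFile("bulk-dest", ".bin");
        Files.write(src, new byte[1_000_000]);

        try (InputStream in = Files.newInputStream(src);
             OutputStream out = Files.newOutputStream(dest)) {
            System.out.println("copied " + copyWithTransferTo(in, out) + " bytes"); // copied 1000000 bytes
        }

        // Alternative: hand the whole copy to the library
        Files.copy(src, dest, StandardCopyOption.REPLACE_EXISTING);

        Files.delete(src);
        Files.delete(dest);
    }
}
```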

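Buffered byte streams are designed for special requirements such as counting lines or reading line by line. Reading directly from the disk one byte at a time is slow. It is more efficient to first read a buffer-sized block from the disk into the buffer (in memory), then read the buffer byte by byte and check whether each byte ends a line ('\n').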

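Conversion streams convert between bytes and characters using a specified encoding. A conversion stream that read from the disk one byte at a time would also be inefficient, so conversion streams are generally built on top of byte buffering.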

References:

[1] Computer Systems: A Programmer's Perspective, 3rd ed., Section 6.1 Storage Technologies, Disk Operation.

[2] Computer Systems: A Programmer's Perspective, 3rd ed., Section 10.5.2 Rio Buffered Input Functions.

Attachments:

Test files used in this article: 1.txt, 2.txt, foxit_Offline_FoxitInst.exe, settings.xml
