Java character streams and byte streams

Date: 2019-11-11

Tags: java, character streams, byte streams  Column: Java


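When Java code that works on a text document (copying it or displaying it) produces garbled text, there are generally two places to look:

1. The encoding of the text file itself.

2. The encoding the Java code uses when it processes the text file.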

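One point to note: neither the copyFileByByte method nor the copyFileByChar1 method can set the encoding of the destination file, and either one can produce garbled text if used carelessly. There is an important difference, though. If a file copied by copyFileByByte looks garbled, you can still fix it by reopening it with the correct encoding and using Save As with another encoding, because a byte copy preserves the original bytes exactly. The same operation cannot remove the garbling from a file produced by copyFileByChar1, because a character copy decodes the bytes and re-encodes them, and any bytes that could not be decoded are replaced and lost.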
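The difference between the two copies can be reproduced in memory. The sketch below is a hypothetical demo (the class LossyDecodeDemo and its method charCopy are not from the article); it assumes a GBK-encoded source and simulates a character copy as "decode as UTF-8, then re-encode as UTF-8". The sample text is written with Unicode escapes so the demo itself does not depend on the encoding of the .java file.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo: a byte copy keeps the original bytes, a wrong-charset char copy does not.
public class LossyDecodeDemo {

    static final String TEXT = "\u6d4b\u8bd5"; // sample Chinese text, written as escapes

    // Simulates a character-stream copy: decode with one charset, re-encode with another.
    // Bytes that are malformed in readAs become U+FFFD, so the original bytes are gone.
    static byte[] charCopy(byte[] original, Charset readAs, Charset writeAs) {
        String decoded = new String(original, readAs);
        return decoded.getBytes(writeAs);
    }

    public static void main(String[] args) {
        Charset gbk = Charset.forName("GBK");
        byte[] original = TEXT.getBytes(gbk); // assumed GBK-encoded source file

        // Byte copy: the bytes are untouched, so decoding them as GBK later restores the text
        byte[] byteCopy = Arrays.copyOf(original, original.length);
        System.out.println(new String(byteCopy, gbk).equals(TEXT)); // true

        // Char copy that reads the GBK bytes as UTF-8: they are malformed, so information is lost
        byte[] viaChars = charCopy(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        System.out.println(Arrays.equals(original, viaChars));      // false
        System.out.println(new String(viaChars, gbk).equals(TEXT)); // false
    }
}
```

No choice of "Save As" encoding can bring back the text in the last line, because the replacement characters carry no trace of the bytes they replaced.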

Suppose we read a UTF-8 encoded .txt file as a byte stream:

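package com.audi;

import java.io.BufferedReader;
import java.io.ByteArrayOutputStream;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.FileOutputStream;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UnsupportedEncodingException;

// The extra imports are used by the methods added to this class later in the article
public class ReadFile
{
    File fileName = new File("C:/Users/Mike/Desktop/爲何使用接口.txt");
    public void readFileByByte()
    {
        InputStream inputStream = null;
        try
        {
            inputStream = new FileInputStream(fileName);
            byte[] temp = new byte[2048];
            // Collect all bytes before decoding: decoding each chunk separately could split
            // a multi-byte UTF-8 character across two reads and garble it
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            int length = 0;
            while (-1 != (length = inputStream.read(temp)))
            {
                buf.write(temp, 0, length);
            }
            // Note: UTF-8 is used as the decoding charset here
            System.out.println(buf.toString("utf-8"));
        } catch (FileNotFoundException e)
        {
            e.printStackTrace();
            System.out.println("File C:/Users/Mike/Desktop/爲何使用接口.txt does not exist");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            try
            {
                if (inputStream != null)
                {
                    inputStream.close();
                }
            } catch (Exception e2)
            {
                e2.printStackTrace();
            }
        }
    }

    public void readFileByChar()
    {
        // implemented later in the article
    }
}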
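Why decoding chunk by chunk is risky: a UTF-8 Chinese character takes 3 bytes, so a loop that calls new String(temp, 0, length, "utf-8") on every chunk can cut a character in half whenever a read boundary falls inside it. The hypothetical ChunkDecodeDemo below uses an in-memory ByteArrayInputStream in place of the file and a tiny 4-byte buffer to force the split; with the article's 2048-byte buffer the same thing can happen once a file is larger than one buffer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding per read() chunk versus decoding once at the end.
public class ChunkDecodeDemo {

    static final String TEXT = "\u6d4b\u8bd5"; // two Chinese characters, 3 UTF-8 bytes each

    // Decodes every chunk on its own, like calling new String(temp, 0, length, "utf-8") in the loop.
    static String decodePerChunk(byte[] data, int bufSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(data); // stands in for a FileInputStream
        byte[] temp = new byte[bufSize];
        StringBuilder buf = new StringBuilder();
        int length;
        while ((length = in.read(temp, 0, temp.length)) != -1) {
            buf.append(new String(temp, 0, length, StandardCharsets.UTF_8));
        }
        return buf.toString();
    }

    // Collects all bytes first and decodes once, so no character is split.
    static String decodeAtEnd(byte[] data, int bufSize) {
        ByteArrayInputStream in = new ByteArrayInputStream(data);
        byte[] temp = new byte[bufSize];
        ByteArrayOutputStream all = new ByteArrayOutputStream();
        int length;
        while ((length = in.read(temp, 0, temp.length)) != -1) {
            all.write(temp, 0, length);
        }
        return new String(all.toByteArray(), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] data = TEXT.getBytes(StandardCharsets.UTF_8); // 6 bytes
        // A 4-byte buffer cuts the second character after its first byte
        System.out.println(decodePerChunk(data, 4).equals(TEXT)); // false
        System.out.println(decodeAtEnd(data, 4).equals(TEXT));    // true
    }
}
```

An InputStreamReader avoids the problem too, because its decoder keeps incomplete byte sequences between reads.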

The original content of the text file is:

The test code is as follows:

package com.audi;

public class Test
{
    public static void main(String[] args)
    {
        ReadFile readFile = new ReadFile();
        readFile.readFileByByte();
    }
}

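Console output from running the program to read the text file:

As you can see, the Chinese text is not garbled and everything works. If we do not set the decoding charset in the code above, the program still works. Why?

This is because, when no charset is given, the bytes are decoded at run time (not at compile time) with the JVM's default charset. In my setup that default was UTF-8, the same as the encoding of my Java source file (IDEs such as Eclipse typically launch the program with the project or file encoding as the default), so the decoding happened to match.

If it is switched to GBK, the output is garbled in the same way: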
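A minimal sketch of the point above, assuming a standard JDK that ships the GBK charset: new String(bytes) without a charset is the same as passing Charset.defaultCharset(), and decoding UTF-8 bytes as GBK does not give the original text back. The class DefaultCharsetDemo and its helper decodeWith are hypothetical names. On Java 18 and later the default charset is UTF-8 unless configured otherwise (JEP 400); on older JDKs it comes from the platform or launch settings.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: decoding without a charset falls back to the JVM default charset.
public class DefaultCharsetDemo {

    static final String TEXT = "\u6d4b\u8bd5"; // sample Chinese text, written as escapes

    static String decodeWith(byte[] bytes, Charset cs) {
        return new String(bytes, cs);
    }

    public static void main(String[] args) {
        System.out.println("default charset: " + Charset.defaultCharset());

        byte[] utf8Bytes = TEXT.getBytes(StandardCharsets.UTF_8); // the file is UTF-8
        // new String(bytes) is equivalent to decoding with the default charset
        System.out.println(new String(utf8Bytes).equals(decodeWith(utf8Bytes, Charset.defaultCharset()))); // true
        System.out.println(decodeWith(utf8Bytes, StandardCharsets.UTF_8).equals(TEXT));   // true
        System.out.println(decodeWith(utf8Bytes, Charset.forName("GBK")).equals(TEXT));  // false: garbled
    }
}
```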

What if we want to copy a text file using byte streams?

The Java code is as follows:

public void copyFileByByte()
    {
        InputStream inputStream = null;
        OutputStream outputStream = null;
        File destName = new File("copyFileByByte.txt");
        try
        {
            inputStream = new FileInputStream(fileName);
            outputStream = new FileOutputStream(destName);
            int length =0;
            byte[] temp = new byte[2048];
            while (-1!=(length=inputStream.read(temp)))
            {
                outputStream.write(temp, 0, length);
            }
            outputStream.flush();
        } catch (Exception e)
        {
            e.printStackTrace();
        }
        finally {
            try
            {
                if (inputStream!=null)
                {
                    inputStream.close();
                }
                if (outputStream!=null)
                {
                    outputStream.close();
                }
            } catch (Exception e2)
            {
                e2.printStackTrace();
            }
        }
    }

The test code is as follows:

package com.audi;

public class Test
{
    public static void main(String[] args)
    {
        ReadFile readFile = new ReadFile();
//        readFile.readFileByByte();
        readFile.copyFileByByte();
    }
}

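In my tests, the encoding of the copied file is determined by the source file and always matches it, because the byte copy writes out exactly the bytes it read without decoding them. Garbled text therefore generally does not occur.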
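To check the "same bytes, same encoding" claim, the hypothetical ByteCopyCheck below writes GBK text to a temporary file (an assumption standing in for the article's desktop file), copies it with the same 2048-byte read/write loop as copyFileByByte, and compares the two files byte by byte. It uses try-with-resources instead of the article's explicit finally block; the copying logic is the same.

```java
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;

// Hypothetical check: a byte-stream copy never decodes, so the copy is byte-for-byte identical.
public class ByteCopyCheck {

    static void copyBytes(Path src, Path dest) throws IOException {
        try (InputStream in = new FileInputStream(src.toFile());
             OutputStream out = new FileOutputStream(dest.toFile())) {
            byte[] temp = new byte[2048];
            int length;
            while ((length = in.read(temp)) != -1) {
                out.write(temp, 0, length);
            }
        }
    }

    // Writes GBK text to a temp file, copies it, and reports whether the bytes match.
    static boolean copyPreservesBytes() {
        try {
            Path src = Files.createTempFile("src", ".txt");
            Path dest = Files.createTempFile("dest", ".txt");
            try {
                Files.write(src, "\u6d4b\u8bd5".getBytes(Charset.forName("GBK")));
                copyBytes(src, dest);
                return Arrays.equals(Files.readAllBytes(src), Files.readAllBytes(dest));
            } finally {
                Files.deleteIfExists(src);
                Files.deleteIfExists(dest);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(copyPreservesBytes()); // true
    }
}
```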

Next, character streams are used to work with the text file.

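public void readFileByChar()
    {
        Reader reader = null;
        try
        {
            // FileReader decodes with the JVM's default charset; this constructor takes no charset
            reader = new BufferedReader(new FileReader(fileName));
            int length = 0;
            char[] temp = new char[2048];
            StringBuilder buf = new StringBuilder();
            while (-1 != (length = reader.read(temp)))
            {
                buf.append(new String(temp, 0, length));
            }
            System.out.println(buf.toString());
        } catch (FileNotFoundException e)
        {
            e.printStackTrace();
            System.out.println("File C:/Users/Mike/Desktop/爲何使用接口.txt does not exist");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally {
            try
            {
                // Closing the BufferedReader also closes the wrapped FileReader
                if (reader != null)
                {
                    reader.close();
                }
            } catch (Exception e2)
            {
                e2.printStackTrace();
            }
        }
    }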

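Compared with reading the file as a byte stream, this code differs in two ways:

1. The buffer is now a char[] array.

2. The charset can no longer be given to the new String constructor, because the chars have already been decoded by FileReader, which uses the JVM's default charset (in my setup this matched the encoding of the Java test file). As shown in the figure below, my txt file is also UTF-8, so no garbling occurs.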
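FileReader is a subclass of InputStreamReader that always uses the default charset (Java 11 added a FileReader(File, Charset) constructor). The hypothetical ReaderCharsetDemo below uses InputStreamReader directly over an in-memory stream, standing in for the file, to show that the chars a Reader returns were already decoded with the Reader's charset, so nothing can be fixed later when building the String.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo: a Reader decodes bytes with its own charset before handing out chars.
public class ReaderCharsetDemo {

    static final String TEXT = "\u6d4b\u8bd5"; // sample Chinese text, written as escapes

    // Reads all chars from the bytes, decoded with the given charset.
    static String readAll(byte[] data, Charset cs) {
        try (Reader reader = new InputStreamReader(new ByteArrayInputStream(data), cs)) {
            char[] temp = new char[2048];
            StringBuilder buf = new StringBuilder();
            int length;
            while ((length = reader.read(temp)) != -1) {
                buf.append(temp, 0, length); // no charset here: decoding already happened
            }
            return buf.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] utf8File = TEXT.getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(utf8File, StandardCharsets.UTF_8).equals(TEXT));  // true
        System.out.println(readAll(utf8File, Charset.forName("GBK")).equals(TEXT)); // false
        // new FileReader(file) behaves like readAll(..., Charset.defaultCharset())
        System.out.println(readAll(utf8File, Charset.defaultCharset()).equals(TEXT));
    }
}
```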

Test code:

package com.audi;

public class Test
{
    public static void main(String[] args)
    {
        ReadFile readFile = new ReadFile();
//        readFile.readFileByByte();
//        readFile.copyFileByByte();
        readFile.readFileByChar();
    }
}

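Actual output:

If the encoding of the Java test file is changed to GBK, the output becomes garbled.

Next, the file is copied with character streams. Again, the file encoding cannot be set by hand, because the FileReader and FileWriter constructors used here take no charset: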

public void copyFileByChar1()
    {
        // FileReader and FileWriter both use the JVM's default charset
        FileReader fReader = null;
        FileWriter fWriter = null;
        File destName = new File("copyFileByChar1.txt");
        try
        {
            fReader = new FileReader(fileName);
            fWriter = new FileWriter(destName);
            int length = 0;
            char[] temp = new char[2048];
            while (-1 != (length = fReader.read(temp)))
            {
                fWriter.write(temp, 0, length);
            }
            fWriter.flush();
            // Report success only after the copy has actually completed
            System.out.println("copy succeed");
        } catch (FileNotFoundException e)
        {
            e.printStackTrace();
            System.out.println("File C:/Users/Mike/Desktop/爲何使用接口.txt does not exist");
        }
        catch (IOException e) {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                if (fReader != null)
                {
                    fReader.close();
                }
                if (fWriter != null)
                {
                    fWriter.close();
                }
            } catch (Exception e2)
            {
                e2.printStackTrace();
            }
        }
    }

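Test code. The copied file again ends up in the same encoding as the Java test file, because FileWriter encodes with the same default charset that FileReader decodes with.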

package com.audi;

public class Test
{
    public static void main(String[] args)
    {
        ReadFile readFile = new ReadFile();
//        readFile.readFileByByte();
//        readFile.copyFileByByte();
//        readFile.readFileByChar();
        readFile.copyFileByChar1();
    }
}
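On Java 11 and later, FileReader and FileWriter also have constructors that take a Charset, so a character copy in the style of copyFileByChar1 can pin both encodings. The hypothetical CharCopyWithCharset below is a sketch under that assumption (it will not compile on Java 8); it copies a temporary UTF-8 file and checks that the result is UTF-8 regardless of the JVM's default charset.

```java
import java.io.File;
import java.io.FileReader;
import java.io.FileWriter;
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Hypothetical sketch (Java 11+): FileReader/FileWriter overloads that take an explicit Charset.
public class CharCopyWithCharset {

    static final String TEXT = "\u6d4b\u8bd5"; // sample Chinese text, written as escapes

    static void copyChars(File src, File dest, Charset in, Charset out) throws IOException {
        try (Reader reader = new FileReader(src, in);
             Writer writer = new FileWriter(dest, out)) {
            char[] temp = new char[2048];
            int length;
            while ((length = reader.read(temp)) != -1) {
                writer.write(temp, 0, length);
            }
        }
    }

    // Copies a UTF-8 temp file as UTF-8 and checks the copy decodes back to the same text.
    static boolean roundTripUtf8() {
        try {
            File src = File.createTempFile("src", ".txt");
            File dest = File.createTempFile("dest", ".txt");
            try {
                Files.write(src.toPath(), TEXT.getBytes(StandardCharsets.UTF_8));
                copyChars(src, dest, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
                byte[] copied = Files.readAllBytes(dest.toPath());
                return new String(copied, StandardCharsets.UTF_8).equals(TEXT);
            } finally {
                src.delete();
                dest.delete();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTripUtf8()); // true
    }
}
```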

Below is another way to copy the file, using InputStreamReader and OutputStreamWriter:

public void copyFileByChar2()
    {
        InputStreamReader inputStreamReader =null;
        OutputStreamWriter outputStreamWriter = null;
        File destName = new File("copyFileByChar2.txt");
        
        try
        {
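            /* In fact, only InputStreamReader and OutputStreamWriter let us set the charset here
             * (FileReader and FileWriter only gained Charset constructors in Java 11)
             * */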
            inputStreamReader = new InputStreamReader(new java.io.FileInputStream(fileName),"utf-8");
            outputStreamWriter = new OutputStreamWriter(new java.io.FileOutputStream(destName),"utf-8");
            int length =0;
            char[] temp = new char[2048];
            while (-1!=(length=inputStreamReader.read(temp)))
            {
                outputStreamWriter.write(temp,0,length);
            }
        } catch (UnsupportedEncodingException e1)
        {
            e1.printStackTrace();
        } catch (FileNotFoundException e1)
        {
            e1.printStackTrace();
        } catch (IOException e)
        {
            e.printStackTrace();
        }
        finally
        {
            try
            {
                if (inputStreamReader != null)
                {
                    inputStreamReader.close();
                }
                if (outputStreamWriter != null)
                {
                    // close() flushes first; calling flush() before the null check could throw NullPointerException
                    outputStreamWriter.close();
                }
            } catch (Exception e2)
            {
                e2.printStackTrace();
            }
            System.out.println("Copy finished");
        }
    }

Now the encoding of the copied file is fully under our control and no longer depends on the encoding of the test file.
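Because the reading and writing charsets are chosen independently, the same pair of classes can also convert a file from one encoding to another. The hypothetical Transcode sketch below decodes GBK and encodes UTF-8; it uses in-memory streams in place of files, and the caller is assumed to own (and close) the streams, which is why transcode only flushes the writer.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical sketch: decode with one charset, encode with another (transcoding).
public class Transcode {

    static final String TEXT = "\u6d4b\u8bd5"; // sample Chinese text, written as escapes

    static void transcode(InputStream in, Charset from, OutputStream out, Charset to) throws IOException {
        Reader reader = new InputStreamReader(in, from);
        Writer writer = new OutputStreamWriter(out, to);
        char[] temp = new char[2048];
        int length;
        while ((length = reader.read(temp)) != -1) {
            writer.write(temp, 0, length);
        }
        writer.flush(); // push buffered chars through the encoder into out
    }

    static byte[] gbkToUtf8(byte[] gbkBytes) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try {
            transcode(new ByteArrayInputStream(gbkBytes), Charset.forName("GBK"), out, StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        byte[] gbk = TEXT.getBytes(Charset.forName("GBK"));
        byte[] utf8 = gbkToUtf8(gbk);
        System.out.println(Arrays.equals(utf8, TEXT.getBytes(StandardCharsets.UTF_8))); // true
    }
}
```

For real files, the streams would typically be opened in a try-with-resources block around the call to transcode.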

Now set the source file to UTF-8 and the Java test file to GBK:

Run the test code:

package com.audi;

public class Test
{
    public static void main(String[] args)
    {
        ReadFile readFile = new ReadFile();
//        readFile.readFileByByte();
        readFile.copyFileByByte();
//        readFile.readFileByChar();
        readFile.copyFileByChar1();
        readFile.copyFileByChar2();
    }
}