Java Character Set Encoding

Date: 2019-11-05

1. Character encoding example
(1) Copy the contents of NioTest13_In.txt into NioTest13_Out.txt

import java.io.File;
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CharsetEncoder;

public class NioTest13 {

    public static void main(String[] args) throws Exception {
        String inputFile = "NioTest13_In.txt";
        String outFile = "NioTest13_Out.txt";

        long inputLength = new File(inputFile).length();

        // try-with-resources closes both files (and their channels), even on error
        try (RandomAccessFile inputRandomAccessFile = new RandomAccessFile(inputFile, "r");
             RandomAccessFile outputRandomAccessFile = new RandomAccessFile(outFile, "rw")) {

            FileChannel inputFileChannel = inputRandomAccessFile.getChannel();
            FileChannel outputFileChannel = outputRandomAccessFile.getChannel();

            // Map the whole input file into memory as raw bytes
            MappedByteBuffer inputData = inputFileChannel.map(FileChannel.MapMode.READ_ONLY, 0, inputLength);

            // Uncomment to list every charset this JVM supports:
            // Charset.availableCharsets().forEach((k, v) -> System.out.println(k + ", " + v));

            Charset charset = Charset.forName("iso-8859-1"); // or "utf-8"
            CharsetDecoder decoder = charset.newDecoder(); // bytes -> chars
            CharsetEncoder encoder = charset.newEncoder(); // chars -> bytes

            CharBuffer charBuffer = decoder.decode(inputData);
            ByteBuffer outputData = encoder.encode(charBuffer);

            // "rw" does not truncate an existing file, so clear old content first
            outputFileChannel.truncate(0);
            // write() may not drain the buffer in one call, so loop until empty
            while (outputData.hasRemaining()) {
                outputFileChannel.write(outputData);
            }
        }
    }
}

(2) Create the file NioTest13_In.txt and put some Chinese text in it.

(3) Run the program; it generates NioTest13_Out.txt.

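The Chinese text in NioTest13_Out.txt displays correctly with either charset, Charset.forName("iso-8859-1") or Charset.forName("utf-8").

With utf-8 this works because the input file is (presumably) UTF-8 encoded, so decoding and then re-encoding reproduces the same bytes. With iso-8859-1 it works for a different reason. ISO-8859-1 maps each of the 256 possible byte values to exactly one character, so the decode/encode round trip returns the original bytes unchanged. This holds even though the characters in the intermediate CharBuffer are not the real Chinese characters.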
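The explanation above can be checked without any files. The sketch below runs the same decoder/encoder pair as NioTest13 over the UTF-8 bytes of "中文". It works on an in-memory byte array instead of the two text files, and the names RoundTripDemo and roundTrip are hypothetical. It assumes, as above, that the input is UTF-8. The last case feeds GBK bytes to a UTF-8 decoder. A decoder obtained from newDecoder() reports malformed input by throwing rather than silently replacing it.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class RoundTripDemo {

    // Decode with the charset, then encode back: the same steps as NioTest13,
    // but on an in-memory byte array instead of a mapped file
    static byte[] roundTrip(byte[] input, Charset charset) throws CharacterCodingException {
        CharBuffer chars = charset.newDecoder().decode(ByteBuffer.wrap(input));
        ByteBuffer encoded = charset.newEncoder().encode(chars);
        byte[] result = new byte[encoded.remaining()];
        encoded.get(result);
        return result;
    }

    public static void main(String[] args) {
        // UTF-8 bytes of U+4E2D U+6587 (written as escapes so the source encoding does not matter)
        byte[] utf8Text = "\u4e2d\u6587".getBytes(StandardCharsets.UTF_8);
        try {
            // ISO-8859-1 maps each byte 0x00-0xFF to one char, so nothing is lost
            System.out.println(Arrays.equals(utf8Text, roundTrip(utf8Text, StandardCharsets.ISO_8859_1))); // true
            // The input really is UTF-8, so a UTF-8 round trip is lossless too
            System.out.println(Arrays.equals(utf8Text, roundTrip(utf8Text, StandardCharsets.UTF_8)));      // true
        } catch (CharacterCodingException e) {
            throw new IllegalStateException(e);
        }

        // 0xD6 0xD0 is U+4E2D in GBK; it is not valid UTF-8, and a fresh decoder
        // reports the malformed input by throwing instead of replacing it
        byte[] gbkBytes = {(byte) 0xD6, (byte) 0xD0};
        try {
            roundTrip(gbkBytes, StandardCharsets.UTF_8);
        } catch (CharacterCodingException e) {
            System.out.println("UTF-8 decoding failed: " + e);
        }
    }
}
```

The same round trip with a charset that is not byte-preserving, such as UTF-8 on a GBK file, fails loudly here. A lenient API like new String(bytes, charset) would instead substitute replacement characters.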

2. Introduction to character encodings

(1) ASCII
Uses 7 bits per character, so it can represent 128 characters in total.
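A quick way to see the 7-bit limit is to ask a US-ASCII encoder whether it can encode a string. In this sketch, the class AsciiDemo and the helper isAscii are illustrative names. Letters map to codes below 128, while 'é' (U+00E9) is rejected.

```java
import java.nio.charset.StandardCharsets;

class AsciiDemo {

    // True if every char in s has a 7-bit US-ASCII code (0-127)
    static boolean isAscii(CharSequence s) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(s);
    }

    public static void main(String[] args) {
        byte[] bytes = "Az".getBytes(StandardCharsets.US_ASCII);
        System.out.println(bytes[0] + " " + bytes[1]); // 65 122
        System.out.println(isAscii("hello"));          // true
        System.out.println(isAscii("caf\u00e9"));      // false: U+00E9 is above 127
    }
}
```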

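(2) ISO-8859-1 (compatible with ASCII)
Uses 8 bits (one byte) per character, so it can represent 256 characters in total; the first 128 are identical to ASCII.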
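The following sketch encodes a few characters with ISO-8859-1; Latin1Demo and latin1 are hypothetical names. An ASCII letter and 'é' each take exactly one byte, while '中' has no ISO-8859-1 code at all. String.getBytes(Charset) does not throw for unmappable characters. It silently substitutes the charset's replacement byte ('?'), which is a common way Chinese text gets lost.

```java
import java.nio.charset.StandardCharsets;

class Latin1Demo {

    // String.getBytes(Charset) never throws: unmappable chars are replaced
    // with the charset's replacement byte, which is '?' (0x3F) for ISO-8859-1
    static byte[] latin1(String s) {
        return s.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // Code points U+0041, U+00E9 and U+4E2D
        for (String s : new String[] {"A", "\u00e9", "\u4e2d"}) {
            byte[] b = latin1(s);
            System.out.printf("U+%04X -> %d byte(s), 0x%02X%n", s.codePointAt(0), b.length, b[0] & 0xFF);
        }
        // U+0041 -> 1 byte(s), 0x41
        // U+00E9 -> 1 byte(s), 0xE9
        // U+4E2D -> 1 byte(s), 0x3F   (replacement '?': the character is lost)
    }
}
```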

(3) GB2312
Uses two bytes to represent each Chinese character.

GBK (a superset of GB2312)
Adds rarely used Chinese characters.
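The sketch below prints the GBK and GB2312 bytes of '中' to show the two-byte encoding. It assumes the runtime provides the GBK and GB2312 charsets. They live in the jdk.charsets module, which standard JDK builds include, but Java does not guarantee them on every platform. GbkDemo and hex are illustrative names.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class GbkDemo {

    // Extended charsets from the jdk.charsets module; present in standard
    // JDK builds, but not required by the Java specification
    static final Charset GBK = Charset.forName("GBK");
    static final Charset GB2312 = Charset.forName("GB2312");

    // Format bytes as space-separated hex, e.g. "D6 D0"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D
        byte[] gbk = zhong.getBytes(GBK);
        System.out.println(hex(gbk)); // D6 D0: two bytes
        // GBK is a superset of GB2312, so GB2312 characters keep the same bytes
        System.out.println(Arrays.equals(gbk, zhong.getBytes(GB2312))); // true
        // ASCII characters still take a single byte
        System.out.println(hex("A".getBytes(GBK))); // 41
    }
}
```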

(4) GB18030: the most complete encoding of Chinese characters; it extends GBK and can encode every Unicode code point.
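This sketch shows what "most complete" means in practice, again assuming the extended charsets are available; the class name is illustrative. A character outside the 16-bit range, such as the emoji U+1F600, cannot be encoded by GBK at all, while GB18030 maps it to four bytes. For a character GBK already covers, both produce the same bytes.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Gb18030Demo {

    // Extended charsets (jdk.charsets module), assumed available
    static final Charset GBK = Charset.forName("GBK");
    static final Charset GB18030 = Charset.forName("GB18030");

    public static void main(String[] args) {
        String zhong = "\u4e2d";                               // U+4E2D
        String emoji = new String(Character.toChars(0x1F600)); // above U+FFFF

        // For a character GBK covers, GB18030 produces the same two bytes
        System.out.println(Arrays.equals(zhong.getBytes(GBK), zhong.getBytes(GB18030))); // true

        // GBK cannot encode U+1F600 at all; GB18030 uses a four-byte sequence
        System.out.println(GBK.newEncoder().canEncode(emoji)); // false
        System.out.println(emoji.getBytes(GB18030).length);    // 4
    }
}
```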

(5) Big5: Traditional Chinese, used mainly in Taiwan and Hong Kong.
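Big5 and GBK assign different bytes to the same character. That is why opening a Big5 file as GBK (or the reverse) produces garbled text. Below is a minimal sketch, assuming both charsets are available in the runtime; Big5Demo is a hypothetical name.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

class Big5Demo {

    // Extended charsets (jdk.charsets module), assumed available
    static final Charset BIG5 = Charset.forName("Big5");
    static final Charset GBK = Charset.forName("GBK");

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // U+4E2D exists in both character sets
        byte[] big5 = zhong.getBytes(BIG5);
        byte[] gbk = zhong.getBytes(GBK);

        // Two bytes in each encoding, but different bytes
        System.out.println(big5.length + " " + gbk.length); // 2 2
        System.out.println(Arrays.equals(big5, gbk));       // false

        // Decoding Big5 bytes as GBK yields some other character (mojibake)
        System.out.println(new String(big5, GBK).equals(zhong)); // false
    }
}
```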

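(6) Unicode: covers the characters of all the world's writing systems. Its original form used two bytes per character. Code points now extend to U+10FFFF, so characters above U+FFFF need more than two bytes.
Drawback: storing English text this way is wasteful, because each ASCII letter takes two bytes instead of one.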
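The sketch below shows both points in Java, whose String stores text as 16-bit char values; UnicodeDemo and codePoints are illustrative names. '中' is the single code point U+4E2D. U+1F600 needs two chars, a surrogate pair. In a two-byte encoding such as UTF-16BE, the five letters of "hello" take ten bytes instead of five.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class UnicodeDemo {

    // The Unicode code points of s (a surrogate pair counts as one)
    static int[] codePoints(String s) {
        return s.codePoints().toArray();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d";
        System.out.printf("U+%04X, %d char(s)%n", zhong.codePointAt(0), zhong.length()); // U+4E2D, 1 char(s)

        // U+1F600 does not fit in 16 bits, so Java stores it as two chars
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.length() + " " + Arrays.toString(codePoints(emoji))); // 2 [128512]

        // The storage drawback for English: 2 bytes per letter instead of 1
        System.out.println("hello".getBytes(StandardCharsets.UTF_16BE).length); // 10
        System.out.println("hello".getBytes(StandardCharsets.US_ASCII).length); // 5
    }
}
```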

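(7) UTF: Unicode Transformation Format
Unicode is a character set: it assigns each character a number, its code point. A UTF is a storage format that turns those numbers into bytes. UTF-8 is one implementation of Unicode.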
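This sketch makes the character-set versus storage-format distinction concrete by encoding the single code point U+4E2D ('中') with three different UTFs. CodePointVsBytes and hex are illustrative names. The code point stays the same; only the bytes change.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class CodePointVsBytes {

    // Format bytes as space-separated hex, e.g. "E4 B8 AD"
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02X ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String zhong = "\u4e2d"; // one character, one code point: U+4E2D
        Charset[] utfs = {StandardCharsets.UTF_8, StandardCharsets.UTF_16BE, StandardCharsets.UTF_16LE};
        for (Charset cs : utfs) {
            System.out.println(cs.name() + ": " + hex(zhong.getBytes(cs)));
        }
        // UTF-8: E4 B8 AD
        // UTF-16BE: 4E 2D
        // UTF-16LE: 2D 4E
    }
}
```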
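1) UTF-16LE (little endian) and UTF-16BE (big endian)
The byte order is signalled by the character U+FEFF, Zero Width No-Break Space, at the start of the file. A big-endian file begins with 0xFE 0xFF; a little-endian file begins with 0xFF 0xFE.

2) UTF-8: a variable-length encoding. ASCII characters take one byte, and a Chinese character usually takes three bytes.

3) BOM (Byte Order Mark): a file with a BOM begins with 0xFEFF (UTF-16BE) or 0xFFFE (UTF-16LE). In UTF-8 the BOM is the three bytes 0xEF 0xBB 0xBF. BOMs usually appear in files created on Windows.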
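The three points above can be sketched in one small program. It prints UTF-8 byte lengths for characters that take 1 to 4 bytes. It also shows that Java's UTF-16 charset writes a big-endian BOM when encoding, while UTF-16LE writes none. Finally, it checks for a BOM at the start of a byte array. detectBom is a hypothetical helper, not a JDK API, and it is simplified: for example, it does not distinguish a UTF-32LE BOM (FF FE 00 00) from UTF-16LE. Java's UTF-8 decoder does not strip a leading BOM; it shows up as the character U+FEFF in the decoded string. Code that reads UTF-8 files created on Windows may therefore need to skip it.

```java
import java.nio.charset.StandardCharsets;

class BomDemo {

    // Hypothetical helper: name the BOM at the start of data, or null if none.
    // Simplified: a UTF-32LE BOM (FF FE 00 00) would be reported as UTF-16LE.
    static String detectBom(byte[] data) {
        if (data.length >= 3 && (data[0] & 0xFF) == 0xEF
                && (data[1] & 0xFF) == 0xBB && (data[2] & 0xFF) == 0xBF) {
            return "UTF-8";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFE && (data[1] & 0xFF) == 0xFF) {
            return "UTF-16BE";
        }
        if (data.length >= 2 && (data[0] & 0xFF) == 0xFF && (data[1] & 0xFF) == 0xFE) {
            return "UTF-16LE";
        }
        return null;
    }

    public static void main(String[] args) {
        // UTF-8 is variable length: U+0041, U+00E9, U+4E2D, U+1F600 take 1, 2, 3, 4 bytes
        String[] samples = {"A", "\u00e9", "\u4e2d", new String(Character.toChars(0x1F600))};
        for (String s : samples) {
            System.out.printf("U+%04X -> %d byte(s)%n", s.codePointAt(0),
                    s.getBytes(StandardCharsets.UTF_8).length);
        }

        // Java's "UTF-16" encoder writes a big-endian BOM; UTF-16LE writes none
        byte[] utf16 = "A".getBytes(StandardCharsets.UTF_16);     // FE FF 00 41
        byte[] utf16le = "A".getBytes(StandardCharsets.UTF_16LE); // 41 00
        System.out.println(detectBom(utf16) + " " + detectBom(utf16le)); // UTF-16BE null

        // Java's UTF-8 decoder keeps the BOM as the char U+FEFF
        byte[] withBom = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF, 0x41};
        String decoded = new String(withBom, StandardCharsets.UTF_8);
        System.out.println(decoded.length() + " " + (decoded.charAt(0) == '\uFEFF')); // 2 true
    }
}
```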