A few notes:
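1. ASCII and ISO-8859-1 are both single-byte encodings: ASCII can represent 128 characters and ISO-8859-1 can represent 256. Neither can represent Chinese. A Chinese character, or any other character outside the ISO-8859-1 code range, is uniformly encoded as 3f (displayed as "?", the so-called "black hole");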
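2. GBK is backward compatible with GB2312. Both encode each Chinese character in two bytes (ASCII characters stay single-byte). GB2312 contains 6,763 Chinese characters and GBK contains 21,003;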
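3. UTF-16 encodes every character in the Basic Multilingual Plane as one 2-byte unit, which greatly simplifies string operations but wastes storage space on ASCII-heavy text. It is not strictly fixed-length: characters outside the BMP take two units (a surrogate pair, 4 bytes). Java's char is a UTF-16 code unit, so strings are represented as UTF-16 in memory (see the source row of the encoding test output);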
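4. UTF-8 is variable-length: different kinds of characters take different numbers of bytes. The original design allowed 1 to 6 bytes, while the current standard (RFC 3629) limits a sequence to 1 to 4 bytes. A Chinese character usually takes 3 bytes. The UTF-8 rules are as follows: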
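The UTF-8 byte layout can be sketched by hand in Java and compared with what `String.getBytes` produces. This is only a minimal illustration, assuming the 1-to-4-byte form from RFC 3629. The class name `Utf8Sketch` and the helpers `encodeUtf8` and `hex` are made up for this example and are not part of the test code in this post. It also checks the "?" replacement from note 1 and the surrogate pair from note 3.

```java
import java.nio.charset.StandardCharsets;

public class Utf8Sketch {

    // Hand-rolled UTF-8 encoder for a single code point (hypothetical helper).
    //   U+0000..U+007F     0xxxxxxx
    //   U+0080..U+07FF     110xxxxx 10xxxxxx
    //   U+0800..U+FFFF     1110xxxx 10xxxxxx 10xxxxxx
    //   U+10000..U+10FFFF  11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    // No validation is done here (surrogates and out-of-range values are not rejected).
    static byte[] encodeUtf8(int cp) {
        if (cp < 0x80) {
            return new byte[] {(byte) cp};
        } else if (cp < 0x800) {
            return new byte[] {
                (byte) (0xC0 | (cp >> 6)),
                (byte) (0x80 | (cp & 0x3F))};
        } else if (cp < 0x10000) {
            return new byte[] {
                (byte) (0xE0 | (cp >> 12)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        } else {
            return new byte[] {
                (byte) (0xF0 | (cp >> 18)),
                (byte) (0x80 | ((cp >> 12) & 0x3F)),
                (byte) (0x80 | ((cp >> 6) & 0x3F)),
                (byte) (0x80 | (cp & 0x3F))};
        }
    }

    // Format bytes as space-separated two-digit hex.
    static String hex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            sb.append(String.format("%02x ", b & 0xFF));
        }
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String wo = "\u6211"; // the Chinese character U+6211

        // Manual encoding and the JDK encoder should agree: e6 88 91
        System.out.println(hex(encodeUtf8(wo.codePointAt(0))));
        System.out.println(hex(wo.getBytes(StandardCharsets.UTF_8)));

        // ISO-8859-1 cannot represent the character, so it becomes 3f ("?")
        System.out.println(hex(wo.getBytes(StandardCharsets.ISO_8859_1)));

        // A character outside the BMP needs a surrogate pair: 4 bytes in UTF-16
        String emoji = new String(Character.toChars(0x1F600));
        System.out.println(emoji.getBytes(StandardCharsets.UTF_16BE).length);
    }
}
```

The sketch uses `UTF_16BE` rather than `UTF_16` because the plain "UTF-16" charset prepends a byte order mark (fe ff) when encoding, which would add two bytes to the count.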
Test code:
package com.test.main;

public class TestCode {

    public static void encode() {
        String name = "淘!我喜歡!";
        // Print the in-memory UTF-16 code units first (the "source" row).
        toHex(name.toCharArray());
        String[] codeType = {"ISO-8859-1", "GB2312", "GBK", "UTF-16", "UTF-8"};
        for (String type : codeType) {
            try {
                // Encode the same string with each charset and dump the bytes.
                byte[] bytes = name.getBytes(type);
                toHex(type, bytes);
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
    }

    // Print a labelled row of bytes in hex.
    private static void toHex(String name, byte[] charArray) {
        System.out.print(String.format("%-15s", name + ":"));
        for (int i = 0; i < charArray.length; i++) {
            System.out.print(String.format("%-4x ", charArray[i]));
        }
        System.out.println();
    }

    // Print the chars of the source string in hex.
    private static void toHex(char[] charArray) {
        System.out.print(String.format("%-15s", "source:"));
        for (int i = 0; i < charArray.length; i++) {
            System.out.print(String.format("%-4x ", (int) charArray[i]));
        }
        System.out.println();
    }

    public static void main(String[] args) {
        encode();
    }
}
Output: