Java's serialization mechanism converts an object into a contiguous sequence of bytes, which can later be restored (deserialized) into the original object.
In Java, for instances of a class to be serializable, the class must implement the Serializable interface. Serializable is a marker interface with no methods; it is defined as follows:
public interface Serializable { }
Define a class Block1 that implements the Serializable interface:
import java.io.Serializable;

class Block1 implements Serializable {
    private int one = 1;
    private int two = 2;
    private int three = 3;

    @Override
    public String toString() {
        return "Block1 [one=" + one + ", two=" + two + ", three=" + three + "]";
    }
}
Define a class JavaSerializeTest to test Java's serialization mechanism:
import java.io.*;

public class JavaSerializeTest {
    public static void main(String[] args) throws IOException, ClassNotFoundException {
        Block1 block = new Block1();
        ByteArrayOutputStream baos = null;
        ObjectOutputStream oos = null;
        ObjectInputStream ois = null;
        try {
            // Create a ByteArrayOutputStream baos
            baos = new ByteArrayOutputStream();
            // Wrap baos in an ObjectOutputStream oos
            oos = new ObjectOutputStream(baos);
            // Serialize block into baos
            oos.writeObject(block);
            // Get the byte array out of baos
            byte[] bytes = baos.toByteArray();
            System.out.println("Serialized Block1 object to a byte array, length: " + bytes.length);
            // Wrap the byte array in a ByteArrayInputStream, then wrap that in an ObjectInputStream ois
            ois = new ObjectInputStream(new ByteArrayInputStream(bytes));
            // Call ois.readObject() to deserialize, returning a Block1 object block1
            Block1 block1 = (Block1) ois.readObject();
            System.out.println("Deserialized the byte array back into a Block1 object: " + block1);
        } finally {
            // close the streams
        }
    }
}
Console output:
Serialized Block1 object to a byte array, length: 72
Deserialized the byte array back into a Block1 object: Block1 [one=1, two=2, three=3]
ObjectOutputStream also provides a number of writeX() methods, including writeInt(), writeLong(), writeFloat(), writeUTF(), and so on.
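As a quick illustration (a minimal sketch of my own, not one of the original test classes), these primitive writeX() methods write raw values into the stream, and they must be read back with the matching readX() methods in the same order:

import java.io.*;

public class WriteXDemo {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            // primitive values are written directly, without per-object metadata
            oos.writeInt(42);
            oos.writeLong(123L);
            oos.writeFloat(3.14f);
            oos.writeUTF("hadoop");
        }
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(baos.toByteArray()))) {
            // reads must mirror the writes exactly
            System.out.println(ois.readInt());   // 42
            System.out.println(ois.readLong());  // 123
            System.out.println(ois.readFloat()); // 3.14
            System.out.println(ois.readUTF());   // hadoop
        }
    }
}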
Java API documentation for writeObject():
public final void writeObject(Object obj) throws IOException
Writes the specified object to the ObjectOutputStream. The class of the object, the signature of the class, and the values of the non-transient and non-static fields of the class and all of its supertypes are written.
Because Java serialization writes class metadata in addition to the field values, the Block1 object block, which holds only 3 int fields (12 bytes of actual data), serializes to a 72-byte array. Overhead like this is why Hadoop needs its own serialization mechanism.
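To see where the extra bytes go, here is a small sketch (my own addition, not part of the original tests) that dumps the serialized array: the dump starts with the stream magic number 0xACED and contains the class name "Block1" and the field names "one", "two", "three", i.e. the per-class metadata that accounts for most of the 72 bytes.

import java.io.*;

public class SerializedBytesDump {
    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(baos)) {
            oos.writeObject(new Block1());
        }
        for (byte b : baos.toByteArray()) {
            // print printable ASCII characters as-is, everything else as \xNN hex
            System.out.print(b >= 32 && b < 127 ? String.valueOf((char) b)
                                                : String.format("\\x%02x", b & 0xff));
        }
    }
}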
In Hadoop, for instances of a class to be serializable, the class must implement the Writable interface.
The Writable interface has two methods, write() for serialization and readFields() for deserialization; it is defined as follows:
public interface Writable {
    /*
     * Serialize the fields of this object to the output stream DataOutput out.
     */
    void write(DataOutput out) throws IOException;

    /*
     * Read the field values from the input stream DataInput in and rebuild this object;
     * this is the deserialization operation.
     */
    void readFields(DataInput in) throws IOException;
}
Define a class Block2 that implements the Writable interface:
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

class Block2 implements Writable {
    private int one = 1;
    private int two = 2;
    private int three = 3;

    /*
     * Serialize the fields of this object to the output stream DataOutput out.
     */
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(one);
        out.writeInt(two);
        out.writeInt(three);
    }

    /*
     * Read the field values from the input stream DataInput in and rebuild this object;
     * this is the deserialization operation.
     */
    @Override
    public void readFields(DataInput in) throws IOException {
        one = in.readInt();
        three = in.readInt(); // to make the deserialization visible, two and three are swapped,
        two = in.readInt();   // so two=3 and three=2 after reading
    }

    @Override
    public String toString() {
        return "Block2 [one=" + one + ", two=" + two + ", three=" + three + "]";
    }
}
PS: the order of the out.writeX(x) calls in write() must match the order of the x = in.readX() calls in readFields(); otherwise the data cannot be read back correctly.
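For example, in this hypothetical BrokenBlock class (my own illustration, assuming the same imports as Block2) the write and read orders disagree for fields of different sizes, so readFields() silently consumes the wrong bytes:

class BrokenBlock implements Writable {
    private int id = 1;
    private long size = 2L;

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);    // 4 bytes
        out.writeLong(size); // 8 bytes
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Wrong order: readLong() consumes id's 4 bytes plus the first 4 bytes of size,
        // and readInt() then reads the remaining 4 bytes of size.
        // The result is size=4294967296 and id=2 instead of id=1 and size=2.
        size = in.readLong();
        id = in.readInt();
    }
}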
Define a class HadoopSerializeTest to test Hadoop's serialization mechanism:
import java.io.*;

public class HadoopSerializeTest {
    public static void main(String[] args) throws IOException {
        Block2 block = new Block2();
        ByteArrayOutputStream baos = null;
        DataOutputStream dos = null;
        DataInputStream dis = null;
        try {
            // Create a ByteArrayOutputStream baos
            baos = new ByteArrayOutputStream();
            // Wrap baos in a DataOutputStream dos
            dos = new DataOutputStream(baos);
            // Serialize block into baos
            block.write(dos);
            // Get the byte array out of baos
            byte[] bytes = baos.toByteArray();
            System.out.println("Serialized Block2 object to a byte array, length: " + bytes.length);
            // Wrap the byte array in a ByteArrayInputStream, then wrap that in a DataInputStream dis
            dis = new DataInputStream(new ByteArrayInputStream(bytes));
            Block2 block1 = new Block2();
            System.out.println("Block2 object before deserialization: " + block1);
            // Call block1.readFields(DataInput) to deserialize, which swaps two and three
            block1.readFields(dis);
            System.out.println("Deserialized the byte array back into a Block2 object: " + block1);
        } finally {
            // close the streams
        }
    }
}
Console output:
Serialized Block2 object to a byte array, length: 12
Block2 object before deserialization: Block2 [one=1, two=2, three=3]
Deserialized the byte array back into a Block2 object: Block2 [one=1, two=3, three=2]
Because serializing the Block2 object block writes out nothing but the 3 int values, the resulting byte array is only 12 bytes. Compared with the 72 bytes produced by Java serialization, Hadoop's serialized form is compact and fast.
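To wrap up, the pattern used in HadoopSerializeTest can be factored into a small reusable helper. This is only a sketch of my own (the class and method names are hypothetical, not part of Hadoop's API); it assumes org.apache.hadoop.io.Writable is on the classpath:

import java.io.*;

import org.apache.hadoop.io.Writable;

public class WritableHelper {

    // Serialize any Writable into a byte array.
    public static byte[] serialize(Writable w) throws IOException {
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        try (DataOutputStream dos = new DataOutputStream(baos)) {
            w.write(dos);
        }
        return baos.toByteArray();
    }

    // Deserialize a byte array back into an existing Writable instance.
    public static <T extends Writable> T deserialize(T w, byte[] bytes) throws IOException {
        try (DataInputStream dis = new DataInputStream(new ByteArrayInputStream(bytes))) {
            w.readFields(dis);
        }
        return w;
    }

    public static void main(String[] args) throws IOException {
        byte[] bytes = serialize(new Block2());
        System.out.println(bytes.length + " bytes: " + deserialize(new Block2(), bytes));
    }
}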