elephant-bird is an open-source project from Twitter, hosted at https://github.com/twitter/elephant-bird
It is Twitter's library of LZO-, Thrift-, and Protocol Buffers-related Hadoop InputFormats, OutputFormats, Writables, Pig load functions, Hive SerDes, HBase secondary indexes, and so on.
mvn clean install -U -Dprotobuf.version=2.5.0 -DskipTests=true
Running mvn package requires signing the artifacts, so generate a GPG key first:
gpg --gen-key
Apache Thrift and Protocol Buffers also need to be installed.
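Before starting the build, it can help to confirm that the tools mentioned above are actually on the PATH. The following is a minimal sketch, not part of elephant-bird: the helper name `missing_tools` is hypothetical, and the binary names (`mvn`, `gpg`, `thrift`, `protoc`) are assumed to be the usual defaults for Maven, GnuPG, Apache Thrift, and Protocol Buffers. It does not check versions, even though the build command above pins protobuf to 2.5.0.

```python
import shutil

# Assumed default binary names for the tools the build steps rely on.
REQUIRED_TOOLS = ("mvn", "gpg", "thrift", "protoc")


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools from `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Missing tools:", ", ".join(missing))
    else:
        print("All required tools were found on PATH.")
```

If anything is reported missing, install it first and then rerun the mvn command.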
Type mapping when creating a Hive table with elephant-bird
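The correspondence used by the table definition that follows can be summarized in a small sketch. The Hive-side types are taken from the column list; the Thrift-side types (`i64`, `i32`, `i16`, `byte`, and so on) are an assumption inferred from the column names, since the original Thrift struct is not shown. The function `thrift_to_hive` is a hypothetical helper for illustration and is not an elephant-bird or Hive API.

```python
# Thrift primitive -> Hive type. The Thrift side is inferred from the
# column names (test_long, test_int, ...), not given in the original.
PRIMITIVES = {
    "string": "string",
    "i64": "bigint",
    "i32": "int",
    "i16": "smallint",
    "double": "double",
    "byte": "tinyint",
    "bool": "boolean",
}


def split_top_level(text):
    """Split on commas that are not nested inside angle brackets."""
    parts, depth, current = [], 0, []
    for ch in text:
        if ch == "<":
            depth += 1
        elif ch == ">":
            depth -= 1
        if ch == "," and depth == 0:
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
    parts.append("".join(current))
    return parts


def thrift_to_hive(thrift_type):
    """Translate a Thrift type string into the matching Hive column type."""
    t = thrift_type.replace(" ", "")
    if t in PRIMITIVES:
        return PRIMITIVES[t]
    # Both list and set appear as array on the Hive side (see test_set).
    for container in ("list", "set"):
        prefix = container + "<"
        if t.startswith(prefix) and t.endswith(">"):
            return "array<%s>" % thrift_to_hive(t[len(prefix):-1])
    if t.startswith("map<") and t.endswith(">"):
        key, value = split_top_level(t[4:-1])
        return "map<%s,%s>" % (thrift_to_hive(key), thrift_to_hive(value))
    raise ValueError("unsupported Thrift type: %s" % thrift_type)
```

For example, `thrift_to_hive("set<i64>")` gives `array<bigint>`, which matches the `test_set` column. A Thrift set has no direct Hive counterpart, so it shows up as an array.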
CREATE EXTERNAL TABLE `xxxx`(
  `ts` string COMMENT 'from deserializer',
  `schema` string COMMENT 'from deserializer',
  `test_string` string COMMENT 'from deserializer',
  `test_long` bigint COMMENT 'from deserializer',
  `test_int` int COMMENT 'from deserializer',
  `test_short` smallint COMMENT 'from deserializer',
  `test_double` double COMMENT 'from deserializer',
  `test_byte` tinyint COMMENT 'from deserializer',
  `test_bool` boolean COMMENT 'from deserializer',
  `test_list` array<string> COMMENT 'from deserializer',
  `test_set` array<bigint> COMMENT 'from deserializer',
  `test_map` map<string,int> COMMENT 'from deserializer')
COMMENT 'test_all_type'
PARTITIONED BY (
  `ds` string COMMENT 'date partition')
ROW FORMAT SERDE
  'org.apache.hadoop.hive.serde2.thrift.ThriftDeserializer'
WITH SERDEPROPERTIES (
  'serialization.class'='com.xxx.xxx.xxx',
  'serialization.format'='org.apache.thrift.protocol.TCompactProtocol')
STORED AS INPUTFORMAT
  'org.apache.hadoop.mapred.SequenceFileInputFormat'
OUTPUTFORMAT
  'org.apache.hadoop.hive.ql.io.HiveSequenceFileOutputFormat'
LOCATION
  'hdfs://xxxxxxx'
TBLPROPERTIES (