(1) An excerpt of the CREATE TABLE statement:
[
  [ROW FORMAT row_format]
  [STORED AS file_format]
    | STORED BY 'storage.handler.class.name' [WITH SERDEPROPERTIES (...)]  -- (Note: Available in Hive 0.6.0 and later)
]
row_format
  : DELIMITED [FIELDS TERMINATED BY char [ESCAPED BY char]]
      [COLLECTION ITEMS TERMINATED BY char]
      [MAP KEYS TERMINATED BY char]
      [LINES TERMINATED BY char]
      [NULL DEFINED AS char]  -- (Note: Available in Hive 0.13 and later)
  | SERDE serde_name [WITH SERDEPROPERTIES (property_name=property_value, property_name=property_value, ...)]
file_format:
  : SEQUENCEFILE
  | TEXTFILE  -- (Default, depending on hive.default.fileformat configuration)
  | RCFILE    -- (Note: Available in Hive 0.6.0 and later)
  | ORC       -- (Note: Available in Hive 0.11.0 and later)
  | PARQUET   -- (Note: Available in Hive 0.13.0 and later)
  | AVRO      -- (Note: Available in Hive 0.14.0 and later)
  | INPUTFORMAT input_format_classname OUTPUTFORMAT output_format_classname
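As a concrete illustration of the grammar above, here is a minimal sketch of a text table that spells out every DELIMITED option. The table name page_views and its columns are hypothetical, chosen only for this example.

```sql
-- Hypothetical table; names and columns are illustrative only.
-- Columns are separated by tabs, array elements and map entries by commas,
-- map keys from their values by colons, and rows by newlines.
CREATE TABLE page_views (
  user_id BIGINT,
  url     STRING,
  tags    ARRAY<STRING>,
  props   MAP<STRING, STRING>
)
ROW FORMAT DELIMITED
  FIELDS TERMINATED BY '\t'
  COLLECTION ITEMS TERMINATED BY ','
  MAP KEYS TERMINATED BY ':'
  LINES TERMINATED BY '\n'
STORED AS TEXTFILE;
```

Under these assumptions, a line such as 42<TAB>/index.html<TAB>a,b<TAB>k1:v1,k2:v2 would be read as one row whose tags column holds two elements and whose props column holds two entries.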
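(2) Working through the CREATE TABLE statement layer by layer: how Hive stores data and, by extension, the trade-offs between row-oriented and column-oriented storage at read time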
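Hive table data is stored on a file system, so a file storage format is needed to standardize how the data is laid out, so that Hive can write and read it. Hive ships with several built-in storage formats and also supports user-defined ones. A storage format consists of two closely related parts: file_format and row_format.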
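To make the row versus column trade-off concrete, the sketch below stores the same hypothetical data twice: once as TEXTFILE, which is row-oriented, and once as ORC, which is column-oriented. The table and column names are assumptions made for illustration.

```sql
-- Row-oriented copy: every field of a record is stored together on one line.
CREATE TABLE logs_text (
  ts      STRING,
  user_id BIGINT,
  url     STRING,
  referer STRING
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
STORED AS TEXTFILE;

-- Column-oriented copy of the same data (ORC requires Hive 0.11.0 or later).
CREATE TABLE logs_orc STORED AS ORC AS
SELECT * FROM logs_text;

-- Touches one column: a columnar reader can skip the other three columns,
-- while a text reader still has to scan and split every full line.
SELECT url FROM logs_orc;

-- Touches every column: the columnar advantage largely disappears, because
-- each row has to be reassembled from all of its columns.
SELECT * FROM logs_text;
```

Roughly speaking, column-oriented formats tend to favour queries that read a few columns of a wide table, while row-oriented formats are simple to write and suit whole-record access.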
(3) File formats: TEXTFILE, SEQUENCEFILE, RCFILE, and a custom input format, DualInputFormat
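STORED AS TEXTFILE is shorthand for a specific pair of input and output format classes. The sketch below shows the keyword form next to the explicit INPUTFORMAT ... OUTPUTFORMAT form from the grammar in (1). The table names are hypothetical; a custom input format would be plugged in at the same place by its fully qualified class name.

```sql
-- Built-in file formats selected by keyword.
CREATE TABLE events_text (id BIGINT, payload STRING) STORED AS TEXTFILE;
CREATE TABLE events_seq  (id BIGINT, payload STRING) STORED AS SEQUENCEFILE;
CREATE TABLE events_rc   (id BIGINT, payload STRING) STORED AS RCFILE;

-- The same text table with the format classes written out explicitly.
CREATE TABLE events_text_explicit (id BIGINT, payload STRING)
STORED AS
  INPUTFORMAT  'org.apache.hadoop.mapred.TextInputFormat'
  OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat';
```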
(4) Record format: SerDe
(5) CSV and TSV SerDe
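For quoted CSV or TSV data, one option is the OpenCSVSerde that ships with Hive 0.14 and later. The following is a minimal sketch with hypothetical table names. Note that this SerDe exposes every column as STRING regardless of the declared type.

```sql
-- CSV: comma-separated, double-quoted fields, backslash as the escape character.
CREATE TABLE people_csv (id STRING, name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES (
  "separatorChar" = ",",
  "quoteChar"     = "\"",
  "escapeChar"    = "\\"
)
STORED AS TEXTFILE;

-- TSV: same SerDe, with a tab as the separator.
CREATE TABLE people_tsv (id STRING, name STRING, city STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
WITH SERDEPROPERTIES ("separatorChar" = "\t")
STORED AS TEXTFILE;
```

For simple files without quoting, ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' (or '\t') is usually enough.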
SerDe is short for "Serializer and Deserializer." Hive uses a SerDe (and a FileFormat) to read and write table rows.
Read path: HDFS files --> InputFileFormat --> <key, value> --> Deserializer --> Row object
Write path: Row object --> Serializer --> <key, value> --> OutputFileFormat --> HDFS files
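The ROW FORMAT clause is where the Serializer and Deserializer in this pipeline are chosen. As a sketch, the hypothetical table below names Hive's default text SerDe, LazySimpleSerDe, explicitly and passes the field delimiter as a SerDe property. DESCRIBE FORMATTED then reports the SerDe library together with the input and output format classes in use.

```sql
-- Hypothetical table that names the SerDe class instead of using DELIMITED.
CREATE TABLE serde_demo (id INT, name STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe'
WITH SERDEPROPERTIES ('field.delim' = '\t')
STORED AS TEXTFILE;

-- Shows the SerDe Library, InputFormat, and OutputFormat of the table.
DESCRIBE FORMATTED serde_demo;
```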
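When data is loaded, it is written straight to storage, laid out according to the table's file format and delimiters; it is only validated against the schema when it is read.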
The difference between Hive's schema on read and the schema on write of traditional relational databases
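A minimal sketch of what schema on read means in practice, assuming a hypothetical local file /tmp/people.csv whose second line has a non-numeric id:

```sql
-- Assumed contents of /tmp/people.csv (hypothetical):
--   1,alice
--   oops,bob
CREATE TABLE people (id INT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- The load copies the file into the table's directory without parsing it,
-- so the malformed line is accepted.
LOAD DATA LOCAL INPATH '/tmp/people.csv' INTO TABLE people;

-- The schema is applied only now: a field that cannot be parsed as INT
-- is returned as NULL instead of raising an error.
SELECT id, name FROM people;
```

A traditional relational database would instead check each row against the schema at insert time and reject the bad one, which makes loads slower but ensures that stored data conforms to the schema.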