0003 - How to Use LZO Compression in CDH

Date: 2019-11-17


1. Problem Description

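CDH does not support the LZO compression codec by default. An additional Parcel (the GPL Extras Parcel, shipped separately because LZO is GPL-licensed) must be downloaded before Hadoop components such as HDFS, Hive, and Spark can use LZO.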

For details, see:

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_mc_gpl_extras.html

https://www.cloudera.com/documentation/enterprise/latest/topics/cm_ig_install_gpl_extras.html#xd_583c10bfdbd326ba-3ca24a24-13d80143249--7ec6

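First, without any additional configuration, I try to generate an LZO file and read it back. We create two tables in Hive, test_table and test_table2: test_table is a plain-text table, and test_table2 is the table that will hold LZO-compressed data: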

-- plain-text source table
create external table test_table(s1 string, s2 string)
row format delimited fields terminated by '#'
location '/lilei/test_table';
insert into test_table values ('1','a'), ('2','b');
-- target table that will receive the LZO-compressed output
create external table test_table2(s1 string, s2 string)
row format delimited fields terminated by '#'
location '/lilei/test_table2';
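Because both tables use `row format delimited fields terminated by '#'`, each row is stored as one line of text with its columns joined by `#`. A minimal Python sketch of that layout, assuming Hive's default newline row terminator and ignoring escaping and NULL handling (`to_delimited_lines` and `from_delimited_lines` are hypothetical helpers):

```python
def to_delimited_lines(rows, sep="#"):
    """Join each row's columns with the field delimiter, one row per line."""
    return "".join(sep.join(row) + "\n" for row in rows)


def from_delimited_lines(text, sep="#"):
    """Split text back into rows; assumes no embedded separators or newlines."""
    return [tuple(line.split(sep)) for line in text.splitlines()]


# The two rows inserted into test_table above.
rows = [("1", "a"), ("2", "b")]
data = to_delimited_lines(rows)
print(repr(data))  # '1#a\n2#b\n'
assert from_delimited_lines(data) == rows
```

This is the uncompressed form kept under /lilei/test_table; test_table2 is meant to hold the same text after it passes through the output codec.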

Connect to Hive through beeline and run the commands above.

Query the data in test_table (select * from test_table;).

Insert the data from test_table into test_table2, with the output files set to LZO compression:

-- write MapReduce output files with LzoCodec
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
-- compress the final output of Hive queries
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
-- BLOCK only takes effect for SequenceFile output
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
insert overwrite table test_table2 select * from test_table;
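These settings also determine the name of the file Hive writes. A simplified Python model, assuming that with `hive.exec.compress.output=true` a text output file gets the configured codec's default extension appended, and that the codec defaults to DefaultCodec when unset. It agrees with the `000000_0.lzo_deflate` file read later in this article; `parse_set_statements` and `output_file_name` are hypothetical helpers, not Hive APIs:

```python
# Default file extensions of common Hadoop codecs (their getDefaultExtension() values).
DEFAULT_EXTENSIONS = {
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
    "org.apache.hadoop.io.compress.BZip2Codec": ".bz2",
    "org.apache.hadoop.io.compress.SnappyCodec": ".snappy",
    "com.hadoop.compression.lzo.LzoCodec": ".lzo_deflate",
    "com.hadoop.compression.lzo.LzopCodec": ".lzo",
}


def parse_set_statements(script):
    """Collect key=value pairs from 'set k=v;' statements in a Hive script."""
    settings = {}
    for stmt in script.split(";"):
        stmt = stmt.strip()
        if stmt.lower().startswith("set ") and "=" in stmt:
            key, value = stmt[4:].split("=", 1)
            settings[key.strip()] = value.strip()
    return settings


def output_file_name(settings, base="000000_0"):
    """Simplified model: append the codec's extension when Hive output compression is on."""
    if settings.get("hive.exec.compress.output", "false").lower() != "true":
        return base
    codec = settings.get("mapreduce.output.fileoutputformat.compress.codec",
                         "org.apache.hadoop.io.compress.DefaultCodec")
    # Raises KeyError for codecs missing from the table above.
    return base + DEFAULT_EXTENSIONS[codec]


script = """
set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
"""
print(output_file_name(parse_set_statements(script)))  # 000000_0.lzo_deflate
```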

Running this in Hive fails with the following error:

Error:Error while processing statement: FAILED: Execution Error, return code 2 from org.apache.hadoop.hive.ql.exec.mr.MapRedTask (state=08S01,code=2)

The YARN web UI on port 8088 shows that the cause is that the LZO codec cannot be found:

Compression codec com.hadoop.compression.lzo.LzoCodec was not found.
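The message means the codec class named in the job configuration could not be loaded. When many task logs have to be checked, the missing class can be extracted programmatically. A small Python sketch, assuming the log text contains the message exactly as shown above (`find_missing_codecs` is a hypothetical helper, not part of YARN or Hive):

```python
import re

# Matches the "Compression codec <class> was not found." message shown above.
MISSING_CODEC = re.compile(r"Compression codec (\S+) was not found\.")


def find_missing_codecs(log_text):
    """Return distinct codec class names reported as missing, in order of first appearance."""
    seen = []
    for match in MISSING_CODEC.finditer(log_text):
        name = match.group(1)
        if name not in seen:
            seen.append(name)
    return seen


log = "Compression codec com.hadoop.compression.lzo.LzoCodec was not found."
print(find_missing_codecs(log))  # ['com.hadoop.compression.lzo.LzoCodec']
```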

2. Solution

On the Parcels page of Cloudera Manager, add the repository URL of the LZO Parcel.

Note: if the cluster cannot reach the public Internet, download the Parcel in advance and serve it from a local httpd server.

Download -> Distribute -> Activate
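For the offline case in the note above, it helps to check the directory before publishing it through httpd. A minimal Python sketch, assuming a remote parcel repository needs a `manifest.json` plus at least one `*.parcel` file (verify the exact layout against the Cloudera documentation; `check_parcel_repo` and the directory path are hypothetical):

```python
import os


def check_parcel_repo(file_names):
    """Return problems found in a repo listing.

    Assumed layout: manifest.json plus one or more *.parcel files.
    """
    names = set(file_names)
    problems = []
    if "manifest.json" not in names:
        problems.append("missing manifest.json")
    if not any(name.endswith(".parcel") for name in names):
        problems.append("no .parcel file")
    return problems


repo_dir = "/var/www/html/gplextras"  # hypothetical httpd document directory
if os.path.isdir(repo_dir):
    print(check_parcel_repo(os.listdir(repo_dir)))
```

Taking a plain list of names keeps the check independent of where the files live, so the same function works on a local listing or a copied directory index.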

In the HDFS configuration, add the LZO codecs to the compression codecs setting (io.compression.codecs):

com.hadoop.compression.lzo.LzoCodec
com.hadoop.compression.lzo.LzopCodec
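In Hadoop configuration files, `io.compression.codecs` is a single comma-separated list of codec class names. A minimal Python sketch of appending the two LZO codecs to an existing value without duplicating entries; the starting value below is only an example, not necessarily what a given cluster uses, and `add_codecs` is a hypothetical helper:

```python
LZO_CODECS = (
    "com.hadoop.compression.lzo.LzoCodec",
    "com.hadoop.compression.lzo.LzopCodec",
)


def add_codecs(current_value, extra=LZO_CODECS):
    """Append codec class names to a comma-separated list, skipping duplicates and blanks."""
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in extra:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)


# Example starting value (an assumption; check the cluster's actual setting).
current = "org.apache.hadoop.io.compress.DefaultCodec,org.apache.hadoop.io.compress.GzipCodec"
print(add_codecs(current))
```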

Save the changes, deploy the client configuration, and restart the whole cluster.

Wait for the restart to complete.

Insert the data into test_table2 again, with LZO as the output codec:

set mapreduce.output.fileoutputformat.compress.codec=com.hadoop.compression.lzo.LzoCodec;
set hive.exec.compress.output=true;
set mapreduce.output.fileoutputformat.compress=true;
set mapreduce.output.fileoutputformat.compress.type=BLOCK;
insert overwrite table test_table2 select * from test_table;

This time the insert succeeds.

2.1 Verifying in Hive

First, confirm that the files in test_table2 are in LZO format (the output file is /lilei/test_table2/000000_0.lzo_deflate).
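The file name already indicates which codec wrote it: LzoCodec output ends in `.lzo_deflate` and carries no lzop file header, while LzopCodec writes `.lzo` files in lzop format, which begin with the 9-byte lzop magic number. A Python sketch that classifies a file this way, assuming its first bytes are available locally (for example from a copy fetched with `hadoop fs -get`); `classify_lzo_file` is a hypothetical helper:

```python
# lzop file magic: 0x89 'L' 'Z' 'O' 0x00 '\r' '\n' 0x1a '\n'
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def classify_lzo_file(name, head):
    """Guess which LZO codec produced a file from its name and first bytes."""
    if name.endswith(".lzo_deflate"):
        return "LzoCodec (no lzop header)"
    if name.endswith(".lzo"):
        if head.startswith(LZOP_MAGIC):
            return "LzopCodec"
        return "named .lzo but no lzop magic"
    return "not an LZO file name"


print(classify_lzo_file("000000_0.lzo_deflate", b""))  # LzoCodec (no lzop header)
```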

Then test it from beeline (select * from test_table2;).

Hive works correctly on the LZO-compressed files.

2.2 Verifying in Spark SQL

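// in spark-shell: read the raw LZO file through the RDD API
val textFile = sc.textFile("hdfs://ip-172-31-8-141:8020/lilei/test_table2/000000_0.lzo_deflate")
textFile.count()
// query the same data through the Hive table and print it
sqlContext.sql("select * from test_table2").show()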

Spark SQL works correctly on the LZO-compressed files.
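`sc.textFile` takes no codec argument. Hadoop's `CompressionCodecFactory` chooses a decompressor from the file-name suffix among the codecs it knows about, which include those listed in io.compression.codecs. That is why the `.lzo_deflate` file decodes once LzoCodec is registered. A simplified Python model of that lookup, assuming the longest matching extension wins (it ignores details of the real Java implementation; `pick_codec` is a hypothetical helper):

```python
def pick_codec(file_name, codec_extensions):
    """Return the codec whose extension is the longest suffix of file_name, or None.

    codec_extensions maps codec class name -> default extension.
    """
    best = None
    for codec, ext in codec_extensions.items():
        if file_name.endswith(ext):
            if best is None or len(ext) > len(codec_extensions[best]):
                best = codec
    return best


registered = {
    "org.apache.hadoop.io.compress.DefaultCodec": ".deflate",
    "org.apache.hadoop.io.compress.GzipCodec": ".gz",
}
name = "000000_0.lzo_deflate"
print(pick_codec(name, registered))  # None: no registered codec claims ".lzo_deflate"
registered["com.hadoop.compression.lzo.LzoCodec"] = ".lzo_deflate"
print(pick_codec(name, registered))  # com.hadoop.compression.lzo.LzoCodec
```

When no codec matches, a text reader treats the file as uncompressed and returns the compressed bytes as if they were text lines.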

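Whipping a famed steed while drunk, youth is given to swagger! Singing Huanxisha in Lingnan, retching beneath the tavern! Good friends won't let go, and we play with data in style!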
