CDH5: Configuring LZO with Parcels

Date: 2019-11-17

Tags: cdh5, cdh, parcels, configuration, lzo


1. Parcel deployment steps

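    1 Download: the parcel must first be downloaded. Once downloaded, it resides in a local directory on the Cloudera Manager host.
    2 Distribute: after the parcel is downloaded, it is distributed to all hosts in the cluster and unpacked.
    3 Activate: after distribution, activating the parcel prepares it for use once the cluster is restarted. An upgrade may also be required before activation.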

2. Serving the LZO parcel from a local repository

(1) Download the latest LZO parcel from http://archive-primary.cloudera.com/gplextras/parcels/latest/, choosing the build that matches the operating system of the Hadoop cluster's servers. I am using RHEL 6.2, so I downloaded HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel.

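(2) Also download manifest.json, and create a .sha file from the parcel's hash value in manifest.json. Note: the .sha file must carry the same name as the parcel file, with a .sha suffix.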
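Copying the hash by hand is error-prone, so the .sha file can also be generated. A minimal sketch, assuming manifest.json uses the usual layout of a top-level "parcels" list whose entries carry "parcelName" and "hash" fields; write_sha_file is a hypothetical helper, not a Cloudera tool:

```python
import json
from pathlib import Path


def write_sha_file(manifest_path, parcel_name, out_dir="."):
    """Write <parcel_name>.sha containing the hash that manifest.json lists for it.

    Assumes the manifest layout {"parcels": [{"parcelName": ..., "hash": ...}, ...]}.
    Returns the path of the file written.
    """
    manifest = json.loads(Path(manifest_path).read_text(encoding="utf-8"))
    for entry in manifest.get("parcels", []):
        if entry.get("parcelName") == parcel_name:
            sha_path = Path(out_dir) / (parcel_name + ".sha")
            # The .sha file holds nothing but the hash string.
            sha_path.write_text(entry["hash"], encoding="utf-8")
            return str(sha_path)
    raise KeyError(parcel_name + " is not listed in " + str(manifest_path))
```

For example, `write_sha_file("manifest.json", "HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel")` writes only the .sha for the el6 build, even though the manifest lists every OS variant.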

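(3) On the command line, go to the document root of Apache (install Apache first if it is not already installed), which is /var/www/html by default. Create a directory named lzo there and put the three files (the parcel, its .sha file and manifest.json) into it.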
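Before publishing, it can be worth checking that the downloaded parcel matches the hash in its .sha file. A sketch, assuming the hash is a SHA-1 hex digest and that the .sha file sits next to the parcel; sha1_of_file and parcel_matches_sha are hypothetical helpers:

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so a large parcel never has to fit in memory."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def parcel_matches_sha(parcel_path):
    """Compare the parcel's SHA-1 with the hash stored in <parcel_path>.sha."""
    expected = Path(str(parcel_path) + ".sha").read_text(encoding="utf-8").strip().lower()
    return sha1_of_file(parcel_path) == expected
```

For example, `parcel_matches_sha("/var/www/html/lzo/HADOOP_LZO-0.4.15-1.gplextras.p0.64-el6.parcel")` should return True before the repository is used.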

(4) Start the httpd service and open the directory in a browser, e.g. http://ip/lzo, to confirm that the three files are listed.

(5) In Cloudera Manager, add the URL of the locally published parcels to the Remote Parcel Repository URLs setting.
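The repository URL can also be added through the Cloudera Manager REST API instead of the settings page. A sketch, assuming the configuration key REMOTE_PARCEL_REPO_URLS on the /cm/config endpoint, an API base such as http://cm-host:7180/api/v6 (host, port and API version are placeholders) and HTTP basic authentication. The current value is passed in (copy it from the same setting), and add_repo_url and update_remote_parcel_repos are hypothetical helpers:

```python
import base64
import json
import urllib.request


def add_repo_url(current_value, new_url):
    """Append new_url to a comma-separated URL list unless it is already present."""
    urls = [u.strip() for u in current_value.split(",") if u.strip()]
    if new_url not in urls:
        urls.append(new_url)
    return ",".join(urls)


def update_remote_parcel_repos(cm_base, user, password, current_value, new_url):
    """PUT the updated REMOTE_PARCEL_REPO_URLS value to Cloudera Manager.

    Assumption: cm_base looks like "http://cm-host:7180/api/v6" and /cm/config
    accepts a body of the form {"items": [{"name": ..., "value": ...}]}.
    """
    body = {"items": [{"name": "REMOTE_PARCEL_REPO_URLS",
                       "value": add_repo_url(current_value, new_url)}]}
    req = urllib.request.Request(
        cm_base.rstrip("/") + "/cm/config",
        data=json.dumps(body).encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping the list logic in add_repo_url means re-running the update does not add the same URL twice.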

(6) On the Parcels page of Cloudera Manager, the LZO parcel now appears among the parcels available for download. Click Download.

(7) Following the parcel deployment steps, distribute and then activate the parcel.
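The download, distribution and activation done on the Parcels page can also be driven through the Cloudera Manager REST API. A sketch that only builds the request URLs, assuming the parcel command endpoints startDownload, startDistribution and activate under /clusters/{cluster}/parcels/products/{product}/versions/{version}/commands/, and an API base such as http://cm-host:7180/api/v6 (host, port, API version and cluster name are placeholders). parse_parcel_filename and command_url are hypothetical helpers:

```python
import urllib.parse

# The three lifecycle commands, in the order they have to run.
LIFECYCLE_COMMANDS = ("startDownload", "startDistribution", "activate")


def parse_parcel_filename(filename):
    """Split 'PRODUCT-VERSION-DISTRO.parcel' into (product, version, distro)."""
    if not filename.endswith(".parcel"):
        raise ValueError("not a .parcel file: " + filename)
    stem = filename[: -len(".parcel")]
    rest, distro = stem.rsplit("-", 1)     # distro suffix such as el6
    product, version = rest.split("-", 1)  # assumes the product name has no hyphen
    return product, version, distro


def command_url(cm_base, cluster, product, version, command):
    """Build the POST URL for one parcel lifecycle command."""
    if command not in LIFECYCLE_COMMANDS:
        raise ValueError("unknown command: " + command)
    return "{}/clusters/{}/parcels/products/{}/versions/{}/commands/{}".format(
        cm_base.rstrip("/"),
        urllib.parse.quote(cluster, safe=""),  # e.g. "Cluster 1" -> "Cluster%201"
        product, version, command)
```

Each command has to finish before the next one is sent. The parcel's status (a stage such as DOWNLOADED or DISTRIBUTED, assumed here) would need to be polled in between rather than firing all three POSTs at once.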

3. Changing the configuration

Modify the HDFS configuration

Append com.hadoop.compression.lzo.LzoCodec,com.hadoop.compression.lzo.LzopCodec (comma-separated) to the value of the io.compression.codecs property.
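Because io.compression.codecs is a comma-separated list, appending by hand makes it easy to drop a comma or add a codec twice. A minimal sketch of the append; append_codecs is a hypothetical helper, and the codec class names to pass are the ones from the step above:

```python
def append_codecs(current_value, extra_codecs):
    """Append codec class names to a comma-separated io.compression.codecs value.

    Existing entries keep their order, and codecs already present are not
    repeated, so the helper can safely be run more than once.
    """
    codecs = [c.strip() for c in current_value.split(",") if c.strip()]
    for codec in extra_codecs:
        if codec not in codecs:
            codecs.append(codec)
    return ",".join(codecs)
```

For example, `append_codecs(current, ["com.hadoop.compression.lzo.LzopCodec"])` returns the existing list with LzopCodec added at the end.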

Modify the YARN configuration

Change the value of mapreduce.application.classpath to: $HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/*

Change the value of mapreduce.admin.user.env to: LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH:/opt/cloudera/parcels/HADOOP_LZO/lib/hadoop/lib/native
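If you manage mapred-site.xml by hand rather than through Cloudera Manager, the same two values can be written as ordinary Hadoop property elements. A sketch using Python's xml.etree.ElementTree; to_property_xml is a hypothetical helper. The paths are copied from the values above and rely on /opt/cloudera/parcels/HADOOP_LZO being the version-independent link to the active parcel (assumed):

```python
import xml.etree.ElementTree as ET

LZO_HOME = "/opt/cloudera/parcels/HADOOP_LZO"

MAPRED_PROPERTIES = {
    "mapreduce.application.classpath":
        "$HADOOP_MAPRED_HOME/*,$HADOOP_MAPRED_HOME/lib/*,$MR2_CLASSPATH,"
        + LZO_HOME + "/lib/hadoop/lib/*",
    "mapreduce.admin.user.env":
        "LD_LIBRARY_PATH=$HADOOP_COMMON_HOME/lib/native:$JAVA_LIBRARY_PATH:"
        + LZO_HOME + "/lib/hadoop/lib/native",
}


def to_property_xml(props):
    """Render name/value pairs as Hadoop-style <property> elements."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

`print(to_property_xml(MAPRED_PROPERTIES))` prints a `<configuration>` element whose two properties can be merged into an existing mapred-site.xml.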

4. Verification

create external table lzo(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';

Create a data.txt file with the following content:

1#tianhe
2#gz
3#sz
4#sz
5#bx

Then compress this file with the lzop command and upload the result to the /test directory in HDFS.
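The data preparation can be scripted. A sketch, assuming `lzop` and the `hdfs` client are on the PATH. It also runs `hdfs dfs -mkdir -p` first (not part of the original steps), so that `-put` does not create a plain file named /test when the directory is missing. write_data_file and compress_and_upload are hypothetical helpers, and `run` can be replaced with a dry-run function:

```python
import subprocess
from pathlib import Path

# Sample rows from the article: id and name separated by '#'.
ROWS = [(1, "tianhe"), (2, "gz"), (3, "sz"), (4, "sz"), (5, "bx")]


def write_data_file(path="data.txt"):
    """Write ROWS to path, one '#'-delimited row per line, and return the text."""
    text = "".join(f"{row_id}#{name}\n" for row_id, name in ROWS)
    Path(path).write_text(text, encoding="utf-8")
    return text


def compress_and_upload(path="data.txt", hdfs_dir="/test", run=subprocess.run):
    """Compress path with lzop (it writes path + '.lzo' and keeps the original)
    and copy the result into hdfs_dir. Returns the commands issued, in order."""
    commands = [
        ["hdfs", "dfs", "-mkdir", "-p", hdfs_dir],
        ["lzop", path],
        ["hdfs", "dfs", "-put", path + ".lzo", hdfs_dir],
    ]
    for cmd in commands:
        run(cmd, check=True)  # with subprocess.run, check=True raises on a non-zero exit
    return commands
```

Passing `run=lambda cmd, check: print(" ".join(cmd))` prints the commands without executing them.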

Start Hive, create the table and query the data. The result is as follows:

hive> create external table lzo(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
OK
Time taken: 0.108 seconds
hive> select * from lzo where id>2;
Total MapReduce jobs = 1
Launching Job 1 out of 1
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1404206497656_0002, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0002/
Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job -kill job_1404206497656_0002
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-01 17:30:27,547 Stage-1 map = 0%, reduce = 0%
2014-07-01 17:30:37,403 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
2014-07-01 17:30:38,469 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
2014-07-01 17:30:39,527 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 2.84 sec
MapReduce Total cumulative CPU time: 2 seconds 840 msec
Ended Job = job_1404206497656_0002
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 2.84 sec HDFS Read: 295 HDFS Write: 15 SUCCESS
Total MapReduce CPU Time Spent: 2 seconds 840 msec
OK
3 sz
4 sz
5 bx
Time taken: 32.803 seconds, Fetched: 3 row(s)
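To cross-check the three rows above, the same filter can be applied to the sample data locally. A minimal sketch; query_id_greater_than is a hypothetical helper that only mimics `where id>2` on '#'-delimited lines and is not a substitute for running the query in Hive:

```python
def query_id_greater_than(lines, threshold):
    """Return (id, name) pairs from '#'-delimited lines whose id exceeds threshold."""
    result = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines, e.g. a trailing newline
        id_text, name = line.rstrip("\n").split("#", 1)
        if int(id_text) > threshold:
            result.append((int(id_text), name))
    return result


# Same content as data.txt above.
SAMPLE = ["1#tianhe", "2#gz", "3#sz", "4#sz", "5#bx"]
```

`query_id_greater_than(SAMPLE, 2)` gives the rows 3/sz, 4/sz and 5/bx, which matches the Hive output.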

hive> SET hive.exec.compress.output=true;
hive> SET mapred.output.compression.codec=com.hadoop.compression.lzo.LzopCodec;
hive> create external table lzo2(id int,name string) row format delimited fields terminated by '#' STORED AS INPUTFORMAT 'com.hadoop.mapred.DeprecatedLzoTextInputFormat' OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' location '/test';
OK
Time taken: 0.092 seconds
hive> insert into table lzo2 select * from lzo;
Total MapReduce jobs = 3
Launching Job 1 out of 3
Number of reduce tasks is set to 0 since there's no reduce operator
Starting Job = job_1404206497656_0003, Tracking URL = http://hadoop01.kt:8088/proxy/application_1404206497656_0003/
Kill Command = /opt/cloudera/parcels/CDH-5.0.1-1.cdh5.0.1.p0.47/lib/hadoop/bin/hadoop job -kill job_1404206497656_0003
Hadoop job information for Stage-1: number of mappers: 1; number of reducers: 0
2014-07-01 17:33:47,351 Stage-1 map = 0%, reduce = 0%
2014-07-01 17:33:57,114 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
2014-07-01 17:33:58,170 Stage-1 map = 100%, reduce = 0%, Cumulative CPU 1.96 sec
MapReduce Total cumulative CPU time: 1 seconds 960 msec
Ended Job = job_1404206497656_0003
Stage-4 is selected by condition resolver.
Stage-3 is filtered out by condition resolver.
Stage-5 is filtered out by condition resolver.
Moving data to: hdfs://hadoop01.kt:8020/tmp/hive-hdfs/hive_2014-07-01_17-33-22_504_966970548620625440-1/-ext-10000
Loading data to table default.lzo2
Table default.lzo2 stats: [num_partitions: 0, num_files: 2, num_rows: 0, total_size: 171, raw_data_size: 0]
MapReduce Jobs Launched:
Job 0: Map: 1 Cumulative CPU: 1.96 sec HDFS Read: 295 HDFS Write: 79 SUCCESS
Total MapReduce CPU Time Spent: 1 seconds 960 msec
OK
Time taken: 36.625 seconds
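To confirm that the insert wrote LZOP-compressed files, copy the table directory out of HDFS (for example with `hdfs dfs -get /test ./test_copy`) and check each file's header. A sketch, assuming every lzop file starts with the standard 9-byte lzop magic; is_lzop_file is a hypothetical helper:

```python
# The 9-byte magic that starts every file written in lzop format.
LZOP_MAGIC = b"\x89LZO\x00\r\n\x1a\n"


def is_lzop_file(path):
    """Return True if the file at path begins with the lzop magic bytes."""
    with open(path, "rb") as f:
        return f.read(len(LZOP_MAGIC)) == LZOP_MAGIC
```

For example, `[p for p in pathlib.Path("test_copy").iterdir() if not is_lzop_file(p)]` should come back empty. Because lzo2 shares the /test location with lzo, the copy also contains the original data.txt.lzo, which is lzop-compressed as well.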
