1. Install Phoenix
Configure the Phoenix parcel in the Cloudera Manager UI:
http://52.11.56.155:7180/cmf/settings?groupKey=config.scm.parcel.display_group&groupParent=node
Add a Remote Parcel Repository URL: http://archive.cloudera.com/cloudera-labs/phoenix/parcels/1.0/
CM automatically discovers the new parcel; then click Download, Distribute, and Activate, and restart the cluster.
2. Log on to one of the servers and check the Phoenix installation path
[root@ip-172-31-25-243 ~]# cd /opt/cloudera/parcels/CLABS_PHOENIX
[root@ip-172-31-25-243 phoenix]# ls
bin  dev  examples  lib  phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar  phoenix-4.3.0-clabs-phoenix-1.0.0-server.jar  phoenix-4.3.0-clabs-phoenix-1.0.0-server-without-antlr.jar
The bin directory holds the executables, and the examples directory holds some sample files.
3. Import a CSV-format table
The CSV file is /root/ceb/cis_cust_imp_info.csv, with the following contents:
20131131,100010001001,BR01,2000.01
20131131,100010001002,BR01,2000.02
20131131,100010001003,BR02,2000.03
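If you want to reproduce the example locally, the sample file can be recreated and sanity-checked like this (a sketch; /tmp is an assumed scratch path, and the column layout — date, customer id, branch id, balance — is inferred from the table definition below):

```shell
# Recreate the sample CSV shown above (assumed scratch path /tmp)
cat > /tmp/cis_cust_imp_info.csv <<'EOF'
20131131,100010001001,BR01,2000.01
20131131,100010001002,BR01,2000.02
20131131,100010001003,BR02,2000.03
EOF

# Sanity check: every row should have exactly 4 comma-separated fields,
# matching the 4 columns of the target table; prints "OK"
awk -F',' 'NF != 4 { bad++ } END { print (bad ? "BAD" : "OK") }' /tmp/cis_cust_imp_info.csv
```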
Define a table-definition file /root/ceb/cis_cust_imp_info.sql with the following contents:
CREATE TABLE IF NOT EXISTS cis_cust_imp_info(
    statistics_dt varchar(50),
    cust_id varchar(50),
    open_org_id varchar(50),
    assert9_bal decimal(18,2),
    CONSTRAINT pk PRIMARY KEY (statistics_dt, cust_id)
);
Note that the trailing semicolon is required.
Run the command to import the CSV:
[root@ip-172-31-25-243 phoenix]# bin/psql.py 172.31.25.244 /root/ceb/cis_cust_imp_info.sql /root/ceb/cis_cust_imp_info.csv
SLF4J: Failed to load class "org.slf4j.impl.StaticLoggerBinder".
SLF4J: Defaulting to no-operation (NOP) logger implementation
SLF4J: See http://www.slf4j.org/codes.html#StaticLoggerBinder for further details.
15/09/04 10:26:59 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
15/09/04 10:27:00 WARN impl.MetricsConfig: Cannot locate configuration: tried hadoop-metrics2-phoenix.properties,hadoop-metrics2.properties
no rows upserted
Time: 0.259 sec(s)

csv columns from database.
CSV Upsert complete. 3 rows upserted
Time: 0.067 sec(s)
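Besides the HBase shell check below, the import can also be verified from Phoenix itself with the same psql.py tool (a sketch; count.sql is a hypothetical file, and the ZooKeeper host is the one used above):

```shell
# Sketch: verify the row count through Phoenix; count.sql is a hypothetical
# helper file, not part of the original walkthrough
echo 'SELECT COUNT(*) FROM cis_cust_imp_info;' > /root/ceb/count.sql
bin/psql.py 172.31.25.244 /root/ceb/count.sql
```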
Verify in the HBase shell:
hbase(main):001:0> list
TABLE
CIS_CUST_IMP_INFO
SYSTEM.CATALOG
SYSTEM.SEQUENCE
SYSTEM.STATS
4 row(s) in 0.2650 seconds

=> ["CIS_CUST_IMP_INFO", "SYSTEM.CATALOG", "SYSTEM.SEQUENCE", "SYSTEM.STATS"]
hbase(main):002:0> scan 'CIS_CUST_IMP_INFO'
ROW                        COLUMN+CELL
 20131131\x00100010001001  column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x02
 20131131\x00100010001001  column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR01
 20131131\x00100010001001  column=0:_0, timestamp=1441362422661, value=
 20131131\x00100010001002  column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x03
 20131131\x00100010001002  column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR01
 20131131\x00100010001002  column=0:_0, timestamp=1441362422661, value=
 20131131\x00100010001003  column=0:ASSERT9_BAL, timestamp=1441362422661, value=\xC2\x15\x01\x04
 20131131\x00100010001003  column=0:OPEN_ORG_ID, timestamp=1441362422661, value=BR02
 20131131\x00100010001003  column=0:_0, timestamp=1441362422661, value=
3 row(s) in 0.1840 seconds
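The scan output shows how Phoenix laid the data out: the HBase row key is the composite primary key — statistics_dt and cust_id joined by a zero byte (the \x00 in the ROW column) — and the ASSERT9_BAL bytes are Phoenix's sortable DECIMAL encoding rather than raw text. The row-key structure can be reproduced locally (values taken from the rows above):

```shell
# The row key '20131131\x00100010001001' is the two PK columns joined by a
# NUL separator; dump its bytes to make the \0 visible
printf '20131131\x00100010001001' | od -An -c
```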
4. Import large CSV files via MapReduce
[root@ip-172-31-25-243 phoenix]# hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244
An error occurs:
java.util.concurrent.ExecutionException: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString
    at java.util.concurrent.FutureTask.report(FutureTask.java:122)
    at java.util.concurrent.FutureTask.get(FutureTask.java:188)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1795)
    at org.apache.hadoop.hbase.client.HTable.coprocessorService(HTable.java:1751)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.metaDataCoprocessorExec(ConnectionQueryServicesImpl.java:1006)
    at org.apache.phoenix.query.ConnectionQueryServicesImpl.getTable(ConnectionQueryServicesImpl.java:1257)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:348)
    at org.apache.phoenix.schema.MetaDataClient.updateCache(MetaDataClient.java:309)
    at org.apache.phoenix.schema.MetaDataClient.getCurrentTime(MetaDataClient.java:293)
    at org.apache.phoenix.compile.StatementContext.getCurrentTime(StatementContext.java:253)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:184)
    at org.apache.phoenix.execute.BaseQueryPlan.iterator(BaseQueryPlan.java:154)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:235)
    at org.apache.phoenix.jdbc.PhoenixStatement$1.call(PhoenixStatement.java:226)
    at org.apache.phoenix.call.CallRunner.run(CallRunner.java:53)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:225)
    at org.apache.phoenix.jdbc.PhoenixStatement.executeQuery(PhoenixStatement.java:1039)
    at org.apache.phoenix.jdbc.PhoenixDatabaseMetaData.getColumns(PhoenixDatabaseMetaData.java:492)
    at org.apache.phoenix.util.CSVCommonsLoader.generateColumnInfo(CSVCommonsLoader.java:296)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.buildImportColumns(CsvBulkLoadTool.java:291)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.loadData(CsvBulkLoadTool.java:200)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.run(CsvBulkLoadTool.java:186)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:84)
    at org.apache.phoenix.mapreduce.CsvBulkLoadTool.main(CsvBulkLoadTool.java:97)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:606)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)
Caused by: java.lang.IllegalAccessError: class com.google.protobuf.HBaseZeroCopyByteString cannot access its superclass com.google.protobuf.LiteralByteString
A web search shows this is caused by a known HBase bug; the workaround is:
[root@ip-172-31-25-243 phoenix]# cd /opt/cloudera/parcels/CDH/lib/hadoop
[root@ip-172-31-25-243 hadoop]# ln -s /opt/cloudera/parcels/CDH-5.4.5-1.cdh5.4.5.p0.7/lib/hbase/lib/hbase-protocol-1.0.0-cdh5.4.5.jar hbase-protocol-1.0.0-cdh5.4.5.jar
Re-running the import command produces the following error:
15/09/04 11:04:43 WARN security.UserGroupInformation: PriviledgedActionException as:root (auth:SIMPLE) cause:org.apache.hadoop.security.AccessControlException: Permission denied: user=root, access=WRITE, inode="/user":hdfs:supergroup:drwxr-xr-x
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.checkFsPermission(DefaultAuthorizationProvider.java:257)
    at org.apache.hadoop.hdfs.server.namenode.DefaultAuthorizationProvider.check(DefaultAuthorizationProvider.java:238)
The cause is the permissions on the /user directory. Re-running as the hdfs user also failed, so chmod /user to 777:
sudo -u hdfs hdfs dfs -chmod 777 /user
sudo -u hdfs hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244
15/09/04 11:06:05 ERROR mapreduce.CsvBulkLoadTool: Import job on table=CIS_CUST_IMP_INFO failed due to exception:org.apache.hadoop.mapreduce.lib.input.InvalidInputException: Input path does not exist: hdfs://ip-172-31-25-243.us-west-2.compute.internal:8020/root/ceb/cis_cust_imp_info.csv
15/09/04 11:06:05 INFO client.ConnectionManager$HConnectionImplementation: Closing zookeeper sessionid=0x14f97b7df1400a4
It turns out that when running in MR mode, the input file must be on HDFS.
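The original post does not show the copy step; one way to do it, assuming the file is placed at the same /root/ceb path on HDFS that the --input option above refers to, is:

```shell
# Sketch: put the CSV onto HDFS at the path CsvBulkLoadTool's --input expects
# (the /root/ceb HDFS directory is an assumption matching the command above)
sudo -u hdfs hdfs dfs -mkdir -p /root/ceb
sudo -u hdfs hdfs dfs -put /root/ceb/cis_cust_imp_info.csv /root/ceb/
```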
Now the MR job runs to completion and the HFiles are generated successfully, but it gets stuck at the loadIncremental step. The reason is that the target table in HBase is owned by hbase:hbase, while the generated HFiles are owned by the current user and group. So run the job as the hbase user:
sudo -u hbase hadoop jar phoenix-4.3.0-clabs-phoenix-1.0.0-client.jar org.apache.phoenix.mapreduce.CsvBulkLoadTool --table cis_cust_imp_info --input /root/ceb/cis_cust_imp_info.csv --zookeeper 172.31.25.244
Done!