A Quick Summary of Common Impala Errors

We use Impala in production for part of our real-time Nginx log processing. Below are brief notes on some small problems hit along the way:


Common deployment problems:
1. MySQL JAR error
Caused by: org.datanucleus.exceptions.NucleusException: Attempt to invoke the "DBCP" plugin to create a ConnectionPool gave an error : The specified datastore driver ("com.mysql.jdbc.Driver") was not found in the CLASSPATH. Please check your CLASSPATH specification, and the name of the driver.

Copy Hive's mysql-connector-java.xxxx.jar file into Impala's library directory (/usr/lib/impala/lib by default).
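A minimal sketch of the fix, assuming a typical CDH package layout (paths and init-script names may differ on your cluster):

```shell
# Copy the MySQL JDBC driver shipped with Hive into Impala's lib directory
# (the version embedded in the jar name varies by installation)
cp /usr/lib/hive/lib/mysql-connector-java-*.jar /usr/lib/impala/lib/

# Restart the Impala daemons so the driver is picked up on the CLASSPATH
service impala-server restart
service impala-state-store restart
```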

2. HDFS NameNode error
E0127 19:48:16.708744 31675 impala-server.cc:339] Could not read the HDFS root directory at hdfs://bipcluster. Error was:
Operation category READ is not supported in state standby

NameNode HA failover had not kicked in automatically, leaving both NameNodes in standby state.
Manually transitioning one NameNode to active resolves it.
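This can be done with `hdfs haadmin`; a sketch, where nn1/nn2 are placeholder NameNode service IDs (check dfs.ha.namenodes.&lt;nameservice&gt; in hdfs-site.xml for the real ones):

```shell
# Inspect the state of each NameNode (nn1/nn2 are example service IDs)
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2

# Promote one NameNode to active; if automatic failover (ZKFC) is
# configured, --forcemanual is required to override it
hdfs haadmin -transitionToActive nn1
```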

3. Impala feature support

E0127 19:28:25.289991 13469 impala-server.cc:339] ERROR: short-circuit local reads is disabled because
 - Impala cannot read or execute the parent directory of dfs.domain.socket.path
 - dfs.client.read.shortcircuit is not enabled.
ERROR: block location tracking is not properly enabled because
 - dfs.client.file-block-storage-locations.timeout is too low. It should be at least 3000.
E0127 19:28:25.290117 13469 impala-server.cc:341] Aborting Impala Server startup due to improper configuration

Add the following to the HDFS configuration file hdfs-site.xml:
<property>
    <name>dfs.client.read.shortcircuit</name>
    <value>true</value>
</property>
<property>
    <name>dfs.domain.socket.path</name>
    <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<property>
    <name>dfs.client.file-block-storage-locations.timeout</name>
    <value>3000</value>
</property>
<property>
    <name>dfs.datanode.hdfs-blocks-metadata.enabled</name>
    <value>true</value>
</property>
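After editing hdfs-site.xml, the domain socket directory must exist and be owned by the hdfs user, and the DataNodes and Impala need a restart; a sketch, with CDH-style service names as an assumption:

```shell
# Create the directory referenced by dfs.domain.socket.path
mkdir -p /var/run/hadoop-hdfs
chown hdfs:hdfs /var/run/hadoop-hdfs

# Restart the DataNodes and Impala so the new settings take effect
service hadoop-hdfs-datanode restart
service impala-server restart
```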

Common usage errors:

4. Table creation error
Impala runs as the impala user by default, so creating a table can fail due to HDFS permissions:
Query: create table nginx_test (line string) STORED AS TEXTFILE
ERROR: MetaException: Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=impala, access=WRITE, inode="/bip/hive_warehouse/cdnlog.db":hdfs:hdfs:drwxr-xr-x
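One workaround is to give the impala user write access to the database directory in HDFS (path taken from the error above; run as the hdfs superuser):

```shell
# Option A: hand the database directory over to the impala user
sudo -u hdfs hadoop fs -chown -R impala /bip/hive_warehouse/cdnlog.db

# Option B: keep hdfs as the owner but allow group writes, and make
# sure the impala user belongs to the hdfs group on the cluster nodes
sudo -u hdfs hadoop fs -chmod -R 775 /bip/hive_warehouse/cdnlog.db
```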


5. Query error
ERROR: Failed to open HDFS file hdfs://bipcluster/bip/hive_warehouse/cdnlog.db/dd_log/dt=20140117/data.file
Error(255): Unknown error 255
hdfsOpenFile(hdfs://bipcluster/bip/hive_warehouse/cdnlog.db/dd_log/dt=20140117/data.file): FileSystem#open((Lorg/apache/hadoop/fs/Path;I)Lorg/apache/hadoop/fs/FSDataInputStream;) error:
java.io.IOException: Filesystem closed
        at org.apache.hadoop.hdfs.DFSClient.checkOpen(DFSClient.java:565)
        at org.apache.hadoop.hdfs.DFSClient.open(DFSClient.java:1115)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:249)
        at org.apache.hadoop.hdfs.DistributedFileSystem.open(DistributedFileSystem.java:82)

Impala reports an HDFS file open error even though `hadoop fs -cat` can display the file's contents, which indicates that Impala's communication with the DataNode has broken down; restarting the Impala process fixes it.
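A sketch of the check-then-restart sequence described above (the init-script name is an assumption from CDH packaging):

```shell
# Confirm the file itself is readable outside Impala
hadoop fs -cat hdfs://bipcluster/bip/hive_warehouse/cdnlog.db/dd_log/dt=20140117/data.file | head

# If it is, Impala's DataNode connections are stale; restart the daemon
service impala-server restart
```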

6. SQL differences from Hive
1) Hive performs implicit type conversion, so you can call avg() directly on a string column (as long as its values are numeric); in Impala you must cast manually.
For example:
[10.19.111.106:21000] > select avg(status) from dd_log where dt='20140117';
Query: select avg(status) from dd_log where dt='20140117'
ERROR: AnalysisException: AVG requires a numeric or timestamp parameter: AVG(status)

It can be run like this instead:
select avg(cast(status as DOUBLE)) from dd_log where dt='20140117';

2)ERROR: NotImplementedException: ORDER BY without LIMIT currently not supported

In Impala, ORDER BY requires a LIMIT clause to run, otherwise it errors out; you can pass a very large LIMIT to see all the rows. Note also that LIMIT does not support the `limit a,b` form.
select ip,count(1) as cnt from cdnlog.dd_log group by ip order by cnt desc limit 100000000;

7. Problems upgrading Impala 1.0 to 1.1.1
1) The 1.0 client is incompatible with the 1.1 server: running refresh from a 1.0 client connected to a 1.1 server fails with ERROR: ExecPlanRequest rpcERROR
2) The behavior of Refresh changed in 1.1: 1.0's Refresh is equivalent to 1.1's Invalid Metadata [Tablename], and 1.1's Refresh must be followed by a table name.
You can use the -r option of impala-shell to refresh metadata on connect.
3) The default behavior of the client's -o option also changed: 1.0 wrote output files as comma-separated values, whereas 1.1 requires explicitly specifying -B --output_delimiter=,
4) The 1.1 server is incompatible with the 1.0 statestore and cannot register with it.
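The 1.1-era commands mentioned above, side by side (table name reused from the earlier examples):

```shell
# In 1.1, refresh must name a table; invalidate metadata reloads everything
impala-shell -q "refresh cdnlog.dd_log"
impala-shell -q "invalidate metadata"

# -r refreshes metadata on connect; -B with --output_delimiter restores
# the 1.0-style comma-separated output when writing to a file with -o
impala-shell -r -B --output_delimiter=',' -o out.csv -q "select ip from cdnlog.dd_log limit 10"
```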

8.Could not resolve host for clientsocket
This is caused by hostname resolution problems on the DataNodes.
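Diagnosis is usually a matter of checking forward and reverse lookup on each node; a sketch (the hostname below is a made-up placeholder):

```shell
# Verify the DataNode hostname resolves on the impalad host
getent hosts datanode01.example.com

# Verify the local fully-qualified hostname is consistent
hostname -f

# If resolution fails, add an entry to /etc/hosts on every node, e.g.:
#   10.19.111.106  datanode01.example.com  datanode01
```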