Connecting to Hive from Python via Thrift

Date: 2019-11-17


Once Hive is installed, if it will only be used locally, starting the metastore service is enough:

nohup hive --service metastore &

[hadoop@master1 usr]$ hive

Logging initialized using configuration in file:/data/usr/hive/conf/hive-log4j.properties
hive> use fmcm;
OK
Time taken: 0.874 seconds

If Hive is to be called from scripts, HiveServer2 must be started as well. Make sure port 10000 is being listened on (the port can be changed in hive-site.xml):

nohup hive --service hiveserver2 &

[hadoop@master1 usr]$ netstat -an|grep 10000            
tcp        0      0 0.0.0.0:10000           0.0.0.0:*               LISTEN
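The same check can be made from the client machine before running any query. The following is a minimal sketch using only the standard library; the function name port_is_open is hypothetical, and the host and port in the comment are assumptions taken from the examples in this article.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and opens a TCP connection;
        # the with-block closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Example call (assumed address of the HiveServer2 host):
# port_is_open("10.24.33.3", 10000)
```

A successful connection only shows that something is listening on the port. It does not confirm that the listener is HiveServer2 or that authentication will succeed.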

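HiveServer2 provides an interface for clients to run Hive queries remotely. It is implemented over Thrift RPC and also supports concurrent multi-user access and authentication. Python can currently connect to HiveServer2 through the pyhs2 module to run queries and fetch the results.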

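However, pyhs2 is no longer maintained. For something current, consider the two Python packages below (pyhs2 has been shown to have performance bottlenecks, so it is best to switch to pyhive as soon as possible):

https://github.com/dropbox/PyHive

https://github.com/cloudera/impyla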
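For comparison, a query through PyHive might look like the sketch below. The helper names run_query and pyhive_connect are hypothetical. The host, user and database are the values used in this article's test code. It assumes HiveServer2 runs with its default authentication, so no password is passed (PyHive accepts a password only together with auth='LDAP' or auth='CUSTOM').

```python
from contextlib import closing


def run_query(connect, sql):
    """Open a DB-API connection via connect(), run sql, return all rows."""
    # closing() guarantees close() is called on both objects,
    # even if execute() or fetchall() raises.
    with closing(connect()) as conn:
        with closing(conn.cursor()) as cursor:
            cursor.execute(sql)
            return cursor.fetchall()


def pyhive_connect():
    """Hypothetical factory: connection settings are assumptions."""
    # Imported lazily so the module loads even where PyHive is not installed.
    from pyhive import hive
    return hive.Connection(host='10.24.33.3', port=10000,
                           username='hadoop', database='fmcm')


# Example call against a running HiveServer2:
# rows = run_query(pyhive_connect, 'select * from fm_news_newsaction limit 10')
```

Passing the connection factory as an argument keeps run_query independent of the driver, so the same helper should also work with any other DB-API style connection.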

If installing sasl fails, install these packages first:
yum install gcc-c++ python-devel.x86_64 cyrus-sasl-devel.x86_64

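The pyhs2 project is hosted on GitHub at https://github.com/BradRuderman/pyhs2, or it can be downloaded directly from https://pypi.python.org/pypi/pyhs2/0.2

If the installation does not succeed, try installing the following components first: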

yum install cyrus-sasl-plain
yum install cyrus-sasl-devel

If the installation fails with this error:

error: sasl/sasl.h: No such file or directory

In that case, try installing sasl first. On Ubuntu, run sudo apt-get install libsasl2-dev. On CentOS, install it with Anaconda's pip, or build it from source with the following steps:

curl -O -L ftp://ftp.cyrusimap.org/cyrus-sasl/cyrus-sasl-2.1.26.tar.gz
tar xzf cyrus-sasl-2.1.26.tar.gz
cd cyrus-sasl-2.1.26
./configure && make install
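After the system libraries and the Python packages are installed, a quick import check can confirm that the client dependencies load. This is a minimal sketch: the function name missing_modules is hypothetical, and the module names in the example call (sasl, thrift, pyhs2) are an assumption about what the pyhs2 client needs.

```python
import importlib


def missing_modules(names):
    """Return the module names from names that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:
            # The module is absent, or one of its own imports failed.
            missing.append(name)
    return missing


# Example call (assumed dependency list for pyhs2):
# missing_modules(['sasl', 'thrift', 'pyhs2'])
```

An empty list means every module imported. Any name that is returned points at the package to reinstall.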


Finally, here is the test code:

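# -*- coding:utf-8 -*-
'''
Connect to the database via Hive and Thrift.
'''
import pyhs2
import sys

# Python 2 only: make utf8 the default encoding for implicit str/unicode conversions.
reload(sys)
sys.setdefaultencoding('utf8')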

class HiveClient:
    def __init__(self, db_host, user, password, database, port=10000, authMechanism="PLAIN"):
      
        self.conn = pyhs2.connect(host=db_host,
                                  port=port,
                                  authMechanism=authMechanism,
                                  user=user,
                                  password=password,
                                  database=database,
                                  )

    def query(self, sql):
        with self.conn.cursor() as cursor:
            cursor.execute(sql)
            return cursor.fetch()

    def close(self):
        self.conn.close()


def main():
    """Run a sample query against HiveServer2 and print the result rows."""
    hive_client = HiveClient(db_host='10.24.33.3', port=10000, user='hadoop', password='hadoop',
                             database='fmcm', authMechanism='PLAIN')
    try:
        result = hive_client.query('select * from fm_news_newsaction limit 10')
        print(result)
    finally:
        # Close the connection even if the query raises.
        hive_client.close()


if __name__ == '__main__':
    main()