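To analyze massive amounts of data, we needed an open-source distributed computing project. We had mostly used Hive before, but Hive jobs are ultimately compiled into MR (MapReduce) jobs, and MR reads its input from disk and writes intermediate results back to disk, which is slow. So we looked for an open-source project that computes in memory. Presto, open-sourced by Facebook, is a distributed SQL query engine that processes data in memory.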
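Presto advantages
1. Based on standard ANSI SQL, so anyone with basic SQL knowledge can start using it quickly
2. Simple to install and deploy
3. Computes in memory without depending on MR, so it is much faster than Hive
4. Decoupled from the data source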
Installation and usage references:
https://prestodb.io/
http://prestodb-china.com/docs/current/index.html
Installation
Unpack the archive and edit the core configuration files:
etc/node.properties: settings for each individual node
node.environment=production
node.id=datanode4
node.data-dir=/data/presto
etc/config.properties: settings for the Presto server
coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9999
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://datanode4:9999
exchange.http-client.request-timeout=120s
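Because discovery-server.enabled=true runs the discovery service inside the coordinator, discovery.uri has to point at the coordinator's own HTTP port. The following is a minimal sketch of a sanity check for that. The class and method names are hypothetical and not part of Presto; it only parses the properties text with java.util.Properties.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.net.URI;
import java.util.Properties;

// Hypothetical helper, not part of Presto: checks a config.properties text.
final class PrestoConfigCheck {

    /** Returns true when the port in discovery.uri equals http-server.http.port. */
    static boolean discoveryPortMatches(String configText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(configText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        String httpPort = props.getProperty("http-server.http.port");
        String discoveryUri = props.getProperty("discovery.uri");
        if (httpPort == null || discoveryUri == null) {
            return false; // one of the two settings is missing
        }
        return URI.create(discoveryUri.trim()).getPort() == Integer.parseInt(httpPort.trim());
    }

    public static void main(String[] args) {
        String config = String.join("\n",
                "coordinator=true",
                "http-server.http.port=9999",
                "discovery-server.enabled=true",
                "discovery.uri=http://datanode4:9999");
        System.out.println(discoveryPortMatches(config));
    }
}
```

A check like this only covers the single-node layout shown here, where the coordinator also hosts discovery.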
etc/catalog/hive.properties: the Hive connector
connector.name=hive-hadoop2
hive.metastore.uri=thrift://datanode2:9083
hive.allow-drop-table=true
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
bin/launcher start
Web UI: http://datanode4:9999/
Usage
Presto uses Hive's metadata, so first create the Hive database:
create database if not exists monitor location '/apps/hive/warehouse/monitor';
Create the Hive table:
use monitor;
create external table if not exists monitor.url_monitor_report (
  product STRING,
  url STRING,
  span INT,
  ymd STRING,
  hms STRING,
  succeed INT
)
Partitioned by (p_ymd STRING, p_hour STRING, p_minute STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
Location '/apps/hive/warehouse/monitor/url_monitor_report';
At this point the corresponding HDFS directory already exists.
Add the partitions:
alter table monitor.url_monitor_report add if not exists
partition (p_ymd='2016-06-23',p_hour='00',p_minute='00') location '/apps/hive/warehouse/monitor/url_monitor_report/2016-06-23/00/00'
...... // rest omitted
;
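Since every partition follows the same p_ymd/p_hour/p_minute pattern, the statement can be generated from a timestamp instead of being written by hand. The sketch below assumes one partition per minute and reuses the table name and base path from the statement above. The class and method names are hypothetical.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds the ADD PARTITION statement for one minute.
final class PartitionDdl {
    private static final DateTimeFormatter YMD = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    private static final DateTimeFormatter HOUR = DateTimeFormatter.ofPattern("HH");
    private static final DateTimeFormatter MINUTE = DateTimeFormatter.ofPattern("mm");

    // Base location taken from the table definition in this article.
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    static String addPartition(LocalDateTime t) {
        String ymd = t.format(YMD);
        String hour = t.format(HOUR);
        String minute = t.format(MINUTE);
        // Partition values and directory layout mirror each other: <base>/<ymd>/<hour>/<minute>
        return "alter table monitor.url_monitor_report add if not exists partition "
                + "(p_ymd='" + ymd + "',p_hour='" + hour + "',p_minute='" + minute + "') "
                + "location '" + BASE + "/" + ymd + "/" + hour + "/" + minute + "'";
    }

    public static void main(String[] args) {
        System.out.println(addPartition(LocalDateTime.of(2016, 6, 23, 0, 0)) + ";");
    }
}
```

The generated text would still have to be run through Hive, as with the hand-written statement.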
After that, simply write the data directly into files under the corresponding partition directories.
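The table is declared with FIELDS TERMINATED BY '\t', so each line in those files needs the six columns in declaration order, separated by tabs. A minimal sketch of formatting one record follows. The class name, method name, and sample values are hypothetical, and the time format used for hms is an assumption, since the article does not state it.

```java
// Hypothetical helper: formats one record of url_monitor_report as a text line.
final class ReportRow {

    /** Column order follows the table definition: product, url, span, ymd, hms, succeed. */
    static String toLine(String product, String url, int span,
                         String ymd, String hms, int succeed) {
        return String.join("\t",
                product, url, Integer.toString(span), ymd, hms, Integer.toString(succeed));
    }

    public static void main(String[] args) {
        // Sample values are made up for illustration only.
        System.out.println(toLine("web", "http://example.com/a", 120, "2016-06-23", "00:00:01", 1));
    }
}
```

The partition columns (p_ymd, p_hour, p_minute) are not written into the line. They come from the directory the file is placed in.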
1. Command-line usage:
/opt/presto/bin/presto --server 172.172.178.72:9999 --catalog hive --schema monitor
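(Here presto is the presto-cli executable jar, presto-cli-<version>-executable.jar, renamed and made executable with chmod. For details, see the really-executable-jar-maven-plugin plugin in the presto-cli pom.xml.)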
presto:monitor> select * from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23';
2. JDBC usage:
<dependency>
  <groupId>com.facebook.presto</groupId>
  <artifactId>presto-jdbc</artifactId>
  <version>0.144.1</version>
</dependency>
Code:
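import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

// The class name is illustrative; the original snippet showed only the main method.
public class PrestoJdbcExample {
    public static void main(String[] args) throws SQLException {
        String sql = "select distinct(url) from monitor.url_monitor_report "
                + "where p_ymd>='2016-06-23' and p_ymd<='2016-06-23'";
        // try-with-resources closes the result set, statement, and connection,
        // even when the query throws
        try (Connection conn = DriverManager.getConnection(
                     "jdbc:presto://172.172.178.72:9999/hive/monitor", "hive", "hive");
             Statement stmt = conn.createStatement();
             ResultSet result = stmt.executeQuery(sql)) {
            while (result.next()) {
                System.out.println(result.getString("url"));
            }
        }
    }
}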
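The connection string above has the shape jdbc:presto://host:port/catalog/schema, which lines up with the --server, --catalog, and --schema options of the command-line client. A small sketch that assembles it from those parts is shown here. The class and method names are hypothetical.

```java
// Hypothetical helper: assembles a Presto JDBC URL from its parts.
final class PrestoJdbcUrl {

    static String of(String host, int port, String catalog, String schema) {
        // Shape: jdbc:presto://host:port/catalog/schema
        return "jdbc:presto://" + host + ":" + port + "/" + catalog + "/" + schema;
    }

    public static void main(String[] args) {
        // Same coordinator address, catalog, and schema as in the example above.
        System.out.println(of("172.172.178.72", 9999, "hive", "monitor"));
    }
}
```

Keeping the host, catalog, and schema as separate values makes it easier to point the same code at another coordinator or catalog.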