Presto: Getting Started, Installation and Usage

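To analyze massive amounts of data, we needed an open-source distributed computing project. We used to rely mostly on Hive. However, Hive jobs are ultimately compiled into MapReduce (MR) jobs, and MR reads data from disk and writes intermediate results back to disk, which is slow. So we looked for an open-source project that computes in memory. Presto, open-sourced by Facebook, is an in-memory distributed computing framework.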

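Advantages of Presto

1. Based on standard ANSI SQL, so anyone who knows SQL can start using it quickly

2. Simple to install and deploy

3. Computes in memory without relying on MR, so it is much faster than Hive

4. Decoupled from data sources: each data source is plugged in through a connector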

References for installation and usage:

https://prestodb.io/

http://prestodb-china.com/docs/current/index.html

Installation

Unpack the distribution and edit the core configuration files:

etc/node.properties: per-node settings

node.environment=production
node.id=datanode4
node.data-dir=/data/presto
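All nodes in a cluster must use the same node.environment, while node.id has to be unique for every node. Before starting a node, it can help to verify that the file actually sets these keys. Below is a minimal sketch that uses a hypothetical helper class, NodePropertiesCheck. It is not part of Presto, and it only checks that the three keys shown above are present and non-blank:

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;
import java.util.Properties;

// Hypothetical helper (not part of Presto): lists required node.properties keys
// that are missing or blank.
final class NodePropertiesCheck {
    // The keys set in the example etc/node.properties.
    static final String[] REQUIRED = {"node.environment", "node.id", "node.data-dir"};

    static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            // StringReader does no real I/O, so this is not expected to happen.
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }
}
```

On a real node the text could be read with Files.readString(Path.of("etc/node.properties")). An empty result means all three keys are set.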

etc/config.properties: server settings

coordinator=true
node-scheduler.include-coordinator=true
http-server.http.port=9999
query.max-memory=4GB
query.max-memory-per-node=1GB
discovery-server.enabled=true
discovery.uri=http://datanode4:9999
exchange.http-client.request-timeout=120s
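In the single-node setup above, two pairs of values depend on each other:

- The per-node memory limit should not exceed the cluster-wide query.max-memory.
- This coordinator also runs the embedded discovery service (discovery-server.enabled=true), so discovery.uri should point at its own http-server.http.port.

The sketch below checks both. ConfigPropertiesCheck is a hypothetical helper. Its size parser is a simplified assumption: it only understands B, KB, MB, GB and TB, with 1024-based factors.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical sanity checks for etc/config.properties; not part of Presto itself.
final class ConfigPropertiesCheck {

    // Converts sizes such as "4GB" or "512MB" to bytes (1024-based units).
    // Assumption: only TB, GB, MB, KB and B suffixes are handled.
    static long toBytes(String size) {
        String s = size.trim().toUpperCase(Locale.ROOT);
        String[] units = {"TB", "GB", "MB", "KB", "B"};
        long[] factors = {1L << 40, 1L << 30, 1L << 20, 1L << 10, 1L};
        for (int i = 0; i < units.length; i++) {
            if (s.endsWith(units[i])) {
                String number = s.substring(0, s.length() - units[i].length()).trim();
                return (long) (Double.parseDouble(number) * factors[i]);
            }
        }
        throw new IllegalArgumentException("Unknown size unit: " + size);
    }

    // Returns human-readable problems; an empty list means both checks passed.
    static List<String> problems(String maxMemory, String maxMemoryPerNode,
                                 int httpPort, String discoveryUri) {
        List<String> problems = new ArrayList<>();
        if (toBytes(maxMemoryPerNode) > toBytes(maxMemory)) {
            problems.add("query.max-memory-per-node exceeds query.max-memory");
        }
        int discoveryPort = URI.create(discoveryUri).getPort();
        if (discoveryPort != httpPort) {
            problems.add("discovery.uri port " + discoveryPort
                    + " differs from http-server.http.port " + httpPort);
        }
        return problems;
    }
}
```

On worker nodes, discovery.uri points at the coordinator, so the port check would not apply there.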

etc/catalog/hive.properties: Hive connector settings

connector.name=hive-hadoop2
hive.metastore.uri=thrift://datanode2:9083
hive.allow-drop-table=true
hive.config.resources=/etc/hadoop/conf/core-site.xml,/etc/hadoop/conf/hdfs-site.xml
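The connector reaches the metastore over Thrift. The Hadoop files listed in hive.config.resources must also exist on every machine running Presto. Below is a sketch of a pre-flight check for both, using a hypothetical HiveCatalogCheck class. It only inspects the shape of the URI and whether local files exist; it does not test whether the metastore actually answers.

```java
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical pre-flight checks for etc/catalog/hive.properties.
final class HiveCatalogCheck {

    // hive.config.resources is a comma-separated list of Hadoop XML files;
    // returns the entries that do not exist on this machine.
    static List<String> missingResources(String configResources) {
        List<String> missing = new ArrayList<>();
        for (String entry : configResources.split(",")) {
            String path = entry.trim();
            if (!path.isEmpty() && !Files.exists(Paths.get(path))) {
                missing.add(path);
            }
        }
        return missing;
    }

    // True if hive.metastore.uri has the form thrift://host:port.
    static boolean isThriftUri(String metastoreUri) {
        URI uri = URI.create(metastoreUri);
        return "thrift".equals(uri.getScheme()) && uri.getHost() != null && uri.getPort() > 0;
    }
}
```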

bin/launcher start

Web UI: http://datanode4:9999/
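To check from code that the server came up after bin/launcher start, one option is to send an HTTP request to the coordinator. The sketch below assumes the coordinator answers GET /v1/info, as recent Presto releases do; verify that endpoint for your version. CoordinatorCheck is a hypothetical class, and it uses the java.net.http client available since Java 11.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical liveness probe; assumes the coordinator serves GET /v1/info.
final class CoordinatorCheck {

    static URI infoUri(String host, int port) {
        return URI.create("http://" + host + ":" + port + "/v1/info");
    }

    // True if /v1/info answers with HTTP 200; false on any connection error.
    static boolean isUp(String host, int port) {
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(5))
                .build();
        HttpRequest request = HttpRequest.newBuilder(infoUri(host, port))
                .timeout(Duration.ofSeconds(5))
                .GET()
                .build();
        try {
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            return response.statusCode() == 200;
        } catch (IOException e) {
            return false;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        }
    }
}
```

For example, CoordinatorCheck.isUp("datanode4", 9999) should return true once the web UI above is reachable.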

Usage

Presto uses the Hive metastore's metadata, so first create the database in Hive:

create database if not exists monitor location '/apps/hive/warehouse/monitor';

Create the Hive table:

use monitor;
create external table if not exists monitor.url_monitor_report (
	product STRING,
	url     STRING,
	span    INT,
	ymd     STRING,
	hms     STRING,
	succeed INT)
PARTITIONED BY (p_ymd STRING, p_hour STRING, p_minute STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
LOCATION '/apps/hive/warehouse/monitor/url_monitor_report';

At this point the corresponding HDFS directory already exists.

Add the partitions:

alter table monitor.url_monitor_report add if not exists
partition (p_ymd='2016-06-23',p_hour='00',p_minute='00')  location '/apps/hive/warehouse/monitor/url_monitor_report/2016-06-23/00/00'
......  -- remaining partitions omitted
;
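The statement above elides the remaining partitions. With one partition per minute, a full day needs 24 * 60 = 1440 partition clauses, which is tedious to write by hand. Below is a minimal sketch of a generator, using a hypothetical PartitionDdl class. It follows the directory layout of the example partition (date/hour/minute under the table location).

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical generator for the per-minute partitions of one day.
final class PartitionDdl {
    static final String TABLE = "monitor.url_monitor_report";
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";

    // One partition clause per minute of the day: 24 * 60 = 1440 clauses.
    static List<String> partitionClauses(LocalDate day) {
        String ymd = day.toString(); // ISO format, e.g. 2016-06-23
        List<String> clauses = new ArrayList<>();
        for (int hour = 0; hour < 24; hour++) {
            for (int minute = 0; minute < 60; minute++) {
                clauses.add(String.format(Locale.ROOT,
                        "partition (p_ymd='%s',p_hour='%02d',p_minute='%02d') location '%s/%s/%02d/%02d'",
                        ymd, hour, minute, BASE, ymd, hour, minute));
            }
        }
        return clauses;
    }

    // Joins all clauses into a single Hive ALTER TABLE ... ADD statement.
    static String alterStatement(LocalDate day) {
        return "alter table " + TABLE + " add if not exists\n"
                + String.join("\n", partitionClauses(day)) + "\n;";
    }
}
```

Hive accepts several PARTITION ... LOCATION pairs in one ALTER TABLE ... ADD statement. That is why the clauses are joined with newlines instead of being emitted as separate statements.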

After that, the data only needs to be written as files into the matching partition directories.
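The directory a record lands in follows directly from its timestamp. The sketch below uses a hypothetical MonitorRecordLayout class to compute the minute-level partition directory and to format one tab-separated line in the table's column order. The article does not give the formats of the ymd and hms columns, so yyyy-MM-dd and HH:mm:ss are assumptions. Actually writing the file to HDFS (for example with the Hadoop FileSystem API) is left out.

```java
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.util.StringJoiner;

// Hypothetical helper: maps a record to its partition directory and a row line.
final class MonitorRecordLayout {
    static final String BASE = "/apps/hive/warehouse/monitor/url_monitor_report";
    static final DateTimeFormatter DIR = DateTimeFormatter.ofPattern("yyyy-MM-dd/HH/mm");
    // Assumed formats for the ymd and hms columns; the article does not specify them.
    static final DateTimeFormatter YMD = DateTimeFormatter.ofPattern("yyyy-MM-dd");
    static final DateTimeFormatter HMS = DateTimeFormatter.ofPattern("HH:mm:ss");

    // Directory of the minute partition that a record with timestamp ts belongs to.
    static String partitionDir(LocalDateTime ts) {
        return BASE + "/" + ts.format(DIR);
    }

    // One tab-separated line: product, url, span, ymd, hms, succeed.
    // Assumes product and url contain no tab or newline characters.
    static String toLine(String product, String url, int span, LocalDateTime ts, int succeed) {
        StringJoiner line = new StringJoiner("\t");
        line.add(product).add(url).add(Integer.toString(span))
            .add(ts.format(YMD)).add(ts.format(HMS)).add(Integer.toString(succeed));
        return line.toString();
    }
}
```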

1. Command-line usage:

/opt/presto/bin/presto --server 172.172.178.72:9999 --catalog hive --schema monitor

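(presto here is the presto-cli executable jar, presto-cli-<version>-executable.jar, renamed and made executable with chmod. For details, see the really-executable-jar-maven-plugin configured in presto-cli's pom.xml.)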

presto:monitor> select * from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23';

2. Using JDBC:

<dependency>
	<groupId>com.facebook.presto</groupId>
	<artifactId>presto-jdbc</artifactId>
	<version>0.144.1</version>
</dependency>

Code:

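import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PrestoJdbcExample {
	public static void main(String[] args) throws SQLException {
		String sql = "select distinct url from monitor.url_monitor_report where p_ymd>='2016-06-23' and p_ymd<='2016-06-23'";
		// URL format: jdbc:presto://host:port/catalog/schema; without authentication only a user name is needed
		try (Connection conn = DriverManager.getConnection("jdbc:presto://172.172.178.72:9999/hive/monitor", "hive", null);
		     Statement stmt = conn.createStatement();
		     ResultSet result = stmt.executeQuery(sql)) {
			while (result.next()) {
				System.out.println(result.getString("url"));
			}
		} // try-with-resources closes result, stmt and conn even if the query fails
	}
}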
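Both the CLI query and the JDBC query restrict p_ymd, the first partition column. p_ymd holds dates as yyyy-MM-dd strings, so comparing them as strings gives the same order as comparing the dates. A range predicate therefore selects exactly the partitions in that date range, and the Hive connector can skip the others. Below is a small sketch that builds such a query from two dates, using a hypothetical PartitionRangeQuery class:

```java
import java.time.LocalDate;

// Hypothetical helper that builds a partition-range query over p_ymd.
final class PartitionRangeQuery {

    // p_ymd holds ISO dates (yyyy-MM-dd), so string comparison in SQL
    // orders them the same way as the dates themselves.
    static String distinctUrls(LocalDate from, LocalDate to) {
        if (from.isAfter(to)) {
            throw new IllegalArgumentException("from must not be after to");
        }
        // LocalDate.toString() yields yyyy-MM-dd for years 0000-9999, so only
        // formatted dates, never arbitrary user text, reach the SQL string.
        return "select distinct url from monitor.url_monitor_report"
                + " where p_ymd >= '" + from + "' and p_ymd <= '" + to + "'";
    }
}
```

The resulting string can be passed to stmt.executeQuery(...) in the JDBC example, or pasted into the CLI with a trailing semicolon.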