A First Look at Reading and Writing Elasticsearch from Hadoop

1. Reference documentation:

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html

http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html#_emphasis_old_emphasis_literal_org_apache_hadoop_mapred_literal_api

 

2. MapReduce configuration API

 

 

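// Imports used by the driver code below
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.elasticsearch.hadoop.cfg.ConfigurationOptions;
import org.elasticsearch.hadoop.mr.EsInputFormat;
import org.elasticsearch.hadoop.mr.EsOutputFormat;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

// The ES settings below are read by the ES-Hadoop input/output format classes
Configuration conf = new Configuration();
conf.set(ConfigurationOptions.ES_NODES, "127.0.0.1");
conf.set(ConfigurationOptions.ES_PORT, "9200");
conf.set(ConfigurationOptions.ES_INDEX_AUTO_CREATE, "yes");

// Resource (index/type) to read from and write to
conf.set(ConfigurationOptions.ES_RESOURCE, "helloes/demo"); // target index/type

// To retrieve only part of the data, set ES_QUERY
//conf.set(ConfigurationOptions.ES_QUERY, "?q=me*");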

 

// Use the input/output formats that ES-Hadoop provides for MapReduce
Job job = Job.getInstance(conf, ElasticsearchIndexMapper.class.getSimpleName());
job.setJarByClass(ElasticsearchIndexBuilder.class);
job.setSpeculativeExecution(false); // disable speculative execution
job.setInputFormatClass(EsInputFormat.class);

// If the data is exported to HDFS: use TextOutputFormat, with NullWritable keys
// and Text values as the map output
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputValueClass(Text.class);
job.setMapOutputKeyClass(NullWritable.class);

 

 

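// Alternatively, to write the output to ES (these calls replace the HDFS settings above)
job.setOutputFormatClass(EsOutputFormat.class);      // write to ES
job.setMapOutputValueClass(LinkedMapWritable.class); // map output value class
job.setMapOutputKeyClass(Text.class);                // map output key class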

 

 

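job.setMapperClass(ElasticsearchIndexMapper.class);
// Input path: only used by file-based input formats; EsInputFormat reads ES_RESOURCE instead
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/es_input"));
// Output path: needed by TextOutputFormat when exporting to HDFS
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/es_output"));
job.setNumReduceTasks(0); // map-only job
job.waitForCompletion(true);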
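The ConfigurationOptions constants used above are plain string keys (ES_NODES is "es.nodes", ES_PORT is "es.port", ES_INDEX_AUTO_CREATE is "es.index.auto.create", ES_RESOURCE is "es.resource", ES_QUERY is "es.query"). As a minimal sketch, assuming nothing beyond the JDK, the same settings can be collected under those keys, together with a check that the resource has the index/type form used here. The class EsSettingsSketch and its methods are hypothetical helpers for illustration; they do not contact Elasticsearch or Hadoop.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper: the ES-Hadoop settings from the driver, keyed by the
// string values of the ConfigurationOptions constants.
class EsSettingsSketch {

    static Map<String, String> esSettings(String nodes, String port,
                                          String resource, String query) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);              // ConfigurationOptions.ES_NODES
        settings.put("es.port", port);                // ConfigurationOptions.ES_PORT
        settings.put("es.index.auto.create", "yes");  // ConfigurationOptions.ES_INDEX_AUTO_CREATE
        settings.put("es.resource", resource);        // ConfigurationOptions.ES_RESOURCE
        if (query != null) {
            settings.put("es.query", query);          // ConfigurationOptions.ES_QUERY (optional)
        }
        return settings;
    }

    // Splits an "index/type" resource such as "helloes/demo" into its two parts.
    static String[] splitResource(String resource) {
        int slash = resource.indexOf('/');
        if (slash <= 0 || slash == resource.length() - 1) {
            throw new IllegalArgumentException("expected index/type, got: " + resource);
        }
        return new String[] {resource.substring(0, slash), resource.substring(slash + 1)};
    }

    public static void main(String[] args) {
        Map<String, String> s = esSettings("127.0.0.1", "9200", "helloes/demo", "?q=me*");
        s.forEach((k, v) -> System.out.println(k + "=" + v));
        String[] parts = splitResource(s.get("es.resource"));
        System.out.println("index=" + parts[0] + ", type=" + parts[1]);
    }
}
```

Because the keys are ordinary strings, conf.set("es.nodes", "127.0.0.1") has the same effect as the ConfigurationOptions form, and a driver launched through ToolRunner could also accept them as -D options.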

 

3. The corresponding Mapper class: ElasticsearchIndexMapper

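import java.io.IOException;

import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class ElasticsearchIndexMapper extends Mapper<Object, Object, NullWritable, Text> {

    @Override
    protected void map(Object key, Object value, Context context)
            throws IOException, InterruptedException {
        // Here I only want to export the data to HDFS:
        // each document read by EsInputFormat arrives as a LinkedMapWritable value,
        // which is written out as one line of text
        LinkedMapWritable doc = (LinkedMapWritable) value;
        Text docVal = new Text(doc.toString());
        context.write(NullWritable.get(), docVal);
    }
}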
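Writing doc.toString() stores whatever string form LinkedMapWritable produces. If a fixed column layout is preferred for the HDFS export, the map step can format the fields itself. The sketch below is a hypothetical formatter in plain Java: a java.util.Map stands in for LinkedMapWritable, and it assumes the input key is the document id. In the real mapper the Writable keys and values would first be converted with toString(). The names DocLineFormatter and toTsvLine, and the column list, are illustrative only.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical formatter for the HDFS export case: turns one ES document
// (a field -> value map, standing in for LinkedMapWritable) into a TSV line.
class DocLineFormatter {

    // Emits the document id followed by the requested columns, in order.
    // Missing fields become empty strings; tabs and line breaks inside values
    // are replaced with spaces so each document stays on a single line.
    static String toTsvLine(String docId, Map<String, ?> doc, List<String> columns) {
        StringJoiner line = new StringJoiner("\t");
        line.add(clean(docId));
        for (String column : columns) {
            Object value = doc.get(column);
            line.add(value == null ? "" : clean(value.toString()));
        }
        return line.toString();
    }

    private static String clean(String s) {
        return s.replace('\t', ' ').replace('\n', ' ').replace('\r', ' ');
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "me");
        doc.put("message", "hello\tes");
        System.out.println(toTsvLine("1", doc, Arrays.asList("user", "message", "date")));
    }
}
```

Choosing the columns explicitly keeps every output line the same shape even when some documents lack a field, which makes the exported files easier to load into other tools.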

4. Summary

The key to reading and writing ES from Hadoop is configuring EsInputFormat and EsOutputFormat through the job's Configuration.
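The two output branches in the driver set the same three job properties to different classes, and if both are left in, the second silently overrides the first. As a sketch, the choice can be kept in one place. The enum OutputMode below is hypothetical and only records class names as strings; the property keys are assumed to be the Hadoop 2.x names behind Job.setOutputFormatClass, setMapOutputKeyClass and setMapOutputValueClass, so check them against your Hadoop version.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch: the two output choices from the driver, expressed as the
// configuration properties written by Job's setters (Hadoop 2.x names assumed).
enum OutputMode {
    HDFS_TEXT(
            "org.apache.hadoop.mapreduce.lib.output.TextOutputFormat",
            "org.apache.hadoop.io.NullWritable",
            "org.apache.hadoop.io.Text",
            true),   // TextOutputFormat needs an output directory
    ELASTICSEARCH(
            "org.elasticsearch.hadoop.mr.EsOutputFormat",
            "org.apache.hadoop.io.Text",
            "org.elasticsearch.hadoop.mr.LinkedMapWritable",
            false);  // EsOutputFormat writes to es.resource instead

    final String outputFormat;
    final String mapOutputKey;
    final String mapOutputValue;
    final boolean needsOutputPath;

    OutputMode(String outputFormat, String mapOutputKey, String mapOutputValue,
               boolean needsOutputPath) {
        this.outputFormat = outputFormat;
        this.mapOutputKey = mapOutputKey;
        this.mapOutputValue = mapOutputValue;
        this.needsOutputPath = needsOutputPath;
    }

    // Returns the properties this mode sets, keyed by Hadoop property name.
    Map<String, String> properties() {
        Map<String, String> props = new LinkedHashMap<>();
        props.put("mapreduce.job.outputformat.class", outputFormat);
        props.put("mapreduce.map.output.key.class", mapOutputKey);
        props.put("mapreduce.map.output.value.class", mapOutputValue);
        return props;
    }

    public static void main(String[] args) {
        for (OutputMode mode : values()) {
            System.out.println(mode + " " + mode.properties()
                    + " needsOutputPath=" + mode.needsOutputPath);
        }
    }
}
```

In the real driver, the same effect comes from calling the three Job setters once with the classes of the selected mode, plus FileOutputFormat.setOutputPath only when needsOutputPath is true.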

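Other data sources (MySQL, etc.) work the same way: find the corresponding InputFormat and OutputFormat, and set their connection parameters in the Configuration.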
