1. References:
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/configuration.html
http://www.elasticsearch.org/guide/en/elasticsearch/hadoop/current/mapreduce.html#_emphasis_old_emphasis_literal_org_apache_hadoop_mapred_literal_api
2. MapReduce configuration
// The ES settings below are mainly read by the ES Format classes
Configuration conf = new Configuration();
conf.set(ConfigurationOptions.ES_NODES, "127.0.0.1");
conf.set(ConfigurationOptions.ES_PORT, "9200");
conf.set(ConfigurationOptions.ES_INDEX_AUTO_CREATE, "yes");
// Set the index/type resource to read from and write to
conf.set(ConfigurationOptions.ES_RESOURCE, "helloes/demo"); //read Target index/type
// To retrieve only part of the data, set ES_QUERY
//conf.set(ConfigurationOptions.ES_QUERY, "?q=me*");
// Configure the job with the formats Elasticsearch provides for Hadoop
Job job = Job.getInstance(conf,ElasticsearchIndexMapper.class.getSimpleName());
job.setJarByClass(ElasticsearchIndexBuilder.class);
job.setSpeculativeExecution(false);//Disable speculative execution
job.setInputFormatClass(EsInputFormat.class);
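// If the data is exported to HDFS, use the Text output format and declare the map output key/value types
job.setOutputFormatClass(TextOutputFormat.class);
job.setMapOutputValueClass(Text.class);
job.setMapOutputKeyClass(NullWritable.class);
// To write to ES instead, use the three settings below in place of the three above.
// They are commented out here because later calls would override the HDFS settings,
// and the Mapper in section 3 emits NullWritable/Text.
//job.setOutputFormatClass(EsOutputFormat.class); // write to ES
//job.setMapOutputValueClass(LinkedMapWritable.class); // map output value class
//job.setMapOutputKeyClass(Text.class); // map output key class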
job.setMapperClass(ElasticsearchIndexMapper.class);
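// The input is read from ES through EsInputFormat, so this HDFS input path is presumably not used
FileInputFormat.addInputPath(job, new Path("hdfs://localhost:9000/es_input"));
// HDFS directory that receives the exported text files
FileOutputFormat.setOutputPath(job, new Path("hdfs://localhost:9000/es_output"));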
job.setNumReduceTasks(0);
job.waitForCompletion(true);
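
To make the difference between the read and write settings easier to see, here is a minimal sketch in plain Java that collects them as key/value pairs. EsJobSettings and its methods are hypothetical helpers, not part of es-hadoop, and the property names ("es.nodes", "es.port", "es.resource", "es.query", "es.index.auto.create") are assumed to be the strings behind the ConfigurationOptions constants used above.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper (not part of es-hadoop): gathers the ES settings as plain key/value pairs. */
final class EsJobSettings {

    /** Settings needed by both read and write jobs. Key names are assumed, see the note above. */
    static Map<String, String> base(String nodes, int port, String resource) {
        Map<String, String> settings = new LinkedHashMap<>();
        settings.put("es.nodes", nodes);
        settings.put("es.port", Integer.toString(port));
        settings.put("es.resource", resource); // index/type
        return settings;
    }

    /** Read job: optionally restrict the documents with a query such as "?q=me*". */
    static Map<String, String> forRead(String nodes, int port, String resource, String query) {
        Map<String, String> settings = base(nodes, port, resource);
        if (query != null && !query.isEmpty()) {
            settings.put("es.query", query);
        }
        return settings;
    }

    /** Write job: let ES create the target index if it does not exist yet. */
    static Map<String, String> forWrite(String nodes, int port, String resource) {
        Map<String, String> settings = base(nodes, port, resource);
        settings.put("es.index.auto.create", "yes");
        return settings;
    }

    public static void main(String[] args) {
        // Print the settings a read job would use
        forRead("127.0.0.1", 9200, "helloes/demo", "?q=me*")
                .forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

Each entry could then be copied onto the Hadoop Configuration with conf.set(key, value) before the Job is created.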
3. The corresponding Mapper class, ElasticsearchIndexMapper
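import java.io.IOException;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.elasticsearch.hadoop.mr.LinkedMapWritable;

public class ElasticsearchIndexMapper extends Mapper<Object, Object, NullWritable, Text> {
    @Override
    protected void map(Object key, Object value, Context context)
            throws IOException, InterruptedException {
        // Here the goal is only to export the data to HDFS:
        // each value handed over by EsInputFormat is one ES document
        LinkedMapWritable doc = (LinkedMapWritable) value;
        Text docVal = new Text();
        docVal.set(doc.toString());
        // No key is needed in the output file, so emit NullWritable
        context.write(NullWritable.get(), docVal);
    }
}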
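
The Mapper above writes whatever doc.toString() returns. If a flatter, more predictable line is preferred for downstream processing, the document fields could be joined explicitly. The following is only a sketch under assumptions: DocLineFormatter is a hypothetical helper, and a plain java.util.Map stands in for the document so the example runs without Hadoop on the classpath.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

/** Hypothetical helper: renders one document as a single tab-separated line of field=value pairs. */
final class DocLineFormatter {

    static String toLine(Map<?, ?> doc) {
        StringJoiner line = new StringJoiner("\t");
        // Fields are emitted in the map's own iteration order
        for (Map.Entry<?, ?> field : doc.entrySet()) {
            line.add(field.getKey() + "=" + field.getValue());
        }
        return line.toString();
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("user", "me");
        doc.put("age", 30);
        System.out.println(toLine(doc));
    }
}
```

Inside map(), the result of such a helper would be passed to docVal.set(...) in place of doc.toString().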
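4. Summary
Reading from and writing to ES with Hadoop mostly comes down to configuring the parameters (Configuration) for EsInputFormat and EsOutputFormat.
Other data sources (MySQL, etc.) work in a similar way: find the corresponding InputFormat and OutputFormat, then set the environment parameters they need.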