使用ElasticSearch完成百萬級數據查詢附近的人功能

這一篇咱們來看一下使用ElasticSearch完成大數據量查詢附近的人功能,搜索N米範圍的內的數據。java

準備環境 本機測試使用了ElasticSearch最新版5.5.1,SpringBoot1.5.4,spring-data-ElasticSearch2.1.4. 新建Springboot項目,勾選ElasticSearch和web。 pom文件以下git

<?xml version="1.0" encoding="UTF-8"?>github

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"> <modelVersion>4.0.0</modelVersion>web

<groupId>com.tianyalei</groupId>
<artifactId>elasticsearch</artifactId>
<version>0.0.1-SNAPSHOT</version>
<packaging>jar</packaging>

<name>elasticsearch</name>
<description>Demo project for Spring Boot</description>

<parent>
	<groupId>org.springframework.boot</groupId>
	<artifactId>spring-boot-starter-parent</artifactId>
	<version>1.5.4.RELEASE</version>
	<relativePath/> <!-- lookup parent from repository -->
</parent>

<properties>
	<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
	<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
	<java.version>1.8</java.version>
</properties>

<dependencies>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
	</dependency>
	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-web</artifactId>
	</dependency>

	<dependency>
		<groupId>org.springframework.boot</groupId>
		<artifactId>spring-boot-starter-test</artifactId>
		<scope>test</scope>
	</dependency>
	<dependency>
		<groupId>com.sun.jna</groupId>
		<artifactId>jna</artifactId>
		<version>3.0.9</version>
	</dependency>
</dependencies>

<build>
	<plugins>
		<plugin>
			<groupId>org.springframework.boot</groupId>
			<artifactId>spring-boot-maven-plugin</artifactId>
		</plugin>
	</plugins>
</build>

</project> 新建model類Person package com.tianyalei.elasticsearch.model;spring

import org.springframework.data.annotation.Id; import org.springframework.data.elasticsearch.annotations.Document; import org.springframework.data.elasticsearch.annotations.GeoPointField;apache

import java.io.Serializable;數組

/**緩存

  • model類 */ @Document(indexName="elastic_search_project",type="person",indexStoreType="fs",shards=5,replicas=1,refreshInterval="-1") public class Person implements Serializable { @Id private int id;springboot

    private String name;app

    private String phone;

    /**

    • 地理位置經緯度
    • lat緯度,lon經度 "40.715,-74.011"
    • 若是用數組則相反[-73.983, 40.719] */ @GeoPointField private String address;

    public int getId() { return id; }

    public void setId(int id) { this.id = id; }

    public String getName() { return name; }

    public void setName(String name) { this.name = name; }

    public String getPhone() { return phone; }

    public void setPhone(String phone) { this.phone = phone; }

    public String getAddress() { return address; }

    public void setAddress(String address) { this.address = address; } } 我用address字段表示經緯度位置。注意,使用String[]和String分別來表示經緯度時是不一樣的,見註釋。 import com.tianyalei.elasticsearch.model.Person; import org.springframework.data.elasticsearch.repository.ElasticsearchRepository;

public interface PersonRepository extends ElasticsearchRepository<Person, Integer> {

} 看一下Service類,完成插入測試數據的功能,查詢的功能我放在Controller裏了,爲了方便查看,正常是應該放在Service裏 package com.tianyalei.elasticsearch.service;

import com.tianyalei.elasticsearch.model.Person; import com.tianyalei.elasticsearch.repository.PersonRepository; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.elasticsearch.core.ElasticsearchTemplate; import org.springframework.data.elasticsearch.core.query.IndexQuery; import org.springframework.stereotype.Service;

import java.util.ArrayList; import java.util.List;

@Service public class PersonService { @Autowired PersonRepository personRepository; @Autowired ElasticsearchTemplate elasticsearchTemplate;

private static final String PERSON_INDEX_NAME = "elastic_search_project";
private static final String PERSON_INDEX_TYPE = "person";

public Person add(Person person) {
    return personRepository.save(person);
}

public void bulkIndex(List<Person> personList) {
    int counter = 0;
    try {
        if (!elasticsearchTemplate.indexExists(PERSON_INDEX_NAME)) {
            elasticsearchTemplate.createIndex(PERSON_INDEX_TYPE);
        }
        List<IndexQuery> queries = new ArrayList<>();
        for (Person person : personList) {
            IndexQuery indexQuery = new IndexQuery();
            indexQuery.setId(person.getId() + "");
            indexQuery.setObject(person);
            indexQuery.setIndexName(PERSON_INDEX_NAME);
            indexQuery.setType(PERSON_INDEX_TYPE);

            //上面的那幾步也可使用IndexQueryBuilder來構建
            //IndexQuery index = new IndexQueryBuilder().withId(person.getId() + "").withObject(person).build();

            queries.add(indexQuery);
            if (counter % 500 == 0) {
                elasticsearchTemplate.bulkIndex(queries);
                queries.clear();
                System.out.println("bulkIndex counter : " + counter);
            }
            counter++;
        }
        if (queries.size() > 0) {
            elasticsearchTemplate.bulkIndex(queries);
        }
        System.out.println("bulkIndex completed.");
    } catch (Exception e) {
        System.out.println("IndexerService.bulkIndex e;" + e.getMessage());
        throw e;
    }
}

} 注意看bulkIndex方法,這個是批量插入數據用的,bulk也是ES官方推薦使用的批量插入數據的方法。這裏是每逢500的整數倍就bulk插入一次。

package com.tianyalei.elasticsearch.controller;

import com.tianyalei.elasticsearch.model.Person; import com.tianyalei.elasticsearch.service.PersonService; import org.elasticsearch.common.unit.DistanceUnit; import org.elasticsearch.index.query.GeoDistanceQueryBuilder; import org.elasticsearch.index.query.QueryBuilders; import org.elasticsearch.search.sort.GeoDistanceSortBuilder; import org.elasticsearch.search.sort.SortBuilders; import org.elasticsearch.search.sort.SortOrder; import org.springframework.beans.factory.annotation.Autowired; import org.springframework.data.domain.PageRequest; import org.springframework.data.domain.Pageable; import org.springframework.data.elasticsearch.core.ElasticsearchTemplate; import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder; import org.springframework.data.elasticsearch.core.query.SearchQuery; import org.springframework.web.bind.annotation.GetMapping; import org.springframework.web.bind.annotation.RestController;

import java.text.DecimalFormat; import java.util.ArrayList; import java.util.List; import java.util.Random;

@RestController public class PersonController { @Autowired PersonService personService; @Autowired ElasticsearchTemplate elasticsearchTemplate;

@GetMapping("/add")
public Object add() {
    double lat = 39.929986;
    double lon = 116.395645;

    List<Person> personList = new ArrayList<>(900000);
    for (int i = 100000; i < 1000000; i++) {
        double max = 0.00001;
        double min = 0.000001;
        Random random = new Random();
        double s = random.nextDouble() % (max - min + 1) + max;
        DecimalFormat df = new DecimalFormat("######0.000000");
        // System.out.println(s);
        String lons = df.format(s + lon);
        String lats = df.format(s + lat);
        Double dlon = Double.valueOf(lons);
        Double dlat = Double.valueOf(lats);

        Person person = new Person();
        person.setId(i);
        person.setName("名字" + i);
        person.setPhone("電話" + i);
        person.setAddress(dlat + "," + dlon);

        personList.add(person);
    }
    personService.bulkIndex(personList);

// SearchQuery searchQuery = new NativeSearchQueryBuilder().withQuery(QueryBuilders.queryStringQuery("spring boot OR 書籍")).build(); // List<Article> articles = elas、ticsearchTemplate.queryForList(se、archQuery, Article.class); // for (Article article : articles) { // System.out.println(article.toString()); // }

return "添加數據";
}

/**
 *
 geo_distance: 查找距離某個中心點距離在必定範圍內的位置
 geo_bounding_box: 查找某個長方形區域內的位置
 geo_distance_range: 查找距離某個中心的距離在min和max之間的位置
 geo_polygon: 查找位於多邊形內的地點。
 sort能夠用來排序
 */
@GetMapping("/query")
public Object query() {
    double lat = 39.929986;
    double lon = 116.395645;

    Long nowTime = System.currentTimeMillis();
    //查詢某經緯度100米範圍內
    GeoDistanceQueryBuilder builder = QueryBuilders.geoDistanceQuery("address").point(lat, lon)
            .distance(100, DistanceUnit.METERS);

    GeoDistanceSortBuilder sortBuilder = SortBuilders.geoDistanceSort("address")
            .point(lat, lon)
            .unit(DistanceUnit.METERS)
            .order(SortOrder.ASC);

    Pageable pageable = new PageRequest(0, 50);

    NativeSearchQueryBuilder builder1 = new NativeSearchQueryBuilder().withFilter(builder).withSort(sortBuilder).withPageable(pageable);
    SearchQuery searchQuery = builder1.build();

    //queryForList默認是分頁,走的是queryForPage,默認10個
    List<Person> personList = elasticsearchTemplate.queryForList(searchQuery, Person.class);

    System.out.println("耗時:" + (System.currentTimeMillis() - nowTime));
    return personList;
}

} 看Controller類,在add方法中,咱們插入90萬條測試數據,隨機產生不一樣的經緯度地址。 在查詢方法中,咱們構建了一個查詢100米範圍內、按照距離遠近排序,分頁每頁50條的查詢條件。若是不指明Pageable的話,ESTemplate的queryForList默認是10條,經過源碼能夠看到。 啓動項目,先執行add,等待百萬數據插入,大概幾十秒。 而後執行查詢,看一下結果。

第一次查詢花費300多ms,再次查詢後時間就大幅降低,到30ms左右,由於ES已經自動緩存到內存了。 可見,ES完成地理位置的查詢仍是很是快的。適用於查詢附近的人、範圍查詢之類的功能。


後記,在後來的使用中,Elasticsearch2.3版本時,按上面的寫法出現了geo類型沒法索引的狀況,進入es的爲String,而不是標註的geofiled。在此記錄一下解決方法,將String類型修改成GeoPoint,且是org.springframework.data.elasticsearch.core.geo.GeoPoint包下的。而後須要在建立index時,顯式調用一下mapping方法,才能正確的映射爲geofield。 以下 if (!elasticsearchTemplate.indexExists("abc")) { elasticsearchTemplate.createIndex("abc"); elasticsearchTemplate.putMapping(Person.class); }

參考:ES根據地理位置查詢 http://blog.csdn.net/bingduanlbd/article/details/52253542

code:https://github.com/xiaomin0322/springboot-all

相關文章
相關標籤/搜索