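Elasticsearch 5.5: Translating SQL Statements to the Java Client, and Related Caveats (Part 3)

Preface

  • The previous two articles covered how to set up a cluster and the basics of simple queries. If you want to read them, see:

     1. Elasticsearch 5.5 Getting-Started Essentials (Part 1)

     2. Elasticsearch 5.5 Getting-Started Essentials: the Java client (Part 2)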

 

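1. How to write query code with a SQL mindset

  • If you are used to writing SQL, writing ES queries can feel awkward at first. ES does provide queryStringQuery, which is somewhat closer to SQL, but this article sticks with ordinary builder code to express the logic of SQL-style relational queries.

         Let's start with a simple piece of code: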

@Test
	public void match() {
		SearchRequestBuilder requestBuilder = client.prepareSearch("megacorp").setTypes("employee")
				.setQuery(QueryBuilders.matchQuery("about", "rock climbing"));
		System.out.println(requestBuilder.toString());

		SearchResponse response = requestBuilder.execute().actionGet();

		System.out.println(response.status());
		if (response.status().getStatus() == 200) {
			for (SearchHit hits : response.getHits().getHits()) {
				System.out.println(hits.getSourceAsString());
			}
		}
	}

 

 ===============================================================

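  • LIKE queries. The matchQuery code above has no real equivalent in ordinary SQL, because matchQuery analyzes (tokenizes) the supplied value before matching, so we skip it and use matchPhraseQuery instead.
/**
	 * Using matchPhraseQuery: exact phrase matching
	 */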
	@Test
	public void matchPhrase() {
		SearchRequestBuilder requestBuilder = client.prepareSearch("megacorp").setTypes("employee")
				.setQuery(QueryBuilders.matchPhraseQuery("about", "rock climbing"));
		System.out.println(requestBuilder.toString());

		SearchResponse response = requestBuilder.execute().actionGet();
		System.out.println(response.status());
		if (response.status().getStatus() == 200) {
			for (SearchHit hits : response.getHits().getHits()) {
				System.out.println(hits.getSourceAsString());
			}
		}
	}

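     You can read the code above as roughly the following SQL (matchPhraseQuery matches whole analyzed terms in order, so it is not a true substring match):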

select * from megacorp_employee where about like '%rock climbing%'

 

  • Aggregation queries
@Test
	public void aggregation() {
		SearchRequestBuilder searchBuilder = client.prepareSearch("megacorp").setTypes("employee")
				.addAggregation(AggregationBuilders.terms("by_interests").field("interests")
						.subAggregation(AggregationBuilders.terms("by_age").field("age")).size(10));
		System.out.println(searchBuilder.toString());
		SearchResponse response = searchBuilder.execute().actionGet();

		if (response.status().getStatus() == 200) {
			for (SearchHit hits : response.getHits().getHits()) {
				System.out.println(hits.getSourceAsString());
			}
		}
		StringTerms terms = response.getAggregations().get("by_interests");
		for (StringTerms.Bucket bucket : terms.getBuckets()) {
			System.out.println("-interest:" + bucket.getKey() + "," + bucket.getDocCount());
			if (bucket.getAggregations() != null && bucket.getAggregations().get("by_age") != null) {
				LongTerms ageTerms = bucket.getAggregations().get("by_age");
				for (LongTerms.Bucket bucket2 : ageTerms.getBuckets()) {
					System.out.println("--------by age:" + bucket2.getKey() + "," + bucket2.getDocCount());
				}
			}
		}
	}

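This is roughly equivalent to the following SQL (note that size(10) caps the number of by_interests buckets rather than the number of result rows):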

select interests,age,count(1) from megacorp_employee
group by interests,age limit 10

 

  • Boolean queries
BoolQueryBuilder queryBuilder = QueryBuilders.boolQuery();
		if(StringUtils.isNotBlank(searchParam.getSearchWords())) {
			BoolQueryBuilder mutiShould = QueryBuilders.boolQuery();
			for(String column : searchType.getSearchColumn()) {
				mutiShould.should(QueryBuilders.termQuery(column+KEYWORD, searchParam.getSearchWords().trim()));
			}
			queryBuilder.must().add(mutiShould);
		}
		
		// Filter by department code
		if(StringUtils.isNotBlank(searchParam.getDeptNo())) {
			queryBuilder.must(QueryBuilders.termQuery("admissward"+KEYWORD, searchParam.getDeptNo().trim()));
		}
		
		/**
		 * A time range was supplied
		 */
		if(searchParam.getTimeType() > 0 && searchParam.getTimeType() < 3) {
			Date startDate = searchParam.getStartDate();
			Date endDate = searchParam.getEndDate();
			RangeQueryBuilder rangeBuilder = null;
			
			// Admission date
			if(searchParam.getTimeType() == 1) {
				if(null != startDate) {
					rangeBuilder = QueryBuilders.rangeQuery("admissdate").gte(startDate.getTime());
				}
				if(null != endDate) {
					if(null == rangeBuilder) {
						rangeBuilder = QueryBuilders.rangeQuery("admissdate").lte(endDate.getTime());
					} else {
						rangeBuilder.lte(endDate.getTime());
					}
				}
				
			// Discharge date
			} else if(searchParam.getTimeType() == 2) {
				if(null != startDate) {
					rangeBuilder = QueryBuilders.rangeQuery("disdate").gte(startDate.getTime());
				}
				if(null != endDate) {
					if(null == rangeBuilder) {
						rangeBuilder = QueryBuilders.rangeQuery("disdate").lte(endDate.getTime());
					} else {
						rangeBuilder.lte(endDate.getTime());
					}
				}
			}
			if(null != rangeBuilder) {
				queryBuilder.must().add(rangeBuilder);
			}
		}
		
		SearchRequestBuilder searchBuilder = client.prepareSearch(searchType.getIndexType().get_index())
		        .setTypes(searchType.getIndexType().get_type())
		        .setSearchType(SearchType.DFS_QUERY_THEN_FETCH)
		        .setQuery(queryBuilder) 
		        .addSort(StringUtils.isBlank(searchType.getSortColumn())?SCORE:searchType.getSortColumn()
		        		, searchType.getOrder()==null?SortOrder.DESC:searchType.getOrder())
		        .setFrom(pager.getStartRow()).setSize(pager.getPageSize()).setExplain(true);
		
		SearchResponse response = searchBuilder.execute().actionGet();
		long end = System.currentTimeMillis();
		logger.info("searchMutiField request indexType:{},searchparam:{},orderColumn:{},orderBy:{}.total hits:{},cost 【{}】 ms"
				,searchType.getIndexType().get_type(),queryBuilder.toString(),searchType.getSearchColumn(),
				searchType.getOrder(),response.getHits().totalHits,(end-start));

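The example above is a bit more complex; it is part of my production code. Seeing it should give you a good idea of how to turn SQL into code: BoolQueryBuilder.must corresponds to AND in SQL, and should corresponds to OR. The corresponding SQL is: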

select * from table_name where (column1='searchwords' or column2='searchwords' .. )
   and admissward='123456' and 
   admissdate >= '1412000212112' and admissdate <= '141976521211' limit 10
   -- My logic: when querying by admission date, use admissdate >= startdate and admissdate <= enddate
   -- When querying by discharge date, use disdate >= startdate and disdate <= enddate
   -- I have not written the two cases out separately here

 

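2. Things to watch out for when using ES

  • By default, if you put a java.util.Date into a map and index it, ES stores it in UTC format, which is rather annoying. You can of course call getTime() and store the value as a long, although that is less readable. Alternatively, as in my previous article, you can specify the format attribute of the date field when creating the index. To make index creation easier, I created an XML configuration file that fixes the type of each field at index time. I won't paste the XML parsing code, or this article would get too long.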
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE mapping SYSTEM "elastic-config.dtd">
    <!-- Attribute reference: https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-store.html -->
    <mapping  >
     	<!--  
    	<datasource id="dataSource1" ref="springDataSource">
    	</datasource>-->	  
    	
    	<datasource id="dataSource" >
    		<username>admin</username>
    		<password>admin</password>
    		<jdbcurl>jdbc:mysql://127.0.0.1:3306/message?useUnicode=true&amp;characterEncoding=UTF-8&amp;zeroDateTimeBehavior=round&amp;useCursorFetch=true&amp;verifyServerCertificate=false&amp;useSSL=false</jdbcurl>
    		<driver>com.mysql.jdbc.Driver</driver>
    	</datasource>
    	
    	<sql-mappings>
    		<sql-mapping data-source-id="dataSource">
    			<!-- Full index build, runs every Sunday at 3:00 -->
    			<full-sql> 
    				<sql>SELECT * FROM HAHA ORDER BY ID ASC</sql>
    				<expression>0 0 3 ? * SUN</expression>
    			</full-sql>
    			<!-- Daily incremental index build -->
    			<incr-sql> 
    				<sql>SELECT * FROM HAHA WHERE GMT_CREATE > DATE_ADD(NOW(),INTERVAL -2 DAY) 
    				ORDER BY ID ASC</sql>
    				<expression>0 0 2 * * ?</expression>
    			</incr-sql>
    			<search-info>
    				<index>test</index>
    				<type>test</type>
    				<columns>
    					<column index-column="idindex" 
    					        data-type="integer"
    					        sql-column="id" 
    					        index="not_analyzed" 
    					        store="no"  />
    					<column index-column="nameindex" 
    					        data-type="string"
    					        sql-column="name" 
    					        index="not_analyzed" 
    					        store="no" />
    					<column index-column="blobtindex" 
    					        data-type="byte"
    					        sql-column="blobt" 
    					        index="not_analyzed" 
    					        store="no" /> 
    					<column index-column="datesindex" 
    					        data-type="date"
    					        sql-column="ttt" 
    					        store="no" 
    					        format="yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    					        locale="CHINA" />    
    					<column index-column="tinytestindex" 
    					        data-type="boolean"
    					        sql-column="tinytest" 
    					        index="not_analyzed" 
    					        store="no" />
    					<column index-column="moneysindex" 
    					        data-type="string"
    					        sql-column="moneys" 
    					        index="not_analyzed" 
    					        store="no" />
    					<column index-column="ggggindex" 
    					        data-type="date"
    					        sql-column="gggg" 
    					        format="yyyy-MM-dd HH:mm:ss||yyyy-MM-dd||epoch_millis"
    					        store="no" />                                            
    				</columns>
    			</search-info>
    		</sql-mapping>
    	</sql-mappings>
    </mapping>

     

  • Times returned by the API are in UTC format; a little conversion code takes care of it:
    SimpleDateFormat formatter = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss.SSS'Z'");
    formatter.setTimeZone(TimeZone.getTimeZone("UTC"));
    SimpleDateFormat standard = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
    try {
    	return standard.format(formatter.parse(admiss_time));
    } catch (ParseException e) {
    	return null;
    }
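
The same conversion can also be written with java.time, which avoids sharing mutable SimpleDateFormat instances. The following is only a minimal sketch: the class name UtcTimeConverter and the method toLocal are hypothetical, and it assumes the value coming back from ES is an ISO-8601 instant ending in "Z", as in the pattern above.

```java
import java.time.Instant;
import java.time.ZoneId;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

// Hypothetical helper, not part of the original project.
class UtcTimeConverter {
    private static final DateTimeFormatter LOCAL_FORMAT =
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss");

    /**
     * Converts an ISO-8601 UTC timestamp (e.g. 2017-09-06T02:30:00.000Z)
     * to "yyyy-MM-dd HH:mm:ss" in the given zone.
     * Returns null when the input cannot be parsed, like the snippet above.
     */
    static String toLocal(String utcTimestamp, ZoneId zone) {
        try {
            Instant instant = Instant.parse(utcTimestamp);
            return LOCAL_FORMAT.format(instant.atZone(zone));
        } catch (DateTimeParseException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        // prints 2017-09-06 10:30:00
        System.out.println(toLocal("2017-09-06T02:30:00.000Z", ZoneId.of("Asia/Shanghai")));
    }
}
```

Passing the zone explicitly, instead of relying on the JVM default, keeps the result the same no matter which server the code runs on.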

     

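  •  What if you need to query by time? You don't need to do anything special: there is no need to subtract 8 hours and reformat.
    // Just take the local time you have, call getTime(), and pass the result in: admissdate >= xxxxx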
    QueryBuilders.rangeQuery("admissdate").gte(startDate.getTime());
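
The reason no offset is needed is that getTime() returns milliseconds since the epoch, which identify an instant independently of any time zone. A small sketch to illustrate this (the class EpochMillisDemo and its method are hypothetical and use only the JDK, not the ES client):

```java
import java.time.LocalDateTime;
import java.time.ZoneId;

// Hypothetical demo class: shows that epoch milliseconds do not depend on the zone.
class EpochMillisDemo {
    /** Interprets a wall-clock time in the given zone and returns epoch milliseconds. */
    static long toEpochMillis(LocalDateTime wallClock, ZoneId zone) {
        return wallClock.atZone(zone).toInstant().toEpochMilli();
    }

    public static void main(String[] args) {
        // 08:00 in Shanghai (UTC+8) and 00:00 in UTC are the same instant.
        long shanghai = toEpochMillis(LocalDateTime.of(2017, 9, 6, 8, 0), ZoneId.of("Asia/Shanghai"));
        long utc = toEpochMillis(LocalDateTime.of(2017, 9, 6, 0, 0), ZoneId.of("UTC"));
        System.out.println(shanghai == utc); // prints true
    }
}
```

So as long as the range bounds are passed as epoch milliseconds, the comparison is made on the instant itself and the 8-hour difference never comes into play.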

     

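  • For frequently updated data, try not to take the lazy route of using a UUID as the document ID. One reason is speed. Another is that using your own business ID as the document ID makes updates very convenient: saving the document again automatically updates the existing data instead of inserting a new record. In the example below, saving twice leaves only one document in the index, because the id is the same.
    Map<String,Object> map = new HashMap<String,Object>();
    map.put("id", 1);
    //map.put("test",456);
    map.put("test", 1);
    //map.put("hehe",567);
    map.put("hehe", 2);
    // Use the business id as the document id, so saving again overwrites the same document
    IndexResponse response = client.prepareIndex("emr_document2", "user_info2", map.get("id").toString())
        			.setSource(map)
                    .get();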

     

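  •  Using ES for log management. Officially there is the Kibana + Logstash + ES log management solution. If you don't want to bring in that many products, you can read the log files yourself with RandomAccessFile and write them into an ES index. Logs are well suited to a separate index per day or per week, e.g. index = log_index_20170906. The benefit is obvious: disk space is limited, and if all logs go into a single index, cleaning out old logs is more trouble. With one index per day, you simply delete the indexes that have expired.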
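
The per-day naming scheme can be sketched as follows. This is only an illustration under stated assumptions: the class LogIndexNames, the log_index_ prefix constant, and the retention parameter are hypothetical, and actually deleting the expired indexes through the ES client is deliberately left out.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for per-day log indexes such as log_index_20170906.
class LogIndexNames {
    private static final String PREFIX = "log_index_";
    // BASIC_ISO_DATE formats and parses dates as yyyyMMdd.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.BASIC_ISO_DATE;

    /** Returns the index name for the given day. */
    static String forDate(LocalDate date) {
        return PREFIX + SUFFIX.format(date);
    }

    /** Returns the names whose date suffix is more than retentionDays before today. */
    static List<String> expired(List<String> indexNames, LocalDate today, int retentionDays) {
        LocalDate cutoff = today.minusDays(retentionDays);
        List<String> result = new ArrayList<>();
        for (String name : indexNames) {
            if (!name.startsWith(PREFIX)) {
                continue; // not one of our log indexes
            }
            try {
                LocalDate day = LocalDate.parse(name.substring(PREFIX.length()), SUFFIX);
                if (day.isBefore(cutoff)) {
                    result.add(name);
                }
            } catch (DateTimeParseException e) {
                // suffix is not a date, so leave this index alone
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(forDate(LocalDate.of(2017, 9, 6))); // prints log_index_20170906
    }
}
```

A scheduled job could call expired(...) with the list of existing index names and then delete each returned index as a whole, which is much cheaper than deleting individual documents from one large index.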

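 Finally

  • Why do I use ES?

         The database supplied by our vendor has no good table-partitioning scheme: historical data is moved to table B one week after a patient is discharged, so many systems could not properly retrieve discharged patients' medical records and master patient index information. Now that search has been introduced, we provide a query service over the master index for all patients, and it works great. Medical records plus master patient index data total fewer than 5 million documents, and queries are very fast, all under 20 ms.

  • What can I do with ES going forward?
  1. Full-text search of medical records: search records by keyword (everyone knows this one).
  2. Medical record classification: once records are classified by content keywords, pulling up one patient's record can also bring up the information and medication plans of patients with the same diagnosis or symptoms, providing clinical decision support.
  3. Hospital-wide system log aggregation and monitoring. This is really necessary: we now have dozens of systems, large and small, and any of them may run into problems on any given day. If we could collect the logs and set up monitoring and alerting, life would be much easier.
  4. I can use it to brag (very important!) Haha, just kidding! In the end, I simply have a bit more free time and want to learn something, so that I don't turn into deadwood in a public-sector job. Sometimes doing technical work alone here feels a little lonely and sad.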