Series in progress...
Information out: search and analyze
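Elasticsearch can store and retrieve documents and their metadata, but the real credit for its search power goes to the underlying search engine library, Lucene.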
While you can use Elasticsearch as a document store and retrieve documents and their metadata, the real power comes from being able to easily access the full suite of search capabilities built on the Apache Lucene search engine library.
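Built on Lucene, Elasticsearch provides a simple, easy-to-use REST API for managing the cluster and for indexing and searching data. It is simple enough that you can send requests straight from the command line or from the developer console in Kibana. In your own applications you can use an Elasticsearch client: clients are currently available for Java, JavaScript, Go, .NET, PHP, Perl, Python, and Ruby.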
Elasticsearch provides a simple, coherent REST API for managing your cluster and indexing and searching your data. For testing purposes, you can easily submit requests directly from the command line or through the Developer Console in Kibana. From your applications, you can use the Elasticsearch client for your language of choice: Java, JavaScript, Go, .NET, PHP, Perl, Python or Ruby.
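To make the "it is just a REST API" point concrete, here is a minimal sketch that builds a search request using only the Python standard library. The node address `http://localhost:9200` and the helper name `build_search_request` are assumptions for illustration; the request is constructed but not sent, so the snippet runs without a cluster.

```python
import json
import urllib.request

# Assumption: a local, unsecured Elasticsearch node at this address.
ES_URL = "http://localhost:9200"


def build_search_request(index, body):
    """Build (but do not send) a POST request to the index's _search endpoint."""
    return urllib.request.Request(
        url=f"{ES_URL}/{index}/_search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# match_all simply returns every document in the index.
request = build_search_request("employee", {"query": {"match_all": {}}})
print(request.get_method(), request.full_url)

# To send it against a running cluster, something like this should work:
# with urllib.request.urlopen(request) as response:
#     print(json.load(response))
```

In a real application one of the official language clients would normally replace this hand-built request.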
Searching your data
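You can use the Elasticsearch REST API for structured search, full-text search, and complex searches that combine the two. Structured search is similar to a query you would write in SQL: for example, searching the `gender` and `age` fields in an `employee` index and sorting the matches by `hire_date`. Full-text search returns results ordered by how relevant each document is to the search text, so the best matches come first. How is "best match" defined? That is where relevance scoring comes in; it is covered later in the documentation, so I won't go into it here.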
The Elasticsearch REST APIs support structured queries, full text queries, and complex queries that combine the two. Structured queries are similar to the types of queries you can construct in SQL. For example, you could search the `gender` and `age` fields in your `employee` index and sort the matches by the `hire_date` field. Full-text queries find all documents that match the query string and return them sorted by relevance: how good a match they are for your search terms.
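As a rough sketch of the three kinds of query described above, the request bodies might look like this in Query DSL, written here as Python dictionaries. The `gender`, `age`, and `hire_date` fields come from the example in the text; the field values and the `about` text field are assumptions made up for illustration.

```python
import json

# Structured query: exact filters plus a sort, similar to WHERE / ORDER BY in SQL.
structured_query = {
    "query": {
        "bool": {
            "filter": [
                {"term": {"gender": "F"}},                   # assumed keyword value
                {"range": {"age": {"gte": 30, "lte": 40}}},  # assumed age range
            ]
        }
    },
    "sort": [{"hire_date": {"order": "desc"}}],
}

# Full-text query: matching documents are scored and returned by relevance.
# "about" is a hypothetical analyzed text field.
full_text_query = {"query": {"match": {"about": "distributed search"}}}

# Complex query: the full-text part contributes to the score,
# while the structured filters only narrow the result set.
combined_query = {
    "query": {
        "bool": {
            "must": [full_text_query["query"]],
            "filter": structured_query["query"]["bool"]["filter"],
        }
    }
}

print(json.dumps(combined_query, indent=2))
```

Clauses under `filter` do not affect scoring, which is why the structured conditions go there and the text match goes under `must`.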
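Besides single-term queries, Elasticsearch supports phrase queries, similarity queries, and prefix queries, and it can provide autocomplete suggestions. Powerful or what?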
In addition to searching for individual terms, you can perform phrase searches, similarity searches, and prefix searches, and get autocomplete suggestions.
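For a feel of what these search types look like on the wire, here is a small sketch of request bodies as Python dictionaries. The field names `title` and `title_suggest` are hypothetical, the `fuzzy` query is only one possible reading of "similarity search", and the completion suggester assumes `title_suggest` was mapped as a `completion` field.

```python
import json

# Phrase search: the terms must appear together, in this order.
phrase_query = {"query": {"match_phrase": {"title": "search engine"}}}

# Prefix search: matches terms that start with the given characters.
prefix_query = {"query": {"prefix": {"title": {"value": "elast"}}}}

# Fuzzy search: tolerates small spelling differences (edit distance).
fuzzy_query = {
    "query": {"fuzzy": {"title": {"value": "elastik", "fuzziness": "AUTO"}}}
}

# Autocomplete: a completion suggester on a (hypothetical) completion field.
autocomplete_request = {
    "suggest": {
        "title_suggestions": {
            "prefix": "elas",
            "completion": {"field": "title_suggest"},
        }
    }
}

for name, body in [
    ("phrase", phrase_query),
    ("prefix", prefix_query),
    ("fuzzy", fuzzy_query),
    ("autocomplete", autocomplete_request),
]:
    print(name, json.dumps(body))
```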
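Need to search geospatial data or other numeric data? As described in the previous post, Elasticsearch stores these data types in optimized, type-specific data structures rather than as plain text, which is part of why its searches are fast.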
Have geospatial or other numerical data that you want to search? Elasticsearch indexes non-textual data in optimized data structures that support high-performance geo and numerical queries.
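A geo and numeric search could be sketched like this. The `location` field (assumed to be mapped as `geo_point`), the `price` field, and all values are assumptions for illustration.

```python
import json

geo_and_numeric_query = {
    "query": {
        "bool": {
            "filter": [
                # Documents whose "location" lies within 5 km of the given point.
                {
                    "geo_distance": {
                        "distance": "5km",
                        "location": {"lat": 40.71, "lon": -74.0},
                    }
                },
                # Documents whose "price" is at least 10 and below 50.
                {"range": {"price": {"gte": 10, "lt": 50}}},
            ]
        }
    }
}

print(json.dumps(geo_and_numeric_query, indent=2))
```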
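You can search your data with Elasticsearch's powerful JSON-style query language, or use SQL-style queries to search and aggregate it. The JDBC and ODBC drivers that Elasticsearch provides make it easy for third-party applications to interact with it through SQL.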
You can access all of these search capabilities using Elasticsearch’s comprehensive JSON-style query language (Query DSL). You can also construct SQL-style queries to search and aggregate data natively inside Elasticsearch, and JDBC and ODBC drivers enable a broad range of third-party applications to interact with Elasticsearch via SQL.
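As a small illustration of the SQL route, the sketch below prepares a request for the `_sql` endpoint. The helper name `build_sql_request` is hypothetical, and the statement reuses the `employee`, `gender`, and `age` names from the earlier example; nothing is sent, so no cluster is needed to run it.

```python
import json


def build_sql_request(sql, output_format="txt"):
    """Return the (path, JSON body) pair for an Elasticsearch SQL search."""
    path = f"/_sql?format={output_format}"
    body = json.dumps({"query": sql})
    return path, body


# Average age per gender, expressed as SQL instead of Query DSL.
path, body = build_sql_request(
    "SELECT gender, AVG(age) AS avg_age FROM employee GROUP BY gender"
)
print("POST", path)
print(body)
```

The `GROUP BY` here plays the role that a `terms` aggregation with an `avg` sub-aggregation would play in Query DSL.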
Analyzing your data
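Elasticsearch aggregations let us build fairly complex summary queries that surface key metrics, patterns, and trends in the data, rather than just finding a "needle in a haystack". Aggregations can also answer questions like these: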
Elasticsearch aggregations enable you to build complex summaries of your data and gain insight into key metrics, patterns, and trends. Instead of just finding the proverbial "needle in a haystack", aggregations enable you to answer questions like:
Just how many needles are in the haystack?
> How many needles are in the haystack?
How long are the needles on average?
> What is the average length of the needles?
What is the median needle length for each manufacturer?
> What is the median length of the needles, broken down by manufacturer?
How many needles were added to the haystack in each of the last six months?
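The questions above map fairly directly onto aggregation types. Here is one possible sketch of a single request body that asks all four; the field names `length`, `manufacturer`, and `added_at` are hypothetical, and `calendar_interval` assumes Elasticsearch 7.2 or later.

```python
import json

needle_aggregations = {
    "size": 0,  # return only aggregation results, no individual documents
    "aggs": {
        # How many needles? (counts documents that have a "length" value)
        "needle_count": {"value_count": {"field": "length"}},
        # Average length of the needles.
        "average_length": {"avg": {"field": "length"}},
        # Median length, broken down by manufacturer.
        "by_manufacturer": {
            "terms": {"field": "manufacturer"},
            "aggs": {
                "median_length": {
                    "percentiles": {"field": "length", "percents": [50]}
                }
            },
        },
        # Needles added per month, restricted to the last six months.
        "added_last_six_months": {
            "filter": {"range": {"added_at": {"gte": "now-6M/M"}}},
            "aggs": {
                "per_month": {
                    "date_histogram": {
                        "field": "added_at",
                        "calendar_interval": "month",
                    }
                }
            },
        },
    },
}

print(json.dumps(needle_aggregations, indent=2))
```

The median is approximated here with the 50th percentile, since the `percentiles` aggregation computes approximate values on large data sets.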
Aggregations can also answer trickier questions:
You can also use aggregations to answer more subtle questions, such as:
Which needle manufacturers are the most popular?
> What are your most popular needle manufacturers?
Are there any unusual or anomalous batches of needles?
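Aggregations use the same data structures as search, so they are just as fast. That lets us analyze and visualize data in near real time, with reports and dashboards showing the latest information.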
Because aggregations leverage the same data-structures used for search, they are also very fast. This enables you to analyze and visualize your data in real time. Your reports and dashboards update as your data changes so you can take action based on the latest information.
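What's more, aggregations run alongside search. You can search and filter documents and analyze the data in the same request. Because the search and the statistics share one execution context, we can count not only all size 70 needles but also the size 70 needles that meet specific criteria, such as non-stick embroidery needles.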
What’s more, aggregations operate alongside search requests. You can search documents, filter results, and perform analytics at the same time, on the same data, in a single request. And because aggregations are calculated in the context of a particular search, you’re not just displaying a count of all size 70 needles, you’re displaying a count of the size 70 needles that match your users' search criteria—for example, all size 70 non-stick embroidery needles.
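The "same context" idea can be sketched as one helper that keeps the aggregation fixed and only changes the query around it. The helper name `count_by_manufacturer` and the fields `needle_size`, `coating`, `needle_type`, and `manufacturer` are hypothetical.

```python
import json


def count_by_manufacturer(needle_size, extra_filters=()):
    """Build one request that filters needles and aggregates the matches."""
    filters = [{"term": {"needle_size": needle_size}}]
    filters.extend(extra_filters)
    return {
        "size": 0,  # only the aggregation results are needed
        "query": {"bool": {"filter": filters}},
        # The aggregation only sees documents that match the query above.
        "aggs": {"by_manufacturer": {"terms": {"field": "manufacturer"}}},
    }


# All size 70 needles.
all_size_70 = count_by_manufacturer(70)

# Only the size 70 non-stick embroidery needles.
non_stick_embroidery = count_by_manufacturer(
    70,
    [
        {"term": {"coating": "non-stick"}},
        {"term": {"needle_type": "embroidery"}},
    ],
)

print(json.dumps(non_stick_embroidery, indent=2))
```

Both requests carry the identical `aggs` section; the counts differ only because the surrounding query selects a different set of documents.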
But wait, there's more.
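Want to analyze time-series data automatically? You can use machine learning features to establish a baseline of normal behavior in your data and identify anomalies. With machine learning, we can detect: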
Want to automate the analysis of your time-series data? You can use machine learning features to create accurate baselines of normal behavior in your data and identify anomalous patterns. With machine learning, you can detect:
Anomalous values, counts, and frequencies over time
> Anomalies related to temporal deviations in values, counts, or frequencies
Statistically rare data
> Statistical rarity
Unusual members of a population
Even better, we can do all this without specifying algorithms, training models, or setting any other data science configuration.
Powerful or what? Advanced or what?
And the best part? You can do this without having to specify algorithms, models, or other data science-related configurations.
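To show how little configuration that involves, here is a rough sketch of an anomaly detection job body, of the kind sent to the `_ml/anomaly_detectors/<job_id>` endpoint. It names what to watch, not which algorithm to use. The job description, the fields `response_time`, `status_code`, `client_ip`, and `@timestamp`, and the 15 minute bucket span are all assumptions, and in practice these detectors might well live in separate jobs.

```python
import json

anomaly_job = {
    "description": "Hypothetical job watching web request data",
    "analysis_config": {
        "bucket_span": "15m",  # time window over which data is summarized
        "detectors": [
            # Temporal deviations in a value: unusual mean response time.
            {"function": "mean", "field_name": "response_time"},
            # Statistical rarity: status codes that rarely occur.
            {"function": "rare", "by_field_name": "status_code"},
            # Population analysis: clients whose request count is unusual
            # compared with the other clients.
            {"function": "count", "over_field_name": "client_ip"},
        ],
    },
    "data_description": {"time_field": "@timestamp"},
}

print(json.dumps(anomaly_job, indent=2))
```

Each detector corresponds to one of the three kinds of anomaly listed above.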