Important Index Settings

Index-level settings can be set per index. Settings may be:

1 Static index settings

They can only be set at index creation time or on a closed index.

index.number_of_shards

The number of primary shards that an index should have. Defaults to 5 (Elasticsearch 7.0 and later default to 1). This setting can only be set at index creation time; it cannot be changed on a closed index. Note: the number of shards is limited to 1024 per index.
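As a minimal sketch of how a static setting is supplied, the body of an index-creation request can be built as below. The helper name create_index_body is hypothetical, and the range check simply mirrors the 1024-shard limit quoted above; the resulting JSON would be sent with a PUT to the index URL.

```python
import json

MAX_SHARDS_PER_INDEX = 1024  # limit quoted in the text above


def create_index_body(number_of_shards, number_of_replicas=1):
    """Build the JSON body for an index-creation request (hypothetical helper)."""
    if not 1 <= number_of_shards <= MAX_SHARDS_PER_INDEX:
        raise ValueError("number_of_shards must be between 1 and 1024")
    settings = {
        "number_of_shards": number_of_shards,      # static: fixed at creation time
        "number_of_replicas": number_of_replicas,  # dynamic: can be changed later
    }
    return json.dumps({"settings": {"index": settings}})


print(create_index_body(3))
```

Because number_of_shards cannot be changed afterwards, it is worth validating the value before the request is sent.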

2 Dynamic index settings

They can be changed on a live index using the update-index-settings API.

index.number_of_replicas

The number of replicas each primary shard has. Defaults to 1.

index.refresh_interval

How often to perform a refresh operation, which makes recent changes to the index visible to search. Defaults to 1s. Can be set to -1 to disable refresh.

index.blocks.read_only

Set to true to make the index and index metadata read only, false to allow writes and metadata changes.

index.blocks.read

Set to true to disable read operations against the index.

index.blocks.write

Set to true to disable data write operations against the index. Unlike read_only, this setting does not affect metadata. For instance, you can close an index with a write block, but not an index with a read_only block.

index.merge.scheduler.max_thread_count

The maximum number of threads on a single shard that may be merging at once. Defaults to Math.max(1, Math.min(4, Runtime.getRuntime().availableProcessors() / 2)) which works well for a good solid-state-disk (SSD). If your index is on spinning platter drives instead, decrease this to 1.
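To make the default concrete, the formula above can be evaluated for a few core counts. This is only a sketch in Python; the function name is hypothetical, and // stands in for Java's integer division.

```python
def default_merge_thread_count(available_processors):
    """Mirror Math.max(1, Math.min(4, availableProcessors / 2))."""
    # Java's int division truncates; // matches it for non-negative values.
    return max(1, min(4, available_processors // 2))


for cpus in (1, 2, 4, 8, 32):
    print(cpus, "cores ->", default_merge_thread_count(cpus), "merge thread(s)")
```

So the default never exceeds 4 threads per shard, and machines with fewer than 4 cores get a single merge thread.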

index.translog.durability

Whether or not to fsync and commit the translog after every index, delete, update, or bulk request. This setting accepts the following parameters:

request: (default) fsync and commit after every request. In the event of hardware failure, all acknowledged writes will already have been committed to disk.
async: fsync and commit in the background every sync_interval. In the event of hardware failure, all acknowledged writes since the last automatic commit will be discarded.

Tuning for indexing speed

1 Use bulk requests

Bulk requests will yield much better performance than single-document index requests.
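A bulk request body is newline-delimited JSON: one action line followed by one document line per document, with a trailing newline. The sketch below builds such a body; the helper name bulk_body is hypothetical, and it assumes the target index is given in the request URL, so the action lines can stay empty. Leaving out _id lets Elasticsearch auto-generate ids.

```python
import json


def bulk_body(docs, ids=None):
    """Build an NDJSON body for the _bulk endpoint (hypothetical helper)."""
    lines = []
    for i, doc in enumerate(docs):
        action = {"index": {}}              # target index assumed to be in the URL
        if ids is not None:
            action["index"]["_id"] = ids[i]  # explicit id; omit to auto-generate
        lines.append(json.dumps(action))
        lines.append(json.dumps(doc))
    # Every line, including the last one, must end with a newline.
    return "".join(line + "\n" for line in lines)


print(bulk_body([{"title": "a"}, {"title": "b"}]), end="")
```

The body is typically sent with the Content-Type header application/x-ndjson.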

2 Use multiple workers/threads to send data to Elasticsearch

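Use multiple threads, but keep the concurrency low enough that Elasticsearch can keep up; otherwise it starts rejecting requests with errors.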

Make sure to watch for TOO_MANY_REQUESTS (429) response codes (EsRejectedExecutionException with the Java client), which is the way that Elasticsearch tells you that it cannot keep up with the current indexing rate. When it happens, you should pause indexing a bit before trying again, ideally with randomized exponential backoff.
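One way to implement the retry advice above is "full jitter" exponential backoff: the upper bound of the wait doubles on each attempt and the actual wait is drawn at random below it. This is a sketch under assumptions: send is a hypothetical callable that performs the request and returns the HTTP status code, and the base and cap values are arbitrary.

```python
import random
import time


def backoff_delay(attempt, base=0.5, cap=30.0):
    """Random delay in seconds for a zero-based retry attempt."""
    ceiling = min(cap, base * (2 ** attempt))  # 0.5, 1, 2, 4, ... capped at 30
    return random.uniform(0, ceiling)


def send_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() until it returns something other than 429 or retries run out."""
    status = None
    for attempt in range(max_retries + 1):
        status = send()
        if status != 429:
            return status
        if attempt < max_retries:
            sleep(backoff_delay(attempt))  # pause before the next attempt
    return status


# Demo with a fake sender: two rejections, then success (no real sleeping).
responses = iter([429, 429, 200])
print(send_with_retry(lambda: next(responses), sleep=lambda seconds: None))
```

Randomizing the delay keeps many workers from retrying at the same moment after a rejection.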

3 Increase the refresh interval

The default index.refresh_interval is 1s, which forces Elasticsearch to create a new segment every second. Increasing this value (to, say, 30s) allows larger segments to be flushed and decreases future merge pressure.

4 Disable refresh and replicas for initial loads

If you need to load a large amount of data at once, you should disable refresh by setting index.refresh_interval to -1 and set index.number_of_replicas to 0. This will temporarily put your index at risk since the loss of any shard will cause data loss, but at the same time indexing will be faster since documents will be indexed only once. Once the initial loading is finished, you can set index.refresh_interval and index.number_of_replicas back to their original values.
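The before and after of an initial load can be sketched as two settings bodies, each sent with a PUT to the index's _settings endpoint. The helper names are hypothetical, and the restore values shown are the documented defaults (1s and 1); substitute whatever the index used before the load.

```python
import json


def bulk_load_settings():
    """Settings to apply before a large initial load."""
    return {"index": {"refresh_interval": "-1", "number_of_replicas": 0}}


def restore_settings(refresh_interval="1s", number_of_replicas=1):
    """Settings to apply once the load has finished."""
    return {
        "index": {
            "refresh_interval": refresh_interval,
            "number_of_replicas": number_of_replicas,
        }
    }


print(json.dumps(bulk_load_settings()))
print(json.dumps(restore_settings()))
```

Restoring number_of_replicas afterwards makes Elasticsearch copy the finished shards to the replicas, instead of indexing every document a second time during the load.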

5 Disable swapping

You should make sure that the operating system is not swapping out the Java process by disabling swapping.

# swapoff -a

6 Give memory to the filesystem cache

The filesystem cache will be used in order to buffer I/O operations. You should make sure to give at least half the memory of the machine running Elasticsearch to the filesystem cache.

7 Use auto-generated ids

Use auto-generated ids whenever possible; this saves the cost of checking whether the id already exists.

When indexing a document that has an explicit id, Elasticsearch needs to check whether a document with the same id already exists within the same shard, which is a costly operation and gets even more costly as the index grows. By using auto-generated ids, Elasticsearch can skip this check, which makes indexing faster.

8 Use faster hardware

Use faster hardware, such as more memory for the filesystem cache or SSDs.

If indexing is I/O bound, you should investigate giving more memory to the filesystem cache (see above) or buying faster drives. In particular SSD drives are known to perform better than spinning disks.

9 Indexing buffer size

Increase indices.memory.index_buffer_size; a shard typically needs at most 512 MB.

If your node is doing only heavy indexing, be sure indices.memory.index_buffer_size is large enough to give at most 512 MB indexing buffer per shard doing heavy indexing (beyond that indexing performance does not typically improve).

indices.memory.index_buffer_size

Accepts either a percentage or a byte size value. It defaults to 10%, meaning that 10% of the total heap allocated to a node will be used as the indexing buffer size shared across all shards.
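A rough worked example of the arithmetic, assuming an 8 GB heap (8192 MB) and 4 shards doing heavy indexing. Both figures are illustrative and the function names are hypothetical.

```python
def index_buffer_mb(heap_mb, percent=10.0):
    """Total indexing buffer (MB) shared by all shards on the node."""
    return heap_mb * percent / 100.0


def percent_for_shards(heap_mb, heavy_shards, per_shard_mb=512):
    """Heap percentage needed to give each heavy shard per_shard_mb of buffer."""
    return 100.0 * heavy_shards * per_shard_mb / heap_mb


heap_mb = 8192  # assumed heap size
print(index_buffer_mb(heap_mb))        # buffer at the default 10%
print(percent_for_shards(heap_mb, 4))  # percentage for 4 x 512 MB
```

Under these assumptions the default 10% yields roughly 819 MB in total, about 205 MB per heavy shard, while giving each of the 4 shards the full 512 MB would need 25% of the heap.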

Changing settings

1 Dynamic index settings

$ curl -XPUT -H 'Content-Type: application/json' 'http://localhost:9200/testdoc/_settings' -d '{ "index": { "refresh_interval":"-1", "number_of_replicas":0, "translog.durability":"async" } }'

These settings can be changed repeatedly; setting a value to null restores its default.
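As a small sketch of such a reset, the body below sets the chosen settings to JSON null; it would be sent to the same _settings endpoint as the curl command above. The helper name reset_to_default is hypothetical, and the setting names are given relative to the "index" object.

```python
import json


def reset_to_default(*setting_names):
    """Build a settings body that restores the named index settings to defaults."""
    # Python's None is serialized as JSON null.
    return json.dumps({"index": {name: None for name in setting_names}})


print(reset_to_default("refresh_interval", "number_of_replicas"))
# prints {"index": {"refresh_interval": null, "number_of_replicas": null}}
```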

2 Cluster settings

$ vi elasticsearch.yml
indices.memory.index_buffer_size: 40%
thread_pool.write.queue_size: 1024

After editing, copy the file to all nodes and restart them.

Note that the following settings are deprecated:

The bulk thread pool has been renamed to the write thread pool. This change was made to reflect the fact that this thread pool is used to execute all write operations: single-document index/delete/update requests, as well as bulk requests.

thread_pool.index.type
thread_pool.index.size
thread_pool.index.queue_size
thread_pool.bulk.type
thread_pool.bulk.size
thread_pool.bulk.queue_size

In addition, the settings above cannot be changed through the API (that is, http://localhost:9200/_cluster/settings).

The prefix on all thread pool settings has been changed from threadpool to thread_pool.
Thread pool settings are now node-level settings. As such, it is not possible to update thread pool settings via the cluster settings API.

References:
https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-indexing-speed.html
https://www.elastic.co/guide/en/logstash/current/performance-troubleshooting.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/tune-for-disk-usage.html
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-threadpool.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules-translog.html
https://www.elastic.co/guide/en/elasticsearch/reference/master/index-modules-merge.html

[Original] Big Data Basics: ElasticSearch (5) Important Settings and Tuning