Introduction to Pools, PGs and OSDs in Ceph (tmp)

 

 


How are Placement Groups used?

A placement group (PG) aggregates objects within a pool because tracking object placement and object metadata on a per-object basis is computationally expensive; that is, a system with millions of objects cannot realistically track placement on a per-object basis.

The Ceph client calculates which placement group an object should be in. It does this by hashing the object ID and applying an operation based on the number of PGs in the defined pool and the ID of the pool. See Mapping PGs to OSDs for details.
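The calculation can be illustrated with a minimal sketch. It is not Ceph's actual algorithm: Ceph uses its own object-name hash and a "stable mod" operation, whereas this sketch substitutes CRC32 and a plain modulo purely for illustration. The function name `pg_for_object` is hypothetical.

```python
import zlib


def pg_for_object(object_id: str, pool_id: int, pg_num: int) -> str:
    """Return an illustrative PG identifier of the form '<pool_id>.<pg index in hex>'."""
    # Assumption: CRC32 stands in for Ceph's real object-name hash.
    object_hash = zlib.crc32(object_id.encode("utf-8"))
    # Assumption: a plain modulo stands in for Ceph's "stable mod" operation.
    pg_index = object_hash % pg_num
    # The pool ID is combined with the PG index to identify the placement group.
    return f"{pool_id}.{pg_index:x}"


if __name__ == "__main__":
    for name in ("alpha", "beta", "gamma"):
        print(name, "->", pg_for_object(name, pool_id=1, pg_num=128))
```

The point of the sketch is that the client needs only the object ID, the pool ID and the pool's PG count; no per-object lookup table is involved.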

The contents of the objects within a placement group are stored in a set of OSDs. For instance, in a replicated pool of size two, each placement group will store objects on two OSDs.

Should OSD #2 fail, another will be assigned to Placement Group #1 and will be filled with copies of all objects in OSD #1. If the pool size is changed from two to three, an additional OSD will be assigned to the placement group and will receive copies of all objects in the placement group.

Placement groups do not own the OSD; they share it with other placement groups from the same pool or even other pools. If OSD #2 fails, Placement Group #2 will also have to restore copies of objects, using OSD #3.
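The failure scenario above can be sketched as a toy model. It assumes Placement Group #1 lives on OSDs 1 and 2 and Placement Group #2 on OSDs 2 and 3, with a spare OSD 4 available. The function `reassign_after_failure` and its replacement policy (lowest-numbered surviving OSD) are hypothetical; in a real cluster CRUSH chooses the replacement.

```python
from typing import Dict, List


def reassign_after_failure(
    pg_to_osds: Dict[str, List[int]], all_osds: List[int], failed_osd: int
) -> Dict[str, List[int]]:
    """Replace failed_osd in every PG that used it with a surviving OSD."""
    survivors = [osd for osd in all_osds if osd != failed_osd]
    result = {}
    for pg, osds in pg_to_osds.items():
        kept = [osd for osd in osds if osd != failed_osd]
        # Assumption: pick the lowest-numbered surviving OSD not already in the set.
        # Real Ceph lets CRUSH choose the replacement.
        candidates = [osd for osd in survivors if osd not in kept]
        while len(kept) < len(osds) and candidates:
            kept.append(candidates.pop(0))
        result[pg] = kept
    return result


if __name__ == "__main__":
    before = {"PG #1": [1, 2], "PG #2": [2, 3]}
    after = reassign_after_failure(before, all_osds=[1, 2, 3, 4], failed_osd=2)
    print(after)
```

Because both placement groups shared OSD #2, a single OSD failure forces both of them to restore a copy, which is the sharing effect the paragraph describes.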

When the number of placement groups increases, the new placement groups will be assigned OSDs. The result of the CRUSH function will also change, and some objects from the former placement groups will be copied over to the new placement groups and removed from the old ones.
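To see why increasing the PG count moves data, here is a small sketch under simplifying assumptions: CRC32 plus a plain modulo stands in for Ceph's real hashing, and the helper names `pg_index` and `moved_objects` are hypothetical. It counts how many objects land in a different placement group when the count goes from 4 to 8.

```python
import zlib


def pg_index(object_id: str, pg_num: int) -> int:
    # Assumption: CRC32 and plain modulo stand in for Ceph's real mapping.
    return zlib.crc32(object_id.encode("utf-8")) % pg_num


def moved_objects(object_ids, old_pg_num, new_pg_num):
    """Return the objects whose placement group changes with the new PG count."""
    return [
        o for o in object_ids
        if pg_index(o, old_pg_num) != pg_index(o, new_pg_num)
    ]


if __name__ == "__main__":
    objects = [f"obj-{i}" for i in range(1000)]
    moved = moved_objects(objects, old_pg_num=4, new_pg_num=8)
    print(f"{len(moved)} of {len(objects)} objects changed placement group")
```

In this simplified model, doubling the count splits each old placement group in two: every object either stays where it was or moves to exactly one new group, so only part of the data has to be copied.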

 

Pools provide some additional functionality, including:

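  • Replication: You can set the desired number of replicas of an object. A typical configuration stores an object and one additional copy (i.e., size = 2), but you can change the number of replicas.
  • Placement groups: You can set the number of placement groups for a pool. A typical configuration uses approximately 100 placement groups per OSD, which gives good balancing without consuming too many computing resources. When setting up multiple pools, take care to set a reasonable number of placement groups both for each pool and for the cluster as a whole.
  • CRUSH rules: When you store data in a pool, the CRUSH ruleset mapped to the pool lets CRUSH identify a rule for placing the primary object and replicating its copies within the cluster. You can create a custom CRUSH rule for a pool.
  • Snapshots: When you create a snapshot with ceph osd pool mksnap, you effectively take a snapshot of that particular pool.
  • Set ownership: You can set a user ID as the owner of a pool.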

 

When creating a new pool with:

ceph osd pool create {pool-name} pg_num

it is mandatory to choose the value of pg_num because it cannot be calculated automatically. Here are a few commonly used values:

  • Fewer than 5 OSDs: set pg_num to 128
  • Between 5 and 10 OSDs: set pg_num to 512
  • Between 10 and 50 OSDs: set pg_num to 4096
  • With more than 50 OSDs, you need to understand the tradeoffs and calculate the pg_num value yourself
  • To calculate the pg_num value yourself, use the pgcalc tool
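The list above can be encoded as a small lookup, shown here as a sketch. The function name `suggested_pg_num` is hypothetical, and the handling of the overlapping boundaries is an assumption: exactly 10 OSDs is placed in the 5 to 10 bracket and exactly 50 in the 10 to 50 bracket, since the text does not say which bracket they belong to.

```python
from typing import Optional


def suggested_pg_num(osd_count: int) -> Optional[int]:
    """Return the commonly used pg_num for a cluster size, or None above 50 OSDs."""
    if osd_count < 1:
        raise ValueError("osd_count must be at least 1")
    if osd_count < 5:
        return 128
    # Assumption: exactly 10 OSDs falls in the "between 5 and 10" bracket.
    if osd_count <= 10:
        return 512
    # Assumption: exactly 50 OSDs falls in the "between 10 and 50" bracket.
    if osd_count <= 50:
        return 4096
    # Above 50 OSDs the text says to calculate the value yourself (e.g. with pgcalc).
    return None


if __name__ == "__main__":
    for count in (3, 8, 30, 120):
        print(count, "OSDs ->", suggested_pg_num(count))
```

Returning None for larger clusters is deliberate: the text gives no fixed value there and points to manual calculation instead.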

 

 

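A placement group (PG) aggregates a series of objects into a group and maps that group to a series of OSDs. Tracking object placement and object metadata on a per-object basis is computationally expensive; for example, a system with millions of objects cannot realistically track placement on a per-object basis. Placement groups remove this barrier to performance and scalability. In addition, placement groups reduce the number of processes and the amount of per-object metadata that Ceph must track when storing and retrieving data.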

Placement Groups

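Increasing the number of placement groups reduces the variance in per-OSD load across your cluster. We recommend approximately 50-100 placement groups per OSD to balance memory and CPU requirements against per-OSD load. For a single pool of objects, you can use the following formula: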

Total PGs = (number of OSDs * 100) / number of replicas
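The rule of thumb used in this article (OSD count times roughly 100, divided by the replica count) can be worked through in a few lines. This is a sketch: the function names are hypothetical, and the rounding helper reflects a common practice of choosing a power of two, which this text itself does not state.

```python
def total_pgs(osd_count: int, replicas: int, pgs_per_osd: int = 100) -> int:
    """Approximate total PG count for a single pool."""
    if osd_count < 1 or replicas < 1:
        raise ValueError("osd_count and replicas must be positive")
    return (osd_count * pgs_per_osd) // replicas


def next_power_of_two(n: int) -> int:
    """Smallest power of two that is >= n (optional rounding step, an assumption)."""
    power = 1
    while power < n:
        power *= 2
    return power


if __name__ == "__main__":
    raw = total_pgs(osd_count=10, replicas=4)
    print(raw)                     # 250, matching (100 * 10) / 4
    print(next_power_of_two(raw))  # 256
```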

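When using multiple data pools for storing objects, you need to balance the number of placement groups per pool against the number of placement groups per OSD, so that you arrive at a reasonable total number of placement groups: one that gives reasonably low variance per OSD without taxing system resources or making the peering process too slow.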

 

 

Pool, PG and CRUSH Config Reference

When you create pools and set the number of placement groups for a pool, Ceph uses default values for anything you do not explicitly override. We recommend overriding some of the defaults. Specifically, we recommend setting a pool’s replica size and overriding the default number of placement groups. You can set these values explicitly when running pool commands. You can also override the defaults by adding new ones in the [global] section of your Ceph configuration file.

 

[global]

	# By default, Ceph makes 3 replicas of objects. If you want four copies
	# of an object by default (a primary copy and three replica copies),
	# reset the default value as shown in 'osd pool default size'.
	# If you want to allow Ceph to write a lesser number of copies in a degraded
	# state, set 'osd pool default min size' to a number less than the
	# 'osd pool default size' value.

	osd pool default size = 4  # Write an object 4 times.
	osd pool default min size = 1 # Allow writing one copy in a degraded state.

	# Ensure you have a realistic number of placement groups. We recommend
	# approximately 100 per OSD. E.g., total number of OSDs multiplied by 100 
	# divided by the number of replicas (i.e., osd pool default size). So for
	# 10 OSDs and osd pool default size = 4, we'd recommend approximately
	# (100 * 10) / 4 = 250.

	osd pool default pg num = 250
	osd pool default pgp num = 250
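As a sanity check on settings like these, the following sketch parses a trimmed copy of the configuration with Python's standard configparser and verifies that the values are consistent with each other. It is only an illustration: Ceph does not read its configuration this way, the function `check_pool_defaults` is hypothetical, and the figure of 10 OSDs is taken from the comment in the sample.

```python
import configparser

# Trimmed, unindented copy of the sample settings for illustration.
SAMPLE = """\
[global]
osd pool default size = 4  # Write an object 4 times.
osd pool default min size = 1
osd pool default pg num = 250
osd pool default pgp num = 250
"""


def check_pool_defaults(config_text: str, osd_count: int) -> dict:
    """Return a few consistency checks on the pool defaults in [global]."""
    parser = configparser.ConfigParser(inline_comment_prefixes=("#",))
    parser.read_string(config_text)
    section = parser["global"]
    size = section.getint("osd pool default size")
    min_size = section.getint("osd pool default min size")
    pg_num = section.getint("osd pool default pg num")
    pgp_num = section.getint("osd pool default pgp num")
    return {
        "min_size_not_above_size": min_size <= size,
        "pg_num_equals_pgp_num": pg_num == pgp_num,
        # Rule of thumb from the sample's comment: (100 * OSDs) / replicas.
        "pg_num_matches_rule_of_thumb": pg_num == (100 * osd_count) // size,
    }


if __name__ == "__main__":
    print(check_pool_defaults(SAMPLE, osd_count=10))
```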