



How are Placement Groups used?

A placement group (PG) aggregates objects within a pool because tracking object placement and object metadata on a per-object basis is computationally expensive; a system with millions of objects cannot realistically track placement object by object.

The Ceph client calculates which placement group an object belongs to. It does this by hashing the object ID and applying an operation based on the number of PGs in the pool and the ID of the pool. See Mapping PGs to OSDs for details.
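The calculation described above can be sketched in a few lines of Python. This is only an illustration, not Ceph's implementation: `zlib.crc32` stands in for Ceph's own hash function, a plain modulo stands in for the operation Ceph applies, and the function name `pg_for_object` is hypothetical.

```python
import zlib


def pg_for_object(pool_id: int, object_name: str, pg_num: int) -> str:
    """Return an illustrative PG identifier of the form '<pool_id>.<hex index>'."""
    if pg_num < 1:
        raise ValueError("pg_num must be at least 1")
    # Assumption: CRC32 is a stand-in hash; Ceph uses a different hash function.
    hashed = zlib.crc32(object_name.encode("utf-8"))
    # Assumption: plain modulo stands in for the operation based on the PG count.
    index = hashed % pg_num
    return f"{pool_id}.{index:x}"


# The same object name in the same pool always maps to the same PG.
print(pg_for_object(3, "my-object", 128))
```

The point of the sketch is that the client needs only the object ID, the pool ID, and the PG count; no per-object lookup table is involved.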

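The contents of the objects in a placement group are stored in a set of OSDs. For instance, in a replicated pool of size two, each placement group stores objects on two OSDs. In the example used here, Placement Group #1 is stored on OSD #1 and OSD #2, and Placement Group #2 on OSD #2 and OSD #3.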

Should OSD #2 fail, another OSD will be assigned to Placement Group #1 and will be filled with copies of all the objects that Placement Group #1 holds on OSD #1. If the pool size is changed from two to three, an additional OSD will be assigned to the placement group and will receive copies of all objects in the placement group.

Placement groups do not own an OSD; they share it with other placement groups from the same pool or even from other pools. If OSD #2 fails, Placement Group #2 will also have to restore copies of its objects, using OSD #3.

When the number of placement groups increases, the new placement groups are assigned OSDs. The result of the CRUSH function also changes, and some objects from the existing placement groups are copied to the new placement groups and removed from the old ones.
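To see why raising the number of placement groups moves some objects but not others, the following sketch compares the PG index of each object before and after the change. It models only the hash-to-PG step, with `zlib.crc32` and a plain modulo as assumed stand-ins for Ceph's hashing; CRUSH and the actual data copy are not modelled, and the helper names are hypothetical.

```python
import zlib


def pg_index(object_name: str, pg_num: int) -> int:
    # Assumption: CRC32 plus modulo stands in for Ceph's object-to-PG mapping.
    return zlib.crc32(object_name.encode("utf-8")) % pg_num


def moved_objects(names, old_pg_num, new_pg_num):
    """Return the object names whose PG index differs between the two PG counts."""
    return [
        name
        for name in names
        if pg_index(name, old_pg_num) != pg_index(name, new_pg_num)
    ]


names = [f"obj-{i}" for i in range(1000)]
moved = moved_objects(names, 8, 16)
print(f"{len(moved)} of {len(names)} objects change placement group")
```

In this simplified model, doubling the PG count from 8 to 16 leaves an object either in its old PG or moves it to the new PG whose index is exactly 8 higher, which mirrors the idea that each old PG hands part of its contents to a new one.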


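Pools provide some additional functionality, including:

  • Replicas: You can set the desired number of copies of an object. A typical configuration stores an object and one additional copy (i.e., size = 2), but you can change the number of copies.
  • Placement Groups: You can set the number of placement groups for a pool. A typical configuration uses approximately 100 placement groups per OSD, which provides good balancing without using too many computing resources. When setting up multiple pools, take care to set a reasonable number of placement groups both for each pool and for the cluster as a whole.
  • CRUSH Rules: When you store data in a pool, the CRUSH ruleset mapped to the pool lets CRUSH identify a rule for the placement of the primary object and its replicas within the cluster. You can create a custom CRUSH rule for your pool.
  • Snapshots: When you create a snapshot with ceph osd pool mksnap, you effectively take a snapshot of a particular pool.
  • Set Ownership: You can set a user ID as the owner of a pool.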


When creating a new pool with:

ceph osd pool create {pool-name} pg_num

it is mandatory to choose the value of pg_num because it cannot be calculated automatically. Here are a few commonly used values:

  • Fewer than 5 OSDs: set pg_num to 128
  • Between 5 and 10 OSDs: set pg_num to 512
  • Between 10 and 50 OSDs: set pg_num to 4096
  • More than 50 OSDs: you need to understand the tradeoffs and calculate the pg_num value yourself
  • To calculate the pg_num value yourself, use the pgcalc tool
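The list above can be expressed as a small lookup, shown here as a sketch. The function name `suggested_pg_num` is hypothetical, and because the list leaves the boundary at exactly 10 OSDs ambiguous, the sketch assumes that 10 OSDs falls into the "between 5 and 10" bucket.

```python
from typing import Optional


def suggested_pg_num(osd_count: int) -> Optional[int]:
    """Return the commonly used pg_num for a cluster size, or None above 50 OSDs."""
    if osd_count < 1:
        raise ValueError("osd_count must be at least 1")
    if osd_count < 5:
        return 128
    if osd_count <= 10:  # Assumption: exactly 10 OSDs uses the lower bucket.
        return 512
    if osd_count <= 50:
        return 4096
    # Above 50 OSDs the list gives no fixed value; calculate it yourself.
    return None


for count in (3, 8, 30, 120):
    print(count, suggested_pg_num(count))
```

Returning None for larger clusters is a deliberate choice in this sketch: the list gives no number there and points to a manual calculation instead.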



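A placement group (PG) aggregates a series of objects into a group and maps that group to a series of OSDs. Tracking object placement and object metadata on a per-object basis is computationally expensive; for example, a system with millions of objects cannot realistically track placement object by object. Placement groups remove this barrier to performance and scalability. In addition, placement groups reduce the number of processes and the amount of per-object metadata that Ceph must track when storing and retrieving data.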

[Figure: Placement Groups]






Pool, PG and CRUSH Config Reference

When you create pools and set the number of placement groups for a pool, Ceph uses default values unless you explicitly override them. We recommend overriding some of the defaults. Specifically, we recommend setting a pool's replica size and overriding the default number of placement groups. You can set these values explicitly when running pool commands. You can also override the defaults by adding new values in the [global] section of your Ceph configuration file.



	# By default, Ceph makes 3 replicas of objects. If you want four copies
	# of an object by default (a primary copy and three replica copies),
	# reset the default value as shown in 'osd pool default size'.
	# If you want to allow Ceph to write fewer copies in a degraded
	# state, set 'osd pool default min size' to a number less than the
	# 'osd pool default size' value.

	osd pool default size = 4  # Write an object 4 times.
	osd pool default min size = 1 # Allow writing one copy in a degraded state.

	# Ensure you have a realistic number of placement groups. We recommend
	# approximately 100 per OSD, i.e., the total number of OSDs multiplied
	# by 100 and divided by the number of replicas (osd pool default size).
	# So for 10 OSDs and osd pool default size = 4, we'd recommend
	# approximately (100 * 10) / 4 = 250.

	osd pool default pg num = 250
	osd pool default pgp num = 250