A placement group (PG) aggregates objects within a pool because tracking object placement and object metadata on a per-object basis is computationally expensive: a system with millions of objects cannot realistically track placement on a per-object basis.
The Ceph client will calculate which placement group an object should be in. It does this by hashing the object ID and applying an operation based on the number of PGs in the defined pool and the ID of the pool. See Mapping PGs to OSDs for details.
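A quick way to see this mapping on a running cluster is the ceph osd map command, which reports the placement group an object hashes to and the up/acting OSD set for that PG. The pool and object names below are placeholders:

ceph osd map mypool myobject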
The object's contents within a placement group are stored in a set of OSDs. For instance, in a replicated pool of size two, each placement group will store objects on two OSDs.
Should OSD #2 fail, another OSD will be assigned to Placement Group #1 and will be filled with copies of all objects in OSD #1. If the pool size is changed from two to three, an additional OSD will be assigned to the placement group and will receive copies of all objects in the placement group.
Placement groups do not own the OSD; they share it with other placement groups from the same pool or even from other pools. If OSD #2 fails, Placement Group #2 will also have to restore copies of its objects, using OSD #3.
When the number of placement groups increases, the new placement groups will be assigned OSDs. The result of the CRUSH function will also change, and some objects from the former placement groups will be copied over to the new placement groups and removed from the old ones.
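On an existing pool, this is what happens when you raise pg_num with the ceph osd pool set command. The pool name and target value below are placeholders; note that on older releases you also need to raise pgp_num to the same value before the data actually starts moving:

ceph osd pool set mypool pg_num 256
ceph osd pool set mypool pgp_num 256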
Pools also provide a number of additional capabilities.
When creating a new pool with:
ceph osd pool create {pool-name} pg_num
it is mandatory to choose the value of pg_num because it cannot be calculated automatically. A few values are commonly used, depending mostly on how many OSDs the cluster has.
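As a sketch, on a small test cluster you might create a replicated pool with 128 placement groups and then read the value back; the pool name here is a placeholder:

ceph osd pool create test-pool 128
ceph osd pool get test-pool pg_num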
A placement group (PG) aggregates a series of objects into a group and maps that group to a series of OSDs. Tracking object placement and object metadata on a per-object basis is computationally expensive; for example, a system with millions of objects cannot realistically track placement on a per-object basis. Placement groups address this barrier to performance and scalability. In addition, placement groups reduce the number of processes, and the amount of per-object metadata, that Ceph must track when storing and retrieving data.
Increasing the number of placement groups reduces the variance in per-OSD load across your cluster. We recommend approximately 50-100 placement groups per OSD to balance memory and CPU requirements against per-OSD load. For a single pool of objects, you can use the following formula: total PGs = (number of OSDs × 100) / number of replicas.
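As a worked example (the OSD count and replica size here are assumptions), a cluster with 100 OSDs and a replicated pool of size 3 would get (100 × 100) / 3 ≈ 3333 placement groups, which is commonly rounded up to the nearest power of two:

osds=100       # assumed number of OSDs
replicas=3     # assumed pool size
echo $(( osds * 100 / replicas ))   # prints 3333; round up to 4096 in practice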
When using multiple pools to store objects, you need to balance the number of placement groups per pool against the number of placement groups per OSD, so that you arrive at a reasonable total number of placement groups that provides reasonably low variance per OSD without taxing system resources or making the peering process too slow.
When you create pools and set the number of placement groups for the pool, Ceph uses default values when you don't specifically override them. We recommend overriding some of the defaults. Specifically, we recommend setting a pool's replica size and overriding the default number of placement groups. You can set these values when running pool commands. You can also override the defaults in the [global] section of your Ceph configuration file.
[global]

# By default, Ceph makes 3 replicas of objects. If you want to make four
# copies of an object the default value (a primary copy and three replica
# copies), reset the default values as shown in 'osd pool default size'.
# If you want to allow Ceph to write a lesser number of copies in a degraded
# state, set 'osd pool default min size' to a number less than the
# 'osd pool default size' value.

osd pool default size = 4  # Write an object 4 times.
osd pool default min size = 1  # Allow writing one copy in a degraded state.

# Ensure you have a realistic number of placement groups. We recommend
# approximately 100 per OSD. E.g., total number of OSDs multiplied by 100
# divided by the number of replicas (i.e., osd pool default size). So for
# 10 OSDs and osd pool default size = 4, we'd recommend approximately
# (100 * 10) / 4 = 250.

osd pool default pg num = 250
osd pool default pgp num = 250
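Equivalently, you can apply the same settings to an existing pool at runtime with pool commands rather than configuration defaults; the pool name below is a placeholder:

ceph osd pool set mypool size 4
ceph osd pool set mypool min_size 1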