Some thoughts on Ceph tiering

Our internal Ceph test environment has been in use for a while, mainly using RBD block devices to create virtual machines and allocate volumes for them, and it has been quite stable. However, most of the environment still runs with Ceph defaults; the only change is that the journal has been split out onto a separate partition. Next, we plan some optimizations using Ceph tiering and SSDs:

1. Write the journal to a dedicated SSD disk.

2. Use SSDs to build an SSD pool and make it a cache for the other pools; this is where Ceph tiering comes in.
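The cache-tier wiring for step 2 takes only a few commands on releases that support cache tiering (Firefly and later). A minimal sketch, assuming a backing pool named rbd and a cache pool named ssd (both pool names are placeholders; adjust for your cluster):

```shell
# Attach the ssd pool as a cache tier in front of the rbd pool
ceph osd tier add rbd ssd

# Serve reads and writes from the cache, flushing to rbd in the background
ceph osd tier cache-mode ssd writeback

# Redirect client traffic for rbd through the ssd tier
ceph osd tier set-overlay rbd ssd
```

These commands only make sense against a live cluster; the writeback mode in particular should be paired with sensible hit-set and flush/evict settings before production use.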

A web search turned up no write-ups of this setup in practice, nor any numbers on how much performance it actually gains. So once the plan is implemented, we will benchmark each stage:

1. Default Ceph installation.

2. Journal moved to a separate partition on an ordinary hard disk.

3. Journal moved to a dedicated SSD disk.

4. With the SSD pool added.
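To get comparable numbers across the four configurations above, the built-in rados bench tool is the simplest option. A minimal sketch (the pool name test, PG count, and run length are placeholders):

```shell
# Create a throwaway benchmark pool
ceph osd pool create test 128

# 60-second write benchmark; keep the objects so the read tests have data
rados bench -p test 60 write --no-cleanup

# Sequential and random read benchmarks over the objects just written
rados bench -p test 60 seq
rados bench -p test 60 rand

# Remove the benchmark objects afterwards
rados -p test cleanup
```

Running the same sequence after each configuration change gives directly comparable throughput and latency figures.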

For the CRUSH configuration, see this article: http://www.sebastien-han.fr/blog/2012/12/07/ceph-2-speed-storage-with-crush/

I. Use case

Roughly speaking, your infrastructure could be based on several types of servers:

  • storage nodes full of SSD disks
  • storage nodes full of SAS disks
  • storage nodes full of SATA disks

Such a handy mechanism is possible with the help of the CRUSH map.

II. A bit about CRUSH

CRUSH stands for Controlled Replication Under Scalable Hashing:

  • Pseudo-random placement algorithm
  • Fast calculation, no lookup
  • Repeatable, deterministic
  • Ensures even distribution
  • Stable mapping
  • Limited data migration
  • Rule-based configuration, rule determines data placement
  • Infrastructure topology aware, the map knows the structure of your infra (nodes, racks, rows, datacenters)
  • Allows weighting, every OSD has a weight
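The "pseudo-random, deterministic, weighted" behaviour in the list above is easy to illustrate outside Ceph. The toy function below is not Ceph's actual straw implementation, just a sketch of the same idea: every item draws a hash-based straw scaled by its weight, and the longest straw wins, so placement is repeatable with no lookup table and capacity is proportional to weight:

```python
import hashlib

def straw_select(pg_id, items):
    """Toy straw bucket: deterministic, weight-proportional selection.

    items: mapping of name -> weight (like OSD weights in a bucket).
    """
    best, best_draw = None, -1.0
    for name, weight in sorted(items.items()):
        digest = hashlib.md5(f"{pg_id}:{name}".encode()).hexdigest()
        r = int(digest, 16) / 2**128          # deterministic uniform [0, 1)
        draw = r ** (1.0 / weight)            # bias the draw by weight
        if draw > best_draw:
            best, best_draw = name, draw
    return best

# Same input always maps to the same item -- no lookup needed
assert straw_select(42, {"osd.0": 1, "osd.1": 1}) == \
       straw_select(42, {"osd.0": 1, "osd.1": 1})

# A double-weight item receives roughly twice the placements
counts = {"osd.0": 0, "osd.1": 0, "osd.2": 0}
for pg in range(3000):
    counts[straw_select(pg, {"osd.0": 1, "osd.1": 1, "osd.2": 2})] += 1
print(counts)
```

Changing one item's weight only shifts a proportional share of placements toward or away from it, which is the "limited data migration" property.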

For more details check the official Ceph documentation.

III. Setup

What are we going to do?

  1. Retrieve the current CRUSH map
  2. Decompile the CRUSH map
  3. Edit it: we will add 2 buckets and 2 rulesets
  4. Recompile the new CRUSH map
  5. Re-inject the new CRUSH map

III.1. Begin

Grab your current CRUSH map:

$ ceph osd getcrushmap -o ma-crush-map
$ crushtool -d ma-crush-map -o ma-crush-map.txt

For the sake of simplicity, let’s assume that you have 4 OSDs:

  • 2 of them are SAS disks
  • 2 of them are SSD enterprise

And here is the OSD tree:

$ ceph osd tree
dumped osdmap tree epoch 621
# id    weight  type name   up/down reweight
-1  4   pool default
-3  4       rack le-rack
-2  2           host ceph-01
0   1               osd.0   up  1
1   1               osd.1   up  1
-4  2           host ceph-02
2   1               osd.2   up  1
3   1               osd.3   up  1

III.2. Default CRUSH map

Edit your CRUSH map:

# begin crush map

# devices
device 0 osd.0
device 1 osd.1
device 2 osd.2
device 3 osd.3

# types
type 0 osd
type 1 host
type 2 rack
type 3 row
type 4 room
type 5 datacenter
type 6 pool

# buckets
host ceph-01 {
    id -2       # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}
host ceph-02 {
    id -4       # do not change unnecessarily
    # weight 2.000
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}
rack le-rack {
    id -3       # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0  # rjenkins1
    item ceph-01 weight 2.000
    item ceph-02 weight 2.000
}
pool default {
    id -1       # do not change unnecessarily
    # weight 4.000
    alg straw
    hash 0  # rjenkins1
    item le-rack weight 4.000
}

# rules
rule data {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule metadata {
    ruleset 1
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
rule rbd {
    ruleset 2
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}

# end crush map

III.3. Add buckets and rules

Now we have to add 2 new specific rules:

  • one for the SSD pool
  • one for the SAS pool

III.3.1. SSD Pool

Add a bucket for the pool SSD:

pool ssd {
    id -5       # do not change unnecessarily
    alg straw
    hash 0  # rjenkins1
    item osd.0 weight 1.000
    item osd.1 weight 1.000
}

Add a rule for the newly created bucket:

rule ssd {
    ruleset 3
    type replicated
    min_size 1
    max_size 10
    step take ssd
    step choose firstn 0 type osd
    step emit
}

III.3.2. SAS Pool

Add a bucket for the pool SAS:

pool sas {
    id -6       # do not change unnecessarily
    alg straw
    hash 0  # rjenkins1
    item osd.2 weight 1.000
    item osd.3 weight 1.000
}

Add a rule for the newly created bucket:

rule sas {
    ruleset 4
    type replicated
    min_size 1
    max_size 10
    step take sas
    step choose firstn 0 type osd
    step emit
}

Eventually recompile and inject the new CRUSH map:

$ crushtool -c ma-crush-map.txt -o ma-nouvelle-crush-map
$ ceph osd setcrushmap -i ma-nouvelle-crush-map

III.4. Create and configure the pools

Create your 2 new pools:

$ rados mkpool ssd
successfully created pool ssd
$ rados mkpool sas
successfully created pool sas

Set the rule set to the pool:

$ ceph osd pool set ssd crush_ruleset 3
$ ceph osd pool set sas crush_ruleset 4
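On the release this walkthrough targets, the option is called crush_ruleset and takes the numeric ruleset id, as above. On modern releases (Luminous and later) the equivalent step references the rule by name instead; verify against your version before copying:

```shell
# Luminous and later: reference CRUSH rules by name via crush_rule
ceph osd pool set ssd crush_rule ssd
ceph osd pool set sas crush_rule sas
```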

Check that the changes have been applied successfully:

$ ceph osd dump | grep -E 'ssd|sas'
pool 3 'ssd' rep size 2 crush_ruleset 3 object_hash rjenkins pg_num 128 pgp_num 128 last_change 21 owner 0
pool 4 'sas' rep size 2 crush_ruleset 4 object_hash rjenkins pg_num 128 pgp_num 128 last_change 23 owner 0

Just create some random files and put them into your object store:

$ dd if=/dev/zero of=ssd.pool bs=1M count=512 conv=fsync
$ dd if=/dev/zero of=sas.pool bs=1M count=512 conv=fsync
$ rados -p ssd put ssd.pool ssd.pool.object
$ rados -p sas put sas.pool sas.pool.object

Where are the PGs active?

$ ceph osd map ssd ssd.pool.object
osdmap e260 pool 'ssd' (3) object 'ssd.pool.object' -> pg 3.c5034eb8 (3.0) -> up [1,0] acting [1,0]

$ ceph osd map sas sas.pool.object
osdmap e260 pool 'sas' (4) object 'sas.pool.object' -> pg 4.9202e7ee (4.0) -> up [3,2] acting [3,2]

CRUSH rules! As you can see from this article, CRUSH allows you to do amazing things. The CRUSH map can become very complex, but it brings a lot of flexibility! Happy CRUSH mapping ;-)
