理解 OpenStack + Ceph （8）: 基本的 Ceph 性能測試工具和方法

時間 2019-12-11

標籤理解 openstack ceph 基本性能測試工具方法欄目系統性能简体版

原文原文鏈接

本系列文章會深刻研究 Ceph 以及 Ceph 和 OpenStack 的集成：html

（1）安裝和部署linux

（2）Ceph RBD 接口和工具ios

（3）Ceph 物理和邏輯結構git

（4）Ceph 的基礎數據結構github

（5）Ceph 與 OpenStack 集成的實現緩存

（6）QEMU-KVM 和 Ceph RBD 的緩存機制總結網絡

（7）Ceph 的基本操做和常見故障排除方法數據結構

（8）基本的性能測試工具和方法app

繼續學以至用，學習下基本的Ceph性能測試工具和方法。less

0. 測試環境

同 Ceph 的基本操做和常見故障排除方法一文中的測試環境。

1. 測試準備

1.1 磁盤讀寫性能

1.1.1 單個 OSD 磁盤寫性能，大概 165MB/s。

root@ceph1:~# echo 3 > /proc/sys/vm/drop_caches
root@ceph1:~# dd if=/dev/zero of=/var/lib/ceph/osd/ceph-0/deleteme bs=1G count=1 oflag=direct

測試發現，其結果變化很是大，有時候上 75，有時是150.

1.1.2 兩個OSD同時寫性能，大概 18 MB/s。怎麼差距那麼大呢？幾乎是單個磁盤的 1/10 了。

root@ceph1:~# for i in `mount | grep osd | awk '{print $3}'`; do (dd if=/dev/zero of=$i/deleteme bs=1G count=1 oflag=direct &) ; done

1.1.4 單個 OSD 磁盤讀性能，大概 460 MB/s。

root@ceph1:~# dd if=/var/lib/ceph/osd/ceph-0/deleteme of=/dev/null bs=2G count=1 iflag=direct

1.1.5 兩個 OSD 同時讀性能，大概 130 MB/s。

for i in `mount | grep osd | awk '{print $3}'`; do (dd if=$i/deleteme of=/dev/null bs=1G count=1 iflag=direct &); done

1.2 網絡性能

在 ceph1上運行 iperf -s -p 6900，在 ceph2 上運行 iperf -c ceph1 -p 6900，反覆屢次，兩節點之間的帶寬大約在 1 Gbits/sec = 128 MB/s。

root@ceph2:~# iperf -c ceph1 -p 6900
------------------------------------------------------------
Client connecting to ceph1, TCP port 6900
TCP window size: 85.0 KByte (default)
------------------------------------------------------------
[  3] local 192.168.56.103 port 41773 connected with 192.168.56.102 port 6900
[ ID] Interval       Transfer     Bandwidth
[  3]  0.0-10.0 sec  1.25 GBytes  1.08 Gbits/sec

2. Ceph 性能測試

2.1 RADOS 性能測試：使用 Ceph 自帶的 rados bench 工具

該工具的語法爲：rados bench -p <pool_name> <seconds> <write|seq|rand> -b <block size> -t --no-cleanup

pool_name：測試所針對的存儲池
seconds：測試所持續的秒數
<write|seq|rand>：操做模式，write：寫，seq：順序讀；rand：隨機讀
-b：block size，即塊大小，默認爲 4M
-t：讀/寫並行數，默認爲 16
--no-cleanup 表示測試完成後不刪除測試用數據。在作讀測試以前，須要使用該參數來運行一遍寫測試來產生測試數據，在所有測試結束後能夠運行 rados -p <pool_name> cleanup 來清理全部測試數據。

寫：

root@ceph1:~# rados bench -p rbd 10 write --no-cleanup
 Maintaining 16 concurrent writes of 4194304 bytes for up to 10 seconds or 0 objects
 Object prefix: benchmark_data_ceph1_12884
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       0         0         0         0         0         -         0
     1      16        16         0         0         0         -         0
...
    12      15        75        60   19.9671         4   3.05943   2.46556
 Total time run:         12.135344
Total writes made:      75
Write size:             4194304
Bandwidth (MB/sec):     24.721

Stddev Bandwidth:       13.5647
Max bandwidth (MB/sec): 36
Min bandwidth (MB/sec): 0
Average Latency:        2.57614
Stddev Latency:         0.781915
Max latency:            4.50816
Min latency:            1.04075

順序讀：

root@ceph1:~# rados bench -p rbd 10 seq
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0      16        16         0         0         0         -         0
 Total time run:        0.601027
Total reads made:     75
Read size:            4194304
Bandwidth (MB/sec):    499.146

Average Latency:       0.123632
Max latency:           0.209325
Min latency:           0.030446

隨機讀：

root@ceph1:~# rados bench -p rbd 10 rand
   sec Cur ops   started  finished  avg MB/s  cur MB/s  last lat   avg lat
     0       3         3         0         0         0         -         0
     1      16       138       122   477.298       488   0.01702  0.116519
...
    10      16      1242      1226   488.681       448  0.108589  0.129214
 Total time run:        10.092985
Total reads made:     1242
Read size:            4194304
Bandwidth (MB/sec):    492.223

Average Latency:       0.129631
Max latency:           0.297213
Min latency:           0.007133

2.2 RADOS 性能測試：使用 rados load-gen 工具

該工具的語法爲：

# rados -p rbd load-gen

--num-objects 初始生成測試用的對象數，默認 200
--min-object-size 測試對象的最小大小，默認 1KB，單位byte
--max-object-size 測試對象的最大大小，默認 5GB，單位byte

--min-op-len 壓測IO的最小大小，默認 1KB，單位byte
--max-op-len 壓測IO的最大大小，默認 2MB，單位byte
--max-ops 一次提交的最大IO數，至關於iodepth
--target-throughput 一次提交IO的歷史累計吞吐量上限，默認 5MB/s，單位B/s
--max-backlog 一次提交IO的吞吐量上限，默認10MB/s，單位B/s
--read-percent 讀寫混合中讀的比例，默認80，範圍[0, 100]

--run-length 運行的時間，默認60s，單位秒

在 ceph1上運行 rados -p pool100 load-gen --read-percent 0 --min-object-size 1073741824 --max-object-size 1073741824 --max-ops 1 --read-percent 0 --min-op-len 4194304 --max-op-len 4194304 --target-throughput 1073741824 --max_backlog 1073741824 的結果爲：

WRITE : oid=obj-y0UPAZyRQNhnabq off=929764660 len=4194304
op 19 completed, throughput=16MB/sec
WRITE : oid=obj-nPcOZAc4ebBcnyN off=143211384 len=4194304
op 20 completed, throughput=20MB/sec
WRITE : oid=obj-sWGUAzzASPjCcwF off=343875215 len=4194304
op 21 completed, throughput=24MB/sec
WRITE : oid=obj-79r25fxxSMgVm11 off=383617425 len=4194304
op 22 completed, throughput=28MB/sec

該命令的含義是：在 1G 的對象上，以 iodepth = 1 順序寫入 block size 爲 4M 的總量爲 1G 的數據。其平均結果大概在 24MB/s，基本和 rados bench 的結果至關。

在 client 上，一樣的配置，順序寫的BW大概在 20MB/s，順序讀的 BW 大概在 100 MB/s。

可見，與 rados bench 相比，rados load-gen 的特色是能夠產生混合類型的測試負載，而 rados bench 只能產生一種類型的負載。可是 load-gen 只能輸出吞吐量，只合適作相似於 4M 這樣的大block size 數據測試，輸出還不包括延遲。

2.3 使用 rbd bench-write 進行塊設備寫性能測試

2.3.1 客戶端準備

在執行以下命令來準備 Ceph 客戶端：

root@client:/var# rbd create bd2 --size 1024
root@client:/var# rbd info --image bd2
rbd image 'bd2':
        size 1024 MB in 256 objects
        order 22 (4096 kB objects)
        block_name_prefix: rb.0.3841.74b0dc51
        format: 1
root@client:/var# rbd map bd2
root@client:/var# rbd showmapped
id pool  image snap device
1  pool1 bd1   -    /dev/rbd1
2  rbd   bd2   -    /dev/rbd2
root@client:/var# mkfs.xfs /dev/rbd2
log stripe unit (4194304 bytes) is too large (maximum is 256KiB)
log stripe unit adjusted to 32KiB
meta-data=/dev/rbd2              isize=256    agcount=9, agsize=31744 blks
         =                       sectsz=512   attr=2, projid32bit=0
data     =                       bsize=4096   blocks=262144, imaxpct=25
         =                       sunit=1024   swidth=1024 blks
naming   =version 2              bsize=4096   ascii-ci=0
log      =internal log           bsize=4096   blocks=2560, version=2
         =                       sectsz=512   sunit=8 blks, lazy-count=1
realtime =none                   extsz=4096   blocks=0, rtextents=0
root@client:/var# mkdir -p /mnt/ceph-bd2
root@client:/var# mount /dev/rbd2 /mnt/ceph-bd2/
root@client:/var# df -h /mnt/ceph-bd2/
Filesystem      Size  Used Avail Use% Mounted on
/dev/rbd2      1014M   33M  982M   4% /mnt/ceph-bd2

2.3.2 測試

rbd bench-write 的語法爲：rbd bench-write <RBD image name>，能夠帶以下參數：

--io-size：單位 byte，默認 4096 bytes = 4K
--io-threads：線程數，默認 16
--io-total：總寫入字節，單位爲字節，默認 1024M
--io-pattern <seq|rand>：寫模式，默認爲 seq 即順序寫

分別在集羣 OSD 節點上和客戶端上作測試：

（1）在 OSD 節點上作測試

root@ceph1:~# rbd bench-write bd2 --io-total 171997300
bench-write  io_size 4096 io_threads 16 bytes 171997300 pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       280    273.19  2237969.65
    2       574    286.84  2349818.65
...
   71     20456    288.00  2358395.28
   72     20763    288.29  2360852.64
elapsed:    72  ops:    21011  ops/sec:   288.75  bytes/sec: 2363740.27

此時，塊大小爲 4k，IOPS 爲 289，BW 爲 2.36 MB/s （怎麼 BW 是 block_size * IOPS 的兩倍呢？）。

（2）在客戶端上作測試

root@client:/home/s1# rbd bench-write pool.host/image.ph2  --io-total 1719973000 --io-size 4096000
bench-write  io_size 4096000 io_threads 16 bytes 1719973000 pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    1         5      3.41  27937685.86
    2        19      9.04  68193147.96
    3        28      8.34  62237889.75
    5        36      6.29  46538807.31
...
   39       232      5.86  40792216.64
   40       235      5.85  40666942.19
elapsed:    41  ops:      253  ops/sec:     6.06  bytes/sec: 41238190.87

此時 block size 爲 4M，IOPS 爲 6， BW 爲 41.24 MB/s。

root@client:/home/s1# rbd bench-write pool.host/image.ph2  --io-total 1719973000
bench-write  io_size 4096 io_threads 16 bytes 1719973000 pattern seq
  SEC       OPS   OPS/SEC   BYTES/SEC
    1       331    329.52  2585220.17
    2       660    329.57  2521925.67
    3      1004    333.17  2426190.82
    4      1331    332.26  2392607.58
    5      1646    328.68  2322829.13
    6      1986    330.88  2316098.66

此時 block size 爲 4K，IOPS 爲 330 左右， BW 爲 24 MB/s 左右。

備註：從 rbd bench-write vs dd performance confusion 中看起來，rados bench-write 彷佛有bug。我所使用的Ceph 是0.80.11 版本，可能補丁尚未合進來。

2.4 使用 fio +rbd ioengine

2.4.1 環境準備

運行 apt-get install fio 來安裝 fio 工具。建立 fio 配置文件：

root@client:/home/s1# cat write.fio
[write-4M]
description="write test with block size of 4M"
ioengine=rbd
clientname=admin
pool=rbd
rbdname=bd2
iodepth=32
runtime=120
rw=write #write 表示順序寫，randwrite 表示隨機寫，read 表示順序讀，randread 表示隨機讀
bs=4M

運行 fio 命令，可是出錯：

root@client:/home/s1# fio write.fio
fio: engine rbd not loadable
fio: failed to load engine rbd
Bad option <clientname=admin>
Bad option <pool=rbd>
Bad option <rbdname=bd2>
fio: job write-4M dropped
fio: file:ioengines.c:99, func=dlopen, error=rbd: cannot open shared object file: No such file or directory

其緣由是由於沒有安裝 fio librbd IO 引擎，所以當前 fio 沒法支持 rbd ioengine：

root@client:/home/s1# fio --enghelp
Available IO engines:
        cpuio
        mmap
        sync
        psync
        vsync
        pvsync
        null
        net
        netsplice
        libaio
        rdma
        posixaio
        falloc
        e4defrag
        splice
        sg
        binject

在運行 apt-get install librbd-dev 命令安裝 librbd 後，fio 仍是報一樣的錯誤。參考網上資料，下載 fio 代碼從新編譯 fio：

$ git clone git://git.kernel.dk/fio.git
$ cd fio
$ ./configure
[...]
Rados Block Device engine     yes
[...]
$ make

此時 fio 的 ioengine 列表中也有 rbd 了。fio 使用 rbd IO 引擎後，它會讀取 ceph.conf 中的配置去鏈接 Ceph 集羣。

下面是 fio 命令和結果：

root@client:/home/s1/fio# ./fio ../write.fio
write-4M: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=32
fio-2.11-12-g82e6
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/128.0MB/0KB /s] [0/32/0 iops] [eta 00m:00s]
write-4M: (groupid=0, jobs=1): err= 0: pid=19190: Sat Jun  4 22:30:00 2016
  Description  : ["write test with block size of 4M"]
  write: io=1024.0MB, bw=17397KB/s, iops=4, runt= 60275msec
    slat (usec): min=129, max=54100, avg=1489.10, stdev=4907.83
    clat (msec): min=969, max=15690, avg=7399.86, stdev=1328.55
     lat (msec): min=969, max=15696, avg=7401.35, stdev=1328.67
    clat percentiles (msec):
     |  1.00th=[  971],  5.00th=[ 6325], 10.00th=[ 6325], 20.00th=[ 6521],
     | 30.00th=[ 6718], 40.00th=[ 7439], 50.00th=[ 7439], 60.00th=[ 7635],
     | 70.00th=[ 7832], 80.00th=[ 8291], 90.00th=[ 8356], 95.00th=[ 8356],
     | 99.00th=[14615], 99.50th=[15664], 99.90th=[15664], 99.95th=[15664],
     | 99.99th=[15664]
    bw (KB  /s): min=245760, max=262669, per=100.00%, avg=259334.50, stdev=6250.72
    lat (msec) : 1000=1.17%, >=2000=98.83%
  cpu          : usr=0.24%, sys=0.03%, ctx=50, majf=0, minf=8
  IO depths    : 1=2.3%, 2=5.5%, 4=12.5%, 8=25.0%, 16=50.4%, 32=4.3%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=97.0%, 8=0.0%, 16=0.0%, 32=3.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=256/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=32

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=17396KB/s, minb=17396KB/s, maxb=17396KB/s, mint=60275msec, maxt=60275msec

Disk stats (read/write):
  sda: ios=0/162, merge=0/123, ticks=0/19472, in_queue=19472, util=6.18%

若是 iodepth = 1 的話，結果是：

root@client:/home/s1# fio/fio write.fio.dep1
write-4M: (g=0): rw=write, bs=4M-4M/4M-4M/4M-4M, ioengine=rbd, iodepth=1
fio-2.11-12-g82e6
Starting 1 process
rbd engine: RBD version: 0.1.8
Jobs: 1 (f=1): [W(1)] [100.0% done] [0KB/8192KB/0KB /s] [0/2/0 iops] [eta 00m:00s]
write-4M: (groupid=0, jobs=1): err= 0: pid=19250: Sat Jun  4 22:33:11 2016
  Description  : ["write test with block size of 4M"]
  write: io=1024.0MB, bw=20640KB/s, iops=5, runt= 50802msec

2.5 使用 fio + libaio 進行測試

libaio 是 Linux native asynchronous I/O。

幾種測試模式：

隨機寫：fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio

這些參數的含義是：

- filename：表示待測試的設備名稱。
- iodepth： libaio 會用這個 iodepth 值來調用 io_setup 準備個能夠一次提交 iodepth 個 IO 的上下文，同時申請個io請求隊列用於保持IO。
- iodepth_batch：在壓測進行的時候，系統會生成特定的IO請求，往io請求隊列裏面扔，當隊列裏面的IO個數達到 iodepth_batch 值的時候，
- iodepth_batch_complete 和 iodepth_low：調用 io_submit 批次提交請求，而後開始調用 io_getevents 開始收割已經完成的IO。每次收割多少呢？因爲收割的時候，超時時間設置爲0，因此有多少已完成就算多少，最多能夠收割 iodepth_batch_complete 值個。隨着收割，IO隊列裏面的IO數就少了，那麼須要補充新的IO。何時補充呢？當IO數目降到 iodepth_low 值的時候，就從新填充，保證 OS 能夠看到至少 iodepth_low 數目的io在電梯口排隊着。

root@client:/home/s1# fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio read-libaio: (g=0): rw=randwrite, bs=4M-4M/4M-4M/4M-4M, ioengine=libaio, iodepth=1
fio-2.11-12-g82e6
Starting 1 thread
Jobs: 1 (f=1): [w(1)] [100.0% done] [0KB/94302KB/0KB /s] [0/23/0 iops] [eta 00m:00s]
read-libaio: (groupid=0, jobs=1): err= 0: pid=20256: Sun Jun  5 10:00:55 2016
  write: io=1024.0MB, bw=102510KB/s, iops=25, runt= 10229msec
    slat (usec): min=342, max=5202, avg=1768.90, stdev=1176.00
    clat (usec): min=332, max=165391, avg=38165.11, stdev=27987.64
     lat (msec): min=3, max=167, avg=39.94, stdev=28.00
    clat percentiles (msec):
     |  1.00th=[    8],  5.00th=[   18], 10.00th=[   19], 20.00th=[   20],
     | 30.00th=[   22], 40.00th=[   25], 50.00th=[   29], 60.00th=[   31],
     | 70.00th=[   36], 80.00th=[   47], 90.00th=[   83], 95.00th=[  105],
     | 99.00th=[  123], 99.50th=[  131], 99.90th=[  165], 99.95th=[  165],
     | 99.99th=[  165]
    bw (KB  /s): min=32702, max=172032, per=97.55%, avg=99999.10, stdev=36075.23
    lat (usec) : 500=0.39%
    lat (msec) : 4=0.39%, 10=0.39%, 20=21.48%, 50=57.81%, 100=14.45%
    lat (msec) : 250=5.08%
  cpu          : usr=0.62%, sys=3.65%, ctx=316, majf=0, minf=9
  IO depths    : 1=100.0%, 2=0.0%, 4=0.0%, 8=0.0%, 16=0.0%, 32=0.0%, >=64=0.0%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     issued    : total=r=0/w=256/d=0, short=r=0/w=0/d=0, drop=r=0/w=0/d=0
     latency   : target=0, window=0, percentile=100.00%, depth=1

Run status group 0 (all jobs):
  WRITE: io=1024.0MB, aggrb=102510KB/s, minb=102510KB/s, maxb=102510KB/s, mint=10229msec, maxt=10229msec

Disk stats (read/write):
  sda: ios=0/1927, merge=0/1, ticks=0/30276, in_queue=30420, util=98.71%

隨機讀：fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=randread -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio
順序寫：fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=write -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio
隨機寫：fio/fio -filename=/mnt/ceph-rbd2 -direct=1 -iodepth 1 -thread -rw=randwrite -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio

3. 總結

3.1 測試工具小結

工具	用途	語法	說明
dd	磁盤讀寫性能測試	dd if=/dev/zero of=/root/testfile bs=1G count=1 oflag=direct/dsync/sync	https://www.thomas-krenn.com/en/wiki/Linux_I/O_Performance_Tests_using_dd
iperf	網絡帶寬性能測試		https://iperf.fr/
rados bench	RADOS 性能測試工具	rados bench -p <pool_name> <seconds> <write\|seq\|rand> -b <block size> -t --no-cleanup	Ceph 自帶的 RADOS 性能測試工具
rados load-gen	RADOS 性能測試工具	# rados -p rbd load-gen --num-objects #產生的對象數目 --min-object-size #最小對象大小 --max-object-size #最大對象大小 --max-ops #最大操做數目 --min-op-len #最小操做長度 --max-op-len #最大操做長度 --read-percent #讀操做的百分比 --target-throughput #目標吞吐量，單位 MB --run-length #運行時長，單位秒	Ceph 自帶的 rados 性能測試工具可在集羣內產生指定類型的負載比 rados bench 功能豐富，能指定更多的參數
rbd bench-write	ceph 自帶的 rbd 性能測試工具	rbd bench-write <RBD image name> --io-size：單位 byte，默認 4M --io-threads：線程數，默認 16 --io-total：總寫入字節，默認 1024M --io-pattern <seq\|rand>：寫模式，默認爲 seq 即順序寫	只能對塊設備作寫測試
fio + rbd ioengine	fio 結合 rbd IO 引擎的性能測試工具	參考 fio --help	Linux 平臺上作 IO 性能測試的瑞士軍刀能夠對使用內核內 rbd 和用戶空間 librados 進行比較標準規則 - 順序和隨機 IO 塊大小 - 4k，16k，64k，256k 模式 - 讀和寫支持混合模式
fio + libaio	fio 結合 linux aio 的 rbd 性能測試	參考 fio --help

3.2 測試結果比較

所使用的命令：
- rbd bench-write pool.host/image.ph2 --io-total 1719973000 --io-size 4096000 --io-threads 1 --io-pattern rand/seq
- rados -p pool.host bench 20 write -t 1 --no-cleanup
- rados -p pool100 load-gen --read-percent 0 --min-object-size 1073741824 --max-object-size 1073741824 --max-ops 1 --read-percent 0/100 --min-op-len 4194304 --max-op-len 4194304 --target-throughput 1073741824 --max_backlog 1073741824
- ceph tell osd.0 bench
- fio/fio -filename=/dev/rbd4 -direct=1 -iodepth 1 -thread -rw=write/read/randwrite/randread -ioengine=libaio -bs=4M -size=1G -numjobs=1 -runtime=120 -group_reporting -name=read-libaio
結果（僅在做者的測試環境的客戶端節點上運行以上命令的輸出）：

操做	dd 一個 OSD	dd 兩個 OSD	rados load-gen	rados bench	rbd bench-write	ceph tell osd.0 bench	fio + rbd	fio + libaio
順序寫	165	18	18	18	74 MB/s （IOPS 9）	40 MB/s	21 （iops 5）	18（iops 4）
隨機寫					67.8 MB/s （IOPS 8）		19 （iops 4）	16（iops 4）
順序讀	460	130	100	109	N/A		111（iops 27）	111（iops 27）
隨機讀				112	N/A		115（iops 28）	128（iops 31）

簡單結論（因爲環境、測試方法和數據有限，這些結論不必定正確，有些只是猜想，須要進一步研究，僅供參考）：

rados bench 和在兩個 OSD 上同時作 dd 的性能差很少。
fio + rbd 和 fio + libaio 的結果差很少，相比之下 fio + rbd 還要好一點點。
fio 順序寫和讀的 BW 和兩個 OSD 同時寫和讀的 BW 差很少。
fio 順序寫的 BW 差很少是單個 OSD 的 bench 的一半（由於個人 pool 的 size 爲 2）。
rados load-gen，rodos bench 和 fio rbd/libaio 的結果都差很少，可見均可以信任，只是每一種都有其特長，選擇合適你的測試應用場景的某個便可。
rdb bench-write 的值明顯偏高，緣由未知，也許存在 bug，詳情可參考 rbd bench-write vs dd performance confusion，選擇時需慎重。