Understanding SPDK in Depth, Part 5: SPDK Troubleshooting (Part A)

Symptom

Running an SPDK program produces the following error:

starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status
starting write I/O failed, push back, reback to previous status

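As a result, the program cannot make progress. What is the cause?

Analysis

Restrictions on using NVMe hardware queues

According to the NVMe specification, a hardware queue consists of a submission queue and a completion queue. The two work together to process I/O requests: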

[Figure: NVMe command processing flow across a submission/completion queue pair, steps 1 through 8]

The specification describes the procedure as follows:

When host software builds a command for the controller to execute, it first checks to make sure that the appropriate Submission Queue (SQx) is not full. The Submission Queue is full when the number of entries in the queue is one less than the queue size. Once an empty slot (pFreeSlot) is available:
1. Host software builds a command at SQx[pFreeSlot] with:
a. CDW0.OPC is set to the appropriate command to be executed by the controller;
b. CDW0.FUSE is set to the appropriate value, depending on whether the command is a
fused operation;
c. CDW0.CID is set to a unique identifier for the command when combined with the
Submission Queue identifier;
d. The Namespace Identifier, CDW1.NSID, is set to the namespace the command applies to;
e. MPTR shall be filled in with the offset to the beginning of the Metadata Region, if there is a data transfer and the namespace format contains metadata as a separate buffer;
f. PRP1 and/or PRP2 (or SGL Entry 1 if SGLs are used) are set to the source/destination of data transfer, if there is a data transfer; and
g. CDW10 – CDW15 are set to any command specific information;
and
2. Host software writes the corresponding Submission Queue doorbell register (SQxTDBL)
to submit one or more commands for processing.
The write to the Submission Queue doorbell register triggers the controller to consume one or more new commands contained in the Submission Queue entry. The controller indicates the most recent SQ entry that has been consumed as part of reporting completions. Host software may use this information to determine when SQ slots may be re-used for new commands.

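In the flow above, steps 3, 4, 5, and 6 are carried out by the NVMe controller hardware, while steps 1/2 and 7/8 are carried out by host software. Steps 1 and 2 must happen in strict order, and so must steps 7 and 8.

SPDK's default core binding

Based on this flow, SPDK provides API functions that wrap steps 1, 2, 7, and 8. If multiple threads call these APIs at the same time against the same hardware queue pair, the ordering constraints above can be broken. For this reason, the SPDK thread is bound to a specific processor core by default during initialization.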
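
The host-side half of this flow can be sketched in C. This is a minimal model, not SPDK code: `sq_model`, `sq_submit`, and the other names are hypothetical, and the `doorbell` field merely stands in for the SQxTDBL register. It shows the "full when one less than the queue size" rule from the specification and the requirement that the command is written (step 1) before the doorbell is updated (step 2).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical host-side model of one submission queue (not an SPDK type). */
struct sq_model {
    uint32_t size;      /* number of slots in the ring */
    uint32_t head;      /* next slot the controller will consume */
    uint32_t tail;      /* next free slot for the host */
    uint32_t doorbell;  /* stands in for the SQxTDBL register */
    uint16_t slots[64]; /* stands in for command entries (CID only) */
};

static void sq_init(struct sq_model *sq, uint32_t size)
{
    assert(size >= 2 && size <= 64);
    sq->size = size;
    sq->head = sq->tail = sq->doorbell = 0;
}

/* Full when the ring holds size - 1 entries: tail + 1 wraps onto head. */
static bool sq_is_full(const struct sq_model *sq)
{
    return (sq->tail + 1) % sq->size == sq->head;
}

/* Step 1: build the command in the free slot. Step 2: ring the doorbell. */
static int sq_submit(struct sq_model *sq, uint16_t cid)
{
    if (sq_is_full(sq)) {
        return -1; /* caller has to wait for completions and retry */
    }
    sq->slots[sq->tail] = cid;            /* step 1 */
    sq->tail = (sq->tail + 1) % sq->size;
    sq->doorbell = sq->tail;              /* step 2, strictly after step 1 */
    return 0;
}

/* Models the controller reporting that it has consumed up to new_head. */
static void sq_consume(struct sq_model *sq, uint32_t new_head)
{
    assert(new_head < sq->size);
    sq->head = new_head;
}
```

With a ring of four slots, three submissions succeed and the fourth is rejected until `sq_consume` advances the head. If two threads ran `sq_submit` on the same ring without coordination, both could write the same slot or publish the tail out of order, which is the hazard discussed here.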

@@ -448,7 +448,7 @@ int init(const char * dev_name) {
     spdk_env_opts_init(&opts);
     opts.name = "append_demo";
     opts.shm_id = 0;
     opts.core_mask = "0x8";
     if (spdk_env_init(&opts) < 0) {
         fprintf(stderr, "Unable to initialize Spdk env\n");
         return -1;
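
The `core_mask` value in the snippet above is a hexadecimal bitmask in which bit N selects core N, so "0x8" selects core 3 only. The following sketch decodes such a mask; `cores_from_mask` is a hypothetical helper written for illustration and is not part of SPDK.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: write the core indices selected by a hex core mask
 * into cores[] and return how many were found (at most max_cores). */
static int cores_from_mask(const char *mask_str, int *cores, int max_cores)
{
    assert(mask_str != NULL && cores != NULL);
    /* Base 16 accepts an optional "0x" prefix. */
    uint64_t mask = strtoull(mask_str, NULL, 16);
    int count = 0;

    for (int bit = 0; bit < 64 && count < max_cores; bit++) {
        if (mask & (UINT64_C(1) << bit)) {
            cores[count++] = bit; /* bit N set means core N is selected */
        }
    }
    return count;
}
```

For example, calling `cores_from_mask("0x8", cores, 64)` yields one entry, core 3, while a mask of "0x5" would select cores 0 and 2.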

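Notes on SPDK threads

The analysis above leads to one rule: a single hardware queue pair must not be used by multiple threads at the same time, but different hardware queue pairs can be used concurrently by different threads.

Verification

After the program was modified according to this analysis, the error disappeared immediately.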

Symptom

ERROR: requested 256 hugepages but only 2 could be allocated.
Memory might be heavily fragmented. Please try flushing the system cache, or reboot the machine.

Checking memory with free -m shows that most of it is held by the page cache (the cached column), so flushing the cache with drop_caches releases it:

[root@036db0018 scripts]# free -m
             total       used       free     shared    buffers     cached
Mem:        128332     124880       3451          0       2665     102940
-/+ buffers/cache:      19275     109056
Swap:            0          0          0

[root@036db0018 scripts]# echo 3 > /proc/sys/vm/drop_caches

[root@036db0018 scripts]#
[root@036db0018.bdbl.baidu.com scripts]# free -m
             total       used       free     shared    buffers     cached
Mem:        128332      14382     113949          0        129        797
-/+ buffers/cache:      13455     114876
Swap:            0          0          0

Running the setup command again:

[root@bdbl-inf-bce036db0018 scripts]# NRHUGE=256 ./single_setup_b0.sh config

This time no error is reported.
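
To confirm the reservation before starting the SPDK program, the HugePages_Total and HugePages_Free fields of /proc/meminfo can be inspected. The sketch below parses meminfo-style text that has already been read into memory; `meminfo_value` is a hypothetical helper, and reading the file itself (for example with fopen and fread) is left to the caller.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return the number on the "Key:   value" line of
 * /proc/meminfo-style text, or -1 if the key is absent or has no number. */
static long meminfo_value(const char *text, const char *key)
{
    assert(text != NULL && key != NULL);
    size_t key_len = strlen(key);
    const char *line = text;

    while (line != NULL && *line != '\0') {
        /* Require the colon so "HugePages" does not match "HugePages_Free". */
        if (strncmp(line, key, key_len) == 0 && line[key_len] == ':') {
            long value;
            if (sscanf(line + key_len + 1, "%ld", &value) == 1) {
                return value;
            }
            return -1;
        }
        line = strchr(line, '\n');
        if (line != NULL) {
            line++; /* step past the newline to the next line */
        }
    }
    return -1;
}
```

If `meminfo_value(text, "HugePages_Total")` returns less than the requested count (256 in the command above), the reservation was only partially satisfied, as in the error message at the start of this section.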
