Multipath, literally "multiple paths", is a generic concept. What I want to introduce here is the open-source storage multipathing technology, DM Multipath. There are already plenty of introductions to multipath, so this post mainly records my first few questions about it, and their answers.
How do you experiment with multipath without real multipath hardware? Use a virtual machine and iSCSI. Install a VM, attach a block device and two network interfaces, then build an iSCSI target on top of that block device. On the machine where you want to play with multipath, connect to the iSCSI target with an iSCSI initiator over both addresses. At that point lsblk will show two device nodes for the one underlying block device.
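On the initiator side, the setup above can be sketched with open-iscsi; this is a hedged sketch, and the target IQN and the two portal addresses (192.168.1.10 / 192.168.2.10) are made-up placeholders:

```shell
# Discover the target through both portals (the two NICs on the target VM).
iscsiadm -m discovery -t sendtargets -p 192.168.1.10
iscsiadm -m discovery -t sendtargets -p 192.168.2.10

# Log in to the same target over both portals; each login creates its own
# SCSI host, so the same LUN shows up twice (e.g. as sda and sdc).
iscsiadm -m node -T iqn.2020-01.example:demo -p 192.168.1.10 --login
iscsiadm -m node -T iqn.2020-01.example:demo -p 192.168.2.10 --login

lsblk   # the backing device now appears as two device nodes
```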
The multipath device name varies: sometimes it is a long hexadecimal string (the WWID), sometimes a name prefixed with mpath (a user-friendly name), sometimes an arbitrary string (an alias). multipath defaults to the WWID. Why not use the memorable name? One scenario where the friendly name cannot work: with user-friendly names, the root filesystem could not live on a multipath device. The mapping between friendly names and WWIDs is stored in /etc/multipath/bindings. To read that file, the root filesystem must already be mounted, but the multipath service has to start inside the initrd, before the root filesystem is available. Defaulting to the WWID is therefore the safe choice.
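For reference, this is roughly how the naming options look in multipath.conf; a hedged sketch, reusing the WWID from the multipath -l output below, with a made-up alias:

```
defaults {
    # "yes" maps WWIDs to mpathN names via /etc/multipath/bindings;
    # "no" (the safe choice for a multipathed root) keeps the raw WWID.
    user_friendly_names no
}

multipaths {
    multipath {
        # An alias overrides both naming schemes for this one device.
        wwid  14945540000000000ccb70d0ceeee4280f8450284d6298b59
        alias demo_disk
    }
}
```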
What does 2:0:0:1 mean? It is the SCSI device address; the four numbers correspond to Host:Bus:Target:Lun. For example, if we let the iSCSI target listen on two IP addresses, then the two paths to the same device differ only in the host field, e.g. 2:0:0:1 and 3:0:0:1.
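The four fields are easy to pull apart in the shell; a trivial bash sketch using the address from above:

```shell
# Split a SCSI address of the form Host:Bus:Target:Lun into its fields.
addr="2:0:0:1"
IFS=: read -r host bus target lun <<< "$addr"
echo "host=$host bus=$bus target=$target lun=$lun"
# prints: host=2 bus=0 target=0 lun=1
```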
At first I was confused about this concept: I assumed that all paths to one physical device form a single path group, i.e. that the output below shows one path group:
```
multipath-demo:~ # multipath -l
14945540000000000ccb70d0ceeee4280f8450284d6298b59 dm-0 IET,VIRTUAL-DISK
size=10G features='1 retain_attached_hw_handler' hwhandler='0' wp=rw
|-+- policy='service-time 0' prio=0 status=active
| `- 2:0:0:0 sda 8:0  active undef unknown
`-+- policy='service-time 0' prio=0 status=enabled
  `- 3:0:0:0 sdc 8:32 active undef unknown
```
In fact, the dm-0 device has two path groups here, each containing a single path (in a real environment there would be several). The group with status active is the one currently carrying IO; the group with status enabled is on standby and receives no IO. To understand this, I asked my colleague Martin, who works on multipath:
Please have a look at http://christophe.varoqui.free.fr/refbook.html

Path groups are mainly used for active/passive setups, and for cases where some paths have a higher latency/lower bandwidth than others (imagine a mirrored storage with mirror legs in different physical locations, disaster avoidance: the local mirror will be much faster than remote mirrors). Only one path group is "active" at any given time. The others are serving as standby, for the case that all paths in the currently active group fail. Depending on the storage array, the host may need to take explicit action to switch from one path group to another (e.g. send a certain SCSI command that forces the storage to activate the stand-by ports).

If the active path group contains multiple paths, switching between these paths (more precisely: between those paths in the path group which are not in failed state) is controlled by the "path_selector" algorithm in the kernel. There are 3 algorithms: "round-robin", "queue-length", and "service-time". See multipath.conf(5). Switching of paths inside a path group, unlike switching between path groups, is assumed to be instantaneous, and to require no explicit action. Regardless which path selector is in use, every healthy path will receive IO sooner or later, unless the multipath device is completely idle.

How the paths are grouped into path groups at discovery time is determined by the "path_grouping_policy". It's "failover" by default, meaning that there's a dedicated path group for every path. But multipath's builtin hardware table sets different defaults for many real-world storage arrays. For modern setups, "group_by_prio" is often the best, combined with "detect_prio yes" or a "prio" setting that assigns different priority to paths with different quality (e.g. "alua", "rdac", or "path_latency"). Path groups are assigned a priority which is calculated as the average of all non-failed paths in the path group.

At startup, the path group with the highest prio is set as active PG. When all paths in this PG fail, the kernel will switch to the next-best PG. When paths in the best PG return to good state, the "failback" configuration option determines if, and when, to switch back to the best PG.
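Martin's recommendation might translate into a multipath.conf fragment like the following; a sketch only, since whether group_by_prio and alua fit depends on the storage array:

```
defaults {
    path_grouping_policy group_by_prio    # one PG per priority value
    prio                 alua             # derive path priority from ALUA port state
    path_selector        "service-time 0" # spread IO across paths inside the active PG
    failback             immediate        # return to the best PG as soon as it recovers
}
```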
The default path_grouping_policy is failover; as Martin said, vendors set different defaults through the builtin hardware table, and group_by_prio is the mainstream choice. Its job is to group the paths into path groups. The default path selector is service-time, which decides how IO is distributed among the paths within one PG. Martin's explanation above covers both in detail.
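To see which policies are actually in effect on a given system, multipathd can be queried at runtime; this requires a running multipathd and at least one multipath device:

```shell
# Show the merged configuration (defaults plus the builtin hardware table).
multipathd show config | grep -E 'path_grouping_policy|path_selector|failback'

# Per-path view: device, address, path-group and path state.
multipathd show paths
```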