A simple DRBD shared-storage setup - highly available storage

  1. Host 1: server5.example.com 172.25.254.5

     Host 2: server6.example.com 172.25.254.6

  2. Install DRBD (build the RPM packages from source)

  3. yum install gcc flex rpm-build kernel-devel -y
     mkdir -p ~/rpmbuild/SOURCES         # create the rpmbuild tree in the home directory
     cp drbd-8.4.0.tar.gz rpmbuild/SOURCES/
     tar zxf drbd-8.4.0.tar.gz
     cd drbd-8.4.0
     ./configure --enable-spec --with-km
     rpmbuild -bb drbd.spec              # build the drbd userland rpm packages
     rpmbuild -bb drbd-km.spec           # build the drbd kernel-module rpm
     cd ~/rpmbuild/RPMS/x86_64
     rpm -ivh *
     Copy the generated rpm packages to the other host and install them there:
     scp ~/rpmbuild/RPMS/x86_64/* 172.25.254.6:/root
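     On the second host, installing the copied packages might look like this (a sketch, assuming the rpm files landed in /root on 172.25.254.6):
     cd /root
     rpm -ivh drbd-*.rpm                 # installs both the userland and the kernel-module packages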

     Next, configure the resource file drbd.res under /etc/drbd.d:

  4. resource mysqldata {
            meta-disk internal;
            device /dev/drbd1;
            syncer {
                    verify-alg sha1;
            }
            on server5.example.com {        # this MUST be the node's real hostname; fixing it only in name resolution does not work
                    disk /dev/vdb;          # the backing disk DRBD will use
                    address 172.25.254.5:7789;
            }
            on server6.example.com {
                    disk /dev/vdb;
                    address 172.25.254.6:7789;
            }
     }
     Run on both hosts:
     drbdadm create-md mysqldata
     /etc/init.d/drbd start
     cat /proc/drbd              # check the status
     Next, promote server5 to the primary node and start the initial sync (run the following on server5):
     drbdsetup /dev/drbd1 primary --force
     Watch the synchronization progress on both hosts:
     watch cat /proc/drbd
     Once the sync has finished, create a filesystem:
     mkfs.ext4 /dev/drbd1
     Mount the filesystem:
     mount /dev/drbd1 /var/lib/mysql
     Any new files written there are now stored on the DRBD device.
     To access the data from the other server, first unmount /dev/drbd1 on server5 and demote server5 to secondary:
     umount /var/lib/mysql
     drbdadm secondary mysqldata
     Then, on server6, promote it to primary and mount:
     drbdadm primary mysqldata
     mount /dev/drbd1 /var/lib/mysql
     The database data is now available on server6, fully in sync.
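     A quick way to confirm which node currently holds the resource after such a switchover:
     drbdadm role mysqldata      # prints local/peer role, e.g. Primary/Secondary
     cat /proc/drbd              # full connection, role and disk state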


DRBD: introduction, how it works, and split-brain recovery



1. Basic introduction to DRBD

   DRBD (Distributed Replicated Block Device) is distributed block-device replication: block-level mirroring of data between two equally sized devices on different nodes. DRBD consists of a kernel module and associated scripts, and is used to build highly available clusters.

   In a high-availability (HA) solution, DRBD can take the place of a shared storage array. Because the data exists on both the local host and the remote host at the same time, when a failover is needed the remote host simply uses its own copy of the data and continues to provide service.


2. DRBD architecture and how it works

[Figure: DRBD Primary/Secondary replication architecture]

   As the figure above shows, DRBD works in a Primary/Secondary fashion, which is somewhat similar to MySQL master-slave replication. The DRBD device on the primary node is promoted to Primary and accepts the writes; when data reaches the DRBD module, one copy continues down the stack and is written to the local disk for persistence, while a copy of the same write is sent over TCP to the DRBD device on the other host (the Secondary node), which in turn writes it to its own disk. This does resemble MySQL replication via the binary log, but there are differences: a MySQL slave cannot be written to but can be read, whereas a DRBD secondary can neither be read nor mounted.

   For a given DRBD device, therefore, only the primary node may read and write at any one time; the secondary can do neither. That may feel like a waste of the standby host, and an HA architecture does trade some resources for redundancy, but you can define two DRBD resources across the same two hosts and make them primary/secondary for each other (sketched below), so that both machines do useful work at the cost of a more complex configuration. On the other hand, DRBD makes a cheap substitute for shared storage: it costs far less than a dedicated storage network, and its performance and stability are quite acceptable.
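   A minimal sketch of that dual-resource arrangement (hostnames, disks and ports here are hypothetical; each host mounts only the resource it is primary for):

       resource r0 {                       # intended to run Primary on hostA, Secondary on hostB
               device    /dev/drbd0;
               disk      /dev/sdb1;
               meta-disk internal;
               on hostA { address 192.168.1.101:7789; }
               on hostB { address 192.168.1.102:7789; }
       }
       resource r1 {                       # intended to run Primary on hostB, Secondary on hostA
               device    /dev/drbd1;
               disk      /dev/sdc1;
               meta-disk internal;
               on hostA { address 192.168.1.101:7790; }
               on hostB { address 192.168.1.102:7790; }
       }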


3. DRBD replication modes (protocols)

   Protocol A:

       Asynchronous replication. A write is considered complete as soon as the local disk write has finished and the replication packet has been placed in the send queue. If a node fails, data loss is possible, because data destined for the remote node may still be sitting in the send queue; the data on the failover node is consistent, but not up to date. This mode gives the highest throughput but is the least safe: data can be lost.

   Protocol B:

       Memory-synchronous (semi-synchronous) replication. A write on the primary is considered complete once the local disk write has finished and the replication packet has reached the peer node. Data loss can only occur if both nodes fail at the same time, because data still in flight may not yet have been committed to the peer's disk.

   Protocol C:

       Synchronous replication. A write is considered complete only after both the local and the remote disk have confirmed it. No data is lost, which is why this is the most popular mode for cluster nodes, but I/O throughput is limited by the network bandwidth. This mode is the safest and has the lowest throughput.
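   The protocol is selected with a single keyword in the configuration, for example in the common section of global_common.conf (a minimal sketch, matching the configuration used later in this article):

       common {
               protocol C;        # switch to A or B to trade safety for throughput
       }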


4. Installing and configuring DRBD

   1. Installation: sudo apt-get install drbd8-utils

   2. Prepare both nodes:

       synchronize the time on node1 and node2;

       prepare a partition of the same size on each node;

       set up key-based SSH trust so the nodes can log in to each other without a password (a sketch follows).
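       A minimal sketch of that preparation, assuming the hosts can reach each other as node1/node2 and an NTP server is available:

           # on both nodes: keep the clocks in sync
           sudo ntpdate pool.ntp.org
           # on node1: create a key pair and push it to node2 (then repeat from node2 to node1)
           ssh-keygen -t rsa
           ssh-copy-id node2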

   3. DRBD file layout

       /etc/drbd.conf                        main configuration file

       /etc/drbd.d/global_common.conf        defines the global and common sections

       /etc/drbd.d/*.res                     defines the resources

   4. DRBD configuration

       4.1  global_common.conf

           global {
               usage-count no;        # whether to take part in DRBD's usage statistics
           }

           common {
               protocol C;            # which replication protocol to use
               handlers {
                       # handler scripts; /usr/lib/drbd/ ships plenty of them, but they are not all reliable
               }
               startup {
                       # startup timeouts and similar settings
               }
               disk {
                       # common disk settings, e.g. I/O error handling, what to do when a disk fails
               }
               net {
                       # network transport, authentication and integrity algorithms
               }
               syncer {
                       rate 1000M;    # resynchronization rate
               }
           }
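       For example, the net block is where peer authentication is usually enabled (a sketch; the shared-secret string is a placeholder and simply has to match on both nodes):

               net {
                       cram-hmac-alg sha1;              # HMAC algorithm used to authenticate the peer
                       shared-secret "drbd-secret";     # identical string on both nodes
               }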

       4.2  Resource configuration (*.res)

           resource mydata {
               meta-disk internal;      # settings shared by node1/node2 can be pulled up to the resource level
               on node1 {
                   device    /dev/drbd0;
                   disk      /dev/sda6;
                   address   192.168.1.101:7789;
               }
               on node2 {
                   device    /dev/drbd0;
                   disk      /dev/sda6;
                   address   192.168.1.102:7789;
               }
           }
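       Before going any further, the configuration can be sanity-checked by letting drbdadm parse it back (a quick sketch):

           # drbdadm dump mydata        # prints the resource exactly as DRBD parsed it, or complains about syntax errors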

       5. These files must be identical on both nodes, so the configuration written above can simply be copied to the other node over ssh:

             # scp -p /etc/drbd.d/* node2:/etc/drbd.d/

       6. Start-up and test

           1) Initialize the resource; run on both Node1 and Node2:
           # sudo drbdadm create-md mydata

           2) Start the service; run on both Node1 and Node2:
           # sudo service drbd start

           3) Check the status:
           # cat /proc/drbd

           4) The output shows that both nodes are still in the Secondary role, so one of them has to be promoted. On the node that is to become Primary, run:
           # sudo drbdadm -- --overwrite-data-of-peer primary all     (only needed for the very first promotion)
           # sudo drbdadm primary --force mydata

           After this first forced promotion, later role changes can be made with the plain command:
           # /sbin/drbdadm primary mydata      (or /sbin/drbdadm primary all)

           5) Monitor the synchronization:
           # watch -n1 'cat /proc/drbd'

           6) When synchronization has finished, format the DRBD device and mount it:
           # sudo mke2fs -t ext4 /dev/drbd0
           # sudo mount /dev/drbd0 /mnt
           # ls -l /mnt

[Screenshot: listing of /mnt on the mounted DRBD device]

       Test OK.


5. Split-brain recovery

[Screenshot: both nodes stuck in the StandAlone connection state after a split brain]

   While building a Corosync + DRBD highly available MySQL cluster, I unexpectedly found that the nodes could no longer see each other: the connection state was StandAlone and the primary and secondary could not communicate, as shown above.


   Below is the manual DRBD split-brain recovery procedure (keeping node1's data as authoritative and discarding node2's divergent data):

   1) Make Node1 the primary node and mount it to verify the data; mydata is the resource name defined earlier:

       # drbdadm primary mydata
       # mount /dev/drbd0 /mydata
       # ls -lh /mydata            # inspect the files


   2) Make Node2 secondary and discard its copy of the resource data:

       # drbdadm secondary mydata
       # drbdadm -- --discard-my-data connect mydata


   3) On the primary node, Node1, manually reconnect the resource:

       # drbdadm connect mydata


   4) Finally, check the state on each node; the connection is back to normal:

       # cat /proc/drbd

       The result is shown below (fault repaired):

[Screenshot: /proc/drbd showing the connection re-established]
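   To reduce the need for this manual procedure, DRBD can also be told how to resolve a split brain on its own through net-section recovery policies (a sketch, not part of the setup above; choose policies according to how much data you are willing to discard automatically):

       net {
               after-sb-0pri discard-zero-changes;   # no primary at split time: keep the node that actually changed data
               after-sb-1pri discard-secondary;      # one primary: throw away the secondary's changes
               after-sb-2pri disconnect;             # two primaries: give up and stay disconnected
       }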


6. Other DRBD reference material (from the documentation)

   1. Meaning of the DRBD status fields

The resource-specific output from /proc/drbd contains various pieces of information about the resource:

  • cs (connection state). Status of the network connection. See the section called "Connection states" for details about the various connection states.

  • ro (roles). Roles of the nodes. The role of the local node is displayed first, followed by the role of the partner node shown after the slash. See the section called "Resource roles" for details about the possible resource roles.

  • ds (disk states). State of the hard disks. Prior to the slash the state of the local node is displayed, after the slash the state of the hard disk of the partner node is shown. See the section called "Disk states" for details about the various disk states.

  • ns (network send). Volume of net data sent to the partner via the network connection; in Kibyte.

  • nr (network receive). Volume of net data received by the partner via the network connection; in Kibyte.

  • dw (disk write). Net data written on the local hard disk; in Kibyte.

  • dr (disk read). Net data read from the local hard disk; in Kibyte.

  • al (activity log). Number of updates of the activity log area of the metadata.

  • bm (bit map). Number of updates of the bitmap area of the metadata.

  • lo (local count). Number of open requests to the local I/O sub-system issued by DRBD.

  • pe (pending). Number of requests sent to the partner, but that have not yet been answered by the latter.

  • ua (unacknowledged). Number of requests received by the partner via the network connection, but that have not yet been answered.

  • ap (application pending). Number of block I/O requests forwarded to DRBD, but not yet answered by DRBD.

  • ep (epochs). Number of epoch objects. Usually 1. Might increase under I/O load when using either the barrier or the none write-ordering method. Since 8.2.7.

  • wo (write order). Currently used write-ordering method: b (barrier), f (flush), d (drain) or n (none). Since 8.2.7.

  • oos (out of sync). Amount of storage currently out of sync; in Kibibytes. Since 8.2.6.
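For orientation, a /proc/drbd resource line showing these fields typically looks like the following (the numbers here are purely illustrative):

   1: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----
       ns:1048576 nr:0 dw:1048576 dr:2048 al:12 bm:64 lo:0 pe:0 ua:0 ap:0 ep:1 wo:f oos:0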

   2. DRBD connection states

A resource may have one of the following connection states:

  • StandAlone. No network configuration available. The resource has not yet been connected, or has been administratively disconnected (using drbdadm disconnect), or has dropped its connection due to failed authentication or split brain.

  • Disconnecting. Temporary state during disconnection. The next state is StandAlone.

  • Unconnected. Temporary state, prior to a connection attempt. Possible next states: WFConnection and WFReportParams.

  • Timeout. Temporary state following a timeout in the communication with the peer. Next state: Unconnected.

  • BrokenPipe. Temporary state after the connection to the peer was lost. Next state: Unconnected.

  • NetworkFailure. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • ProtocolError. Temporary state after the connection to the partner was lost. Next state: Unconnected.

  • TearDown. Temporary state. The peer is closing the connection. Next state: Unconnected.

  • WFConnection. This node is waiting until the peer node becomes visible on the network.

  • WFReportParams. TCP connection has been established, this node waits for the first network packet from the peer.

  • Connected. A DRBD connection has been established, data mirroring is now active. This is the normal state.

  • StartingSyncS. Full synchronization, initiated by the administrator, is just starting. The next possible states are: SyncSource or PausedSyncS.

  • StartingSyncT. Full synchronization, initiated by the administrator, is just starting. Next state: WFSyncUUID.

  • WFBitMapS. Partial synchronization is just starting. Next possible states: SyncSource or PausedSyncS.

  • WFBitMapT. Partial synchronization is just starting. Next possible state: WFSyncUUID.

  • WFSyncUUID. Synchronization is about to begin. Next possible states: SyncTarget or PausedSyncT.

  • SyncSource. Synchronization is currently running, with the local node being the source of synchronization.

  • SyncTarget. Synchronization is currently running, with the local node being the target of synchronization.

  • PausedSyncS. The local node is the source of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • PausedSyncT. The local node is the target of an ongoing synchronization, but synchronization is currently paused. This may be due to a dependency on the completion of another synchronization process, or due to synchronization having been manually interrupted by drbdadm pause-sync.

  • VerifyS. On-line device verification is currently running, with the local node being the source of verification.

  • VerifyT. On-line device verification is currently running, with the local node being the target of verification.