The RabbitMQ management UI shows the following warning:
Network partition detected
Mnesia reports that this RabbitMQ cluster has experienced a network partition. There is a risk of losing data. Please read RabbitMQ documentation about network partitions and the possible solutions.
In the partition you trust less, apply the following to that partition's nodes.
On the affected node, run: sbin/rabbitmqctl stop_app
Then, on the same node, run: sbin/rabbitmqctl start_app
Note: do not kill a cluster node's process with kill -9. Producers and consumers would not detect the broken connection in time, which disrupts their normal processing.
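As a concrete illustration, a minimal shell sketch of this manual fix, run on a node in the less trusted partition (paths are relative to the RabbitMQ installation, as in the commands above):
# On a node in the less trusted partition:
sbin/rabbitmqctl stop_app        # stops the RabbitMQ application; the Erlang VM keeps running
sbin/rabbitmqctl start_app       # starts it again; the node re-syncs its state from the trusted partition
sbin/rabbitmqctl cluster_status  # check that the partitions entry is empty again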
Clustering and Network Partitions
RabbitMQ clusters do not tolerate network partitions well. If you are thinking of clustering across a WAN, don’t. You should use federation or the shovel instead.
However, sometimes accidents happen. This page documents how to detect network partitions, some of the bad effects that may happen during partitions, and how to recover from them.
RabbitMQ stores information about queues, exchanges, bindings etc in Erlang's distributed database, Mnesia. Many of the details of what happens around network partitions are related to Mnesia's behaviour.
Clustering and Network Partitions
RabbitMQ clusters do not tolerate network partitions well. If you want to build a RabbitMQ cluster across a WAN, remember that it will not work; use plugins such as federation or the shovel instead.
However, unexpected things sometimes happen. This article covers how a RabbitMQ cluster detects network partitions, the effects a partition has, and how to recover from one.
RabbitMQ stores information about queues, exchanges, bindings and so on in Mnesia, Erlang's distributed database, and many of the details around network partitions are tied to Mnesia's behaviour.
Detecting network partitions
Mnesia will typically determine that a node is down if another node is unable to contact it for a minute or so (see the page on net_ticktime). If two nodes come back into contact, both having thought the other is down, Mnesia will determine that a partition has occurred. This will be written to the RabbitMQ log in a form like:
=ERROR REPORT==== 15-Oct-2012::18:02:30 === Mnesia(rabbit@smacmullen): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, hare@smacmullen}
RabbitMQ nodes will record whether this event has ever occurred while the node is up, and expose this information through rabbitmqctl cluster_status and the management plugin.
rabbitmqctl cluster_status will normally show an empty list for partitions:
# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[]}]
...done.
However, if a network partition has occurred then information about partitions will appear there:
# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[{rabbit@smacmullen,[hare@smacmullen]},
              {hare@smacmullen,[rabbit@smacmullen]}]}]
...done.
The management plugin API will return partition information for each node under partitions in /api/nodes. The management plugin UI will show a large red warning on the overview page if a partition has occurred.
Detecting network partitions
Mnesia will typically consider a node to be down if another node cannot contact it for about a minute (one net_ticktime interval). If the two nodes later come back into contact (translator's note: i.e. network connectivity is restored) while each still believes the other has gone down, Mnesia concludes that a network partition has occurred. This is recorded in the RabbitMQ log, as shown below:
=ERROR REPORT==== 15-Oct-2012::18:02:30 === Mnesia(rabbit@smacmullen): ** ERROR ** mnesia_event got {inconsistent_database, running_partitioned_network, hare@smacmullen}
RabbitMQ nodes record whether this event has ever occurred while the node is up, and you can see this information via rabbitmqctl cluster_status or the management plugin. Normally the partitions entry in the rabbitmqctl cluster_status output is empty, like this:
# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[]}]
...done.
However, when a network partition has occurred, it looks like this:
# rabbitmqctl cluster_status
Cluster status of node rabbit@smacmullen ...
[{nodes,[{disc,[hare@smacmullen,rabbit@smacmullen]}]},
 {running_nodes,[rabbit@smacmullen,hare@smacmullen]},
 {partitions,[{rabbit@smacmullen,[hare@smacmullen]},
              {hare@smacmullen,[rabbit@smacmullen]}]}]
...done.
The management plugin API returns partition information for each node under partitions in /api/nodes.
The management web UI shows a large red warning on the Overview page when a partition has occurred.
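The same information can be checked from the command line. A small sketch, assuming the management plugin listens on the default port 15672 with the default guest/guest credentials (adjust host, port and credentials for your environment):
# query the management API and extract each node's partitions field
curl -s -u guest:guest http://localhost:15672/api/nodes | grep -o '"partitions":[^]]*]'
# a healthy cluster prints "partitions":[] for every node; after a partition the
# node names of the other side appear in the list instead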
During a network partition
While a network partition is in place, the two (or more!) sides of the cluster can evolve independently, with both sides thinking the other has crashed. Queues, bindings and exchanges can be created or deleted separately. Mirrored queues which are split across the partition will end up with one master on each side of the partition, again with both sides acting independently. Other undefined and weird behaviour may occur.
It is important to understand that when network connectivity is restored, this state of affairs persists. The cluster will continue to act in this way until you take action to fix it.
During a network partition
When a network partition occurs, the cluster splits into two (or more) parts that each operate independently, with each side believing the nodes in the other partition(s) have crashed. Queues, bindings and exchanges can be created and deleted separately within each partition. If a mirrored queue spans nodes on both sides of the partition, each side ends up with its own master for that queue (translator's note: with newer RabbitMQ versions and enough nodes in a partition, new slaves can also appear), and the queue acts independently in each partition. Other undefined and strange behaviour may also occur.
It is important to understand that when network connectivity is restored, this state of affairs persists; the cluster will keep behaving this way until you take action to fix it.
Partitions caused by suspend / resume
While we refer to 「network」 partitions, really a partition is any case in which the different nodes of a cluster can have communication interrupted without any node failing. In addition to network failures, suspending and resuming an entire OS can also cause partitions when used against running cluster nodes - as the suspended node will not consider itself to have failed, or even stopped, but the other nodes in the cluster will consider it to have done so.
While you could suspend a cluster node by running it on a laptop and closing the lid, the most common reason for this to happen is for a virtual machine to have been suspended by the hypervisor. While it’s fine to run RabbitMQ clusters in virtualised environments, you should make sure that VMs are not suspended while running. Note that some virtualisation features such as migration of a VM from one host to another will tend to involve the VM being suspended.
Partitions caused by suspend and resume will tend to be asymmetrical - the suspended node will not necessarily see the other nodes as having gone down, but will be seen as down by the rest of the cluster. This has particular implications for pause_minority mode.
Partitions caused by suspend and resume
Although we speak of "network" partitions, a partition is really any situation in which communication between different nodes of the cluster is interrupted without any node actually failing. Besides network failures, suspending and resuming the operating system that hosts a running cluster node can also cause a partition, because the suspended node does not consider itself to have failed or even stopped, while the other nodes in the cluster do.
For example, if a cluster node runs on a laptop and you close the lid, the node is suspended; more commonly, the node runs in a virtual machine and the hypervisor suspends that VM.
Partitions caused by suspend and resume tend to be asymmetric: the suspended node will not necessarily see the other nodes as having gone down, but it will be seen as down by the rest of the cluster. This has particular implications for pause_minority mode (covered below).
Recovering from a network partition
To recover from a network partition, first choose one partition which you trust the most. This partition will become the authority for the state of Mnesia to use; any changes which have occurred on other partitions will be lost.
Stop all nodes in the other partitions, then start them all up again. When they rejoin the cluster they will restore state from the trusted partition.
Finally, you should also restart all the nodes in the trusted partition to clear the warning.
It may be simpler to stop the whole cluster and start it again; if so make sure that the first node you start is from the trusted partition.
Recovering from a network partition
To recover from a network partition, first pick the partition you trust the most. That partition becomes the authority for the state of Mnesia; any changes that happened in the other partitions are discarded and lost.
Stop all the nodes in the other partitions, then start them again; they will rejoin the cluster and restore their state from the trusted partition.
Finally, restart all the nodes in the trusted partition as well, to clear the partition warning.
It may be simpler to stop the whole cluster and then start every node again; if you do, make sure the first node you start is from the trusted partition.
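A sketch of this recovery sequence, assuming you have already decided which partition to trust (which nodes belong to which partition depends on your particular split):
# 1. On every node in the untrusted partition(s):
rabbitmqctl stop_app
rabbitmqctl start_app      # the node rejoins the cluster and restores state from the trusted partition

# 2. On every node in the trusted partition, restart the application to clear the warning:
rabbitmqctl stop_app
rabbitmqctl start_app

# 3. On any node, verify that the partitions list is empty again:
rabbitmqctl cluster_status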
Automatically handling partitions
RabbitMQ also offers three ways to deal with network partitions automatically: pause-minority mode, pause-if-all-down mode and autoheal mode. (The default behaviour is referred to as ignore mode).
In pause-minority mode RabbitMQ will automatically pause cluster nodes which determine themselves to be in a minority (i.e. half or fewer of the total number of nodes) after seeing other nodes go down. It therefore chooses partition tolerance over availability from the CAP theorem. This ensures that in the event of a network partition, at most the nodes in a single partition will continue to run. The minority nodes will pause as soon as a partition starts, and will start again when the partition ends.
In pause-if-all-down mode, RabbitMQ will automatically pause cluster nodes which cannot reach any of the listed nodes. In other words, all the listed nodes must be down for RabbitMQ to pause a cluster node. This is close to pause-minority mode; however, it allows an administrator to decide which nodes to prefer, instead of relying on the context. For instance, if the cluster is made of two nodes in rack A and two nodes in rack B, and the link between racks is lost, pause-minority mode will pause all nodes. In pause-if-all-down mode, if the administrator listed the two nodes in rack A, only nodes in rack B will pause. Note that it is possible the listed nodes get split across both sides of a partition: in this situation, no node will pause. That is why there is an additional ignore/autoheal argument to indicate how to recover from the partition.
In autoheal mode RabbitMQ will automatically decide on a winning partition if a partition is deemed to have occurred, and will restart all nodes that are not in the winning partition. Unlike pause_minority mode it therefore takes effect when a partition ends, rather than when one starts.
The winning partition is the one which has the most clients connected (or if this produces a draw, the one with the most nodes; and if that still produces a draw then one of the partitions is chosen in an unspecified way).
You can enable any of these modes by setting the configuration parameter cluster_partition_handling for the rabbit application in your configuration file to:
● pause_minority
● {pause_if_all_down, [nodes], ignore | autoheal}
● autoheal
Automatically handling partitions
RabbitMQ offers three ways to handle network partitions automatically: pause-minority mode, pause-if-all-down mode and autoheal mode. (The default behaviour is ignore mode.)
In pause-minority mode, as the name suggests, when a network partition occurs the cluster nodes that observe other nodes going down automatically check whether they themselves are in a minority (half or fewer of the total number of nodes), and if so RabbitMQ pauses them. In terms of the CAP theorem, this favours partition tolerance (P) over availability. It ensures that when a partition happens, only the nodes of the majority partition (all in the same partition, of course) keep running; the minority nodes pause as soon as the partition starts and start again when it ends.
In pause-if-all-down mode, RabbitMQ pauses a cluster node only when that node cannot reach any of the listed nodes ({pause_if_all_down, [nodes], ignore | autoheal}, where the list is [nodes]). In other words, all the listed nodes must be unreachable before a node pauses itself. This is similar to pause-minority mode, but it lets the administrator pick which nodes to prefer rather than relying on the context. For example, in a cluster of four nodes, two in rack A and two in rack B, if the link between the racks is lost, pause-minority mode pauses all the nodes.
In autoheal mode, when a partition is deemed to have occurred, RabbitMQ automatically decides on a winning partition and restarts the nodes that are not in it.
The winning partition is the one with the most client connections. (If this produces a tie, the partition with the most nodes wins; if that is still a tie, one of the partitions is chosen in an unspecified way.)
You can make any of these modes take effect by setting the cluster_partition_handling parameter in the RabbitMQ configuration file to pause_minority, {pause_if_all_down, [nodes], ignore | autoheal}, or autoheal.
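For installations that use the classic Erlang-term configuration file (rabbitmq.config), a sketch of such a setting might look like the following; the node names in the commented pause_if_all_down line are placeholders:
%% rabbitmq.config -- choose exactly one value for cluster_partition_handling
[
  {rabbit, [
    {cluster_partition_handling, pause_minority}
    %% or: {cluster_partition_handling,
    %%        {pause_if_all_down, ['rabbit@nodeA1', 'rabbit@nodeA2'], autoheal}}
    %% or: {cluster_partition_handling, autoheal}
    %% omitting the setting leaves the default behaviour, ignore
  ]}
].
It generally makes sense to use the same value on every node in the cluster.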
Which mode should I pick?
It’s important to understand that allowing RabbitMQ to deal with network partitions automatically does not make them less of a problem. Network partitions will always cause problems for RabbitMQ clusters; you just get some degree of choice over what kind of problems you get. As stated in the introduction, if you want to connect RabbitMQ clusters over generally unreliable links, you should use federation or the shovel.
With that said, you might wish to pick a recovery mode as follows:
● ignore - Your network really is reliable. All your nodes are in a rack, connected with a switch, and that switch is also the route to the outside world. You don’t want to run any risk of any of your cluster shutting down if any other part of it fails (or you have a two node cluster).
● pause_minority - Your network is maybe less reliable. You have clustered across 3 AZs in EC2, and you assume that only one AZ will fail at once. In that scenario you want the remaining two AZs to continue working and the nodes from the failed AZ to rejoin automatically and without fuss when the AZ comes back.
● autoheal - Your network may not be reliable. You are more concerned with continuity of service than with data integrity. You may have a two node cluster.
Which mode should I pick?
It is important to understand that letting RabbitMQ handle network partitions automatically does not necessarily help and may even bring more problems. Network partitions will always cause problems for RabbitMQ clusters; you only get some choice over which kind of problems you get. As stated at the start of this article, if you are running RabbitMQ clusters over a generally unreliable network, you should use the federation or shovel plugins instead.
With that in mind, pick a recovery mode as described above: ignore if your network really is reliable, pause_minority if it is less reliable and only a minority of nodes is likely to fail at once, and autoheal if continuity of service matters more to you than data integrity.
More about pause-minority mode
The Erlang VM on the paused nodes will continue running but the nodes will not listen on any ports or do any other work. They will check once per second to see if the rest of the cluster has reappeared, and start up again if it has.
Note that nodes will not enter the paused state at startup, even if they are in a minority then. It is expected that any such minority at startup is due to the rest of the cluster not having been started yet.
Also note that RabbitMQ will pause nodes which are not in a strict majority of the cluster - i.e. containing more than half of all nodes. It is therefore not a good idea to enable pause-minority mode on a cluster of two nodes since in the event of any network partition or node failure, both nodes will pause. However, pause_minority mode is likely to be safer than ignore mode for clusters of more than two nodes, especially if the most likely form of network partition is that a single minority of nodes drops off the network.
Finally, note that pause_minority mode will do nothing to defend against partitions caused by cluster nodes being suspended. This is because the suspended node will never see the rest of the cluster vanish, so will have no trigger to disconnect itself from the cluster.
More about pause-minority mode
On a paused node the Erlang VM keeps running normally, but the node does not listen on any ports or do any other work. Once per second it checks whether the rest of the cluster has reappeared, and starts itself up again if it has.
Note that nodes do not enter this paused state at startup, even if they are in a minority at that point; a minority at startup is assumed to mean simply that the rest of the cluster has not been started yet.
Also note that RabbitMQ pauses any node that is not in a strict majority of the cluster, i.e. in a partition containing more than half of all nodes. It is therefore not a good idea to enable pause-minority mode on a cluster of only two nodes, because any network partition or node failure will cause both nodes to pause. However, when the cluster has more than two nodes, pause_minority mode is likely to be safer than ignore mode, especially when the most common form of partition is a single minority of nodes dropping off the network.
Finally, note that pause_minority mode does nothing to protect against partitions caused by a cluster node being suspended: the suspended node never sees the rest of the cluster vanish, so nothing triggers it to disconnect itself from the cluster.