Overview
When setting up a Hyperledger Fabric environment we use a configtx.yaml file (see "Hyperledger Fabric 1.0 from Scratch (Part 8) — Multi-node Cluster Production Deployment"). This file is mainly used to build the genesis block (before the genesis block can be built, the full set of certificate files for all corresponding nodes must first be generated). Its Orderer section contains an OrdererType parameter, which can be set to "solo" or "kafka"; the environments described in the previous posts all used solo, i.e. single-node consensus.
This post describes the consensus option implemented with Kafka (a distributed queue).
The reason for using a Kafka cluster is simple: it gives the orderer's consensus and ordering service enough room for fault tolerance. When we submit a transaction to a peer node, the peer returns (via the SDK) a read/write set, which is then sent to the orderer nodes for consensus and ordering. If an orderer node goes down at that moment, the request fails and data may be lost; moreover, the current SDK's callback for transactions sent to the orderer can take an extremely long time, so during large bulk imports that callback is effectively unusable.
Therefore, when deploying a production environment, the orderer must be made fault tolerant. In practice that means running a cluster of orderer nodes, and that cluster depends on Kafka and ZooKeeper.
crypto-config.yaml
This file was introduced previously; it is consumed by the cryptogen tool. It describes the network topology and lets us generate a set of certificates and private keys for each organization and for the components that belong to it. Each organization is assigned a unique root certificate that binds the specific components (peers and orderers) belonging to that organization. Transactions and communication in Hyperledger Fabric are signed with a node's private key (keystore) and verified with its public certificate (signcerts).
crypto-config.yaml contains an OrdererOrgs section, in which we can set the maximum number of orderer nodes allowed in the current Fabric deployment and their names; this is configured under OrdererOrgs → Specs, as sketched below:
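The original screenshot is not reproduced here; the following is a minimal sketch of what such an OrdererOrgs section might look like for three orderers, assuming the example.com domain and the orderer0/1/2 host names used later in this post:

OrdererOrgs:
  # ---------------------------------------------------------------------------
  # Orderer
  # ---------------------------------------------------------------------------
  - Name: Orderer
    Domain: example.com
    Specs:
      - Hostname: orderer0
      - Hostname: orderer1
      - Hostname: orderer2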
In the configuration above we assume three orderer servers; adjust this number to suit your own platform.
We can then generate the required certificate files with the following command:
./bin/cryptogen generate --config=./crypto-config.yaml
The generated files are placed under the ordererOrganizations directory; browsing the orderers directory under the organization's directory shows a structure like the one sketched below:
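The screenshot of the directory tree is not reproduced here; with standard cryptogen output the layout should look roughly like this (sketched for the example.com domain assumed above):

crypto-config/ordererOrganizations/example.com/orderers/
├── orderer0.example.com
│   ├── msp
│   └── tls
├── orderer1.example.com
│   ├── msp
│   └── tls
└── orderer2.example.com
    ├── msp
    └── tls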
That is, all the orderer node certificate files we need have been generated.
configtx.yaml
This is the file mentioned at the start of this post; it is used to build the material needed for the genesis block.
Since the crypto-config settings above define and create three orderer node services, we define the Orderer section of configtx.yaml to match, as follows:
################################################################################
#
#   SECTION: Orderer
#
#   - This section defines the values to encode into a config transaction or
#   genesis block for orderer related parameters
#
################################################################################
Orderer: &OrdererExample

    # Orderer Type: The orderer implementation to start
    # Available types are "solo" and "kafka"
    OrdererType: kafka

    Addresses:
        - orderer0.example.com:7050
        - orderer1.example.com:7050
        - orderer2.example.com:7050

    # Batch Timeout: The amount of time to wait before creating a batch
    BatchTimeout: 2s

    # Batch Size: Controls the number of messages batched into a block
    BatchSize:

        # Max Message Count: The maximum number of messages to permit in a batch
        MaxMessageCount: 10

        # Absolute Max Bytes: The absolute maximum number of bytes allowed for
        # the serialized messages in a batch.
        # This sets the maximum block size. Each block has at most
        # Orderer.AbsoluteMaxBytes bytes (excluding headers). Call this value A
        # and remember it -- it affects how the Kafka brokers are configured.
        AbsoluteMaxBytes: 99 MB

        # Preferred Max Bytes: The preferred maximum number of bytes allowed for
        # the serialized messages in a batch. A message larger than the preferred
        # max bytes will result in a batch larger than preferred max bytes.
        # This sets the preferred block size. Kafka yields higher throughput for
        # relatively small messages; a block is best kept under 1 MB.
        PreferredMaxBytes: 512 KB

    Kafka:
        # Brokers: A list of Kafka brokers to which the orderer connects
        # NOTE: Use IP:port notation
        # List at least two brokers of the Kafka cluster (IP:port). The list does
        # not need to be exhaustive (these are your seed brokers); they are the
        # brokers this orderer connects to.
        Brokers:
            - x.x.x.x:9092
            - x.x.x.xx:9092

    # Organizations is the list of orgs which are defined as participants on
    # the orderer side of the network
    Organizations:
The key settings in this file are all annotated; use the comments as a guide when configuring your own production platform. Note that OrdererType must be set to kafka.
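The genesis block itself is then produced with configtxgen. A minimal sketch of the invocation, assuming a profile named ExampleOrdererGenesis (it appears commented out in the orderer compose file further below), that FABRIC_CFG_PATH points to the directory containing configtx.yaml, and the channel-artifacts output path mounted by the orderer service; adjust the profile name and paths to your own layout:

export FABRIC_CFG_PATH=$PWD
./bin/configtxgen -profile ExampleOrdererGenesis -outputBlock ./config/channel-artifacts/genesis.block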
With crypto-config and configtx in place, the next step is to write the startup yaml files for zookeeper, kafka, and the orderers.
zookeeper
The zookeeper yaml mainly defines the ports the cluster members use to talk to each other. The official demo already covers this in detail; the configuration is as follows:
# Copyright IBM Corp. All Rights Reserved.
#
# SPDX-License-Identifier: Apache-2.0
#
# How ZooKeeper basically operates:
# 1. Elect a leader.
# 2. Synchronize data.
# 3. There are many leader-election algorithms, but the election criteria are the same.
# 4. The leader must hold the highest execution ID, similar to root privileges.
# 5. A majority of machines in the cluster must respond to and follow the elected leader.
#

version: '2'

services:

  zookeeper1:
    container_name: zookeeper1
    hostname: zookeeper1
    image: hyperledger/fabric-zookeeper
    restart: always
    environment:
      # ========================================================================
      #     Reference: https://zookeeper.apache.org/doc/r3.4.9/zookeeperAdmin.html#sc_configuration
      # ========================================================================
      #
      # myid
      # The ID must be unique within the ensemble and should have a value
      # between 1 and 255.
      - ZOO_MY_ID=1
      #
      # server.x=[hostname]:nnnnn[:nnnnn]
      # The list of servers that make up the ZK ensemble. The list that is used
      # by the clients must match the list of ZooKeeper servers that each ZK
      # server has. There are two port numbers `nnnnn`. The first is what
      # followers use to connect to the leader, while the second is for leader
      # election.
      - ZOO_SERVERS=server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888
    ports:
      - "2181:2181"
      - "2888:2888"
      - "3888:3888"
    extra_hosts:
      - "zookeeper1:x.x.x.x"
      - "zookeeper2:x.x.x.xx"
      - "zookeeper3:x.x.x.xxx"
      - "kafka1:xx.x.x.x"
      - "kafka2:xx.x.x.xx"
      - "kafka3:xx.x.x.xxx"
      - "kafka4:xx.x.x.xxxx"
The ZooKeeper ensemble should consist of 3, 5, or 7 nodes: the count must be odd to avoid split-brain scenarios, and greater than 1 to avoid a single point of failure. More than 7 ZooKeeper servers is considered overkill.
The configuration above is for zookeeper1; the configurations for zookeeper2, zookeeper3, and so on are similar, except that the ID must never be the same.
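For reference, a minimal sketch of the entries that change on the second node (everything else stays identical to the zookeeper1 service above; the IP placeholders remain yours to fill in):

  zookeeper2:
    container_name: zookeeper2
    hostname: zookeeper2
    image: hyperledger/fabric-zookeeper
    restart: always
    environment:
      - ZOO_MY_ID=2    # unique ID per ensemble member
      - ZOO_SERVERS=server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888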
kafka
The Kafka configuration is closely tied to both ZooKeeper and the orderers; the details are all explained in the comments, and questions are welcome in the comment section. The yaml for kafka1 is as follows:
# Copyright IBM Corp. All Rights Reserved.
#
# SPDX-License-Identifier: Apache-2.0
#

version: '2'

services:

  kafka1:
    container_name: kafka1
    hostname: kafka1
    image: hyperledger/fabric-kafka
    restart: always
    environment:
      # ========================================================================
      #     Reference: https://kafka.apache.org/documentation/#configuration
      # ========================================================================
      #
      # broker.id
      - KAFKA_BROKER_ID=1
      #
      # min.insync.replicas
      # Let the value of this setting be M. Data is considered committed when
      # it is written to at least M replicas (which are then considered in-sync
      # and belong to the in-sync replica set, or ISR). In any other case, the
      # write operation returns an error. Then:
      # 1. If up to N-M replicas -- out of the N (see default.replication.factor
      #    below) that the channel data is written to -- become unavailable,
      #    operations proceed normally.
      # 2. If more replicas become unavailable, Kafka cannot maintain an ISR set
      #    of M, so it stops accepting writes. Reads work without issues. The
      #    channel becomes writeable again when M replicas get in-sync.
      - KAFKA_MIN_INSYNC_REPLICAS=2
      #
      # default.replication.factor
      # Let the value of this setting be N. A replication factor of N means that
      # each channel will have its data replicated to N brokers. These are the
      # candidates for the ISR set of a channel. As we noted in the
      # min.insync.replicas section above, not all of these brokers have to be
      # available all the time. In this sample configuration we choose a
      # default.replication.factor of K-1 (where K is the total number of brokers in
      # our Kafka cluster) so as to have the largest possible candidate set for
      # a channel's ISR. We explicitly avoid setting N equal to K because
      # channel creations cannot go forward if less than N brokers are up. If N
      # were set equal to K, a single broker going down would mean that we would
      # not be able to create new channels, i.e. the crash fault tolerance of
      # the ordering service would be non-existent.
      - KAFKA_DEFAULT_REPLICATION_FACTOR=3
      #
      # zookeper.connect
      # Point to the set of Zookeeper nodes comprising a ZK ensemble.
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
      #
      # zookeeper.connection.timeout.ms
      # The max time that the client waits to establish a connection to
      # Zookeeper. If not set, the value in zookeeper.session.timeout.ms (below)
      # is used.
      #- KAFKA_ZOOKEEPER_CONNECTION_TIMEOUT_MS=6000
      #
      # zookeeper.session.timeout.ms
      #- KAFKA_ZOOKEEPER_SESSION_TIMEOUT_MS=6000
      #
      # socket.request.max.bytes
      # The maximum number of bytes in a socket request. ATTN: If you set this
      # env var, make sure to update `brokerConfig.Producer.MaxMessageBytes` in
      # `newBrokerConfig()` in `fabric/orderer/kafka/config.go` accordingly.
      #- KAFKA_SOCKET_REQUEST_MAX_BYTES=104857600 # 100 * 1024 * 1024 B
      #
      # message.max.bytes
      # The maximum size of envelope that the broker can receive.
      #
      # configtx.yaml sets the maximum block size (see the AbsoluteMaxBytes
      # parameter in configtx.yaml). Each block has at most
      # Orderer.AbsoluteMaxBytes bytes (excluding headers); call that value A
      # (99 MB here). message.max.bytes and replica.fetch.max.bytes should be
      # set larger than A, leaving some buffer room for headers -- 1 MB is more
      # than enough. The settings should satisfy:
      # Orderer.AbsoluteMaxBytes < replica.fetch.max.bytes <= message.max.bytes
      # (More precisely, message.max.bytes should be strictly smaller than
      # socket.request.max.bytes, which defaults to 100 MB. If you want blocks
      # larger than 100 MB you must edit the hard-coded value of
      # brokerConfig.Producer.MaxMessageBytes in fabric/orderer/kafka/config.go
      # and rebuild the binary, which is not recommended.)
      - KAFKA_MESSAGE_MAX_BYTES=103809024 # 99 * 1024 * 1024 B
      #
      # replica.fetch.max.bytes
      # The number of bytes of messages to attempt to fetch for each channel.
      # This is not an absolute maximum, if the fetched envelope is larger than
      # this value, the envelope will still be returned to ensure that progress
      # can be made. The maximum message size accepted by the broker is defined
      # via message.max.bytes above.
      - KAFKA_REPLICA_FETCH_MAX_BYTES=103809024 # 99 * 1024 * 1024 B
      #
      # unclean.leader.election.enable
      # Data consistency is key in a blockchain environment. We cannot have a
      # leader chosen outside of the in-sync replica set, or we run the risk of
      # overwriting the offsets that the previous leader produced, and --as a
      # result-- rewriting the blockchain that the orderers produce.
      - KAFKA_UNCLEAN_LEADER_ELECTION_ENABLE=false
      #
      # log.retention.ms
      # Until the ordering service in Fabric adds support for pruning of the
      # Kafka logs, time-based retention should be disabled so as to prevent
      # segments from expiring. (Size-based retention -- see
      # log.retention.bytes -- is disabled by default so there is no need to set
      # it explicitly.)
      - KAFKA_LOG_RETENTION_MS=-1
    ports:
      - "9092:9092"
    extra_hosts:
      - "zookeeper1:x.x.x.x"
      - "zookeeper2:x.x.x.xx"
      - "zookeeper3:x.x.x.xxx"
      - "kafka1:xx.x.x.x"
      - "kafka2:xx.x.x.xx"
      - "kafka3:xx.x.x.xxx"
      - "kafka4:xx.x.x.xxxx"
Kafka needs at least 4 brokers to form the cluster; that is the minimum node count for crash fault tolerance. With 4 brokers, the cluster tolerates one broker crash: after one broker stops serving, channels can still be read and written, and new channels can still be created.
As noted in the comments, the minimum number of in-sync replicas for a write must be greater than 1, i.e. at least 2 (min.insync.replicas, M);
As noted in the comments, the default number of replicas holding channel data must be greater than the minimum in-sync replica count, i.e. at least 3 (default.replication.factor, N);
To guarantee crash fault tolerance, the Kafka cluster therefore needs at least 4 brokers (K), which allows one broker to fail. A sketch of the second broker's configuration follows.
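Each additional broker reuses the kafka1 compose file above with only a few lines changed; a minimal sketch for the second broker (the host names and IP placeholders follow the ones above and are assumptions):

  kafka2:
    container_name: kafka2
    hostname: kafka2
    image: hyperledger/fabric-kafka
    restart: always
    environment:
      - KAFKA_BROKER_ID=2                  # must be unique per broker
      - KAFKA_MIN_INSYNC_REPLICAS=2        # M: same value on every broker
      - KAFKA_DEFAULT_REPLICATION_FACTOR=3 # N = K - 1 with K = 4 brokers
      - KAFKA_ZOOKEEPER_CONNECT=zookeeper1:2181,zookeeper2:2181,zookeeper3:2181
      # ... remaining environment entries, ports and extra_hosts as in kafka1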
orderer
Compared with the solo setup, the orderer service configuration changes a little, mainly by adding a few parameters, as shown below:
# Copyright IBM Corp. All Rights Reserved.
#
# SPDX-License-Identifier: Apache-2.0
#

version: '2'

services:

  orderer0.example.com:
    container_name: orderer0.example.com
    image: hyperledger/fabric-orderer
    environment:
      - CORE_VM_DOCKER_HOSTCONFIG_NETWORKMODE=example_default
      - ORDERER_GENERAL_LOGLEVEL=error
      # - ORDERER_GENERAL_LOGLEVEL=debug
      - ORDERER_GENERAL_LISTENADDRESS=0.0.0.0
      - ORDERER_GENERAL_LISTENPORT=7050
      #- ORDERER_GENERAL_GENESISPROFILE=ExampleOrdererGenesis
      - ORDERER_GENERAL_GENESISMETHOD=file
      - ORDERER_GENERAL_GENESISFILE=/var/hyperledger/orderer/orderer.genesis.block
      - ORDERER_GENERAL_LOCALMSPID=ExampleMSP
      - ORDERER_GENERAL_LOCALMSPDIR=/var/hyperledger/orderer/msp
      #- ORDERER_GENERAL_LEDGERTYPE=ram
      #- ORDERER_GENERAL_LEDGERTYPE=file
      # enabled TLS
      - ORDERER_GENERAL_TLS_ENABLED=false
      - ORDERER_GENERAL_TLS_PRIVATEKEY=/var/hyperledger/orderer/tls/server.key
      - ORDERER_GENERAL_TLS_CERTIFICATE=/var/hyperledger/orderer/tls/server.crt
      - ORDERER_GENERAL_TLS_ROOTCAS=[/var/hyperledger/orderer/tls/ca.crt]

      - ORDERER_KAFKA_RETRY_LONGINTERVAL=10s
      - ORDERER_KAFKA_RETRY_LONGTOTAL=100s
      - ORDERER_KAFKA_RETRY_SHORTINTERVAL=1s
      - ORDERER_KAFKA_RETRY_SHORTTOTAL=30s
      - ORDERER_KAFKA_VERBOSE=true
      - ORDERER_KAFKA_BROKERS=[xx.x.x.x:9092,xx.x.x.xx:9092,xx.x.x.xxx:9092,xx.x.x.xxxx:9092]
    working_dir: /opt/gopath/src/github.com/hyperledger/fabric
    command: orderer
    volumes:
      - ../config/channel-artifacts/genesis.block:/var/hyperledger/orderer/orderer.genesis.block
      - ../config/crypto-config/ordererOrganizations/example.com/orderers/orderer0.example.com/msp:/var/hyperledger/orderer/msp
      - ../config/crypto-config/ordererOrganizations/example.com/orderers/orderer0.example.com/tls/:/var/hyperledger/orderer/tls
    networks:
      default:
        aliases:
          - example
    ports:
      - 7050:7050
    extra_hosts:
      - "kafka1:xx.x.x.x"
      - "kafka2:xx.x.x.xx"
      - "kafka3:xx.x.x.xxx"
      - "kafka4:xx.x.x.xxxx"
Startup order
With a cluster built on ZooKeeper and Kafka, the startup order follows the dependency chain: start the ZooKeeper cluster first, then the Kafka cluster, and finally the orderer cluster, for example as sketched below.
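A minimal sketch of the startup sequence, assuming each cluster is brought up with its own compose file on its own hosts; the file names here are placeholders for whatever you named the yaml files above:

# on each zookeeper host
docker-compose -f docker-compose-zookeeper.yaml up -d
# on each kafka host, once the zookeeper ensemble is up
docker-compose -f docker-compose-kafka.yaml up -d
# on each orderer host, once the kafka brokers are up
docker-compose -f docker-compose-orderer.yaml up -d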
If anything is still unclear, refer to the official demo: the dc-orderer-kafka-base.yml and dc-orderer-kafka.yml files can be found in the fabric/bddtests directory.
Copying romeovan's reply from comment #15 here so more readers can see it:
One more reminder: when deploying Hyperledger's ZooKeeper across multiple servers, you need to change a parameter in the docker-compose ZooKeeper configuration:
append quorumListenOnAllIPs=true after ZOO_SERVERS, otherwise you will very likely hit the exception: This ZooKeeper instance is not currently serving requests
Without this parameter, ZooKeeper only binds the first network interface, i.e. 127.0.0.1: it opens the listening port on that interface only and does not listen on the server's real IP.
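Applied to the zookeeper1 service above, the changed entry would look roughly like this (a sketch: the extra token is appended to the same ZOO_SERVERS variable, as the reply suggests; confirm against your fabric-zookeeper image version):

    environment:
      - ZOO_MY_ID=1
      - ZOO_SERVERS=server.1=zookeeper1:2888:3888 server.2=zookeeper2:2888:3888 server.3=zookeeper3:2888:3888 quorumListenOnAllIPs=true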
Also see Will.Hu's working TLS-enabled setup in comment #18; that commenter has provided a git address, as follows: