1. Error 1
————————————————
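Copyright notice: This is an original article by CSDN blogger "AllInCode", licensed under CC 4.0 BY-SA. When reposting, please include a link to the original source and this notice.
Original link: http://www.javashuo.com/article/p-glkpylpg-mh.html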
1.1 Error description
The logs of the ZooKeeper servers (both FOLLOWER and LEADER) show the following error:
2016-05-14 15:33:01,818 [myid:2] - ERROR [CommitProcessor:2:NIOServerCnxn@178] - Unexpected Exception:
java.nio.channels.CancelledKeyException
at sun.nio.ch.SelectionKeyImpl.ensureValid(SelectionKeyImpl.java:55)
at sun.nio.ch.SelectionKeyImpl.interestOps(SelectionKeyImpl.java:59)
at org.apache.zookeeper.server.NIOServerCnxn.sendBuffer(NIOServerCnxn.java:151)
at org.apache.zookeeper.server.NIOServerCnxn.sendResponse(NIOServerCnxn.java:1081)
at org.apache.zookeeper.server.FinalRequestProcessor.processRequest(FinalRequestProcessor.java:170)
at org.apache.zookeeper.server.quorum.CommitProcessor.run(CommitProcessor.java:74)
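1.2 Cause analysis
When the ZooKeeper server sends its reply, the socket connection has already been closed.
1.3 Solution
Add an "sk.isValid()" check before the ZooKeeper server sends a reply. This is actually a bug, and it was fixed in ZooKeeper 3.4.8.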
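A minimal sketch of the idea behind the fix, written against plain java.nio rather than ZooKeeper's own NIOServerCnxn code. The class ValidKeyGuard and the helper safeEnableWrite are hypothetical names; the Pipe and the explicit cancel() only stand in for a client connection that has already been closed. Checking isValid() first handles the common case, and the catch still covers the race where another thread cancels the key between the check and interestOps().

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.CancelledKeyException;
import java.nio.channels.Pipe;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;

public class ValidKeyGuard {

    /**
     * Hypothetical helper: register interest in OP_WRITE only if the key is
     * still valid. Returns false when the connection is already gone, so the
     * caller can drop the reply instead of throwing CancelledKeyException.
     */
    static boolean safeEnableWrite(SelectionKey sk) {
        if (sk == null || !sk.isValid()) {
            return false; // channel closed or key cancelled: nobody to reply to
        }
        try {
            sk.interestOps(sk.interestOps() | SelectionKey.OP_WRITE);
            return true;
        } catch (CancelledKeyException e) {
            // Another thread may cancel the key between isValid() and interestOps().
            return false;
        }
    }

    /** Returns {result on a live key, result after the key is cancelled}. */
    static boolean[] demo() {
        try (Selector selector = Selector.open()) {
            Pipe pipe = Pipe.open();
            try (Pipe.SinkChannel sink = pipe.sink(); Pipe.SourceChannel source = pipe.source()) {
                sink.configureBlocking(false); // required before register()
                SelectionKey sk = sink.register(selector, 0);
                boolean live = safeEnableWrite(sk);
                sk.cancel(); // stands in for the client connection being closed
                boolean cancelled = safeEnableWrite(sk);
                return new boolean[] {live, cancelled};
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("live key: " + r[0] + ", cancelled key: " + r[1]);
        // prints "live key: true, cancelled key: false"
    }
}
```

Returning a boolean instead of throwing lets the reply path simply drop a response for a client that has already disconnected, which is the situation the stack trace above shows.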
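1.4 Other notes
This error already existed before the "get MQ addresses via ZooKeeper" scheme went live.
2. Error 2
2.1 Error description
The log of a ZooKeeper server acting as FOLLOWER shows the following errors. After they occur, that FOLLOWER stops working for a period of time: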
2016-05-15 04:04:40,569 [myid:1] - WARN [SyncThread:1:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:1 took 2243ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-05-14 15:32:50,764 [myid:1] - WARN [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@89] - Exception when following the leader
java.io.EOFException
at java.io.DataInputStream.readInt(DataInputStream.java:375)
at org.apache.jute.BinaryInputArchive.readInt(BinaryInputArchive.java:63)
at org.apache.zookeeper.server.quorum.QuorumPacket.deserialize(QuorumPacket.java:83)
at org.apache.jute.BinaryInputArchive.readRecord(BinaryInputArchive.java:103)
at org.apache.zookeeper.server.quorum.Learner.readPacket(Learner.java:153)
at org.apache.zookeeper.server.quorum.Follower.followLeader(Follower.java:85)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:786)
2016-05-14 15:32:50,764 [myid:1] - INFO [QuorumPeer[myid=1]/0:0:0:0:0:0:0:0:2181:Follower@166] - shutdown called
java.lang.Exception: shutdown Follower
at org.apache.zookeeper.server.quorum.Follower.shutdown(Follower.java:166)
at org.apache.zookeeper.server.quorum.QuorumPeer.run(QuorumPeer.java:790)
The log of the corresponding ZooKeeper server acting as LEADER shows the following errors:
2016-05-14 15:32:42,605 [myid:3] - WARN [SyncThread:3:FileTxnLog@334] - fsync-ing the write ahead log in SyncThread:3 took 3041ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
2016-05-14 15:32:50,764 [myid:3] - WARN [QuorumPeer[myid=3]/0:0:0:0:0:0:0:0:2181:LearnerHandler@687] - Closing connection to peer due to transaction timeout.
2016-05-14 15:32:50,764 [myid:3] - WARN [LearnerHandler-/10.110.20.23:39390:LearnerHandler@646] - *** GOODBYE /10.110.20.23:39390 ****
2016-05-14 15:32:50,764 [myid:3] - WARN [LearnerHandler-/10.110.20.23:39390:LearnerHandler@658] - Ignoring unexpected exception
java.lang.InterruptedException
at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1199)
at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:312)
at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:294)
at org.apache.zookeeper.server.quorum.LearnerHandler.shutdown(LearnerHandler.java:656)
at org.apache.zookeeper.server.quorum.LearnerHandler.run(LearnerHandler.java:649)
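2.2 Cause analysis
While the FOLLOWER was syncing with the LEADER, the fsync of the write-ahead log took too long (2243 ms and 3041 ms in the logs above), so the LEADER timed out the connection ("transaction timeout").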
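The simplified model below illustrates why a slow fsync on the FOLLOWER leads the LEADER to drop it. It is a hypothetical class (SyncLimitModel), not ZooKeeper's LearnerHandler code, and it assumes the rule works roughly like this: the FOLLOWER acknowledges a proposal only after persisting it, so a slow fsync delays the ACK, and the LEADER treats the FOLLOWER as timed out once its oldest unacknowledged proposal is older than syncLimit × tickTime.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/**
 * Simplified, hypothetical model of the LEADER's view of one FOLLOWER.
 * Not ZooKeeper's LearnerHandler code; it only illustrates the timing rule.
 */
public class SyncLimitModel {
    private final long tickTimeMs;
    private final int syncLimit;
    // Send times (ms) of proposals the FOLLOWER has not ACKed yet, oldest first.
    private final Deque<Long> pending = new ArrayDeque<>();

    SyncLimitModel(long tickTimeMs, int syncLimit) {
        this.tickTimeMs = tickTimeMs;
        this.syncLimit = syncLimit;
    }

    void proposalSent(long nowMs) {
        pending.addLast(nowMs);
    }

    // Assumed: the FOLLOWER ACKs only after its fsync of the proposal completes,
    // so ACKs arrive in order and a slow fsync delays this call.
    void ackReceived() {
        pending.pollFirst();
    }

    /** False once the oldest unacknowledged proposal is older than syncLimit ticks. */
    boolean withinSyncLimit(long nowMs) {
        Long oldest = pending.peekFirst();
        return oldest == null || nowMs - oldest < tickTimeMs * syncLimit;
    }

    public static void main(String[] args) {
        SyncLimitModel follower = new SyncLimitModel(2000, 5); // 10000 ms window
        follower.proposalSent(0);
        System.out.println(follower.withinSyncLimit(9_999));  // true
        System.out.println(follower.withinSyncLimit(10_000)); // false: the LEADER would drop the FOLLOWER
        follower.ackReceived();
        System.out.println(follower.withinSyncLimit(10_000)); // true
    }
}
```

In this model the only knobs are tickTime and syncLimit, which is why the solution below is to raise one or both of them.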
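2.3 Solution
Increase tickTime, or initLimit and syncLimit, or both.
2.4 Other notes
This error also existed before the "get MQ addresses via ZooKeeper" scheme went live, but much less frequently. After the scheme went live, the volume of data synchronized between the ZooKeeper servers grew and their load increased, which eventually made the error occur far more often.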
————————————————
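Some people online suggested increasing the time unit in the ZooKeeper configuration to lengthen the connection timeouts, so that the sync delay would stay below the session timeout. So I tried changing the configuration to:
tickTime=4000
initLimit=20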
syncLimit=10
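tickTime is ZooKeeper's basic time unit, and the other time settings are expressed as multiples of it; here it is 4 s. The original configuration was: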
tickTime=2000
initLimit=10
syncLimit=5
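I doubled all of them. This way, as long as ZooKeeper's forceSync does not take extremely long, the server can still reply before the session expires, and the connection can just about be maintained. In practice, however, warnings about excessive sync latency kept appearing: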
fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
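I checked the Storm and Kafka logs and still frequently found "disconnected" and "session timeout" messages. Although the services basically stayed up, this showed that the problem had not been solved.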
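To see what the tickTime change above actually does in milliseconds, here is a small arithmetic sketch. initLimit and syncLimit are counted in ticks, so the windows are tickTime × initLimit and tickTime × syncLimit. The session bounds assume ZooKeeper's documented defaults when minSessionTimeout and maxSessionTimeout are not set explicitly (2 × tickTime and 20 × tickTime). The class ZkTimeouts and its methods are hypothetical names used only for illustration.

```java
/**
 * Hypothetical helper that turns zoo.cfg-style settings into milliseconds.
 * initLimit and syncLimit are counted in ticks of tickTime.
 */
public class ZkTimeouts {

    static long initWindowMs(long tickTime, int initLimit) {
        return tickTime * initLimit; // time a FOLLOWER has to connect and sync to the LEADER
    }

    static long syncWindowMs(long tickTime, int syncLimit) {
        return tickTime * syncLimit; // how far a FOLLOWER may fall behind before being dropped
    }

    // Assumed defaults when minSessionTimeout / maxSessionTimeout are not set.
    static long minSessionMs(long tickTime) { return 2 * tickTime; }
    static long maxSessionMs(long tickTime) { return 20 * tickTime; }

    /** The server clamps a client's requested session timeout into [min, max]. */
    static long negotiatedSessionMs(long requestedMs, long tickTime) {
        return Math.max(minSessionMs(tickTime), Math.min(requestedMs, maxSessionMs(tickTime)));
    }

    static void print(String label, long tickTime, int initLimit, int syncLimit) {
        System.out.printf("%s: init=%d ms, sync=%d ms, session in [%d, %d] ms%n",
                label, initWindowMs(tickTime, initLimit), syncWindowMs(tickTime, syncLimit),
                minSessionMs(tickTime), maxSessionMs(tickTime));
    }

    public static void main(String[] args) {
        print("original", 2000, 10, 5); // original: init=20000 ms, sync=10000 ms, session in [4000, 40000] ms
        print("new", 4000, 20, 10);     // new: init=80000 ms, sync=40000 ms, session in [8000, 80000] ms
    }
}
```

Note that doubling tickTime and also doubling initLimit and syncLimit multiplies the init and sync windows by four, while the session bounds only double because they depend on tickTime alone. None of this makes the disk faster; it only gives a slow fsync more time before something times out.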
In the end, out of options, I followed another user's suggestion and added one setting to the zoo.cfg configuration file:
forceSync=no
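This did solve the problem: the excessive sync latency no longer appeared, and the earlier WARN messages were gone from the log.
Of course, the meaning of this setting already tells us that it is not a perfect solution, because it changes forceSync from its default of yes to no. That does get rid of the sync latency, but only because the forced sync is no longer performed at all!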
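We can understand it this way: ZooKeeper's forceSync defaults to yes, which means that every time ZooKeeper receives some data, it immediately syncs the current state to the log file on disk and only replies after the sync has completed. Normally this is not a problem, but in my test environment, for some reason I do not know, writing the log to disk was very slow, so during this period the ZooKeeper log showed:
fsync-ing the write ahead log in SyncThread:0 took 8001ms which will adversely effect operation latency. See the ZooKeeper troubleshooting guide
Because syncing the log took so long, the connection got no reply. Once the connection's timeout was exceeded, the client (Kafka, for example) considered the connection dead and tried to re-establish it. As a result, Kafka and Storm kept reporting errors and reconnecting, and occasionally crashed.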
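To make the forceSync trade-off concrete, here is a hypothetical sketch of a write-ahead append (WalAppendSketch is an illustrative name, not ZooKeeper's FileTxnLog). With forceSync on, the record is forced to the storage device with FileChannel.force(false) before the caller may acknowledge it, and the time spent is compared against 1000 ms, assumed here to match the documented default of ZooKeeper's fsync.warningthresholdms. With forceSync off, force() is skipped, which is why the warning disappears and also why the most recent writes can be lost if the machine crashes before the OS flushes its cache.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class WalAppendSketch {
    // Assumed to mirror the documented default of fsync.warningthresholdms.
    static final long WARN_THRESHOLD_MS = 1000;

    /** Appends one record; returns ms spent in force(), or -1 when forceSync is off. */
    static long append(FileChannel log, String record, boolean forceSync) throws IOException {
        ByteBuffer buf = ByteBuffer.wrap(record.getBytes(StandardCharsets.UTF_8));
        while (buf.hasRemaining()) {
            log.write(buf);
        }
        if (!forceSync) {
            return -1; // data may still sit in the OS page cache; a crash can lose it
        }
        long start = System.nanoTime();
        log.force(false); // block until the file content reaches the storage device
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        if (elapsedMs > WARN_THRESHOLD_MS) {
            System.err.println("fsync took " + elapsedMs + "ms, above " + WARN_THRESHOLD_MS + "ms");
        }
        return elapsedMs; // only after this point would the server acknowledge the write
    }

    /** Returns {forced ms, unforced result, file size in bytes}. */
    static long[] demo() {
        try {
            Path file = Files.createTempFile("wal-sketch", ".log");
            try (FileChannel log = FileChannel.open(file, StandardOpenOption.WRITE, StandardOpenOption.APPEND)) {
                long forced = append(log, "txn-1\n", true);
                long unforced = append(log, "txn-2\n", false);
                return new long[] {forced, unforced, log.size()};
            } finally {
                Files.deleteIfExists(file); // the channel is already closed here
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        long[] r = demo();
        System.out.println("forceSync=yes: " + r[0] + " ms; forceSync=no: " + r[1] + "; size=" + r[2] + " bytes");
    }
}
```

On a healthy disk the forced append usually takes a few milliseconds; on a disk as slow as the one in this test environment the same call is where the 8001 ms would be spent.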