Please credit @ni掌櫃 (nileader@gmail.com) when reposting.
Problem description
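Our company has run several data-center disaster-recovery drills. A typical scenario simulates the loss of an entire data center: we cut that data center's external network, so traffic inside it keeps working but it can no longer reach the outside. During these drills we found a large number of log lines like the following in our ZK client applications: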
An exception was thrown while closing send thread for session 0x for server null, unexpected error, closing socket connection and attempting reconnect
In this log, the server address is printed as null. We expected that position to show the IP of a ZK server deployed in the isolated data center, so seeing null instead was very confusing.
A detailed description is also available here: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
Locating the problem
Reading the ZooKeeper code for 3.4.3 and earlier, I found where the problem lies. The logging logic is here:
```java
} catch (Throwable e) {
    if (closing) {
        if (LOG.isDebugEnabled()) {
            // closing so this is expected
            LOG.debug("An exception was thrown while closing send thread for session 0x"
                    + Long.toHexString(getSessionId())
                    + " : " + e.getMessage());
        }
        break;
    } else {
        // this is ugly, you have a better way speak up
        if (e instanceof SessionExpiredException) {
            LOG.info(e.getMessage() + ", closing socket connection");
        } else if (e instanceof SessionTimeoutException) {
            LOG.info(e.getMessage() + RETRY_CONN_MSG);
        } else if (e instanceof EndOfStreamException) {
            LOG.info(e.getMessage() + RETRY_CONN_MSG);
        } else if (e instanceof RWServerFoundException) {
            LOG.info(e.getMessage());
        } else {
            // unexpected error: the server address comes from the live socket
            LOG.warn(
                    "Session 0x"
                    + Long.toHexString(getSessionId())
                    + " for server "
                    + clientCnxnSocket.getRemoteSocketAddress()
                    + ", unexpected error"
                    + RETRY_CONN_MSG, e);
        }
        // ... (remaining handling omitted in this excerpt)
    }
}
```
As you can see, the log obtains the address of the currently connected server through clientCnxnSocket.getRemoteSocketAddress(). Here is that method, from ClientCnxnSocketNIO:
```java
/**
 * Returns the address to which the socket is connected.
 * @return ip address of the remote side of the connection or null if not connected
 */
@Override
SocketAddress getRemoteSocketAddress() {
    // a lot could go wrong here, so rather than put in a bunch of code
    // to check for nulls all down the chain let's do it the simple
    // yet bulletproof way
    try {
        return ((SocketChannel) sockKey.channel()).socket()
                .getRemoteSocketAddress();
    } catch (NullPointerException e) {
        return null;
    }
}
```
It delegates to java.net.Socket.getRemoteSocketAddress(), which the JDK implements as follows:

```java
/**
 * Returns the address of the endpoint this socket is connected to, or
 * <code>null</code> if it is unconnected.
 * @return a <code>SocketAddress</code> representing the remote endpoint of this
 *         socket, or <code>null</code> if it is not connected yet.
 * @see #getInetAddress()
 * @see #getPort()
 * @see #connect(SocketAddress, int)
 * @see #connect(SocketAddress)
 * @since 1.4
 */
public SocketAddress getRemoteSocketAddress() {
    if (!isConnected())
        return null;
    return new InetSocketAddress(getInetAddress(), getPort());
}
```
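With that, the problem is essentially located: if the server side closes the socket connection abnormally (for example, when the data center's network is cut during a drill), getRemoteSocketAddress returns null, which is why null appears in the log. More precisely, the two methods above return null in two cases: when the underlying socket is not connected, or when any link in the sockKey.channel().socket() chain is null and the resulting NullPointerException is swallowed.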
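Both null paths can be reproduced outside ZooKeeper. The sketch below is a hypothetical stand-alone class (RemoteAddrDemo is not part of ZooKeeper): it asks an unconnected java.net.Socket and an unconnected SocketChannel for their remote address, and it copies the "catch NullPointerException, return null" pattern from ClientCnxnSocketNIO, assuming sockKey is null once the connection has been torn down. A connect attempt to a server in an isolated data center that never completes would plausibly leave the socket in the same unconnected state.

```java
import java.io.IOException;
import java.net.Socket;
import java.net.SocketAddress;
import java.nio.channels.SelectionKey;
import java.nio.channels.SocketChannel;

// Hypothetical demo class; it only mirrors the logic quoted above.
class RemoteAddrDemo {

    // Same "bulletproof" pattern as ClientCnxnSocketNIO.getRemoteSocketAddress():
    // a null key (or channel) triggers an NPE that is turned into null.
    static SocketAddress remoteOf(SelectionKey sockKey) {
        try {
            return ((SocketChannel) sockKey.channel()).socket().getRemoteSocketAddress();
        } catch (NullPointerException e) {
            return null;
        }
    }

    // A plain Socket that was created but never connected: isConnected() is false.
    static SocketAddress unconnectedSocketRemote() throws IOException {
        try (Socket s = new Socket()) {
            return s.getRemoteSocketAddress();
        }
    }

    // An NIO channel that was opened but never connected, as seen via its socket adaptor.
    static SocketAddress unconnectedChannelRemote() throws IOException {
        try (SocketChannel ch = SocketChannel.open()) {
            return ch.socket().getRemoteSocketAddress();
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("unconnected Socket:  " + unconnectedSocketRemote());
        System.out.println("unconnected channel: " + unconnectedChannelRemote());
        System.out.println("null selection key:  " + remoteOf(null));
    }
}
```

All three lines print null, which is exactly what ends up in the "for server null" log message.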
Solution
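This log line matters a great deal to developers: when troubleshooting, it tells you exactly which server had the problem, but once it prints null you have nothing to go on. I made a few changes so that, when an error occurs, the client logs the IP of the server it was using at the time. The patch can be downloaded here: https://github.com/downloads/nileader/taokeeper/getCurrentZooKeeperAddr_for_3.4.3.patch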
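First, add two methods to the org.apache.zookeeper.client.HostProvider interface: one returns the index, within the address list, of the address currently in use; the other returns the full address list. For how the ZooKeeper client builds and shuffles its address list, see the article 《ZooKeeper客戶端地址列表的隨機原理》 ("How the ZooKeeper client randomizes its address list").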
```java
import java.net.InetSocketAddress;
import java.util.List;

public interface HostProvider {
    // …… …… (existing methods omitted)

    /**
     * Get the index of the server that is currently being connected to, or is connected.
     * @see ZOOKEEPER-1480: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public int getCurrentIndex();

    /**
     * Get all server addresses configured for this ZooKeeper client.
     * @return the full list of configured server addresses
     * @see ZOOKEEPER-1480: https://issues.apache.org/jira/browse/ZOOKEEPER-1480
     */
    public List<InetSocketAddress> getAllServerAddress();
}
```
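The post does not show how a provider implements the two new methods. The sketch below is a hypothetical, simplified provider (RoundRobinHostProvider is an invented name, not the patch's code and not ZooKeeper's StaticHostProvider, which also shuffles the list and can pause between rounds). It only illustrates the idea: remember the index of the server most recently handed out by next(), so the address can later be looked up from the configured list even when the socket no longer knows it.

```java
import java.net.InetSocketAddress;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;

// Hypothetical, simplified provider for illustration only (no shuffling, no back-off).
class RoundRobinHostProvider {
    private final List<InetSocketAddress> serverAddresses;
    // -1 means next() has not been called yet, so no server is selected.
    private int currentIndex = -1;

    RoundRobinHostProvider(List<InetSocketAddress> addresses) {
        if (addresses.isEmpty()) {
            throw new IllegalArgumentException("at least one server address is required");
        }
        this.serverAddresses = Collections.unmodifiableList(new ArrayList<>(addresses));
    }

    /** Advances to the next server, wrapping around at the end, and returns it. */
    InetSocketAddress next() {
        currentIndex = (currentIndex + 1) % serverAddresses.size();
        return serverAddresses.get(currentIndex);
    }

    /** Index of the server most recently returned by next(), or -1. */
    int getCurrentIndex() {
        return currentIndex;
    }

    /** All configured server addresses, in the order next() walks them. */
    List<InetSocketAddress> getAllServerAddress() {
        return serverAddresses;
    }

    public static void main(String[] args) {
        RoundRobinHostProvider p = new RoundRobinHostProvider(Arrays.asList(
                InetSocketAddress.createUnresolved("zk1", 2181),
                InetSocketAddress.createUnresolved("zk2", 2181),
                InetSocketAddress.createUnresolved("zk3", 2181)));
        p.next();
        p.next();
        // The same lookup getCurrentZooKeeperAddr() performs in ClientCnxn.
        InetSocketAddress current = p.getAllServerAddress().get(p.getCurrentIndex());
        System.out.println(current.getHostString() + ":" + current.getPort()); // prints zk2:2181
    }
}
```

Unresolved addresses are used in the usage example only to avoid DNS lookups; getHostString() returns the configured host name either way.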
Next, change the logging logic in the org.apache.zookeeper.ClientCnxn class:
```java
/**
 * Get the address of the ZooKeeper server the client is connected or connecting to.<br>
 * Note: returns null if the address cannot be determined.
 */
private InetSocketAddress getCurrentZooKeeperAddr() {
    try {
        if (hostProvider == null) {
            return null;
        }
        List<InetSocketAddress> all = hostProvider.getAllServerAddress();
        if (all == null) {
            return null;
        }
        int index = hostProvider.getCurrentIndex();
        // guard both ends so a stale or unset index cannot throw
        if (index >= 0 && index < all.size()) {
            return all.get(index);
        }
        return null;
    } catch (Exception e) {
        // the address is only used for logging; never let the lookup fail the caller
        return null;
    }
}

// …… ……

// get the current ZK host for logging
InetSocketAddress addr = getCurrentZooKeeperAddr();
LOG.warn(
        "Session 0x"
        + Long.toHexString(getSessionId())
        + " for server ip: " + addr + ", detail conn: "
        + clientCnxnSocket.getRemoteSocketAddress()
        + ", unexpected error"
        + RETRY_CONN_MSG, e);
```
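To see what the patched warning looks like when the socket has lost its remote address, here is a hypothetical stand-alone version of the lookup and the message (ServerAddrLogDemo, currentAddr and warnMessage are invented names; the RETRY_CONN_MSG text is assumed to match the log line quoted at the top of this post). The configured address taken from the provider's list is still printed, while "detail conn" shows the null coming from the socket.

```java
import java.net.InetSocketAddress;
import java.net.SocketAddress;
import java.util.Arrays;
import java.util.List;

// Hypothetical stand-alone version of the patched log statement, for illustration.
class ServerAddrLogDemo {
    // Assumed to match ZooKeeper's RETRY_CONN_MSG, as seen in the quoted log line.
    static final String RETRY_CONN_MSG =
            ", closing socket connection and attempting reconnect";

    // Same guarded lookup as getCurrentZooKeeperAddr(), but on a plain list and index.
    static InetSocketAddress currentAddr(List<InetSocketAddress> all, int index) {
        if (all == null || index < 0 || index >= all.size()) {
            return null;
        }
        return all.get(index);
    }

    // Builds the same text the patched LOG.warn(...) call produces.
    static String warnMessage(long sessionId, InetSocketAddress configured,
                              SocketAddress remote) {
        return "Session 0x" + Long.toHexString(sessionId)
                + " for server ip: " + configured
                + ", detail conn: " + remote
                + ", unexpected error" + RETRY_CONN_MSG;
    }

    public static void main(String[] args) {
        // IP literals are parsed directly, so no DNS lookup happens here.
        List<InetSocketAddress> servers = Arrays.asList(
                new InetSocketAddress("10.0.0.1", 2181),
                new InetSocketAddress("10.0.0.2", 2181));
        // The socket no longer reports a remote address (null), but the
        // provider's index still identifies the configured server in use.
        System.out.println(warnMessage(0x1L, currentAddr(servers, 1), null));
    }
}
```

Printing both values keeps the original socket-level detail for the normal case while guaranteeing that the server IP survives when that detail is lost.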