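To improve performance and further raise throughput, many of our company's systems have recently been undergoing asynchronous refactoring. This work inevitably exposes more multithreading problems than before. Last week we hit a deadlock while converting a ZooKeeper client to asynchronous calls, and this post explains it.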
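ZooKeeper generally offers both a synchronous and an asynchronous way of calling the same API.
The synchronous interface is easy to understand. It is used as follows: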
ZooKeeper zk = new ZooKeeper(...);
List<String> children = zk.getChildren( path, true );
The asynchronous interface is slightly more involved. It is used as follows:
ZooKeeper zk = new ZooKeeper(...);
zk.getChildren( path, true, new AsyncCallback.Children2Callback() {
    @Override
    public void processResult( int rc, String path, Object ctx, List<String> children, Stat stat ) {
        System.out.println( "Receive the response." );
    }
}, null );
As we can see, an asynchronous call requires registering a Children2Callback and implementing its callback method, processResult.
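Here is the problem we ran into last week. The application registers a watch on changes to the child list of a znode. Whenever it receives a child-list change notification (EventType.NodeChildrenChanged) from the ZooKeeper server, it fetches the child list again. With the synchronous interface the whole application ran normally. After the asynchronous refactoring, something odd happened: the application still received the change notification, but could no longer fetch the child list again.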
First, here is a simple demo that reproduces the application's original logic using the synchronous interface:
package book.chapter05;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;

/**
 * ZooKeeper API: get the list of child nodes using the synchronous (sync) interface.
 * @author <a href="mailto:nileader@gmail.com">銀時</a>
 */
public class ZooKeeper_GetChildren_API_Sync_Usage implements Watcher {

    // Released once the session reaches SyncConnected.
    private CountDownLatch connectedSemaphore = new CountDownLatch( 1 );
    // Released once the child-list change has been handled, so main can exit.
    private static CountDownLatch _semaphore = new CountDownLatch( 1 );
    private ZooKeeper zk;

    ZooKeeper createSession( String connectString, int sessionTimeout, Watcher watcher ) throws IOException {
        ZooKeeper zookeeper = new ZooKeeper( connectString, sessionTimeout, watcher );
        try {
            // Wait until process() reports that the session is connected.
            connectedSemaphore.await();
        } catch ( InterruptedException e ) {
        }
        return zookeeper;
    }

    /** create path by sync */
    void createPath_sync( String path, String data, CreateMode createMode ) throws IOException, KeeperException, InterruptedException {
        if ( zk == null ) {
            zk = this.createSession( "domain1.book.zookeeper:2181", 5000, this );
        }
        zk.create( path, data.getBytes(), Ids.OPEN_ACL_UNSAFE, createMode );
    }

    /** Get children znodes of path and set watches */
    List<String> getChildren( String path ) throws KeeperException, InterruptedException, IOException {
        System.out.println( "===Start to get children znodes.===" );
        if ( zk == null ) {
            zk = this.createSession( "domain1.book.zookeeper:2181", 5000, this );
        }
        return zk.getChildren( path, true );
    }

    public static void main( String[] args ) throws IOException, InterruptedException {
        ZooKeeper_GetChildren_API_Sync_Usage sample = new ZooKeeper_GetChildren_API_Sync_Usage();
        String path = "/get_children_test";
        try {
            sample.createPath_sync( path, "", CreateMode.PERSISTENT );
            sample.createPath_sync( path + "/c1", "", CreateMode.PERSISTENT );
            List<String> childrenList = sample.getChildren( path );
            System.out.println( childrenList );
            //Add a new child znode to test watches event notify.
            sample.createPath_sync( path + "/c2", "", CreateMode.PERSISTENT );
            _semaphore.await();
        } catch ( KeeperException e ) {
            System.err.println( "error: " + e.getMessage() );
            e.printStackTrace();
        }
    }

    /**
     * Process when receive watched event
     */
    @Override
    public void process( WatchedEvent event ) {
        System.out.println( "Receive watched event:" + event );
        if ( KeeperState.SyncConnected == event.getState() ) {
            if ( EventType.None == event.getType() && null == event.getPath() ) {
                connectedSemaphore.countDown();
            } else if ( event.getType() == EventType.NodeChildrenChanged ) {
                //children list changed
                try {
                    System.out.println( this.getChildren( event.getPath() ) );
                    _semaphore.countDown();
                } catch ( Exception e ) {
                    e.printStackTrace();
                }
            }
        }
    }
}
The output is as follows:
Receive watched event:WatchedEvent state:SyncConnected type:None path:null
===Start to get children znodes.===
[c1]
Receive watched event:WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/get_children_test
===Start to get children znodes.===
[c1, c2]
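In the program above, we first create a parent node, /get_children_test, and one child node, /get_children_test/c1. We then call the synchronous getChildren interface to fetch all children of /get_children_test, registering a watch in the same call. Next we create another child, /get_children_test/c2. Because a watch was registered earlier, the ZooKeeper server sends the client a "children changed" notification as soon as the new child is created, and the client can then call getChildren again to fetch the updated child list.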
This example runs correctly, of course. Now we convert it to the asynchronous interface, as follows:
package book.chapter05;

import java.io.IOException;
import java.util.List;
import java.util.concurrent.CountDownLatch;

import org.apache.zookeeper.AsyncCallback;
import org.apache.zookeeper.CreateMode;
import org.apache.zookeeper.KeeperException;
import org.apache.zookeeper.WatchedEvent;
import org.apache.zookeeper.Watcher;
import org.apache.zookeeper.Watcher.Event.EventType;
import org.apache.zookeeper.Watcher.Event.KeeperState;
import org.apache.zookeeper.ZooDefs.Ids;
import org.apache.zookeeper.ZooKeeper;
import org.apache.zookeeper.data.Stat;

/**
 * ZooKeeper API: get the list of child nodes using the asynchronous (async) interface.
 * @author <a href="mailto:nileader@gmail.com">銀時</a>
 */
public class ZooKeeper_GetChildren_API_ASync_Usage_Deadlock implements Watcher {

    // Released once the session reaches SyncConnected.
    private CountDownLatch connectedSemaphore = new CountDownLatch( 1 );
    // Released once the child-list change has been handled, so main can exit.
    private static CountDownLatch _semaphore = new CountDownLatch( 1 );
    private ZooKeeper zk;

    ZooKeeper createSession( String connectString, int sessionTimeout, Watcher watcher ) throws IOException {
        ZooKeeper zookeeper = new ZooKeeper( connectString, sessionTimeout, watcher );
        try {
            // Wait until process() reports that the session is connected.
            connectedSemaphore.await();
        } catch ( InterruptedException e ) {
        }
        return zookeeper;
    }

    /** create path by sync */
    void createPath_sync( String path, String data, CreateMode createMode ) throws IOException, KeeperException, InterruptedException {
        if ( zk == null ) {
            zk = this.createSession( "domain1.book.zookeeper:2181", 5000, this );
        }
        zk.create( path, data.getBytes(), Ids.OPEN_ACL_UNSAFE, createMode );
    }

    /** Get children znodes of path and set watches */
    void getChildren( String path ) throws KeeperException, InterruptedException, IOException {
        System.out.println( "===Start to get children znodes.===" );
        if ( zk == null ) {
            zk = this.createSession( "domain1.book.zookeeper:2181", 5000, this );
        }
        final CountDownLatch _semaphore_get_children = new CountDownLatch( 1 );
        zk.getChildren( path, true, new AsyncCallback.Children2Callback() {
            @Override
            public void processResult( int rc, String path, Object ctx, List<String> children, Stat stat ) {
                System.out.println( "Get Children znode result: [response code: " + rc + ", param path: " + path
                        + ", ctx: " + ctx + ", children list: " + children + ", stat: " + stat );
                _semaphore_get_children.countDown();
            }
        }, null );
        // Blocks the calling thread until the callback above has run.
        _semaphore_get_children.await();
    }

    public static void main( String[] args ) throws IOException, InterruptedException {
        ZooKeeper_GetChildren_API_ASync_Usage_Deadlock sample = new ZooKeeper_GetChildren_API_ASync_Usage_Deadlock();
        String path = "/get_children_test";
        try {
            sample.createPath_sync( path, "", CreateMode.PERSISTENT );
            sample.createPath_sync( path + "/c1", "", CreateMode.PERSISTENT );
            //Get children and register watches.
            sample.getChildren( path );
            //Add a new child znode to test watches event notify.
            sample.createPath_sync( path + "/c2", "", CreateMode.PERSISTENT );
            _semaphore.await();
        } catch ( KeeperException e ) {
            System.err.println( "error: " + e.getMessage() );
            e.printStackTrace();
        }
    }

    /**
     * Process when receive watched event
     */
    @Override
    public void process( WatchedEvent event ) {
        System.out.println( "Receive watched event:" + event );
        if ( KeeperState.SyncConnected == event.getState() ) {
            if ( EventType.None == event.getType() && null == event.getPath() ) {
                connectedSemaphore.countDown();
            } else if ( event.getType() == EventType.NodeChildrenChanged ) {
                //children list changed
                try {
                    this.getChildren( event.getPath() );
                    _semaphore.countDown();
                } catch ( Exception e ) {
                    e.printStackTrace();
                }
            }
        }
    }
}
The output is as follows:
Receive watched event:WatchedEvent state:SyncConnected type:None path:null
===Start to get children znodes.===
Get Children znode result: [response code: 0, param path: /get_children_test, ctx: null, children list: [c1], stat: 555,555,1373931727380,1373931727380,0,1,0,0,0,1,556
Receive watched event:WatchedEvent state:SyncConnected type:NodeChildrenChanged path:/get_children_test
===Start to get children znodes.===
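In this demo the execution logic is essentially the same as in the synchronous version. The only difference is that fetching the child list is now asynchronous. With that change the problem appears: the program hangs on the second attempt to fetch the child list. The application team confirmed that the synchronous version had never shown this behavior, so we started investigating where the asynchronous version blocks.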
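The key point is that a ZooKeeper client has to handle two kinds of event notifications from the server: watch event notifications, and responses to asynchronous API calls. Notably, in the ZooKeeper client's threading model both kinds of events are handled by the same thread, and they are processed serially. For details, see the core event-handling class: org.apache.zookeeper.ClientCnxn.EventThread.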
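To make the consequence of this threading model concrete, here is a minimal, self-contained sketch that does not use ZooKeeper at all. It assumes that a single-threaded executor is a fair stand-in for EventThread. The class name SingleEventThreadDemo and the methods blockingWatcher and nonBlockingWatcher are hypothetical names invented for this illustration. A bounded wait replaces the unbounded await() of the demo, so the sketch terminates instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;

// Simulation only: a single-threaded executor plays the role of the one
// thread that delivers both watch events and async callbacks, in order.
class SingleEventThreadDemo {

    /**
     * The "watcher" task runs on the event thread, queues a "callback" on the
     * same thread, then waits for it. Returns whether the callback ran before
     * the wait timed out.
     */
    static boolean blockingWatcher(long timeoutMillis) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch callbackRan = new CountDownLatch(1);
            Future<Boolean> watcher = eventThread.submit(() -> {
                // The callback is queued behind the task that is running now.
                eventThread.execute(callbackRan::countDown);
                // Waiting here occupies the only thread that could run it.
                return callbackRan.await(timeoutMillis, TimeUnit.MILLISECONDS);
            });
            return watcher.get();
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdown();
        }
    }

    /**
     * The "watcher" task only queues the callback and returns at once, so the
     * event thread is free to run it. The wait happens on the caller's thread.
     */
    static boolean nonBlockingWatcher(long timeoutMillis) {
        ExecutorService eventThread = Executors.newSingleThreadExecutor();
        try {
            CountDownLatch callbackRan = new CountDownLatch(1);
            eventThread.execute(() -> eventThread.execute(callbackRan::countDown));
            return callbackRan.await(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            eventThread.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println("blocking watcher saw callback: " + blockingWatcher(500));
        System.out.println("non-blocking watcher saw callback: " + nonBlockingWatcher(2000));
    }
}
```

Running main should report false for the blocking variant and true for the non-blocking one. In the blocking variant the callback can only run after the waiting task gives up, which mirrors the shape of the asynchronous demo: process() calls getChildren, which waits on a latch that only a callback delivered by the same thread can release.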