Clustering EHCache with JGroups over TCP

Date: 2019-11-07


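A recent project of mine uses Ehcache as its caching layer. Because of the load, we run two servers behind a load balancer, so the cache has to be clustered. After weighing the options I settled on JGroups. What followed was three straight days of torment. Today I finally got it working, so I am writing the process down in the hope that anyone with a similar need can avoid the detours I took.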

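1. First, do not jump straight into searching for configuration snippets. Get the base environment right first, which is exactly what 90% of similar articles online leave out. An article like http://blog.csdn.net/kindy1022/article/details/6681299 is actually useful, but it is still not detailed enough. Here are the details: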

(1) I use nginx + tomcat7 + jdk7;

(2) The Ehcache version is 2.10. I recommend using ehcache-2.10.jar directly rather than ehcache-core-xxx.jar + ehcache-terracotta-xxx.jar;

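(3) For JGroups, use the latest jar, jgroups-3.6.4.Final.jar. This is easy to overlook and rarely mentioned online: because ehcache-jgroupsreplication-xxx.jar exists, people assume that jar alone is enough, and the trap is that startup reports no error without it. There is also no need to downgrade to an older version;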

(4) ehcache-jgroupsreplication-1.7.jar (it was while reading the source in this jar that I found JGroupsCacheReceiver needs the jgroups jar).

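2. Next, the configuration files. This is where the most misinformation circulates online: first, posts paste configuration without stating versions; second, the configuration itself contains errors. I suggest reading the relevant configuration on the official JGroups and Ehcache sites, and I do mean the relevant configuration: the JGroups example on the Ehcache site is incomplete and does not state a version either. Pay close attention to this. I also suggest keeping the JGroups configuration in its own file, which is cleaner.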

(1) In the Ehcache configuration file ehcache.xml, first add the peer provider:

<cacheManagerPeerProviderFactory class="net.sf.ehcache.distribution.jgroups.JGroupsCacheManagerPeerProviderFactory"
        properties="file=jgroups_tcp.xml" />

(2) Configure a listener for every cache that needs to be synchronized. asynchronousReplicationIntervalMillis is optional and defaults to 1000; bootstrapCacheLoaderFactory can also be left out.

<cache name="mybatis_common" overflowToDisk="true" eternal="true"
        timeToIdleSeconds="300" timeToLiveSeconds="600" maxElementsInMemory="10000"
        maxElementsOnDisk="100" diskPersistent="true" diskExpiryThreadIntervalSeconds="300"
        diskSpoolBufferSizeMB="50" memoryStoreEvictionPolicy="LRU">
          <!-- Replicates puts, updates and removals to the other cluster members.
               asynchronousReplicationIntervalMillis belongs inside the properties string. -->
          <cacheEventListenerFactory class="net.sf.ehcache.distribution.jgroups.JGroupsCacheReplicatorFactory"
              properties="replicateAsynchronously=true, replicatePuts=true,
              replicateUpdates=true, replicateUpdatesViaCopy=false, replicateRemovals=true,
              asynchronousReplicationIntervalMillis=500"/>
          <!-- Optional: loads existing entries from a peer when this node starts. -->
          <bootstrapCacheLoaderFactory class="net.sf.ehcache.distribution.jgroups.JGroupsBootstrapCacheLoaderFactory"
              properties="bootstrapAsynchronously=false"/>
</cache>

(3) jgroups_tcp.xml is as follows. For reference, see http://www.jgroups.org/manual/index.html#_tcp

<!-- app1_IP and app2_IP are placeholders for the addresses of the two servers.
     Attribute names differ between JGroups releases; check them against the manual for your version. -->
<config>
    <TCP bind_port="7800" />
    <TCPPING timeout="3000"
             initial_hosts="app1_IP[7800],app2_IP[7800]"
             port_range="10"
             num_initial_members="3"/>
    <VERIFY_SUSPECT timeout="1500"  />
    <pbcast.NAKACK2 use_mcast_xmit="false" gc_lag="100"
                   retransmit_timeout="300,600,1200,2400,4800"
                   discard_delivered_msgs="true"/>
    <pbcast.STABLE stability_delay="1000" desired_avg_gossip="50000"
                   max_bytes="400000"/>
    <pbcast.GMS print_local_addr="true" join_timeout="3000" shun="false"
                   view_bundling="true"/>
</config>

As for whether the attribute should be bind_port or start_port: the official site gives bind_port.

3. In most cases this is enough, but there are exceptions. If it still does not work, check the following possibilities:

(1) Can the servers in the cluster reach each other? Is a firewall or something similar in the way?

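(2) Does every server have a complete, unique hostname? If your hostname contains Chinese characters, change it to an English one. If you happen to develop and test on a Mac, note that its computer name and its hostname are two different things: the default hostname is localhost, which will not work, so change it to a proper one.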

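(3) Current versions of Eclipse can decompile class files and let you set breakpoints on those classes while debugging. In ehcache-jgroupsreplication-xxx.jar, find the listener class and JGroupsCacheReceiver, add breakpoints, and check whether both sending and receiving of messages are triggered.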
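Checks (1) and (2) above can be partly automated. The following is only a rough sketch, not part of Ehcache or JGroups: the class name ClusterPreflight and its methods are made up for illustration, and it uses nothing but the JDK. It parses a host[port] list in the same form as initial_hosts, tries a plain TCP connection to each entry, and reports whether the local hostname looks usable (not empty, not localhost, ASCII only). A connection can only succeed while the peer application is running and listening on that port, so a failure here points at the network or firewall only if the other node is known to be up.

```java
import java.io.IOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.UnknownHostException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper for illustration; not an Ehcache or JGroups class.
class ClusterPreflight {

    /** Parses a list such as "app1_IP[7800],app2_IP[7800]" into host/port pairs. */
    static List<InetSocketAddress> parseInitialHosts(String initialHosts) {
        List<InetSocketAddress> result = new ArrayList<>();
        for (String entry : initialHosts.split(",")) {
            String e = entry.trim();
            if (e.isEmpty()) {
                continue;
            }
            int open = e.indexOf('[');
            int close = e.indexOf(']');
            if (open <= 0 || close <= open + 1) {
                throw new IllegalArgumentException("Expected host[port], got: " + e);
            }
            String host = e.substring(0, open);
            int port = Integer.parseInt(e.substring(open + 1, close));
            // Left unresolved on purpose: name lookup happens only when connecting.
            result.add(InetSocketAddress.createUnresolved(host, port));
        }
        return result;
    }

    /** Returns true if a TCP connection to host:port succeeds within the timeout. */
    static boolean canConnect(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException ex) {
            // Covers refused connections, timeouts and unknown hosts.
            return false;
        }
    }

    /** Rejects empty names, "localhost" and names containing non-ASCII characters. */
    static boolean isUsableHostname(String hostname) {
        if (hostname == null || hostname.isEmpty() || hostname.equalsIgnoreCase("localhost")) {
            return false;
        }
        for (int i = 0; i < hostname.length(); i++) {
            if (hostname.charAt(i) > 127) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) throws UnknownHostException {
        String local = InetAddress.getLocalHost().getHostName();
        System.out.println("Local hostname '" + local + "' usable: " + isUsableHostname(local));

        // Pass the real initial_hosts value as the first argument; the default is the placeholder.
        String initialHosts = args.length > 0 ? args[0] : "app1_IP[7800],app2_IP[7800]";
        for (InetSocketAddress peer : parseInitialHosts(initialHosts)) {
            boolean ok = canConnect(peer.getHostString(), peer.getPort(), 3000);
            System.out.println(peer.getHostString() + ":" + peer.getPort()
                    + (ok ? " reachable" : " NOT reachable"));
        }
    }
}
```

Run it on each server with the real initial_hosts value as the argument, for example `java ClusterPreflight "app1_IP[7800],app2_IP[7800]"` with the placeholders replaced by the actual addresses.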

Good luck.