Mycat sharding rules

Date: 2019-12-05


 
 Configuration: the rule field in the schema file refers to the name field of a tableRule in the rule file.

 
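 (1) Enumeration sharding: sharding-by-intfile

 (2) Primary-key range: auto-sharding-long

 (3) Consistent hashing: sharding-by-murmur

 (4) String hash parsing: sharding-by-stringhash

 (5) Sharding by date (day): sharding-by-date

 (6) Sharding by hour within a single month: sharding-by-hour

 (7) Natural-month sharding: sharding-by-month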

 
 -------- 10 common sharding methods --------

 1. Enumeration

 
 <tableRule name="sharding-by-intfile"> 

 
     <rule> 

 
       <columns>user_id</columns> 

 
       <algorithm>hash-int</algorithm> 

 
     </rule> 

 
   </tableRule> 

 
 <function name="hash-int" class="io.mycat.route.function.PartitionByFileMap"> 

 
     <property name="mapFile">partition-hash-int.txt</property> 

 
     <property name="type">0</property> 

 
     <property name="defaultNode">0</property> 

 
   </function> 

 
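 Explanation:

 The sharding rule is driven by a file (partition-hash-int.txt). This rule can be understood as enumeration partitioning, and it suits columns whose values come from a fixed set, such as gender (0, 1) or province (a fixed list of values).

 Advantages:

 Several values, separated by commas, can be placed in the same partition.

 Disadvantages:

 It is not suitable for columns that cannot be enumerated.

 Enumeration partitioning: sharding-by-intfile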
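 To make the lookup concrete, here is a minimal Python sketch of enumeration routing. It is not Mycat's implementation: the file contents, the `value=node` line format with comma-separated values, and the helper names `load_map` and `route_by_enum` are all assumptions made for illustration.

```python
def load_map(text):
    """Parse 'value=node' lines; several comma-separated values may share a node."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        values, node = line.split("=")
        for value in values.split(","):
            mapping[int(value)] = int(node)
    return mapping


def route_by_enum(mapping, value, default_node=0):
    """Return the node index for value, or default_node if it is not listed."""
    return mapping.get(value, default_node)


# Hypothetical contents of partition-hash-int.txt (format assumed, not taken from Mycat).
MAP_TEXT = """
0=0
1=1
2,3=2
"""
mapping = load_map(MAP_TEXT)
print(route_by_enum(mapping, 1))    # a listed value
print(route_by_enum(mapping, 3))    # shares a node with 2
print(route_by_enum(mapping, 99))   # unlisted, falls back to default_node
```

 A value that is missing from the map has nowhere natural to go, which is why the rule only fits columns with a small, known set of values.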

 2. Range convention

 
 <tableRule name="auto-sharding-long"> 

 
     <rule> 

 
       <columns>user_id</columns> 

 
       <algorithm>rang-long</algorithm> 

 
     </rule> 

 
   </tableRule> 

 
 <function name="rang-long" class="io.mycat.route.function.AutoPartitionByLong"> 

 
     <property name="mapFile">autopartition-long.txt</property> 

 
   </function> 

 
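 Explanation:

 The sharding rule is driven by a file (autopartition-long.txt). This is range-based sharding: you define value ranges for the sharding column, and all rows whose value falls within a given range are placed on one DN (data node).

 Advantages:

 Suitable when the total data volume is known in advance or is fixed.

 Disadvantages:

 The data nodes are laid out in advance, so scaling out later is cumbersome.

 A potential problem: if a large number of sequential inserts arrive in a short period and each DN (sharded database) is configured to hold a large range (say 10 million rows per DN), one DN will take very heavy I/O while the others see no I/O at all. This resembles the hot block / hot disk phenomenon commonly seen in databases.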
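 The range lookup and the hot-spot effect can be sketched in a few lines of Python. The range boundaries below are hypothetical and are written as plain integers rather than in the format of autopartition-long.txt; `route_by_range` is an illustrative name, not a Mycat function.

```python
from bisect import bisect_right

# Hypothetical ranges: node 0 holds [0, 5_000_000), node 1 holds
# [5_000_000, 10_000_000), node 2 holds [10_000_000, 15_000_000).
RANGE_STARTS = [0, 5_000_000, 10_000_000]
RANGE_END = 15_000_000  # exclusive upper bound of the last range


def route_by_range(value):
    """Return the index of the node whose range contains value."""
    if not 0 <= value < RANGE_END:
        raise ValueError(f"{value} is outside every configured range")
    return bisect_right(RANGE_STARTS, value) - 1


# Sequential IDs all land on the same node until its range is used up,
# which is the hot-spot behaviour described above.
burst = [route_by_range(i) for i in range(1_000, 1_100)]
print(set(burst))
```

 Values beyond the last range have no node at all in this sketch, which mirrors the scaling problem: a new range and a new node have to be added before such rows can be stored.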

 3. Modulo

 
 <tableRule name="mod-long"> 

 
     <rule> 

 
       <columns>user_id</columns> 

 
       <algorithm>mod-long</algorithm> 

 
     </rule> 

 
   </tableRule> 

 
   <function name="mod-long" class="io.mycat.route.function.PartitionByMod"> 

 
    <!-- how many data nodes  --> 

 
     <property name="count">3</property> 

 
   </function> 

 
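 Explanation:

 The sharding rule is driven by the number n given in the configuration. It splits the data into n parts (normally there are also n data nodes), so the data is distributed evenly across the nodes.

 Advantages:

 This strategy spreads database write pressure well. It suits point-query workloads.

 Disadvantages:

 Once a range query appears, MyCAT has to merge the results from the shards. With large data volumes, the time spent on the cross-database query plus the merge can grow considerably, especially when ORDER BY is involved.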
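 A short Python sketch shows both sides of this trade-off. It assumes a non-negative integer key and uses the illustrative name `route_by_mod`; it is not Mycat's code.

```python
from collections import Counter

NODE_COUNT = 3  # mirrors <property name="count">3</property>


def route_by_mod(user_id):
    """Return the node index for user_id."""
    return user_id % NODE_COUNT


# Consecutive IDs rotate across the nodes, so writes spread evenly ...
print(Counter(route_by_mod(i) for i in range(9)))

# ... but a range query over consecutive IDs has to visit every node.
print(sorted({route_by_mod(i) for i in range(100, 110)}))
```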

 4. Fixed-slot hash algorithm

 
 <tableRule name="rule1"> 

 
     <rule> 

 
       <columns>user_id</columns> 

 
       <algorithm>func1</algorithm> 

 
     </rule> 

 
 </tableRule> 

 
   <function name="func1" class="io.mycat.route.function.PartitionByLong"> 

 
     <property name="partitionCount">2,1</property> 

 
     <property name="partitionLength">256,512</property> 

 
   </function> 

 
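 Explanation:

 The sharding rule is driven by the pairs of numbers in the configuration. Above, columns names the table column to shard on, algorithm names the sharding function, partitionCount is the list of partition counts, and partitionLength is the list of partition lengths. (This is more flexible than the modulo method, which always splits evenly.)

 Partition length: the maximum total is 2^n = 1024 (n = 10), so at most 1024 partitions are supported. The example adds up to exactly this total: 2 × 256 + 1 × 512 = 1024.

 Constraint: the count and length arrays must have the same number of elements.

 Advantages:

 This strategy is flexible: data can be distributed evenly or unevenly, and each node's share and capacity are determined by the count and length parameters.

 Disadvantages:

 Similar to the modulo method.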
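 The count/length pairs are easier to see as a slot table. The Python sketch below assumes that the key is reduced to one of 1024 slots by `id % 1024` and that the lengths must add up to exactly 1024; both points, and the names `build_slot_table` and `route_by_slot`, are assumptions for illustration rather than a description of Mycat's code.

```python
SLOT_TOTAL = 1024  # total number of slots, 2^10


def build_slot_table(counts, lengths):
    """Expand (count, length) pairs into a slot -> partition table."""
    if len(counts) != len(lengths):
        raise ValueError("counts and lengths must have the same number of elements")
    if sum(c * n for c, n in zip(counts, lengths)) != SLOT_TOTAL:
        raise ValueError("sum(count * length) must equal 1024")
    table, partition = [], 0
    for count, length in zip(counts, lengths):
        for _ in range(count):
            table.extend([partition] * length)
            partition += 1
    return table


# partitionCount = 2,1 and partitionLength = 256,512 from the configuration above.
TABLE = build_slot_table([2, 1], [256, 512])


def route_by_slot(user_id):
    """Map user_id to a slot, then to the partition that owns the slot."""
    return TABLE[user_id % SLOT_TOTAL]


print(route_by_slot(100), route_by_slot(300), route_by_slot(700))
```

 With this configuration, partitions 0 and 1 each own a quarter of the slots and partition 2 owns half, which is the uneven split the count and length parameters make possible.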

 5. Date column partitioning

 
 <tableRule name="sharding-by-date"> 

 
       <rule> 

 
         <columns>create_time</columns> 

 
         <algorithm>sharding-by-date</algorithm> 

 
       </rule> 

 
    </tableRule>  

 
 <function name="sharding-by-date" class="io.mycat.route.function.PartitionByDate">

 
    <property name="dateFormat">yyyy-MM-dd</property> 

 
     <property name="sBeginDate">2014-01-01</property> 

 
     <property name="sPartionDay">10</property> 

 
   </function> 

 
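 Explanation:

 The sharding rule is driven by the values in the configuration: the date format, the start date, and the number of days per partition. Counting from the start date, every 10 days forms one partition.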
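 The arithmetic is a day offset divided by the partition width. Here is a minimal Python sketch using the values from the configuration above; `route_by_date` is an illustrative name, and rejecting dates before the start date is an assumption of this sketch.

```python
from datetime import datetime

DATE_FORMAT = "%Y-%m-%d"   # Python equivalent of yyyy-MM-dd
BEGIN_DATE = datetime.strptime("2014-01-01", DATE_FORMAT)  # mirrors sBeginDate
DAYS_PER_PARTITION = 10    # mirrors sPartionDay


def route_by_date(create_time):
    """Return the partition index for a date string in DATE_FORMAT."""
    day_offset = (datetime.strptime(create_time, DATE_FORMAT) - BEGIN_DATE).days
    if day_offset < 0:
        raise ValueError(f"{create_time} is before the begin date")
    return day_offset // DAYS_PER_PARTITION


print(route_by_date("2014-01-01"))  # first day of partition 0
print(route_by_date("2014-01-10"))  # last day of partition 0
print(route_by_date("2014-01-11"))  # first day of partition 1
```

 The partition index keeps growing with time, so enough data nodes have to exist for the dates that will actually be stored.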

 6. Wildcard modulo

 
 <tableRule name="sharding-by-pattern"> 

 
       <rule> 

 
         <columns>user_id</columns> 

 
         <algorithm>sharding-by-pattern</algorithm> 

 
       </rule> 

 
    </tableRule> 

 
 <function name="sharding-by-pattern" class="io.mycat.route.function.PartitionByPattern"> 

 
     <property name="patternValue">256</property> 

 
     <property name="defaultNode">2</property> 

 
     <property name="mapFile">partition-pattern.txt</property> 

 
   </function> 

 
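 Explanation:

 The sharding rule is driven by the numbers in the configuration and by a file (partition-pattern.txt). patternValue is the modulus, and defaultNode is the default node; if it is not configured, it defaults to 0, i.e. the first node. In the mapping file, an entry such as 1-32 denotes a range of id % 256 results: if the result falls within 1-32 the row goes to partition 1, and so on for the other ranges. If id is not numeric, the row is assigned to defaultNode.

 Advantages:

 This strategy spreads database write pressure well. It suits point-query workloads.

 Disadvantages:

 Once a range query appears, MyCAT has to merge the results from the shards. With large data volumes, the time spent on the cross-database query plus the merge can grow considerably, especially when ORDER BY is involved.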
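 As a rough Python sketch of this rule: take the remainder, find the range that contains it, and fall back to the default node for non-numeric values. The ranges below are hypothetical and do not reproduce partition-pattern.txt; `route_by_pattern` is an illustrative name.

```python
PATTERN_VALUE = 256   # mirrors patternValue
DEFAULT_NODE = 2      # mirrors defaultNode

# Hypothetical mapping: (low, high, node) with inclusive bounds on id % 256.
RANGES = [(0, 63, 0), (64, 127, 1), (128, 191, 2), (192, 255, 3)]


def route_by_pattern(value):
    """Return the node for value; non-numeric values go to DEFAULT_NODE."""
    text = str(value)
    if not (text.isascii() and text.isdigit()):
        return DEFAULT_NODE
    remainder = int(text) % PATTERN_VALUE
    for low, high, node in RANGES:
        if low <= remainder <= high:
            return node
    return DEFAULT_NODE  # remainder not covered by any range


print(route_by_pattern(300))    # 300 % 256 = 44
print(route_by_pattern(200))
print(route_by_pattern("abc"))  # non-numeric
```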

 7. ASCII-sum wildcard modulo

 
 <tableRule name="sharding-by-prefixpattern"> 

 
       <rule> 

 
         <columns>user_id</columns> 

 
         <algorithm>sharding-by-prefixpattern</algorithm> 

 
       </rule> 

 
    </tableRule> 

 
 <function name="sharding-by-prefixpattern" class="io.mycat.route.function.PartitionByPrefixPattern">

 
     <property name="patternValue">256</property> 

 
     <property name="prefixLength">5</property> 

 
     <property name="mapFile">partition-pattern.txt</property> 

 
   </function> 

 
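 Explanation:

 The sharding rule is driven by the numbers in the configuration and by a file (partition-pattern.txt). patternValue is the modulus, and prefixLength is the number of leading characters to take. This method resembles method 6 (wildcard modulo), except that it sums the ASCII codes of the first prefixLength characters of the column value and computes sum % patternValue; the range in the mapping file that contains the result determines the shard.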
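 A worked example helps here. The Python sketch below sums the character codes of the first five characters and looks the remainder up in a hypothetical range table; the ranges and the name `route_by_prefix` are assumptions, not the contents of partition-pattern.txt.

```python
PATTERN_VALUE = 256   # mirrors patternValue
PREFIX_LENGTH = 5     # mirrors prefixLength

# Hypothetical mapping: (low, high, node) with inclusive bounds on the remainder.
RANGES = [(0, 63, 0), (64, 127, 1), (128, 191, 2), (192, 255, 3)]


def route_by_prefix(value):
    """Sum the ASCII codes of the first PREFIX_LENGTH characters, then map the remainder."""
    prefix = value[:PREFIX_LENGTH]
    remainder = sum(ord(ch) for ch in prefix) % PATTERN_VALUE
    for low, high, node in RANGES:
        if low <= remainder <= high:
            return node
    raise ValueError(f"no range covers remainder {remainder}")


# "gf89f": 103 + 102 + 56 + 57 + 102 = 420, and 420 % 256 = 164
print(route_by_prefix("gf89f9a"))
# "abcde": 97 + 98 + 99 + 100 + 101 = 495, and 495 % 256 = 239
print(route_by_prefix("abcde"))
```

 Only the prefix matters, so values that share their first five characters always land on the same shard.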

 8. Application-specified partition

 
 <tableRule name="sharding-by-substring"> 

 
       <rule> 

 
         <columns>user_id</columns> 

 
         <algorithm>sharding-by-substring</algorithm> 

 
       </rule> 

 
    </tableRule> 

 
 <function name="sharding-by-substring" class="io.mycat.route.function.PartitionDirectBySubString"> 

 
     <property name="startIndex">0</property> <!-- zero-based --> 

 
     <property name="size">2</property> 

 
     <property name="partitionCount">8</property> 

 
     <property name="defaultPartition">0</property> 

 
   </function> 

 
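 Explanation:

 This method computes the partition number directly from a substring of the value (the substring must be numeric); the application passes the value and thereby names the partition explicitly.

 For example, with id=05-100000002 this configuration takes size=2 digits starting at startIndex=0, which gives 05, so 05 is the partition. If no value is passed, the row is assigned to defaultPartition.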
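 In Python terms the rule is a slice plus an integer parse. The sketch below uses the values from the configuration above; treating non-numeric substrings and numbers outside 0..partitionCount-1 as "go to the default partition" is an assumption of this sketch, and `route_by_substring` is an illustrative name.

```python
START_INDEX = 0        # mirrors startIndex (zero-based)
SIZE = 2               # mirrors size
PARTITION_COUNT = 8    # mirrors partitionCount
DEFAULT_PARTITION = 0  # mirrors defaultPartition


def route_by_substring(value):
    """Read the partition number from value[START_INDEX:START_INDEX + SIZE]."""
    piece = value[START_INDEX:START_INDEX + SIZE]
    if len(piece) < SIZE or not (piece.isascii() and piece.isdigit()):
        return DEFAULT_PARTITION
    partition = int(piece)
    # Assumption: out-of-range numbers also fall back to the default partition.
    return partition if partition < PARTITION_COUNT else DEFAULT_PARTITION


print(route_by_substring("05-100000002"))  # the prefix "05" names partition 5
print(route_by_substring(""))              # nothing passed, default partition
```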

 9. String-slice hash

 
 <tableRule name="sharding-by-stringhash"> 

 
       <rule> 

 
         <columns>user_id</columns> 

 
         <algorithm>sharding-by-stringhash</algorithm> 

 
       </rule> 

 
    </tableRule> 

 
 <function name="sharding-by-stringhash" class="io.mycat.route.function.PartitionByString">

     <property name="length">512</property>

 
     <property name="count">2</property> 

 
     <property name="hashSlice">0:2</property> 

 
   </function> 

 
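 Explanation:

 In the function, length is the modulus for the string hash, count is the number of partitions, and hashSlice is the range of characters used in the hash computation.

 In other words, the hash is computed over a substring of the value.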
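 A loose Python sketch of the idea follows. The hash function is a simple multiplier-31 polynomial hash used as a stand-in, hashSlice 0:2 is read as the half-open slice [0, 2), and count × length is treated as the total number of slots; all three are assumptions of this sketch, so it should not be expected to reproduce Mycat's actual placement.

```python
LENGTH = 512          # mirrors length: slots per partition
COUNT = 2             # mirrors count: number of partitions
HASH_SLICE = (0, 2)   # mirrors hashSlice 0:2, read here as [start, end)


def slice_hash(text, start, end):
    """Polynomial hash (multiplier 31) over text[start:end]; a stand-in hash."""
    h = 0
    for ch in text[start:end]:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h


def route_by_string_hash(value):
    """Hash the configured slice, reduce it to a slot, and map the slot to a partition."""
    slot = slice_hash(value, *HASH_SLICE) % (LENGTH * COUNT)
    return slot // LENGTH


print(route_by_string_hash("ab123"))
print(route_by_string_hash("zz-9"))
```

 Because only the slice is hashed, values with the same first two characters are always stored together, whatever follows.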

 10. Consistent hashing

 
 <tableRule name="sharding-by-murmur"> 

 
       <rule> 

 
         <columns>user_id</columns> 

 
         <algorithm>murmur</algorithm> 

 
       </rule> 

 
    </tableRule> 

 
 <function name="murmur" class="io.mycat.route.function.PartitionByMurmurHash"> 

 
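       <property name="seed">0</property><!-- defaults to 0 -->

       <property name="count">2</property><!-- number of database nodes to shard across; required, otherwise sharding cannot work -->

       <property name="virtualBucketTimes">160</property><!-- each physical database node is mapped to this many virtual nodes; the default is 160, i.e. there are 160 times as many virtual nodes as physical nodes -->

       <!--

       <property name="weightMapFile">weightMapFile</property>

                      Node weights; a node with no weight specified defaults to 1. Written in properties-file format, with the node index (an integer from 0 to count-1) as the key and the node weight as the value. Every weight must be a positive integer; otherwise 1 is used instead. -->

       <!--

       <property name="bucketMapPath">/etc/mycat/bucketMapPath</property>

                       Used in testing to observe how virtual nodes are distributed over physical nodes. If this property is set, the mapping from each virtual node's murmur hash value to its physical node is written to this file, one entry per line. There is no default; if it is not set, nothing is written. -->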

 
   </function> 

 
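 Advantages:

 Consistent hashing effectively addresses the scale-out problem for distributed data. Rules 1 to 9 all struggle with data expansion to some degree, and rule 10 addresses that difficulty.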
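 The scale-out claim can be checked with a small hash ring in Python. This is only a sketch: it uses MD5 from the standard library as a stand-in for MurmurHash, and the virtual-node naming scheme and the `HashRing` class are invented for illustration.

```python
import bisect
import hashlib


def stable_hash(text):
    """64-bit hash taken from MD5; a stand-in for the MurmurHash the rule is named after."""
    return int.from_bytes(hashlib.md5(text.encode()).digest()[:8], "big")


class HashRing:
    def __init__(self, node_count, virtual_per_node=160):
        ring = []
        for node in range(node_count):
            for replica in range(virtual_per_node):
                ring.append((stable_hash(f"node-{node}-vn-{replica}"), node))
        ring.sort()
        self._hashes = [h for h, _ in ring]
        self._nodes = [n for _, n in ring]

    def route(self, key):
        """Walk clockwise to the first virtual node at or after the key's hash."""
        index = bisect.bisect_left(self._hashes, stable_hash(str(key)))
        return self._nodes[index % len(self._nodes)]


keys = range(10_000)
before, after = HashRing(2), HashRing(3)
moved = [k for k in keys if before.route(k) != after.route(k)]

# Share of keys that changed node when a third node was added.
print(len(moved) / len(keys))
# Every key that moved went to the new node; none moved between the old nodes.
print({after.route(k) for k in moved})
```

 With plain modulo, going from 2 to 3 nodes would reassign most keys. On the ring, only the keys that now belong to the new node's virtual nodes have to be migrated.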

 
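 Some of the sharding rules collected above have not yet been fully verified, and their detailed explanations and pros and cons are still incomplete. I hope we can study and discuss them together to fill in the gaps.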
