In version 0.98, the default compaction algorithm was switched to ExploringCompactionPolicy; previously it was RatioBasedCompactionPolicy.
ExploringCompactionPolicy extends RatioBasedCompactionPolicy and overrides the applyCompactionPolicy method, which implements the file-selection strategy for minor compactions.
The body of applyCompactionPolicy:
```java
public List<StoreFile> applyCompactionPolicy(final List<StoreFile> candidates,
    boolean mightBeStuck, boolean mayUseOffPeak, int minFiles, int maxFiles) {
  // This ratio is used by the algorithm below; a separate off-peak ratio
  // (default: 5.0) can be configured so that more data is compacted
  // outside peak hours.
  final double currentRatio = mayUseOffPeak
      ? comConf.getCompactionRatioOffPeak() : comConf.getCompactionRatio();

  // Start off choosing nothing.
  List<StoreFile> bestSelection = new ArrayList<StoreFile>(0);
  List<StoreFile> smallest = mightBeStuck ? new ArrayList<StoreFile>(0) : null;
  long bestSize = 0;
  long smallestSize = Long.MAX_VALUE;

  int opts = 0, optsInRatio = 0, bestStart = -1; // for debug logging
  // Consider every starting place.
  for (int start = 0; start < candidates.size(); start++) {
    // Consider every different sub list permutation in between start and end with min files.
    for (int currentEnd = start + minFiles - 1; currentEnd < candidates.size(); currentEnd++) {
      List<StoreFile> potentialMatchFiles = candidates.subList(start, currentEnd + 1);

      // Sanity checks
      if (potentialMatchFiles.size() < minFiles) {
        continue;
      }
      if (potentialMatchFiles.size() > maxFiles) {
        continue;
      }

      // Compute the total size of files that will
      // have to be read if this set of files is compacted.
      long size = getTotalStoreSize(potentialMatchFiles);

      // Store the smallest set of files. This stored set of files will be used
      // if it looks like the algorithm is stuck.
      if (mightBeStuck && size < smallestSize) {
        smallest = potentialMatchFiles;
        smallestSize = size;
      }
      if (size > comConf.getMaxCompactSize()) {
        continue;
      }

      ++opts;
      if (size >= comConf.getMinCompactSize()
          && !filesInRatio(potentialMatchFiles, currentRatio)) {
        continue;
      }

      ++optsInRatio;
      if (isBetterSelection(bestSelection, bestSize, potentialMatchFiles, size, mightBeStuck)) {
        bestSelection = potentialMatchFiles;
        bestSize = size;
        bestStart = start;
      }
    }
  }
  if (bestSelection.size() == 0 && mightBeStuck) {
    LOG.debug("Exploring compaction algorithm has selected " + smallest.size()
        + " files of size " + smallestSize + " because the store might be stuck");
    return new ArrayList<StoreFile>(smallest);
  }
  LOG.debug("Exploring compaction algorithm has selected " + bestSelection.size()
      + " files of size " + bestSize + " starting at candidate #" + bestStart
      + " after considering " + opts + " permutations with " + optsInRatio + " in ratio");
  return new ArrayList<StoreFile>(bestSelection);
}
```
As the code shows, the core of the algorithm comes down to the two helper methods below:
```java
private boolean filesInRatio(final List<StoreFile> files, final double currentRatio) {
  if (files.size() < 2) {
    return true;
  }
  long totalFileSize = getTotalStoreSize(files);
  for (StoreFile file : files) {
    long singleFileSize = file.getReader().length();
    long sumAllOtherFileSizes = totalFileSize - singleFileSize;
    if (singleFileSize > sumAllOtherFileSizes * currentRatio) {
      return false;
    }
  }
  return true;
}
```
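To make the ratio check concrete, here is a standalone sketch that operates on plain size arrays instead of StoreFile readers (the class and method names here are illustrative, not HBase's):

```java
// Standalone sketch of the filesInRatio check: a selection passes only if
// no single file is larger than (sum of the other files) * ratio.
public class FilesInRatioSketch {
    static boolean filesInRatio(long[] sizes, double ratio) {
        if (sizes.length < 2) {
            return true;
        }
        long total = 0;
        for (long s : sizes) {
            total += s;
        }
        for (long s : sizes) {
            long others = total - s;            // combined size of all other files
            if (s > others * ratio) {           // this file dominates the selection
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // 10 > (2 + 3) * 1.2 = 6.0, so the large file breaks the ratio
        System.out.println(filesInRatio(new long[]{10, 2, 3}, 1.2)); // false
        // 5 <= (4 + 3) * 1.2 = 8.4, and the smaller files pass too
        System.out.println(filesInRatio(new long[]{5, 4, 3}, 1.2));  // true
    }
}
```

This is why compacting one huge file together with a few tiny ones is rejected: rereading the huge file buys very little reduction in file count.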
```java
private boolean isBetterSelection(List<StoreFile> bestSelection, long bestSize,
    List<StoreFile> selection, long size, boolean mightBeStuck) {
  if (mightBeStuck && bestSize > 0 && size > 0) {
    // Keep the selection that removes most files for least size. That penalizes adding
    // large files to compaction, but not small files, so we don't become totally inefficient
    // (might want to tweak that in future). Also, given the current order of looking at
    // permutations, prefer earlier files and smaller selection if the difference is small.
    final double REPLACE_IF_BETTER_BY = 1.05;
    double thresholdQuality = ((double) bestSelection.size() / bestSize) * REPLACE_IF_BETTER_BY;
    return thresholdQuality < ((double) selection.size() / size);
  }
  // Keep if this gets rid of more files. Or the same number of files for less io.
  return selection.size() > bestSelection.size()
      || (selection.size() == bestSelection.size() && size < bestSize);
}
```
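The non-stuck branch of that comparison can be sketched in isolation (a minimal illustration with hypothetical names, not the HBase code): prefer the selection that compacts more files, and break ties by less total IO.

```java
// Sketch of the non-stuck preference in isBetterSelection:
// more files wins; with equal file counts, smaller total size wins.
public class BetterSelectionSketch {
    static boolean isBetter(int bestCount, long bestSize, int count, long size) {
        return count > bestCount || (count == bestCount && size < bestSize);
    }

    public static void main(String[] args) {
        System.out.println(isBetter(3, 300, 4, 500)); // true: 4 files beats 3
        System.out.println(isBetter(3, 300, 3, 200)); // true: same count, less IO
        System.out.println(isBetter(3, 300, 2, 100)); // false: fewer files
    }
}
```

The stuck branch instead compares files-removed-per-byte-read, and only replaces the current best when the new candidate is at least 5% better (REPLACE_IF_BETTER_BY = 1.05), which biases toward earlier, smaller selections.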
That covers the main algorithm; the remaining details and optimizations are discussed below.
The ratio used in the filesInRatio check defaults to 1.2, but when the off-peak optimization is enabled a different value applies: the off-peak ratio defaults to 5.0. The purpose of this optimization is to compact more data during low-traffic periods. Currently the off-peak window can only be configured as a fixed hour range within the day, which is not very flexible.
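For reference, these knobs are ordinary HBase settings; a minimal hbase-site.xml sketch (the ratio values shown are the defaults, while the off-peak hour range is purely illustrative):

```xml
<!-- Compaction ratio tuning; the off-peak hours below are an example. -->
<property>
  <name>hbase.hstore.compaction.ratio</name>
  <value>1.2</value>
</property>
<property>
  <name>hbase.hstore.compaction.ratio.offpeak</name>
  <value>5.0</value>
</property>
<property>
  <name>hbase.offpeak.start.hour</name>
  <value>2</value> <!-- hour of day, 0-23 -->
</property>
<property>
  <name>hbase.offpeak.end.hour</name>
  <value>6</value> <!-- hour of day, 0-23 -->
</property>
```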
Regarding the mightBeStuck logic in the algorithm: this flag indicates whether compaction might get stuck. It is true when (number of candidate files) - (number of files currently being compacted) + futureFiles >= hbase.hstore.blockingStoreFiles, where futureFiles defaults to 0 and is 1 when files are currently being compacted, and hbase.hstore.blockingStoreFiles defaults to 10 (this setting is also used in the flush path, to be covered in a later analysis of flush). When it is true:
The mightBeStuck optimization effectively guarantees that even with a very large number of files, a minimal selection can still be chosen for compaction, rather than letting files keep accumulating until a suitable combination finally appears.
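The stuck condition described above can be written out as a small sketch (class and parameter names are illustrative; blockingStoreFiles stands for hbase.hstore.blockingStoreFiles):

```java
// Sketch of the mightBeStuck condition:
// candidates - compacting + futureFiles >= blockingStoreFiles,
// where futureFiles is 1 if a compaction is already running, else 0.
public class MightBeStuckSketch {
    static boolean mightBeStuck(int candidateFiles, int filesCompacting,
                                int blockingStoreFiles) {
        int futureFiles = filesCompacting > 0 ? 1 : 0;
        return candidateFiles - filesCompacting + futureFiles >= blockingStoreFiles;
    }

    public static void main(String[] args) {
        // 12 candidates, 3 compacting: 12 - 3 + 1 = 10 >= 10 -> might be stuck
        System.out.println(mightBeStuck(12, 3, 10)); // true
        // 8 candidates, none compacting: 8 - 0 + 0 = 8 < 10 -> fine
        System.out.println(mightBeStuck(8, 0, 10));  // false
    }
}
```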
The difference from RatioBasedCompactionPolicy, simply put: RatioBasedCompactionPolicy scans the StoreFile list from start to end and commits to the first sequence that satisfies the ratio condition, whereas ExploringCompactionPolicy scans from start to end while tracking the current best candidate, and ultimately selects a globally optimal list.
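That contrast can be demonstrated with a deliberately simplified sketch over raw file sizes (this is not the actual HBase implementation; in particular, the "RatioBased" side here just takes the first in-ratio window of minFiles files, while the "Exploring" side enumerates every window and keeps the best by the more-files-then-less-IO rule):

```java
import java.util.*;

// Simplified contrast of the two selection strategies over file sizes.
public class PolicyContrastSketch {
    static boolean inRatio(List<Long> w, double ratio) {
        long total = w.stream().mapToLong(Long::longValue).sum();
        for (long s : w) {
            if (s > (total - s) * ratio) {
                return false;
            }
        }
        return true;
    }

    // "RatioBased"-style: commit to the first qualifying window.
    static List<Long> ratioBased(List<Long> files, int minFiles, double ratio) {
        for (int i = 0; i + minFiles <= files.size(); i++) {
            List<Long> w = files.subList(i, i + minFiles);
            if (inRatio(w, ratio)) {
                return w;
            }
        }
        return Collections.emptyList();
    }

    // "Exploring"-style: examine every window, keep the global best
    // (most files; ties broken by least total IO).
    static List<Long> exploring(List<Long> files, int minFiles, double ratio) {
        List<Long> best = Collections.emptyList();
        long bestSize = 0;
        for (int start = 0; start < files.size(); start++) {
            for (int end = start + minFiles; end <= files.size(); end++) {
                List<Long> w = files.subList(start, end);
                if (!inRatio(w, ratio)) {
                    continue;
                }
                long size = w.stream().mapToLong(Long::longValue).sum();
                if (w.size() > best.size()
                        || (w.size() == best.size() && size < bestSize)) {
                    best = w;
                    bestSize = size;
                }
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<Long> files = Arrays.asList(100L, 3L, 4L, 5L, 6L);
        System.out.println(ratioBased(files, 2, 1.2)); // [5, 6]
        System.out.println(exploring(files, 2, 1.2));  // [3, 4, 5, 6]
    }
}
```

On this input the first-match strategy settles for the pair [5, 6], while the exhaustive scan finds the larger window [3, 4, 5, 6], compacting more files for modest IO, which is exactly the improvement ExploringCompactionPolicy aims for.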