Cleaning up HDFS metadata files (fsimage, edits)

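One of the jobs of the EditLogTailer on the Standby NameNode (sbn) is to trigger rollEditLog on the Active NameNode (nn). Every ${dfs.ha.tail-edits.period} seconds (default 60), the EditLogTailer checks whether more than ${dfs.ha.log-roll.period} seconds (default 120) have passed since the last roll. If so, it sends an RPC request asking nn to run rollEditLog. nn receives the request in NameNodeRpcServer and hands it to FSNamesystem, which does the following: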

  1. Finalize the edits file that is currently being written: edits_inprogress_${begTxid} is finalized to edits_${begTxid}-${endTxid}
  2. Trigger rollEditLog on all JournalNodes (jn)
  3. Create a new edits file (edits_inprogress_${endTxid+1}) to record new transactions
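
The periodic check described above can be sketched as follows. This is a minimal illustration, not the Hadoop EditLogTailer itself: the class name RollCheckSketch, its methods, and the Runnable standing in for the RPC to nn are all hypothetical.

```java
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch of the periodic roll check; not the Hadoop class itself. */
final class RollCheckSketch {
    private final long logRollPeriodMs; // models dfs.ha.log-roll.period
    private long lastRollMs;            // time at which the last roll was triggered

    RollCheckSketch(long logRollPeriodSeconds, long nowMs) {
        this.logRollPeriodMs = TimeUnit.SECONDS.toMillis(logRollPeriodSeconds);
        this.lastRollMs = nowMs;
    }

    /** True when more than the roll period has elapsed since the last roll. */
    boolean tooLongSinceLastRoll(long nowMs) {
        return nowMs - lastRollMs > logRollPeriodMs;
    }

    /** Called once per tail period; returns true if a roll was requested. */
    boolean tick(long nowMs, Runnable triggerActiveRoll) {
        if (!tooLongSinceLastRoll(nowMs)) {
            return false;
        }
        triggerActiveRoll.run(); // stands in for the RPC to the active NameNode
        lastRollMs = nowMs;
        return true;
    }
}
```

With a 120-second roll period and a 60-second tail period, a roll is requested on roughly every third tick in this sketch, because the comparison is strict.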

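rollEditLog periodically splits the transaction log into separate small files. Over time this inevitably produces a large number of files on nn and on every jn. Once a checkpoint has been taken, most of these files are no longer needed, so a cleanup policy is required.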

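After sbn completes a checkpoint (see the checkpoint process), it triggers sbn, nn, and jn to clean up the historical metadata files kept on disk: fsimage files and edits files. Cleanup simply deletes the fsimage and edits files that fall outside the retention range defined by the configured transaction retention policy and image retention policy.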

Determining the cleanup range and cleaning up

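The entry point is FSImage's purgeOldStorage(NameNodeFile nnf) method. The component that actually manages the fsimage and edits files is NNStorageRetentionManager, which uses the configuration to work out which files to keep and which to delete.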

public static final String  DFS_NAMENODE_NUM_CHECKPOINTS_RETAINED_KEY = "dfs.namenode.num.checkpoints.retained";
public static final int     DFS_NAMENODE_NUM_CHECKPOINTS_RETAINED_DEFAULT = 2;
public static final String  DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_KEY = "dfs.namenode.num.extra.edits.retained";
public static final int     DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_DEFAULT = 1000000; //1M
public static final String  DFS_NAMENODE_MAX_EXTRA_EDITS_SEGMENTS_RETAINED_KEY = "dfs.namenode.max.extra.edits.segments.retained";
public static final int     DFS_NAMENODE_MAX_EXTRA_EDITS_SEGMENTS_RETAINED_DEFAULT = 10000; // 10k

// NNStorageRetentionManager reads the settings above in its constructor
public NNStorageRetentionManager(
      Configuration conf,
      NNStorage storage,
      LogsPurgeable purgeableLogs,
      StoragePurger purger) {
    this.numCheckpointsToRetain = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_NUM_CHECKPOINTS_RETAINED_KEY,
        DFSConfigKeys.DFS_NAMENODE_NUM_CHECKPOINTS_RETAINED_DEFAULT);
    this.numExtraEditsToRetain = conf.getLong(
        DFSConfigKeys.DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_KEY,
        DFSConfigKeys.DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_DEFAULT);
    this.maxExtraEditsSegmentsToRetain = conf.getInt(
        DFSConfigKeys.DFS_NAMENODE_MAX_EXTRA_EDITS_SEGMENTS_RETAINED_KEY,
        DFSConfigKeys.DFS_NAMENODE_MAX_EXTRA_EDITS_SEGMENTS_RETAINED_DEFAULT);
    Preconditions.checkArgument(numCheckpointsToRetain > 0,
        "Must retain at least one checkpoint");
    Preconditions.checkArgument(numExtraEditsToRetain >= 0,
        DFSConfigKeys.DFS_NAMENODE_NUM_EXTRA_EDITS_RETAINED_KEY +
        " must not be negative");

    this.storage = storage;
    this.purgeableLogs = purgeableLogs;
    this.purger = purger;
  }

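Step 1: determine the cleanup range for fsimage files. NNStorageRetentionManager uses getImageTxIdToRetain() to collect the txids of all fsimage files in a sorted set. From the configured number of fsimage files to retain (numCheckpointsToRetain) and the size of that set, it works out the start of the retention range, minTxId. fsimage files with txid >= minTxId are kept; those with txid < minTxId are deleted.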

private long getImageTxIdToRetain(FSImageTransactionalStorageInspector inspector) {
    // Find all fsimage files, parse the txid from each file name, and build FSImageFile objects
    List<FSImageFile> images = inspector.getFoundImages();
    // A TreeSet keeps the txids in imageTxIds sorted in ascending order
    TreeSet<Long> imageTxIds = Sets.newTreeSet();
    for (FSImageFile image : images) {
      imageTxIds.add(image.getCheckpointTxId());
    }

    List<Long> imageTxIdsList = Lists.newArrayList(imageTxIds);
    if (imageTxIdsList.isEmpty()) {
      return 0;
    }
    // Reverse into descending txid order (newest image first) so the cut-off position is easy to find
    Collections.reverse(imageTxIdsList);
    int toRetain = Math.min(numCheckpointsToRetain, imageTxIdsList.size());    
    long minTxId = imageTxIdsList.get(toRetain - 1);
    // fsimage files with txid < minTxId will be deleted; the rest are kept
    LOG.info("Going to retain " + toRetain + " images with txid >= " +
        minTxId);
    return minTxId;
  }
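
As a worked example of the selection above, the following self-contained sketch repeats the same arithmetic on plain lists. The class name ImageRetentionExample and the sample txids are illustrative assumptions; only the logic mirrors getImageTxIdToRetain.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Illustrative re-statement of the minTxId selection; names are hypothetical. */
final class ImageRetentionExample {
    static long imageTxIdToRetain(List<Long> checkpointTxIds, int numCheckpointsToRetain) {
        // De-duplicate and sort ascending, then reverse so the newest image comes first.
        List<Long> sorted = new ArrayList<>(new TreeSet<>(checkpointTxIds));
        if (sorted.isEmpty()) {
            return 0;
        }
        Collections.reverse(sorted);
        int toRetain = Math.min(numCheckpointsToRetain, sorted.size());
        // The oldest image that is still retained marks the cut-off.
        return sorted.get(toRetain - 1);
    }

    public static void main(String[] args) {
        // Four images on disk, two to retain: images 300 and 400 stay, 100 and 200 go.
        List<Long> txIds = Arrays.asList(100L, 200L, 300L, 400L);
        System.out.println(imageTxIdToRetain(txIds, 2)); // prints 300
    }
}
```

When fewer images exist than numCheckpointsToRetain, the oldest existing image becomes the cut-off and nothing is deleted.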

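Step 2: clean up the fsimage files. NNStorageRetentionManager calls purgeCheckpointsOlderThan() to do this. It walks every fsimage file in the storage directories and deletes any file whose txid is smaller than minTxId. The deletion itself is done by DeletionStoragePurger, which first deletes the fsimage file and then the companion file that stores its md5 value.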
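
The deletion order described here can be sketched with plain java.nio. This is not DeletionStoragePurger: the class ImagePurgeSketch, its method, and the assumption that files are named fsimage_<txid> with a sibling fsimage_<txid>.md5 are illustrative.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: delete fsimage files older than minTxId, then their .md5 files. */
final class ImagePurgeSketch {
    private static final Pattern IMAGE_NAME = Pattern.compile("fsimage_(\\d+)");

    static List<String> purgeImagesOlderThan(Path dir, long minTxId) throws IOException {
        // Collect candidates first so nothing is deleted while the directory is being listed.
        List<Path> candidates = new ArrayList<>();
        try (DirectoryStream<Path> stream = Files.newDirectoryStream(dir, "fsimage_*")) {
            for (Path p : stream) {
                candidates.add(p);
            }
        }
        List<String> deleted = new ArrayList<>();
        for (Path p : candidates) {
            String name = p.getFileName().toString();
            Matcher m = IMAGE_NAME.matcher(name);
            if (!m.matches()) {
                continue; // skips the .md5 companions; they are removed with their image
            }
            if (Long.parseLong(m.group(1)) < minTxId) {
                Files.deleteIfExists(p);                              // image first
                Files.deleteIfExists(p.resolveSibling(name + ".md5")); // then its md5 file
                deleted.add(name);
            }
        }
        Collections.sort(deleted);
        return deleted;
    }
}
```

Calling purgeImagesOlderThan(dir, minTxId) with the minTxId from step 1 returns the names of the images that were removed.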

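Step 3: determine the cleanup range for edits files. The cleanup position purgeLogsFrom is derived from the minTxId computed in step 1 (minImageTxId), the configured number of extra transactions to retain, numExtraEditsToRetain (default 1000000), and the configured maximum number of extra edits segments to retain, maxExtraEditsSegmentsToRetain (default 10000).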

    // If fsimage_N is the image we want to keep, then we need to keep
    // all txns > N. We can remove anything < N+1, since fsimage_N
    // reflects the state up to and including N. However, we also
    // provide a "cushion" of older txns that we keep, which is
    // handy for HA, where a remote node may not have as many
    // new images.
    //
    // First, determine the target number of extra transactions to retain based
    // on the configured amount.
    long minimumRequiredTxId = minImageTxId + 1;
    long purgeLogsFrom = Math.max(0, minimumRequiredTxId - numExtraEditsToRetain);

    ArrayList<EditLogInputStream> editLogs = new ArrayList<EditLogInputStream>();
    purgeableLogs.selectInputStreams(editLogs, purgeLogsFrom, false);
    Collections.sort(editLogs, new Comparator<EditLogInputStream>() {
      @Override
      public int compare(EditLogInputStream a, EditLogInputStream b) {
        return ComparisonChain.start()
            .compare(a.getFirstTxId(), b.getFirstTxId())
            .compare(a.getLastTxId(), b.getLastTxId())
            .result();
      }
    });

    // Remove from consideration any edit logs that are in fact required.
    while (editLogs.size() > 0 &&
        editLogs.get(editLogs.size() - 1).getFirstTxId() >= minimumRequiredTxId) {
      editLogs.remove(editLogs.size() - 1);
    }

    // Next, adjust the number of transactions to retain if doing so would mean
    // keeping too many segments around.
    while (editLogs.size() > maxExtraEditsSegmentsToRetain) {
      purgeLogsFrom = editLogs.get(0).getLastTxId() + 1;
      editLogs.remove(0);
    }

    // Finally, ensure that we're not trying to purge any transactions that we
    // actually need.
    if (purgeLogsFrom > minimumRequiredTxId) {
      throw new AssertionError("Should not purge more edits than required to "
          + "restore: " + purgeLogsFrom + " should be <= "
          + minimumRequiredTxId);
    }

    purgeableLogs.purgeLogsOlderThan(purgeLogsFrom);
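
To make the arithmetic concrete, here is a self-contained sketch of the same three adjustments on plain data. The Segment class, the class name EditsRetentionExample, and the way selectInputStreams is modelled (keeping every segment whose last txid is at or after purgeLogsFrom) are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

/** Illustrative re-statement of the purgeLogsFrom calculation; names are hypothetical. */
final class EditsRetentionExample {
    /** A finalized edits segment covering [firstTxId, lastTxId]. */
    static final class Segment {
        final long firstTxId;
        final long lastTxId;
        Segment(long firstTxId, long lastTxId) {
            this.firstTxId = firstTxId;
            this.lastTxId = lastTxId;
        }
    }

    static long purgeLogsFrom(long minImageTxId, long numExtraEditsToRetain,
                              int maxExtraEditsSegmentsToRetain, List<Segment> all) {
        long minimumRequiredTxId = minImageTxId + 1;
        long purgeLogsFrom = Math.max(0, minimumRequiredTxId - numExtraEditsToRetain);

        // Assumed model of selectInputStreams: segments holding txids >= purgeLogsFrom.
        List<Segment> extra = new ArrayList<>();
        for (Segment s : all) {
            if (s.lastTxId >= purgeLogsFrom) {
                extra.add(s);
            }
        }
        extra.sort(Comparator.comparingLong((Segment s) -> s.firstTxId)
                .thenComparingLong(s -> s.lastTxId));

        // Segments that are required for recovery are not "extra"; drop them from the count.
        while (!extra.isEmpty()
                && extra.get(extra.size() - 1).firstTxId >= minimumRequiredTxId) {
            extra.remove(extra.size() - 1);
        }
        // Too many extra segments: move the purge point forward, oldest segment first.
        while (extra.size() > maxExtraEditsSegmentsToRetain) {
            purgeLogsFrom = extra.get(0).lastTxId + 1;
            extra.remove(0);
        }
        if (purgeLogsFrom > minimumRequiredTxId) {
            throw new AssertionError("Should not purge more edits than required to restore");
        }
        return purgeLogsFrom;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(new Segment(1, 100), new Segment(101, 200),
                new Segment(201, 300), new Segment(301, 400), new Segment(401, 500));
        // Oldest retained image is fsimage_400; keep 250 extra transactions.
        System.out.println(purgeLogsFrom(400, 250, 10, segments)); // prints 151
        // Same, but at most 2 extra segments may be kept.
        System.out.println(purgeLogsFrom(400, 250, 2, segments));  // prints 201
    }
}
```

In the first call only the segment 1-100 lies entirely below the purge point. In the second call the segment cap pushes the purge point past 101-200 as well.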

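Step 4: clean up the edits files. This is the purgeableLogs.purgeLogsOlderThan(purgeLogsFrom) call at the end of the code above.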

When cleanup happens

sbn: sbn runs a cleanup every time it completes a checkpoint.

(Screenshot: sbn log)

nn: after sbn completes a checkpoint, it uploads the new fsimage file to nn. nn receives the uploaded fsimage file through ImageServlet and then calls purgeOldStorage(NameNodeFile nnf) on its FSImage to clean up.

(Screenshot: nn log)

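jn: during its own cleanup, nn sends every jn an RPC request carrying the cleanup position minTxIdToKeep. On receiving the request, each jn deletes all edits files whose txids are below minTxIdToKeep.

(Screenshot: jn log)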
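
The selection a jn performs on receiving minTxIdToKeep can be sketched as below. This is a simplified illustration rather than the JournalNode code: the class EditsPurgeSelectionSketch is hypothetical, it assumes finalized files are named edits_<firstTxId>-<lastTxId>, and it leaves edits_inprogress files alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: pick the finalized edits files that lie entirely below minTxIdToKeep. */
final class EditsPurgeSelectionSketch {
    private static final Pattern FINALIZED = Pattern.compile("edits_(\\d+)-(\\d+)");

    static List<String> selectForPurge(List<String> fileNames, long minTxIdToKeep) {
        List<String> toPurge = new ArrayList<>();
        for (String name : fileNames) {
            Matcher m = FINALIZED.matcher(name);
            if (!m.matches()) {
                continue; // in-progress segments and unrelated files are ignored in this sketch
            }
            long lastTxId = Long.parseLong(m.group(2));
            // A segment is removable only if every transaction in it is older than the cut-off.
            if (lastTxId < minTxIdToKeep) {
                toPurge.add(name);
            }
        }
        return toPurge;
    }
}
```

A segment that straddles the cut-off is kept whole, which is why a jn may hold slightly more transactions than the retention settings strictly require.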
