flink1.7 checkpoint源碼分析

初始化state類
//org.apache.flink.streaming.runtime.tasks.StreamTask#initializeState
initializeState();
private void initializeState() throws Exception {

StreamOperator<?>[] allOperators = operatorChain.getAllOperators();

for (StreamOperator<?> operator : allOperators) {
if (null != operator) {
operator.initializeState();
}
}
}
operator.initializeState() 調用的方法路徑 org.apache.flink.streaming.api.operators.AbstractStreamOperator#initializeState() ,全部的操做流類都繼承該類,同時也沒有重寫這個方法。
public final void initializeState() throws Exception {
////這裏會調用狀態後端,裏面很重要
1. final StreamOperatorStateContext context =
streamTaskStateManager.streamOperatorStateContext(
getOperatorID(),
getClass().getSimpleName(),
this,
keySerializer,
streamTaskCloseableRegistry,
metrics);
...
|
streamTaskStateManager.streamOperatorStateContext(......)調用方法的路徑org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl#streamOperatorStateContext
......
// -------------- Keyed State Backend 這裏是重點 關於checkpoint--------------
keyedStatedBackend = keyedStatedBackend(
keySerializer,
operatorIdentifierText,
prioritizedOperatorSubtaskStates,
streamTaskCloseableRegistry,
metricGroup);

// -------------- Operator State Backend 這裏是重點 關於checkpoint --------------
operatorStateBackend = operatorStateBackend(
operatorIdentifierText,
prioritizedOperatorSubtaskStates,
streamTaskCloseableRegistry);
......
keyedStatedBackend() 這個方法最裏面是調用了 org.apache.flink.streaming.api.operators.BackendRestorerProcedure#attemptCreateAndRestore
private T attemptCreateAndRestore(Collection restoreState) throws Exception {
......
// create a new, empty backend.
final T backendInstance = instanceSupplier.get();

// attempt to restore from snapshot (or null if no state was checkpointed).
backendInstance.restore(restoreState);
......
}
backendInstance.restore(restoreState)調用的方法路徑org.apache.flink.runtime.state.DefaultOperatorStateBackend#restore
// registeredOperatorStates這個對象是核心
...
PartitionableListState<?> listState = registeredOperatorStates.get(restoredSnapshot.getName());

if (null == listState) {
listState = new PartitionableListState<>(restoredMetaInfo);
//重點,這裏就是存儲了快照狀態類
//********************************************************************
registeredOperatorStates.put(listState.getStateMetaInfo().getName(), listState);
//********************************************************************
} else {
// TODO with eager state registration in place, check here for serializer migration strategies
}
...
triggerCheckpoint 將定時觸發執行checkpoint,而上面是是初始化的執行邏輯
apache

定時快照state類
org.apache.flink.runtime.checkpoint.CheckpointCoordinator#triggerCheckpoint(long, boolean) 
......
// send the messages to the tasks that trigger their checkpoint 我猜想這裏就是遠程發送觸發checkpoint的步驟 這裏進行的數據文件的生成奶奶的
for (Execution execution: executions) {
execution.triggerCheckpoint(checkpointID, timestamp, checkpointOptions);
}
......
execution.triggerCheckpoint調用路徑 org.apache.flink.runtime.executiongraph.Execution#triggerCheckpoint
/**
* Trigger a new checkpoint on the task of this execution.
* @param checkpointId of th checkpoint to trigger
* @param timestamp of the checkpoint to trigger
* @param checkpointOptions of the checkpoint to trigger
*/
public void triggerCheckpoint(long checkpointId, long timestamp, CheckpointOptions checkpointOptions) {
      ......
final LogicalSlot slot = assignedResource;
if (slot != null) {
final TaskManagerGateway taskManagerGateway = slot.getTaskManagerGateway();
taskManagerGateway.triggerCheckpoint(attemptId, getVertex().getJobId(), checkpointId, timestamp, checkpointOptions);
}
      .....
}
taskManagerGateway.triggerCheckpoint(......)裏面最終調用路徑 org.apache.flink.runtime.taskexecutor.TaskExecutor#triggerCheckpoint
@Override
public CompletableFuture triggerCheckpoint(
ExecutionAttemptID executionAttemptID,long checkpointId,long checkpointTimestamp,CheckpointOptions checkpointOptions) {
  ......
final Task task = taskSlotTable.getTask(executionAttemptID);
if (task != null) {
task.triggerCheckpointBarrier(checkpointId, checkpointTimestamp, checkpointOptions);

return CompletableFuture.completedFuture(Acknowledge.get());
}
  ......
}
task.triggerCheckpointBarrier(......)調用路徑 org.apache.flink.runtime.taskmanager.Task#triggerCheckpointBarrier
/**
後端

  • Calls the invokable to trigger a checkpoint.
  • 這裏開始出發執行checkpoint,應該算是入口了,會調用org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpoint
  • AsyncCheckpointRunnable 任務在裏面被執行
  • @param checkpointID The ID identifying the checkpoint.
  • @param checkpointTimestamp The timestamp associated with the checkpoint.
  • @param checkpointOptions Options for performing this checkpoint.
    */
    public void triggerCheckpointBarrier(
    final long checkpointID,
    long checkpointTimestamp,
    final CheckpointOptions checkpointOptions) {

    final AbstractInvokable invokable = this.invokable;
    final CheckpointMetaData checkpointMetaData = new CheckpointMetaData(checkpointID, checkpointTimestamp);

    if (executionState == ExecutionState.RUNNING && invokable != null) {

    // build a local closure
    final String taskName = taskNameWithSubtask;
    final SafetyNetCloseableRegistry safetyNetCloseableRegistry =
    FileSystemSafetyNet.getSafetyNetCloseableRegistryForThread();

    Runnable runnable = new Runnable() {
    @Override
    public void run() {
    // set safety net from the task's context for checkpointing thread
    LOG.debug("Creating FileSystem stream leak safety net for {}", Thread.currentThread().getName());
    FileSystemSafetyNet.setSafetyNetCloseableRegistryForThread(safetyNetCloseableRegistry);

    try {
    boolean success = invokable.triggerCheckpoint(checkpointMetaData, checkpointOptions);
    ......
    }
      ......
    }
    };
    //建立線程數爲1的線程池,提交runnable任務運行
    executeAsyncCallRunnable(runnable, String.format("Checkpoint Trigger for %s (%s).", taskNameWithSubtask, executionId));
    }
    }
    invokable.triggerCheckpoint(.....)裏面最終調用的方法鏈以下:
    org.apache.flink.streaming.runtime.tasks.StreamTask#triggerCheckpoint
    org.apache.flink.streaming.runtime.tasks.StreamTask#performCheckpoint
    // we can do a checkpoint

    // All of the following steps happen as an atomic step from the perspective of barriers and
    // records/watermarks/timers/callbacks.
    // We generally try to emit the checkpoint barrier as soon as possible to not affect downstream
    // checkpoint alignments

    // Step (1): Prepare the checkpoint, allow operators to do some pre-barrier work.
    //           The pre-barrier work should be nothing or minimal in the common case.
    operatorChain.prepareSnapshotPreBarrier(checkpointMetaData.getCheckpointId());

    // Step (2): Send the checkpoint barrier downstream 生成狀態數據 存儲數據的對象爲checkpointOptions 尼瑪 今天debug沒有生成數據呦
    operatorChain.broadcastCheckpointBarrier(
    checkpointMetaData.getCheckpointId(),
    checkpointMetaData.getTimestamp(),
    checkpointOptions);

    // Step (3): Take the state snapshot. This should be largely asynchronous, to not
    //           impact progress of the streaming topology
    checkpointState(checkpointMetaData, checkpointOptions, checkpointMetrics);
    checkpointState(......) 裏面最終調用org.apache.flink.streaming.runtime.tasks.StreamTask.CheckpointingOperation#executeCheckpointing()
    重點警惕線.....................................................
    ......
    //調用用戶的快照方法
    for (StreamOperator<?> op : allOperators) {//不一樣的算子對應的子類不同,
    checkpointStreamOperator(op);
    }
    //後面生成數據,哪裏生成數據了,要找到

    //這個run任務好像只生成元數據
    // we are transferring ownership over snapshotInProgressList for cleanup to the thread, active on submit
    AsyncCheckpointRunnable asyncCheckpointRunnable = new AsyncCheckpointRunnable(
    owner,
    operatorSnapshotsInProgress,
    checkpointMetaData,
    checkpointMetrics,
    startAsyncPartNano);

    owner.cancelables.registerCloseable(asyncCheckpointRunnable);
    owner.asyncOperationsThreadPool.submit(asyncCheckpointRunnable;
    ......
  1. checkpointStreamOperator(op);

private void checkpointStreamOperator(StreamOperator<?> op) throws Exception {
if (null != op) {
       //這個構造方法是核心
OperatorSnapshotFutures snapshotInProgress = op.snapshotState(
checkpointMetaData.getCheckpointId(),
checkpointMetaData.getTimestamp(),
checkpointOptions,
storageLocation);
operatorSnapshotsInProgress.put(op.getOperatorID(), snapshotInProgress);
}
}
op.snapshotState()是核心,調用org.apache.flink.streaming.api.operators.AbstractStreamOperator#snapshotState(long, long, org.apache.flink.runtime.checkpoint.CheckpointOptions, org.apache.flink.runtime.state.CheckpointStreamFactory)
注意由於op是子類,有些累實現AbstractStreamOperator有些子類實現AbstractUdfStreamOperator,因此在下面調用snapshotState(snapshotContext)方法時,會根據子類的實現不一樣,調用org.apache.flink.streaming.api.operators.AbstractStreamOperator#snapshotState(org.apache.flink.runtime.state.StateSnapshotContext)
或org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator#snapshotState
AbstractStreamOperator 實現類有94個
AbstractUdfStreamOperator實現類有42個
AbstractUdfStreamOperator繼承AbstractStreamOperator
@Override
public final OperatorSnapshotFutures snapshotState(long checkpointId, long timestamp, CheckpointOptions checkpointOptions,
CheckpointStreamFactory factory) throws Exception {

try (StateSnapshotContextSynchronousImpl snapshotContext = new StateSnapshotContextSynchronousImpl(
checkpointId,
timestamp,
factory,
keyGroupRange,
getContainingTask().getCancelables())) {

//繼承AbstractUdfStreamOperator的操做類會調用用戶的快照方法,繼承AbstractStreamOperator的操做類會調用這個方法,可是這個方法沒有作什麼東西。
snapshotState(snapshotContext);
       //上面調用好用戶的快照方法了,就是肯定了狀態類裏面目前的數據了。
       //下面就是如何訪問到狀態類,講狀態內的數據寫入磁盤了。
snapshotInProgress.setKeyedStateRawFuture(snapshotContext.getKeyedStateStreamFuture());
snapshotInProgress.setOperatorStateRawFuture(snapshotContext.getOperatorStateStreamFuture());
//這裏是生產狀態數據文件
if (null != operatorStateBackend) {
System.out.println(Thread.currentThread().getName()+"::這裏將狀態數據寫入文件中");
snapshotInProgress.setOperatorStateManagedFuture(
operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
}
       //這裏是生產狀態數據文件
if (null != keyedStateBackend) {
snapshotInProgress.setKeyedStateManagedFuture(
keyedStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions));
}
}
return snapshotInProgress;
}
operatorStateBackend.snapshot(checkpointId, timestamp, factory, checkpointOptions))調用路徑org.apache.flink.runtime.state.DefaultOperatorStateBackend#snapshot
謎底就在下面
public RunnableFuture<SnapshotResult > snapshot(
long checkpointId,
long timestamp,
@Nonnull CheckpointStreamFactory streamFactory,
@Nonnull CheckpointOptions checkpointOptions) throws Exception {
long syncStartTime = System.currentTimeMillis();

       //這個是超級關鍵的地方,你想知道如何訪問到用戶函數中的狀態類,就在這裏。
RunnableFuture<SnapshotResult > snapshotRunner =
snapshotStrategy.snapshot(checkpointId, timestamp, streamFactory, checkpointOptions);

snapshotStrategy.logSyncCompleted(streamFactory, syncStartTime);
return snapshotRunner;
}
snapshotStrategy.snapshot(checkpointId, timestamp, streamFactory, checkpointOptions)調用路徑,取決於用戶指定的後端狀態,默認調用路徑以下org.apache.flink.runtime.state.DefaultOperatorStateBackend.DefaultOperatorStateBackendSnapshotStrategy#snapshot
DefaultOperatorStateBackendSnapshotStrategy 是DefaultOperatorStateBackend的內部類
public RunnableFuture<SnapshotResult > snapshot(......) throws IOException {
//貌似數據就存在 registeredOperatorStates對象裏面 其實下面的步驟不用研究,就是將狀態數據寫入文件,主要看看這個registeredOperatorStates是怎麼弄到的
//************重點 registeredOperatorStates   對象
final Map<String, PartitionableListState<?>> registeredOperatorStatesDeepCopies =
new HashMap<>(registeredOperatorStates.size());
final Map<String, BackendWritableBroadcastState<?, ?>> registeredBroadcastStatesDeepCopies =
new HashMap<>(registeredBroadcastStates.size());

ClassLoader snapshotClassLoader = Thread.currentThread().getContextClassLoader();
try {
// eagerly create deep copies of the list and the broadcast states (if any)
// in the synchronous phase, so that we can use them in the async writing.
//entry.getValue() 裏面就是狀態類 將狀態類存儲在新建的map對象中
if (!registeredOperatorStates.isEmpty()) {
for (Map.Entry<String, PartitionableListState<?>> entry : registeredOperatorStates.entrySet()) {
PartitionableListState<?> listState = entry.getValue();
if (null != listState) {
listState = listState.deepCopy();
}
registeredOperatorStatesDeepCopies.put(entry.getKey(), listState);
}
}
//廣播狀態
if (!registeredBroadcastStates.isEmpty()) {
for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry : registeredBroadcastStates.entrySet()) {
BackendWritableBroadcastState<?, ?> broadcastState = entry.getValue();
if (null != broadcastState) {
broadcastState = broadcastState.deepCopy();
}
registeredBroadcastStatesDeepCopies.put(entry.getKey(), broadcastState);
}
}
}
api

//這個方法裏面生成了狀態數據文件
        AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>> snapshotCallable =
            new AsyncSnapshotCallable<SnapshotResult<OperatorStateHandle>>() {


@Override
protected SnapshotResult callInternal() throws Exception {
......
// get the registered operator state infos ...
List operatorMetaInfoSnapshots =
new ArrayList<>(registeredOperatorStatesDeepCopies.size());

for (Map.Entry<String, PartitionableListState<?>> entry :
registeredOperatorStatesDeepCopies.entrySet()) {
operatorMetaInfoSnapshots.add(entry.getValue().getStateMetaInfo().snapshot());
}

// ... get the registered broadcast operator state infos ...
List broadcastMetaInfoSnapshots =
new ArrayList<>(registeredBroadcastStatesDeepCopies.size());

for (Map.Entry<String, BackendWritableBroadcastState<?, ?>> entry :
registeredBroadcastStatesDeepCopies.entrySet()) {
broadcastMetaInfoSnapshots.add(entry.getValue().getStateMetaInfo().snapshot());
}

// ... write them all in the checkpoint stream ...
DataOutputView dov = new DataOutputViewStreamWrapper(localOut);

OperatorBackendSerializationProxy backendSerializationProxy =
new OperatorBackendSerializationProxy(operatorMetaInfoSnapshots, broadcastMetaInfoSnapshots);

backendSerializationProxy.write(dov);

// ... and then go for the states ...

......
}
};

final FutureTask<SnapshotResult > task =
snapshotCallable.toAsyncSnapshotFutureTask(closeStreamOnCancelRegistry);

if (!asynchronousSnapshots) {
task.run();
}

return task;
}
}
從上面咱們能夠看到,狀態類都存放在registeredOperatorStatesDeepCopies這個map中。
用戶可以更新狀態類的數據都是由於這樣訪問到了狀態類
public void initializeState(FunctionInitializationContext context) throws Exception {
......
checkpointedState = context.getOperatorStateStore().getListState(descriptor);
......
}
調用的就是org.apache.flink.runtime.state.DefaultOperatorStateBackend#getListState(org.apache.flink.api.common.state.ListStateDescriptor )
/
* @Description: 返回狀態類的時候,將狀態類放入map對象供後面寫入文件中
* @Param:
* @return:
* @Author: intsmaze
* @Date: 2019/1/18
/
private ListState getListState(
ListStateDescriptor stateDescriptor,
OperatorStateHandle.Mode mode) throws StateMigrationException {
@SuppressWarnings("unchecked")
PartitionableListState previous = (PartitionableListState) accessedStatesByName.get(name);
if (previous != null) {
checkStateNameAndMode(
previous.getStateMetaInfo().getName(),
name,
previous.getStateMetaInfo().getAssignmentMode(),
mode);
return previous;
}
      ......
PartitionableListState partitionableListState = (PartitionableListState) registeredOperatorStates.get(name);

if (null == partitionableListState) {
// no restored state for the state name; simply create new state holder
partitionableListState = new PartitionableListState<>(
new RegisteredOperatorStateBackendMetaInfo<>(
name,
partitionStateSerializer,
mode));
//這裏也會存儲狀態類數據registeredOperatorStates這個對象和DefaultOperatorStateBackendSnapshotStrategy類的快照方法訪問的對象共享
//
**********************************************************
registeredOperatorStates.put(name, partitionableListState);
}
app

相關文章
相關標籤/搜索