A look at Storm Trident state

This article takes a look at state in Storm Trident.

StateType

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/StateType.java

public enum StateType {
    NON_TRANSACTIONAL,
    TRANSACTIONAL,
    OPAQUE
}
  • StateType has three values: NON_TRANSACTIONAL (non-transactional), TRANSACTIONAL (transactional), and OPAQUE (opaque transactional)
  • Spouts likewise come in three kinds: non-transactional, transactional, and opaque transactional

State

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/State.java

/**
 * There's 3 different kinds of state:
 *
 * 1. non-transactional: ignores commits, updates are permanent. no rollback. a cassandra incrementing state would be like this 2.
 * repeat-transactional: idempotent as long as all batches for a txid are identical 3. opaque-transactional: the most general kind of state.
 * updates are always done based on the previous version of the value if the current commit = latest stored commit Idempotent even if the
 * batch for a txid can change.
 *
 * repeat transactional is idempotent for transactional spouts opaque transactional is idempotent for opaque or transactional spouts
 *
 * Trident should log warnings when state is idempotent but updates will not be idempotent because of spout
 */
// retrieving is encapsulated in Retrieval interface
public interface State {
    void beginCommit(Long txid); // can be null for things like partitionPersist occuring off a DRPC stream

    void commit(Long txid);
}
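  • non-transactional: commits are ignored, updates are permanent, and there is no rollback; an incrementing state in Cassandra is of this kind. It gives at-most-once or at-least-once semantics
  • repeat-transactional (transactional for short): a batch must keep the same txid whether or not it is replayed, and the tuples in it must not change; each tuple belongs to exactly one batch, and batches never overlap. For state updates, a replay that finds the same txid already stored can simply be skipped. It needs little extra state in the database but is less fault tolerant; it guarantees exactly-once semantics
  • opaque-transactional (opaque for short): the more commonly used kind, and more fault tolerant than transactional. It does not require a tuple to stay in the same batch/txid: a tuple may fail in one batch and succeed in another, yet each tuple is still successfully processed exactly once, in exactly one batch. OpaqueTridentKafkaSpout is an implementation of this kind and can tolerate the loss of a Kafka node. For state updates, a replay that finds the same txid stored must recompute from prevValue and overwrite the current value. It needs more space in the database for state but is more fault tolerant; it guarantees exactly-once semantics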

MapState

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/MapState.java

public interface MapState<T> extends ReadOnlyMapState<T> {
    List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters);

    void multiPut(List<List<Object>> keys, List<T> vals);
}
  • MapState extends the ReadOnlyMapState interface, which in turn extends the State interface
  • The sections below walk through several MapState implementations

NonTransactionalMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/NonTransactionalMap.java

public class NonTransactionalMap<T> implements MapState<T> {
    IBackingMap<T> _backing;

    protected NonTransactionalMap(IBackingMap<T> backing) {
        _backing = backing;
    }

    public static <T> MapState<T> build(IBackingMap<T> backing) {
        return new NonTransactionalMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        return _backing.multiGet(keys);
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<T> curr = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(curr.size());
        for (int i = 0; i < curr.size(); i++) {
            T currVal = curr.get(i);
            ValueUpdater<T> updater = updaters.get(i);
            ret.add(updater.update(currVal));
        }
        _backing.multiPut(keys, ret);
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        _backing.multiPut(keys, vals);
    }

    @Override
    public void beginCommit(Long txid) {
    }

    @Override
    public void commit(Long txid) {
    }
}
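  • NonTransactionalMap wraps an IBackingMap; beginCommit and commit do nothing
  • multiUpdate reads the current values with multiGet, applies each ValueUpdater to build List<T> ret, then writes the result back through IBackingMap's multiPut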
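
Because nothing here records a txid, replaying a batch applies its updates a second time. The following is a minimal sketch of that effect, not Storm code: NonTxCounterSketch, its HashMap backing store, and the add helper are hypothetical stand-ins, with plain String keys instead of List<Object>.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical stand-in for a non-transactional map state; not Storm's class.
class NonTxCounterSketch {
    // Plays the role of the IBackingMap: plain key -> value storage.
    private final Map<String, Long> backing = new HashMap<>();

    // Same shape as multiUpdate: bulk read, apply updater i to value i, bulk write.
    List<Long> multiUpdate(List<String> keys, List<UnaryOperator<Long>> updaters) {
        List<Long> ret = new ArrayList<>(keys.size());
        for (int i = 0; i < keys.size(); i++) {
            Long curr = backing.get(keys.get(i)); // null when the key is new
            ret.add(updaters.get(i).apply(curr));
        }
        for (int i = 0; i < keys.size(); i++) {
            backing.put(keys.get(i), ret.get(i));
        }
        return ret;
    }

    Long get(String key) {
        return backing.get(key);
    }

    // Updater that adds n to the stored count, treating null as 0.
    static UnaryOperator<Long> add(long n) {
        return curr -> (curr == null ? 0L : curr) + n;
    }

    // Applies the same batch twice, as a replay would, and returns the final count.
    static long replayTwice() {
        NonTxCounterSketch state = new NonTxCounterSketch();
        List<String> keys = Arrays.asList("storm");
        state.multiUpdate(keys, Arrays.asList(add(3)));
        state.multiUpdate(keys, Arrays.asList(add(3))); // replay of the same batch
        return state.get("storm");
    }
}
```

In this sketch the replay pushes the count to 6 although the batch only carried 3, which is the at-least-once behaviour a non-transactional state has when batches are replayed.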

TransactionalMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/TransactionalMap.java

public class TransactionalMap<T> implements MapState<T> {
    CachedBatchReadsMap<TransactionalValue> _backing;
    Long _currTx;

    protected TransactionalMap(IBackingMap<TransactionalValue> backing) {
        _backing = new CachedBatchReadsMap(backing);
    }

    public static <T> MapState<T> build(IBackingMap<TransactionalValue> backing) {
        return new TransactionalMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        List<CachedBatchReadsMap.RetVal<TransactionalValue>> vals = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(vals.size());
        for (CachedBatchReadsMap.RetVal<TransactionalValue> retval : vals) {
            TransactionalValue v = retval.val;
            if (v != null) {
                ret.add((T) v.getVal());
            } else {
                ret.add(null);
            }
        }
        return ret;
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<CachedBatchReadsMap.RetVal<TransactionalValue>> curr = _backing.multiGet(keys);
        List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(curr.size());
        List<List<Object>> newKeys = new ArrayList();
        List<T> ret = new ArrayList<T>();
        for (int i = 0; i < curr.size(); i++) {
            CachedBatchReadsMap.RetVal<TransactionalValue> retval = curr.get(i);
            TransactionalValue<T> val = retval.val;
            ValueUpdater<T> updater = updaters.get(i);
            TransactionalValue<T> newVal;
            boolean changed = false;
            if (val == null) {
                newVal = new TransactionalValue<T>(_currTx, updater.update(null));
                changed = true;
            } else {
                if (_currTx != null && _currTx.equals(val.getTxid()) && !retval.cached) {
                    newVal = val;
                } else {
                    newVal = new TransactionalValue<T>(_currTx, updater.update(val.getVal()));
                    changed = true;
                }
            }
            ret.add(newVal.getVal());
            if (changed) {
                newVals.add(newVal);
                newKeys.add(keys.get(i));
            }
        }
        if (!newKeys.isEmpty()) {
            _backing.multiPut(newKeys, newVals);
        }
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        List<TransactionalValue> newVals = new ArrayList<TransactionalValue>(vals.size());
        for (T val : vals) {
            newVals.add(new TransactionalValue<T>(_currTx, val));
        }
        _backing.multiPut(keys, newVals);
    }

    @Override
    public void beginCommit(Long txid) {
        _currTx = txid;
        _backing.reset();
    }

    @Override
    public void commit(Long txid) {
        _currTx = null;
        _backing.reset();
    }
}
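  • TransactionalMap uses a CachedBatchReadsMap<TransactionalValue>, so the stored type is TransactionalValue; beginCommit records the current txid and resets _backing, and commit clears the txid and resets _backing again
  • In multiUpdate, if _currTx is set, equals the txid stored with the value, and !retval.cached holds (the value was not put by this same transaction), the batch has already been applied, so the update is skipped and newVal = val is used
  • multiPut wraps each value in a TransactionalValue and calls CachedBatchReadsMap.multiPut(List<List<Object>> keys, List<T> vals), which writes the values and then puts them in the cache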
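
The skip rule and the role of the cached flag can be sketched in isolation. This is a simplified model under stated assumptions, not Storm's TransactionalMap: TxCounterSketch, TxValue, the two HashMaps standing in for the backing store and the per-batch cache, and the single-key update method are all hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.UnaryOperator;

// Hypothetical, simplified model of the skip rule; not Storm's TransactionalMap.
class TxCounterSketch {
    // Stored value: the txid that last wrote it, plus the value itself.
    static final class TxValue {
        final Long txid;
        final long val;

        TxValue(Long txid, long val) {
            this.txid = txid;
            this.val = val;
        }
    }

    private final Map<String, TxValue> store = new HashMap<>(); // durable storage
    private final Map<String, TxValue> cache = new HashMap<>(); // per-batch cache
    private Long currTx;

    void beginCommit(Long txid) {
        currTx = txid;
        cache.clear();
    }

    void commit(Long txid) {
        currTx = null;
        cache.clear();
    }

    long update(String key, UnaryOperator<Long> updater) {
        boolean cached = cache.containsKey(key);
        TxValue val = cached ? cache.get(key) : store.get(key);
        TxValue newVal;
        if (val == null) {
            newVal = new TxValue(currTx, updater.apply(null));
        } else if (currTx != null && currTx.equals(val.txid) && !cached) {
            // Written by an earlier attempt of this same txid: skip the update.
            return val.val;
        } else {
            newVal = new TxValue(currTx, updater.apply(val.val));
        }
        store.put(key, newVal);
        cache.put(key, newVal); // marks the value as written within this batch
        return newVal.val;
    }

    // Returns the values seen after: first update, second update in the same
    // batch, a replay of txid 1, and a fresh batch with txid 2.
    static long[] demo() {
        TxCounterSketch s = new TxCounterSketch();
        UnaryOperator<Long> add3 = c -> (c == null ? 0L : c) + 3;
        s.beginCommit(1L);
        long first = s.update("storm", add3);  // new key
        long second = s.update("storm", add3); // cached, so applied again
        s.commit(1L);
        s.beginCommit(1L);                     // replay of txid 1
        long replay = s.update("storm", add3); // txid matches, not cached: skipped
        s.commit(1L);
        s.beginCommit(2L);
        long next = s.update("storm", add3);   // newer txid: applied
        s.commit(2L);
        return new long[] {first, second, replay, next};
    }
}
```

Without the cached check, the second update inside batch 1 would be mistaken for a replay and dropped; with it, only a value read from storage with a matching txid is treated as already applied.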

OpaqueMap

storm-2.0.0/storm-client/src/jvm/org/apache/storm/trident/state/map/OpaqueMap.java

public class OpaqueMap<T> implements MapState<T> {
    CachedBatchReadsMap<OpaqueValue> _backing;
    Long _currTx;

    protected OpaqueMap(IBackingMap<OpaqueValue> backing) {
        _backing = new CachedBatchReadsMap(backing);
    }

    public static <T> MapState<T> build(IBackingMap<OpaqueValue> backing) {
        return new OpaqueMap<T>(backing);
    }

    @Override
    public List<T> multiGet(List<List<Object>> keys) {
        List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys);
        List<T> ret = new ArrayList<T>(curr.size());
        for (CachedBatchReadsMap.RetVal<OpaqueValue> retval : curr) {
            OpaqueValue val = retval.val;
            if (val != null) {
                if (retval.cached) {
                    ret.add((T) val.getCurr());
                } else {
                    ret.add((T) val.get(_currTx));
                }
            } else {
                ret.add(null);
            }
        }
        return ret;
    }

    @Override
    public List<T> multiUpdate(List<List<Object>> keys, List<ValueUpdater> updaters) {
        List<CachedBatchReadsMap.RetVal<OpaqueValue>> curr = _backing.multiGet(keys);
        List<OpaqueValue> newVals = new ArrayList<OpaqueValue>(curr.size());
        List<T> ret = new ArrayList<T>();
        for (int i = 0; i < curr.size(); i++) {
            CachedBatchReadsMap.RetVal<OpaqueValue> retval = curr.get(i);
            OpaqueValue<T> val = retval.val;
            ValueUpdater<T> updater = updaters.get(i);
            T prev;
            if (val == null) {
                prev = null;
            } else {
                if (retval.cached) {
                    prev = val.getCurr();
                } else {
                    prev = val.get(_currTx);
                }
            }
            T newVal = updater.update(prev);
            ret.add(newVal);
            OpaqueValue<T> newOpaqueVal;
            if (val == null) {
                newOpaqueVal = new OpaqueValue<T>(_currTx, newVal);
            } else {
                newOpaqueVal = val.update(_currTx, newVal);
            }
            newVals.add(newOpaqueVal);
        }
        _backing.multiPut(keys, newVals);
        return ret;
    }

    @Override
    public void multiPut(List<List<Object>> keys, List<T> vals) {
        List<ValueUpdater> updaters = new ArrayList<ValueUpdater>(vals.size());
        for (T val : vals) {
            updaters.add(new ReplaceUpdater<T>(val));
        }
        multiUpdate(keys, updaters);
    }

    @Override
    public void beginCommit(Long txid) {
        _currTx = txid;
        _backing.reset();
    }

    @Override
    public void commit(Long txid) {
        _currTx = null;
        _backing.reset();
    }

    static class ReplaceUpdater<T> implements ValueUpdater<T> {
        T _t;

        public ReplaceUpdater(T t) {
            _t = t;
        }

        @Override
        public T update(Object stored) {
            return _t;
        }
    }
}
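  • OpaqueMap uses a CachedBatchReadsMap<OpaqueValue>, so the stored type is OpaqueValue; beginCommit records the current txid and resets _backing, and commit clears the txid and resets _backing again
  • Unlike TransactionalMap, multiPut here wraps each value in a ReplaceUpdater and then calls multiUpdate to force an overwrite
  • multiUpdate also differs from TransactionalMap's: it never skips, and always computes newVal by applying the updater to the prev value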
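
The listing above calls val.get(_currTx) and val.update(_currTx, newVal) without showing OpaqueValue itself. The following sketch models one plausible reading of those calls, assuming the stored value keeps a txid, a prev, and a curr: a newer txid builds on curr, while a replay of the stored txid goes back to prev. OpaqueSketch, baseFor, and apply are hypothetical names, the counter is fixed to Long, and the per-batch cache is left out.

```java
// Hypothetical, simplified model of an opaque value; not Storm's OpaqueValue.
final class OpaqueSketch {
    final Long currTxid; // txid of the batch that produced curr
    final Long prev;     // value before that batch was applied
    final Long curr;     // value after that batch was applied

    OpaqueSketch(Long currTxid, Long prev, Long curr) {
        this.currTxid = currTxid;
        this.prev = prev;
        this.curr = curr;
    }

    // The value a batch with the given txid should build on.
    Long baseFor(long txid) {
        if (currTxid == null || currTxid < txid) {
            return curr; // a new batch builds on the latest value
        }
        if (currTxid == txid) {
            return prev; // a replay discards the earlier attempt
        }
        throw new IllegalStateException(
            "batch " + txid + " is behind stored txid " + currTxid);
    }

    // Adds delta on top of the right base and remembers that base as prev.
    OpaqueSketch apply(long txid, long delta) {
        Long base = baseFor(txid);
        long next = (base == null ? 0L : base) + delta;
        return new OpaqueSketch(txid, base, next);
    }

    // Replays txid 2 with different batch contents and returns the final count.
    static long demo() {
        OpaqueSketch v = new OpaqueSketch(null, null, null);
        v = v.apply(1L, 3); // txid 1 contributes 3
        v = v.apply(2L, 4); // txid 2, first attempt, contributes 4
        v = v.apply(2L, 5); // txid 2 replayed with different tuples: rebuilt from prev
        v = v.apply(3L, 1); // txid 3 contributes 1
        return v.curr;
    }
}
```

In this model the first attempt of txid 2 leaves no trace: the final count is 3 + 5 + 1 = 9, which is why an opaque state stays exactly-once even when a replayed batch contains different tuples, at the cost of storing prev alongside curr.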

Summary

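  • Trident updates state strictly in batch order; for example, the batch with txid 3 can only be processed after the batch with txid 2 has finished
  • State comes in three kinds: non-transactional, transactional, and opaque transactional; spouts come in the same three kinds
    • non-transactional cannot guarantee exactly-once; it may be at-least-once or at-most-once. For its state computation see NonTransactionalMap, where beginCommit and commit do nothing
    • transactional guarantees exactly-once but has strict requirements: a replayed batch must keep the same txid and the same tuples, so it is less fault tolerant. Its state computation is comparatively simple; see TransactionalMap, which skips the update when the stored value carries the same txid
    • opaque transactional also guarantees exactly-once. It allows a tuple that failed in one batch to be processed in another, so it is more fault tolerant, but its state computation has to store the prev value as well; see OpaqueMap, which on meeting the same txid recomputes from the prev value and overwrites the current one
  • Trident already encapsulates the state computation needed for exactly-once, so in use it is enough to pass the appropriate StateFactory to persistentAggregate. A factory that supports several StateTypes can take a StateType property, so passing a different argument builds a state with different transactional behaviour. A custom state factory can be written by implementing StateFactory; a custom stateQuery lookup can be written by extending BaseQueryFunction; and a custom update can be written by extending BaseStateUpdater and passing it to partitionPersist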
