mxnet's (github-mxnet) python interface is quite complete: you can train models without ever reading the C++ code. But if you want to study the C++ code, the python training and prediction flow shows exactly how the C++ gets called. In the previous post I explained how mshadow works (mshadow的原理--MXNet); in this post I walk through mxnet's training process and show which C++ interfaces python calls. The C++ interfaces themselves are not explained in much detail here; you can read the source yourself, and later posts may cover them.
Below is a simple mxnet training example. The python debugger used here is Wing Pro; for C++ I recommend Qt Creator, which requires a CMakeLists.txt, and the relevant .so files must be built in Debug mode before they can be debugged.
```python
# -*- coding: utf-8 -*-
import mxnet as mx
import numpy as np
import logging
logging.getLogger().setLevel(logging.DEBUG)

# produce data
def productData(Dim, half_len):
    '''
    produce data for training or eval
    Dim : dimension
    half_len : 2*half_len is the number of training data
    '''
    data = np.append(np.random.uniform(-1, 0, [half_len, Dim]),
                     np.random.uniform(0, 1, [half_len, Dim]), axis = 0)
    label = np.append(np.zeros(half_len), np.ones(half_len))
    return data, label

#get the data
np.random.seed(1)
Dim = 3
train_data, train_label = productData(Dim, 1)
eval_data, eval_label = productData(Dim, 1)

#data iter
batch_size = 1
train_iter = mx.io.NDArrayIter(train_data, train_label, batch_size, shuffle=True)
eval_iter = mx.io.NDArrayIter(eval_data, eval_label, batch_size, shuffle=False)

#input variable
X = mx.sym.Variable('data')
Y = mx.symbol.Variable('softmax_label')

#network config
fc_1 = mx.sym.FullyConnected(data=X, name='fc1', num_hidden = 2)
fc_2 = mx.sym.FullyConnected(data=fc_1, name='fc2', num_hidden = 3)
fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4)
lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax")

#build the model
model = mx.mod.Module(
    symbol = lro,
    data_names=['data'],
    label_names = ['softmax_label']   # network structure
)

#train the model
model.fit(train_iter, eval_iter,
          optimizer_params={'learning_rate':0.5, 'momentum': 0.9},
          num_epoch=1,
          eval_metric='mse',
          batch_end_callback = mx.callback.Speedometer(batch_size, 1))

#predict the result
pre = model.predict(eval_iter).asnumpy()
print np.argmax(pre, axis = 1)
```
The code above is very simple; anyone who has trained models with mxnet's python API can read it at a glance. I won't go through what each line of python means, but rather how this code interacts with mxnet's underlying C++ code. Python talks to C++ through the ctypes library. The mxnet version used here is 0.7; the code structure of other versions should not differ much.
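To make that interaction concrete, here is a minimal sketch, assuming libmxnet.so is already on the loader path (mxnet's own wrapper in base.py resolves the real path and wraps the error handling in check_call(); the snippet below is only an illustration, not the library's wrapper code):

```python
import ctypes

# load the shared library; "libmxnet.so" here is an assumption for illustration
_LIB = ctypes.CDLL("libmxnet.so")
_LIB.MXGetLastError.restype = ctypes.c_char_p
SymbolHandle = ctypes.c_void_p

# every function in c_api.h returns 0 on success and a non-zero code on error
handle = SymbolHandle()
ret = _LIB.MXSymbolCreateVariable(ctypes.c_char_p(b"data"), ctypes.byref(handle))
if ret != 0:
    raise RuntimeError(_LIB.MXGetLastError())
```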
mx.io.NDArrayIter does not call into C++ at all. When a variable symbol (Symbol Variable) is created, however, the MXSymbolCreateVariable function is called. Note that when the python function being called belongs to the mxnet package, it dispatches to the package's corresponding function; all the C++ functions that get called are exposed in c_api.h, with their implementations under ./src/c_api. The call chain is: Variable() (python) --> MXSymbolCreateVariable() (C++) --> CreateVariable() (C++). Let's look at the Symbol class in C++ and the structs related to it:
```cpp
/*!
 * \brief Symbol is used to represent dynamically generated symbolic computation graph.
 *
 * This class is used as a tool to generate computation graphs(aka. configuration) of the network.
 * Symbol is always composite, the head Node is the output node of the symbol.
 * An atomic symbol can be seen as a special case of the composite symbol with only the head node.
 */
class Symbol {
 public:
  ...
 protected:
  // Declare node, internal data structure.
  struct Node;
  /*! \brief an entry that represents output data from a node */
  struct DataEntry {
    /*! \brief the source node of this data */
    std::shared_ptr<Node> source;
    /*! \brief index of output from the source. */
    uint32_t index;
    /*! \brief enabled default copy constructor */
    DataEntry() {}
    /*! \brief constructor from index */
    DataEntry(std::shared_ptr<Node> source, uint32_t index)
        : source(source), index(index) {}
  };
  /*!
   * \brief the head nodes of Symbols
   * This head is only effective when
   */
  std::vector<DataEntry> heads_;
  ...
};

/*!
 * \brief Node is represents node of an operator in the symbolic graph.
 *
 * It stores connection to the inputs to function represented by OperatorProperty
 * NOTE on data structure: there are three types of node:
 * - Normal node: contains all the necessary elements of a graph.
 * - OperatorProperty: the inputs_ is empty, represents an OperatorProperty that has not been applied.
 * - Variable: the sym_ is nullptr, represents an named Variable of tensors that can be composed.
 */
struct Symbol::Node {
  /*! \brief Operator of this node */
  std::unique_ptr<OperatorProperty> op;
  /*! \brief name of the node */
  std::string name;
  /*! \brief inputs to this node */
  std::vector<DataEntry> inputs;
  /*! \brief source node of the current node */
  std::shared_ptr<Symbol::Node> backward_source_node;
  /*!
   * \brief additional attributes about the node,
   *  Use pointer to save space, as attr can be accessed in a slow way,
   *  not every node will have attributes.
   */
  std::unique_ptr<std::map<std::string, std::string> > attr;
  /*!
   * \brief constructor
   * \param op the OperatorProperty to construct the Node
   * \param name the name of the symbol
   */
  explicit Node(OperatorProperty *op, const std::string& name)
      : op(op), name(name) {}
  /*!
   * \brief copy constructor constructor
   */
  explicit Node(const Node& other)
      : name(other.name) {
    if (other.op != nullptr) {
      op.reset(other.op->Copy());
    }
    if (other.attr.get() != nullptr) {
      attr.reset(new std::map<std::string, std::string>(*(other.attr)));
    }
  }
  ~Node() {
    ...
  }
  /*! \return Whether the symbol is atomic */
  inline bool is_atomic() const {
    return inputs.size() == 0 && op != nullptr;
  }
  /*! \return Whether it is unit variable */
  inline bool is_variable() const {
    return op == nullptr && !backward_source_node;
  }
  /*! \return Whether it is backward op */
  inline bool is_backward() const {
    return backward_source_node.get() != nullptr;
  }
};

/*! \return whwther the symbol is atomic */
inline bool Symbol::is_atomic() const {
  return heads_[0].source->is_atomic();
}
```
The inline bool is_variable() function above shows what characterizes a variable. Creating one is also very simple: construct a Symbol and push the initial entry into the heads_ container, as follows:
```cpp
Symbol Symbol::CreateVariable(const std::string &name) {
  Symbol s;
  s.heads_.push_back(DataEntry(std::make_shared<Node>(nullptr, name), 0));
  return s;
}
```
In mxnet, both layers (mx.sym.FullyConnected, mx.sym.SoftmaxOutput, etc.) and variables are Symbols.
The set of layers in mxnet may change over time: whenever a new layer is written in C++, it must first be registered with the dmlc core, and when python imports the symbol module it dynamically loads all the registered layers. Let's first look at how dynamic loading works in plain python, and then at how mxnet's python code does it.
```python
import sys

def fib(n):
    a, b = 0, 1
    result = []
    while(b < n):
        result.append(b)
        a, b = b, a+b
    print(result)

print("load function in here")
setattr(sys.modules[__name__], "FIBC", fib)
```
Suppose the code above is saved as load_test.py. When you import load_test, the first line and the last two lines of the script are executed; the last line binds the name FIBC to fib, so FIBC can be called just like a function. The result looks like this:
```python
>>> import load_test
load function in here
>>> load_test.fib(16)
[1, 1, 2, 3, 5, 8, 13]
>>> load_test.FIBC(16)
[1, 1, 2, 3, 5, 8, 13]
```
So how does mxnet's python achieve the same thing? When the symbol module is imported, _init_symbol_module() runs; this function loads every Symbol registered in the mxnet core. Look at the following two functions:
```python
def _init_symbol_module():
    """List and add all the atomic symbol functions to current module."""
    plist = ctypes.POINTER(ctypes.c_void_p)()
    size = ctypes.c_uint()

    check_call(_LIB.MXSymbolListAtomicSymbolCreators(ctypes.byref(size),
                                                     ctypes.byref(plist)))
    module_obj = sys.modules[__name__]
    module_internal = sys.modules["mxnet._symbol_internal"]
    for i in range(size.value):
        hdl = SymbolHandle(plist[i])
        function = _make_atomic_symbol_function(hdl)
        if function.__name__.startswith('_'):
            setattr(module_internal, function.__name__, function)
        else:
            setattr(module_obj, function.__name__, function)


def _make_atomic_symbol_function(handle):
    """Create an atomic symbol function by handle and funciton name."""
    name = ctypes.c_char_p()
    desc = ctypes.c_char_p()
    key_var_num_args = ctypes.c_char_p()
    num_args = mx_uint()
    arg_names = ctypes.POINTER(ctypes.c_char_p)()
    arg_types = ctypes.POINTER(ctypes.c_char_p)()
    arg_descs = ctypes.POINTER(ctypes.c_char_p)()
    ret_type = ctypes.c_char_p()

    check_call(_LIB.MXSymbolGetAtomicSymbolInfo(
        handle, ctypes.byref(name), ctypes.byref(desc),
        ctypes.byref(num_args),
        ctypes.byref(arg_names),
        ctypes.byref(arg_types),
        ctypes.byref(arg_descs),
        ctypes.byref(key_var_num_args),
        ctypes.byref(ret_type)))
    param_str = ctypes2docstring(num_args, arg_names, arg_types, arg_descs)
    key_var_num_args = py_str(key_var_num_args.value)
    func_name = py_str(name.value)
    desc = py_str(desc.value)
    if key_var_num_args:
        desc += '\nThis function support variable length of positional input.'
    doc_str = ('%s\n\n' +
               '%s\n' +
               'name : string, optional.\n' +
               '    Name of the resulting symbol.\n\n' +
               'Returns\n' +
               '-------\n' +
               'symbol: Symbol\n' +
               '    The result symbol.')
    doc_str = doc_str % (desc, param_str)
    extra_doc = "\n" + '\n'.join([x.__doc__ for x in type.__subclasses__(SymbolDoc)
                                  if x.__name__ == '%sDoc' % func_name])
    doc_str += re.sub(re.compile("    "), "", extra_doc)

    def creator(*args, **kwargs):
        """Activation Operator of Neural Net.
        The parameters listed below can be passed in as keyword arguments.

        Parameters
        ----------
        name : string, required.
            Name of the resulting symbol.

        Returns
        -------
        symbol: Symbol
            the resulting symbol
        """
        param_keys = []
        param_vals = []
        symbol_kwargs = {}
        name = kwargs.pop('name', None)
        attr = kwargs.pop('attr', None)

        if key_var_num_args and key_var_num_args not in kwargs:
            param_keys.append(c_str(key_var_num_args))
            param_vals.append(c_str(str(len(args))))

        for k, v in kwargs.items():
            if isinstance(v, Symbol):
                symbol_kwargs[k] = v
            else:
                param_keys.append(c_str(k))
                param_vals.append(c_str(str(v)))
        # create atomic symbol
        param_keys = c_array(ctypes.c_char_p, param_keys)
        param_vals = c_array(ctypes.c_char_p, param_vals)
        sym_handle = SymbolHandle()
        check_call(_LIB.MXSymbolCreateAtomicSymbol(
            handle,
            mx_uint(len(param_keys)),
            param_keys, param_vals,
            ctypes.byref(sym_handle)))

        if len(args) != 0 and len(symbol_kwargs) != 0:
            raise TypeError(
                '%s can only accept input'
                'Symbols either as positional or keyword arguments, not both' % func_name)
        if key_var_num_args and len(symbol_kwargs) != 0:
            raise ValueError('This function supports variable length of Symbol arguments.\n' +
                             'Please pass all the input Symbols via positional arguments' +
                             ' instead of keyword arguments.')
        s = Symbol(sym_handle)
        attr = AttrScope.current.get(attr)
        if attr:
            s._set_attr(**attr)
        hint = func_name.lower()
        name = NameManager.current.get(name, hint)
        s._compose(*args, name=name, **symbol_kwargs)
        return s

    creator.__name__ = func_name
    creator.__doc__ = doc_str
    return creator
```
MXSymbolListAtomicSymbolCreators returns the array of OperatorPropertyReg objects registered in the core. _make_atomic_symbol_function fetches the information of the corresponding Symbol and returns a creator object; note that creator.__name__ is set to the Symbol's name. setattr(module_obj, function.__name__, function) then writes the returned creator into the module, so once the module is imported you can call the corresponding creator(*args, **kwargs) function directly under that name. As for how a layer is registered with the mxnet core, here is the FullyConnected example:
```cpp
DMLC_REGISTER_PARAMETER(FullyConnectedParam);

MXNET_REGISTER_OP_PROPERTY(FullyConnected, FullyConnectedProp)
.describe("Apply matrix multiplication to input then add a bias.")
.add_argument("data", "Symbol", "Input data to the FullyConnectedOp.")
.add_argument("weight", "Symbol", "Weight matrix.")
.add_argument("bias", "Symbol", "Bias parameter.")
.add_arguments(FullyConnectedParam::__FIELDS__());

struct FullyConnectedParam : public dmlc::Parameter<FullyConnectedParam> {
  int num_hidden;
  bool no_bias;
  DMLC_DECLARE_PARAMETER(FullyConnectedParam) {
    // TODO(bing) add support for boolean
    DMLC_DECLARE_FIELD(num_hidden).set_lower_bound(1)
    .describe("Number of hidden nodes of the output.");
    DMLC_DECLARE_FIELD(no_bias).set_default(false)
    .describe("Whether to disable bias parameter.");
  }
};
```
I am not sure what the best title for this part is; essentially it creates a Symbol for a layer, a Symbol whose Node holds the operator of that layer. The layer calls below all follow the same process, creating one Symbol per layer. As shown above, calling these functions actually calls a creator object, so single-stepping the python code drops you straight into creator(*args, **kwargs). Let's look at what happens inside this function, taking fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4) as the example.
```python
#network config
fc_1 = mx.sym.FullyConnected(data=X, name='fc1', num_hidden = 2)
fc_2 = mx.sym.FullyConnected(data=fc_1, name='fc2', num_hidden = 3)
fc_3 = mx.sym.FullyConnected(data=fc_2, name='fc3', num_hidden = 4)
lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax")
```
Inside creator(*args, **kwargs), the Symbol arguments (here fc_2) are first separated from the non-Symbol arguments (num_hidden, defined in FullyConnectedParam). The non-Symbol parameters are passed to the C++ function MXSymbolCreateAtomicSymbol, which creates the Symbol and hangs the operator on the Symbol's heads_[0].source.
After the Symbol is created, the previous layer's Symbol still has to be attached to this layer, which is what s._compose(*args, name=name, **symbol_kwargs) does. It calls MXSymbolCompose --> Compose in C++. Compose hangs the upstream Symbol objects at the corresponding positions of heads_[0].source->inputs; those positions are determined by this Symbol's heads_[0].source->op->ListArguments(). In this example, fc3.heads_[0].source->inputs[0] = fc2. FullyConnectedProp's ListArguments is shown below; the remaining empty slots are filled with nodes whose op is NULL (from is_variable() above you can see that such a node is a variable), and finally this operator Symbol is returned.
```cpp
std::vector<std::string> ListArguments() const override {
  if (!param_.no_bias) {
    return {"data", "weight", "bias"};
  } else {
    return {"data", "weight"};
  }
}
```
Once lro = mx.sym.SoftmaxOutput(data=fc_3, label=Y, name="softmax") has run, we obtain the network structure diagram shown in Figure 1, although this is still not the computation graph. Here I divide Symbols into two kinds: layers, labelled Symbol:OP, and variables, labelled Symbol:Var.
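You can already see both kinds of Symbol from the python side of the example network; here is a small sketch that uses only list_arguments(), which is discussed in the next section:

```python
# fc_2 was the only Symbol passed to the FullyConnected creator, so the weight and bias
# inputs of every layer were auto-filled with variable Symbols and appear as arguments
print fc_1.list_arguments()   # ['data', 'fc1_weight', 'fc1_bias']
print lro.list_arguments()    # all variables hanging under the whole network,
                              # including 'softmax_label' (the full list appears below)
```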
```python
#build the model
model = mx.mod.Module(
    symbol = lro,
    data_names=['data'],
    label_names = ['softmax_label']   # network structure
)
```
This builds the model. The part of this constructor I want to talk about is arg_names = symbol.list_arguments(), which involves a depth-first search of the graph. It calls MXSymbolListArguments in C++, where the three functions below perform the depth-first search and return the list of variables.
```cpp
std::vector<std::string> Symbol::ListArguments() const {
  std::vector<std::string> ret;
  if (this->is_atomic()) {
    return heads_[0].source->op->ListArguments();
  } else {
    this->DFSVisit([&ret](const std::shared_ptr<Node> &node) {
        if (node->is_variable()) {
          ret.push_back(node->name);
        }
      });
    return ret;
  }
}

template<typename FVisit>
inline void Symbol::DFSVisit(FVisit fvisit) const {
  typedef const std::shared_ptr<Node>* GNode;
  std::vector<GNode> head_nodes(heads_.size());
  std::transform(heads_.begin(), heads_.end(), head_nodes.begin(),
                 [](const DataEntry& e)->GNode {
                   return &e.source;
                 });
  graph::PostOrderDFSVisit<GNode, Node*>(
      head_nodes,
      [fvisit](GNode n) { fvisit(*n); },        // FVisit
      [](GNode n)->Node* { return n->get(); },  // HashFunc
      [](GNode n)->uint32_t {                   // InDegree
        return (*n)->inputs.size() + static_cast<int>((*n)->is_backward());
      },
      [](GNode n, uint32_t index)->GNode {      // GetInput
        if (index < (*n)->inputs.size()) {
          return &(*n)->inputs.at(index).source;
        } else {
          return &(*n)->backward_source_node;
        }
      });
}

template <typename GNode, typename HashType,
          typename FVisit, typename HashFunc,
          typename InDegree, typename GetInput>
void PostOrderDFSVisit(const std::vector<GNode>& heads,
                       FVisit fvisit,
                       HashFunc hash,
                       InDegree indegree,
                       GetInput getinput) {
  std::vector<std::pair<GNode, uint32_t> > stack;
  std::unordered_set<HashType> visited;
  for (auto& head : heads) {
    HashType head_hash = hash(head);
    if (visited.count(head_hash) == 0) {
      stack.push_back(std::make_pair(head, 0));
      visited.insert(head_hash);
    }
    while (!stack.empty()) {
      std::pair<GNode, uint32_t>& back = stack.back();
      if (back.second == indegree(back.first)) {
        fvisit(back.first);
        stack.pop_back();
      } else {
        const GNode& input = getinput(back.first, back.second++);
        HashType input_hash = hash(input);
        if (visited.count(input_hash) == 0) {
          stack.push_back(std::make_pair(input, 0));
          visited.insert(input_hash);
        }
      }
    }
  }
}
```
The first function, ListArguments(), shows that a node is pushed into the output ret only if it is a variable. The second function, DFSVisit(FVisit fvisit), merely builds the lambdas needed by the third function, PostOrderDFSVisit(...). The third function is the key one. The symbol we attached when constructing the model is lro, which is Symbol:OP--Out in Figure 1. The depth-first search (DFS) proceeds in these steps:
1. Look at the element at the back of the stack, back; back.second records how many of its inputs have been visited so far.
2. If back.second equals the in-degree of back.first, remove back from the stack, and if back.first is a variable push it into the output result ret.
3. Otherwise take the input input[back.second] of back.first, push it onto the back of the stack, and increase back.second by one.

Starting the DFS from the top of Figure 1 and repeating these steps gives the result below (note that this order is unique):
```python
['data', 'fc1_weight', 'fc1_bias', 'fc2_weight', 'fc2_bias', 'fc3_weight', 'fc3_bias', 'softmax_label']
```
This ordering also shows why DFS is used: the traversal order is exactly the order in which the forward pass is computed.
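To make the traversal easier to follow, here is a small python sketch of the same iterative post-order DFS, run on a hand-written adjacency dict for the example network (the dict is only an illustration; the real code walks Symbol::Node objects):

```python
# an iterative post-order DFS mirroring PostOrderDFSVisit above; `inputs` maps a node
# name to the list of its input nodes, and nodes absent from the dict are variables
def post_order_dfs(heads, inputs):
    ret, visited, stack = [], set(), []
    for head in heads:
        if head not in visited:
            stack.append([head, 0])        # second field = number of inputs visited so far
            visited.add(head)
        while stack:
            node, n_visited = stack[-1]
            if n_visited == len(inputs.get(node, [])):   # in-degree reached: emit the node
                ret.append(node)
                stack.pop()
            else:
                stack[-1][1] += 1                        # back.second++
                child = inputs[node][n_visited]
                if child not in visited:
                    stack.append([child, 0])
                    visited.add(child)
    return ret

graph = {'softmax': ['fc3', 'softmax_label'],
         'fc3': ['fc2', 'fc3_weight', 'fc3_bias'],
         'fc2': ['fc1', 'fc2_weight', 'fc2_bias'],
         'fc1': ['data', 'fc1_weight', 'fc1_bias']}
print post_order_dfs(['softmax'], graph)
# prints every node in post order; ListArguments() keeps only the variable nodes,
# which gives exactly the list printed above
```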
Before training, executors are bound according to the devices (Bind Executor). When no device is specified explicitly, the default is cpu(0); in general one Executor corresponds to one hardware device, for example one cpu or one gpu. The python call chain is:

base_module.py : model.fit --> module.py : bind --> executor_group.py : DataParallelExecutorGroup.__init__ --> bind_exec --> _bind_ith_exec --> symbol.py : bind --> C++ : MXExecutorBindEX
_bind_ith_exec is the most important piece of python code here: it not only binds the executor, it also allocates the memory needed by the forward pass (arg_arrays) and the backward pass (grad_arrays), records whether each Symbol needs a gradient (grad_req), and infers the tensor shapes (infer shape). The infer shape step also calls into C++, where iterators are used to build TShape objects, together with a topological sort and so on.
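The shape inference for the example network can also be reproduced directly from python; a small sketch, assuming batch_size=1 and Dim=3 as in the sample code (simple_bind is shown only to illustrate that binding allocates NDArrays with exactly these shapes):

```python
# infer_shape propagates the known input shapes through the graph and returns the
# shapes of all arguments, outputs and auxiliary states
arg_shapes, out_shapes, aux_shapes = lro.infer_shape(data=(1, 3), softmax_label=(1,))
print zip(lro.list_arguments(), arg_shapes)
# e.g. ('data', (1, 3)), ('fc1_weight', (2, 3)), ('fc1_bias', (2,)), ...

# binding then allocates arg_arrays/grad_arrays with these shapes; simple_bind does
# the inference and the allocation in one call (default context cpu(0), as in the text)
exe = lro.simple_bind(ctx=mx.cpu(0), data=(1, 3), softmax_label=(1,))
print [a.shape for a in exe.arg_arrays]
```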
The C++ call chain is:

MXExecutorBindEX() --> Executor::Bind() --> GraphExecutor::Init()
Let's see what GraphExecutor::Init() actually does. InitGraph builds the computation graph, which covers both the forward and the backward pass; InitDataEntryInfo initializes the variables passed in; InitDataEntryMemory allocates memory for the intermediate outputs, and this is where two memory-saving strategies come in:

- ForwardInplaceOption and BackwardInplaceOption, which let an operator write its output into the memory of one of its inputs during the forward and backward passes;
- GraphStoragePool, a pool of memory blocks shared across nodes (and, through shared_exec, across executors) so that freed blocks are reused instead of re-allocated.

There is actually one more memory-saving strategy, but it has nothing to do with the computation graph: it is the one described in my previous post, mshadow的原理--MXNet.
```cpp
inline void Init(Symbol symbol,
                 const Context& default_ctx,
                 const std::map<std::string, Context>& ctx_map,
                 const std::vector<NDArray> &in_args,
                 const std::vector<NDArray> &arg_grad_store,
                 const std::vector<OpReqType> &grad_req_type,
                 const std::vector<NDArray> &aux_states,
                 Executor* shared_exec = nullptr) {
  enable_inplace_allocation_ = dmlc::GetEnv("MXNET_EXEC_ENABLE_INPLACE", true);
  prefer_bulk_execution_ = dmlc::GetEnv("MXNET_EXEC_PREFER_BULK_EXEC", true);
  if (shared_exec != NULL) {
    GraphExecutor* gexec = dynamic_cast<GraphExecutor*>(shared_exec);
    CHECK(gexec) << "Input executor for sharing memory must have GraphExecutor type.";
    shared_mem_ = gexec->shared_mem_;
  } else {
    shared_mem_ = std::make_shared<GraphStoragePool>();
  }

  CHECK_EQ(grad_req_type.size(), arg_grad_store.size());
  bool need_backward = false;
  for (auto req : grad_req_type) {
    if (req != kNullOp) need_backward = true;
  }
  this->InitGraph(symbol, default_ctx, ctx_map,
                  in_args, arg_grad_store, grad_req_type,
                  need_backward);
  this->InitDataEntryInfo(in_args, arg_grad_store, grad_req_type, aux_states);
  this->InitOperators();
  this->InitDataEntryMemory();
  this->InitResources();
  this->InitCachedOps();
  this->InitOpSegs();
}
```
Figure 2 shows the effect of mxnet's memory-saving strategies.
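To build some intuition for these two strategies, here is a toy numpy illustration (this is not mxnet code): the inplace option corresponds to writing an operator's output into the buffer of one of its inputs, and the storage pool corresponds to handing freed blocks to later nodes instead of allocating new ones.

```python
import numpy as np

pool = []                      # freed blocks that later nodes may reuse (GraphStoragePool idea)

def alloc(shape):
    for i, blk in enumerate(pool):
        if blk.shape == shape:         # reuse a compatible freed block
            return pool.pop(i)
    return np.empty(shape)             # otherwise allocate for real

def free(blk):
    pool.append(blk)

def relu_forward_inplace(x):
    np.maximum(x, 0, out=x)            # ForwardInplaceOption idea: output overwrites the input
    return x
```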
Before training starts, all variables except the input data are initialized, as is the optimization algorithm; this happens in base_module.py:
```python
self.init_params(initializer=initializer, arg_params=arg_params,
                 aux_params=aux_params, allow_missing=allow_missing,
                 force_init=force_init)
self.init_optimizer(kvstore=kvstore, optimizer=optimizer,
                    optimizer_params=optimizer_params)
```
The main steps of training are forward_backward and update; the code is as follows:
```python
################################################################################
# training loop
################################################################################
for epoch in range(begin_epoch, num_epoch):
    tic = time.time()
    eval_metric.reset()
    for nbatch, data_batch in enumerate(train_data):
        if monitor is not None:
            monitor.tic()
        self.forward_backward(data_batch)
        self.update()
        self.update_metric(eval_metric, data_batch.label)

        if monitor is not None:
            monitor.toc_print()

        if batch_end_callback is not None:
            batch_end_params = BatchEndParam(epoch=epoch, nbatch=nbatch,
                                             eval_metric=eval_metric,
                                             locals=locals())
            for callback in _as_list(batch_end_callback):
                callback(batch_end_params)

    # one epoch of training is finished
    for name, val in eval_metric.get_name_value():
        self.logger.info('Epoch[%d] Train-%s=%f', epoch, name, val)
    toc = time.time()
    self.logger.info('Epoch[%d] Time cost=%.3f', epoch, (toc-tic))

    if epoch_end_callback is not None:
        arg_params, aux_params = self.get_params()
        for callback in _as_list(epoch_end_callback):
            callback(epoch, self.symbol, arg_params, aux_params)

    #----------------------------------------
    # evaluation on validation set
    if eval_data:
        res = self.score(eval_data, validation_metric,
                         batch_end_callback=eval_batch_end_callback, epoch=epoch)
        for name, val in res:
            self.logger.info('Epoch[%d] Validation-%s=%f', epoch, name, val)

    # end of 1 epoch, reset the data-iter for another epoch
    train_data.reset()
```
forward and backward both end up calling void RunOps(bool is_train, size_t topo_start, size_t topo_end). This function is presumably the real core of training, but it involves the synchronous and asynchronous machinery of the parameter server (PS), which is complicated, so I will not expand on it here.
[Link to the original post, kept to guard against formatting problems when crawlers repost it]:
http://www.cnblogs.com/heguanyou/p/7604326.html