Continued from the previous post.
With Net and Propagation in place, we can start training. The Trainer's job is to decide how to split a large set of samples into mini-batches, how to merge the per-batch results into one complete result (batch vs. incremental training), when to hand the training result to the Learner so it can adjust the network's weights and state, and when to stop training.
So what do these two "teachers" look like, and how do they get the job done?
public interface Trainer {
    public void train(Net net, DataProvider provider);
}

public interface Learner {
    public void learn(Net net, TrainResult trainResult);
}
A Trainer is given data and trains the specified network; a Learner is given a training result and then adjusts the specified network's weights accordingly.
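To make the contract concrete, here is a minimal sketch of the simplest possible Learner: plain gradient descent with a fixed learning rate. This class is illustrative and not part of the original series; it assumes the Net, TrainResult, and BackwardResult types used by the implementations below.

// A minimal sketch of the Learner contract: plain gradient descent with a
// fixed learning rate. SgdLearner is a hypothetical class for illustration.
public class SgdLearner implements Learner {
    private final double eta; // fixed learning rate

    public SgdLearner(double eta) {
        this.eta = eta;
    }

    @Override
    public void learn(Net net, TrainResult trainResult) {
        BackwardResult result = trainResult.backwardResult;
        for (int j = 0; j < net.getLayersNum(); j++) {
            // W <- W - eta * G, layer by layer, for both weights and biases
            net.getWeights().get(j).subi(result.gradients.get(j).mul(eta));
            net.getBs().get(j).subi(result.biasGradients.get(j).mul(eta));
        }
    }
}

The MomentAdaptLearner implemented later in this post follows exactly the same contract, but adapts both the learning rate and the momentum as training progresses.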
Simple implementations of these two interfaces are given below.
Trainer
CommonTrainer implements simple mini-batch training and stops after a given number of epochs. The code follows.
public class CommonTrainer implements Trainer {
    int epochs;
    Learner learner;
    List<Double> costs = new ArrayList<>();
    List<Double> accuracies = new ArrayList<>();
    int batchSize = 1;

    public CommonTrainer(int epochs, Learner learner) {
        super();
        this.epochs = epochs;
        // Fall back to the default momentum-adaptive learner
        this.learner = learner == null ? new MomentAdaptLearner() : learner;
    }

    public CommonTrainer(int epochs, Learner learner, int batchSize) {
        this(epochs, learner);
        this.batchSize = batchSize;
    }

    public void trainOne(final Net net, DataProvider provider) {
        final Propagation propagation = new Propagation(net);
        DoubleMatrix input = provider.getInput();
        DoubleMatrix target = provider.getTarget();
        final int allLen = target.columns;
        final int[] nodesNum = net.getNodesNum();
        final int layersNum = net.getLayersNum();

        List<DoubleMatrix> inputBatches = this.getBatches(input);
        final List<DoubleMatrix> targetBatches = this.getBatches(target);
        final List<Integer> batchLen = MatrixUtil.getEndPosition(targetBatches);
        final BackwardResult backwardResult = new BackwardResult(net, allLen);

        // Train the mini-batches in parallel
        Parallel.For(inputBatches, new Parallel.Operation<DoubleMatrix>() {
            @Override
            public void perform(int index, DoubleMatrix subInput) {
                ForwardResult subResult = propagation.forward(subInput);
                DoubleMatrix subTarget = targetBatches.get(index);
                BackwardResult backResult = propagation.backward(subTarget,
                        subResult);

                DoubleMatrix cost = backwardResult.cost;
                DoubleMatrix accuracy = backwardResult.accuracy;
                DoubleMatrix inputDeltas = backwardResult.getInputDelta();

                // Copy this batch's cost/accuracy/input deltas into the
                // corresponding columns of the merged result
                int start = index == 0 ? 0 : batchLen.get(index - 1);
                int end = batchLen.get(index) - 1;
                int[] cIndices = ArraysHelper.makeArray(start, end);

                cost.put(cIndices, backResult.cost);
                if (accuracy != null) {
                    accuracy.put(cIndices, backResult.accuracy);
                }
                inputDeltas.put(ArraysHelper.makeArray(0, nodesNum[0] - 1),
                        cIndices, backResult.getInputDelta());

                // Accumulate gradients, weighted by the batch size
                for (int i = 0; i < layersNum; i++) {
                    DoubleMatrix gradients = backwardResult.gradients.get(i);
                    DoubleMatrix biasGradients = backwardResult.biasGradients
                            .get(i);
                    DoubleMatrix subGradients = backResult.gradients.get(i)
                            .muli(backResult.cost.columns);
                    DoubleMatrix subBiasGradients = backResult.biasGradients
                            .get(i).muli(backResult.cost.columns);
                    gradients.addi(subGradients);
                    biasGradients.addi(subBiasGradients);
                }
            }
        });

        // Average the accumulated gradients over all samples
        for (DoubleMatrix gradient : backwardResult.gradients) {
            gradient.divi(allLen);
        }
        for (DoubleMatrix gradient : backwardResult.biasGradients) {
            gradient.divi(allLen);
        }
        // this.mergeBackwardResult(backResults, net, input.columns);

        TrainResult trainResult = new TrainResult(null, backwardResult);
        learner.learn(net, trainResult);

        Double cost = backwardResult.getMeanCost();
        Double accuracy = backwardResult.getMeanAccuracy();
        if (cost != null)
            costs.add(cost);
        if (accuracy != null)
            accuracies.add(accuracy);
        System.out.println(cost);
        System.out.println(accuracy);
    }

    @Override
    public void train(Net net, DataProvider provider) {
        for (int i = 0; i < this.epochs; i++) {
            this.trainOne(net, provider);
        }
    }
}
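CommonTrainer relies on two helpers whose bodies are not shown in this post: getBatches, which splits the sample matrix column-wise into mini-batches, and MatrixUtil.getEndPosition, which returns the cumulative end column of each batch. The following is a plausible sketch, not the series' actual code; it assumes ArraysHelper.makeArray(start, end) returns the inclusive index range {start, ..., end}.

// Hypothetical reconstruction of the helpers used by CommonTrainer.
// Splits the columns of data into consecutive mini-batches of batchSize.
private List<DoubleMatrix> getBatches(DoubleMatrix data) {
    List<DoubleMatrix> batches = new ArrayList<>();
    for (int start = 0; start < data.columns; start += batchSize) {
        int end = Math.min(start + batchSize, data.columns) - 1;
        batches.add(data.getColumns(ArraysHelper.makeArray(start, end)));
    }
    return batches;
}

// Cumulative end positions: e.g. batches of 3, 3, 2 columns -> [3, 6, 8],
// so batch i occupies columns [ends(i-1), ends(i) - 1] of the merged result.
public static List<Integer> getEndPosition(List<DoubleMatrix> batches) {
    List<Integer> ends = new ArrayList<>();
    int sum = 0;
    for (DoubleMatrix batch : batches) {
        sum += batch.columns;
        ends.add(sum);
    }
    return ends;
}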
Learner
A Learner encapsulates the actual update algorithm: once the gradients have been computed, it is responsible for adjusting the network weights. The choice of update algorithm directly affects how fast the network converges. This post implements a simple momentum plus adaptive-learning-rate scheme.
Its update rules are as follows:
$$W(t+1)=W(t)-\Delta W(t)$$
$$\Delta W(t)=rate(t)(1-moment(t))G(t)+moment(t)\Delta W(t-1)$$
$$rate(t+1)=\begin{cases} rate(t)\times 1.05 & \mbox{if } cost(t)<cost(t-1)\\ rate(t)\times 0.7 & \mbox{else if } cost(t)<cost(t-1)\times 1.04\\ 0.01 & \mbox{else} \end{cases}$$
$$moment(t+1)=\begin{cases} 0.9 & \mbox{if } cost(t)<cost(t-1)\\ moment(t)\times 0.7 & \mbox{else if } cost(t)<cost(t-1)\times 1.04\\ 1-0.9 & \mbox{else} \end{cases}$$
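To see the rules in action with concrete numbers: suppose $rate(t)=0.01$, $moment(t)=0.9$, and the cost just decreased, so the first branch of each case applies. Then

$$\Delta W(t)=0.01\times(1-0.9)\,G(t)+0.9\,\Delta W(t-1)=0.001\,G(t)+0.9\,\Delta W(t-1)$$

$$rate(t+1)=0.01\times 1.05=0.0105,\qquad moment(t+1)=0.9$$

In other words, while the cost keeps falling, the step size grows gently and the momentum stays high; a mild setback (cost up by less than 4%) shrinks both; anything worse resets the rate to its initial value and drops the momentum to $1-0.9=0.1$.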
The sample code follows:
public class MomentAdaptLearner implements Learner {
    Net net;
    double moment = 0.9;
    double lmd = 1.05;
    double preCost = 0;
    double eta = 0.01;
    double currentEta = eta;
    double currentMoment = moment;
    TrainResult preTrainResult;

    // No-arg constructor required by CommonTrainer's default
    // "new MomentAdaptLearner()"; falls back to moment = 0.9, eta = 0.01
    public MomentAdaptLearner() {
        this(0.9, 0.01);
    }

    public MomentAdaptLearner(double moment, double eta) {
        super();
        this.moment = moment;
        this.eta = eta;
        this.currentEta = eta;
        this.currentMoment = moment;
    }

    @Override
    public void learn(Net net, TrainResult trainResult) {
        if (this.net == null)
            init(net);
        BackwardResult backwardResult = trainResult.backwardResult;
        BackwardResult preBackwardResult = preTrainResult.backwardResult;
        double cost = backwardResult.getMeanCost();
        this.modifyParameter(cost);
        System.out.println("current eta:" + this.currentEta);
        System.out.println("current moment:" + this.currentMoment);

        for (int j = 0; j < net.getLayersNum(); j++) {
            // Delta W(t) = eta * (1 - moment) * G(t) + moment * Delta W(t-1)
            DoubleMatrix weight = net.getWeights().get(j);
            DoubleMatrix gradient = backwardResult.gradients.get(j);
            gradient = gradient.muli(currentEta * (1 - this.currentMoment))
                    .addi(preBackwardResult.gradients.get(j)
                            .muli(this.currentMoment));
            preBackwardResult.gradients.set(j, gradient);
            weight.subi(gradient);

            DoubleMatrix b = net.getBs().get(j);
            DoubleMatrix bgradient = backwardResult.biasGradients.get(j);
            bgradient = bgradient.muli(currentEta * (1 - this.currentMoment))
                    .addi(preBackwardResult.biasGradients.get(j)
                            .muli(this.currentMoment));
            preBackwardResult.biasGradients.set(j, bgradient);
            b.subi(bgradient);
        }
    }

    // Adapt the learning rate and momentum according to how the cost moved
    public void modifyParameter(double cost) {
        if (cost < this.preCost) {
            this.currentEta *= 1.05;
            this.currentMoment = moment;
        } else if (cost < 1.04 * this.preCost) {
            this.currentEta *= 0.7;
            this.currentMoment *= 0.7;
        } else {
            this.currentEta = eta;
            this.currentMoment = 1 - moment;
        }
        this.preCost = cost;
    }

    // Seed the "previous" gradients with zeros so the first momentum term is 0
    public void init(Net net) {
        this.net = net;
        BackwardResult bResult = new BackwardResult();
        for (DoubleMatrix weight : net.getWeights()) {
            bResult.gradients.add(DoubleMatrix.zeros(weight.rows,
                    weight.columns));
        }
        for (DoubleMatrix b : net.getBs()) {
            bResult.biasGradients.add(DoubleMatrix.zeros(b.rows, b.columns));
        }
        preTrainResult = new TrainResult(null, bResult);
    }
}
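Putting the pieces together, training might be driven like the sketch below. This wiring is illustrative only: buildNet() and buildProvider() are placeholders for the construction code from the earlier posts in this series, not actual methods from it.

// Illustrative wiring; buildNet()/buildProvider() are hypothetical helpers.
Net net = buildNet();                                   // e.g. a small feed-forward net
DataProvider provider = buildProvider();                // samples stored column-wise

Learner learner = new MomentAdaptLearner(0.9, 0.01);    // moment, eta
Trainer trainer = new CommonTrainer(1000, learner, 10); // 1000 epochs, mini-batches of 10

trainer.train(net, provider);                           // prints cost/accuracy each epoch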
With that, a simple neural network has been implemented end to end, from construction to training.
Next up: improving the convergence rate with the Levenberg-Marquardt learning algorithm.