There are two main directories:
src: contains the source implementation
include: the header files
Structure of the src directory: the main code lives in the caffe subdirectory, which contains net.cpp, solver.cpp, blob.cpp, layer.cpp, and common.cpp. The layers directory holds the individual layer implementations and is the core of caffe. proto contains a single caffe.proto file, which describes the member variables of the various objects in the protobuf language. solvers provides the different optimizers (SGD, Adam, RMSProp, AdaGrad). The test directory contains unit tests, and util holds common utility functions:
├── caffe
│   ├── layers
│   ├── proto
│   ├── solvers
│   ├── test
│   │   └── test_data
│   └── util
└── gtest
First, look at the .cpp files directly under the caffe directory:
blob.cpp common.cpp data_transformer.cpp internal_thread.cpp layer.cpp layer_factory.cpp net.cpp parallel.cpp solver.cpp syncedmem.cpp
blob.cpp implements Blob, the main data type used to pass data around in caffe.
common.cpp provides shared global facilities (the Caffe singleton).
In the repository root there is also a tools directory, used to build the caffe executable; it exposes caffe's command-line options, so caffe is driven by configuring these parameters.
caffe.cpp compute_image_mean.cpp convert_imageset.cpp device_query.cpp extract_features.cpp finetune_net.cpp net_speed_benchmark.cpp test_net.cpp train_net.cpp upgrade_net_proto_binary.cpp upgrade_net_proto_text.cpp upgrade_solver_proto_text.cpp
main.cpp registers several functions into g_brew_map: train, test, time, and device_query.
Look first at the train function, which uses a solver_param object to parse the solver parameters:
caffe::SolverParameter solver_param;
caffe::ReadSolverParamsFromTextFileOrDie(FLAGS_solver, &solver_param);
A solver object is then created via SolverRegistry::CreateSolver; the solver holds a shared_ptr<Net<Dtype> > net_ member variable:
shared_ptr<caffe::Solver<float> >
solver(caffe::SolverRegistry<float>::CreateSolver(solver_param));
The Net object is the body of the whole network. What exactly does a Net contain? The three most important members are layers_, params_, and blobs_:
template <typename Dtype>
class Net {
private:
vector<shared_ptr<Layer<Dtype> > > layers_;
vector<shared_ptr<Blob<Dtype> > > params_;
vector<shared_ptr<Blob<Dtype> > > blobs_;
};
layers_ holds the basic components that make up the network; params_ holds each layer's filter parameters and shares data with each layer's own blobs_ member (i.e. params_ stores pointers to the layers' blobs_); blobs_ holds each layer's intermediate data.
The Net constructor takes a NetParameter argument and simply calls Init:
template <typename Dtype>
Net<Dtype>::Net(const NetParameter& param) {
Init(param);
}
NetParameter is defined in caffe.proto as follows:
message NetParameter {
optional string name = 1;
repeated string input = 3;
repeated BlobShape input_shape = 8;
repeated int32 input_dim = 4;
optional bool force_backward = 5 [default = false];
optional NetState state = 6;
repeated LayerParameter layer = 100; // ID 100 so layers are printed last.
}
message LayerParameter {
optional string name = 1; // the layer name
optional string type = 2; // the layer type
repeated string bottom = 3; // the name of each bottom blob
repeated string top = 4; // the name of each top blob
// The blobs containing the numeric parameters of the layer.
repeated BlobProto blobs = 7;
optional TransformationParameter transform_param = 100;
}
The core of NetParameter is LayerParameter, and the core of LayerParameter (the definition above is simplified) is the bottom names, the top names, and the parameter blobs.
This NetParameter is read and initialized via protobuf from train.prototxt and vgg.caffemodel, and is then used to construct the Net object; once the Net exists, the whole network has been assembled.
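For example, the plumbing from files to a live Net looks roughly like this (a minimal sketch; the file names are placeholders):
caffe::NetParameter net_param;
caffe::ReadNetParamsFromTextFileOrDie("train.prototxt", &net_param);
caffe::Net<float> net(net_param);             // build the layer graph
net.CopyTrainedLayersFrom("vgg.caffemodel");  // load previously learned blobs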
After that, solver->Solve() is called to start training the network. Solve() calls Step(), which performs the iterations: it contains a loop in which each pass is one iter, and each iter runs iter_size forward/backward passes (ForwardBackward()), averaging the loss over this batch before updating the optimizer.
The iter_size parameter works around GPUs whose memory is too small for a large batch size: the effective batch size per update is iter_size * batch_size, which gives the same result as using the larger batch size directly. For example, if the network does well with batch_size = 128 but the GPU only has memory for 32 images, you can set batch_size to 32 and iter_size to 4 and get the same effect as batch_size = 128.
while (iter_ < stop_iter) {
// ...
Dtype loss = 0;
for (int i = 0; i < param_.iter_size(); ++i) {
loss += net_->ForwardBackward();
}
loss /= param_.iter_size();
// average the loss across iterations for smoothed reporting
UpdateSmoothedLoss(loss, start_iter, average_loss);
// ...
ApplyUpdate();
// ...
}
The ForwardBackward() implementation is shown below; it runs Forward followed by Backward, recording the loss during the forward pass:
Dtype ForwardBackward() {
Dtype loss;
Forward(&loss);
Backward();
return loss;
}
Next, the Forward(&loss) implementation, which calls ForwardFromTo(0, layers_.size() - 1):
template <typename Dtype>
const vector<Blob<Dtype>*>& Net<Dtype>::Forward(Dtype* loss) {
if (loss != NULL) {
*loss = ForwardFromTo(0, layers_.size() - 1);
} else {
ForwardFromTo(0, layers_.size() - 1);
}
return net_output_blobs_;
}
ForwardFromTo(0, layers_.size() - 1) walks over every layer and calls each layer's Forward() function. bottom_vecs_ and top_vecs_ have type vector<vector<Blob<Dtype>*> >, and each layer is passed a vector<Blob<Dtype>*>; the vector reflects that a layer may have multiple inputs or outputs:
template <typename Dtype>
Dtype Net<Dtype>::ForwardFromTo(int start, int end) {
Dtype loss = 0;
for (int i = start; i <= end; ++i) {
Dtype layer_loss = layers_[i]->Forward(bottom_vecs_[i], top_vecs_[i]);
loss += layer_loss;
}
return loss;
}
So the solver and net above exist to serve the layers; the core functionality lives in the layers themselves. Let's start with the Forward implementation of the convolution layer (conv_layer.cpp).
To understand layers properly, we first look at the objects related to them. The base class of all layers is Layer. Because the implementations are template classes, the compiler will not instantiate a template unless it is used statically somewhere; caffe, however, creates its classes dynamically from the train.prototxt configuration file, so without help the instantiations would not exist when looked up. To avoid this problem, each class definition is followed by an explicit instantiation, done with a macro:
INSTANTIATE_CLASS(ConvolutionLayer);
The macro is defined as follows:
#define INSTANTIATE_CLASS(classname) \
  char gInstantiationGuard##classname; \
  template class classname<float>; \
  template class classname<double>
which in effect instantiates ConvolutionLayer<float> and ConvolutionLayer<double>:
char gInstantiationGuardConvolutionLayer;
template class ConvolutionLayer<float>;
template class ConvolutionLayer<double>;
Beyond that, since there are many layers, caffe implements a factory pattern (layer_factory.cpp) to manage them uniformly: every Layer is registered into a map whose key is the layer type name and whose value is the function that creates that Layer, so a Layer object can be instantiated from its type name at run time. Two macros are provided:
#define REGISTER_LAYER_CREATOR(type, creator) \
  static LayerRegisterer<float> g_creator_f_##type(#type, creator<float>); \
  static LayerRegisterer<double> g_creator_d_##type(#type, creator<double>)

#define REGISTER_LAYER_CLASS(type) \
  template <typename Dtype> \
  shared_ptr<Layer<Dtype> > Creator_##type##Layer(const LayerParameter& param) \
  { \
    return shared_ptr<Layer<Dtype> >(new type##Layer<Dtype>(param)); \
  } \
  REGISTER_LAYER_CREATOR(type, Creator_##type##Layer)
Look at the first macro: it takes a type (e.g. Convolution) and a creator function. For example, layer_factory.cpp contains the following (simplified) code:
template <typename Dtype>
shared_ptr<Layer<Dtype> > GetConvolutionLayer(const LayerParameter& param) {
// simplified...
return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
// simplified...
}
REGISTER_LAYER_CREATOR(Convolution, GetConvolutionLayer);
The macro invocation then expands to:
static LayerRegisterer<float> g_creator_f_Convolution("Convolution", GetConvolutionLayer<float>);
static LayerRegisterer<double> g_creator_d_Convolution("Convolution", GetConvolutionLayer<double>);
So let's see what the LayerRegisterer class does:
LayerRegistry<Dtype>::AddCreator(type, creator);
Its constructor calls the static function LayerRegistry<Dtype>::AddCreator; continuing there:
template <typename Dtype>
class LayerRegistry {
 public:
  typedef shared_ptr<Layer<Dtype> > (*Creator)(const LayerParameter&);
  typedef std::map<string, Creator> CreatorRegistry;
  static CreatorRegistry& Registry() {
    static CreatorRegistry* g_registry_ = new CreatorRegistry();
    return *g_registry_;
  }
  static void AddCreator(const string& type, Creator creator) {
    CreatorRegistry& registry = Registry();
    CHECK_EQ(registry.count(type), 0) << "Layer type " << type << " already registered.";
    registry[type] = creator;
  }
};
You can see it maintains a singleton map g_registry_ that stores each type name together with its creator function.
As for the second macro, a call such as REGISTER_LAYER_CLASS(Convolution) expands to:
template <typename Dtype>
shared_ptr<Layer<Dtype> > Creator_ConvolutionLayer(const LayerParameter& param)
{
return shared_ptr<Layer<Dtype> >(new ConvolutionLayer<Dtype>(param));
}
REGISTER_LAYER_CREATOR(Convolution, Creator_ConvolutionLayer)
In other words, when a layer needs no special construction, the default creator (Creator_ConvolutionLayer) is sufficient; special cases such as Convolution need extra handling, so a dedicated creator function (GetConvolutionLayer) is written. Most layers can simply be created with the default function.
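The lookup side of the registry, used when the net is built from the prototxt, is essentially the following (simplified from LayerRegistry's CreateLayer):
static shared_ptr<Layer<Dtype> > CreateLayer(const LayerParameter& param) {
  const string& type = param.type();
  CreatorRegistry& registry = Registry();
  CHECK_EQ(registry.count(type), 1) << "Unknown layer type: " << type;
  return registry[type](param);  // invoke the registered creator function
}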
The basic object for storing and manipulating data in caffe is the Blob, which also provides CPU/GPU data synchronization.
A Blob's underlying storage is a flat array, laid out in row-major order.
A Blob holds two main pieces of data, data_ and diff_: the values and the gradients.
A blob is a four-dimensional array whose dimensions, from major to minor, are (num_, channels_, height_, width_). For image data these are: number of images, number of color channels, height, and width. For example, 10 three-channel color images, each 512 wide by 256 high, give (10, 3, 256, 512):
template <typename Dtype>
class Blob {
 public:
  inline int num() const { return LegacyShape(0); }
  inline int channels() const { return LegacyShape(1); }
  inline int height() const { return LegacyShape(2); }
  inline int width() const { return LegacyShape(3); }
  inline const shared_ptr<SyncedMemory>& data() const {
    return data_;
  }
  inline const shared_ptr<SyncedMemory>& diff() const {
    return diff_;
  }
  // Update the data, i.e. subtract the freshly computed gradient.
  void Update() {
    caffe_axpy<Dtype>(count_, Dtype(-1),
        static_cast<const Dtype*>(diff_->cpu_data()),
        static_cast<Dtype*>(data_->mutable_cpu_data()));
  }
  // Deserialize: load a previously stored blob from disk.
  void FromProto(const BlobProto& proto, bool reshape = true);
  // Serialize, for storage.
  void ToProto(BlobProto* proto, bool write_diff = false) const;
 protected:
  shared_ptr<SyncedMemory> data_;
  shared_ptr<SyncedMemory> diff_;
  shared_ptr<SyncedMemory> shape_data_;
  vector<int> shape_;
  int count_;
  int capacity_;
  DISABLE_COPY_AND_ASSIGN(Blob);
};  // class Blob
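Row-major layout means the four indices collapse into a single flat index; this is essentially what Blob::offset() computes (bounds checks omitted):
inline int offset(const int n, const int c, const int h, const int w) const {
  return ((n * channels() + c) * height() + h) * width() + w;
}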
Now the Layer base class's Forward method. Note that it is not a virtual method, which means subclasses are not meant to override it; you can assume every Layer uses this same Forward function. Let's look at the concrete steps:
template <typename Dtype>
class Layer {
 public:
  explicit Layer(const LayerParameter& param) : layer_param_(param) {
    phase_ = param.phase();
    if (layer_param_.blobs_size() > 0) {
      blobs_.resize(layer_param_.blobs_size());
      for (int i = 0; i < layer_param_.blobs_size(); ++i) {
        blobs_[i].reset(new Blob<Dtype>());
        blobs_[i]->FromProto(layer_param_.blobs(i));
      }
    }
  }
  virtual ~Layer() {}
  void SetUp(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
    CheckBlobCounts(bottom, top);
    LayerSetUp(bottom, top);
    Reshape(bottom, top);
    SetLossWeights(top);
  }
  /**
   * @brief Does layer-specific setup: your layer should implement this
   *        function as well as Reshape.
   *
   * @param bottom
   *     the preshaped input blobs, whose data fields store the input data
   *     for this layer
   * @param top
   *     the allocated but unshaped output blobs
   *
   * This method should do one-time layer specific setup. This includes
   * reading and processing relevant parameters from the layer_param_.
   * Setting up the shapes of top blobs and internal buffers should be done
   * in Reshape, which will be called before the forward pass to adjust the
   * top blob sizes.
   */
  virtual void LayerSetUp(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {}
  /**
   * @brief Adjust the shapes of top blobs and internal buffers to
   *        accommodate the shapes of the bottom blobs.
   *
   * @param bottom the input blobs, with the requested input shapes
   * @param top the top blobs, which should be reshaped as needed
   *
   * This method should reshape top blobs as needed according to the shapes
   * of the bottom (input) blobs, as well as reshaping any internal buffers
   * and making any other necessary adjustments so that the layer can
   * accommodate the bottom blobs.
   */
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) = 0;
  /**
   * @brief Given the bottom blobs, compute the top blobs and the loss.
   *
   * @param bottom
   *     the input blobs, whose data fields store the input data for this layer
   * @param top
   *     the preshaped output blobs, whose data fields will store this layer's
   *     outputs
   * \return The total loss from the layer.
   *
   * The Forward wrapper calls the relevant device wrapper function
   * (Forward_cpu or Forward_gpu) to compute the top blob values given the
   * bottom blobs. If the layer has any non-zero loss_weights, the wrapper
   * then computes and returns the loss.
   *
   * Your layer should implement Forward_cpu and (optionally) Forward_gpu.
   */
  inline Dtype Forward(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top);
  /**
   * @brief Given the top blob error gradients, compute the bottom blob error
   *        gradients.
   *
   * @param top
   *     the output blobs, whose diff fields store the gradient of the error
   *     with respect to themselves
   * @param propagate_down
   *     a vector with equal length to bottom, with each index indicating
   *     whether to propagate the error gradients down to the bottom blob at
   *     the corresponding index
   * @param bottom
   *     the input blobs, whose diff fields will store the gradient of the
   *     error with respect to themselves after Backward is run
   *
   * The Backward wrapper calls the relevant device wrapper function
   * (Backward_cpu or Backward_gpu) to compute the bottom blob diffs given
   * the top blob diffs.
   *
   * Your layer should implement Backward_cpu and (optionally) Backward_gpu.
   */
  inline void Backward(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom);
  vector<shared_ptr<Blob<Dtype> > >& blobs() {
    return blobs_;
  }
  const LayerParameter& layer_param() const { return layer_param_; }
 protected:
  /** The protobuf that stores the layer parameters */
  LayerParameter layer_param_;  // the layer's parameters: kernel size, stride, ...
  Phase phase_;
  /** The vector that stores the learnable parameters as a set of blobs. */
  vector<shared_ptr<Blob<Dtype> > > blobs_;  // the filter parameters
  vector<bool> param_propagate_down_;
  vector<Dtype> loss_;
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) = 0;
  virtual void Forward_gpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top) {
    // LOG(WARNING) << "Using CPU code as backup.";
    return Forward_cpu(bottom, top);
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) = 0;
  virtual void Backward_gpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
    // LOG(WARNING) << "Using CPU code as backup.";
    Backward_cpu(top, propagate_down, bottom);
  }
 private:
  DISABLE_COPY_AND_ASSIGN(Layer);
};
template <typename Dtype>
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
Dtype loss = 0;
Reshape(bottom, top);
switch (Caffe::mode()) {
case Caffe::CPU:
Forward_cpu(bottom, top);
for (int top_id = 0; top_id < top.size(); ++top_id) {
if (!this->loss(top_id)) { continue; }
const int count = top[top_id]->count();
const Dtype* data = top[top_id]->cpu_data();
const Dtype* loss_weights = top[top_id]->cpu_diff();
loss += caffe_cpu_dot(count, data, loss_weights);
}
break;
case Caffe::GPU:
Forward_gpu(bottom, top);
#ifndef CPU_ONLY
for (int top_id = 0; top_id < top.size(); ++top_id) {
if (!this->loss(top_id)) { continue; }
const int count = top[top_id]->count();
const Dtype* data = top[top_id]->gpu_data();
const Dtype* loss_weights = top[top_id]->gpu_diff();
Dtype blob_loss = 0;
caffe_gpu_dot(count, data, loss_weights, &blob_loss);
loss += blob_loss;
}
#endif
break;
default:
LOG(FATAL) << "Unknown caffe mode.";
}
return loss;
}
The important functions in Layer are SetUp, LayerSetUp, Reshape, Forward, Backward, Forward_cpu, Forward_gpu, Backward_cpu, and Backward_gpu.
Reshape, Forward_cpu, and Backward_cpu are pure virtual functions, which subclasses must implement; LayerSetUp, Forward_gpu, and Backward_gpu are virtual functions that may be overridden as needed; SetUp, Forward, and Backward are ordinary functions that should not be overridden. Since there are many kinds of convolution, an intermediate class BaseConvolutionLayer serves as the base class of all convolution layers. It implements the following functions and turns Reshape from a pure virtual function into a virtual one:
LayerSetUp(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
Reshape(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
forward_cpu_gemm(const Dtype* input, const Dtype* weights, Dtype* output, bool skip_im2col)
forward_cpu_bias(Dtype* output, const Dtype* bias)
backward_cpu_gemm(const Dtype* output, const Dtype* weights, Dtype* input)
weight_cpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights)
backward_cpu_bias(Dtype* bias, const Dtype* input)
forward_gpu_gemm(const Dtype* input, const Dtype* weights, Dtype* output, bool skip_im2col)
forward_gpu_bias(Dtype* output, const Dtype* bias)
backward_gpu_gemm(const Dtype* output, const Dtype* weights, Dtype* input)
weight_gpu_gemm(const Dtype* input, const Dtype* output, Dtype* weights)
backward_gpu_bias(Dtype* bias, const Dtype* input)
ConvolutionLayer inherits from BaseConvolutionLayer and implements:
Forward_cpu(const vector<Blob<Dtype>*>& bottom, const vector<Blob<Dtype>*>& top)
Backward_cpu(const vector<Blob<Dtype>*>& top, const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom)
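Putting the contract together: a new layer only has to provide the three pure virtuals. Below is a minimal sketch of a hypothetical IdentityLayer (for illustration only, not part of caffe) that simply copies bottom to top:
template <typename Dtype>
class IdentityLayer : public Layer<Dtype> {
 public:
  explicit IdentityLayer(const LayerParameter& param) : Layer<Dtype>(param) {}
  virtual void Reshape(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    top[0]->ReshapeLike(*bottom[0]);  // output has the same shape as input
  }
 protected:
  virtual void Forward_cpu(const vector<Blob<Dtype>*>& bottom,
      const vector<Blob<Dtype>*>& top) {
    caffe_copy(bottom[0]->count(), bottom[0]->cpu_data(),
        top[0]->mutable_cpu_data());  // copy data forward
  }
  virtual void Backward_cpu(const vector<Blob<Dtype>*>& top,
      const vector<bool>& propagate_down,
      const vector<Blob<Dtype>*>& bottom) {
    if (propagate_down[0]) {
      caffe_copy(top[0]->count(), top[0]->cpu_diff(),
          bottom[0]->mutable_cpu_diff());  // pass the gradient straight through
    }
  }
};
INSTANTIATE_CLASS(IdentityLayer);
REGISTER_LAYER_CLASS(Identity);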
In Layer's Forward function, Reshape is called first; here that resolves to BaseConvolutionLayer::Reshape. Since caffe organizes data as Blobs, once the input (bottom) size and the convolution parameters are known, the shape of the output (top) Blob can be computed, as follows:
// Shape the tops.
bottom_shape_ = &bottom[0]->shape();
compute_output_shape();
vector<int> top_shape(bottom[0]->shape().begin(),
bottom[0]->shape().begin() + channel_axis_);
top_shape.push_back(num_output_);
for (int i = 0; i < num_spatial_axes_; ++i) {
top_shape.push_back(output_shape_[i]);
}
for (int top_id = 0; top_id < top.size(); ++top_id) {
top[top_id]->Reshape(top_shape);
}
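Here compute_output_shape() fills output_shape_ using the standard convolution arithmetic. Per spatial axis it boils down to the following sketch (scalar names stand in for caffe's per-axis arrays):
// effective kernel size once dilation is taken into account
const int kernel_extent = dilation * (kernel - 1) + 1;
// standard convolution output size
const int output_dim = (input_dim + 2 * pad - kernel_extent) / stride + 1;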
Inside, Reshape is called on each output top[i]; Blob's Reshape function is as follows:
template <typename Dtype>
void Blob<Dtype>::Reshape(const vector<int>& shape) {
CHECK_LE(shape.size(), kMaxBlobAxes);
count_ = 1;
shape_.resize(shape.size());
if (!shape_data_ || shape_data_->size() < shape.size() * sizeof(int)) {
shape_data_.reset(new SyncedMemory(shape.size() * sizeof(int)));
}
int* shape_data = static_cast<int*>(shape_data_->mutable_cpu_data());
for (int i = 0; i < shape.size(); ++i) {
CHECK_GE(shape[i], 0);
if (count_ != 0) {
CHECK_LE(shape[i], INT_MAX / count_) << "blob size exceeds INT_MAX";
}
count_ *= shape[i];
shape_[i] = shape[i]; // copy into the internal shape_
shape_data[i] = shape[i];
}
if (count_ > capacity_) { // existing memory is too small
capacity_ = count_;
data_.reset(new SyncedMemory(capacity_ * sizeof(Dtype))); // reallocate
diff_.reset(new SyncedMemory(capacity_ * sizeof(Dtype)));
}
}
In effect it copies the incoming shape into the Blob's internal shape_ member, then checks whether the existing memory is large enough and reallocates if it is not.
For the forward pass we analyze the CPU case. After Reshape comes Forward_cpu, which here resolves to ConvolutionLayer::Forward_cpu:
template <typename Dtype>
void ConvolutionLayer<Dtype>::Forward_cpu(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
const Dtype* weight = this->blobs_[0]->cpu_data();
for (int i = 0; i < bottom.size(); ++i) {
const Dtype* bottom_data = bottom[i]->cpu_data();
Dtype* top_data = top[i]->mutable_cpu_data();
for (int n = 0; n < this->num_; ++n) {
this->forward_cpu_gemm(bottom_data + n * this->bottom_dim_, weight,
top_data + n * this->top_dim_);
if (this->bias_term_) {
const Dtype* bias = this->blobs_[1]->cpu_data();
this->forward_cpu_bias(top_data + n * this->top_dim_, bias);
}
}
}
}
For each bottom/top pair the code performs num_ (batch_size) matrix multiplications (forward_cpu_gemm), multiplying bottom_data by weight and storing the result in top_data; mutable_cpu_data signals that we intend to write to that address. The matrix multiplication in detail:
template <typename Dtype>
void BaseConvolutionLayer<Dtype>::forward_cpu_gemm(const Dtype* input,
const Dtype* weights, Dtype* output, bool skip_im2col) {
const Dtype* col_buff = input;
if (!is_1x1_) {
if (!skip_im2col) {
conv_im2col_cpu(input, col_buffer_.mutable_cpu_data());
}
col_buff = col_buffer_.cpu_data();
}
for (int g = 0; g < group_; ++g) {
caffe_cpu_gemm<Dtype>(CblasNoTrans, CblasNoTrans, conv_out_channels_ /
group_, conv_out_spatial_dim_, kernel_dim_,
(Dtype)1., weights + weight_offset_ * g, col_buff + col_offset_ * g,
(Dtype)0., output + output_offset_ * g);
}
}
Note the conv_im2col_cpu function. Without this transformation we would have to loop over many small matrix multiplications; im2col unrolls each patch (k x k x C) into a column and stacks the columns side by side, so a single matrix multiplication produces all the results at once. caffe_cpu_gemm is a wrapper around the cblas matrix multiplication output = weights * col_buff.
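To make the trick concrete, here is a minimal sketch of im2col for one input channel, stride 1 and no padding (a hypothetical helper, not caffe's actual conv_im2col_cpu, which also handles channels, stride, padding, and dilation):
void im2col_simple(const float* img, int H, int W, int k, float* col) {
  const int out_h = H - k + 1;
  const int out_w = W - k + 1;
  // Each column of the (k*k) x (out_h*out_w) matrix is one unrolled patch.
  for (int r = 0; r < k * k; ++r) {      // row index in the col matrix
    const int dy = r / k, dx = r % k;    // offset inside the patch
    for (int y = 0; y < out_h; ++y) {
      for (int x = 0; x < out_w; ++x) {
        col[(r * out_h + y) * out_w + x] = img[(y + dy) * W + (x + dx)];
      }
    }
  }
}
A (num_output x k*k) weight matrix times this col matrix then yields all output pixels in one gemm call.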
Back in the Forward function: after Forward_cpu finishes, it loops over the top blobs to check which ones contribute to the loss, and for those it computes the loss from cpu_diff():
inline Dtype Layer<Dtype>::Forward(const vector<Blob<Dtype>*>& bottom,
const vector<Blob<Dtype>*>& top) {
Dtype loss = 0;
Reshape(bottom, top);
Forward_cpu(bottom, top);
for (int top_id = 0; top_id < top.size(); ++top_id) {
if (!this->loss(top_id)) { continue; }
const int count = top[top_id]->count();
const Dtype* data = top[top_id]->cpu_data();
const Dtype* loss_weights = top[top_id]->cpu_diff();
loss += caffe_cpu_dot(count, data, loss_weights);
}
return loss;
}
That concludes Forward. Next comes the Backward function; look directly at Layer's Backward:
template <typename Dtype>
inline void Layer<Dtype>::Backward(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down,
const vector<Blob<Dtype>*>& bottom) {
switch (Caffe::mode()) {
case Caffe::CPU:
Backward_cpu(top, propagate_down, bottom);
break;
case Caffe::GPU:
Backward_gpu(top, propagate_down, bottom);
break;
default:
LOG(FATAL) << "Unknown caffe mode.";
}
}
It dispatches directly to Backward_cpu; here is ConvolutionLayer's Backward_cpu:
template <typename Dtype>
void ConvolutionLayer<Dtype>::Backward_cpu(const vector<Blob<Dtype>*>& top,
const vector<bool>& propagate_down, const vector<Blob<Dtype>*>& bottom) {
const Dtype* weight = this->blobs_[0]->cpu_data();
Dtype* weight_diff = this->blobs_[0]->mutable_cpu_diff();
for (int i = 0; i < top.size(); ++i) {
const Dtype* top_diff = top[i]->cpu_diff();
const Dtype* bottom_data = bottom[i]->cpu_data();
Dtype* bottom_diff = bottom[i]->mutable_cpu_diff();
// Bias gradient, if necessary.
if (this->bias_term_ && this->param_propagate_down_[1]) {
Dtype* bias_diff = this->blobs_[1]->mutable_cpu_diff();
for (int n = 0; n < this->num_; ++n) {
this->backward_cpu_bias(bias_diff, top_diff + n * this->top_dim_);
}
}
if (this->param_propagate_down_[0] || propagate_down[i]) {
for (int n = 0; n < this->num_; ++n) {
// gradient w.r.t. weight. Note that we will accumulate diffs.
if (this->param_propagate_down_[0]) {
this->weight_cpu_gemm(bottom_data + n * this->bottom_dim_,
top_diff + n * this->top_dim_, weight_diff);
}
// gradient w.r.t. bottom data, if necessary.
if (propagate_down[i]) {
this->backward_cpu_gemm(top_diff + n * this->top_dim_, weight,
bottom_diff + n * this->bottom_dim_);
}
}
}
}
}
Based on top_diff it updates the current layer's weight_diff (via weight_cpu_gemm) and bottom_diff (via backward_cpu_gemm); bottom_diff is computed so that the layers below can in turn compute their own weight_diff.
With that, Backward is also done: it has computed the gradients of each layer's weight parameters (weight_diff) and the gradients of each layer's blobs (bottom_diff).
Returning to the solver's Step() loop, the next call is ApplyUpdate(), which is where the parameters are actually updated; Solver::ApplyUpdate() ultimately calls Net::Update():
template <typename Dtype>
void Net<Dtype>::Update() {
for (int i = 0; i < learnable_params_.size(); ++i) {
learnable_params_[i]->Update();
}
}
learnable_params_ is just each layer's trainable parameters, i.e. the weight Blobs of each layer; we updated the diff values inside those Blobs earlier, so let's look at what Blob::Update() does:
template <typename Dtype>
void Blob<Dtype>::Update() {
// We will perform update based on where the data is located.
switch (data_->head()) {
case SyncedMemory::HEAD_AT_CPU:
// perform computation on CPU
caffe_axpy<Dtype>(count_, Dtype(-1),
static_cast<const Dtype*>(diff_->cpu_data()),
static_cast<Dtype*>(data_->mutable_cpu_data()));
break;
//...
}
}
This performs data_ = data_ - diff_. caffe_axpy wraps a cblas routine that adds two vectors (y := a*x + y); since the coefficient passed in is Dtype(-1), the call subtracts diff_ from data_. At this point every layer's weight parameters have been updated and one iteration is complete. Training repeats this process until a good set of weights is obtained.
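caffe_axpy<float> forwards to the cblas saxpy routine. A hand-rolled equivalent of what happens here (a sketch; alpha = -1 gives data_ -= diff_):
void axpy_naive(const int n, const float alpha, const float* x, float* y) {
  for (int i = 0; i < n; ++i) {
    y[i] += alpha * x[i];  // with alpha = -1: y[i] -= x[i]
  }
}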
In the test phase no solver is needed; running Forward on a Net directly yields the results:
Net<float> caffe_net(FLAGS_model, caffe::TEST, FLAGS_level, &stages);
const vector<Blob<float>*>& result = caffe_net.Forward(&iter_loss);
For the Python interface there is first a _caffe.cpp file, from which the whole caffe framework is compiled into _caffe.so; pycaffe.py is essentially a wrapper that exposes the Python interface. In pycaffe, the objects in _caffe.so can be imported and used as Python objects:
from ._caffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, \
RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
They can be imported and used directly because _caffe.cpp exports them with BOOST_PYTHON_MODULE:
BOOST_PYTHON_MODULE(_caffe) { ... }
Here is how a class is exported:
#include<string>
#include<boost/python.hpp>
using namespace std;
using namespace boost::python;
struct World {
void set(string msg) { this->msg = msg; }
string greet() { return msg; }
string msg;
};
BOOST_PYTHON_MODULE(hello) // name of the exported module
{
class_<World>("World")
.def("greet", &World::greet)
.def("set", &World::set);
}
And here is how the exported class is used from Python:
import hello
planet = hello.World() # default constructor creates the object
planet.set("howdy")    # call a method on the object
print planet.greet()   # call a method on the object
If you don't want to export any constructor, use no_init:
class_<Abstract>("Abstract",no_init)
Finally, the caffe directory provides an __init__.py file that turns the whole caffe directory into a Python package:
from .pycaffe import Net, SGDSolver, NesterovSolver, AdaGradSolver, RMSPropSolver, AdaDeltaSolver, AdamSolver, NCCL, Timer
from ._caffe import init_log, log, set_mode_cpu, set_mode_gpu, set_device, Layer, get_solver, layer_type_list, set_random_seed, solver_count, set_solver_count, solver_rank, set_solver_rank, set_multiprocess, has_nccl
from ._caffe import __version__
from .proto.caffe_pb2 import TRAIN, TEST
from .classifier import Classifier
from .detector import Detector
from . import io
from .net_spec import layers, params, NetSpec, to_proto
With this, outside code can use caffe.Net, caffe.init_log, caffe.__version__, caffe.TRAIN, caffe.Classifier, caffe.Detector, caffe.io, ... as the Python interface to caffe.