Caffe Explained: The Softmax Layer

Learn to use Caffe step by step from scratch, with related deep learning and hyperparameter tuning knowledge woven in along the way.

softmax layer

softmax layer: outputs likelihood values (class probabilities)

layer {
  bottom: "cls3_fc"
  top: "prob"
  name: "prob"
  type: "Softmax"
}

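The formula is as follows, where $x_i$ is the i-th input score and $p_i$ is the output probability of class i:

$$p_i = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}}$$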

softmax-loss layer: outputs the loss value

layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip1"
  bottom: "label"
  top: "loss"
  loss_param {
    ignore_label: 0
    normalize: 1
    normalization: FULL
  }
}

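The formula is as follows, where $p_{k, y_k}$ is the softmax probability that sample $k$ assigns to its true label $y_k$, and $N$ is the normalization factor selected by loss_param:

$$L = -\frac{1}{N} \sum_{k} \log p_{k, y_k}$$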
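A minimal sketch of how a softmax loss can be computed, assuming the loss is averaged over the batch and nothing is ignored. The function names `softmax` and `softmax_loss` are hypothetical helpers for illustration, not Caffe APIs.

```python
import math

def softmax(scores):
    """Plain softmax; adequate for the small inputs used in this sketch."""
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def softmax_loss(batch_scores, labels):
    """Mean negative log-probability of the true class over the batch."""
    losses = [-math.log(softmax(s)[y]) for s, y in zip(batch_scores, labels)]
    return sum(losses) / len(losses)

# Two samples with three classes each; labels are class indices.
scores = [[3.0, 1.0, -3.0], [0.0, 0.0, 0.0]]
labels = [0, 2]
loss = softmax_loss(scores, labels)
print(loss)
```

The second sample has uniform scores, so its contribution is log(3) regardless of the label, which makes for a quick sanity check.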

loss_param explained:

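ignore_label
An int, unset by default.
If a value is specified, samples whose label equals ignore_label are excluded from the loss computation, and their gradients are set to 0 during backpropagation.
normalize
A bool. If true, the loss is divided by the number of samples that take part in the computation; if false, the loss is divided by the batch size, which corresponds to BATCH_SIZE in the enum below.
normalization
An enum, VALID by default. The meaning of each value is given in the code below.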

enum NormalizationMode {
  // Divide by the number of examples in the batch times spatial dimensions.
  // Outputs that receive the ignore label will NOT be ignored in computing the normalization factor.
  FULL = 0;

  // Divide by the total number of output locations that do not take the
  // ignore_label.  If ignore_label is not set, this behaves like FULL.
  VALID = 1;

  // Divide by the batch size.
  BATCH_SIZE = 2;

  // Do not normalize the loss.
  NONE = 3;
}

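(1) normalization is not set, but normalize is set:
normalize==1: the normalization mode is VALID
normalize==0: the normalization mode is BATCH_SIZE
(2) Once normalization is set, it alone determines the normalization mode, and normalize is no longer considered.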
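The rules above, together with the enum comments, can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than Caffe's actual implementation: `resolve_mode` and `normalizer` are hypothetical names, modes are plain strings, and `None` stands for "not set in the prototxt".

```python
def resolve_mode(normalization=None, normalize=None):
    """Pick the normalization mode from the two loss_param fields."""
    if normalization is not None:
        return normalization      # rule (2): normalization takes precedence
    if normalize is None or normalize:
        return "VALID"            # default, or normalize == 1
    return "BATCH_SIZE"           # normalize == 0

def normalizer(mode, labels, ignore_label=None):
    """labels holds one list of per-location labels per example in the batch."""
    total = sum(len(row) for row in labels)
    if mode == "FULL":
        return total              # ignored locations are still counted
    if mode == "VALID":
        if ignore_label is None:
            return total          # behaves like FULL
        return sum(1 for row in labels for y in row if y != ignore_label)
    if mode == "BATCH_SIZE":
        return len(labels)
    if mode == "NONE":
        return 1                  # loss is left as a plain sum
    raise ValueError("unknown normalization mode: %r" % (mode,))

# Batch of 2 examples with 3 locations each; label 0 is ignored.
labels = [[0, 1, 2], [0, 0, 1]]
print(normalizer(resolve_mode(), labels, ignore_label=0))
```

With ignore_label set to 0, only three of the six locations count toward the VALID normalizer, while FULL still divides by all six.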

Other notes

Overflow and underflow in softmax

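The softmax formula works fine for inputs of moderate magnitude, but because of how the exponential function behaves, inputs that are very large or very negative lead to overflow or underflow. Floating-point numbers have a finite range: in single precision (float32) the largest finite value is about 3.4×10^38, so e^x overflows to inf once x exceeds roughly 88.7, and e^x underflows to 0 once x drops below roughly -103. For example: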

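For [3, 1, -3], direct computation works, and we get (0.88, 0.12, 0.00).
For [1000, 1000, 1000], every exponential overflows to inf, and inf/inf gives nan (overflow);
For [-1000, -999, -1000], every exponential underflows to 0, and 0/0 gives nan (underflow).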
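The three cases can be reproduced with a naive implementation. Note one difference in this sketch: pure Python reports these failures as exceptions instead of inf or nan values, because `math.exp` raises `OverflowError` and float division by zero raises `ZeroDivisionError`. The helper names `naive_softmax` and `try_naive` are made up for this example.

```python
import math

def naive_softmax(xs):
    exps = [math.exp(x) for x in xs]    # overflows for large x
    total = sum(exps)                   # becomes 0.0 if every term underflows
    return [e / total for e in exps]

def try_naive(xs):
    """Return the probabilities, or the name of the error that occurred."""
    try:
        return naive_softmax(xs)
    except (OverflowError, ZeroDivisionError) as err:
        return type(err).__name__

for xs in ([3.0, 1.0, -3.0], [1000.0, 1000.0, 1000.0], [-1000.0, -999.0, -1000.0]):
    print(xs, "->", try_naive(xs))
```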

How softmax handles overflow and underflow

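The fix relies on the identity

$$\log \sum_{i=1}^{n} e^{x_i} = a + \log \sum_{i=1}^{n} e^{x_i - a}$$

which holds for any a. This means we can freely shift the exponents, and a typical choice is the largest element of the input vector: a = max{x1, x2, ..., xn}.
This guarantees that the largest exponent is 0, which avoids overflow. Even if the remaining terms underflow, the sum still contains e^0 = 1, so adding a back gives a reasonable value.
Moreover, softmax is unaffected by a constant offset of its input, i.e. softmax(x) = softmax(x + c). The proof is as follows:

$$\mathrm{softmax}(x + c)_i = \frac{e^{x_i + c}}{\sum_{j=1}^{n} e^{x_j + c}} = \frac{e^{c} e^{x_i}}{e^{c} \sum_{j=1}^{n} e^{x_j}} = \frac{e^{x_i}}{\sum_{j=1}^{n} e^{x_j}} = \mathrm{softmax}(x)_i$$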
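A minimal sketch of the max-subtraction trick in pure Python, to show that the inputs that are problematic for a direct computation become harmless. `stable_softmax` and `logsumexp` are illustrative names chosen here, not functions taken from Caffe.

```python
import math

def stable_softmax(xs):
    a = max(xs)
    exps = [math.exp(x - a) for x in xs]  # every exponent is <= 0, so no overflow
    total = sum(exps)                     # >= 1 because the max contributes exp(0)
    return [e / total for e in exps]

def logsumexp(xs):
    """log(sum(exp(x))) computed as a + log(sum(exp(x - a)))."""
    a = max(xs)
    return a + math.log(sum(math.exp(x - a) for x in xs))

print(stable_softmax([1000.0, 1000.0, 1000.0]))
print(stable_softmax([-1000.0, -999.0, -1000.0]))
print(logsumexp([1000.0, 1000.0, 1000.0]))
```

Because only the differences x - a enter the computation, shifting every input by the same constant leaves the result unchanged, which is the invariance softmax(x) = softmax(x + c) stated above.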

References

Why subtract the maximum value when computing the softmax function?
caffe layer walkthrough series: softmax_loss (http://blog.csdn.net/shuzfan/article/details/51460895)


This article is shared from the WeChat official account AI異構 (gh_ed66a0ffe20a).