http://openaccess.thecvf.com/content_cvpr_2017/papers/Kodirov_Semantic_Autoencoder_for_CVPR_2017_paper.pdf
Semantic Autoencoder for Zero-Shot Learning, Elyor Kodirov, Tao Xiang, Shaogang Gong, Queen Mary University of London, UK, {e.kodirov, t.xiang, s.gong}@qmul.ac.uk
Highlights
- Improves zero-shot learning performance through tied encoder-decoder (self-reconstruction) learning, similar in spirit to CycleGAN
- The model is very simple, admits a direct closed-form solution, and is very fast
- It applies effectively to other related tasks (supervised clustering), demonstrating its generality
Method
Linear autoencoder
Model Formulation
Setting the gradient of the objective to zero gives a well-known Sylvester equation, which can be solved efficiently with the Bartels-Stewart algorithm (MATLAB's `sylvester`).
Zero-shot learning: with the learned W, there are two ways to classify at test time:
- Project an unseen-class test sample x_i into the semantic (attribute) space via W, then label it with the nearest class prototype s_i among the unseen classes (which have no training samples).
- Project the semantic prototypes S of all unseen classes into the feature space via Wᵀ, then label x_i with the nearest projected class centre.
- The two strategies give essentially the same accuracy.
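The two test-time strategies can be sketched as follows; the function names, the Euclidean distance, and the column-per-class prototype layout are my assumptions (cosine distance is also common in practice):

```python
import numpy as np

def classify_in_semantic_space(x, W, S_unseen):
    """Encoder route: map feature x (d,) into the semantic space via
    W (k x d), then return the index of the nearest unseen-class
    prototype among the columns of S_unseen (k x C)."""
    s_hat = W @ x
    dists = np.linalg.norm(S_unseen - s_hat[:, None], axis=0)
    return int(np.argmin(dists))

def classify_in_feature_space(x, W, S_unseen):
    """Decoder route: map all unseen-class prototypes into the
    feature space via W^T, then return the index of the projected
    class centre nearest to x."""
    X_hat = W.T @ S_unseen           # d x C projected class centres
    dists = np.linalg.norm(X_hat - x[:, None], axis=0)
    return int(np.argmin(dists))
```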
Supervised clustering: in this problem the semantic space is the one-hot class-label space. All test data are projected into the training-class label space via W and then clustered with k-means.
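A sketch of this supervised-clustering step, assuming a projection W already learned on labelled training data; `sae_cluster` and the use of SciPy's k-means are my choices, not the paper's implementation:

```python
import numpy as np
from scipy.cluster.vq import kmeans2

def sae_cluster(W, X_test, n_clusters):
    """Project test features X_test (d x N) into the one-hot
    class-label space via W (k x d), then run k-means on the
    projected points (one row per sample)."""
    Z = (W @ X_test).T                        # N x k points in label space
    _, labels = kmeans2(Z, n_clusters, minit='++')
    return labels
```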
Relation to existing models: existing zero-shot models typically learn a projection by minimising ||WX − S||² plus a regulariser Ω(W).
Alternatively, [54] projects attributes into the feature space, so the objective becomes ||X − WᵀS||² plus a regulariser.
This paper's method combines the two. Moreover, because the decoder is tied to the encoder (W* = Wᵀ), W cannot have a large norm (otherwise x, multiplied by two large-norm matrices, could not be reconstructed to its original value), so the explicit regularisation term can be dropped.
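The equation images are missing from this copy; reconstructed from the paper's formulation (X ∈ R^{d×N}, S ∈ R^{k×N}, W ∈ R^{k×d}), the three objectives being compared are:

```latex
% existing ZSL: project features into the semantic space
\min_W \; \|WX - S\|_F^2 + \lambda\,\Omega(W)

% [54]: project semantic vectors into the feature space
\min_W \; \|X - W^\top S\|_F^2 + \lambda\,\Omega(W)

% SAE: tie the two directions; the self-reconstruction
% constraint replaces the explicit regulariser
\min_W \; \|X - W^\top S\|_F^2 + \lambda\,\|WX - S\|_F^2
```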
Experiments
Zero-shot learning
Datasets: Semantic word vector representations are used for the large-scale datasets (ImNet-1 and ImNet-2); a skip-gram text model is trained on a corpus of 4.6M Wikipedia documents to obtain the word2vec [38, 37] word vectors.
Features: GoogLeNet features are used for all datasets except ImNet-1, which uses AlexNet.
Results:
- Our SAE model achieves the best results on all 6 datasets.
- On the small-scale datasets, the gap between our model and the strongest competitor ranges from 3.5% to 6.5%.
- On the large-scale datasets, the gaps are even bigger: On the largest ImNet-2, our model improves over the state-of-the-art SS-Voc [22] by 8.8%.
- Both the encoder and decoder projection functions in our SAE model (SAE (W) and SAE (WT) respectively) can be used for effective ZSL.
- The encoder projection function seems to be slightly better overall.
- Generalised ZSL: measures how well a zero-shot learning method can trade off between recognising data from seen classes and from unseen classes.
- Protocol: hold out 20% of the data samples from the seen classes and mix them with the samples from the unseen classes.
- On AwA, our model is slightly worse than the SynCstruct [13].
- However, on the more challenging CUB dataset, our method significantly outperforms the competitors.
Clustering
Datasets: a synthetic dataset and Oxford Flowers-17 (848 images)
Results:
- On computational cost, our model (93s) is more expensive than MLCA (39s) but much cheaper than all the others (hours to days).
- Our model achieves the best clustering accuracy.