face recognition[MobileFaceNet]

Date: 2019-11-06

Tags: face recognition, mobilefacenet

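This post covers "MobileFaceNets: Efficient CNNs for Accurate Real-Time Face Verification on Mobile Devices", released in April 2018 by Beijing Jiaotong University and Watchdata Inc.
Face recognition has improved greatly over traditional methods, but limited machine resources and real-time deployment requirements make it necessary to consider lightweight networks such as MobileNet.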

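0 Introduction

Face verification is becoming an increasingly popular authentication technology on mobile phones and embedded devices. However, today's high-accuracy face verification models are built on deep and wide CNNs trained under supervision with various loss functions. Such large CNNs need more computing power than mobile and embedded devices can supply. Several efficient CNN architectures, such as MobileNetV1, ShuffleNet and MobileNetV2, have been proposed in recent years for visual recognition on mobile devices. A simple approach is to apply these architectures to face verification unchanged, but the resulting accuracy falls well short of the state of the art on current face recognition benchmarks.

The models proposed by the authors have fewer than 1 million parameters and, under the same experimental conditions, MobileFaceNets are significantly more accurate than MobileNetV2 and run more than 2 times faster. Trained from scratch with the ArcFace loss on the refined MS-Celeb-1M dataset, a single MobileFaceNet model is only 4 MB in size and achieves 99.55% accuracy on LFW and 92.59% TAR at FAR = \(10^{-6}\) on MegaFace Challenge 1, which is comparable to large CNN models. Note that many existing techniques such as pruning [37], low-bit quantization [29] and knowledge distillation [16] can be used to further improve the efficiency of MobileFaceNets.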

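1 Main contributions

This section describes the extremely efficient CNN models proposed in the paper for accelerating real-time face verification on mobile devices, which overcome the weakness of common mobile networks on face verification. To make the results reproducible, the whole face verification model is trained with the ArcFace loss, with some of the settings taken from [5].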

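1.1 Weakness of common mobile networks for face verification

Mobile networks used for common visual recognition tasks, such as MobileNetV1, ShuffleNet and MobileNetV2, all contain a global average pooling (GAP) layer. For face verification and recognition, some researchers [5,14] have observed that CNNs with a GAP layer are less accurate than those without one, but no theoretical analysis of this observation has been given. The paper analyses the phenomenon by drawing on the related description in [19].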

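A typical face verification pipeline consists of preprocessing the face images, extracting face features, and matching two faces by the similarity or distance of their features. Following the preprocessing in [5,20,21,22], MTCNN is used to detect faces and five facial landmarks, which are used for alignment, giving face images of size 112x112; each pixel is then normalized by subtracting 127.5 and dividing by 128. Finally, a face feature embedding CNN maps each aligned face to a feature vector, as shown in Figure 1.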
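The normalization and matching steps of this pipeline can be sketched in a few lines. This is only an illustration: cosine similarity is one common choice of feature similarity and is an assumption here, and the toy embeddings and the `THRESHOLD` value are hypothetical, not taken from the paper.

```python
import math


def normalize_pixel(value):
    """Map an 8-bit pixel value (0..255) to roughly [-1, 1)."""
    return (value - 127.5) / 128.0


def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 4-dimensional embeddings, for illustration only.
emb_1 = [0.1, 0.3, -0.2, 0.9]
emb_2 = [0.1, 0.25, -0.1, 1.0]
THRESHOLD = 0.5  # assumed decision threshold, not from the paper

same_person = cosine_similarity(emb_1, emb_2) >= THRESHOLD
```

In practice the threshold would be chosen on a validation set for a target false accept rate.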

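Without loss of generality, MobileNetV2 is used below as the face feature embedding CNN. To keep the output feature maps the same size as in the original network with 224x224 input, the first convolutional layer uses stride 1 instead of 2 for the 112x112 input, since stride 2 leads to lower accuracy. The output of the last convolutional layer before the GAP layer (called FMap-end) therefore has a spatial resolution of 7x7. Although in theory the corner units and the central units of FMap-end have receptive fields of the same size, these receptive fields lie at different positions of the input image. As described in [24], the central region of a receptive field has more influence on the output than the rest of it, and the influence within a receptive field follows a Gaussian distribution. The effective receptive field of a corner unit of FMap-end is smaller than that of a central unit. When the input is an aligned face, a corner unit of FMap-end therefore carries less face information than a central unit, so different units of FMap-end differ in importance for extracting a face feature vector.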
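The 7x7 resolution can be checked with a little arithmetic. The sketch below assumes the stage strides listed for MobileNetV2 in [3] (first convolution, then the seven bottleneck stages) and assumes that every strided layer rounds the size up, as "same" padding does; the function name is hypothetical.

```python
def output_resolution(input_size, strides):
    """Spatial size after a chain of strided layers with 'same' padding."""
    size = input_size
    for stride in strides:
        size = -(-size // stride)  # ceiling division
    return size


# Stage strides of MobileNetV2 as listed in [3]:
# first convolution, then the seven bottleneck stages.
MOBILENETV2_STRIDES = [2, 1, 2, 2, 2, 1, 2, 1]

# Original network with 224x224 input.
original = output_resolution(224, MOBILENETV2_STRIDES)

# Face variant: 112x112 input, first convolution changed to stride 1.
modified = output_resolution(112, [1] + MOBILENETV2_STRIDES[1:])

# Unmodified network on 112x112 input, for comparison.
unmodified_small = output_resolution(112, MOBILENETV2_STRIDES)
```

Both `original` and `modified` come out as 7, whereas leaving the first stride at 2 with a 112x112 input would shrink FMap-end to 4x4.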

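In MobileNetV2, the flattened FMap-end is unsuitable for direct use as the face feature vector because its dimension is too high (62720). The natural option is to take the output of the GAP layer as the feature vector, but several studies [5,14], as well as Table 2, show that this gives lower accuracy.

The GAP layer treats every unit of FMap-end as equally important, which is unreasonable. Another popular option is to replace the GAP layer with a fully connected layer that projects FMap-end to a more compact feature vector, but this adds a large number of parameters to the model: even for a 128-dimensional output, the fully connected layer after MobileNetV2 adds about 8 million parameters. The paper therefore does not adopt this option.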
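The two figures quoted above (62720 dimensions and roughly 8 million parameters) follow directly from the shape of FMap-end. A quick check, with variable names chosen for illustration:

```python
# Shape of FMap-end in MobileNetV2 with the settings described above.
FMAP_H, FMAP_W, FMAP_C = 7, 7, 1280
EMBEDDING_DIM = 128

# Dimension of the flattened FMap-end.
flattened_dim = FMAP_H * FMAP_W * FMAP_C

# Weights of a fully connected layer mapping it to a 128-d embedding.
fc_weights = flattened_dim * EMBEDDING_DIM

# Including one bias per output unit (assuming the layer has biases).
fc_params_with_bias = fc_weights + EMBEDDING_DIM

print(flattened_dim, fc_weights, fc_params_with_bias)
```

The fully connected layer alone would hold about eight times as many parameters as the whole of MobileFaceNet.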

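1.2 Global Depthwise Convolution

To give different units of FMap-end different importance, the authors replace the GAP layer with a global depthwise convolution (GDConv) layer. A GDConv layer is a depthwise convolution (see [1,25]) whose kernel size equals the input size, with pad=0 and stride=1. The output of a GDConv layer is:
\[G_m = \sum_{i,j} K_{i,j,m}\cdot F_{i,j,m}\]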

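Here F is the input feature map of size \(W\times H\times M\), K is the depthwise convolution kernel of size \(W\times H\times M\), and G is the output of size \(1\times 1\times M\). The \(m\)-th channel of G contains a single element \(G_m\); \((i,j)\) denotes the spatial position in F and K, and \(m\) is the channel index.
The computational cost of a global depthwise convolution is: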
\[W\cdot H\cdot M\]
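When GDConv is applied after FMap-end of MobileNetV2, its kernel is 7x7x1280, i.e. it has 1280 channels. This costs 62720 MAdds (multiply-add operations, see [3]) and adds 62720 parameters. Let MobileNetV2-GDConv denote MobileNetV2 with a GDConv layer. When MobileNetV2 and MobileNetV2-GDConv are both trained on the CASIA-Webface dataset with the ArcFace loss, the latter achieves clearly better accuracy. MobileFaceNet therefore adopts the GDConv layer.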
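A GDConv layer multiplies each position of each channel by its own learned weight and sums over the spatial positions of that channel. The pure-Python sketch below, written with nested lists indexed `[i][j][m]`, is meant only to make that concrete; the function names are hypothetical and a real model would use a framework's depthwise convolution instead. It also shows that GAP is the special case in which every weight equals 1/(W·H).

```python
def global_depthwise_conv(feature_map, kernel):
    """Per-channel weighted sum over all spatial positions.

    feature_map and kernel are nested lists indexed [i][j][m] with the
    same shape; the result is a list with one value per channel.
    """
    rows = len(feature_map)
    cols = len(feature_map[0])
    channels = len(feature_map[0][0])
    output = [0.0] * channels
    for i in range(rows):
        for j in range(cols):
            for m in range(channels):
                output[m] += kernel[i][j][m] * feature_map[i][j][m]
    return output


def global_average_pool(feature_map):
    """GAP expressed as a GDConv whose weights are all 1 / (rows * cols)."""
    rows = len(feature_map)
    cols = len(feature_map[0])
    channels = len(feature_map[0][0])
    weight = 1.0 / (rows * cols)
    uniform = [[[weight] * channels for _ in range(cols)] for _ in range(rows)]
    return global_depthwise_conv(feature_map, uniform)


# Toy 2x2 feature map with 2 channels (values are illustrative only).
fmap = [[[1.0, 10.0], [2.0, 20.0]],
        [[3.0, 30.0], [4.0, 40.0]]]

# A kernel that weights positions differently in each channel.
kernel = [[[1.0, 0.0], [0.0, 0.0]],
          [[0.0, 0.0], [0.0, 1.0]]]

gdconv_out = global_depthwise_conv(fmap, kernel)
gap_out = global_average_pool(fmap)

# Parameters (and MAdds) of a 7x7x1280 GDConv kernel.
gdconv_params = 7 * 7 * 1280
```

Because the weights are learned, the network can down-weight the corner units of FMap-end instead of averaging them in with equal weight.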

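1.3 MobileFaceNet architecture

The MobileFaceNet architecture is now described in detail. The residual bottlenecks of MobileNetV2 are the main building blocks of MobileFaceNet. For convenience, the same notation as in [3] is used. The architecture of MobileFaceNet is given in Table 1.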

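In particular, the expansion factors of the bottlenecks in MobileFaceNet are smaller than those in MobileNetV2. PReLU is used as the activation function, which works better than ReLU. In addition, the network uses a fast downsampling strategy at its beginning and an early dimension-reduction strategy in its last several convolutional layers, and its feature output layer is a linear GDConv layer followed by a linear 1x1 convolution layer. Batch normalization (BN) is used during training, and BN folding (see Section 3.2 of [29]) is applied before deployment.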
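BN folding merges a batch normalization layer into the preceding convolution so that inference needs only one layer. The sketch below shows the usual inference-time algebra for a 1x1 convolution acting on a single pixel; it is an illustration under that assumption rather than the exact procedure of [29], and all names and numbers are hypothetical.

```python
import math


def conv1x1(weights, biases, x):
    """1x1 convolution at one pixel: weights[o][i] maps input i to output o."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def batch_norm(y, gamma, beta, mean, var, eps=1e-5):
    """Inference-time batch normalization with fixed running statistics."""
    return [g * (v - m) / math.sqrt(s + eps) + bt
            for v, g, bt, m, s in zip(y, gamma, beta, mean, var)]


def fold_batch_norm(weights, biases, gamma, beta, mean, var, eps=1e-5):
    """Return (weights, biases) of a convolution equivalent to conv + BN."""
    folded_w, folded_b = [], []
    for o, row in enumerate(weights):
        scale = gamma[o] / math.sqrt(var[o] + eps)
        folded_w.append([w * scale for w in row])
        folded_b.append((biases[o] - mean[o]) * scale + beta[o])
    return folded_w, folded_b


# Illustrative values only.
weights = [[0.5, -1.0], [2.0, 0.25]]
biases = [0.1, -0.2]
gamma, beta = [1.5, 0.8], [0.0, 0.3]
mean, var = [0.2, -0.1], [0.9, 1.6]
x = [1.0, 2.0]

reference = batch_norm(conv1x1(weights, biases, x), gamma, beta, mean, var)
folded_w, folded_b = fold_batch_norm(weights, biases, gamma, beta, mean, var)
folded = conv1x1(folded_w, folded_b, x)
```

`reference` and `folded` agree up to floating-point rounding, which is why the folded network can replace the original one at deployment time.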

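MobileFaceNet has a computational cost of 221 million MAdds and 0.99 million parameters. The architecture is varied further as follows. To reduce computation, the input resolution is changed from 112x112 to 112x96 or 96x96. To reduce the number of parameters, the 1x1 convolution layer after the GDConv layer is removed from MobileFaceNet, giving a network called MobileFaceNet-M. Removing the 1x1 convolution layer before the GDConv layer from MobileFaceNet-M shrinks the network further, giving MobileFaceNet-S. The performance of these three networks is compared in detail below.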

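2 Experiments and analysis

2.1 Training settings and comparison of results on LFW and AgeDB

The authors use MobileNetV1, ShuffleNet and MobileNetV2 (with stride 1 in the first convolutional layer, because stride 2 gives very poor accuracy) as baseline models. All MobileFaceNet models and baseline models are trained from scratch on the CASIA-Webface dataset with the ArcFace loss. The weight decay is 0.0005, except for the layers after the global operator, where it is 0.0004. The models are optimized by SGD with momentum 0.9 and a batch size of 512. The learning rate starts at 0.1 and is divided by 10 at 36K, 52K and 58K iterations; training ends at 60K iterations. The results on LFW and AgeDB-30 are compared in Table 2.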

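As Table 2 shows, MobileFaceNet achieves clearly better accuracy and is also faster; MobileFaceNet with 96x96 input is the fastest. To pursue the best achievable performance, MobileFaceNet, MobileFaceNet (112x96) and MobileFaceNet (96x96) are also trained with the ArcFace loss on the cleaned MS-Celeb-1M training set. The results are given in Table 3.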

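2.2 Results on the MegaFace challenge

The Facescrub [36] dataset is used as the test set to evaluate MobileFaceNet on MegaFace Challenge 1. Table 4 gives the results, where a training set size of 0.5 million images is the threshold that separates the large protocol from the small protocol.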
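For readers unfamiliar with the TAR@FAR metric quoted for MegaFace, the generic sketch below shows how a true accept rate at a fixed false accept rate can be computed from similarity scores. It is not the MegaFace evaluation protocol itself, which has its own tooling and distractor set; the function name and the toy scores are hypothetical.

```python
def tar_at_far(genuine_scores, impostor_scores, far):
    """True accept rate at a threshold whose false accept rate is <= far.

    genuine_scores: similarities of same-identity pairs.
    impostor_scores: similarities of different-identity pairs.
    A pair is accepted when its score is strictly above the threshold.
    """
    ranked = sorted(impostor_scores, reverse=True)
    allowed = int(far * len(ranked))  # impostor pairs that may be accepted
    if allowed >= len(ranked):
        return 1.0  # every pair may be accepted
    threshold = ranked[allowed]
    accepted = sum(1 for score in genuine_scores if score > threshold)
    return accepted / len(genuine_scores)


# Toy scores, for illustration only.
genuine = [0.95, 0.8, 0.4, 0.2]
impostor = [0.1, 0.2, 0.3, 0.9]

tar_loose = tar_at_far(genuine, impostor, far=0.25)
tar_strict = tar_at_far(genuine, impostor, far=0.0)
```

A FAR as low as \(10^{-6}\) only becomes meaningful with at least a million impostor comparisons, which is why it is reported on a benchmark of MegaFace's scale.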

References:

[1] Howard, A. G., Zhu, M., Chen, B., Kalenichenko, D., Wang, W., Weyand, T., et al.: Mobilenets: Efficient convolutional neural networks for mobile vision applications. CoRR, abs/1704.04861 (2017)
[2] Zhang, X., Zhou, X., Lin, M., Sun, J.: Shufflenet: An extremely efficient convolutional neural network for mobile devices. CoRR, abs/1707.01083 (2017)
[3] Sandler, M., Howard, A., Zhu, M., Zhmoginov, A., Chen, L.C.: MobileNetV2: Inverted Residuals and Linear Bottlenecks. CoRR, abs/1801.04381 (2018)
[4] Guo, Y., Zhang, L., Hu, Y., He, X., Gao, J.: Ms-celeb-1m: A dataset and benchmark for large-scale face recognition. arXiv preprint, arXiv: 1607.08221 (2016)
[5] Deng, J., Guo, J., Zafeiriou, S.: ArcFace: Additive Angular Margin Loss for Deep Face Recognition. arXiv preprint, arXiv: 1801.07698 (2018)
[6] Huang, G.B., Ramesh, M., Berg, T., et al.: Labeled faces in the wild: a database for studying face recognition in unconstrained environments. (2007)
[7] Kemelmacher-Shlizerman, I., Seitz, S. M., Miller, D., Brossard, E.: The megaface benchmark: 1 million faces for recognition at scale. In: CVPR (2016)
[8] Moschoglou, S., Papaioannou, A., Sagonas, C., Deng, J., Kotsia, I., Zafeiriou, S.: Agedb: The first manually collected in-the-wild age database. In: CVPRW (2017)
[9] Iandola, F. N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: Squeezenet: Alexnet-level accuracy with 50x fewer parameters and 0.5 mb model size. arXiv preprint, arXiv:1602.07360 (2016)
[10] Krizhevsky, A., Sutskever, I., Hinton, G.E.: Imagenet classification with deep convolutional neural networks. In: NIPS (2012)
[11] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: CVPR. IEEE (2009)
[12] Russakovsky, O., Deng, J., Su, H., et al.: Imagenet large scale visual recognition challenge. Int. J. Comput. Vis. 115, 211–252 (2015)
[13] Zoph, B., Vasudevan, V., Shlens, J., Le, Q.V.: Learning transferable architectures for scalable image recognition. arXiv preprint, arXiv:1707.07012 (2017)
[14] Wu, X., He, R., Sun, Z., Tan, T.: A light cnn for deep face representation with noisy labels. arXiv preprint, arXiv:1511.02683 (2016)
[15] Wu, B., Wan, A., Yue, X., Jin, P., Zhao, S., Golmant, N., et al.: Shift: A Zero FLOP, Zero Parameter Alternative to Spatial Convolutions. arXiv preprint, arXiv: 1711.08141 (2017)
[16] Hinton, G. E., Vinyals, O., Dean, J.: Distilling the knowledge in a neural network. In arXiv:1503.02531 (2015)
[17] Luo, P., Zhu, Z., Liu, Z., Wang, X., Tang, X., Luo, P., et al.: Face Model Compression by Distilling Knowledge from Neurons. In: AAAI (2016)
[18] Schroff, F., Kalenichenko, D., Philbin, J.: Facenet: a unified embedding for face recognition and clustering. In: CVPR (2015)
[19] Long, J., Zhang, N., Darrell, T.: Do convnets learn correspondence? Advances in Neural Information Processing Systems, 2, 1601-1609 (2014)
[20] Liu, W., Wen, Y., Yu, Z., Li, M., Raj, B., Song, L.: Sphereface: Deep hypersphere embedding for face recognition. In: CVPR (2017)
[21] Wang, F., Cheng, J., Liu, W., Liu, H.: Additive margin softmax for face verification. IEEE Signal Proc. Let., 25(7), 926-930 (2018)
[22] Wang, H., Wang, Y., Zhou, Z., Ji, X., Gong, D., Zhou, J., et al.: CosFace: Large Margin Cosine Loss for Deep Face Recognition. In arXiv: 1801.0941 (2018)
[23] Zhang, K., Zhang, Z., Li, Z., Qiao, Y.: Joint Face Detection and Alignment using Multi-task Cascaded Convolutional Networks. IEEE Signal Proc. Let., 23(10), 1499–1503 (2016)
[24] Luo, W., Li, Y., Urtasun, R., Zemel, R.: Understanding the Effective Receptive Field in Deep Convolutional Neural Networks. In: NIPS (2016)
[25] Chollet, F.: Xception: Deep learning with depthwise separable convolutions. arXiv preprint, arXiv:1610.02357 (2016)
[26] Yi, D., Lei, Z., Liao, S., Li, S. Z.: Learning face representation from scratch. arXiv preprint, arXiv:1411.7923 (2014)
[27] He, K., Zhang, X., Ren, S., Sun, J.: Delving deep into rectifiers: Surpassing human-level performance on imagenet classification. In: CVPR (2015)
[28] Ioffe, S., Szegedy, C.: Batch normalization: accelerating deep network training by reducing internal covariate shift. In: International Conference on Machine Learning (2015)
[29] Jacob, B., Kligys, S., Chen, B., Zhu, M., Tang, M., Howard, A., et al.: Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. arXiv preprint, arXiv: 1712.05877 (2017)
[30] NCNN: a high-performance neural network inference framework optimized for the mobile platform, https://github.com/Tencent/ncnn, the version in Apr 20, 2018.
[31] Taigman, Y., Yang, M., Ranzato, M., et al.: DeepFace: closing the gap to human-level performance in face verification. In: CVPR (2014)
[32] Parkhi, O. M., Vedaldi, A., Zisserman, A., et al.: Deep face recognition. In: BMVC, volume 1, page 6 (2015)
[33] Sun, Y., Wang, X., Tang, X.: Deeply learned face representations are sparse, selective, and robust. In: Computer Vision and Pattern Recognition, pp. 2892–2900 (2015)
[34] Wen, Y., Zhang, K., Li, Z., Qiao, Y.: A discriminative feature learning approach for deep face recognition. In: ECCV (2016)
[35] Deng, W., Chen, B., Fang, Y., Hu, J.: Deep Correlation Feature Learning for Face Verification in the Wild. IEEE Signal Proc. Let., 24(12), 1877–1881 (2017)
[36] Ng, H. W., Winkler, S.: A data-driven approach to cleaning large face datasets. In: IEEE International Conference on Image Processing (ICIP), pp. 343–347 (2014)
[37] Han, S., Mao, H., Dally, W. J.: Deep compression: Compressing deep neural network with pruning, trained quantization and Huffman coding. CoRR, abs/1510.00149 (2015)
[38] He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR (2016)
