CV Code | Computer Vision Open-Source Weekly, Issue 20200502

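For the second week of May, here is a roundup of CV code that was newly released, or is about to be released, this week. The topics are broad: beyond technical innovations, they cover a variety of CV applications. We hope you find it useful.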

Image Segmentation

[1].A Hand Motion-guided Articulation and Segmentation Estimation

Hand-motion-guided estimation of articulation models and segmentation

Authors | Richard Sahala Hartanto, Ryoichi Ishikawa, Menandro Roxas, Takeshi Oishi

Affiliations | The University of Tokyo

Paper | https://arxiv.org/abs/2005.03691

Code | https://github.com/cln515/Articulation-Estimation

[2].A Sim2Real Deep Learning Approach for the Transformation of Images from Multiple Vehicle-Mounted Cameras to a Semantically Segmented Image in Bird’s Eye View

A Sim2Real deep learning approach that transforms images from multiple vehicle-mounted cameras into a semantically segmented bird's-eye-view image

Authors | Lennart Reiher, Bastian Lampe, Lutz Eckstein

Affiliations | German Federal Ministry of Education and Research; RWTH Aachen University

Paper | https://arxiv.org/abs/2005.04078

Code | https://github.com/ika-rwth-aachen/Cam2BEV

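[3].BiSeNet V2: Bilateral Network with Guided Aggregation for Real-time Semantic Segmentation

An unofficial TensorFlow implementation of the BiSeNetV2 real-time semantic segmentation algorithm. It reaches 71.563 mIoU on the Cityscapes validation set and runs at 83 fps on a GTX 1070 GPU.

Authors | Changqian Yu, Changxin Gao, Jingbo Wang, Gang Yu, Chunhua Shen, Nong Sang

Affiliations | Huazhong University of Science and Technology; The University of Adelaide; The Chinese University of Hong Kong; Tencent

Paper | https://arxiv.org/abs/2004.02147

Code | https://github.com/MaybeShewill-CV/bisenetv2-tensorflow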
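The 71.563 figure above is mean intersection-over-union (mIoU). As a minimal sketch of how that metric is usually computed, the pure-Python function below builds a confusion matrix over per-pixel class ids and averages the per-class IoU. It assumes flattened label lists and 255 as the ignore label (the Cityscapes trainId convention); it is not the repository's own evaluation script.

```python
def mean_iou(pred, target, num_classes, ignore_index=255):
    """Mean IoU from flat sequences of per-pixel class ids."""
    # confusion[t][p] counts pixels whose ground truth is t and prediction is p
    confusion = [[0] * num_classes for _ in range(num_classes)]
    for p, t in zip(pred, target):
        if t == ignore_index:
            continue  # unlabeled pixels are excluded from the metric
        confusion[t][p] += 1

    ious = []
    for c in range(num_classes):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp                                 # pixels of class c that were missed
        fp = sum(confusion[r][c] for r in range(num_classes)) - tp  # pixels wrongly labeled as c
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both prediction and ground truth
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


print(mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2))  # (1/2 + 2/3) / 2
```

In practice the confusion matrix is accumulated over the whole validation set before the per-class IoUs are averaged, rather than averaging per image.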

[4].Class-Incremental Learning for Semantic Segmentation Re-Using Neither Old Data Nor Old Labels

Class-incremental learning for semantic segmentation that reuses neither old data nor old labels

Authors | Marvin Klingner, Andreas Bär, Philipp Donn, Tim Fingscheidt

Affiliations | Technische Universität Braunschweig

Paper | https://arxiv.org/abs/2005.06050

Code | https://github.com/ifnspaml/CIL_Segmentation (to be released)

[5].Detection and Retrieval of Out-of-Distribution Objects in Semantic Segmentation

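Detects and retrieves out-of-distribution objects (objects not covered by the training distribution) in semantic segmentation. The model is trained on Cityscapes and tested on A2D2.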

Authors | Philipp Oberdiek, Matthias Rottmann, Gernot A. Fink

Affiliations | TU Dortmund University; University of Wuppertal

Paper | https://arxiv.org/abs/2005.06831

Code | https://github.com/RonMcKay/OODRetrieval
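A common baseline for flagging out-of-distribution pixels in a segmentation network is the entropy of the per-pixel softmax output: the model spreads probability across classes where it has seen nothing similar. The sketch below only illustrates that baseline, assuming raw logits per pixel and a hypothetical threshold of 0.5; the paper's own pipeline, which additionally retrieves similar OOD objects, is more involved.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one pixel's class logits."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def normalized_entropy(probs):
    """Shannon entropy divided by log(num_classes), so the result lies in [0, 1]."""
    h = -sum(p * math.log(p) for p in probs if p > 0)
    return h / math.log(len(probs))


def ood_mask(logit_map, threshold=0.5):
    """Mark pixels whose normalized entropy exceeds the (hypothetical) threshold.

    logit_map: nested list [H][W][num_classes] of raw logits.
    """
    return [[normalized_entropy(softmax(px)) > threshold for px in row] for row in logit_map]


# A confident pixel and a maximally uncertain pixel
print(ood_mask([[[10.0, 0.0, 0.0], [1.0, 1.0, 1.0]]]))  # [[False, True]]
```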

Object Detection

#Semi-supervised Object Detection#

[6].A Simple Semi-Supervised Learning Framework for Object Detection

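Google proposes STAC, which trains an updated model on pseudo-labels of objects detected in unlabeled images. It improves AP0.5 on VOC07 from 76.3 to 79.8, and on COCO it reaches 24.38 mAP using only 5% of the labeled data (for comparison, the supervised baseline reaches 23.86 mAP with 10% of the labels).

Authors | Kihyuk Sohn, Zizhao Zhang, Chun-Liang Li, Han Zhang, Chen-Yu Lee, Tomas Pfister

Affiliations | Google

Paper | https://arxiv.org/abs/2005.04757v1

Code | https://github.com/google-research/ssl_detection/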
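The pseudo-labeling step can be sketched as follows: a teacher detector trained on the labeled subset predicts boxes on unlabeled images, only high-confidence boxes are kept as hard labels, and the student is trained on a weighted sum of the supervised loss and the pseudo-label loss. The `Detection` type, the 0.9 score threshold and the weight `lam` are illustrative assumptions, not values taken from the repository.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    box: tuple  # (x1, y1, x2, y2) in pixels
    label: int
    score: float


def make_pseudo_labels(detections, score_threshold=0.9):
    """Keep only confident teacher detections as hard (box, label) pseudo-labels."""
    return [(d.box, d.label) for d in detections if d.score >= score_threshold]


def combined_loss(supervised_loss, unsupervised_loss, lam=1.0):
    """Student objective: labeled-data loss plus weighted pseudo-label loss."""
    return supervised_loss + lam * unsupervised_loss


teacher_out = [Detection((0, 0, 10, 10), 1, 0.95), Detection((5, 5, 20, 20), 2, 0.40)]
print(make_pseudo_labels(teacher_out))  # [((0, 0, 10, 10), 1)]
```

STAC also applies strong data augmentation to the unlabeled images before the student is trained on these pseudo-labels; that part is omitted here.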

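#Object Detection in Crowded Scenes#

[7].IterDet: Iterative Scheme for Object Detection in Crowded Environments

Object detectors usually produce a large number of candidate boxes, which are then filtered with NMS. In crowded scenes, however, NMS often removes correct detections of individuals standing very close to each other.

To address this, the paper proposes an iterative detection scheme: after one detection pass, the image is fed through the network again, and the earlier detections are kept so that they are not detected a second time. This iterative mechanism substantially improves detection in crowded scenes. The code is open source.

Authors | Danila Rukhovich, Konstantin Sofiiuk, Danil Galeev, Olga Barinova, Anton Konushin

Affiliations | Samsung

Paper | https://arxiv.org/abs/2005.05708v1

Code | https://github.com/saic-vul/iterdet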
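The failure mode described above is easy to reproduce with plain greedy NMS: two heavily overlapping people collapse into a single box. The sketch also shows the shape of an IterDet-style loop, where `detector(image, history)` is a hypothetical callable standing in for the network that receives previously found boxes as extra input; the 0.5 IoU threshold and the iteration cap are assumptions.

```python
def box_area(b):
    """Area of an (x1, y1, x2, y2) box; empty boxes give 0."""
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    inter = box_area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = box_area(a) + box_area(b) - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) < iou_threshold for j in keep):
            keep.append(i)
    return keep


def iterative_detect(detector, image, max_iters=2):
    """IterDet-style loop (sketch): each pass sees the boxes found so far.

    `detector(image, history)` is a hypothetical callable that returns only new boxes.
    """
    history = []
    for _ in range(max_iters):
        new_boxes = detector(image, history)
        if not new_boxes:
            break
        history.extend(new_boxes)
    return history


crowd = [(0, 0, 10, 10), (1, 0, 11, 10), (20, 20, 30, 30)]
print(nms(crowd, [0.9, 0.8, 0.7]))  # [0, 2]: the second, heavily overlapping box is suppressed
```

Because the history is passed back in, a later pass can recover a person that NMS would otherwise have merged into a neighbor.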

#Smoke Recognition#

[8].RISE Video Dataset: Recognizing Industrial Smoke Emissions

The RISE video dataset for recognizing industrial smoke emissions. Both the code and the dataset are open source.

Authors | Yen-Chia Hsu, Ting-Hao (Kenneth) Huang, Ting-Yao Hu, Paul Dille, Sean Prendi, Ryan Hoffman, Anastasia Tsuhlares, Randy Sargent, Illah Nourbakhsh

Affiliations | Pennsylvania State University; CMU

Paper | https://arxiv.org/abs/2005.06111

Code | https://github.com/CMU-CREATE-Lab/deep-smoke-machine

Face Technology

[9].High Resolution Face Age Editing

High-resolution face age editing

Face age editing: blossoms fall whatever we do, yet a familiar spring returns!

Authors | Xu Yao, Gilles Puy, Alasdair Newson, Yann Gousseau, Pierre Hellier

Affiliations | École Polytechnique; Valeo.ai

Paper | https://arxiv.org/abs/2005.04410

Code | https://github.com/InterDigitalInc/HRFAE

[10].DeepFaceLab: A simple, flexible and extensible face swapping framework

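DeepFaceLab, the face-swapping software popular around the world, has released a paper describing its technical principles. The project has nearly 14,000 stars on GitHub and has been recommended and used by many YouTubers; reportedly, 95% of deepfake videos are made with DeepFaceLab.

Although the project remains controversial, its developers hope that publishing the technical details will improve public understanding and use of face-swapping technology.

Notably, although the authors did not list their affiliations, their names suggest that most of the core developers are Chinese.

Authors | Ivan Petrov, Daiheng Gao, Nikolay Chervoniy, Kunlin Liu, Sugasa Marangonda, Chris Umé, Jian Jiang, Luis RP, Sheng Zhang, Pingyu Wu, Weiming Zhang

Paper | https://arxiv.org/abs/2005.05535

Code | https://github.com/iperov/DeepFaceLab/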

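#Micro-expression Recognition#

[11].ICE-GAN: Identity-aware and Capsule-Enhanced GAN for Micro-Expression Recognition and Synthesis

ICE-GAN: an identity-aware, capsule-enhanced GAN for micro-expression recognition and synthesis

Authors | Jianhui Yu, Chaoyi Zhang, Yang Song, Weidong Cai

Affiliations | The University of Sydney; University of New South Wales

Paper | https://arxiv.org/abs/2005.04370

Code | https://github.com/crane-papercode/ICE-GAN (coming soon)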

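Object Tracking

[12].TSDM: Tracking by SiamRPN++ with a Depth-refiner and a Mask-generator

Dalian University of Technology proposes a tracker that combines depth information (RGB-D) with SiamRPN++. Its high-accuracy version outperforms existing SOTA methods by a large margin while running at 23 fps, and its lightweight version reaches 31 fps, making it a practical tracking method.

Authors | Pengyao Zhao, Quanli Liu, Wei Wang, Qiang Guo

Affiliations | Dalian University of Technology

Paper | https://arxiv.org/abs/2005.04063

Code | https://github.com/lql-team/TSDM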

Gaze Estimation

[13].MLGaze: Machine Learning-Based Analysis of Gaze Error Patterns in Consumer Eye Tracking Systems

Machine-learning-based analysis of gaze error patterns in consumer eye-tracking systems

Authors | Anuradha Kar

Paper | https://arxiv.org/abs/2005.03795

Code | https://github.com/anuradhakar49/MLGaze

Dataset | https://data.mendeley.com/datasets/cfm4d9y7bh/1

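Unsupervised and Self-supervised Learning

[14].Learning to Segment Actions from Observation and Narration

Segments actions in video with the help of narration. The work focuses on unsupervised and weakly supervised methods, yet achieves accuracy comparable to supervised methods.

Authors | Daniel Fried, Jean-Baptiste Alayrac, Phil Blunsom, Chris Dyer, Stephen Clark, Aida Nematzadeh

Affiliations | DeepMind; UC Berkeley

Paper | https://arxiv.org/pdf/2005.03684.pdf

Code | https://github.com/dpfried/action-segmentation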

#Self-supervised Reinforcement Learning#

[15].Planning to Explore via Self-Supervised World Models

Authors | Ramanan Sekar, Oleh Rybkin, Kostas Daniilidis, Pieter Abbeel, Danijar Hafner, Deepak Pathak

Affiliations | University of Pennsylvania; UC Berkeley; Google; University of Toronto; Carnegie Mellon University; Facebook

Paper | https://arxiv.org/abs/2005.05960

Code | https://github.com/ramanans1/plan2explore

Video | https://youtu.be/GftqnPWsCWw

#CVPR 2020#

[16].On the uncertainty of self-supervised monocular depth estimation

Uncertainty in self-supervised monocular depth estimation

Affiliations | University of Bologna

Paper | https://arxiv.org/abs/2005.06209

Code | https://github.com/mattpoggi/mono-uncertainty

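Crowd Counting

[17].Adaptive Mixture Regression Network with Local Counting Map for Crowd Counting

The authors propose a new training target, the Local Counting Map, and a new network architecture, the Adaptive Mixture Regression Network, to count crowds more accurately.

Authors | Xiyang Liu, Jie Yang, Tieqiang Wang, Wenrui Ding

Affiliations | Beihang University; SF Express; Institute of Automation, Chinese Academy of Sciences

Paper | https://arxiv.org/abs/2005.05776v1

Code | https://github.com/xiyang1012/Local-Crowd-Counting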

[18].Ambient Sound Helps: Audiovisual Crowd Counting in Extreme Conditions

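Ambient Sound Helps: audiovisual crowd counting under extreme conditions

The authors collect a large-scale benchmark dataset named auDiovISual Crowd cOunting (DISCO), which contains 1,935 images with corresponding audio clips and 170,270 annotated instances.

Authors | Di Hu, Lichao Mou, Qingzhong Wang, Junyu Gao, Yuansheng Hua, Dejing Dou, Xiao Xiang Zhu

Affiliations | City University of Hong Kong; Baidu; Northwestern Polytechnical University; Technical University of Munich

Paper | https://arxiv.org/abs/2005.07097

Code | https://github.com/qingzwang/AudioVisualCrowdCounting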

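Video Retrieval

[19].Condensed Movies: Story Based Retrieval with Contextual Embeddings

The latest video retrieval paper from the VGG group. It builds a large-scale Condensed Movies dataset of key scenes, proposes a new story-based text-to-video retrieval task, and develops a baseline showing that contextual information effectively improves performance on this task.

Authors | Max Bain, Arsha Nagrani, Andrew Brown, Andrew Zisserman

Affiliations | University of Oxford

Paper | https://arxiv.org/pdf/2005.04208.pdf

Code | http://www.robots.ox.ac.uk/~vgg/research/condensed-movies (access denied)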

Video Captioning

#ACL 2020#

[20].Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA

Authors | Hyounghun Kim, Zineng Tang, Mohit Bansal

Affiliations | UNC Chapel Hill

Paper | https://arxiv.org/abs/2005.06409

Code | https://github.com/hyounghk/VideoQADenseCapFrameGate-ACL2020

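Video Recognition

[21].TAM: Temporal Adaptive Module for Video Recognition

The authors propose a Temporal Adaptive Module (TAM) that can be easily embedded into 2D CNNs at only a small extra computational cost. It outperforms other temporal modeling methods on Kinetics-400 and surpasses the previous SOTA on Something-Something by a large margin.

Authors | Zhaoyang Liu, Limin Wang, Wayne Wu, Chen Qian, Tong Lu

Affiliations | Nanjing University; SenseTime

Paper | https://arxiv.org/abs/2005.06803

Code | https://github.com/liu-zhy/TANet (to be released)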

Pedestrian Action Prediction

[22].Pedestrian Action Anticipation using Contextual Feature Fusion in Stacked RNNs

Pedestrian action anticipation using contextual feature fusion in stacked RNNs

Authors | Amir Rasouli, Iuliia Kotseruba, John K. Tsotsos

Affiliations | York University

Paper | https://arxiv.org/abs/2005.06582

Code | https://github.com/aras62/SF-GRU

Dataset | http://data.nvision2.eecs.yorku.ca/PIE_dataset/

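Image Completion

[23].Enhanced Residual Networks for Context-based Image Outpainting

The paper proposes a GAN built on enhanced residual networks for image outpainting, which extends an image beyond its borders with natural, plausible content.

Authors | Przemek Gardias, Eric Arthur, Huaming Sun

Affiliations | Worcester Polytechnic Institute

Paper | https://arxiv.org/abs/2005.06723

Code | https://github.com/etarthur/Outpainting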

Deep Video Interpolation

[24].W-Cell-Net: Multi-frame Interpolation of Cellular Microscopy Videos

W-Cell-Net: a multi-frame interpolation method for cellular microscopy videos

Authors | Rohit Saha, Abenezer Teklemariam, Ian Hsu, Alan M. Moses

Affiliations | University of Toronto

Paper | https://arxiv.org/abs/2005.06684

Code | https://github.com/RohitSaha/W-Cell-Net_cellular_video_interpolation

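Object Counting

[25].Introduction of a new Dataset and Method for Detecting and Counting the Pistachios based on Deep Learning

#How do you count pistachios?# Pistachios are an important food and a major agricultural export for Iran. Open-mouthed pistachios sell at higher prices, which creates demand from food producers for detecting and classifying them. Iranian researchers built a pistachio dataset of 423 annotated images containing 3,927 annotated pistachios, and propose detecting pistachios in video with RetinaNet and then classifying them to count them, reaching an overall counting accuracy of 94.75%.

Both the dataset and the code are open source; we hope they inspire other applications that need counting!

Authors | Mohammad Rahimzadeh, Abolfazl Attar

Affiliations | Iran University of Science and Technology; Sharif University of Technology

Paper | https://arxiv.org/abs/2005.03990

Code | https://github.com/mr7495/Pistachio-Counting

Dataset | https://github.com/mr7495/Pesteh-Set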

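Face Anti-spoofing

[26].Learning Generalized Spoof Cues for Face Anti-spoofing

A face anti-spoofing paper from Baidu. Instead of assuming particular types of spoof attacks, it treats anti-spoofing as an anomaly detection problem and proposes a residual learning framework that learns features discriminating live faces from spoofs. It outperforms previous SOTA methods.

Authors | Haocheng Feng, Zhibin Hong, Haixiao Yue, Yang Chen, Keyao Wang, Junyu Han, Jingtuo Liu, Errui Ding

Affiliations | Baidu; Beihang University

Paper | https://arxiv.org/abs/2005.03922

Code | https://github.com/vis-var/lgsc-for-fas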
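In the anomaly-detection framing above, the network regresses a "spoof cue" map that should stay close to zero for live faces and grow for attacks. A minimal sketch of the final decision is below, assuming the score is the mean absolute value of the cue map and using a hypothetical threshold of 0.1; the cue-generating network itself is not shown.

```python
def spoof_score(cue_map):
    """Mean absolute value of a cue map given as nested lists [H][W][C]."""
    values = [abs(v) for row in cue_map for pixel in row for v in pixel]
    return sum(values) / len(values)


def is_live(cue_map, threshold=0.1):
    """Live faces should produce near-zero cues; the threshold is a hypothetical value."""
    return spoof_score(cue_map) < threshold


print(is_live([[[0.0, 0.01, 0.0]]]))  # True
```

Treating anything with strong cues as a spoof, rather than learning one class per attack type, is what lets this framing generalize to attack types not seen in training.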

Human Action Recognition and Detection

[27].3DV: 3D Dynamic Voxel for Action Recognition in Depth Video

#CVPR 2020# A 3D dynamic voxel method for action recognition in depth video

Authors | Yancheng Wang, Yang Xiao, Fu Xiong, Wenxiang Jiang, Zhiguo Cao, Joey Tianyi Zhou, Junsong Yuan

Affiliations | Huazhong University of Science and Technology, Megvii, and others

Paper | https://arxiv.org/abs/2005.05501v1

Code | https://github.com/3huo/3DV-Action

Data Augmentation

[28].AutoCLINT: The Winning Method in AutoCV Challenge 2019

Paper and code for the winning solution of AutoCV Challenge 2019, featuring effective code optimization and automated data augmentation.

Authors | Woonhyuk Baek, Ildoo Kim, Sungwoong Kim, Sungbin Lim

Affiliations | Kakao Brain; UNIST

Paper | https://arxiv.org/abs/2005.04373

Code | https://github.com/kakaobrain/autoclint

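Neural Architecture Transfer

[29].Neural Architecture Transfer

Neural architecture search (NAS) is often used to design networks for specific targets, for example searching separate architectures for mobile devices, GPUs, and CPUs. When a model must be deployed on many devices, however, searching for each one in turn consumes a great deal of resources.

This paper proposes neural architecture transfer: a supernet is designed for a given task (such as classification), and subnets sampled from it can be used directly, without additional training.

On all 11 image classification benchmarks, covering both large-scale multi-class and small-scale fine-grained datasets, the method improves on all SOTA methods for mobile deployment (for example, its ImageNet model is more accurate than EfficientNet-B0 while requiring less computation). The gains are larger on small-scale fine-grained tasks, and the search time is an order of magnitude lower than that of NAS methods.

The method is especially suited to scenarios where several models must be designed at once for different hardware or objectives.

Authors | Zhichao Lu, Gautam Sreekumar, Erik Goodman, Wolfgang Banzhaf, Kalyanmoy Deb, Vishnu Naresh Boddeti

Affiliations | Michigan State University

Paper | https://arxiv.org/abs/2005.05859v1

Code | https://github.com/human-analysis/neural-architecture-transfer (404)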
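The "one supernet, many targets" idea can be sketched in two parts: sampling subnet configurations from a supernet search space, and keeping the candidates that are Pareto-optimal under two objectives such as error and FLOPs. The search space below is a hypothetical once-for-all-style example, and the objective values in the usage are made up; this is not NAT's actual search space or evolutionary algorithm.

```python
import random

# Hypothetical once-for-all-style search space: one choice per stage for each key.
SEARCH_SPACE = {
    "depth": [2, 3, 4],         # blocks per stage
    "expand_ratio": [3, 4, 6],  # inverted-bottleneck expansion ratio
    "kernel_size": [3, 5, 7],   # depthwise kernel size
}
NUM_STAGES = 5


def sample_subnet(rng):
    """Draw one subnet configuration from the supernet's search space."""
    return {key: [rng.choice(choices) for _ in range(NUM_STAGES)]
            for key, choices in SEARCH_SPACE.items()}


def dominates(a, b):
    """True if objective tuple a is no worse than b everywhere and better somewhere (minimization)."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def pareto_front(candidates):
    """candidates: list of (config, (error, flops)); keep the non-dominated ones."""
    return [c for c in candidates if not any(dominates(o[1], c[1]) for o in candidates)]


rng = random.Random(0)
print(sample_subnet(rng))
```

Picking one point per device from the same front (for example, the lowest-error subnet within each device's FLOPs or latency budget) is what makes designing for several targets at once cheap in this setting.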

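Neural Architecture Search

[30].Neural Architecture Search for Gliomas Segmentation on Multimodal Magnetic Resonance Imaging

Glioma segmentation on multimodal MRI using neural architecture search

Authors | Feifan Wang, Bharat Biswal

Affiliations | University of Electronic Science and Technology of China; New Jersey Institute of Technology

Paper | https://arxiv.org/abs/2005.06338

Code | https://github.com/woodywff/brats_2019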

Adversarial Learning

#CVPR 2020#

[31].Projection & Probability-Driven Black-Box Attack

Projection- and probability-driven black-box attack

Authors | Jie Li, Rongrong Ji, Hong Liu, Jianzhuang Liu, Bineng Zhong, Cheng Deng, Qi Tian

Affiliations | Huaqiao University; Xiamen University; Huawei Noah's Ark Lab; Xidian University

Paper | https://arxiv.org/abs/2005.03837

Code | https://github.com/theFool32/PPBA

[32].Adversarial examples are useful too

Authors | Ali Borji

Paper | https://arxiv.org/abs/2005.06107

Code | https://github.com/aliborji/Backdoor_defense.git

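Spectral Reconstruction

[33].Hierarchical Regression Network for Spectral Reconstruction from RGB Images

Hierarchical regression network for spectral reconstruction from RGB images

The paper proposes a 4-level hierarchical regression network (HRNet) that uses PixelShuffle layers for interaction between levels.

In the NTIRE 2020 challenge, it won Track 2 (Real World images) and ranked third in Track 1 (Clean images).

Authors | Yuzhi Zhao, Lai-Man Po, Qiong Yan, Wei Liu, Tingyu Lin

Affiliations | City University of Hong Kong; Harbin Institute of Technology; SenseTime

Paper | https://arxiv.org/abs/2005.04703

Code | https://github.com/zhaoyuzhi/Hierarchical-Regression-Network-for-Spectral-Reconstruction-from-RGB-Images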
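PixelShuffle rearranges a (C·r²) × H × W tensor into C × rH × rW (depth-to-space), and its inverse goes the other way, so levels at different resolutions can exchange features without discarding information. The pure-Python sketch below follows the channel ordering used by PyTorch's `torch.nn.PixelShuffle`; exactly how the network wires these layers between its levels is not reproduced here.

```python
def pixel_shuffle(x, r):
    """Depth-to-space: nested list [C*r*r][H][W] -> [C][H*r][W*r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if c_in % (r * r):
        raise ValueError("channel count must be divisible by r*r")
    out = [[[0] * (w * r) for _ in range(h * r)] for _ in range(c_in // (r * r))]
    for c in range(len(out)):
        for i in range(r):
            for j in range(r):
                src = x[c * r * r + i * r + j]  # sub-pixel offset (i, j) comes from this channel
                for y in range(h):
                    for xx in range(w):
                        out[c][y * r + i][xx * r + j] = src[y][xx]
    return out


def pixel_unshuffle(x, r):
    """Space-to-depth, the exact inverse: [C][H][W] -> [C*r*r][H/r][W/r]."""
    c_in, h, w = len(x), len(x[0]), len(x[0][0])
    if h % r or w % r:
        raise ValueError("height and width must be divisible by r")
    oh, ow = h // r, w // r
    out = [[[0] * ow for _ in range(oh)] for _ in range(c_in * r * r)]
    for c in range(c_in):
        for i in range(r):
            for j in range(r):
                dst = out[c * r * r + i * r + j]
                for y in range(oh):
                    for xx in range(ow):
                        dst[y][xx] = x[c][y * r + i][xx * r + j]
    return out


x = [[[1]], [[2]], [[3]], [[4]]]  # four 1x1 channels
print(pixel_shuffle(x, 2))  # [[[1, 2], [3, 4]]]
```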

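3D Pose Estimation

#CVPR 2020#

[34].Epipolar Transformers

Researchers from Carnegie Mellon University and Facebook propose a method that uses epipolar geometry to build 3D-aware features from 2D information, letting 3D pose estimation make better use of the scene's 3D structure. It achieves higher accuracy on InterHand and Human3.6M. The code is open source.

Authors | Yihui He, Rui Yan, Katerina Fragkiadaki, Shoou-I Yu

Affiliations | Facebook Reality Labs; Carnegie Mellon University

Paper | https://arxiv.org/abs/2005.04551

Code | https://github.com/yihui-he/epipolar-transformers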
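The geometric core is the epipolar constraint: for a point x in the reference view and the fundamental matrix F, its match in the source view lies on the line l' = Fx. The sketch below computes that line, samples points along it, and fuses features taken at those points with dot-product attention, which is a simplified reading of the epipolar transformer. Feature sampling at the points (e.g. bilinear interpolation) is omitted, and the feature vectors in the usage are made up.

```python
import math


def epipolar_line(F, point):
    """l' = F @ [u, v, 1]; returns (a, b, c) with a*u' + b*v' + c = 0 in the other view."""
    u, v = point
    p = (u, v, 1.0)
    return tuple(sum(F[r][k] * p[k] for k in range(3)) for r in range(3))


def sample_along_line(line, width, num_samples):
    """Evenly spaced points on the line for x in [0, width - 1]; assumes a non-vertical line."""
    a, b, c = line
    if abs(b) < 1e-12:
        raise ValueError("near-vertical epipolar line; parameterize by y instead")
    xs = [i * (width - 1) / (num_samples - 1) for i in range(num_samples)]
    return [(x, -(a * x + c) / b) for x in xs]


def attend(query, keys):
    """Softmax(dot-product) weighted sum of features sampled along the line."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    weights = [w / total for w in weights]
    return [sum(w * key[d] for w, key in zip(weights, keys)) for d in range(len(query))]


# Rectified stereo: F maps a point to the horizontal line v' = v
F = [[0, 0, 0], [0, 0, -1], [0, 1, 0]]
print(sample_along_line(epipolar_line(F, (5, 7)), width=11, num_samples=3))
```

For a rectified stereo pair every epipolar line is a horizontal image row, so the search for a match reduces to scanning that row.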

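Medical Image Processing

[35].iUNets: Fully invertible U-Nets with Learnable Up- and Downsampling

Researchers at the University of Cambridge propose iUNets, a fully invertible U-Net architecture. U-Nets are widely used for image-to-image tasks such as segmentation and inverse problems in imaging. For high-dimensional data such as 3D medical images, however, the original U-Net often requires a great deal of memory. The authors introduce learnable, invertible up- and downsampling operations and build a fully invertible U-Net, iUNet, which allows memory-efficient backpropagation. It gives better results on post-processing of CT images and on brain tumor segmentation. The PyTorch code is open source.

Authors | Christian Etmann, Rihuan Ke, Carola-Bibiane Schönlieb

Affiliations | University of Cambridge

Library | https://github.com/cetmann/iunets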
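The memory saving comes from invertibility: if every layer can be inverted exactly, activations can be recomputed from the output during backpropagation instead of being stored. The sketch below shows an invertible 2× downsampling built from a fixed orthonormal 2×2 Haar transform; iUNets learns the orthogonal kernel instead, so this only illustrates the principle and is not the library's layer.

```python
def haar_down(img):
    """[H][W] (H, W even) -> 4 channels of [H/2][W/2] via an orthonormal 2x2 Haar transform."""
    h, w = len(img), len(img[0])
    if h % 2 or w % 2:
        raise ValueError("height and width must be even")
    out = [[[0.0] * (w // 2) for _ in range(h // 2)] for _ in range(4)]
    for y in range(0, h, 2):
        for x in range(0, w, 2):
            a, b = img[y][x], img[y][x + 1]
            c, d = img[y + 1][x], img[y + 1][x + 1]
            out[0][y // 2][x // 2] = (a + b + c + d) / 2  # average-like band
            out[1][y // 2][x // 2] = (a - b + c - d) / 2  # horizontal detail
            out[2][y // 2][x // 2] = (a + b - c - d) / 2  # vertical detail
            out[3][y // 2][x // 2] = (a - b - c + d) / 2  # diagonal detail
    return out


def haar_up(coeffs):
    """Exact inverse of haar_down; the 4x4 Haar matrix is symmetric and orthogonal, so it is its own inverse."""
    s0, s1, s2, s3 = coeffs
    oh, ow = len(s0), len(s0[0])
    img = [[0.0] * (2 * ow) for _ in range(2 * oh)]
    for y in range(oh):
        for x in range(ow):
            c0, c1, c2, c3 = s0[y][x], s1[y][x], s2[y][x], s3[y][x]
            img[2 * y][2 * x] = (c0 + c1 + c2 + c3) / 2
            img[2 * y][2 * x + 1] = (c0 - c1 + c2 - c3) / 2
            img[2 * y + 1][2 * x] = (c0 + c1 - c2 - c3) / 2
            img[2 * y + 1][2 * x + 1] = (c0 - c1 - c2 + c3) / 2
    return img


print(haar_up(haar_down([[1, 2], [3, 4]])))  # [[1.0, 2.0], [3.0, 4.0]]
```

Because the transform is orthonormal it also preserves the sum of squares, so no information or energy is lost when the resolution halves and the channel count quadruples.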

Motion Transfer

[36].Unpaired Motion Style Transfer from Video to Animation

#SIGGRAPH 2020# Transfers real human motion to animated characters

Authors | Kfir Aberman, Yijia Weng, Dani Lischinski, Daniel Cohen-Or, Baoquan Chen

Affiliations | Tel Aviv University & Beijing Film Academy; Peking University; AICFVE; The Hebrew University of Jerusalem

Paper | https://arxiv.org/abs/2005.05751

Code | https://github.com/DeepMotionEditing/deep-motion-editing

Video | https://www.youtube.com/watch?v=m04zuBSdGrc

Binary Neural Networks

[37].Binarizing MobileNet via Evolution-based Searching

The method reaches 60.09% top-1 accuracy and outperforms the recent state-of-the-art CI-BCNN.

Authors | Hai Phan, Zechun Liu, Dang Huynh, Marios Savvides, Kwang-Ting Cheng, Zhiqiang Shen

Affiliations | Axon Enterprise; CMU; Hong Kong University of Science and Technology

Paper | https://arxiv.org/abs/2005.06305

Code | https://github.com/HaiPhan1991/BinMobileNet_Evo_Search

FPGA-Accelerated CNNs

[38].ZynqNet: An FPGA-Accelerated Embedded Convolutional Neural Network

An embedded CNN implemented on an FPGA

Authors | David Gschwend

Affiliations | ETH Zurich

Paper | https://arxiv.org/abs/2005.06892

Code | https://github.com/dgschwend/zynqnet

Local Feature Extraction and Image Matching

[39].The Information & Mutual Information Ratio for Counting Image Features and Their Matches

Authors | Ali Khajegili Mirabadi, Stefano Rini

Affiliations | National Chiao Tung University (Taiwan)

Paper | https://arxiv.org/abs/2005.06739

Code | https://github.com/AliKhajegiliM/IR-and-MIR (to be released)

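Reply "CVCode" in the chat window of the "I Love Computer Vision" (我愛計算機視覺) WeChat official account to get download links for all of the papers above. (Cloud drive folder: Code Weekly--Issue 20200502)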