Notes on "Utterance-level Aggregation For Speaker Recognition In The Wild"

Date: 2020-12-30


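Paper link: https://arxiv.org/abs/1902.10107v1
Open-source code: http://www.robots.ox.ac.uk/~vgg/research/speakerID/

Network structure

Input: a 257-dimensional vector per frame, made up of 256 frequency components plus 1 DC component
Backbone: Thin-ResNet, which extracts frame-level features
NetVLAD or GhostVLAD layer: converts the frame-level features into ...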
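
To illustrate where the 257-dimensional per-frame input can come from, here is a minimal NumPy sketch. It assumes a 512-point FFT, whose one-sided spectrum has 512 / 2 + 1 = 257 bins, with bin 0 being the DC component. The 16 kHz sample rate, the 25 ms Hamming window, the 10 ms hop, and the function name `frame_spectrogram` are assumptions chosen for illustration; they are not taken from the notes above, and this is not the authors' preprocessing code.

```python
import numpy as np

def frame_spectrogram(signal, sample_rate=16000, win_ms=25, hop_ms=10, n_fft=512):
    """Return a magnitude spectrogram of shape (num_frames, n_fft // 2 + 1).

    All default parameter values are illustrative assumptions.
    """
    win_len = int(sample_rate * win_ms / 1000)  # 400 samples at 16 kHz
    hop_len = int(sample_rate * hop_ms / 1000)  # 160 samples at 16 kHz
    window = np.hamming(win_len)

    # Slice the signal into overlapping, windowed frames.
    num_frames = 1 + (len(signal) - win_len) // hop_len
    frames = np.stack([
        signal[i * hop_len : i * hop_len + win_len] * window
        for i in range(num_frames)
    ])

    # rfft zero-pads each frame to n_fft samples and keeps only the
    # one-sided spectrum: bin 0 is DC, followed by 256 further bins.
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))

rng = np.random.default_rng(0)
audio = rng.standard_normal(16000)  # one second of synthetic noise
spec = frame_spectrogram(audio)
print(spec.shape)  # (98, 257)
```

Each row of `spec` is one frame, and each row has 257 values, which matches the per-frame input size described above.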
