[Deep Applications] Some Thoughts and Doubts about CenterNet: Comparing Speed and Accuracy with Ultralytics (U-version) YoloV3

You are welcome to follow my WeChat official account 《極簡 AI》, where I walk readers through deep learning.

It shares both deep learning theory and application development. I regularly post practical deep learning content there, and if you run into problems while studying or applying deep learning, you can reach me on it and I will answer whatever I can.

0. Preamble

I really like CenterNet's minimalist network structure. CenterNet performs object detection and classification purely with an FCN (fully convolutional network), without anchors, NMS, or other complex operations; it is efficient and its accuracy is not bad either. With simple modifications, the same structure can also be applied to human pose estimation and 3D object detection.

Later work has applied the CenterNet structure to other tasks with good results, for example face detection with CenterFace and object tracking with CenterTrack and FairMOT. I will add notes on these once I have studied them, and will probably write a survey-style comparison of CenterNet-like structures; interested readers can stay tuned.

This brings me to the reason for writing this post: while studying CenterNet, I came across its comparison with YoloV3, which claims to surpass YoloV3 in both speed and accuracy. I am somewhat skeptical of that conclusion.

**YoloV3 is known for being fast rather than extremely accurate, and it is widely used in real-world detection projects for real-time detection and recognition.** Compared with the two-stage Faster R-CNN it has a speed advantage, and compared with the one-stage SSD (Single Shot MultiBox Detector) and RetinaNet it has advantages in both speed and accuracy.

So I have some doubts about CenterNet's claimed speed improvement over YoloV3. YoloV3 is arguably the most commonly used and most practical object detection algorithm in industry today. If the conclusion in the CenterNet paper really holds, then CenterNet, which is also simple in structure and convenient to use (setting DCN aside for now, since full deployment support is only a matter of time), would surely take over YoloV3's position.

Given the above, I decided to run a comparison experiment: testing the accuracy and speed of CenterNet and YoloV3 under the same hardware and environment. To simplify the experiment, I only compare the speed of CenterNet and YoloV3 at the same input sizes; for accuracy I rely on the numbers reported by each project.
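Concretely, what I mean by "model inference time" is just the forward pass, measured after GPU warm-up and with CUDA synchronization so that asynchronous kernel launches do not distort the numbers. Below is only a minimal sketch of such a timing harness (not code from either repository; `model` and `inputs` are placeholders):

```python
import time
import torch

@torch.no_grad()
def avg_forward_time(model, inputs, warmup=3):
    """Average forward-pass time (in seconds) of `model` over `inputs`.

    `model`: a torch.nn.Module already on the GPU and in eval() mode.
    `inputs`: an iterable of preprocessed image tensors of shape (1, 3, H, W).
    """
    device = next(model.parameters()).device
    times = []
    for i, x in enumerate(inputs):
        x = x.to(device)
        if device.type == "cuda":
            torch.cuda.synchronize()      # flush pending GPU work before starting the clock
        t0 = time.perf_counter()
        model(x)
        if device.type == "cuda":
            torch.cuda.synchronize()      # wait until the forward pass has actually finished
        if i >= warmup:                   # the first few iterations include CUDA/cuDNN warm-up
            times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```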

1. Experimental Setup

So that readers can trust and easily reproduce my results, here are the hardware and software used in the experiments (a quick script to verify them follows the list):

  • System: Ubuntu 18.04.4 LTS
  • CPU: Intel® Core™ i5-9400F CPU @ 2.90GHz × 6
  • GPU: GeForce RTX 2060 SUPER/PCIe/SSE2
  • CUDA: 10.1
  • PyTorch: 1.5.0
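The relevant versions can be checked directly from PyTorch; a small sketch of the checks (expected values match the list above):

```python
import torch

print("PyTorch :", torch.__version__)                  # expected: 1.5.0
print("CUDA    :", torch.version.cuda)                 # expected: 10.1
print("GPU OK  :", torch.cuda.is_available())
if torch.cuda.is_available():
    print("Device  :", torch.cuda.get_device_name(0))  # expected: GeForce RTX 2060 SUPER
```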

Reference open-source implementations:

CenterNet:github.com/xingyizhou/…

YoloV3:github.com/ultralytics…

2. Experiments

1. Ultralytics (U-version) YoloV3

1. Longest side scaled to 320

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 320 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time ≈ 12 ms (a sketch for averaging this log follows the output below)

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 256x320 4 persons, 2 dogs, Done. (0.011s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 192x320 9 persons, 2 cars, 3 motorcycles, 1 trucks, Done. (0.013s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 256x320 3 persons, 2 boats, Done. (0.012s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 256x320 14 cars, Done. (0.011s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 256x320 11 persons, 1 cars, Done. (0.013s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 256x320 9 persons, 7 bicycles, 1 backpacks, Done. (0.011s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 256x320 4 persons, 1 cars, 1 buss, 1 trucks, Done. (0.011s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 256x320 3 cars, 2 stop signs, Done. (0.011s)
image 9/10 data/samples/bus.jpg: 320x256 4 persons, 1 buss, 1 handbags, Done. (0.011s)
image 10/10 data/samples/zidane.jpg: 192x320 3 persons, 3 ties, Done. (0.010s)
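The 12 ms figure is simply the mean of the per-image `Done. (0.011s)` values that detect.py prints. A small sketch for averaging such a log, assuming the output format shown above (other ultralytics versions may print differently; the log file name is hypothetical):

```python
import re

def mean_detect_time(log_text):
    """Average the per-image 'Done. (x.xxxs)' times printed by detect.py."""
    times = []
    for line in log_text.splitlines():
        if not line.startswith("image "):              # skip the final summary line
            continue
        m = re.search(r"Done\. \((\d+\.\d+)s\)", line)
        if m:
            times.append(float(m.group(1)))
    return sum(times) / len(times)

with open("yolov3_320.log") as f:                      # console output saved from the run above
    print("avg model time: %.1f ms" % (mean_detect_time(f.read()) * 1000))
```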

2. Longest side scaled to 512

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 512 --cfg cfg/yolov3-spp.cfg --source data/samples/

Output: average model inference time ≈ 20 ms

Using CUDA device0 _CudaDeviceProperties(name='GeForce RTX 2060 SUPER', total_memory=7979MB)
Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 384x512 4 persons, 2 dogs, Done. (0.018s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 320x512 9 persons, 4 cars, 2 motorcycles, 1 trucks, 1 benchs, 1 chairs, Done. (0.019s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 384x512 3 persons, 3 boats, 1 birds, Done. (0.018s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 384x512 4 persons, 18 cars, 2 traffic lights, Done. (0.019s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 384x512 13 persons, 1 trucks, Done. (0.020s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 384x512 16 persons, 6 bicycles, 2 backpacks, 1 bottles, 1 cell phones, Done. (0.020s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 384x512 3 persons, 1 cars, 1 buss, Done. (0.020s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 384x512 3 cars, 1 stop signs, Done. (0.020s)
image 9/10 data/samples/bus.jpg: 512x384 4 persons, 1 buss, 1 stop signs, 1 ties, 1 skateboards, Done. (0.018s)
image 10/10 data/samples/zidane.jpg: 320x512 3 persons, 2 ties, Done. (0.015s)

3. Longest side scaled to 800

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 800 --cfg cfg/yolov3-spp.cfg --source data/samples/

Output: average model inference time ≈ 40 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 544x800 7 persons, 2 dogs, 1 handbags, Done. (0.041s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 480x800 10 persons, 8 cars, 1 motorcycles, 2 trucks, 1 chairs, Done. (0.040s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 544x800 3 persons, 4 boats, Done. (0.045s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 544x800 6 persons, 27 cars, 1 buss, 1 trucks, 6 traffic lights, Done. (0.045s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 544x800 14 persons, 2 cars, 1 trucks, 1 ties, Done. (0.047s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 544x800 21 persons, 11 bicycles, 1 backpacks, 2 handbags, 4 bottles, 1 cell phones, Done. (0.038s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 608x800 6 persons, 1 cars, 1 buss, Done. (0.038s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 544x800 4 cars, 1 trucks, 2 stop signs, Done. (0.037s)
image 9/10 data/samples/bus.jpg: 800x608 4 persons, 1 bicycles, 1 buss, 1 ties, Done. (0.039s)
image 10/10 data/samples/zidane.jpg: 480x800 3 persons, 2 ties, Done. (0.029s)

4. Longest side scaled to 1024

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 1024 --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time ≈ 50 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 704x1024 5 persons, 1 dogs, 1 handbags, Done. (0.054s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 576x1024 13 persons, 5 cars, 1 motorcycles, 2 trucks, 1 umbrellas, 1 chairs, Done. (0.049s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 704x1024 3 persons, 6 boats, 4 birds, Done. (0.049s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 704x1024 10 persons, 24 cars, 1 buss, 2 trucks, 7 traffic lights, Done. (0.051s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 704x1024 14 persons, 1 cars, 1 trucks, 2 handbags, 2 ties, Done. (0.048s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 704x1024 25 persons, 8 bicycles, 1 backpacks, 1 handbags, 1 kites, 2 bottles, 1 cell phones, Done. (0.051s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 768x1024 5 persons, 1 cars, 1 buss, Done. (0.052s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 704x1024 4 cars, 1 trucks, 1 stop signs, Done. (0.048s)
image 9/10 data/samples/bus.jpg: 1024x768 3 persons, 1 bicycles, 1 buss, 1 cell phones, Done. (0.054s)
image 10/10 data/samples/zidane.jpg: 576x1024 1 persons, 3 ties, Done. (0.042s)
Results saved to /home/song/yolov3/output
Done. (0.943s) avg time (0.094s)

(Figure: YoloV3 detection results on the sample images)

2. CenterNet

1. Longest side scaled to 320

Run from ~/CenterNet/src$ (--input_h 256 --input_w 320 matches the 256x320 letterboxed shape YoloV3 used for most of the samples above):

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 256 --input_w 320

Result: average model inference time ≈ 16 ms (a sketch for extracting this from the log follows the output below)

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 256, 320])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.202s |load 0.005s |pre 0.003s |net 0.191s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.059s |load 0.031s |pre 0.009s |net 0.016s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.027s |load 0.010s |pre 0.002s |net 0.012s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.052s |load 0.022s |pre 0.008s |net 0.018s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.053s |load 0.021s |pre 0.009s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.057s |load 0.028s |pre 0.008s |net 0.017s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.036s |load 0.008s |pre 0.005s |net 0.019s |dec 0.001s |post 0.003s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.053s |load 0.027s |pre 0.006s |net 0.016s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.044s |load 0.017s |pre 0.005s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 256, 320])
tot 0.028s |load 0.009s |pre 0.003s |net 0.012s |dec 0.002s |post 0.002s |merge 0.000s 
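In the CenterNet demo.py output, the reported "model time" corresponds to the `net` column; the very first iteration (`net 0.191s`) is GPU warm-up and is presumably excluded from the ~16 ms average. A small sketch that extracts this average from such a log, assuming the `tot/load/pre/net/...` format shown above (the log file name is hypothetical):

```python
import re

def mean_net_time(log_text, skip_warmup=1):
    """Average the 'net' timings from CenterNet demo.py output, dropping warm-up iterations."""
    nets = [float(t) for t in re.findall(r"net (\d+\.\d+)s", log_text)]
    nets = nets[skip_warmup:]                          # first iteration includes CUDA warm-up
    return sum(nets) / len(nets)

with open("centernet_320.log") as f:                   # console output saved from the run above
    print("avg net time: %.1f ms" % (mean_net_time(f.read()) * 1000))
```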

2. Longest side scaled to 512

Run from ~/CenterNet/src$:

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 384 --input_w 512

Output: average model inference time ≈ 20 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 384, 512])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.206s |load 0.005s |pre 0.006s |net 0.193s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.041s |load 0.012s |pre 0.007s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.031s |load 0.005s |pre 0.005s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.060s |load 0.018s |pre 0.016s |net 0.022s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.046s |load 0.009s |pre 0.010s |net 0.023s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.055s |load 0.018s |pre 0.014s |net 0.018s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.059s |load 0.021s |pre 0.015s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.037s |load 0.008s |pre 0.007s |net 0.018s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.070s |load 0.038s |pre 0.009s |net 0.020s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 384, 512])
tot 0.062s |load 0.027s |pre 0.014s |net 0.019s |dec 0.001s |post 0.002s |merge 0.000s 
3. Longest side scaled to 800

Run from ~/CenterNet/src$:

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 544 --input_w 800

Result: average model inference time ≈ 41 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 544, 800])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.234s |load 0.005s |pre 0.015s |net 0.211s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.105s |load 0.031s |pre 0.027s |net 0.044s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.092s |load 0.023s |pre 0.023s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.073s |load 0.009s |pre 0.021s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.091s |load 0.021s |pre 0.026s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.085s |load 0.019s |pre 0.022s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.091s |load 0.021s |pre 0.026s |net 0.040s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.085s |load 0.017s |pre 0.022s |net 0.042s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.098s |load 0.033s |pre 0.020s |net 0.040s |dec 0.003s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 544, 800])
tot 0.074s |load 0.011s |pre 0.020s |net 0.040s |dec 0.001s |post 0.002s |merge 0.000s 

4. Longest side scaled to 1024

Run from ~/CenterNet/src$:

python demo.py ctdet --demo ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth --input_h 704 --input_w 1024

Result: average model inference time ≈ 53 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 704, 1024])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.260s |load 0.005s |pre 0.025s |net 0.227s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.100s |load 0.007s |pre 0.027s |net 0.063s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.112s |load 0.014s |pre 0.034s |net 0.060s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.117s |load 0.022s |pre 0.029s |net 0.062s |dec 0.002s |post 0.002s |merge 0.000s
torch.Size([1, 3, 704, 1024])
tot 0.119s |load 0.021s |pre 0.033s |net 0.061s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.090s |load 0.007s |pre 0.018s |net 0.061s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.108s |load 0.020s |pre 0.033s |net 0.051s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.078s |load 0.007s |pre 0.018s |net 0.051s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.116s |load 0.036s |pre 0.025s |net 0.050s |dec 0.002s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 704, 1024])
tot 0.082s |load 0.006s |pre 0.020s |net 0.051s |dec 0.003s |post 0.002s |merge 0.000s 

3. Supplementary Experiments

After digging further into both codebases, I noticed a problem with the experiment above: it only compares model inference time. That does reflect how efficient the models themselves are, but in real applications pre- and post-processing also take a noticeable amount of time, so I added an end-to-end comparison at sizes 640 and 1280 to show the speed difference in practical use.
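For the end-to-end numbers, YoloV3's detect.py prints an `avg time` itself, while for CenterNet I rely on the `tot`/`load`/`pre`/`net`/`post` breakdown in the demo.py output (again ignoring the warm-up iteration). To make clear what "end-to-end" means here, below is a minimal sketch of the whole per-image pipeline being timed; `load_image`, `preprocess`, `model`, and `postprocess` are placeholders for the corresponding steps of whichever repository is benchmarked:

```python
import time

def avg_end_to_end_time(image_paths, load_image, preprocess, model, postprocess):
    """Average wall-clock time per image for the full detection pipeline."""
    times = []
    for path in image_paths:
        t0 = time.perf_counter()
        img = load_image(path)         # disk -> numpy array
        x = preprocess(img)            # resize / letterbox / normalize -> tensor on GPU
        y = model(x)                   # forward pass
        dets = postprocess(y)          # decode + NMS (YoloV3) or heatmap decoding (CenterNet)
        times.append(time.perf_counter() - t0)
    return sum(times) / len(times)
```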

1. Ultralytics YoloV3

1. Longest side scaled to 640

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 640  --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time ≈ 26 ms, average end-to-end time ≈ 64 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 448x640 5 persons, 2 dogs, Done. (0.028s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 384x640 12 persons, 6 cars, 3 motorcycles, 1 trucks, 1 chairs, Done. (0.026s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 448x640 3 persons, 4 boats, Done. (0.025s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 448x640 8 persons, 22 cars, 1 buss, 1 trucks, 3 traffic lights, 1 clocks, Done. (0.028s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 448x640 14 persons, 1 cars, 1 trucks, Done. (0.026s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 448x640 19 persons, 6 bicycles, 1 backpacks, 2 bottles, 1 cell phones, Done. (0.028s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 512x640 5 persons, 1 cars, 1 buss, Done. (0.025s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 448x640 5 cars, 1 trucks, 2 stop signs, Done. (0.024s)
image 9/10 data/samples/bus.jpg: 640x512 4 persons, 1 buss, Done. (0.025s)
image 10/10 data/samples/zidane.jpg: 384x640 3 persons, 2 ties, Done. (0.021s)
Results saved to /home/song/yolov3/output
Done. (0.638s) avg time (0.064s)

2. Longest side scaled to 1280

Run from ~/yolov3$:

python detect.py --weights weights/yolov3-spp-ultralytics.pt --img-size 1280  --cfg cfg/yolov3-spp.cfg --source data/samples/

Result: average model inference time ≈ 78 ms, average end-to-end time ≈ 124 ms

Model Summary: 225 layers, 6.29987e+07 parameters, 6.29987e+07 gradients
image 1/10 data/samples/16004479832_a748d55f21_k.jpg: 896x1280 4 persons, 3 dogs, 1 backpacks, Done. (0.085s)
image 2/10 data/samples/17790319373_bd19b24cfc_k.jpg: 704x1280 16 persons, 10 cars, 1 motorcycles, 1 buss, 1 trucks, 1 umbrellas, 1 chairs, Done. (0.061s)
image 3/10 data/samples/18124840932_e42b3e377c_k.jpg: 896x1280 5 persons, 3 boats, 3 birds, Done. (0.075s)
image 4/10 data/samples/19064748793_bb942deea1_k.jpg: 896x1280 11 persons, 27 cars, 1 buss, 5 trucks, 7 traffic lights, 1 remotes, Done. (0.075s)
image 5/10 data/samples/24274813513_0cfd2ce6d0_k.jpg: 896x1280 14 persons, 2 cars, 1 trucks, 5 handbags, 7 ties, Done. (0.074s)
image 6/10 data/samples/33823288584_1d21cf0a26_k.jpg: 832x1280 27 persons, 10 bicycles, 2 backpacks, 3 handbags, 1 kites, 2 bottles, 1 cell phones, Done. (0.070s)
image 7/10 data/samples/33887522274_eebd074106_k.jpg: 960x1280 5 persons, 1 cars, 1 buss, Done. (0.079s)
image 8/10 data/samples/34501842524_3c858b3080_k.jpg: 896x1280 5 cars, 1 trucks, 1 stop signs, 1 benchs, Done. (0.075s)
image 9/10 data/samples/bus.jpg: 1280x960 4 persons, 1 bicycles, 1 ties, 1 cups, Done. (0.078s)
image 10/10 data/samples/zidane.jpg: 768x1280 2 ties, Done. (0.062s)
Results saved to /home/song/yolov3/output
Done. (1.236s) avg time (0.124s)

2. CenterNet

1. Longest side scaled to 640

Run from ~/CenterNet/src$:

python demo.py ctdet --demo  ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth  --input_h 448 --input_w 640

Result: average model inference time ≈ 27 ms, average end-to-end time ≈ 50 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 448, 640])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.223s |load 0.005s |pre 0.010s |net 0.206s |dec 0.001s |post 0.002s |merge 0.000s 
torch.Size([1, 3, 448, 640])
tot 0.079s |load 0.026s |pre 0.021s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.074s |load 0.023s |pre 0.018s |net 0.029s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.044s |load 0.005s |pre 0.007s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.043s |load 0.005s |pre 0.006s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.044s |load 0.006s |pre 0.006s |net 0.029s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.042s |load 0.004s |pre 0.007s |net 0.028s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.047s |load 0.006s |pre 0.007s |net 0.032s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.043s |load 0.009s |pre 0.008s |net 0.024s |dec 0.001s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 448, 640])
tot 0.040s |load 0.006s |pre 0.007s |net 0.024s |dec 0.001s |post 0.002s |merge 0.000s |

2. Longest side scaled to 1280

Run from ~/CenterNet/src$:

python demo.py ctdet --demo  ../data/samples --load_model ../models/ctdet_coco_dla_2x.pth  --input_h 896 --input_w 1280

Result: average model inference time ≈ 78 ms, average end-to-end time ≈ 115 ms

loaded ../models/ctdet_coco_dla_2x.pth, epoch 230
torch.Size([1, 3, 896, 1280])
/opt/conda/conda-bld/pytorch_1587428398394/work/aten/src/ATen/native/BinaryOps.cpp:81: UserWarning: Integer division of tensors using div or / is deprecated, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
tot 0.302s |load 0.005s |pre 0.039s |net 0.254s |dec 0.003s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.137s |load 0.007s |pre 0.041s |net 0.085s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.005s |pre 0.027s |net 0.077s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.160s |load 0.023s |pre 0.057s |net 0.076s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.004s |pre 0.025s |net 0.080s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.113s |load 0.006s |pre 0.026s |net 0.076s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.111s |load 0.004s |pre 0.025s |net 0.077s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.126s |load 0.014s |pre 0.029s |net 0.079s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.125s |load 0.010s |pre 0.031s |net 0.080s |dec 0.002s |post 0.002s |merge 0.000s |
torch.Size([1, 3, 896, 1280])
tot 0.114s |load 0.006s |pre 0.026s |net 0.078s |dec 0.002s |post 0.002s |merge 0.000s |

4. Summary of Results

CenterNet vs YoloV3 speed: model inference time

| Model \ Size | 320 | 512 | 800 | 1024 |
| --- | --- | --- | --- | --- |
| YoloV3-spp-ultralytics | 12 ms | 20 ms | 40 ms | 50 ms |
| CenterNet-DLA-34 | 16 ms | 20 ms | 41 ms | 53 ms |

CenterNet vs YoloV3 speed: model inference / end-to-end time

| Model \ Size | 640 model | 640 end-to-end | 1280 model | 1280 end-to-end |
| --- | --- | --- | --- | --- |
| YoloV3-spp-ultralytics | 26 ms | 64 ms | 77 ms | 124 ms |
| CenterNet-DLA-34 | 27 ms | 50 ms | 78 ms | 115 ms |

CenterNet vs YoloV3: model size / GPU memory usage

| Model \ Resource | Model size | GPU memory at size 1280 |
| --- | --- | --- |
| YoloV3-spp-ultralytics | 252.3 MB (252,297,867 bytes) | 1.7 GB |
| CenterNet-DLA-34 | 80.9 MB (80,911,783 bytes) | 1.2 GB |
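The model sizes above are simply the sizes of the weight files on disk, and the GPU memory figures are what I observed while running at 1280 (for example via nvidia-smi). If you prefer to check memory from inside PyTorch, something like the following sketch works; note that the allocator statistics PyTorch reports are not identical to what nvidia-smi shows, since the CUDA context adds overhead:

```python
import os
import torch

# weight file from the YoloV3 command above
print("weight file size: %.1f MB" % (os.path.getsize("weights/yolov3-spp-ultralytics.pt") / 1e6))

# run this after performing inference at the 1280 setting on a CUDA device
if torch.cuda.is_available():
    print("max allocated: %.2f GB" % (torch.cuda.max_memory_allocated() / 1e9))
    print("max reserved : %.2f GB" % (torch.cuda.max_memory_reserved() / 1e9))
```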

The speed and resource numbers above are from my own measurements; you are welcome to reproduce them and challenge the results.

 
CenterNet vs YoloV3 COCO accuracy

| Model \ Size | 512 |
| --- | --- |
| YoloV3 | 32.7 mAP |
| YoloV3-spp | 35.6 mAP |
| YoloV3-spp-ultralytics | 42.6 mAP |
| CenterNet-DLA-34 | 37.4 mAP |

Accuracy sources:

1.github.com/ultralytics…

2.github.com/xingyizhou/…

Conclusions:

Regarding my earlier doubt ("I have some doubts about CenterNet's claimed speed improvement over YoloV3"), the experimental results partly confirm that the skepticism was justified.

Looking purely at model inference time, CenterNet-DLA-34 is slightly slower than YoloV3-spp at every scale tested (by roughly 0-4 ms: 16 vs 12 ms at 320, 20 vs 20 ms at 512, 41 vs 40 ms at 800, 53 vs 50 ms at 1024), which does not quite match the paper. However, once pre- and post-processing are included, CenterNet-DLA-34 clearly takes less end-to-end time than YoloV3-spp: about 22% less at 640 (50 vs 64 ms) and about 7% less at 1280 (115 vs 124 ms).

In terms of model size and memory footprint, CenterNet-DLA-34 improves noticeably on YoloV3-spp: the model file shrinks to roughly one third of YoloV3-spp's size (80.9 MB vs 252.3 MB, about 32%), and GPU memory during inference drops to about 70% (1.2 GB vs 1.7 GB). I consider this an advantage brought by the anchor-free design.

From the "CenterNet vs YoloV3 COCO accuracy" table, at the same input size CenterNet improves clearly on the original YoloV3 (by about 5 mAP points) and gains about 2 points over YoloV3-spp, but it is still about 5 points behind YoloV3-spp-ultralytics (the U-version YoloV3-spp). This of course assumes the published numbers are accurate; I am inclined to trust them, but I cannot vouch for them. In short: CenterNet is a clear improvement over the original YoloV3, but the gain over the improved YoloV3-spp is small, and it is still below the U-version YoloV3-spp.

To sum up, CenterNet is a groundbreaking piece of work: it unifies the pipelines for keypoint estimation and object detection, its structure is simple, and it is convenient to use. I like this network a lot and have applied it in real scenarios, where its end-to-end speed and accuracy improve on YoloV3 and even YoloV3-spp; the main drawback is that deployment is somewhat harder (mostly because inference frameworks do not yet support DCN well, although workarounds exist).

With its **simple structure, ease of use, good speed and accuracy, and low memory footprint**, CenterNet can replace YoloV3 and has real advantages. YoloV4 has also been released, but in my view, while YoloV4 improves accuracy, it also increases overall complexity and inference time somewhat, so completely replacing YoloV3 with YoloV4 is not realistic. (If you are interested in a YoloV4 vs YoloV3 comparison, say so in the comments; if enough readers are interested I will write a follow-up post.)

Finally, the conclusion of this article: CenterNet has clear advantages over YoloV3, and I recommend trying it as a replacement.
