[Translation] How to Perform Object Detection With YOLOv3 in Keras

Date: 2019-11-06

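Original article: How to Perform Object Detection With YOLOv3 in Keras

Original author: Jason Brownlee

Translation source: Juejin Translation Project (掘金翻譯計劃)

Permalink for this article: github.com/xitu/gold-m…

Translator: Daltan

Proofreaders: lsvih, zhmhhu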

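Object detection is a computer vision task that involves identifying the presence, location, and type of one or more objects in a given image.

It is a challenging problem that builds on methods for object recognition (where are they), object localization (what is their extent), and object classification (what are they).

In recent years, deep learning techniques have achieved state-of-the-art results for object detection, for example on standard benchmark datasets and in computer vision competitions. Notable is the "You Only Look Once," or YOLO, family of convolutional neural networks, which achieves near state-of-the-art results with a single end-to-end model that can perform object detection in real time.

In this tutorial, you will discover how to develop a YOLOv3 model for object detection on new photographs.

After completing this tutorial, you will know:

The YOLO family of convolutional neural network models for object detection, and its most recent variation, YOLOv3.
The best-of-breed open source implementation of YOLOv3 for the Keras deep learning library.
How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.

Let's get started.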

Photo by David Berkowitz, some rights reserved.

Tutorial Overview

This tutorial is divided into three parts; they are:

YOLO for Object Detection
Experiencor YOLO3 for Keras Project
Object Detection With YOLOv3

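YOLO for Object Detection

Object detection is a computer vision task that involves both localizing one or more objects within an image and classifying each object in the image.

It is a challenging task that requires both successful object localization, to locate each object and draw a bounding box around it, and object classification, to predict the correct class of the object that was localized.

The "You Only Look Once," or YOLO, family of models is a series of end-to-end deep learning models designed for fast object detection. It was developed by Joseph Redmon et al. and first described in the 2015 paper "You Only Look Once: Unified, Real-Time Object Detection."

The approach involves a single deep convolutional neural network (originally a version of GoogLeNet, later updated and called DarkNet, based on VGG) that splits the input into a grid of cells, where each cell directly predicts a bounding box and an object classification. The result is a large number of candidate bounding boxes that are consolidated into a final prediction by a post-processing step.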
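
To make the grid idea concrete, the following minimal sketch maps a point in the image to the grid cell that contains it. The function name cell_for_point is hypothetical, and the 416-pixel input with a 13×13 grid is an assumption chosen to match the output shapes printed later in this tutorial; it is not code from the YOLO papers or from the keras-yolo3 project.

```python
# Illustrative only: find the grid cell that contains a given pixel.
def cell_for_point(x, y, image_w, image_h, grid_w, grid_h):
    """Return (row, col) of the grid cell containing pixel (x, y)."""
    # clamp so that points on the far edge still fall in the last cell
    col = min(int(x * grid_w / image_w), grid_w - 1)
    row = min(int(y * grid_h / image_h), grid_h - 1)
    return row, col

# a 416x416 input split into a 13x13 grid gives 32x32-pixel cells
row, col = cell_for_point(200, 100, 416, 416, 13, 13)
print(row, col)  # 3 6
```

In this picture, the cell at row 3, column 6 would be the one expected to predict a box for an object centred at pixel (200, 100).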

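At the time of writing there are three main variations of the approach: YOLOv1, YOLOv2, and YOLOv3. The first version proposed the general architecture, the second refined the design and used predefined anchor boxes to improve bounding box proposals, and the third further refined the model architecture and training process.

Although the accuracy of the models is close to but not as good as that of region-based convolutional neural networks (R-CNNs), they are popular for object detection because of their detection speed, often demonstrated in real time on video or camera feed input.

A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance.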

— You Only Look Once: Unified, Real-Time Object Detection, 2015.

In this tutorial, we will focus on using YOLOv3.

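Experiencor YOLO3 for Keras Project

Source code for each version of YOLO is available for download, as are pre-trained models.

The official DarkNet GitHub repository contains the source code for the YOLO versions mentioned in the papers, written in C. The repository also provides a step-by-step tutorial on how to use the code for object detection.

It is a challenging model to implement from scratch, especially for beginners, as it requires the development of many customized model elements for training and for prediction. For example, even using a pre-trained model directly requires sophisticated code to extract and interpret the predicted bounding boxes output by the model.

Instead of developing this code from scratch, we can use a third-party implementation. There are many third-party implementations designed for using YOLO with Keras, but none appears to be standardized and designed to be used as a library.

The YAD2K project was a de facto standard for YOLOv2. It provided scripts to convert the pre-trained weights into Keras format, to use the pre-trained model to make predictions, and to extract and interpret the predicted bounding boxes. Many other third-party developers have used this code as a starting point and updated it to support YOLOv3.

Perhaps the most widely used project for working with pre-trained YOLO models is "keras-yolo3: Training and Detecting Objects with YOLO3" by Huynh Ngoc Anh, also known as Experiencor. The code in the project has been made available under the MIT open source license. Like YAD2K, it provides scripts to load and use pre-trained YOLO models, as well as to develop transfer learning models based on YOLOv3 on new datasets.

Experiencor also has a keras-yolo2 project that provides similar code for YOLOv2, along with detailed tutorials on how to use the code in the repository. The keras-yolo3 project appears to be an updated version of that project.

Interestingly, Experiencor has used the model as the basis for some experiments and trained versions of YOLOv3 on standard object detection problems such as a kangaroo dataset, a raccoon dataset, red blood cell detection, and others. He has listed model performance, provided the model weights for download, and published YouTube videos of model behavior. For example:

Raccoon Detection using YOLO 3

We will use Experiencor's keras-yolo3 project as the basis for performing object detection with a YOLOv3 model in this tutorial.

A fork of the code as it was at the time of writing is available, in case the repository changes or is removed (which can happen with third-party open source projects).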

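Object Detection With YOLOv3

The keras-yolo3 project provides a lot of capability for using YOLOv3 models, including object detection, transfer learning, and training new models from scratch.

In this section, we will use a pre-trained model to perform object detection on an unseen photograph. This capability is available in a single Python file in the repository called yolo3_one_file_to_detect_them_all.py, which has 435 lines. The script is, in fact, a program that uses pre-trained weights to prepare a model, uses that model to perform object detection, and outputs a model. It also depends on OpenCV.

Instead of using this program directly, we will reuse elements from it to develop our own scripts: first to prepare and save a Keras YOLOv3 model, and then to load the model and make a prediction for a new photograph.

Create and Save Model

The first step is to download the pre-trained model weights.

These were trained using the DarkNet code base on the MSCOCO dataset. Download the model weights and place them in your current working directory with the filename yolov3.weights. It is a large file and may take a moment to download, depending on the speed of your internet connection.

YOLOv3 Pre-trained Model Weights (yolov3.weights) (237 MB)

Next, we need to define a Keras model that has the right number and type of layers to match the downloaded model weights. The model architecture is called DarkNet and was originally loosely based on the VGG-16 model.

The yolo3_one_file_to_detect_them_all.py script provides the make_yolov3_model() function to create the model for us, and the helper function _conv_block() that is used to create blocks of layers. Both functions can be copied directly from the script.

We can now define the Keras model for YOLOv3.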

# define the model
model = make_yolov3_model()

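Next, we need to load the model weights. The weights are stored in whatever format DarkNet uses. Rather than trying to decode the file manually, we can use the WeightReader class provided in the script.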
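
For readers curious about what the WeightReader does first, here is a small sketch of the header parsing, run against synthetic bytes rather than the real yolov3.weights file. The helper name read_darknet_header and the fake payload are hypothetical. The field layout follows the WeightReader code listed in this tutorial; the remark that the skipped field counts images seen during training is my understanding of the DarkNet format, not something stated in the source.

```python
import io
import struct

def read_darknet_header(stream):
    """Read the version header in the same order as WeightReader.__init__."""
    major, = struct.unpack('i', stream.read(4))
    minor, = struct.unpack('i', stream.read(4))
    revision, = struct.unpack('i', stream.read(4))
    # assumption: the skipped field is an "images seen" counter,
    # stored in 8 bytes by newer files and in 4 bytes by older ones
    if (major * 10 + minor) >= 2 and major < 1000 and minor < 1000:
        stream.read(8)
    else:
        stream.read(4)
    return major, minor, revision

# synthetic stand-in for a weights file: header, counter, two float32 values
fake = struct.pack('iii', 0, 2, 0) + struct.pack('q', 123) + struct.pack('2f', 0.5, -1.0)
stream = io.BytesIO(fake)
version = read_darknet_header(stream)
weights = struct.unpack('2f', stream.read())
print(version, weights)  # (0, 2, 0) (0.5, -1.0)
```

Everything after the header is treated as one flat run of float32 values, which is why the class can later hand out weights simply by advancing an offset.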

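To use the WeightReader, instantiate it with the path to the weights file (e.g. yolov3.weights). The code below parses the file and loads the model weights into memory in a format that we can set into our Keras model.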

# load the model weights
weight_reader = WeightReader('yolov3.weights')

We can then call the load_weights() function of the WeightReader instance, passing in our defined Keras model, to set the weights into the layers.

# set the model weights into the model
weight_reader.load_weights(model)

That's it; we now have a YOLOv3 model ready for use.
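
As a rough illustration of what load_weights() consumes, the sketch below counts the float32 values read for a single convolutional layer. It follows the read order in the WeightReader listing in this tutorial: four batch-norm vectors or one bias vector, then the kernel. The helper floats_for_conv is hypothetical and exists only for this example.

```python
def floats_for_conv(in_channels, filters, kernel, bnorm=True):
    """Count the float32 values read for one convolutional layer."""
    kernel_weights = kernel * kernel * in_channels * filters
    # with batch norm: beta, gamma, mean, variance (one value per filter each);
    # without it: a single bias value per filter
    extra = 4 * filters if bnorm else filters
    return extra + kernel_weights

# layer 0: 3-channel input, 32 filters of 3x3, batch-normalised
print(floats_for_conv(3, 32, 3))  # 992
# layer 81: 1024-channel input, 255 filters of 1x1, plain bias
print(floats_for_conv(1024, 255, 1, bnorm=False))  # 261375
```

Because the values are read strictly in sequence, the Keras layers must be defined in the same order and with the same shapes as in the weight file, or every later layer would receive the wrong numbers.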

We can save this model to a Keras-compatible .h5 model file, ready for later use.

# save the model to file
model.save('model.h5')

We can tie all of this together. The complete code example, including the functions copied directly from yolo3_one_file_to_detect_them_all.py, is listed below.

# create a YOLOv3 Keras model and save it to file
# based on https://github.com/experiencor/keras-yolo3
import struct
import numpy as np
from keras.layers import Conv2D
from keras.layers import Input
from keras.layers import BatchNormalization
from keras.layers import LeakyReLU
from keras.layers import ZeroPadding2D
from keras.layers import UpSampling2D
from keras.layers import add, concatenate
from keras.models import Model

def _conv_block(inp, convs, skip=True):
	x = inp
	count = 0
	for conv in convs:
		if count == (len(convs) - 2) and skip:
			skip_connection = x
		count += 1
		if conv['stride'] > 1: x = ZeroPadding2D(((1,0),(1,0)))(x) # peculiar padding as darknet prefer left and top
		x = Conv2D(conv['filter'],
				   conv['kernel'],
				   strides=conv['stride'],
				   padding='valid' if conv['stride'] > 1 else 'same', # peculiar padding as darknet prefer left and top
				   name='conv_' + str(conv['layer_idx']),
				   use_bias=False if conv['bnorm'] else True)(x)
		if conv['bnorm']: x = BatchNormalization(epsilon=0.001, name='bnorm_' + str(conv['layer_idx']))(x)
		if conv['leaky']: x = LeakyReLU(alpha=0.1, name='leaky_' + str(conv['layer_idx']))(x)
	return add([skip_connection, x]) if skip else x

def make_yolov3_model():
	input_image = Input(shape=(None, None, 3))
	# Layer 0 => 4
	x = _conv_block(input_image, [{'filter': 32, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 0},
								  {'filter': 64, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 1},
								  {'filter': 32, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 2},
								  {'filter': 64, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 3}])
	# Layer 5 => 8
	x = _conv_block(x, [{'filter': 128, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 5},
						{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 6},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 7}])
	# Layer 9 => 11
	x = _conv_block(x, [{'filter':  64, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 9},
						{'filter': 128, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 10}])
	# Layer 12 => 15
	x = _conv_block(x, [{'filter': 256, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 12},
						{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 13},
						{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 14}])
	# Layer 16 => 36
	for i in range(7):
		x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 16+i*3},
							{'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 17+i*3}])
	skip_36 = x
	# Layer 37 => 40
	x = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 37},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 38},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 39}])
	# Layer 41 => 61
	for i in range(7):
		x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 41+i*3},
							{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 42+i*3}])
	skip_61 = x
	# Layer 62 => 65
	x = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 2, 'bnorm': True, 'leaky': True, 'layer_idx': 62},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 63},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 64}])
	# Layer 66 => 74
	for i in range(3):
		x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 66+i*3},
							{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 67+i*3}])
	# Layer 75 => 79
	x = _conv_block(x, [{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 75},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 76},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 77},
						{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 78},
						{'filter':  512, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 79}], skip=False)
	# Layer 80 => 82
	yolo_82 = _conv_block(x, [{'filter': 1024, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 80},
							  {'filter':  255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 81}], skip=False)
	# Layer 83 => 86
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 84}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_61])
	# Layer 87 => 91
	x = _conv_block(x, [{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 87},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 88},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 89},
						{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 90},
						{'filter': 256, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True, 'layer_idx': 91}], skip=False)
	# Layer 92 => 94
	yolo_94 = _conv_block(x, [{'filter': 512, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 92},
							  {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 93}], skip=False)
	# Layer 95 => 98
	x = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True, 'leaky': True,   'layer_idx': 96}], skip=False)
	x = UpSampling2D(2)(x)
	x = concatenate([x, skip_36])
	# Layer 99 => 106
	yolo_106 = _conv_block(x, [{'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 99},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 100},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 101},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 102},
							   {'filter': 128, 'kernel': 1, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 103},
							   {'filter': 256, 'kernel': 3, 'stride': 1, 'bnorm': True,  'leaky': True,  'layer_idx': 104},
							   {'filter': 255, 'kernel': 1, 'stride': 1, 'bnorm': False, 'leaky': False, 'layer_idx': 105}], skip=False)
	model = Model(input_image, [yolo_82, yolo_94, yolo_106])
	return model

class WeightReader:
	def __init__(self, weight_file):
		with open(weight_file, 'rb') as w_f:
			major,	= struct.unpack('i', w_f.read(4))
			minor,	= struct.unpack('i', w_f.read(4))
			revision, = struct.unpack('i', w_f.read(4))
			if (major*10 + minor) >= 2 and major < 1000 and minor < 1000:
				w_f.read(8)
			else:
				w_f.read(4)
			transpose = (major > 1000) or (minor > 1000)
			binary = w_f.read()
		self.offset = 0
		self.all_weights = np.frombuffer(binary, dtype='float32')

	def read_bytes(self, size):
		self.offset = self.offset + size
		return self.all_weights[self.offset-size:self.offset]

	def load_weights(self, model):
		for i in range(106):
			try:
				conv_layer = model.get_layer('conv_' + str(i))
				print("loading weights of convolution #" + str(i))
				if i not in [81, 93, 105]:
					norm_layer = model.get_layer('bnorm_' + str(i))
					size = np.prod(norm_layer.get_weights()[0].shape)
					beta  = self.read_bytes(size) # bias
					gamma = self.read_bytes(size) # scale
					mean  = self.read_bytes(size) # mean
					var   = self.read_bytes(size) # variance
					weights = norm_layer.set_weights([gamma, beta, mean, var])
				if len(conv_layer.get_weights()) > 1:
					bias   = self.read_bytes(np.prod(conv_layer.get_weights()[1].shape))
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel, bias])
				else:
					kernel = self.read_bytes(np.prod(conv_layer.get_weights()[0].shape))
					kernel = kernel.reshape(list(reversed(conv_layer.get_weights()[0].shape)))
					kernel = kernel.transpose([2,3,1,0])
					conv_layer.set_weights([kernel])
			except ValueError:
				print("no convolution #" + str(i))

	def reset(self):
		self.offset = 0

# define the model
model = make_yolov3_model()
# load the model weights
weight_reader = WeightReader('yolov3.weights')
# set the model weights into the model
weight_reader.load_weights(model)
# save the model to file
model.save('model.h5')

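Running the example may take a little less than one minute on modern hardware.

As the weight file is loaded, you will see debug information reported by the WeightReader class about what was loaded.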

...
loading weights of convolution #99
loading weights of convolution #100
loading weights of convolution #101
loading weights of convolution #102
loading weights of convolution #103
loading weights of convolution #104
loading weights of convolution #105

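At the end of the run, the model.h5 file is saved in your current working directory. It is approximately the same size as the original weight file (237MB), but it can be loaded and used directly as a Keras model.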

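Make a Prediction

We need a new photograph for object detection, ideally one containing objects that we know the model can recognize from the MSCOCO dataset.

We will use a photograph of three zebras taken by Boegh while travelling and released under a permissive license.

Photograph of Three Zebras (zebra.jpg)
Taken by Boegh, some rights reserved.

Download the photograph and place it in your current working directory with the filename zebra.jpg.

Making a prediction is straightforward, although interpreting the prediction requires some work.

The first step is to load the Keras model. This may be the slowest part of making a prediction.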

# load yolov3 model
model = load_model('model.h5')

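Next, we need to load our new photograph and prepare it as suitable input to the model. The model expects inputs to be color images with a square shape of 416×416 pixels.

We can use the Keras load_img() function to load the image, with the target_size argument to resize the image after loading. We can then use the img_to_array() function to convert the loaded PIL image object into a NumPy array, and rescale the pixel values from the range 0-255 to 32-bit floating point values in the range 0-1.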

# load the image with the required size
image = load_img('zebra.jpg', target_size=(416, 416))
# convert to numpy array
image = img_to_array(image)
# scale pixel values to [0, 1]
image = image.astype('float32')
image /= 255.0

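We will want to show the original photo again later, which means we will need to scale the bounding boxes of all detected objects from the square shape back to the original shape. To do so, we can load the image once more and retrieve its original shape.

# load the image to get its shape
image = load_img('zebra.jpg')
width, height = image.size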

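We can tie all of this together into a convenience function named load_image_pixels(). It takes the filename and the target size, and returns the scaled pixel data ready to be provided as input to the Keras model, together with the original width and height of the image.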

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

We can then call this function to load our photo of zebras.

# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))

We can now feed the photo into the Keras model and make a prediction.

# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

That's it, at least for making a prediction. The complete example is listed below.

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array

# load and prepare an image
def load_image_pixels(filename, shape):
    # load the image to get its shape
    image = load_img(filename)
    width, height = image.size
    # load the image with the required size
    image = load_img(filename, target_size=shape)
    # convert to numpy array
    image = img_to_array(image)
    # scale pixel values to [0, 1]
    image = image.astype('float32')
    image /= 255.0
    # add a dimension so that we have one sample
    image = expand_dims(image, 0)
    return image, width, height

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])

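Running the example returns a list of three NumPy arrays, the shapes of which are displayed as output.

These arrays predict both the bounding boxes and the class labels, but they are encoded and must be interpreted.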

[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]

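Make a Prediction and Interpret Result

The output of the model is, in fact, encoded candidate bounding boxes from three different grid sizes. The boxes are defined in the context of anchor boxes, which were carefully chosen based on an analysis of the sizes of objects in the MSCOCO dataset.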
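
The shapes printed above can be reproduced with a little arithmetic, sketched below. The three boxes per cell and the five box values plus class scores come from decode_netout() in the listing in this tutorial; the strides of 32, 16, and 8 are an assumption read off the model definition (five stride-2 convolutions, followed by two 2x upsamplings).

```python
input_size = 416
strides = [32, 16, 8]      # assumed downsampling factor at each output scale
anchors_per_cell = 3       # nb_box in decode_netout()
num_classes = 80           # length of the MSCOCO label list

# each box carries x, y, w, h, an objectness score, and one score per class
channels = anchors_per_cell * (4 + 1 + num_classes)
grids = [input_size // s for s in strides]
total_boxes = sum(g * g * anchors_per_cell for g in grids)
print(grids, channels, total_boxes)  # [13, 26, 52] 255 10647
```

Under these assumptions the model proposes 10,647 candidate boxes for every image, which is why the filtering steps that follow matter.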

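The script provided by Experiencor includes a function called decode_netout() that takes the NumPy arrays one at a time and decodes the candidate bounding boxes and class predictions. Any bounding boxes that do not confidently describe an object (e.g. all class probabilities are below a threshold) are ignored. We will use a probability threshold of 60%, or 0.6. The function returns a list of BoundBox instances that define the corners of each bounding box in the context of the input image shape, along with the class probabilities.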
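
To show what the decoding amounts to for a single prediction, here is a minimal sketch that mirrors the arithmetic of decode_netout() for one grid cell and one anchor. The function decode_cell and its argument names are hypothetical; it uses plain Python instead of NumPy and leaves out the objectness and class scores.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def decode_cell(raw, row, col, grid_w, grid_h, anchor_w, anchor_h, net_w, net_h):
    """Turn raw (tx, ty, tw, th) into box corners as fractions of the image."""
    tx, ty, tw, th = raw
    x = (col + sigmoid(tx)) / grid_w       # box centre, offset within its cell
    y = (row + sigmoid(ty)) / grid_h
    w = anchor_w * math.exp(tw) / net_w    # anchor size scaled by exp(raw)
    h = anchor_h * math.exp(th) / net_h
    return x - w / 2, y - h / 2, x + w / 2, y + h / 2

# zero raw outputs: centred in cell (6, 6) of a 13x13 grid, anchor size unchanged
box = decode_cell((0, 0, 0, 0), 6, 6, 13, 13, 116, 90, 416, 416)
print([round(v, 3) for v in box])  # [0.361, 0.392, 0.639, 0.608]
```

The network therefore predicts small adjustments to an anchor rather than a box from scratch, which is the role the anchors list plays in the code that follows.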

# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)

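Next, the bounding boxes can be stretched back into the shape of the original image. This is helpful because it means that later we can plot the original image and draw the bounding boxes on it, hopefully around real objects.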
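
As a rough illustration of the rescaling, the sketch below converts one box from fractions of the image into pixel coordinates of the original photograph. The helper scale_box is hypothetical; it corresponds to the simple case in which the photo was stretched to fill the whole network input, as load_image_pixels() does, so no offsets are needed.

```python
def scale_box(box, image_w, image_h):
    """Convert (xmin, ymin, xmax, ymax) fractions into pixel coordinates."""
    xmin, ymin, xmax, ymax = box
    return (int(xmin * image_w), int(ymin * image_h),
            int(xmax * image_w), int(ymax * image_h))

# a box covering the lower-middle of a 640x480 photograph
print(scale_box((0.25, 0.5, 0.75, 1.0), 640, 480))  # (160, 240, 480, 480)
```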

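The Experiencor script provides the correct_yolo_boxes() function to perform this translation of bounding box coordinates. It takes the list of bounding boxes, the original shape of our loaded photograph, and the shape of the input to the network as arguments. The coordinates of the bounding boxes are updated in place: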

# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)

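The model has predicted a lot of candidate bounding boxes, and most of them refer to the same objects. The list of bounding boxes can be filtered so that boxes that overlap and refer to the same object are merged. We can define the amount of overlap as a configuration parameter, in this case 50%, or 0.5. This filtering of bounding box regions is generally referred to as non-maximal suppression and is a required post-processing step.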
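
To make "overlap" concrete, the sketch below computes the intersection over union (IoU) of two boxes and applies a simple greedy suppression. It is a single-class simplification written for this explanation, with hypothetical names iou and nms and boxes given as plain tuples; it is not the script's own routine.

```python
def iou(a, b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, thresh):
    """Return indices of the boxes that are kept, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    for i in order:
        # keep a box only if it does not overlap too much with a better one
        if all(iou(boxes[i], boxes[k]) < thresh for k in keep):
            keep.append(i)
    return keep

boxes = [(0, 0, 10, 10), (1, 1, 11, 11), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, 0.5))  # [0, 2]
```

The first two boxes overlap with an IoU of about 0.68, so the lower-scoring one is dropped, while the third box is far away and survives.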

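The script provides this via the do_nms() function, which takes the list of bounding boxes and a threshold parameter. Rather than purging the overlapping boxes, it clears their predicted probability for the overlapping class. This allows the boxes to remain and be used if they also detect another object type.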

# suppress non-maximal boxes
do_nms(boxes, 0.5)

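This leaves us with the same number of boxes, but only very few of interest. We can retrieve just those boxes that strongly predict the presence of an object, that is, with more than 60% confidence. This can be achieved by enumerating all boxes and checking the class prediction values. We can then look up the corresponding class label for the box and add it to a list. Each box must be checked against each class label, in case the same box strongly predicts more than one object.

We can develop a get_boxes() function that does this. It takes the list of boxes, the known labels, and our classification threshold as arguments, and returns parallel lists of boxes, labels, and scores.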

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

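We can call this function with our list of boxes.

We also need a list of strings containing the class labels known to the model, in the same order used during training, specifically the class labels from the MSCOCO dataset. Thankfully, this is also provided in the Experiencor script.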

# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
    "boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
    "bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
    "backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
    "sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
    "tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
    "apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
    "chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
    "book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)

Now that we have those few boxes of strongly predicted objects, we can summarize them.

# summarize what we found
for i in range(len(v_boxes)):
    print(v_labels[i], v_scores[i])

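We can also plot our original photograph and draw a bounding box around each detected object. This can be achieved by retrieving the coordinates from each bounding box and creating a Rectangle object.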

box = v_boxes[i]
# get coordinates
y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
# calculate width and height of the box
width, height = x2 - x1, y2 - y1
# create the shape
rect = Rectangle((x1, y1), width, height, fill=False, color='white')
# draw the box
ax.add_patch(rect)

We can also draw a string with the class label and confidence.

# draw text and score in top left corner
label = "%s (%.3f)" % (v_labels[i], v_scores[i])
pyplot.text(x1, y1, label, color='white')

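The draw_boxes() function below implements this. It takes the filename of the original photograph and the parallel lists of bounding boxes, labels, and scores, and creates a plot showing all detected objects.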

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

We can then call this function to plot our final result.

# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

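We now have all of the elements required to make a prediction using the YOLOv3 model, interpret the results, and plot them for review.

The full code listing, including the original and modified functions taken from the Experiencor script, is given below.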

# load yolov3 model and perform object detection
# based on https://github.com/experiencor/keras-yolo3
import numpy as np
from numpy import expand_dims
from keras.models import load_model
from keras.preprocessing.image import load_img
from keras.preprocessing.image import img_to_array
from matplotlib import pyplot
from matplotlib.patches import Rectangle

class BoundBox:
	def __init__(self, xmin, ymin, xmax, ymax, objness = None, classes = None):
		self.xmin = xmin
		self.ymin = ymin
		self.xmax = xmax
		self.ymax = ymax
		self.objness = objness
		self.classes = classes
		self.label = -1
		self.score = -1

	def get_label(self):
		if self.label == -1:
			self.label = np.argmax(self.classes)

		return self.label

	def get_score(self):
		if self.score == -1:
			self.score = self.classes[self.get_label()]

		return self.score

def _sigmoid(x):
	return 1. / (1. + np.exp(-x))

def decode_netout(netout, anchors, obj_thresh, net_h, net_w):
	grid_h, grid_w = netout.shape[:2]
	nb_box = 3
	netout = netout.reshape((grid_h, grid_w, nb_box, -1))
	nb_class = netout.shape[-1] - 5
	boxes = []
	netout[..., :2]  = _sigmoid(netout[..., :2])
	netout[..., 4:]  = _sigmoid(netout[..., 4:])
	netout[..., 5:]  = netout[..., 4][..., np.newaxis] * netout[..., 5:]
	netout[..., 5:] *= netout[..., 5:] > obj_thresh

	for i in range(grid_h*grid_w):
		row = i // grid_w # integer division so the row index is a whole number
		col = i % grid_w
		for b in range(nb_box):
			# 4th element is objectness score
			objectness = netout[int(row)][int(col)][b][4]
			if objectness <= obj_thresh: continue
			# first 4 elements are x, y, w, and h
			x, y, w, h = netout[int(row)][int(col)][b][:4]
			x = (col + x) / grid_w # center position, unit: image width
			y = (row + y) / grid_h # center position, unit: image height
			w = anchors[2 * b + 0] * np.exp(w) / net_w # unit: image width
			h = anchors[2 * b + 1] * np.exp(h) / net_h # unit: image height
			# last elements are class probabilities
			classes = netout[int(row)][col][b][5:]
			box = BoundBox(x-w/2, y-h/2, x+w/2, y+h/2, objectness, classes)
			boxes.append(box)
	return boxes

def correct_yolo_boxes(boxes, image_h, image_w, net_h, net_w):
	new_w, new_h = net_w, net_h
	for i in range(len(boxes)):
		x_offset, x_scale = (net_w - new_w)/2./net_w, float(new_w)/net_w
		y_offset, y_scale = (net_h - new_h)/2./net_h, float(new_h)/net_h
		boxes[i].xmin = int((boxes[i].xmin - x_offset) / x_scale * image_w)
		boxes[i].xmax = int((boxes[i].xmax - x_offset) / x_scale * image_w)
		boxes[i].ymin = int((boxes[i].ymin - y_offset) / y_scale * image_h)
		boxes[i].ymax = int((boxes[i].ymax - y_offset) / y_scale * image_h)

def _interval_overlap(interval_a, interval_b):
	x1, x2 = interval_a
	x3, x4 = interval_b
	if x3 < x1:
		if x4 < x1:
			return 0
		else:
			return min(x2,x4) - x1
	else:
		if x2 < x3:
			 return 0
		else:
			return min(x2,x4) - x3

def bbox_iou(box1, box2):
	intersect_w = _interval_overlap([box1.xmin, box1.xmax], [box2.xmin, box2.xmax])
	intersect_h = _interval_overlap([box1.ymin, box1.ymax], [box2.ymin, box2.ymax])
	intersect = intersect_w * intersect_h
	w1, h1 = box1.xmax-box1.xmin, box1.ymax-box1.ymin
	w2, h2 = box2.xmax-box2.xmin, box2.ymax-box2.ymin
	union = w1*h1 + w2*h2 - intersect
	return float(intersect) / union

def do_nms(boxes, nms_thresh):
	if len(boxes) > 0:
		nb_class = len(boxes[0].classes)
	else:
		return
	for c in range(nb_class):
		sorted_indices = np.argsort([-box.classes[c] for box in boxes])
		for i in range(len(sorted_indices)):
			index_i = sorted_indices[i]
			if boxes[index_i].classes[c] == 0: continue
			for j in range(i+1, len(sorted_indices)):
				index_j = sorted_indices[j]
				if bbox_iou(boxes[index_i], boxes[index_j]) >= nms_thresh:
					boxes[index_j].classes[c] = 0

# load and prepare an image
def load_image_pixels(filename, shape):
	# load the image to get its shape
	image = load_img(filename)
	width, height = image.size
	# load the image with the required size
	image = load_img(filename, target_size=shape)
	# convert to numpy array
	image = img_to_array(image)
	# scale pixel values to [0, 1]
	image = image.astype('float32')
	image /= 255.0
	# add a dimension so that we have one sample
	image = expand_dims(image, 0)
	return image, width, height

# get all of the results above a threshold
def get_boxes(boxes, labels, thresh):
	v_boxes, v_labels, v_scores = list(), list(), list()
	# enumerate all boxes
	for box in boxes:
		# enumerate all possible labels
		for i in range(len(labels)):
			# check if the threshold for this label is high enough
			if box.classes[i] > thresh:
				v_boxes.append(box)
				v_labels.append(labels[i])
				v_scores.append(box.classes[i]*100)
				# don't break, many labels may trigger for one box
	return v_boxes, v_labels, v_scores

# draw all results
def draw_boxes(filename, v_boxes, v_labels, v_scores):
	# load the image
	data = pyplot.imread(filename)
	# plot the image
	pyplot.imshow(data)
	# get the context for drawing boxes
	ax = pyplot.gca()
	# plot each box
	for i in range(len(v_boxes)):
		box = v_boxes[i]
		# get coordinates
		y1, x1, y2, x2 = box.ymin, box.xmin, box.ymax, box.xmax
		# calculate width and height of the box
		width, height = x2 - x1, y2 - y1
		# create the shape
		rect = Rectangle((x1, y1), width, height, fill=False, color='white')
		# draw the box
		ax.add_patch(rect)
		# draw text and score in top left corner
		label = "%s (%.3f)" % (v_labels[i], v_scores[i])
		pyplot.text(x1, y1, label, color='white')
	# show the plot
	pyplot.show()

# load yolov3 model
model = load_model('model.h5')
# define the expected input shape for the model
input_w, input_h = 416, 416
# define our new photo
photo_filename = 'zebra.jpg'
# load and prepare image
image, image_w, image_h = load_image_pixels(photo_filename, (input_w, input_h))
# make prediction
yhat = model.predict(image)
# summarize the shape of the list of arrays
print([a.shape for a in yhat])
# define the anchors
anchors = [[116,90, 156,198, 373,326], [30,61, 62,45, 59,119], [10,13, 16,30, 33,23]]
# define the probability threshold for detected objects
class_threshold = 0.6
boxes = list()
for i in range(len(yhat)):
	# decode the output of the network
	boxes += decode_netout(yhat[i][0], anchors[i], class_threshold, input_h, input_w)
# correct the sizes of the bounding boxes for the shape of the image
correct_yolo_boxes(boxes, image_h, image_w, input_h, input_w)
# suppress non-maximal boxes
do_nms(boxes, 0.5)
# define the labels
labels = ["person", "bicycle", "car", "motorbike", "aeroplane", "bus", "train", "truck",
	"boat", "traffic light", "fire hydrant", "stop sign", "parking meter", "bench",
	"bird", "cat", "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe",
	"backpack", "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard",
	"sports ball", "kite", "baseball bat", "baseball glove", "skateboard", "surfboard",
	"tennis racket", "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana",
	"apple", "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake",
	"chair", "sofa", "pottedplant", "bed", "diningtable", "toilet", "tvmonitor", "laptop", "mouse",
	"remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink", "refrigerator",
	"book", "clock", "vase", "scissors", "teddy bear", "hair drier", "toothbrush"]
# get the details of the detected objects
v_boxes, v_labels, v_scores = get_boxes(boxes, labels, class_threshold)
# summarize what we found
for i in range(len(v_boxes)):
	print(v_labels[i], v_scores[i])
# draw what we found
draw_boxes(photo_filename, v_boxes, v_labels, v_scores)

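Running the example again prints the shape of the raw output from the model.

This is followed by a summary of the objects detected by the model and their confidence. We can see that the model has detected three zebras, each with a confidence above 90%.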

[(1, 13, 13, 255), (1, 26, 26, 255), (1, 52, 52, 255)]
zebra 94.91060376167297
zebra 99.86329674720764
zebra 96.8708872795105

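A plot of the photograph is created with the three bounding boxes drawn on it. We can see that the model has indeed successfully detected the three zebras in the photograph.

Photograph of Three Zebra Each Detected with the YOLOv3 Model and Localized with Bounding Boxes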

Further Reading

This section provides more resources on the topic if you are looking to go deeper.

Papers

You Only Look Once: Unified, Real-Time Object Detection, 2015.
YOLO9000: Better, Faster, Stronger, 2016.
YOLOv3: An Incremental Improvement, 2018.

API

matplotlib.patches.Rectangle API

Resources

Other YOLO for Keras Projects

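Summary

In this tutorial, you discovered how to develop a YOLOv3 model for object detection on new photographs.

Specifically, you learned:

The YOLO family of convolutional neural network models for object detection, and its most recent variation, YOLOv3.
The best-of-breed open source implementation of YOLOv3 for the Keras deep learning library.
How to use a pre-trained YOLOv3 to perform object localization and detection on new photographs.

Do you have any questions? Ask them in the comments and I will do my best to answer.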

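If you find errors in this translation or other places that need improvement, you are welcome to revise it and submit a PR to the Juejin Translation Project, which also earns reward points. The permalink at the top of this article is the link to its Markdown source on GitHub.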
