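1. Introduction

Official website: http://cocodataset.org/
Full name: Microsoft Common Objects in Context (MS COCO)
Supported tasks: Detection, Keypoints, Stuff, Panoptic, Captions
Notes: The COCO dataset currently has three releases: 2014, 2015, and 2017. The 2015 release contains only a test set; the other two each include training, validation, and test sets.
(This post is based on the official website plus my own understanding and descriptions.)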

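2. Downloading the Dataset

Method 1: Download directly from the official website (a proxy or VPN may be needed where the site is blocked).
~~Method 2: I have uploaded the official dataset to Baidu Cloud, where it can be downloaded freely (no proxy needed).~~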

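3. Dataset Description

The COCO dataset consists of two main parts: Images and Annotations.
Images: folders named as "split + year" (for example, train2014) that contain the xxx.jpg image files;
Annotations: a folder that contains text files in xxx.json format (for example, instances_train2014.json).
Working with COCO comes down to reading these xxx.json files, so the rest of this section describes how the annotation files are organized and how to use them.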

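3.1 Common Fields

COCO has five annotation types, one for each of its five tasks: object detection, keypoint detection, stuff segmentation, panoptic segmentation, and image captioning. Annotations are stored as JSON files. The content of each xxx.json file is a single dictionary whose keys are "info", "images", "annotations", and "licenses", as shown below: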

{
    "info"        : info,
    "images"      : [image],
    "annotations" : [annotation],
    "licenses"    : [license],
}

Each value has the corresponding data type: info is a dictionary, while images, annotations, and licenses are lists. Apart from annotation, each part is defined as follows:

info{
    "year"         : int,       # dataset year
    "version"      : str,       # dataset version
    "description"  : str,       # dataset description
    "contributor"  : str,       # contributors
    "url"          : str,       # official dataset URL
    "date_created" : datetime,  # date and time the dataset was created
}

image{
    "id"            : int,       # image id
    "width"         : int,       # image width
    "height"        : int,       # image height
    "file_name"     : str,       # image file name
    "license"       : int,       # license id
    "flickr_url"    : str,       # Flickr link
    "coco_url"      : str,       # COCO link
    "date_captured" : datetime,  # capture time
}

license{
    "id"   : int,  # license number, 1-8
    "name" : str,  # license name
    "url"  : str,  # license URL
}

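The value under the "annotations" key differs slightly from one xxx.json file to another, but its meaning is always the same: a description of images and instances. In addition to the annotations, there is a "categories" key that describes the classes. The annotation and categories structures for each task are described below.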
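As a rough illustration of the layout above, the sketch below builds a tiny dictionary with the same four top-level keys, round-trips it through JSON, and indexes the images by id. All values (file names, ids, license text) are made up for the example; a real annotation file would be read from disk with json.load.

```python
import json

# A made-up, minimal annotation file following the top-level layout above.
tiny = {
    "info": {"year": 2014, "version": "1.0", "description": "toy example"},
    "images": [
        {"id": 1, "width": 640, "height": 480, "file_name": "a.jpg", "license": 1},
        {"id": 2, "width": 500, "height": 375, "file_name": "b.jpg", "license": 1},
    ],
    "annotations": [
        {"id": 10, "image_id": 1},
        {"id": 11, "image_id": 1},
        {"id": 12, "image_id": 2},
    ],
    "licenses": [{"id": 1, "name": "toy license", "url": ""}],
}

# Round-trip through JSON text, as if the file had been read from disk.
data = json.loads(json.dumps(tiny))

# Index images by id so annotations can be matched via annotation["image_id"].
images_by_id = {img["id"]: img for img in data["images"]}

top_level_keys = sorted(data.keys())
first_ann_file = images_by_id[data["annotations"][0]["image_id"]]["file_name"]
print(top_level_keys, first_ann_file)
```

The id-to-record dictionary is the same idea the official API uses internally to make lookups fast.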

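3.2 Task-Specific Fields

3.2.1 Object Detection

Taking the detection task as an example, each annotation describes a single object instance rather than a whole image, and an image usually contains one or more objects. Each instance annotation carries a set of fields, including the object's category id and its segmentation mask. The mask format depends on whether the annotation covers a single object or a group of objects: a single object (iscrowd=0) is stored as polygons, while a group of objects (iscrowd=1) is stored as RLE.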

annotation{
    "id"           : int,                 # annotation id; one annotation per object
    "image_id"     : int,                 # id of the image that contains this object
    "category_id"  : int,                 # category id; one category per object
    "segmentation" : RLE or [polygon],
    "area"         : float,               # area
    "bbox"         : [x,y,width,height],  # x,y is the top-left corner
    "iscrowd"      : 0 or 1,              # 0: segmentation is polygons; 1: segmentation is RLE
}

categories[{
    "id"            : int,  # category id
    "name"          : str,  # category name
    "supercategory" : str,  # parent category, e.g. the supercategory of bicycle is vehicle
}]
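Because bbox is stored as [x, y, width, height] rather than as two corners, a small conversion helper is often handy. The following is a minimal sketch with a made-up annotation; the helper names are hypothetical and not part of the COCO API.

```python
def bbox_to_corners(bbox):
    """Convert a COCO [x, y, width, height] box to (x1, y1, x2, y2)."""
    x, y, w, h = bbox
    return (x, y, x + w, y + h)


def bbox_area(bbox):
    """Area of the box itself; the annotation's "area" field can differ from it."""
    _, _, w, h = bbox
    return w * h


# Made-up annotation in the format above.
ann = {"id": 1, "image_id": 42, "category_id": 18,
       "bbox": [10.0, 20.0, 30.0, 40.0], "area": 900.0, "iscrowd": 0}

corners = bbox_to_corners(ann["bbox"])
box_area = bbox_area(ann["bbox"])
print(corners, box_area)
```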

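3.2.2 Keypoint Detection

As in the detection task, an image contains a number of objects and each object has one keypoint annotation. A keypoint annotation contains all the data of the object annotation (including id, bbox, and so on) plus two additional fields.
First, the value under "keypoints" is an array of length 3k, where k is the total number of keypoints defined for the category (for example, k is 17 for human pose). Each keypoint has a 0-indexed location x, y and a visibility flag v: v=0 means not labeled (in which case x=y=0), v=1 means labeled but not visible (typically because it is occluded), and v=2 means labeled and visible. A keypoint is considered visible if it falls inside the object segment. Second, "num_keypoints" gives the number of labeled keypoints (v>0) for the object.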

annotation{
    "keypoints"     : [x1,y1,v1,...],
    "num_keypoints" : int,             # number of keypoints with v=1 or v=2, i.e. labeled keypoints
    "[cloned]"      : ...,
}

categories[{
    "keypoints" : [str],   # list of k keypoint names
    "skeleton"  : [edge],  # keypoint connectivity, given as a list of keypoint edge pairs; used for visualization
    "[cloned]"  : ...,
}]

Here, [cloned] denotes the fields copied from the Object Detection annotation defined above, since the keypoint JSON file also contains the fields required by the detection task.
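To make the flat 3k layout concrete, here is a small sketch that splits a "keypoints" array into (x, y, v) triplets and recomputes num_keypoints. The helper names and the three-keypoint example are made up for illustration.

```python
def split_keypoints(flat):
    """Turn [x1, y1, v1, x2, y2, v2, ...] into a list of (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoints length must be a multiple of 3")
    return [tuple(flat[i:i + 3]) for i in range(0, len(flat), 3)]


def count_labeled(flat):
    """Count keypoints with v > 0, which is what num_keypoints records."""
    return sum(1 for _, _, v in split_keypoints(flat) if v > 0)


# Made-up example with k = 3 keypoints: visible, occluded, unlabeled.
keypoints = [100, 50, 2, 120, 60, 1, 0, 0, 0]
triplets = split_keypoints(keypoints)
num_keypoints = count_labeled(keypoints)
visible = [(x, y) for x, y, v in triplets if v == 2]
print(triplets, num_keypoints, visible)
```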

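3.2.3 Stuff Segmentation

The stuff annotation format is identical to, and fully compatible with, the Object Detection format above (except that iscrowd is unnecessary and defaults to 0). The main field for this task is "segmentation".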

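3.2.4 Panoptic Segmentation

For the panoptic task, each annotation is a per-image annotation rather than a per-object annotation, which differs from the three tasks above. Each per-image annotation has two parts: 1) a PNG that stores the class-agnostic image segmentation, and 2) a JSON structure that stores the semantic information for each image segment.

To match an annotation with an image, use the image_id field (that is, annotation.image_id==image.id);
For each annotation, the per-pixel segment ids are stored as a single PNG, located in a folder with the same name as the JSON file. Each segment has a unique id, and unlabeled pixels are assigned 0;
For each annotation, the per-segment information is stored in annotation.segments_info. segment_info.id holds the unique id of the segment and is used to retrieve the corresponding mask from the PNG (ids==segment_info.id). iscrowd indicates that the segment covers a group of objects. The bbox and area fields provide additional information about the segment.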

annotation{
    "image_id"      : int,
    "file_name"     : str,
    "segments_info" : [segment_info],
}

segment_info{
    "id"          : int,
    "category_id" : int,
    "area"        : int,
    "bbox"        : [x,y,width,height],
    "iscrowd"     : 0 or 1,
}

categories[{
    "id"            : int,
    "name"          : str,
    "supercategory" : str,
    "isthing"       : 0 or 1,
    "color"         : [R,G,B],
}]
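The COCO format documentation notes that when the PNG is loaded as an RGB image, the segment ids are recovered as ids = R + G*256 + B*256^2. The sketch below applies that formula to a made-up row of three pixels and selects the pixels that belong to one segment. The helper names and the pixel values are assumptions for illustration; a real PNG would be read with an image library.

```python
def rgb_to_segment_id(r, g, b):
    """Combine the three 8-bit channels into one segment id (R is the lowest byte)."""
    return r + 256 * g + 256 * 256 * b


def segment_id_to_rgb(segment_id):
    """Inverse mapping, useful when writing a PNG."""
    return (segment_id % 256, (segment_id // 256) % 256, (segment_id // 65536) % 256)


# Made-up 1x3 "image" of RGB pixels; (0, 0, 0) is an unlabeled pixel.
pixels = [(0, 0, 0), (5, 1, 0), (5, 1, 0)]
ids = [rgb_to_segment_id(*p) for p in pixels]

# Made-up segments_info entry whose id matches the labeled pixels.
segment_info = {"id": 261, "category_id": 1, "iscrowd": 0}
mask = [i == segment_info["id"] for i in ids]
print(ids, mask)
```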

3.2.5 Image Captioning

Annotations for the captioning task store image captions. Each caption describes the specified image, and each image has at least 5 captions.

annotation{
    "id"       : int,
    "image_id" : int,
    "caption"  : str,
}
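Since each image has several caption annotations, a common first step is to group them by image_id. A minimal sketch with made-up captions:

```python
from collections import defaultdict

# Made-up caption annotations in the format above.
annotations = [
    {"id": 1, "image_id": 7, "caption": "A man on a skateboard with a dog."},
    {"id": 2, "image_id": 7, "caption": "A dog runs beside a skateboarder."},
    {"id": 3, "image_id": 9, "caption": "A plate of food on a table."},
]

# Collect every caption under the id of the image it describes.
captions_by_image = defaultdict(list)
for ann in annotations:
    captions_by_image[ann["image_id"]].append(ann["caption"])

print(dict(captions_by_image))
```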

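4. Using the Dataset (Python)

4.1 COCO API

As the description above shows, COCO labels are fairly complex, and reading the annotations by hand means parsing several kinds of files. To make the dataset easier to use, COCO provides an API, the cocoapi introduced below.

4.2 Installing the API

First install the dependencies: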

~$ pip install numpy Cython matplotlib

Git repository: https://github.com/cocodataset/cocoapi.git
After cloning, change into the PythonAPI directory and build:

~$ cd cocoapi/PythonAPI
~/cocoapi/PythonAPI$ make

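4.3 Using the COCO API (Official Demo)

Once installed, the pycocotools package can be found in the site-packages folder. It is the Python API for the COCO dataset and helps with loading, parsing, and visualizing COCO annotations. To use it, call the functions it provides to load an annotation file and read the contents as Python dictionaries. The API functions are: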

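COCO: the COCO API class that loads a COCO annotation file and prepares the data structures.
decodeMask: decode a binary mask M that was encoded with run-length encoding.
encodeMask: encode a binary mask M using run-length encoding.
getAnnIds: get the ids of the annotations that satisfy the given filter conditions.
getCatIds: get the ids of the categories that satisfy the given filter conditions.
getImgIds: get the ids of the images that satisfy the given filter conditions.
loadAnns: load the annotations with the specified ids.
loadCats: load the categories with the specified ids.
loadImgs: load the images with the specified ids.
annToMask: convert the segmentation in an annotation to a binary mask.
showAnns: display the specified annotations.
loadRes: load algorithm results and create an API for accessing them.
download: download COCO images from the mscoco.org server.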
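To give a feel for what the id-based getters and loaders do, the sketch below implements a heavily simplified, hypothetical index over the annotations list. The class TinyIndex and its method names are invented for this example and are not the pycocotools implementation, which supports more filters.

```python
from collections import defaultdict


class TinyIndex:
    """Hypothetical, simplified stand-in for the lookups the COCO class offers."""

    def __init__(self, dataset):
        # Map annotation id -> annotation, and image id -> its annotations.
        self.anns = {a["id"]: a for a in dataset["annotations"]}
        self.img_to_anns = defaultdict(list)
        for a in dataset["annotations"]:
            self.img_to_anns[a["image_id"]].append(a)

    def get_ann_ids(self, img_id, cat_ids=None):
        """Ids of annotations on one image, optionally filtered by category."""
        anns = self.img_to_anns.get(img_id, [])
        if cat_ids is not None:
            anns = [a for a in anns if a["category_id"] in cat_ids]
        return [a["id"] for a in anns]

    def load_anns(self, ann_ids):
        """Annotation dictionaries for the given ids."""
        return [self.anns[i] for i in ann_ids]


# Made-up data.
dataset = {"annotations": [
    {"id": 1, "image_id": 5, "category_id": 1},
    {"id": 2, "image_id": 5, "category_id": 18},
    {"id": 3, "image_id": 6, "category_id": 1},
]}
index = TinyIndex(dataset)
person_ids = index.get_ann_ids(5, cat_ids=[1])
loaded = index.load_anns(person_ids)
print(person_ids, loaded)
```

The two-step pattern (first collect ids with a filter, then load the records) mirrors how getAnnIds and loadAnns are used in the demo that follows.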

The steps below show how to load the data, parse it, and visualize the annotations:

1. Import the required packages

%matplotlib inline
from pycocotools.coco import COCO
import numpy as np
import skimage.io as io
import matplotlib.pyplot as plt
import pylab
pylab.rcParams['figure.figsize'] = (8.0, 10.0)

2. Set the annotation file path (using "instances_val2014.json" as an example)

dataDir='..'
dataType='val2014'
annFile='{}/annotations/instances_{}.json'.format(dataDir,dataType)

3. Load instances_val2014.json into the COCO class

# initialize COCO api for instance annotations
coco = COCO(annFile)

Output:
loading annotations into memory…
Done (t=4.19s)
creating index…
index created!

4. Read the COCO categories

# display COCO categories and supercategories
cats = coco.loadCats(coco.getCatIds())
nms=[cat['name'] for cat in cats]
print('COCO categories: \n{}\n'.format(' '.join(nms)))

nms = set([cat['supercategory'] for cat in cats])
print('COCO supercategories: \n{}'.format(' '.join(nms)))

Output:
COCO categories:
person bicycle car motorcycle airplane bus train truck boat traffic light fire hydrant stop sign parking meter bench bird cat dog horse sheep cow elephant bear zebra giraffe backpack umbrella handbag tie suitcase frisbee skis snowboard sports ball kite baseball bat baseball glove skateboard surfboard tennis racket bottle wine glass cup fork knife spoon bowl banana apple sandwich orange broccoli carrot hot dog pizza donut cake chair couch potted plant bed dining table toilet tv laptop mouse remote keyboard cell phone microwave oven toaster sink refrigerator book clock vase scissors teddy bear hair drier toothbrush

COCO supercategories:
sports furniture electronic food appliance vehicle animal kitchen outdoor indoor person accessory

5. Read an original COCO image

# find the category_ids that match the 'person', 'dog', 'skateboard' filter
catIds = coco.getCatIds(catNms=['person','dog','skateboard'])
# find the image_ids that match the category_id filter
imgIds = coco.getImgIds(catIds=catIds)
# narrow the selection to the image whose id is 324158
imgIds = coco.getImgIds(imgIds=[324158])
# load the metadata record of one image picked at random from imgIds
img = coco.loadImgs(imgIds[np.random.randint(0,len(imgIds))])[0]
# read the image pixels from its URL and display them
I = io.imread(img['coco_url'])
plt.axis('off')
plt.imshow(I)
plt.show()

Output: the selected image is displayed.
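The demo reads the image over the network through coco_url. If the images have already been downloaded, the path can be built from file_name instead. This sketch assumes the images sit in <dataDir>/<dataType>/, following the folder naming described in section 3; the helper name and the example record are illustrative only.

```python
import os


def local_image_path(data_dir, data_type, img):
    """Build <data_dir>/<data_type>/<file_name> for an image record."""
    return os.path.join(data_dir, data_type, img["file_name"])


# Illustrative record shaped like the dictionaries returned by coco.loadImgs(...).
img_record = {"id": 324158, "file_name": "COCO_val2014_000000324158.jpg"}
path = local_image_path("..", "val2014", img_record)
print(path)
```

The resulting path can be passed to io.imread in place of the URL.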

6. Load and display the annotations

# load and display instance annotations
plt.imshow(I); plt.axis('off')
annIds = coco.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco.loadAnns(annIds)
coco.showAnns(anns)

Output: the image is displayed with the instance annotations drawn on top.

7. Load and display the annotations from person_keypoints_val2014.json

# initialize COCO api for person keypoints annotations
annFile = '{}/annotations/person_keypoints_{}.json'.format(dataDir,dataType)
coco_kps=COCO(annFile)

# load and display keypoints annotations
plt.imshow(I); plt.axis('off')
ax = plt.gca()
annIds = coco_kps.getAnnIds(imgIds=img['id'], catIds=catIds, iscrowd=None)
anns = coco_kps.loadAnns(annIds)
coco_kps.showAnns(anns)

Output:
loading annotations into memory…
Done (t=2.08s)
creating index…
index created!

8. Load and display the annotations from captions_val2014.json

# initialize COCO api for caption annotations
annFile = '{}/annotations/captions_{}.json'.format(dataDir,dataType)
coco_caps=COCO(annFile)

# load and display caption annotations
annIds = coco_caps.getAnnIds(imgIds=img['id'])
anns = coco_caps.loadAnns(annIds)
coco_caps.showAnns(anns)
plt.imshow(I); plt.axis('off'); plt.show()

Output:
loading annotations into memory…
Done (t=0.41s)
creating index…
index created!
A man is skate boarding down a path and a dog is running by his side.
A man on a skateboard with a dog outside.
A person riding a skate board with a dog following beside.
This man is riding a skateboard behind a dog.
A man walking his dog on a quiet country road.

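5. Evaluation on the COCO Dataset

5.1 Computing IoU

IoU (Intersection over Union) is the area of the intersection of the predicted region and the ground-truth region divided by the area of their union: IoU = area(A ∩ B) / area(A ∪ B).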
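For bounding boxes, the definition above can be sketched in a few lines. The function below takes boxes in the COCO [x, y, width, height] form; it is a plain illustration of the formula, not the routine used inside pycocotools.

```python
def box_iou(a, b):
    """IoU of two boxes given as COCO-style [x, y, width, height]."""
    ax1, ay1, ax2, ay2 = a[0], a[1], a[0] + a[2], a[1] + a[3]
    bx1, by1, bx2, by2 = b[0], b[1], b[0] + b[2], b[1] + b[3]
    # Width and height of the overlap; zero when the boxes do not intersect.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


iou_same = box_iou([0, 0, 10, 10], [0, 0, 10, 10])
iou_half = box_iou([0, 0, 10, 10], [5, 0, 10, 10])
iou_none = box_iou([0, 0, 10, 10], [20, 20, 5, 5])
print(iou_same, iou_half, iou_none)
```

In the second case the boxes share a 5x10 strip, so the intersection is 50 and the union is 150.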

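5.2 COCO Evaluation Metrics

Unless otherwise specified, AP and AR are averaged over multiple Intersection over Union (IoU) values. Specifically, 10 IoU thresholds from 0.50 to 0.95 in steps of 0.05 are used, and the mean over them is the AP defined by COCO. This is a break from the tradition of computing AP at a single IoU of 0.50;
AP is averaged over all categories. Traditionally, this is called "mean average precision" (mAP). The official documentation makes no distinction between AP and mAP (and likewise AR and mAR) and assumes the difference is clear from context.
AP (averaged over all 10 IoU thresholds and all 80 categories) determines the challenge winner. It should be considered the single most important metric when assessing performance on COCO.
In COCO, there are more small objects than large objects. Specifically, approximately 41% of objects are small (area < 32^2), 34% are medium (32^2 < area < 96^2), and 24% are large (area > 96^2). Area is measured as the number of pixels in the segmentation mask.
AR is the maximum recall given a fixed number of detections per image, averaged over categories and IoUs. AR is related to the metric of the same name used in proposal evaluation, but it is computed per category.
All metrics are computed allowing at most 100 top-scoring detections per image (across all categories).
The evaluation metrics for detection with bounding boxes and with segmentation masks are identical in all respects except for the IoU computation, which is performed over boxes or masks, respectively.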
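The threshold range and the size buckets described above can be written down directly. The sketch below is only meant to make the numbers concrete; the per-threshold AP values are made up, and how an area that falls exactly on a boundary is handled is a choice made here, not something taken from the official evaluator.

```python
# The 10 IoU thresholds 0.50, 0.55, ..., 0.95 (rounded to avoid float drift).
iou_thresholds = [round(0.50 + 0.05 * i, 2) for i in range(10)]


def size_bucket(area):
    """Classify an object by mask area in pixels, using the 32^2 and 96^2 boundaries."""
    if area < 32 ** 2:
        return "small"
    if area < 96 ** 2:
        return "medium"
    return "large"


def mean_over_thresholds(ap_per_threshold):
    """Average per-threshold AP values into a single COCO-style AP."""
    return sum(ap_per_threshold) / len(ap_per_threshold)


buckets = [size_bucket(a) for a in (500, 5000, 20000)]
toy_ap = mean_over_thresholds([0.6] * 5 + [0.2] * 5)  # made-up per-threshold APs
print(iou_thresholds, buckets, toy_ap)
```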

5.3 Unified COCO Results Format

Object Detection

[{
    "image_id"    : int,
    "category_id" : int,
    "bbox"        : [x,y,width,height],
    "score"       : float,
}]

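Box coordinates are floats measured from the top-left corner of the image (and are 0-indexed). The official recommendation is to round coordinates to the nearest tenth of a pixel to reduce the size of the JSON file.
For detection with object segments (instance segmentation), use the following format:

[{
    "image_id"     : int,
    "category_id"  : int,
    "segmentation" : RLE,
    "score"        : float,
}]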
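As a sketch of producing a bounding-box results file, the snippet below converts detections into the format above and rounds the coordinates to a tenth of a pixel. It assumes, purely for illustration, that the model emits corner boxes (x1, y1, x2, y2); the helper name and the sample detections are made up.

```python
import json


def to_coco_detection(image_id, category_id, box_xyxy, score):
    """Convert one (x1, y1, x2, y2) detection into the bbox result format."""
    x1, y1, x2, y2 = box_xyxy
    # COCO expects [x, y, width, height]; round to the nearest tenth of a pixel.
    bbox = [round(v, 1) for v in (x1, y1, x2 - x1, y2 - y1)]
    return {"image_id": image_id, "category_id": category_id,
            "bbox": bbox, "score": float(score)}


# Made-up detections: (image_id, category_id, corner box, confidence).
raw = [(42, 1, (10.04, 20.26, 110.51, 220.0), 0.9871),
       (42, 18, (5.0, 5.0, 50.0, 45.5), 0.4)]
results = [to_coco_detection(*d) for d in raw]
results_json = json.dumps(results)
print(results_json)
```

Writing results_json to a file gives something that can be passed to loadRes in section 5.4.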

Keypoint Detection

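[{
    "image_id"    : int,
    "category_id" : int,
    "keypoints"   : [x1,y1,v1,...,xk,yk,vk],
    "score"       : float,
}]

Keypoint coordinates are floats measured from the top-left corner of the image (and are 0-indexed). The official recommendation is to round coordinates to the nearest pixel to reduce file size. Note also that the visibility flags vi are currently not used (except for controlling visualization), and the official recommendation is simply to set vi=1.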
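Following those two recommendations, a predicted pose can be flattened as in the sketch below. The helper name is hypothetical, category_id=1 is assumed to be the person category, and the example uses only three keypoints for brevity.

```python
def to_coco_keypoints(image_id, points, score, category_id=1):
    """Flatten [(x, y), ...] into [x1, y1, v1, ...] with pixel-rounded coordinates and v = 1."""
    flat = []
    for x, y in points:
        flat.extend([int(round(x)), int(round(y)), 1])
    return {"image_id": image_id, "category_id": category_id,
            "keypoints": flat, "score": float(score)}


# Made-up prediction with three keypoints.
pred = to_coco_keypoints(42, [(100.4, 50.6), (120.2, 60.9), (90.0, 75.3)], 0.83)
print(pred)
```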

Stuff Segmentation

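[{
    "image_id"     : int,
    "category_id"  : int,
    "segmentation" : RLE,
}]

Apart from the score field not being required, the stuff segmentation format is the same as the object segmentation format. Note: the official recommendation is to encode each label that occurs in an image with a single binary mask. Binary masks should be encoded via RLE using the MaskApi function encode(). For an example, see segmentationToCocoResult() in cocostuffhelper.py. For convenience, conversion scripts between the JSON and PNG formats are also provided.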
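To show what run-length encoding a binary mask means, here is a pure-Python sketch of the uncompressed form: pixels are read column by column and the counts alternate between runs of 0s and runs of 1s, starting with the 0s. This is only an illustration of the idea; the encode() function in the API produces a compressed representation and should be used for real result files.

```python
def rle_encode(mask):
    """Uncompressed run-length encoding of a binary mask given as a list of rows."""
    height, width = len(mask), len(mask[0])
    # Read the pixels column by column (column-major order).
    flat = [mask[r][c] for c in range(width) for r in range(height)]
    counts, current, run = [], 0, 0
    for value in flat:
        if value == current:
            run += 1
        else:
            counts.append(run)  # close the finished run (may be 0 at the start)
            current, run = value, 1
    counts.append(run)
    return {"size": [height, width], "counts": counts}


def rle_decode(rle):
    """Rebuild the list-of-rows mask from the structure produced by rle_encode."""
    height, width = rle["size"]
    flat, value = [], 0
    for run in rle["counts"]:
        flat.extend([value] * run)
        value = 1 - value
    return [[flat[c * height + r] for c in range(width)] for r in range(height)]


mask = [[0, 1, 1],
        [0, 1, 0]]
rle = rle_encode(mask)
restored = rle_decode(rle)
print(rle, restored == mask)
```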

Panoptic Segmentation

annotation{
    "image_id"      : int,
    "file_name"     : str,
    "segments_info" : [segment_info],
}

segment_info{
    "id"          : int,
    "category_id" : int,
}

Image Captioning

[{
    "image_id" : int,
    "caption"  : str,
}]

5.4 Using the COCOeval API (Official Demo)

COCO also provides an API for computing the evaluation metrics: once your model writes its output in the official format, you can use the API to quickly evaluate it on the full set of metrics.

1. Import the required packages

%matplotlib inline
import matplotlib.pyplot as plt
from pycocotools.coco import COCO
from pycocotools.cocoeval import COCOeval
import numpy as np
import skimage.io as io
import pylab
pylab.rcParams['figure.figsize'] = (10.0, 8.0)

2. Choose the task

annType = ['segm','bbox','keypoints']
annType = annType[1]      #specify type here
prefix = 'person_keypoints' if annType=='keypoints' else 'instances'
print('Running demo for *%s* results.'%(annType))

Output:
Running demo for *bbox* results.

3. Load the JSON annotation file (the ground truth)

#initialize COCO ground truth api
dataDir='../'
dataType='val2014'
annFile = '%s/annotations/%s_%s.json'%(dataDir,prefix,dataType)
cocoGt=COCO(annFile)

Output:
loading annotations into memory…
Done (t=3.16s)
creating index…
index created!

4. Load the results file (the predictions)

COCO.loadRes(resFile) also returns a COCO object. The difference from COCO(annFile) is that loadRes loads a results file in the official results format, while COCO(annFile) loads the JSON annotation file provided by COCO.

#initialize COCO detections api
resFile='%s/results/%s_%s_fake%s100_results.json'
resFile = resFile%(dataDir, prefix, dataType, annType)
cocoDt=cocoGt.loadRes(resFile)

Output:
Loading and preparing results…
DONE (t=0.03s)
creating index…
index created!

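5. Evaluate on 100 images from the validation set

imgIds=sorted(cocoGt.getImgIds())    # sort the image ids in ascending order
imgIds=imgIds[0:100]    # keep the first 100 images
imgId = imgIds[np.random.randint(100)]    # pick one image id at random (this does not shuffle imgIds)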

6. Run the evaluation

# running evaluation
cocoEval = COCOeval(cocoGt,cocoDt,annType)
cocoEval.params.imgIds = imgIds
cocoEval.evaluate()
cocoEval.accumulate()
cocoEval.summarize()
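After summarize() runs, the summary numbers are also available as cocoEval.stats. For bbox and segm evaluation this holds 12 values in the order the summary prints them (keypoint evaluation reports 10). The sketch below pairs them with readable names; the names themselves and the sample numbers are made up for illustration.

```python
STAT_NAMES = [
    "AP", "AP50", "AP75", "AP_small", "AP_medium", "AP_large",
    "AR_max1", "AR_max10", "AR_max100", "AR_small", "AR_medium", "AR_large",
]


def name_stats(stats):
    """Pair the 12 summary numbers (bbox/segm order) with readable names."""
    values = [float(v) for v in stats]
    if len(values) != len(STAT_NAMES):
        raise ValueError("expected 12 values, got %d" % len(values))
    return dict(zip(STAT_NAMES, values))


# Made-up numbers standing in for cocoEval.stats.
fake_stats = [0.50, 0.70, 0.55, 0.30, 0.52, 0.60,
              0.40, 0.58, 0.60, 0.35, 0.62, 0.70]
named = name_stats(fake_stats)
print(named["AP"], named["AR_max100"])
```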

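6. Summary

The above combines the official COCO demos with my own understanding. It serves as my personal study notes and as an introduction for newcomers. If you find mistakes or omissions, please point them out in the comments. (Please credit the source when reposting.)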
