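Recently I have been working on object detection models, and many of them require datasets in VOC format.
The PASCAL VOC challenge (The PASCAL Visual Object Classes) is a world-class computer vision challenge. PASCAL stands for Pattern Analysis, Statistical Modelling and Computational Learning, a network funded by the European Union. Many models were released on top of this dataset, for example YOLO and SSD in object detection.
Let's look at the directory structure: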
    :~/git_projects/models/research/VOCdevkit/VOC2012$ tree -d
    .
    ├── Annotations
    ├── ImageSets
    │   ├── Action
    │   ├── Layout
    │   ├── Main
    │   └── Segmentation
    ├── JPEGImages
    ├── SegmentationClass
    └── SegmentationObject
JPEGImages
This directory holds the image data.
Annotations
    ~/git_projects/models/research/VOCdevkit/VOC2012/Annotations$ cat 2012_004331.xml
    <annotation>
        <filename>2012_004331.jpg</filename>
        <folder>VOC2012</folder>
        <object>
            <name>person</name>
            <actions>
                <jumping>1</jumping>
                <other>0</other>
                <phoning>0</phoning>
                <playinginstrument>0</playinginstrument>
                <reading>0</reading>
                <ridingbike>0</ridingbike>
                <ridinghorse>0</ridinghorse>
                <running>0</running>
                <takingphoto>0</takingphoto>
                <usingcomputer>0</usingcomputer>
                <walking>0</walking>
            </actions>
            <bndbox>
                <xmax>208</xmax>
                <xmin>102</xmin>
                <ymax>230</ymax>
                <ymin>25</ymin>
            </bndbox>
            <difficult>0</difficult>
            <pose>Unspecified</pose>
            <point>
                <x>155</x>
                <y>119</y>
            </point>
        </object>
        <segmented>0</segmented>
        <size>
            <depth>3</depth>
            <height>375</height>
            <width>500</width>
        </size>
        <source>
            <annotation>PASCAL VOC2012</annotation>
            <database>The VOC2012 Database</database>
            <image>flickr</image>
        </source>
    </annotation>
The corresponding image is 2012_004331.jpg.
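What we mainly need to care about is the data under the <object> node, especially the data under bndbox. xmin and ymin form the top-left corner of the bounding box, and xmax and ymax form its bottom-right corner.
What is a bounding box? When the model detects a target, it draws a box around it and treats whatever is inside that box as one object.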
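To make the bndbox fields concrete, here is a minimal sketch that reads them with Python's standard `xml.etree.ElementTree`. The XML string is a trimmed copy of the annotation shown above, and `read_boxes` is a helper name I made up for illustration, not part of any VOC tooling.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the annotation shown above (only the fields used here).
SAMPLE_XML = """
<annotation>
  <filename>2012_004331.jpg</filename>
  <object>
    <name>person</name>
    <bndbox>
      <xmax>208</xmax>
      <xmin>102</xmin>
      <ymax>230</ymax>
      <ymin>25</ymin>
    </bndbox>
    <difficult>0</difficult>
  </object>
  <size>
    <depth>3</depth>
    <height>375</height>
    <width>500</width>
  </size>
</annotation>
"""


def read_boxes(xml_text):
    """Return (filename, [(name, xmin, ymin, xmax, ymax), ...]) for one annotation."""
    root = ET.fromstring(xml_text.strip())
    boxes = []
    for obj in root.findall("object"):
        bb = obj.find("bndbox")
        # Go through float first, in case a tool wrote values such as "102.0".
        coords = [int(float(bb.findtext(tag))) for tag in ("xmin", "ymin", "xmax", "ymax")]
        boxes.append((obj.findtext("name"), *coords))
    return root.findtext("filename"), boxes


filename, boxes = read_boxes(SAMPLE_XML)
print(filename, boxes)
```

Each tuple gives the class name followed by the top-left corner (xmin, ymin) and the bottom-right corner (xmax, ymax).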
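ImageSets/Main contains 63 files in total. train.txt, val.txt and trainval.txt record the image names belonging to each split. The remaining 60 files are 20 * 3: there are 20 classes, and each class has xxx_train.txt, xxx_val.txt and xxx_trainval.txt.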
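The 63 can be checked with a few lines of Python. This is only a sketch of the naming scheme: the class list is the standard set of 20 VOC classes, and `expected_main_files` is a hypothetical helper name.

```python
# The 20 object classes of PASCAL VOC.
VOC_CLASSES = [
    "aeroplane", "bicycle", "bird", "boat", "bottle",
    "bus", "car", "cat", "chair", "cow",
    "diningtable", "dog", "horse", "motorbike", "person",
    "pottedplant", "sheep", "sofa", "train", "tvmonitor",
]
SPLITS = ["train", "val", "trainval"]


def expected_main_files():
    """List the file names expected in ImageSets/Main."""
    # Three split-level files: train.txt, val.txt, trainval.txt
    files = [f"{split}.txt" for split in SPLITS]
    # Three files per class, e.g. aeroplane_train.txt
    files += [f"{cls}_{split}.txt" for cls in VOC_CLASSES for split in SPLITS]
    return files


files = expected_main_files()
print(len(files))  # 3 + 20 * 3 = 63
```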
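1 marks a positive sample and -1 marks a negative sample. (The official files also use 0 for images where the object appears only as a "difficult" instance.)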
Here is part of aeroplane_train.txt:

    2011_003177  1    // 2011_003177.jpg contains an aeroplane
    2011_003183 -1    // 2011_003183.jpg does not contain an aeroplane
    2011_003184 -1
    2011_003187 -1
    2011_003188 -1
    2011_003192 -1
    2011_003194 -1
    2011_003216 -1
    2011_003223 -1
    2011_003230 -1
    2011_003236 -1
    2011_003238 -1
    2011_003246 -1
    2011_003247 -1
    2011_003253 -1
    2011_003255 -1
    2011_003259 -1
    2011_003274 -1

Here is the content of train.txt, which contains image names only:

    2011_003187
    2011_003188
    2011_003192
    2011_003194
    2011_003216
    2011_003223
    2011_003230
    2011_003236
    2011_003238
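Reading one of these per-class files takes only a few lines. The sketch below assumes each line is an image id followed by a whitespace-separated label, as in the excerpt above (without the explanatory // notes, which are mine and not in the real file). `parse_class_split` is a made-up helper name.

```python
def parse_class_split(text):
    """Map image id -> label (1 positive, -1 negative) from a <class>_<split>.txt body."""
    labels = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        image_id, flag = parts[0], int(parts[1])
        labels[image_id] = flag
    return labels


sample = """2011_003177  1
2011_003183 -1
2011_003184 -1
"""
labels = parse_class_split(sample)
positives = [image_id for image_id, flag in labels.items() if flag == 1]
print(positives)
```

Filtering on `flag == 1` gives the ids of the images that contain the class.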
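For data preparation, your data may come from a public dataset or from a partner's private data.
For annotation, you can use labelImg to label your own images: https://github.com/tzutalin/labelImg
Converting a dataset between formats inevitably means writing a lot of scripts. Everyone's needs differ, and the format of the files you start from differs too, so the scripts you need will vary. Here are a few that I use myself.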
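    # Dataset split
    import os
    import random

    root_dir = './park_voc/VOC2007/'

    # Target split: 0.7 train, 0.1 val, 0.2 test
    trainval_percent = 0.8  # train + val, as a fraction of all samples
    train_percent = 0.7     # train, as a fraction of all samples

    xmlfilepath = os.path.join(root_dir, 'Annotations')
    txtsavepath = os.path.join(root_dir, 'ImageSets/Main')

    # Only consider .xml files, sorted so the listing order is stable
    total_xml = sorted(f for f in os.listdir(xmlfilepath) if f.endswith('.xml'))
    num = len(total_xml)              # e.g. 100
    indices = range(num)              # renamed from `list`, which shadowed the builtin
    tv = int(num * trainval_percent)  # e.g. 80
    # The original used int(tv * train_percent), which gives 56 rather than the intended 70
    tr = int(num * train_percent)     # e.g. 70

    trainval_list = random.sample(indices, tv)
    trainval = set(trainval_list)
    train = set(random.sample(trainval_list, tr))

    with open(os.path.join(txtsavepath, 'trainval.txt'), 'w') as ftrainval, \
         open(os.path.join(txtsavepath, 'test.txt'), 'w') as ftest, \
         open(os.path.join(txtsavepath, 'train.txt'), 'w') as ftrain, \
         open(os.path.join(txtsavepath, 'val.txt'), 'w') as fval:
        for i in indices:
            # Strip the .xml extension to get the image name
            name = os.path.splitext(total_xml[i])[0] + '\n'
            if i in trainval:
                ftrainval.write(name)
                if i in train:
                    ftrain.write(name)
                else:
                    fval.write(name)
            else:
                ftest.write(name)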
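If you want the split to be reproducible and easy to check, the same idea can be written as a pure function. This is a sketch under my own assumptions, not the script above: `split_names`, its parameters and the `img_000`-style ids are all hypothetical, and the fractions are taken relative to the whole dataset.

```python
import random


def split_names(names, train_frac=0.7, val_frac=0.1, seed=0):
    """Shuffle names and cut them into train / val / test lists (test gets the remainder)."""
    shuffled = list(names)
    # A dedicated Random instance keeps the shuffle repeatable for a given seed.
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_val = int(len(shuffled) * val_frac)
    return {
        "train": shuffled[:n_train],
        "val": shuffled[n_train:n_train + n_val],
        "test": shuffled[n_train + n_val:],
    }


names = [f"img_{i:03d}" for i in range(100)]  # hypothetical image ids
parts = split_names(names)
print({split: len(ids) for split, ids in parts.items()})
```

Because the three lists are slices of one shuffled list, they cannot overlap, and trainval is simply train plus val.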
    #!/usr/bin/python
    # -*- coding:UTF-8 -*-
    # .txt --> .xml
    import os
    import glob
    from PIL import Image

    # Location of the VEDAI images
    src_img_dir = "/home/train/dataset-expand/park_voc/VOC2007/JPEGImages"
    # Location of the ground-truth txt files for the VEDAI images
    src_txt_dir = "/home/train/dataset-expand/label_expand"
    # Output location for the generated xml files
    src_xml_dir = "/home/train/dataset-expand/park_voc/VOC2007/Annotations"

    img_Lists = glob.glob(src_img_dir + '/*.jpg')

    # Image names without extension, e.g. 100.jpg -> 100
    img_names = [os.path.splitext(os.path.basename(item))[0] for item in img_Lists]

    for img in img_names:
        # Read the image size
        with Image.open(src_img_dir + '/' + img + '.jpg') as im:
            width, height = im.size

        # Open the corresponding txt file (img... -> label...)
        with open(src_txt_dir + '/' + img.replace('img', 'label', 1) + '.txt') as f:
            gt = f.read().splitlines()
        # gt = open(src_txt_dir + '/gt_' + img + '.txt').read().splitlines()

        # Write the xml file
        # os.mknod(src_xml_dir + '/' + img + '.xml')
        with open(src_xml_dir + '/' + img + '.xml', 'w') as xml_file:
            xml_file.write('<annotation>\n')
            xml_file.write('    <folder>VOC2007</folder>\n')
            xml_file.write('    <filename>' + str(img) + '.jpg' + '</filename>\n')
            xml_file.write('    <size>\n')
            xml_file.write('        <width>' + str(width) + '</width>\n')
            xml_file.write('        <height>' + str(height) + '</height>\n')
            xml_file.write('        <depth>3</depth>\n')  # assumes 3-channel images
            xml_file.write('    </size>\n')

            # Write one <object> per line of the txt file
            for img_each_label in gt:
                if not img_each_label.strip():
                    continue  # skip blank lines
                # Each line is xmin,ymin,xmax,ymax,name. Fields are comma-separated
                # here; change the separator if your txt files use a different one.
                spt = img_each_label.split(',')
                xml_file.write('    <object>\n')
                xml_file.write('        <name>' + str(spt[4]) + '</name>\n')
                xml_file.write('        <pose>Unspecified</pose>\n')
                xml_file.write('        <truncated>0</truncated>\n')
                xml_file.write('        <difficult>0</difficult>\n')
                xml_file.write('        <bndbox>\n')
                xml_file.write('            <xmin>' + str(spt[0]) + '</xmin>\n')
                xml_file.write('            <ymin>' + str(spt[1]) + '</ymin>\n')
                xml_file.write('            <xmax>' + str(spt[2]) + '</xmax>\n')
                xml_file.write('            <ymax>' + str(spt[3]) + '</ymax>\n')
                xml_file.write('        </bndbox>\n')
                xml_file.write('    </object>\n')

            xml_file.write('</annotation>')
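Writing XML through string concatenation works, but it does not escape special characters such as & or < in names. As an alternative sketch (not the script above), the same structure can be built with the standard `xml.etree.ElementTree` module. `build_annotation` and the example values below are hypothetical; the tuple order (xmin, ymin, xmax, ymax, name) mirrors the field order the txt files use in the script.

```python
import xml.etree.ElementTree as ET


def build_annotation(filename, width, height, boxes, folder="VOC2007", depth=3):
    """Build a VOC-style <annotation> element.

    boxes is a list of (xmin, ymin, xmax, ymax, name) tuples.
    """
    root = ET.Element("annotation")
    ET.SubElement(root, "folder").text = folder
    ET.SubElement(root, "filename").text = filename
    size = ET.SubElement(root, "size")
    for tag, value in (("width", width), ("height", height), ("depth", depth)):
        ET.SubElement(size, tag).text = str(value)
    for xmin, ymin, xmax, ymax, name in boxes:
        obj = ET.SubElement(root, "object")
        ET.SubElement(obj, "name").text = name
        ET.SubElement(obj, "pose").text = "Unspecified"
        ET.SubElement(obj, "truncated").text = "0"
        ET.SubElement(obj, "difficult").text = "0"
        bndbox = ET.SubElement(obj, "bndbox")
        for tag, value in (("xmin", xmin), ("ymin", ymin), ("xmax", xmax), ("ymax", ymax)):
            ET.SubElement(bndbox, tag).text = str(value)
    return root


# Hypothetical example values, not taken from a real dataset.
root = build_annotation("100.jpg", 500, 375, [(102, 25, 208, 230, "car")])
xml_text = ET.tostring(root, encoding="unicode")
print(xml_text)
```

To write the result to disk, `ET.ElementTree(root).write(path, encoding="utf-8")` can be used in place of the manual `write` calls.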
That's all for today; more to come.