Semantic Segmentation | PSPNet Source Code Walkthrough: Testing Stage

Date: 2019-12-11

Tags: semantic segmentation, pspnet, source code walkthrough, testing stage


Introduction

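This article follows the previous one, "Semantic Segmentation | PSPNet Source Code Walkthrough: Network Training", and covers the testing stage of semantic segmentation.

Once the model has been trained, the strategy used for testing also matters a great deal.

Model testing is generally either single-scale or multi-scale, and multi-scale results are usually higher than single-scale ones. Other details also affect the result, for example whether the whole image is fed to the network or a sliding window feeds one part of the image at a time. The sections below explain these points based on the code.

Full code: https://github.com/speedinghzl/pytorch-segmentation-toolbox/blob/master/evaluate.py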

evaluate.py

main

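Below is the first half of the main test function. args.whole indicates whether multi-scale testing is used.

If args.whole is false, single-scale testing is used: predict_sliding is called, which predicts with a sliding window.

If args.whole is true, multi-scale testing is used: predict_multiscale is called with [0.75, 1.0, 1.25, 1.5, 1.75, 2.0] as the scale factors, and each scaled image is predicted as a whole.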

def main():
    """Create the model and start the evaluation process."""
    args = get_arguments()  # parse the command-line arguments

    # gpu0 = args.gpu
    os.environ["CUDA_VISIBLE_DEVICES"]=args.gpu
    h, w = map(int, args.input_size.split(',')) # h = 769, w = 769
    if args.whole:
        input_size = (1024, 2048)
    else:
        input_size = (h, w) # (769, 769)

    model = Res_Deeplab(num_classes=args.num_classes)   # build the model

    saved_state_dict = torch.load(args.restore_from)    # load the checkpoint
    model.load_state_dict(saved_state_dict) # copy the weights into the model

    model.eval()    # evaluation mode
    model.cuda()

    testloader = data.DataLoader(CSDataSet(args.data_dir, args.data_list, crop_size=(1024, 2048), mean=IMG_MEAN, scale=False, mirror=False), 
                                    batch_size=1, shuffle=False, pin_memory=True)

    data_list = []
    confusion_matrix = np.zeros((args.num_classes,args.num_classes))    # confusion matrix, shape (19,19)
    palette = get_palette(256)  # colour palette
    interp = nn.Upsample(size=(1024, 2048), mode='bilinear', align_corners=True)    # upsampling

    if not os.path.exists('outputs'):
        os.makedirs('outputs')

    for index, batch in enumerate(testloader):
        if index % 100 == 0:
            print('%d processed'%(index))
        image, label, size, name = batch
        # image.shape (1,3,1024,2048), label.shape (1,1024,2048), size = [[1024,2048,3]]
        size = size[0].numpy()  # size = [1024,2048,3]
        with torch.no_grad():   # no gradients are needed at test time
            if args.whole:  # whole-image prediction: call the multi-scale method, output.shape (1024,2048,19)
                output = predict_multiscale(model, image, input_size, [0.75, 1.0, 1.25, 1.5, 1.75, 2.0], args.num_classes, True, args.recurrence)
            else:   # otherwise use the sliding-window method
                output = predict_sliding(model, image.numpy(), input_size, args.num_classes, True, args.recurrence)

Next we look at the implementation of predict_sliding for the single-scale case, and of predict_whole and predict_multiscale for the multi-scale case.

predict_sliding

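This method uses a fixed-size window that crops one part of the image at a time and feeds it to the network to get an output. The window then slides, with consecutive positions overlapping by 1/3, and the probabilities in the overlapping regions are summed. Near the image border the last window is shifted back inside the image, so the overlap there can be larger than 1/3. Finally, dividing the summed probability by the number of times each pixel was covered gives the average probability per pixel.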

#image.shape(1,3,1024,2048), tile_size=(769,769), classes=19, flip=True, recur=1
def predict_sliding(net, image, tile_size, classes, flip_evaluation, recurrence):
    interp = nn.Upsample(size=tile_size, mode='bilinear', align_corners=True)   
    image_size = image.shape    # (1,3,1024,2048)
    overlap = 1/3   # overlap ratio between consecutive windows

    stride = ceil(tile_size[0] * (1 - overlap)) # stride: ceil(769*(1-1/3)) = 513
    tile_rows = int(ceil((image_size[2] - tile_size[0]) / stride) + 1)  # window rows: ceil((1024-769)/513) + 1 = 2
    tile_cols = int(ceil((image_size[3] - tile_size[1]) / stride) + 1)  # window columns: ceil((2048-769)/513) + 1 = 4
    print("Need %i x %i prediction tiles @ stride %i px" % (tile_cols, tile_rows, stride))
    full_probs = np.zeros((image_size[2], image_size[3], classes))  # accumulated probabilities, shape (1024,2048,19)
    count_predictions = np.zeros((image_size[2], image_size[3], classes))   # coverage counts, shape (1024,2048,19)
    tile_counter = 0    # number of windows processed so far

    for row in range(tile_rows):    # row = 0,1
        for col in range(tile_cols):    # col = 0,1,2,3
            x1 = int(col * stride)  # start position, e.g. x1 = 0 * 513 = 0
            y1 = int(row * stride)  #                      y1 = 0 * 513 = 0
            x2 = min(x1 + tile_size[1], image_size[3])  # end position x2 = min(0+769, 2048)
            y2 = min(y1 + tile_size[0], image_size[2])  #              y2 = min(0+769, 1024)
            x1 = max(int(x2 - tile_size[1]), 0)  # recompute the start so the window stays inside the image: x1 = max(769-769, 0)
            y1 = max(int(y2 - tile_size[0]), 0)  #                                                           y1 = max(769-769, 0)

            img = image[:, :, y1:y2, x1:x2] # image under the window, e.g. image[:, :, 0:769, 0:769]
            padded_img = pad_image(img, tile_size)  # pad so that the crop is always 769x769
            # plt.imshow(padded_img)
            # plt.show()
            tile_counter += 1   # one more window
            print("Predicting tile %i" % tile_counter)
            # Feed the crop to the network, which returns the probability map.
            # Variable(..., volatile=True) is deprecated since PyTorch 0.4; main() already wraps this call in torch.no_grad().
            padded_prediction = net(torch.from_numpy(padded_img).cuda())   # [x, x_dsn]
            if isinstance(padded_prediction, list):
                padded_prediction = padded_prediction[0]    # x.shape (1,19,97,97)
            padded_prediction = interp(padded_prediction).cpu().data[0].numpy().transpose(1,2,0)    # upsample to shape (769,769,19)
            prediction = padded_prediction[0:img.shape[2], 0:img.shape[3], :]   # crop away the padding, shape (769,769,19)
            count_predictions[y1:y2, x1:x2] += 1    # increment the coverage count inside the window
            full_probs[y1:y2, x1:x2] += prediction  # accumulate the prediction inside the window

    # average the predictions in the overlapping regions
    full_probs /= count_predictions # accumulated probabilities / coverage counts = average probabilities
    # visualize normalization Weights
    # plt.imshow(np.mean(count_predictions, axis=2))
    # plt.show()
    return full_probs   # average probabilities of the whole image, shape (1024,2048,19)
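To make the window arithmetic concrete, here is a minimal standalone sketch that reproduces only the coordinate computation of predict_sliding for a 1024x2048 image and a 769x769 tile. The helper name tile_windows is hypothetical and not part of the repository; no network is involved.

```python
from math import ceil


def tile_windows(image_hw, tile_hw, overlap=1 / 3):
    """Return the stride and the (y1, y2, x1, x2) box of every window (hypothetical helper)."""
    img_h, img_w = image_hw
    tile_h, tile_w = tile_hw
    # As in predict_sliding, one stride (derived from the tile height) is used for both axes.
    stride = ceil(tile_h * (1 - overlap))
    rows = int(ceil((img_h - tile_h) / stride) + 1)
    cols = int(ceil((img_w - tile_w) / stride) + 1)
    windows = []
    for row in range(rows):
        for col in range(cols):
            x2 = min(col * stride + tile_w, img_w)
            y2 = min(row * stride + tile_h, img_h)
            # Shift the window back so that it stays inside the image.
            x1 = max(x2 - tile_w, 0)
            y1 = max(y2 - tile_h, 0)
            windows.append((y1, y2, x1, x2))
    return stride, windows


stride, windows = tile_windows((1024, 2048), (769, 769))
print(stride, len(windows))
for window in windows:
    print(window)
```

For this configuration the last window in each direction is shifted back rather than truncated, so every crop is a full 769x769 tile and the overlap at the border is larger than 1/3.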

predict_multiscale

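This function calls predict_whole at several scales. If flipping is enabled, the image is also mirrored horizontally and fed to the network, the network output is mirrored back, and it is added to the earlier output and divided by 2. The results of all scales are then averaged.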

#image.shape(1,3,1024,2048), tile_size=(1024,2048) when args.whole is set, scales=[0.75, 1.0, 1.25, 1.5, 1.75, 2.0],
#classes=19, flip=True, recur=1
def predict_multiscale(net, image, tile_size, scales, classes, flip_evaluation, recurrence):
    """
    Predict an image by looking at it with different scales.
        We choose the "predict_whole_img" for the image with less than the original input size,
        for the input of larger size, we would choose the cropping method to ensure that GPU memory is enough.
    """
    image = image.data
    N_, C_, H_, W_ = image.shape    # 1, 3, 1024, 2048
    full_probs = np.zeros((H_, W_, classes))    # shape (1024, 2048, 19)
    for scale in scales:    # [0.75, 1.0, 1.25, 1.5, 1.75, 2.0]
        scale = float(scale)    # e.g. 0.75
        print("Predicting image scaled by %f" % scale)
        # Rescale the image by the current factor.
        scale_image = ndimage.zoom(image, (1.0, 1.0, scale, scale), order=1, prefilter=False)   # for scale 0.75: shape (1,3,768,1536)
        scaled_probs = predict_whole(net, scale_image, tile_size, recurrence)   # predict the whole rescaled image
        if flip_evaluation == True: # if flipping is enabled
            flip_scaled_probs = predict_whole(net, scale_image[:,:,:,::-1].copy(), tile_size, recurrence)   # predict the mirrored image as well
            scaled_probs = 0.5 * (scaled_probs + flip_scaled_probs[:,::-1,:])   # original and mirrored predictions each count 50%
        full_probs += scaled_probs  # accumulate the probabilities, shape (1024, 2048, 19)
    full_probs /= len(scales)   # average over the scales
    return full_probs   # shape (1024, 2048, 19)
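The flip step can be isolated in a small NumPy sketch. It assumes the same layouts as above (NCHW input, HWC output); predict_with_flip and toy_predict are hypothetical names, and toy_predict is only a stand-in for the network that returns the pixel values as scores.

```python
import numpy as np


def predict_with_flip(predict_fn, image):
    """Average predictions on an NCHW image and on its horizontally mirrored copy."""
    probs = predict_fn(image)                           # (H, W, C)
    flipped = predict_fn(image[:, :, :, ::-1].copy())   # prediction on the mirrored input
    # Mirror the prediction back along the width axis before averaging.
    return 0.5 * (probs + flipped[:, ::-1, :])


def toy_predict(image):
    """Hypothetical stand-in for the network: pixel values as scores, in HWC layout."""
    return image[0].transpose(1, 2, 0)


image = np.arange(2 * 3 * 4, dtype=np.float64).reshape(1, 2, 3, 4)
averaged = predict_with_flip(toy_predict, image)
print(averaged.shape)
print(np.allclose(averaged, toy_predict(image)))
```

Because toy_predict treats every pixel independently, the mirrored prediction flipped back equals the original one and the average changes nothing. A real network is not exactly symmetric, which is why averaging the two predictions can help.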

predict_whole

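With whole-image prediction, the image size may differ from the crop size used as the network input, and the height and width of the network output are generally not equal. The output therefore has to be upsampled (stretched) to the size given by tile_size, which also brings the predictions of all scales to the same resolution so that they can be summed.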

#image.shape e.g. (1,3,768,1536) at scale 0.75, tile_size=(1024,2048)
def predict_whole(net, image, tile_size, recurrence):
    image = torch.from_numpy(image)
    interp = nn.Upsample(size=tile_size, mode='bilinear', align_corners=True)   # upsampling
    prediction = net(image.cuda())  # [x, x_dsn]
    if isinstance(prediction, list):
        prediction = prediction[0]  # x.shape (1,19,97,193) at scale 0.75; unlike the sliding-window case, h and w of the output differ
    prediction = interp(prediction).cpu().data[0].numpy().transpose(1,2,0)  # interpolate to shape (1024,2048,19)
    return prediction

main

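After the steps above we have output. Taking the argmax over the channel dimension gives the prediction seg_pred; no explicit softmax is applied here, and none is needed because it would not change the argmax. The putpalette function can then be used to colour the prediction and obtain a colour segmentation map.

More importantly, we need the segmentation metric mIoU, which is computed here from a confusion matrix (confusion_matrix). The valid regions of seg_gt and seg_pred are extracted, flattened into 1-D vectors, and passed to get_confusion_matrix.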

        seg_pred = np.asarray(np.argmax(output, axis=2), dtype=np.uint8)    # class with the highest score per pixel, shape (1024,2048)
        output_im = PILImage.fromarray(seg_pred)    # convert the array to an image
        output_im.putpalette(palette)               # colour the image
        output_im.save('outputs/'+name[0]+'.png')   # save it

        seg_gt = np.asarray(label[0].numpy()[:size[0],:size[1]], dtype=int)  # ground-truth label, shape (1024,2048); np.int was removed in NumPy 1.24

        ignore_index = seg_gt != 255    # mask of the valid label region, i.e. positions that are not 255
        seg_gt = seg_gt[ignore_index]   # keep the valid region, flattened to a 1-D vector
        seg_pred = seg_pred[ignore_index]   # same for the prediction, positions stay aligned
        # show_all(gt, output)
        confusion_matrix += get_confusion_matrix(seg_gt, seg_pred, args.num_classes)    # add this image's counts to the confusion matrix
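The masking step relies on NumPy boolean indexing, which both drops the ignored pixels and flattens the result. A minimal sketch on a made-up 2x3 label map (the arrays are illustrative data, not taken from the dataset):

```python
import numpy as np

IGNORE_LABEL = 255  # value marking unlabelled pixels, as in the snippet above

seg_gt = np.array([[0, 1, 255],
                   [2, 255, 1]])
seg_pred = np.array([[0, 2, 1],
                     [2, 0, 1]])

valid = seg_gt != IGNORE_LABEL   # boolean mask with the same shape as the label map
gt_flat = seg_gt[valid]          # boolean indexing returns a 1-D array
pred_flat = seg_pred[valid]      # same positions, so the pairs stay aligned

print(gt_flat)
print(pred_flat)
```

Only the four labelled pixels remain, in row-major order, and each ground-truth entry still lines up with its prediction.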

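To colour the prediction, each pixel of the 1024x2048 label map is assigned three values, one per RGB channel, giving 1024x2048x3. The colour table is built by get_palette.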

def get_palette(num_cls):
    """ Returns the color map for visualizing the segmentation mask.
    Args:
        num_cls: Number of classes
    Returns:
        The color map
    """

    n = num_cls
    palette = [0] * (n * 3)
    for j in range(0, n):
        lab = j
        palette[j * 3 + 0] = 0
        palette[j * 3 + 1] = 0
        palette[j * 3 + 2] = 0
        i = 0
        while lab:
            palette[j * 3 + 0] |= (((lab >> 0) & 1) << (7 - i))
            palette[j * 3 + 1] |= (((lab >> 1) & 1) << (7 - i))
            palette[j * 3 + 2] |= (((lab >> 2) & 1) << (7 - i))
            i += 1
            lab >>= 3
    return palette

get_confusion_matrix

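The confusion matrix confusion_matrix is initialised with shape 19x19. The entry in row i and column j is the number of pixels that belong to class i and were predicted as class j.

We therefore need gt_label and pred_label to determine each pixel's position in the confusion matrix.

We build a vector index = (gt_label * class_num + pred_label), which stores the 2-D position in a 1-D vector in row-major order.

For example, gt_label[0]=1 and pred_label[0]=3 give index[0]=1*19+3=22. index[0]=22 means that pixel 0 belongs to class 1 but was predicted as class 3, so confusion_matrix[1][3] is incremented by one.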

#gt_label and pred_label are both 1-D vectors
def get_confusion_matrix(gt_label, pred_label, class_num):
    """
    Calculate the confusion matrix from the given label and prediction
    :param gt_label: the ground truth label
    :param pred_label: the pred label
    :param class_num: the number of classes
    :return: the confusion matrix
    """
    index = (gt_label * class_num + pred_label).astype('int32') # store the 2-D position in a 1-D vector, row-major
    label_count = np.bincount(index)    # count each case, e.g. how many pixels of class 1 were predicted as class 2
    confusion_matrix = np.zeros((class_num, class_num)) # initialise the confusion matrix, shape (19,19)

    for i_label in range(class_num):    # 0,1,2,...,18
        for i_pred_label in range(class_num):   # 0,1,2,...,18
            cur_index = i_label * class_num + i_pred_label  # 0*19+0, 0*19+1, ..., 18*19+18; one value per (ground truth, prediction) pair
            if cur_index < len(label_count):
                confusion_matrix[i_label, i_pred_label] = label_count[cur_index]    # store the count for this pair

    return confusion_matrix
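The double loop above can be replaced by a single reshape if np.bincount is given a minlength. The following is a sketch of that variant, not the repository's code; confusion_matrix_bincount is a hypothetical name, and it assumes every label lies in [0, class_num).

```python
import numpy as np


def confusion_matrix_bincount(gt_label, pred_label, class_num):
    """Confusion matrix from one bincount call (rows: ground truth, columns: prediction)."""
    # Cast first so that small integer types such as uint8 cannot overflow.
    index = gt_label.astype(np.int64) * class_num + pred_label.astype(np.int64)
    # minlength pads the counts to class_num**2 entries so the reshape always succeeds.
    counts = np.bincount(index, minlength=class_num * class_num)
    return counts.reshape(class_num, class_num)


gt = np.array([0, 1, 2, 1])
pred = np.array([0, 2, 2, 1])
cm = confusion_matrix_bincount(gt, pred, class_num=3)
print(cm)
```

Here the pixel with ground truth 1 and prediction 2 maps to index 1*3+2=5, which lands in row 1, column 2 after the reshape.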

main

The evaluation metric for semantic segmentation, mIoU, is computed as follows.

\[MIoU=\frac {1}{k+1}\sum^k_{i=0}\frac{p_{ii}}{\sum^k_{j=0}p_{ij}+\sum^k_{j=0}p_{ji}-p_{ii}}\]

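The IoU of each class is computed and then averaged. For a single class, say i=1, \(p_{11}\) is the number of true positives, i.e. pixels that belong to class 1 and are predicted as class 1. \(\sum^k_{j=0}p_{1j}\) is the number of pixels that belong to class 1, whatever class they are predicted as (note that this includes \(p_{11}\)), and \(\sum^k_{j=0}p_{j1}\) is the number of pixels predicted as class 1, whatever class they belong to (note that this also includes \(p_{11}\)). \(p_{11}\) is therefore counted twice in the denominator, so one \(p_{11}\) has to be subtracted.

By the definition of the confusion matrix, the diagonal entries are \(p_{ii}\), the sum of row i is \(\sum^k_{j=0} p_{ij}\), and the sum of column i is \(\sum^k_{j=0} p_{ji}\), so computing mIoU from the confusion matrix is straightforward, as the code shows.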

    pos = confusion_matrix.sum(1)   # row sums: pixels per ground-truth class
    res = confusion_matrix.sum(0)   # column sums: pixels per predicted class
    tp = np.diag(confusion_matrix)  # diagonal entries: correctly classified pixels

    IU_array = (tp / np.maximum(1.0, pos + res - tp))   # per-class IoU = intersection / union, shape (19,)
    mean_IU = IU_array.mean()   # average over the classes

    # getConfusionMatrixPlot(confusion_matrix)
    print({'meanIU':mean_IU, 'IU_array':IU_array})
    with open('result.txt', 'w') as f:
        f.write(json.dumps({'meanIU':mean_IU, 'IU_array':IU_array.tolist()}))
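As a worked example, the same three lines can be applied to a small made-up 3-class confusion matrix. The numbers are illustrative only and are not results from the model.

```python
import numpy as np

# Toy 3-class confusion matrix: rows are ground truth, columns are predictions.
confusion_matrix = np.array([[5, 1, 0],
                             [2, 3, 1],
                             [0, 0, 4]], dtype=np.float64)

pos = confusion_matrix.sum(1)   # pixels per ground-truth class: [6, 6, 4]
res = confusion_matrix.sum(0)   # pixels per predicted class:    [7, 4, 5]
tp = np.diag(confusion_matrix)  # correctly classified pixels:   [5, 3, 4]

iou = tp / np.maximum(1.0, pos + res - tp)   # [5/8, 3/7, 4/5]
mean_iou = iou.mean()
print(iou, mean_iou)
```

The unions are 6+7-5=8, 6+4-3=7 and 4+5-4=5, so the per-class IoUs are 0.625, about 0.429 and 0.8, and their mean is about 0.618. The np.maximum(1.0, ...) guard only matters for a class that appears in neither the labels nor the predictions, where it avoids a division by zero and yields an IoU of 0.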