A few-shot learning paper from CVPR 2018:
Learning to Compare: Relation Network for Few-Shot Learning
Source code: https://github.com/floodsung/LearningToCompare_FSL
I ran the code on my battered laptop: Windows, PyCharm + Anaconda3 + pytorch-cpu 1.0.1.
It threw a pile of errors, summarized below.
When processing the miniImagenet dataset with proc_images.py:
Error message (from /LearningToCompare_FSL-master/datas/miniImagenet/proc_images.py): 'cp' is not recognized as an internal or external command, operable program or batch file.
The offending line is
/datas/miniImagenet/proc_images.py, line 48:
os.system('cp images/' + image_name + ' ' + cur_dir)
This 'cp' command only works in a Linux shell.
On Windows, change it to:
os.rename('images/' + image_name, cur_dir + image_name)
Beyond that, every os.system('mkdir ' + filename)
should also be changed to os.mkdir(filename), even though it may not always raise an error.
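If you would rather keep the script cross-platform instead of Windows-only, here is a minimal sketch using the standard library in place of shell commands (image_name and cur_dir mirror the variables in proc_images.py; the sample values are made up):

import os
import shutil

image_name = 'some_image.jpg'                  # illustrative value
cur_dir = os.path.join('train', 'some_class')  # illustrative value

os.makedirs(cur_dir, exist_ok=True)            # instead of os.system('mkdir ' + cur_dir)
# shutil.copy behaves the same on Windows and Linux, replacing os.system('cp ...')
shutil.copy(os.path.join('images', image_name), cur_dir)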
My torch build is CPU-only, so I deleted all the .cuda(GPU) calls, and
added map_location='cpu' to every torch.load call.
Taking miniimagenet_train_few_shot.py as an example:
Line 150:
feature_encoder.load_state_dict(torch.load(str("./models/omniglot_feature_encoder_" + str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl")))
becomes
feature_encoder.load_state_dict(torch.load(str("./models/omniglot_feature_encoder_" + str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl"), map_location='cpu'))
Line 153:
relation_network.load_state_dict(torch.load(str("./models/miniimagenet_relation_network_"+ str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl")))
becomes
relation_network.load_state_dict(torch.load(str("./models/miniimagenet_relation_network_"+ str(CLASS_NUM) +"way_" + str(SAMPLE_NUM_PER_CLASS) +"shot.pkl"), map_location='cpu'))
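A more general pattern (my own sketch, not the repo's code) is to pick the device once and pass it to torch.load, so the same script works on both CPU-only and GPU machines; the checkpoint path below is illustrative:

import torch

# Use the GPU when one is available, otherwise fall back to CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# map_location=device lets a checkpoint saved on a GPU machine load on a CPU-only one.
state_dict = torch.load("./models/example_checkpoint.pkl", map_location=device)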
Error message:
File "LearningToCompare_FSL-master/omniglot/omniglot_train_few_shot.py", line 163, in main
    task = tg.OmniglotTask(metatrain_character_folders,CLASS_NUM,SAMPLE_NUM_PER_CLASS,BATCH_NUM_PER_CLASS)
File "LearningToCompare_FSL-master\omniglot\task_generator.py", line 72, in <listcomp>
    self.train_labels = [labels[self.get_class(x)] for x in self.train_roots]
KeyError: '..\\datas\\omniglot_resized'
The real culprit is here:
task_generator.py, line 74:
def get_class(self, sample):
    return os.path.join(*sample.split('/')[:-1])
print(os.path.join(*sample.split('/')[:-1])) prints
..\datas\omniglot_resized
while labels is
{'../datas/omniglot_resized/Malay_(Jawi_-_Arabic)\\character25': 0, '../datas/omniglot_resized/Japanese_(hiragana)\\character15': 1, '…}
whereas print(os.path.join(*sample.split('\\')[:-1])) prints exactly
../datas/omniglot_resized/Malay_(Jawi_-_Arabic)\character25
Fix: change the '/' to '\\':
def get_class(self, sample):
    return os.path.join(*sample.split('\\')[:-1])
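Note that hard-coding '\\' breaks the script again on Linux. Since a sample path is just the class folder plus a file name, a separator-agnostic variant that I would expect to work on both systems (untested, not the repo's code) is:

import os

def get_class(self, sample):
    # os.path.dirname splits on either separator on Windows, so a mixed path like
    # '../datas/omniglot_resized/Foo\\character25\\img.png' maps back to
    # '../datas/omniglot_resized/Foo\\character25', matching the labels keys.
    return os.path.dirname(sample)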
Error message:
File "/LearningToCompare_FSL-master/miniimagenet/miniimagenet_train_few_shot.py", line 193, in main
    torch.zeros(BATCH_NUM_PER_CLASS * CLASS_NUM, CLASS_NUM).scatter_(1, batch_labels.view(-1, 1), 1))
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #3 'index'
Fix: add this line before it:
batch_labels = batch_labels.long()
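The cause is that scatter_ requires its index tensor to have dtype torch.long (int64), while the batch labels arrive as int32 here. A minimal sketch of the one-hot step with made-up sizes:

import torch

CLASS_NUM = 5
BATCH_NUM_PER_CLASS = 15
# Illustrative labels; in the repo they come from the task sampler (int32 on Windows).
batch_labels = torch.randint(0, CLASS_NUM, (BATCH_NUM_PER_CLASS * CLASS_NUM,), dtype=torch.int32)

batch_labels = batch_labels.long()  # the fix: scatter_ indices must be int64
one_hot_labels = torch.zeros(BATCH_NUM_PER_CLASS * CLASS_NUM, CLASS_NUM).scatter_(
    1, batch_labels.view(-1, 1), 1)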
Error message:
File "LearningToCompare_FSL-master/miniimagenet/miniimagenet_test_few_shot.py", line 247, in <listcomp>
    rewards = [1 if predict_labels[j]==test_labels[j] else 0 for j in range(batch_size)]
RuntimeError: Expected object of scalar type Long but got scalar type Int for argument #2 'other'
Fix: add these lines before it:
predict_labels = predict_labels.long()
test_labels = test_labels.long()
Both of these come down to torch tensor dtypes.
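In PyTorch 1.0, comparing an Int tensor with a Long tensor via == raises this error (newer releases handle the mixed dtypes automatically), so both sides are cast first. A minimal sketch with made-up predictions:

import torch

predict_labels = torch.tensor([1, 0, 2], dtype=torch.int32)  # e.g. the argmax of relation scores
test_labels = torch.tensor([1, 2, 2], dtype=torch.int64)

predict_labels = predict_labels.long()
test_labels = test_labels.long()
rewards = [1 if predict_labels[j] == test_labels[j] else 0 for j in range(len(test_labels))]
print(rewards)  # [1, 0, 1]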
Error message:
File "LearningToCompare_FSL-master/miniimagenet/miniimagenet_train_few_shot.py", line 212, in main
    print("episode:",episode+1,"loss",loss.data[0])
IndexError: invalid index of a 0-dim tensor. Use tensor.item() to convert a 0-dim tensor to a Python number
Do what the message says and change it to
print("episode:", episode + 1, "loss", loss.item())
Error message:
File "LearningToCompare_FSL-master\omniglot\task_generator.py", line 107, in __getitem__
    image = self.transform(image)
File "...\Anaconda3\envs\python36\lib\site-packages\torchvision\transforms\transforms.py", line 60, in __call__
    img = t(img)
File "...\Anaconda3\envs\python36\lib\site-packages\torchvision\transforms\transforms.py", line 163, in __call__
    return F.normalize(tensor, self.mean, self.std, self.inplace)
File "...\Anaconda3\envs\python36\lib\site-packages\torchvision\transforms\functional.py", line 208, in normalize
    tensor.sub_(mean[:, None, None]).div_(std[:, None, None])
RuntimeError: output with shape [1, 28, 28] doesn't match the broadcast shape [3, 28, 28]
This error appears when using the Omniglot dataset; the main cause is in:
"\omniglot\task_generator.py", line 139: def get_data_loader(task, num_per_class=1, split='train',shuffle=True,rotation=0): normalize = transforms.Normalize(mean=[0.92206, 0.92206, 0.92206], std=[0.08426, 0.08426, 0.08426]) dataset = Omniglot(task,split=split,transform=transforms.Compose([Rotate(rotation),transforms.ToTensor(),normalize]))
The Normalize from torchvision.transforms here uses 3-channel statistics, but the Omniglot images actually fed in are single-channel tensors of shape [1, 28, 28].
Fix: change
normalize = transforms.Normalize(mean=[0.92206, 0.92206, 0.92206], std=[0.08426, 0.08426, 0.08426])
to
normalize = transforms.Normalize(mean=[0.92206], std=[0.08426])
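As a quick sanity check (a standalone sketch, not the repo's pipeline), the single-channel statistics broadcast cleanly over a [1, 28, 28] tensor, while the 3-channel version raises the broadcast error above:

import torch
from torchvision import transforms

normalize = transforms.Normalize(mean=[0.92206], std=[0.08426])

fake_img = torch.rand(1, 28, 28)  # stand-in for a ToTensor()-converted Omniglot image
out = normalize(fake_img)         # works; a 3-channel mean/std would not broadcast
print(out.shape)                  # torch.Size([1, 28, 28])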
There are similar deprecation warnings as well, for example:
UserWarning : torch.nn.utils.clip_grad_norm is now deprecated in favor of torch.nn.utils.clip_grad_norm_.
Change them as the messages suggest:
torch.nn.utils.clip_grad_norm(feature_encoder.parameters(), 0.5)
becomes
torch.nn.utils.clip_grad_norm_(feature_encoder.parameters(), 0.5)
and in forward,
out = F.sigmoid(self.fc2(out))
becomes
out = torch.sigmoid(self.fc2(out))
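Both are simple API renames in newer PyTorch. A self-contained sketch of the updated calls (feature_encoder and fc2 are placeholders, not the repo's actual modules):

import torch
import torch.nn as nn

feature_encoder = nn.Linear(10, 10)  # placeholder for the repo's encoder network
fc2 = nn.Linear(10, 1)               # placeholder for the relation network's last layer

loss = feature_encoder(torch.rand(4, 10)).sum()
loss.backward()
# The trailing underscore marks the in-place, non-deprecated gradient clipping.
torch.nn.utils.clip_grad_norm_(feature_encoder.parameters(), 0.5)

out = torch.sigmoid(fc2(torch.rand(4, 10)))  # replaces the deprecated F.sigmoid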