[PyTorch] Docker shared memory problem

Problem

ERROR: Unexpected bus error encountered in worker. This might be caused by insufficient shared memory (shm)

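This error occurred when running training code inside a Docker container on a server with a batch size that was too large. There was not enough shared memory, because Docker limits the size of the container's shm (/dev/shm is 64 MB by default).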
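To confirm that the container really is short of shared memory, you can check the size of /dev/shm from inside it. The following is a minimal sketch using only the standard library; the function names shm_usage and looks_like_docker_default are hypothetical helpers, and the 64 MB threshold is simply Docker's default shm size.

```python
import os
import shutil
from typing import Optional, Tuple

# Docker's default --shm-size is 64 MB.
DOCKER_DEFAULT_SHM = 64 * 1024 * 1024


def shm_usage(path: str = "/dev/shm") -> Optional[Tuple[int, int]]:
    """Return (total_bytes, free_bytes) of the mount at `path`, or None if it does not exist."""
    if not os.path.isdir(path):
        return None
    usage = shutil.disk_usage(path)
    return usage.total, usage.free


def looks_like_docker_default(total_bytes: int) -> bool:
    """Heuristic (hypothetical helper): a shm of at most 64 MB suggests the
    container was started without --shm-size or --ipc=host."""
    return total_bytes <= DOCKER_DEFAULT_SHM


if __name__ == "__main__":
    info = shm_usage()
    if info is None:
        print("/dev/shm not found (probably not Linux)")
    else:
        total, free = info
        print(f"/dev/shm: total={total / 2**20:.0f} MiB, free={free / 2**20:.0f} MiB")
        if looks_like_docker_default(total):
            print("shm is at Docker's 64 MB default; multi-worker DataLoaders may hit a bus error")
```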

According to the PyTorch README:

Please note that PyTorch uses shared memory to share data between processes, so if torch multiprocessing is used (e.g. for multithreaded data loaders) the default shared memory segment size that container runs with is not enough, and you should increase shared memory size either with --ipc=host or --shm-size command line options to nvidia-docker run.
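This also explains why a large batch size triggers the error. Batches produced by DataLoader worker processes reach the main process through shared memory, and per the DataLoader documentation each worker keeps up to prefetch_factor batches (default 2) loaded in advance. The sketch below is only a rough estimate under that assumption: it ignores labels, collate overhead and timing, and the helper estimate_shm_bytes and the 64 x 3 x 416 x 416 float32 batch are hypothetical examples.

```python
import math
from typing import Sequence


def estimate_shm_bytes(batch_shape: Sequence[int], bytes_per_element: int = 4,
                       num_workers: int = 4, prefetch_factor: int = 2) -> int:
    """Rough estimate of shared memory held by batches in flight.

    Assumes num_workers * prefetch_factor batches are prefetched at once and
    that each one lives in shared memory. With num_workers=0, loading happens
    in the main process, so the estimate is 0.
    """
    batch_bytes = math.prod(batch_shape) * bytes_per_element
    return num_workers * prefetch_factor * batch_bytes


if __name__ == "__main__":
    # Hypothetical batch: 64 RGB images of 416x416, stored as float32 (4 bytes each)
    need = estimate_shm_bytes((64, 3, 416, 416))
    print(f"{need / 2**20:.0f} MiB")  # 1014 MiB, far above Docker's 64 MB default
```

Doubling the batch size doubles the estimate, and setting num_workers=0 drops it to zero, which matches the solutions below.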

Solutions

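1. PyTorch's IPC relies on shared memory, so the shared memory segment must be large enough. You can enlarge it with docker run --shm-size.
2. Run the container with --ipc=host, so it shares the host's IPC namespace and shared memory.
3. Set the DataLoader's num_workers to 0 so data is loaded in the main process. Training will be slower, though.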
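Options 1 and 2 are both flags passed to docker run. The sketch below builds the command line for either one. The helper docker_run_cmd, the image name my-training-image and the script train.py are hypothetical. Only the --shm-size and --ipc=host flags come from the text above. With --ipc=host the container uses the host's shared memory, so this sketch treats the two flags as alternatives and rejects both at once.

```python
import shlex
from typing import List, Optional, Sequence


def docker_run_cmd(image: str, shm_size: Optional[str] = None, ipc_host: bool = False,
                   command: Sequence[str] = ()) -> List[str]:
    """Build a `docker run` argv using either --shm-size (option 1) or --ipc=host (option 2)."""
    if shm_size and ipc_host:
        raise ValueError("use either --shm-size or --ipc=host, not both")
    cmd = ["docker", "run"]
    if ipc_host:
        cmd.append("--ipc=host")  # option 2: share the host's IPC namespace
    elif shm_size:
        cmd.append(f"--shm-size={shm_size}")  # option 1: enlarge the container's /dev/shm
    cmd.append(image)
    cmd.extend(command)
    return cmd


if __name__ == "__main__":
    cmd = docker_run_cmd("my-training-image", shm_size="8g", command=["python", "train.py"])
    print(shlex.join(cmd))  # docker run --shm-size=8g my-training-image python train.py
```

The list can be passed directly to subprocess.run(cmd, check=True). For option 3, no container change is needed: pass num_workers=0 to the DataLoader(...) constructor in the training script.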

yolov3 issue #283

PyTorch On K8S: pinpointing the shared memory problem

12 pitfalls in PyTorch
