Building a custom CUDA 10 + PyTorch 1.5 base image on Ubuntu 16.04

Key point: use the official scripts and build the image directly on a server outside China; the build is fast and trouble-free.

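Notes:

1. Because the network inside China is unstable, building the image from the official Dockerfile often fails to download some packages, which leaves the resulting image incomplete.
2. I started a server outside China and built the image directly from NVIDIA's official CUDA Dockerfile; the build was fast and went smoothly.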

Reference: the official Dockerfile

https://gitlab.com/nvidia/container-images/cuda/-/tree/master/dist/10.2/ubuntu16.04-x86_64

The following is a record of the pitfalls I hit along the way; it may be useful for troubleshooting.

1. Build the base image

FROM ubuntu:16.04
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

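# Added options to apt-get update that request uncached files from the apt server, which prevents the GPG error that blocks deb package downloads
# Compared with the official Dockerfile, one extra option is added to skip package authentication
# Changed nvidia.com to nvidia.cn
# Changed the apt mirror to USTC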
# printf writes each quoted entry on its own line of sources.list
RUN printf '%s\n' \
       "deb http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse" \
       "deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse" \
       "deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse" \
       "deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse" \
       "deb http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse" \
       "deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial main restricted universe multiverse" \
       "deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-security main restricted universe multiverse" \
       "deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-updates main restricted universe multiverse" \
       "deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-proposed main restricted universe multiverse" \
       "deb-src http://mirrors.ustc.edu.cn/ubuntu/ xenial-backports main restricted universe multiverse" \
       > /etc/apt/sources.list \
    && apt-get -o Acquire::https::No-Cache=True -o Acquire::http::No-Cache=True update \
    && apt-get install -y --no-install-recommends --allow-unauthenticated \
    ca-certificates apt-transport-https gnupg-curl && \
    NVIDIA_GPGKEY_SUM=d1be581509378368edeec8c1eb2958702feedf3bc3d17011adbf24efacce4ab5 && \
    NVIDIA_GPGKEY_FPR=ae09fe4bbd223a84b2ccfce3f60f4b3d7fa2af80 && \
    apt-key adv --fetch-keys https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1604/x86_64/7fa2af80.pub && \
    apt-key adv --export --no-emit-version -a $NVIDIA_GPGKEY_FPR | tail -n +5 > cudasign.pub && \
    echo "$NVIDIA_GPGKEY_SUM  cudasign.pub" | sha256sum -c --strict - && rm cudasign.pub && \
    echo "deb https://developer.download.nvidia.cn/compute/cuda/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/cuda.list && \
    echo "deb https://developer.download.nvidia.cn/compute/machine-learning/repos/ubuntu1604/x86_64 /" > /etc/apt/sources.list.d/nvidia-ml.list && \
    apt-get purge --auto-remove -y gnupg-curl \
    && rm -rf /var/lib/apt/lists/*

ENV CUDA_VERSION 10.2.89
ENV CUDA_PKG_VERSION 10-2=$CUDA_VERSION-1

# For libraries in the cuda-compat-* package: https://docs.nvidia.com/cuda/eula/index.html#attachment-a
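# Added options to apt-get update that request uncached files from the apt server, which prevents the GPG error that blocks deb package downloads
# Compared with the official Dockerfile, one extra option is added to skip package authentication
RUN apt-get -o Acquire::https::No-Cache=True -o Acquire::http::No-Cache=True update \
    && apt-get install -y --no-install-recommends --allow-unauthenticated \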
    cuda-cudart-$CUDA_PKG_VERSION \
    cuda-compat-10-2 \
    && ln -s cuda-10.2 /usr/local/cuda && \
    rm -rf /var/lib/apt/lists/*

# Required for nvidia-docker v1
RUN echo "/usr/local/nvidia/lib" >> /etc/ld.so.conf.d/nvidia.conf && \
    echo "/usr/local/nvidia/lib64" >> /etc/ld.so.conf.d/nvidia.conf

ENV PATH /usr/local/nvidia/bin:/usr/local/cuda/bin:${PATH}
ENV LD_LIBRARY_PATH /usr/local/nvidia/lib:/usr/local/nvidia/lib64

# nvidia-container-runtime
ENV NVIDIA_VISIBLE_DEVICES all
ENV NVIDIA_DRIVER_CAPABILITIES compute,utility
ENV NVIDIA_REQUIRE_CUDA "cuda>=10.2 brand=tesla,driver>=396,driver<397 brand=tesla,driver>=410,driver<411 brand=tesla,driver>=418,driver<419 brand=tesla,driver>=440,driver<441"
# Build command; the version + tag naming follows the official Dockerfile convention, which keeps tags easy to recognize
docker build . -t cuda10.2:10.2-base-ubuntu16.04
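
The tag above follows a `<CUDA version>-<flavor>-<OS>` pattern, and the later steps reuse it with a different flavor. A minimal sketch of composing such references, assuming a hypothetical helper named `image_ref` (it is not part of Docker or of the NVIDIA scripts):

```shell
#!/bin/sh
# Hypothetical helper: compose "<repo>:<cuda>-<flavor>-<os>" image references.
image_ref() {
    repo=$1; cuda=$2; flavor=$3; os=$4
    printf '%s:%s-%s-%s\n' "$repo" "$cuda" "$flavor" "$os"
}

# Print the references for the flavors tagged in this walkthrough.
for flavor in base runtime runtime-cudnn7; do
    image_ref cuda10.2 10.2 "$flavor" ubuntu16.04
done
```

Keeping the repository name and the tag pattern in one place makes it harder for the `FROM ${IMAGE_NAME}:...` lines in the later Dockerfiles to drift away from the tags that were actually built.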

2.1 Build the runtime image from the base image

ARG IMAGE_NAME
FROM ${IMAGE_NAME}:10.2-base-ubuntu16.04

LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

ENV NCCL_VERSION 2.7.8

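# Added options to apt-get update that request uncached files from the apt server, which prevents the GPG error that blocks deb package downloads
# Compared with the official Dockerfile, one extra option is added to skip package authentication
RUN apt-get -o Acquire::https::No-Cache=True -o Acquire::http::No-Cache=True update \
    && apt-get install -y --no-install-recommends --allow-unauthenticated \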
    cuda-libraries-$CUDA_PKG_VERSION \
    cuda-npp-$CUDA_PKG_VERSION \
    cuda-nvtx-$CUDA_PKG_VERSION \
    libcublas10=10.2.2.89-1 \
    libnccl2=$NCCL_VERSION-1+cuda10.2 \
    && apt-mark hold libnccl2 \
    && rm -rf /var/lib/apt/lists/*
# The ARG value must be passed in here; CUDA_PKG_VERSION is already set by ENV in the base image, so it needs no build argument
docker build . -t cuda10.2:10.2-runtime-ubuntu16.04 --build-arg IMAGE_NAME=cuda10.2
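
When more than one value has to reach `ARG`, each one needs its own `--build-arg` flag; a `;` on the command line ends the `docker build` command rather than adding a second argument. A minimal sketch that only prints the command, assuming a hypothetical helper `build_cmd` (the tag `t` and the keys `A` and `B` in the usage note are placeholders, not values from this build):

```shell
#!/bin/sh
# Hypothetical helper: print a docker build command with one
# --build-arg flag per KEY=VALUE pair. Nothing is executed.
build_cmd() {
    tag=$1; shift
    cmd="docker build . -t $tag"
    for kv in "$@"; do
        cmd="$cmd --build-arg $kv"
    done
    printf '%s\n' "$cmd"
}

# The runtime build from this step needs only IMAGE_NAME.
build_cmd cuda10.2:10.2-runtime-ubuntu16.04 IMAGE_NAME=cuda10.2
```

Calling `build_cmd t A=1 B=2` prints a command with two separate `--build-arg` flags. The helper does not quote values, so it is only suitable for values without spaces.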

2.2 Build the cuDNN 7 image from the runtime image

ARG IMAGE_NAME
FROM ${IMAGE_NAME}:10.2-runtime-ubuntu16.04
LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

ENV CUDNN_VERSION 7.6.5.32

LABEL com.nvidia.cudnn.version="${CUDNN_VERSION}"

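# Added options to apt-get update that request uncached files from the apt server, which prevents the GPG error that blocks deb package downloads
# Compared with the official Dockerfile, one extra option is added to skip package authentication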
RUN apt-get -o Acquire::https::No-Cache=True -o Acquire::http::No-Cache=True update   \
    && apt-get install -y --no-install-recommends --allow-unauthenticated \
    libcudnn7=$CUDNN_VERSION-1+cuda10.2 \
    && apt-mark hold libcudnn7 && \
    rm -rf /var/lib/apt/lists/*
# The ARG value must be passed in here
docker build . -t cuda10.2:10.2-runtime-cudnn7-ubuntu16.04 --build-arg IMAGE_NAME=cuda10.2
# GPG error that appears when the extra -o options are not passed to apt-get update
Reading package lists...
W: GPG error: https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release: The following signatures were invalid: BADSIG F60F4B3D7FA2AF80 cudatools <cudatools@nvidia.com>
W: The repository 'https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64  Release' is not signed.
E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages  Writing more data than expected (1580267 > 1579913)
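
The two symptoms in this log, a `BADSIG` signature failure and a `Writing more data than expected` size mismatch, are what the No-Cache options are meant to avoid. A minimal sketch for spotting them in a saved build log, assuming a hypothetical helper `has_stale_index_error` that reads the log on stdin:

```shell
#!/bin/sh
# Hypothetical helper: succeed if the log on stdin contains either of the
# two symptoms quoted above (bad Release signature or index size mismatch).
has_stale_index_error() {
    grep -Eq 'BADSIG|Writing more data than expected'
}

# One line copied from the error output above.
sample='E: Failed to fetch https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1604/x86_64/Packages  Writing more data than expected (1580267 > 1579913)'

if printf '%s\n' "$sample" | has_stale_index_error; then
    echo "stale index suspected: rerun apt-get update with the No-Cache options"
fi
```

The pattern list is an assumption based only on the messages shown here; other apt failures would need their own patterns.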

3.1 Build the devel image from the runtime image

ARG IMAGE_NAME
FROM ${IMAGE_NAME}:10.2-runtime-ubuntu16.04

LABEL maintainer "NVIDIA CORPORATION <cudatools@nvidia.com>"

ENV NCCL_VERSION 2.7.8
# Added four dependencies: cuda-cupti-dev, cuda-cupti, cuda-nvcc, cuda-compiler
RUN apt-get update && apt-get install -y --no-install-recommends \
    libbz2-1.0 cpp-5 gcc-5-base libstdc++-5-dev libcc1-0 libgcc-5-dev \
    libc6 cpp libdpkg-perl bzip2 g++-5 gcc-5 \
    libc6-dev gcc g++ dpkg-dev build-essential \
    cuda-nvcc-$CUDA_PKG_VERSION \
    cuda-cupti-$CUDA_PKG_VERSION \
    cuda-cupti-dev-$CUDA_PKG_VERSION \
    cuda-compiler-$CUDA_PKG_VERSION \
    cuda-nvml-dev-$CUDA_PKG_VERSION \
    cuda-command-line-tools-$CUDA_PKG_VERSION \
    cuda-nvprof-$CUDA_PKG_VERSION \
    cuda-npp-dev-$CUDA_PKG_VERSION \
    cuda-libraries-dev-$CUDA_PKG_VERSION \
    cuda-minimal-build-$CUDA_PKG_VERSION \
    libcublas-dev=10.2.2.89-1 \
    libnccl-dev=2.7.8-1+cuda10.2 \
    && apt-mark hold libnccl-dev \
    && rm -rf /var/lib/apt/lists/*

ENV LIBRARY_PATH /usr/local/cuda/lib64/stubs
# Error encountered during the build
The following packages have unmet dependencies:
 cuda-command-line-tools-10-2 : Depends: cuda-cupti-dev-10-2 (>= 10.2.89) but it is not going to be installed
 cuda-minimal-build-10-2 : Depends: cuda-compiler-10-2 (>= 10.2.89) but it is not going to be installed
 cuda-compiler-10-2 : Depends: cuda-nvcc-10-2 (>= 10.2.89) but it is not going to be installed
 cuda-cupti-dev-10-2 : Depends: cuda-cupti-10-2 (>= 10.2.89) but it is not going to be installed
 cuda-nvcc-10-2 : Depends: build-essential but it is not going to be installed
 build-essential : Depends: libc6-dev but it is not going to be installed or
                            libc-dev
                   Depends: gcc (>= 4:5.2) but it is not going to be installed
                   Depends: g++ (>= 4:5.2) but it is not going to be installed
                   Depends: dpkg-dev (>= 1.17.11) but it is not going to be installed
dpkg-dev : Depends: libdpkg-perl (= 1.18.4ubuntu1) but it is not going to be installed
            Depends: bzip2 but it is not going to be installed
 g++ : Depends: cpp (>= 4:5.3.1-1ubuntu1) but it is not going to be installed
       Depends: g++-5 (>= 5.3.1-3~) but it is not going to be installed
       Depends: gcc-5 (>= 5.3.1-3~) but it is not going to be installed
 gcc : Depends: cpp (>= 4:5.3.1-1ubuntu1) but it is not going to be installed
       Depends: gcc-5 (>= 5.3.1-3~) but it is not going to be installed
 libc6-dev : Depends: libc6 (= 2.23-0ubuntu3) but 2.23-0ubuntu11.2 is to be installed
 bzip2 : Depends: libbz2-1.0 (= 1.0.6-8) but 1.0.6-8ubuntu0.2 is to be installed
 cpp : Depends: cpp-5 (>= 5.3.1-3~) but it is not going to be installed
 g++-5 : Depends: gcc-5-base (= 5.3.1-14ubuntu2) but 5.4.0-6ubuntu1~16.04.12 is to be installed
         Depends: libstdc++-5-dev (= 5.3.1-14ubuntu2) but it is not going to be installed
 gcc-5 : Depends: cpp-5 (= 5.3.1-14ubuntu2) but it is not going to be installed
         Depends: gcc-5-base (= 5.3.1-14ubuntu2) but 5.4.0-6ubuntu1~16.04.12 is to be installed
         Depends: libcc1-0 (>= 5.3.1-14ubuntu2) but it is not going to be installed
         Depends: libgcc-5-dev (= 5.3.1-14ubuntu2) but it is not going to be installed
 libc6-dev : Depends: libc6 (= 2.23-0ubuntu3) but 2.23-0ubuntu11.2 is to be installed
 libdpkg-perl : Depends: perl but it is not going to be installed
E: Unable to correct problems, you have held broken packages.
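
The most telling lines in this output are the ones where a package depends on an exact version (`= ...`) while a different version "is to be installed". That pattern usually means the package index apt is reading is older than the libraries already in the image, for example when the `xenial-updates` and `xenial-security` entries are not in effect. A minimal sketch for pulling those lines out of the output, assuming a hypothetical helper `pinned_mismatches` that reads apt output on stdin:

```shell
#!/bin/sh
# Hypothetical helper: list dependencies that are pinned to an exact version
# which differs from the version apt has selected. Duplicates are removed.
pinned_mismatches() {
    sed -n 's/.*Depends: \([^ ]*\) (= \([^)]*\)) but \([^ ]*\) is to be installed.*/\1 wanted=\2 selected=\3/p' | sort -u
}

# Three lines copied from the error output above; the last one has no
# exact-version pin and is therefore skipped.
pinned_mismatches <<'EOF'
 libc6-dev : Depends: libc6 (= 2.23-0ubuntu3) but 2.23-0ubuntu11.2 is to be installed
 bzip2 : Depends: libbz2-1.0 (= 1.0.6-8) but 1.0.6-8ubuntu0.2 is to be installed
 cpp : Depends: cpp-5 (>= 5.3.1-3~) but it is not going to be installed
EOF
```

If the list is non-empty, a reasonable next step is to check inside the container with `apt-cache policy libc6` whether the newer version comes from an index that the `-dev` package cannot see.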