Compile OpenCV from source with CUDA support on Windows 10

Date: 2019-11-15


This article was first published on my blog (kezunlin.me/post/658069…). Happy reading!

compile opencv with CUDA support on windows 10

Series

Guide

requirements:

windows: 10
opencv: 3.1.0
nvidia driver: gtx 1060 382.05 (gtx 970m)
GPU arch(s): sm_61 (sm_52)
cuda: 8.0
cudnn: 5.0.5
cmake: 3.10.0
vs: vs2015 64

nvidia cuda CC

see CUDA compute capability

The laptop version of a graphics card and its desktop counterpart differ in compute capability.

cpu vs gpu

for opencv functions

get source

Get OpenCV 3.1.0 from git and fix some bugs

    git clone https://github.com/opencv/opencv.git
    cd opencv
    git checkout -b v3.1.0 3.1.0

    # fix bugs for 3.1.0
    git cherry-pick 10896
    git cherry-pick cdb9c
    git cherry-pick 24dbb

    git branch

      master
    * v3.1.0

compile

    mkdir build && cd build && cmake-gui ..

config

configure with VS 2015 win64 with options

    BUILD_SHARED_LIBS  ON
    CMAKE_CONFIGURATION_TYPES Release
    CMAKE_CXX_FLAGS_RELEASE /MD /O2 /Ob2 /DNDEBUG /MP # /MP: compile with multiple processes

    WITH_VTK OFF
    BUILD_PERF_TESTS OFF # if ON, build errors occur

    WITH_CUDA ON
    CUDA_TOOLKIT_ROOT_DIR  C:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0
    #CUDA_ARCH_BIN  3.0 3.5 5.0 5.2 6.0 6.1 # very time-consuming
    CUDA_ARCH_PTX 3.0

for opencv

CUDA_ARCH_BIN 3.0 3.5 5.0 5.2 6.0 6.1 corresponds to

    -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;

CUDA_ARCH_PTX 3.0 corresponds to

    -gencode;arch=compute_30,code=compute_30;
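
The correspondence between the CUDA_ARCH_BIN / CUDA_ARCH_PTX values and the -gencode entries above can be sketched in a few lines of C++. This is only an illustration of the pattern visible in those flags, not OpenCV's actual CMake logic; the helper names strip_dot and gencode_flags are made up for this example.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical helper: turn a compute capability such as "6.1" into "61".
std::string strip_dot(const std::string& cc) {
    assert(!cc.empty());
    std::string out;
    for (char c : cc) {
        if (c != '.') out += c;
    }
    return out;
}

// Build the semicolon-separated flag list in the same shape as the CMake output.
// Each arch_bin entry requests machine code (code=sm_XY);
// each arch_ptx entry requests PTX (code=compute_XY).
std::string gencode_flags(const std::vector<std::string>& arch_bin,
                          const std::vector<std::string>& arch_ptx) {
    std::string flags;
    for (const std::string& cc : arch_bin) {
        const std::string n = strip_dot(cc);
        flags += "-gencode;arch=compute_" + n + ",code=sm_" + n + ";";
    }
    for (const std::string& cc : arch_ptx) {
        const std::string n = strip_dot(cc);
        flags += "-gencode;arch=compute_" + n + ",code=compute_" + n + ";";
    }
    return flags;
}
```

Calling gencode_flags({"3.0", "3.5", "5.0", "5.2", "6.0", "6.1"}, {"3.0"}) yields the six sm_XY entries followed by the single compute_30 entry, which is why every extra value in CUDA_ARCH_BIN adds another full compilation pass for each CUDA source file.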

for caffe

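The CUDA_ARCH_BIN parameter lists multiple architectures so that the build supports a variety of GPU boards; otherwise, the CUDA programs will not run on other types of GPU boards.

To let the executables run on GPUs with different compute capabilities, OpenCV/Caffe must be compiled for several architectures, e.g. CUDA_ARCH_BIN 3.0 3.5 5.0 5.2 6.0 6.1, which makes the build very time-consuming. When configuring the build, select as far as possible only the GPU architectures that the release version needs to target.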

configure and output:

Selecting Windows SDK version 10.0.14393.0 to target Windows 10.0.17134.
    found IPP (ICV version): 9.0.1 [9.0.1]
    at: C:/compile/opencv/3rdparty/ippicv/unpack/ippicv_win
    CUDA detected: 8.0
    CUDA NVCC target flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_30,code=compute_30
    Could NOT find Doxygen (missing: DOXYGEN_EXECUTABLE) 
    To enable PlantUML support, set PLANTUML_JAR environment variable or pass -DPLANTUML_JAR=<filepath> option to cmake
    Could NOT find PythonInterp: Found unsuitable version "1.4", but required is at least "3.4" (found C:/Users/zunli/.babun/cygwin/bin/python)
    Could NOT find PythonInterp: Found unsuitable version "1.4", but required is at least "3.2" (found C:/Users/zunli/.babun/cygwin/bin/python)
    Could NOT find Matlab (missing: MATLAB_MEX_SCRIPT MATLAB_INCLUDE_DIRS MATLAB_ROOT_DIR MATLAB_LIBRARIES MATLAB_LIBRARY_DIRS MATLAB_MEXEXT MATLAB_ARCH MATLAB_BIN)

General configuration for OpenCV 3.1.0 =====================================
      Version control:               3.1.0-3-g5e9beb8

Platform:
        Host:                        Windows 10.0.17134 AMD64
        CMake:                       3.10.0
        CMake generator:             Visual Studio 14 2015 Win64
        CMake build tool:            C:/Program Files (x86)/MSBuild/14.0/bin/MSBuild.exe
        MSVC:                        1900

C/C++:
        Built as dynamic libs?:      YES
        C++ Compiler:                C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe  (ver 19.0.24215.1)
        C++ flags (Release):         /DWIN32 /D_WINDOWS /W4 /GR /EHa  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /wd4251 /wd4324 /wd4275 /wd4589 /MP8  /MD /O2 /Ob2 /DNDEBUG /MP  /Zi
        C++ flags (Debug):           /DWIN32 /D_WINDOWS /W4 /GR /EHa  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi  /wd4251 /wd4324 /wd4275 /wd4589 /MP8  /MDd /Zi /Ob0 /Od /RTC1 
        C Compiler:                  C:/Program Files (x86)/Microsoft Visual Studio 14.0/VC/bin/x86_amd64/cl.exe
        C flags (Release):           /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi    /MP8  /MD /O2 /Ob2 /DNDEBUG  /Zi
        C flags (Debug):             /DWIN32 /D_WINDOWS /W3  /D _CRT_SECURE_NO_DEPRECATE /D _CRT_NONSTDC_NO_DEPRECATE /D _SCL_SECURE_NO_WARNINGS /Gy /bigobj /Oi    /MP8  /MDd /Zi /Ob0 /Od /RTC1 
        Linker flags (Release):      /machine:x64  /INCREMENTAL:NO  /debug
        Linker flags (Debug):        /machine:x64  /debug /INCREMENTAL 
        Precompiled headers:         YES
        Extra dependencies:          comctl32 gdi32 ole32 setupapi ws2_32 vfw32 cudart nppc nppi npps cufft -LC:/Program Files/NVIDIA GPU Computing Toolkit/CUDA/v8.0/lib/x64
        3rdparty dependencies:       zlib libjpeg libwebp libpng libtiff libjasper IlmImf

OpenCV modules:
        To be built:                 cudev core cudaarithm flann imgproc ml video cudabgsegm cudafilters cudaimgproc cudawarping imgcodecs photo shape videoio cudacodec highgui objdetect ts features2d calib3d cudafeatures2d cudalegacy cudaobjdetect cudaoptflow cudastereo stitching superres videostab python2
        Disabled:                    world
        Disabled by dependency:      -
        Unavailable:                 java python3 viz

Windows RT support:            NO

GUI: 
        QT:                          NO
        Win32 UI:                    YES
        OpenGL support:              NO
        VTK support:                 NO

Media I/O: 
        ZLib:                        build (ver 1.2.8)
        JPEG:                        build (ver 90)
        WEBP:                        build (ver 0.3.1)
        PNG:                         build (ver 1.6.19)
        TIFF:                        build (ver 42 - 4.0.2)
        JPEG 2000:                   build (ver 1.900.1)
        OpenEXR:                     build (ver 1.7.1)
        GDAL:                        NO

Video I/O:
        Video for Windows:           YES
        DC1394 1.x:                  NO
        DC1394 2.x:                  NO
        FFMPEG:                      YES (prebuilt binaries)
          codec:                     YES (ver 56.41.100)
          format:                    YES (ver 56.36.101)
          util:                      YES (ver 54.27.100)
          swscale:                   YES (ver 3.1.101)
          resample:                  NO
          gentoo-style:              YES
        GStreamer:                   NO
        OpenNI:                      NO
        OpenNI PrimeSensor Modules:  NO
        OpenNI2:                     NO
        PvAPI:                       NO
        GigEVisionSDK:               NO
        DirectShow:                  YES
        Media Foundation:            NO
        XIMEA:                       NO
        Intel PerC:                  NO

Parallel framework:            Concurrency

Other third-party libraries:
        Use IPP:                     9.0.1 [9.0.1]
             at:                     C:/compile/opencv/3rdparty/ippicv/unpack/ippicv_win
        Use IPP Async:               NO
        Use Eigen:                   NO
        Use Cuda:                    YES (ver 8.0)
        Use OpenCL:                  YES
        Use custom HAL:              NO

NVIDIA CUDA
        Use CUFFT:                   YES
        Use CUBLAS:                  NO
        USE NVCUVID:                 NO
        NVIDIA GPU arch:             30 35 50 52 60 61
        NVIDIA PTX archs:            30
        Use fast math:               NO

OpenCL:
        Version:                     dynamic
        Include path:                C:/compile/opencv/3rdparty/include/opencl/1.2
        Use AMDFFT:                  NO
        Use AMDBLAS:                 NO

Python 2:
        Interpreter:                 C:/Python27/python.exe (ver 2.7.13)
        Libraries:                   C:/Python27/libs/python27.lib (ver 2.7.13)
        numpy:                       C:/Python27/lib/site-packages/numpy/core/include (ver 1.11.3)
        packages path:               C:/Python27/Lib/site-packages

Python 3:
        Interpreter:                 NO

Python (for build):            C:/Python27/python.exe

Java:
        ant:                         NO
        JNI:                         C:/Program Files/Java/jdk1.8.0_161/include C:/Program Files/Java/jdk1.8.0_161/include/win32 C:/Program Files/Java/jdk1.8.0_161/include
        Java wrappers:               NO
        Java tests:                  NO

Matlab:                        Matlab not found or implicitly disabled

Documentation:
        Doxygen:                     NO
        PlantUML:                    NO

Tests and samples:
        Tests:                       YES
        Performance tests:           NO
        C/C++ Examples:              NO

Install path:                  C:/compile/opencv/build/install

cvconfig.h is in:              C:/compile/opencv/build
    -----------------------------------------------------------------

Configuring done
Generating done

Notice for gencode

CUDA NVCC target flags: -gencode;arch=compute_30,code=sm_30;-gencode;arch=compute_35,code=sm_35;-gencode;arch=compute_50,code=sm_50;-gencode;arch=compute_52,code=sm_52;-gencode;arch=compute_60,code=sm_60;-gencode;arch=compute_61,code=sm_61;-gencode;arch=compute_30,code=compute_30

build

Open OpenCV.sln with VS 2015 and build the Release configuration.

This may take hours to finish.

errors

possible solutions

With BUILD_PERF_TESTS and BUILD_TESTS disabled, I managed to build OpenCV 3.1 with CUDA 8.0 on Windows 10 with VS2015 x64 arch target. Without building test/performance modules, the build process costs less time as well : )

I actually got it to work both on my laptop and my desktop (GTX960M and GTX970 respectively) running with OpenCV 3.2 and the latest version of CUDA 8.0 for Win10 in Visual Studio 15 Community! What I did was to enable WITH_CUBLAS as well as WITH_CUDA. I also turned off BUILD_PERF_TESTS and BUILD_TESTS. The configuration was built using the Visual Studio 14 2015 C++ compiler.

my solution:

    disable `BUILD_PERF_TESTS`

Configure and build again. This time the build took only about 1 minute.

Build results after the error is fixed

demo

cuda-module

OpenCV GPU module is written using CUDA, therefore it benefits from the CUDA ecosystem.

The GPU module includes the class cv::cuda::GpuMat, which is the primary container for data kept in GPU memory. Its interface is very similar to that of cv::Mat, its CPU counterpart. All GPU functions receive GpuMat as input and output arguments. This allows several GPU algorithms to be invoked without downloading data. The GPU module API is also kept similar to the CPU interface where possible, so developers who are familiar with OpenCV on the CPU can start using the GPU straight away.

The GPU module is designed as a host-level API. This means that if you have pre-compiled OpenCV GPU binaries, you are not required to have the CUDA Toolkit installed or write any extra code to make use of the GPU.

CMakeLists.txt

find_package(OpenCV REQUIRED COMPONENTS core highgui imgproc features2d calib3d
    cudaarithm cudabgsegm cudafilters cudaimgproc cudawarping cudafeatures2d # for cuda-enabled
)
message("[Main] OpenCV_INCLUDE_DIRS = ${OpenCV_INCLUDE_DIRS}")
message("[Main] OpenCV_LIBS = ${OpenCV_LIBS}")

demo.cpp

In the sample below an image is loaded from local file, next it is uploaded to GPU, thresholded, downloaded and displayed.

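#include <iostream>

#include <opencv2/imgcodecs.hpp>      // cv::imread
#include <opencv2/imgproc.hpp>        // cv::THRESH_BINARY
#include <opencv2/highgui.hpp>        // cv::imshow, cv::waitKey
#include <opencv2/cudaarithm.hpp>     // cv::cuda::threshold
#include <opencv2/cudabgsegm.hpp>
#include <opencv2/cudafilters.hpp>
#include <opencv2/cudaimgproc.hpp>
#include <opencv2/cudawarping.hpp>
#include <opencv2/cudafeatures2d.hpp>

int test_opencv_gpu()
{
    try
    {
        // Load the image on the host (CPU) as a single-channel grayscale image.
        cv::Mat src_host = cv::imread("file.png", cv::IMREAD_GRAYSCALE);
        if (src_host.empty())
        {
            std::cout << "Error: could not read file.png" << std::endl;
            return -1;
        }

        // Upload to GPU memory, threshold on the GPU, then download the result.
        cv::cuda::GpuMat dst, src;
        src.upload(src_host);

        cv::cuda::threshold(src, dst, 128.0, 255.0, cv::THRESH_BINARY);

        cv::Mat result_host;
        dst.download(result_host);

        cv::imshow("Result", result_host);
        cv::waitKey();
    }
    catch (const cv::Exception& ex)
    {
        std::cout << "Error: " << ex.what() << std::endl;
        return -1;
    }
    return 0;
}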

cpu vs gpu time cost

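(1) For ORB feature matching between images whose resolution is not especially large, the CPU version runs faster than the GPU version (because uploading the images to the GPU takes time).
(2) For higher-resolution images, or on machines whose GPU is stronger than the CPU (such as the Nvidia Jetson series), the GPU version of ORB is more efficient than the CPU version.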

problems

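(1) With the CUDA build of OpenCV, creating the first Caffe network is very slow; subsequent networks are created very quickly.

~~(2) OpenCV's GPU code is slower than the CPU code, and the first launch takes about 20 s longer~~. (This was actually caused by a mismatch between the compute capability Caffe was compiled for and that of the GPU.)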

reasons

Your problem is that CUDA needs to initialize! And it will generally take several seconds.

Why first function call is slow?

That is because of initialization overheads. On first GPU function call Cuda Runtime API is initialized implicitly.

The first GPU function call always takes more time, because CUDA initializes the context for the device.

The following calls will be faster.
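
Because the one-time initialization lands on the first call, a fair timing comparison keeps that call separate from the later ones. The sketch below shows one way to do this with the standard library only; time_ms, Timing and measure_with_warmup are hypothetical names for this example, and the operation being timed is passed in as a callable so the same harness could wrap a GPU call such as the cv::cuda::threshold line from demo.cpp.

```cpp
#include <cassert>
#include <chrono>
#include <functional>
#include <vector>

// Time a single invocation of op, in milliseconds.
double time_ms(const std::function<void()>& op) {
    const auto t0 = std::chrono::steady_clock::now();
    op();
    const auto t1 = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}

struct Timing {
    double warmup_ms;               // first call, includes any one-time initialization
    std::vector<double> steady_ms;  // one entry per later call
};

// Run op once as a warm-up, then `runs` more times, timing each call separately.
Timing measure_with_warmup(const std::function<void()>& op, int runs) {
    assert(runs >= 0);
    Timing t;
    t.warmup_ms = time_ms(op);
    for (int i = 0; i < runs; ++i) {
        t.steady_ms.push_back(time_ms(op));
    }
    return t;
}
```

Reporting warmup_ms apart from the steady_ms values makes it clear whether a slow GPU path is slow on every call or only on the first one.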

Not Reasons:

(1) CPU clockspeed is 10x faster than GPU clockspeed.

(2) memory transfer times between host (CPU) and device (GPU) (upload, download data)

deploy

runtime errors

OpenCV and Caffe compiled for a gtx 1060 fail with an error when run on a gtx 970m:

im2col.cu Check failed: error == cudaSuccess (8 vs. 0) invalid device function

    gtx 1060   sm_61
    gtx 970m   sm_52

im2col.cu is a Caffe source file; the error indicates that the executable cannot run on a GPU with the gtx 970m's compute capability.

reasons

see what-is-the-purpose-of-using-multiple-arch-flags-in-nvidias-nvcc-compiler

Roughly speaking, the code compilation flow goes like this:

CUDA C/C++ device code source --> PTX --> SASS

The virtual architecture (e.g. compute_20, whatever is specified by -arch compute...) determines what type of PTX code will be generated. The additional switches (e.g. -code sm_21) determine what type of SASS code will be generated. SASS is actually executable object code for a GPU (machine language). An executable can contain multiple versions of SASS and/or PTX, and there is a runtime loader mechanism that will pick appropriate versions based on the GPU actually being used.
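
The selection rule described in the quote can be modelled with a small C++ sketch. It assumes the usual CUDA compatibility rules: SASS built for sm_XY runs only on devices with the same major version and an equal or higher minor version, while PTX built for compute_XY can be JIT-compiled for any device of equal or higher compute capability. The types Arch and FatBinary and the function names are hypothetical, and this is a model of the rule rather than the real runtime loader.

```cpp
#include <cassert>
#include <vector>

// Compute capability, e.g. {6, 1} for sm_61 / compute_61.
struct Arch {
    int major_cc;
    int minor_cc;
};

// What a hypothetical executable embeds: SASS for real architectures
// and PTX for virtual architectures.
struct FatBinary {
    std::vector<Arch> sass;
    std::vector<Arch> ptx;
};

// SASS is usable only within the same major version, on an equal or newer minor version.
bool sass_runs_on(const Arch& code, const Arch& device) {
    return code.major_cc == device.major_cc && code.minor_cc <= device.minor_cc;
}

// PTX can be JIT-compiled for any device at least as new as the PTX target.
bool ptx_runs_on(const Arch& code, const Arch& device) {
    if (code.major_cc != device.major_cc) return code.major_cc < device.major_cc;
    return code.minor_cc <= device.minor_cc;
}

// True if the device can use at least one embedded SASS or PTX version.
bool can_run(const FatBinary& bin, const Arch& device) {
    assert(device.major_cc > 0);
    for (const Arch& a : bin.sass) {
        if (sass_runs_on(a, device)) return true;
    }
    for (const Arch& a : bin.ptx) {
        if (ptx_runs_on(a, device)) return true;
    }
    return false;
}
```

Under this model a binary that embeds only sm_61 code cannot run on an sm_52 device, which matches the "invalid device function" error above, while adding sm_52 SASS or compute_30 PTX makes it loadable.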

win7/win10 deploy

compile opencv caffe on windows 10 for GTX 1060
deploy on windows 7 for GTX 1080 Ti successfully

for win7, if we install 398.82-desktop-win8-win7-64bit-international-whql.exe, errors may occur:

    > nvidia-smi.exe
    Failed to initialize NVML: Unknown error

Solution: use the older driver 385.69

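linux/windows performance

(1) The API takes 3 ms on average on Linux; the same code takes 14 ms on average on Windows.

(2) Enabling code optimization in VS changes performance by nearly 5x: 125 ms without vs 25 ms with optimization.

(3) When building with CMake, the RELEASE option already enables code optimization (-O3) by default.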

Reference

History

20180713: created.

Copyright

Post author: kezunlin
Post link: kezunlin.me/post/658069…
Copyright Notice: All articles in this blog are licensed under CC BY-NC-SA 3.0 unless stated otherwise.
