As the first company in China to establish a partnership with OpenCV, OPEN AI LAB upgraded the Tengine integration in OpenCV alongside this OpenCV 4.5.0 release, with deep optimizations that greatly improve stability and efficiency. Li Qi, a senior software engineer at OPEN AI LAB and the lead of its OpenCV project, shares in detail why OpenCV chose Tengine as the DNN ARM backend, and how the Tengine backend was added to the DNN module. [Note: some of the code shown in the original images has since been updated; for any questions, join the Tengine developer community (scan the QR code at the end of the article to join the developer QQ group).]
Li Qi: "I am honored to have joined OPEN AI LAB and to have met some wonderful people and projects there. One stroke of luck led to another, which brought me to the OpenCV China team and gave me the chance to bring Tengine and OpenCV together. This article walks through the development process of integrating Tengine into OpenCV."
Tengine is OPEN AI LAB's open-source edge AI inference framework. It focuses on inference on end devices and provides hand-written assembly optimizations tailored to the different ARM cores; in an era when domestic inference frameworks keep emerging, Tengine still holds its place as a performance leader largely thanks to this optimization work. OpenCV, as everyone knows, is the most powerful computer vision library in the universe; it implemented fairly complete neural network inference early in the deep learning boom, and its API is simple and extremely convenient for existing users, but on ARM there was still considerable room for performance optimization. Against this background, the two joined forces, and this requirement was born.
The overall plan was to tackle the biggest performance cost first: roughly 80% of neural network inference time is spent computing convolutions, and Tengine's convolution implementation uses highly efficient hand-written assembly, so the work was organized around porting convolution into OpenCV, as illustrated below:
This raises two main questions:
- How should OpenCV call into Tengine when computing a convolution? This covers compatibility issues such as OpenCV's graph invocation logic, the convolution call path, convolution parameter passing, and data layout.
- How should Tengine be grafted onto OpenCV? Should only the convolution implementation be ported, or the whole Tengine architecture? And how can the two builds be connected seamlessly?
Early in the design it was decided to embed Tengine as a whole, hook it directly into the build, and invoke the convolution computation as a complete graph, constructed as a single-layer graph at run time. The logic is shown in the diagram below.
In this approach Tengine is compiled into OpenCV as an external library and invoked at run time. Making it work requires the following:
- Integrated build with OpenCV. Tengine must be compiled as part of the OpenCV build, which requires understanding how OpenCV is built as well as how Tengine is built and invoked.
- Invoking the convolution graph. This requires understanding the parameter passing and control flow of OpenCV's single-layer computation, so that Tengine can be called correctly.
- Complete testing, including OpenCV's CI tests and performance tests.
Integrated build
The diagram below shows the call relationships in the integrated build.
The actual code changes, with explanations, are:
a. Top-level CMakeLists.txt
b. opencv/cmake/OpenCVFindTengine.cmake
```cmake
set(OPENCV_LIBTENGINE_ROOT_DIR "" CACHE PATH "Where to look for additional OpenCV modules (can be ;-separated list of paths)")  # Let the user point to a prebuilt Tengine directory.

IF(OPENCV_LIBTENGINE_ROOT_DIR)  # A Tengine directory was supplied: enable the corresponding switches
    MESSAGE(STATUS "TENGINE:-- Set tengine lib dir by user ")
    SET(Tengine_FOUND ON)
    set(BUILD_TENGINE OFF)
    SET(Tengine_INCLUDE_DIR ${OPENCV_LIBTENGINE_ROOT_DIR}/include)
    SET(Tengine_LIB ${OPENCV_LIBTENGINE_ROOT_DIR}/lib/libtengine.a)
ELSE()  # No directory configured: fall back to the tengine.cmake script, which downloads and builds the Tengine source
    MESSAGE(STATUS "TENGINE:-- Auto download Tengine source code. ")
    include("${OpenCV_SOURCE_DIR}/3rdparty/libtengine/tengine.cmake")
ENDIF()

IF(NOT Tengine_LIB)  # Library file not found: report it and turn Tengine off
    SET(Tengine_FOUND OFF)
    MESSAGE(STATUS "#### Could not find Tengine lib. Turning Tengine_FOUND off")
ENDIF()

IF (Tengine_FOUND)  # Whether prebuilt or auto-downloaded, set up the header and library paths here
    MESSAGE(STATUS "Found Tengine include: ${Tengine_INCLUDE_DIR}")
    MESSAGE(STATUS "Found Tengine libraries: ${Tengine_LIB}")
    set(HAVE_TENGINE 1)
    set(TENGINE_LIBRARIES ${Tengine_LIB})
    set(TENGINE_INCLUDE_DIRS ${Tengine_INCLUDE_DIR})
ENDIF (Tengine_FOUND)

MESSAGE(STATUS "Tengine include is:" ${Tengine_INCLUDE_DIR})
MESSAGE(STATUS "Tengine library is:" ${Tengine_LIB})

MARK_AS_ADVANCED(
    Tengine_INCLUDE_DIR
    Tengine_LIB
    Tengine
)
```
c. opencv/3rdparty/libtengine/tengine.cmake
```cmake
SET(TENGINE_VERSION "tengine-opencv")
SET(OCV_TENGINE_DSTDIRECTORY ${OpenCV_BINARY_DIR}/3rdparty/libtengine)
SET(DEFAULT_OPENCV_TENGINE_SOURCE_PATH ${OCV_TENGINE_DSTDIRECTORY}/Tengine-${TENGINE_VERSION})

IF(EXISTS ${DEFAULT_OPENCV_TENGINE_SOURCE_PATH})  # Source already downloaded: skip the download and just build it
    MESSAGE(STATUS "Tengine is exist already .")
    SET(Tengine_FOUND ON)
    set(BUILD_TENGINE ON)
ELSE()
    SET(OCV_TENGINE_FILENAME "${TENGINE_VERSION}.zip")  # name2
    SET(OCV_TENGINE_URL "https://github.com/OAID/Tengine/archive/")  # url2
    SET(tengine_md5sum 9c80d91dc8413911522ec80cde013ae2)  # md5sum2

    MESSAGE(STATUS "**** TENGINE DOWNLOAD BEGIN ****")
    ocv_download(FILENAME ${OCV_TENGINE_FILENAME}  # Download the Tengine source
        HASH ${tengine_md5sum}
        URL
            "${OPENCV_TENGINE_URL}"
            "$ENV{OPENCV_TENGINE_URL}"
            "${OCV_TENGINE_URL}"
        DESTINATION_DIR ${OCV_TENGINE_DSTDIRECTORY}
        ID TENGINE
        STATUS res
        UNPACK RELATIVE_URL)

    if (NOT res)  # Download failed: turn Tengine off
        MESSAGE(STATUS "TENGINE DOWNLOAD FAILED .Turning Tengine_FOUND off.")
        SET(Tengine_FOUND OFF)
    else ()
        MESSAGE(STATUS "TENGINE DOWNLOAD success . ")
        SET(Tengine_FOUND ON)
        set(BUILD_TENGINE ON)
    endif()
ENDIF()

if (BUILD_TENGINE)
    set(HAVE_TENGINE 1)

    # Android: tell Tengine whether the target is arm32 or arm64
    if(ANDROID)
        if(${ANDROID_ABI} STREQUAL "armeabi-v7a")
            set(CONFIG_ARCH_ARM32 ON)
        elseif(${ANDROID_ABI} STREQUAL "arm64-v8a")
            set(CONFIG_ARCH_ARM64 ON)
        endif()
    endif()

    # Linux: likewise select arm32 or arm64 for Tengine
    if(CMAKE_SYSTEM_PROCESSOR STREQUAL arm)
        set(CONFIG_ARCH_ARM32 ON)
    elseif(CMAKE_SYSTEM_PROCESSOR STREQUAL aarch64)  # AARCH64
        set(CONFIG_ARCH_ARM64 ON)
    endif()

    SET(DEFAULT_OPENCV_TENGINE_SOURCE_PATH ${OCV_TENGINE_DSTDIRECTORY}/Tengine-${TENGINE_VERSION})
    set(BUILT_IN_OPENCV ON)  # set for tengine compile discern.
    set(Tengine_INCLUDE_DIR ${DEFAULT_OPENCV_TENGINE_SOURCE_PATH}/core/include)
    set(Tengine_LIB ${CMAKE_BINARY_DIR}/lib/${ANDROID_ABI}/libtengine.a)

    if ( IS_DIRECTORY ${DEFAULT_OPENCV_TENGINE_SOURCE_PATH})  # Build Tengine as a subproject
        add_subdirectory("${DEFAULT_OPENCV_TENGINE_SOURCE_PATH}" ${OCV_TENGINE_DSTDIRECTORY}/build)
    endif()
endif()
```
d. modules/dnn/CMakeLists.txt
With the changes above in place, Tengine can essentially be invoked straight from OpenCV: the Tengine source is downloaded automatically and built, ready for the convolution calls and linking that follow.
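As a usage sketch of the two build modes described above (a hedged example, not taken from the article: the exact switch names depend on your OpenCV version, `WITH_TENGINE` is the option exposed by OpenCV's build system, and all paths here are illustrative):

```shell
# Mode 1: let OpenCV download and build Tengine automatically (aarch64 cross build)
cmake -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/aarch64-gnu.toolchain.cmake \
      -D WITH_TENGINE=ON \
      ..

# Mode 2: point the build at a prebuilt Tengine
# (OPENCV_LIBTENGINE_ROOT_DIR must contain include/ and lib/libtengine.a)
cmake -D CMAKE_TOOLCHAIN_FILE=../platforms/linux/aarch64-gnu.toolchain.cmake \
      -D WITH_TENGINE=ON \
      -D OPENCV_LIBTENGINE_ROOT_DIR=/path/to/tengine-prebuilt \
      ..
```

In both modes, `OpenCVFindTengine.cmake` decides between the prebuilt library and the auto-download path based on whether `OPENCV_LIBTENGINE_ROOT_DIR` is set.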
Invoking convolution inference
The call flow for the convolution computation is as follows:
As the diagram above makes clear, changing the lowest-level convolution implementation ultimately comes down to understanding and modifying one interface: cv::dnn::ConvolutionLayerImpl::forward, implemented in convolution_layer.cpp.
In practice, calling Tengine from this interface also requires some parameters of the convolution computation; the actual call passes them as follows:
```cpp
bool tengine_ret = tengine_forward(
    input_, inch, ngroups, in_h, in_w,                          // input data and shape
    output_, out_b, outch, out_h, out_w,                        // output data and shape
    kernel_, kernel_size.size(), kernel.height, kernel.width,   // weights and kernel shape
    teg_bias, stride.height, stride.width,
    pad.height, pad.width, dilation.height, dilation.width,
    weightsMat.step1(), padMode);
```
The detailed implementation:
```cpp
// Include the Tengine bridge header
#ifdef HAVE_TENGINE
#include "../tengine4dnn/include/tengine_graph_convolution.hpp"
#endif

#ifdef HAVE_TENGINE
    int inch  = inputs[0].size[1];   // input channels
    int in_h  = inputs[0].size[2];   // input height
    int in_w  = inputs[0].size[3];   // input width

    int out_b = outputs[0].size[0];  // output batch size
    int outch = outputs[0].size[1];  // output channels
    int out_h = outputs[0].size[2];  // output height
    int out_w = outputs[0].size[3];  // output width

    float *input_   = inputs[0].ptr<float>();
    float *output_  = outputs[0].ptr<float>();
    float *kernel_  = weightsMat.ptr<float>();
    float *teg_bias = &biasvec[0];

    // Call Tengine's forward; all parameters are passed in through this one function
    bool tengine_ret = tengine_forward(input_, inch, ngroups, in_h, in_w,
                                       output_, out_b, outch, out_h, out_w,
                                       kernel_, kernel_size.size(), kernel.height, kernel.width,
                                       teg_bias, stride.height, stride.width,
                                       pad.height, pad.width, dilation.height, dilation.width,
                                       weightsMat.step1(), padMode);

    /* activation */
    if((true == tengine_ret) && activ)  // Tengine succeeded and the layer has an activation:
    {                                   // let OpenCV apply the activation on Tengine's output
        int out_cstep = out_h * out_w;  // out_cstep
        ActivationLayer* activ_ = activ.get();
        activ_->forwardSlice(output_, output_, out_cstep, out_cstep, 0, outch);
    }
    if(false == tengine_ret)  // Tengine inference failed: fall back to OpenCV's original implementation
#endif
    {
        int nstripes = std::max(getNumThreads(), 1);
        ParallelConv::run(inputs[0], outputs[0], weightsMat, biasvec, reluslope,
                          kernel_size, strides, pads_begin, pads_end, dilations,
                          activ.get(), ngroups, nstripes);
    }
}
```
The above covers the two main pieces of work in integrating Tengine into OpenCV. Many technical details are left out here: how Tengine implements single-layer convolution internally; how it fully reuses the data pointers passed from OpenCV without redundant copies, which is a major source of the performance gain; how the DNN module reaches Tengine's interface once the library is built; how OpenCV's automatic download of third-party libraries works, and whether there are alternative paths; whether building a graph for every convolution incurs a significant performance cost; CI testing; and so on. Space is limited, so these will be covered one by one in follow-up articles.