1. git clone the TensorFlow Serving and TensorFlow code
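A minimal sketch of this step (the GitHub URLs are the standard upstream repositories; check out whichever release branch you intend to build):

# Fetch TensorFlow Serving and the TensorFlow sources.
# Serving's WORKSPACE normally pins its own @org_tensorflow, so the separate
# TensorFlow clone is mainly for reading or patching the sources locally.
git clone https://github.com/tensorflow/serving.git
git clone https://github.com/tensorflow/tensorflow.git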
2. Error: building with --config=cuda but TensorFlow is not configured for GPU support
ERROR: /root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD:4:1: Traceback (most recent call last):
    File "/root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD", line 4
        error_gpu_disabled()
    File "/root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/error_gpu_disabled.bzl", line 3, in error_gpu_disabled
        fail("ERROR: Building with --config=c...")
ERROR: Building with --config=cuda but TensorFlow is not configured to build with GPU support. Please re-run ./configure and enter 'Y' at the prompt to build with GPU support.
ERROR: no such target '@local_config_cuda//crosstool:toolchain': target 'toolchain' not declared in package 'crosstool' defined by /root/.cache/bazel/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/local_config_cuda/crosstool/BUILD
INFO: Elapsed time: 0.093s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (0 packages loaded)
Solution:
export TF_NEED_CUDA="1"
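With the variable exported, re-run the same build. A sketch of the invocation (--config=cuda and the model server target come from the logs in this post; any other flags you were using stay the same):

export TF_NEED_CUDA="1"
bazel build --config=cuda //tensorflow_serving/model_servers:tensorflow_model_server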
3. It looks like none of the settings from ./configure take effect, so you have to set environment variables yourself to point at all the non-default paths for CUDA, cuDNN, NCCL, etc.
Solution: set them via environment variables
export PATH=$PATH:/env/bazel-0.15.0/bin
export TF_NEED_CUDA="1"
export CUDNN_INSTALL_PATH="/usr/local/cudnn7.3_cuda9.0"
export CUDA_INSTALL_PATH="/usr/local/cuda-9.0"
export TF_CUDA_VERSION="9.0"
export TF_CUDNN_VERSION="7"
export TF_NCCL_VERSION="2.2"
export NCCL_INSTALL_PATH="/env/nccl_2.2.13-1+cuda9.0_x86_64"
export TEST_TMPDIR=/home
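Before kicking off a long build, a quick sanity check that these paths point at real installations (a sketch, assuming the usual directory layout of the CUDA, cuDNN and NCCL packages):

ls "$CUDA_INSTALL_PATH/bin/nvcc"           # CUDA toolkit compiler
ls "$CUDNN_INSTALL_PATH/include/cudnn.h"   # cuDNN headers
ls "$NCCL_INSTALL_PATH/include/nccl.h"     # NCCL headers
"$CUDA_INSTALL_PATH/bin/nvcc" --version    # should report release 9.0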
4. The CUDA version detected from nvcc does not match the configured TF_CUDA_VERSION.
ERROR: no such package '@local_config_cuda//crosstool': Traceback (most recent call last):
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 1447
        _create_local_cuda_repository(repository_ctx)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 1187, in _create_local_cuda_repository
        _get_cuda_config(repository_ctx)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 909, in _get_cuda_config
        _cuda_version(repository_ctx, cuda_toolkit_path, c...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 492, in _cuda_version
        auto_configure_fail(("CUDA version detected from nvc...)))
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/org_tensorflow/third_party/gpus/cuda_configure.bzl", line 317, in auto_configure_fail
        fail(("\n%sCuda Configuration Error:%...)))
Cuda Configuration Error: CUDA version detected from nvcc (8.0.61) does not match TF_CUDA_VERSION (9.0)
INFO: Elapsed time: 0.785s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (1 packages loaded)
Solution: point the build at the correct CUDA toolkit location:
export CUDA_TOOLKIT_PATH="/usr/local/cuda-9.0"
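To see where the mismatch comes from, compare the nvcc that is first on PATH (apparently the one the autoconfiguration picked up, reporting 8.0.61) with the one inside the CUDA 9.0 toolkit. A sketch:

which nvcc && nvcc --version              # presumably a CUDA 8.0 install, reporting 8.0.61
/usr/local/cuda-9.0/bin/nvcc --version    # the CUDA 9.0 toolkit the build should use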
5. Bazel's C++ toolchain autoconfiguration cannot find gcc
ERROR: no such package '@local_config_cc//': Traceback (most recent call last):
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/cc_configure.bzl", line 56
        configure_unix_toolchain(repository_ctx, cpu_value, overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 477, in configure_unix_toolchain
        _find_generic(repository_ctx, "gcc", "CC", overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 459, in _find_generic
        auto_configure_fail(msg)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/lib_cc_configure.bzl", line 109, in auto_configure_fail
        fail(("\n%sAuto-Configuration Error:%...)))
Auto-Configuration Error: Cannot find gcc or CC (gcc -std=gnu99); either correct your path or set the CC environment variable
ERROR: Analysis of target '//tensorflow_serving/model_servers:tensorflow_model_server' failed; build aborted: no such package '@local_config_cc//': Traceback (most recent call last):
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/cc_configure.bzl", line 56
        configure_unix_toolchain(repository_ctx, cpu_value, overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 477, in configure_unix_toolchain
        _find_generic(repository_ctx, "gcc", "CC", overriden...)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/unix_cc_configure.bzl", line 459, in _find_generic
        auto_configure_fail(msg)
    File "/search/odin/zhangliang/_bazel_root/f71d782da17fd83c84ed6253a342a306/external/bazel_tools/tools/cpp/lib_cc_configure.bzl", line 109, in auto_configure_fail
        fail(("\n%sAuto-Configuration Error:%...)))
Auto-Configuration Error: Cannot find gcc or CC (gcc -std=gnu99); either correct your path or set the CC environment variable
INFO: Elapsed time: 2.579s
INFO: 0 processes.
FAILED: Build did NOT complete successfully (4 packages loaded)
Solution:
export CC=/usr/bin/gcc
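If gcc is installed somewhere else, locate it first and export that path instead. A sketch (the clean step is optional, only in case the failed C++ autoconfiguration result was cached):

which gcc                  # confirm the compiler exists and note its path
export CC=/usr/bin/gcc     # must be exported so bazel's toolchain autoconfig can see it
bazel clean --expunge      # optional: wipe cached external repos such as @local_config_cc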
6. Linking tensorflow_model_server fails: version.o was not built with -fPIC
ERROR: /search/odin/zhangliang/code/serving-1.14/serving/tensorflow_serving/model_servers/BUILD:356:1: Linking of rule '//tensorflow_serving/model_servers:tensorflow_model_server' failed (Exit 1)
/usr/bin/ld: bazel-out/k8-opt/bin/tensorflow_serving/model_servers/_objs/tensorflow_model_server/tensorflow_serving/model_servers/version.o: relocation R_X86_64_32 against `.rodata' can not be used when making a shared object; recompile with -fPIC
bazel-out/k8-opt/bin/tensorflow_serving/model_servers/_objs/tensorflow_model_server/tensorflow_serving/model_servers/version.o: could not read symbols: Bad value
collect2: error: ld returned 1 exit status
Target //tensorflow_serving/model_servers:tensorflow_model_server failed to build
Use --verbose_failures to see the command lines of failed build steps.
INFO: Elapsed time: 697.810s, Critical Path: 331.33s
INFO: 3321 processes: 3321 local.
FAILED: Build did NOT complete successfully
Solution:
The error happens while building tensorflow_model_server_main_lib; looking into it, the failure is in the linkstamp compilation of "version.cc", with the linker suggesting it should be built with -fPIC.
A simple way to bypass it: drop the linkstamp (and version.h) from the BUILD rule and hard-code the version string in main.cc, as shown below.
BUILD
cc_library(
    name = "tensorflow_model_server_main_lib",
    srcs = [
        "main.cc",
    ],
    #hdrs = [
    #    "version.h",
    #],
    #linkstamp = "version.cc",
    visibility = [
        ":tensorflow_model_server_custom_op_clients",
        "//tensorflow_serving:internal",
    ],
    deps = [
        ":server_lib",
        "@org_tensorflow//tensorflow/c:c_api",
        "@org_tensorflow//tensorflow/core:lib",
        "@org_tensorflow//tensorflow/core/platform/cloud:gcs_file_system",
        "@org_tensorflow//tensorflow/core/platform/hadoop:hadoop_file_system",
        "@org_tensorflow//tensorflow/core/platform/s3:s3_file_system",
    ],
)
main.cc
//#include "tensorflow_serving/model_servers/version.h"
...
  if (display_version) {
    std::cout << "TensorFlow ModelServer: " << "r1.12" << "\n"
              << "TensorFlow Library: " << TF_Version() << "\n";
    return 0;
  }
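With the linkstamp disabled, relinking the same target should succeed. A sketch of the rebuild and a quick check (assuming --version is the flag behind the display_version branch above):

bazel build --config=cuda //tensorflow_serving/model_servers:tensorflow_model_server
bazel-bin/tensorflow_serving/model_servers/tensorflow_model_server --version
# should print "TensorFlow ModelServer: r1.12" and the TensorFlow library version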