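BlueKing single-host offline deployment: fixing the app_mgr component installation failure

A while ago, while testing a single-host offline deployment of Tencent BlueKing (藍鯨智雲), I ran into several installation problems. This post records how I resolved one of them, "3.2 app_mgr component installation failure". The problem had me stuck for a long time (possibly because I am not familiar enough with Python or the BlueKing product). It was solved in the end, but the process itself is the part most worth writing down.

1. Problem description

Offline installation of the app_mgr component failed.
Install command: ./bk_install app_mgr
The error output was as follows: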

create virtualenv for paas_agent                  
Requirement already satisfied: pbr in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenvwrapper in /usr/local/lib/python2.7/site-packages
Requirement already satisfied: virtualenv-clone in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: stevedore in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: virtualenv in /usr/local/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied: pbr>=1.6 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied: six>=1.9.0 in /usr/local/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
[192.168.1.6]20200303-174651 224   mkvirtualenv -a /data/bkce/paas_agent/paas_agent --extra-search-dir=/data/install/pip --no-download -p /usr/local/bin/python paas_agent
Already using interpreter /usr/local/bin/python
New python executable in /data/bkce/.envs/paas_agent/bin/python
Installing setuptools, pip, wheel...done.
Setting project for paas_agent to /data/bkce/paas_agent/paas_agent
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): pbr in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): virtualenvwrapper in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): virtualenv-clone in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): stevedore in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): virtualenv in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): pbr>=1.6 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Requirement already satisfied (use --upgrade to upgrade): six>=1.9.0 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from stevedore->virtualenvwrapper)
Ignoring indexes: http://mirrors.cloud.tencent.com/pypi/simple
Requirement already satisfied (use --upgrade to upgrade): supervisor in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): six in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages
Requirement already satisfied (use --upgrade to upgrade): meld3>=0.6.5 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from supervisor)
[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
  Retrying (Retry(total=4, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91150>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=3, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91d50>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=2, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e91f10>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=1, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c110>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Retrying (Retry(total=0, connect=None, read=None, redirect=None)) after connection broken by 'NewConnectionError('<pip._vendor.requests.packages.urllib3.connection.HTTPConnection object at 0x7f7b58e5c2d0>: Failed to establish a new connection: [Errno 101] Network is unreachable',)': /pypi/simple/django/
  Could not find a version that satisfies the requirement Django==1.8.11 (from -r requirements.txt (line 1)) (from versions: )
No matching distribution found for Django==1.8.11 (from -r requirements.txt (line 1))
[192.168.1.6]20200303-174900 177   pip install (--no-cache-dir ) for paas_agent.  FAILED
[192.168.1.6]20200303-174900 47   Abort

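Note: offline installation means the installation environment cannot reach the internet. If your deployment environment is allowed to connect to the internet, this component installs without trouble in my testing.

2. Initial analysis

First, the odd part is that only the app_mgr component reports a network-unreachable error during offline installation. Looking back at the error log above, the installer runs this step for the component: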

[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)

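It looks like this pip command does not use the --find-links option to point at a local path, so it tries to reach an external pip index.
When the other components are installed, the option is always set to each component's own local path:

-- e.g. installing fta: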
[192.168.1.6]20200302-001610 233   generate env variable settings.
[192.168.1.6]20200302-001610 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)

-- e.g. installing bkdata:
[192.168.1.6]20200302-003237 233   generate env variable settings.
[192.168.1.6]20200302-003237 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/bkdata/support-files/pkgs -r requirements.txt (/data/bkce/bkdata/dataapi)

As shown, at the equivalent step these components all pass --no-index and --find-links, each pointing at its own local package directory.
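To spot which components are affected, the "exec: pip install" lines in the installer output can be filtered for those that lack --find-links. This is a minimal sketch only: the helper name pip_lines_without_find_links is hypothetical, and it assumes the installer output has been captured as text (the two sample lines are copied from the logs above).

```shell
# Hypothetical helper: print 'exec: pip install' log lines that lack --find-links.
# Reads log text on stdin; prints nothing if every pip call is offline-ready.
pip_lines_without_find_links() {
    grep 'exec: pip install' | grep -v -e '--find-links' || true
}

# Two lines copied from the logs above: one from paas_agent, one from fta.
sample_log='[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
[192.168.1.6]20200302-001610 151   exec: pip install --no-cache-dir --no-index --find-links=/data/src/fta/support-files/pkgs -r requirements.txt (/data/bkce/fta/fta)'

# Only the paas_agent line should be printed.
printf '%s\n' "$sample_log" | pip_lines_without_find_links
```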

I made a few initial attempts:

2.1 Install offline with pip directly, then retry installing app_mgr on its own

pip install --no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs -r /data/bkce/paas_agent/paas_agent/requirements.txt

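The offline pip install succeeded, but running ./bk_install app_mgr afterwards still failed, so installing the packages manually in advance had no effect.
This is probably because the installer runs pip inside the component's virtualenv, and a virtual environment is largely isolated from the system site-packages.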
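To illustrate the isolation point: a package only becomes visible to the virtualenv if it is installed with that environment's own pip. The sketch below just builds and prints such a command. The virtualenv path comes from the "New python executable in /data/bkce/.envs/paas_agent/bin/python" log line; that a pip executable sits in the same bin directory is an assumption, and this variant was not tried in the original troubleshooting, so whether bk_install would then skip the network is unverified.

```shell
# Paths taken from the logs above; that bin/ also contains pip is an assumption.
VENV_BIN=/data/bkce/.envs/paas_agent/bin
PKGS=/data/src/paas_agent/support-files/pkgs
REQS=/data/bkce/paas_agent/paas_agent/requirements.txt

# Hypothetical helper: print the command instead of running it,
# so it can be reviewed before being executed on the target host.
venv_pip_cmd() {
    echo "$VENV_BIN/pip install --no-cache-dir --no-index --find-links=$PKGS -r $REQS"
}

venv_pip_cmd
```

On the target host the printed line could then be run by hand once it looks right.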

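2.2 Find the pip.conf configuration files, back them up, and point them at the local path
The configuration files I tried modifying were /data/src/.pip/pip.conf and /data/install/pip/pip.conf, with the content changed to: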

[global]
find-links = /data/src/paas_agent/support-files/pkgs
[install]
find-links = /data/src/paas_agent/support-files/pkgs

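But ./bk_install app_mgr still reported the same error, so this had no effect.
Later attempts turned up more pip.conf files, and modifying all of them did not help either.

2.3 Set an environment variable
The official documentation mentions an environment variable, PIP_FIND_LINKS: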

export PIP_FIND_LINKS=/data/src/paas_agent/support-files/pkgs

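Running ./bk_install app_mgr again produced the same error.
This is probably because the options are hard-coded in the installer scripts. Much like a crontab job, the scripts are not influenced by variables set from outside, so the setting has to be found inside them.

2.4 Other attempts
For example, I manually added the environment variable above under the app_mgr module in bk_install. That did not work either, and the error was unchanged.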

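3. Pooling ideas

Things had reached a deadlock, and something was clearly wrong. After I shared the analysis above with the customer, we agreed it was very likely a bug and reported it to BlueKing support.
Support replied that for offline installation they recommend setting up a complete local pip index. Since a full pip mirror would need close to 2 TB of disk space, I instead built a local pip index containing only the required packages.
This also felt more like a workaround that sidesteps the root cause, because the other components need no such index; they use --find-links to point at a local package directory.

Since I had not done this before, configuring a local pip index also took quite some time to research and verify: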

[root@rbtnode1 bin]# find /data -name pip.conf
/data/install/pip/pip.conf
/data/install/pip.conf
/data/src/service/.pip/pip.conf
/data/src/.pip/pip.conf
/data/src/pip.conf


cat /data/install/pip/pip.conf
cat /data/install/pip.conf
cat /data/src/service/.pip/pip.conf
cat /data/src/.pip/pip.conf
cat /data/src/pip.conf
cat ~/.pip/pip.conf

I did not know which pip.conf would actually be used, so I backed up all of them and changed every one to point at the local pip index:

[global]
trusted-host = 192.168.1.6
index-url = http://192.168.1.6:8080/simple

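For the detailed configuration of a local pip index, I relied on two articles found online (the links are not preserved here).

But the installation still failed. I then modified the globals.env configuration file: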

# HTTP proxy address used to access network resources such as yum repositories, e.g. BK_PROXY=http://192.168.0.1:8833
export BK_PROXY=http://192.168.1.6:8080/simple

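I also discussed this with a colleague. Logically, the real fix should be to make this component pass --find-links the same way the other components do.
The only way forward was to trace the scripts myself and see whether such a setting exists, starting from the main script, bk_install.

4. Final solution

I had not been reading the scripts for long before I got lost: I rarely do any scripting, and it is a weak spot of mine. Going from bk_install to bkcec, I saw many other files being invoked and could not find a thread to follow. I then went back to the original error log and noticed that just before the error there is a line that looks like script output: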

[192.168.1.6]20200303-174801 233   generate env variable settings.
[192.168.1.6]20200303-174801 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)

Searching the files in /data/install for "generate env variable settings" showed that only utils.fc contains it:

[root@rbtnode1 install]# grep "generate env variable settings" *
grep: agent_setup: Is a directory
grep: appmgr: Is a directory
grep: bcs: Is a directory
grep: bin: Is a directory
grep: build: Is a directory
grep: deck: Is a directory
grep: extra: Is a directory
grep: health_check: Is a directory
grep: migrate: Is a directory
grep: pip: Is a directory
grep: scripts: Is a directory
grep: setuptools-36.0.1: Is a directory
grep: support-files: Is a directory
grep: templates: Is a directory
grep: uninstall: Is a directory
utils.fc:    log "generate env variable settings."
grep: verify: Is a directory
[root@rbtnode1 install]# ls -l utils.fc
-rw-r--r-- 1 root root 38897 Jan  9 16:11 utils.fc
[root@rbtnode1 install]# scp utils.fc 192.168.1.61:/tmp/
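The "Is a directory" messages above appear because grep was given directory names without being told to recurse, so only the top-level files were actually searched. A sketch of a recursive search follows, run against a throwaway directory; the temp tree and its file contents are made up for illustration and only stand in for /data/install.

```shell
# Build a tiny stand-in for /data/install (illustrative only).
demo=$(mktemp -d)
mkdir -p "$demo/scripts"
printf 'log "generate env variable settings."\n' > "$demo/utils.fc"
printf 'echo unrelated\n' > "$demo/scripts/other.sh"

# -r: descend into subdirectories; -l: print only the names of matching files.
matches=$(cd "$demo" && grep -rl "generate env variable settings" .)
echo "$matches"

rm -rf "$demo"
```

On the real host the equivalent would be to run the same grep -rl from inside /data/install, which would also cover the subdirectories that the non-recursive search skipped.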

After copying the file off the host and reading it, I found this piece of code, which looked relevant:

_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "

    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY \
            pip install $pip_options -r $reqr_file      # <-- here $pip_options very likely does not contain --find-links
            
        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
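Reading the if test in this function: pip_options only switches to the offline form when the requirements file name contains _local, or when a file named SELF_CONTAINED_PIP_PKG exists in the current directory. The sketch below restates that test so it can be tried in isolation. The function name uses_local_pkgs and its second parameter are hypothetical (the original checks the current working directory), and a case pattern stands in for the bash substitution.

```shell
# Hypothetical restatement of the condition in _install_pypkgs.
# $1: requirements file name; $2: directory to look for the marker file in.
uses_local_pkgs() {
    case "$1" in
        # Same effect as the bash test "${f//_local/}" != "$f":
        # true exactly when the name contains "_local".
        *_local*) echo yes; return ;;
    esac
    if [ -f "$2/SELF_CONTAINED_PIP_PKG" ]; then
        echo yes
    else
        echo no
    fi
}

workdir=$(mktemp -d)
uses_local_pkgs requirements.txt "$workdir"           # prints: no
uses_local_pkgs 01_requirements_local.txt "$workdir"  # prints: yes
touch "$workdir/SELF_CONTAINED_PIP_PKG"
uses_local_pkgs requirements.txt "$workdir"           # prints: yes
rm -rf "$workdir"
```

For paas_agent the file is plain requirements.txt, and the log shows only --no-cache-dir, which is consistent with neither condition being met. Whether the marker file is meant to be created by hand is not something the source confirms.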

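The line marked above is where pip install is called with $pip_options, which very likely lacks --find-links, because the offline value is only assigned to pip_options inside the if block. Without time for a full analysis, I tried modifying utils.fc directly to set pip_options just before the call: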

_install_pypkgs () {
    local module=$1
    local project=$2
    local local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    local pip_options="--no-cache-dir "

    local _ordered_requirement_files=( $( shopt -s nullglob; echo 0[0-9]_requirements*.txt) )

    if [ "${#_ordered_requirement_files[@]}" -eq 0 ]; then
        _ordered_requirement_files=( requirements.txt )
    fi

    for reqr_file in ${_ordered_requirement_files[@]}; do
        if [ "${reqr_file//_local/}" != "$reqr_file" -o -f SELF_CONTAINED_PIP_PKG ]; then
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
        fi

        log "exec: pip install $pip_options -r $reqr_file ($PWD)"
        http_proxy=$BK_PROXY https_proxy=$BK_PROXY \
            #pip install $pip_options -r $reqr_file     <-- the original line, now commented out; the two lines below are new: set pip_options, then call pip install
            pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"
            pip install $pip_options -r $reqr_file

        nassert "pip install ($pip_options) for $venv_name"
    done
    #shopt -s nullglob
}
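The patch above sets pip_options after the log call, so the logged command and the executed command differ. As far as I can tell, the trailing backslash also now joins the proxy assignments to the commented-out line, so they no longer prefix the pip call; with --no-index that should not matter. Below is a simplified sketch of a tidier ordering, not the vendor's fix: install_reqs_offline is a hypothetical name, log and pip are stubbed so the snippet runs anywhere, and PKG_SRC_PATH=/data/src is inferred from the --find-links paths in the logs.

```shell
# Stubs so the sketch runs anywhere; on a real host these two definitions
# would be dropped in favour of the installer's log() and the real pip.
log() { echo "LOG: $*"; }
pip() { echo "PIP: $*"; }

# Simplified, hypothetical variant of the loop body in _install_pypkgs:
# force the offline options first, then log, then call pip.
# (Plain variables instead of 'local' to stay portable across sh variants.)
install_reqs_offline() {
    module=$1
    reqr_file=$2
    local_pip_src=$PKG_SRC_PATH/$module/support-files/pkgs
    pip_options="--no-cache-dir --no-index --find-links=$local_pip_src"

    log "exec: pip install $pip_options -r $reqr_file ($PWD)"
    # $pip_options is left unquoted on purpose: it must split into separate flags.
    http_proxy=$BK_PROXY https_proxy=$BK_PROXY \
        pip install $pip_options -r "$reqr_file"
}

PKG_SRC_PATH=/data/src   # inferred from the logs, not confirmed
BK_PROXY=
install_reqs_offline paas_agent requirements.txt
```

With this ordering the "exec:" log line shows the same options that pip actually receives.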

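After modifying utils.fc I tested again, and the step that failed before no longer failed. The "exec:" log line still shows no --find-links because it is printed before pip_options is overridden, but the option is in effect, as the later FAILED line shows: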

[192.168.1.6]20200303-214725 235   generate env variable settings.
[192.168.1.6]20200303-214726 151   exec: pip install --no-cache-dir  -r requirements.txt (/data/bkce/paas_agent/paas_agent)
Ignoring indexes: http://192.168.1.6:8080/simple
Collecting Django==1.8.11 (from -r requirements.txt (line 1))
Collecting PyMySQL==0.6.7 (from -r requirements.txt (line 2))

(part of the output omitted)

Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
  Could not find a version that satisfies the requirement idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3)) (from versions: )
No matching distribution found for idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
[192.168.1.6]20200303-214856 177   pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent.  FAILED
[192.168.1.6]20200303-214856 47   Abort
[root@rbtnode1 install]#

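But in the end the installation aborted again, this time because a package was missing.
The requirement idna<2.9,>=2.5 is not listed in paas_agent's requirements.txt, yet it is needed: the log shows it is pulled in by requests==2.21.0. I gathered the packages from the other locations into one directory (/data/localpip) and then copied them into this component's pkgs directory, using cp -n so that existing files are not overwritten: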

[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
62

[root@rbtnode1 pkgs]# cp -n /data/localpip/* ./
[root@rbtnode1 pkgs]# pwd
/data/src/paas_agent/support-files/pkgs
[root@rbtnode1 pkgs]# ls -l |wc -l
281
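Before re-running the installer, it can help to confirm that the package pip complained about is now present in the pkgs directory. A small sketch follows. The helper name find_pkg_files is hypothetical, it assumes archives are named "<name>-<version>...", and the demo file names are made up for illustration (only the versions idna 2.8 and Django 1.8.11 come from the install log).

```shell
# Hypothetical helper: list archives for one package name in a directory.
# $1: package directory; $2: package name (for example idna).
# Prints nothing if no archive for that name is found.
find_pkg_files() {
    ls "$1" | grep -i "^$2-" || true
}

# Demo on a throwaway directory standing in for support-files/pkgs.
pkgs=$(mktemp -d)
touch "$pkgs/idna-2.8-py2.py3-none-any.whl" "$pkgs/Django-1.8.11-py2.py3-none-any.whl"
find_pkg_files "$pkgs" idna       # prints: idna-2.8-py2.py3-none-any.whl
find_pkg_files "$pkgs" requests   # prints nothing: not present in the demo
rm -rf "$pkgs"
```

Names that mix hyphens and underscores (django-celery versus django_celery, for instance) may need both spellings checked.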

Then I tried installing app_mgr again:

[root@rbtnode1 pkgs]# cd /data/install/
[root@rbtnode1 install]# ./bk_install app_mgr

This time it finally succeeded. The log is below. After appt installs successfully the installer goes on to install appo, and both succeed:

Collecting chardet<3.1.0,>=3.0.2 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting idna<2.9,>=2.5 (from requests==2.21.0->-r requirements.txt (line 3))
Collecting certifi>=2017.4.17 (from requests==2.21.0->-r requirements.txt (line 3))
Installing collected packages: Django, PyMySQL, urllib3, chardet, idna, certifi, requests, pytz, amqp, anyjson, kombu, billiard, celery, django-celery, redis, httplib2, xlrd, xlwt, MarkupSafe, Mako, Jinja2, pycrypto, gunicorn, six, SQLAlchemy, suds, supervisor, uWSGI, pytest-runner, setuptools-scm
  Running setup.py install for anyjson: started
    Running setup.py install for anyjson: finished with status 'done'
  Running setup.py install for billiard: started
    Running setup.py install for billiard: finished with status 'done'

(part of the output omitted)

Successfully installed Django-1.8.11 Jinja2-2.8 Mako-1.0.4 MarkupSafe-0.23 PyMySQL-0.6.7 SQLAlchemy-1.0.12 amqp-1.4.9 anyjson-0.3.3 billiard-3.3.0.23 celery-3.1.18 certifi-2019.3.9 chardet-3.0.4 django-celery-3.2.1 gunicorn-19.6.0 httplib2-0.9.1 idna-2.8 kombu-3.0.35 pycrypto-2.6.1 pytest-runner-2.8 pytz-2016.6.1 redis-2.10.5 requests-2.21.0 setuptools-scm-1.11.1 six-1.10.0 suds-0.4 supervisor-3.3.1 uWSGI-2.0.13.1 urllib3-1.24.1 xlrd-1.0.0 xlwt-1.1.2
[192.168.1.6]20200303-222848 175   pip install (--no-cache-dir --no-index --find-links=/data/src/paas_agent/support-files/pkgs) for paas_agent.  OK
[192.168.1.6]20200303-222858 453   apps isolate mode: virutalenv
Ignoring indexes: http://192.168.1.6:8080/simple
Requirement already satisfied (use --upgrade to upgrade): Django==1.8.11 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 1))
Requirement already satisfied (use --upgrade to upgrade): PyMySQL==0.6.7 in /data/bkce/.envs/paas_agent/lib/python2.7/site-packages (from -r requirements.txt (line 2))

(part of the output omitted)

[192.168.1.6]20200303-222926 151   install python package for virtualenv paas_agent done.
[192.168.1.6]20200303-222927 468   local nginx is required for paas_agent. going to install it.
Loaded plugins: fastestmirror
Loading mirror speeds from cached hostfile
Package 1:nginx-1.12.2-2.el7.x86_64 already installed and latest version
Nothing to do
[192.168.1.6]20200303-222934 175   render: #etc#nginx.conf -> /data/bkce//etc/nginx.conf.  OK
[192.168.1.6]20200303-222935 175   render: #etc#nginx#paasagent.conf -> /data/bkce//etc/nginx/paasagent.conf.  OK
[192.168.1.6]20200303-222936 322   PLACE HOLDER __SID__ is replaced into empty
[192.168.1.6]20200303-222937 322   PLACE HOLDER __TOKEN__ is replaced into empty
[192.168.1.6]20200303-222937 175   render: #etc#paas_agent_config.yaml.tpl -> /data/bkce//etc/paas_agent_config.yaml.  OK
[192.168.1.6]20200303-222938 175   render: #etc#supervisor-paas_agent.conf -> /data/bkce//etc/supervisor-paas_agent.conf.  OK
[192.168.1.6]20200303-222939 56   install appt(allproject) done

                         initdata for appt()                         
[192.168.1.6]20200303-222946 182   exec initdata_appt on 192.168.1.6
[192.168.1.6]20200303-222958 262   update config file: paas_agent_config.yaml
[192.168.1.6]20200303-222958 268   register appt succeded.
[192.168.1.6]20200303-222958 502   create database bksuite_common
[192.168.1.6]20200303-222958 504   add version info to db
[192.168.1.6]20200303-223001 98   starting appt(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223052 77   activate appt(192.168.1.6) succeded

# appt is now installed successfully; appo is installed next

(part of the output omitted)

                          install appo(all)                          
[192.168.1.6]20200303-223102 112   check dependences for paas_agent

(part of the output omitted)

                         initdata for appo()                         
[192.168.1.6]20200303-223509 182   exec initdata_appo on 192.168.1.6
[192.168.1.6]20200303-223533 262   update config file: paas_agent_config.yaml
[192.168.1.6]20200303-223534 268   register appo succeded.
[192.168.1.6]20200303-223535 502   create database bksuite_common
[192.168.1.6]20200303-223535 504   add version info to db
[192.168.1.6]20200303-223541 98   starting appo(ALL) on host: 192.168.1.6
[192.168.1.6]20200303-223613 77   activate appo(192.168.1.6) succeded
[192.168.1.6] paas_agent()    paas_agent                       RUNNING   pid 23792, uptime 0:06:10
[192.168.1.6] nginx: RUNNING
[192.168.1.6] paas_agent()    paas_agent                       RUNNING   pid 23792, uptime 0:06:42
[192.168.1.6] nginx: RUNNING
[192.168.1.6] rabbitmq: RUNNING

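If the steps above reported no errors, you can now finish deploying the production and test environments. You can:
 1. Deploy the Node Management app with ./bk_install saas-o bk_nodeman, or
 2. Deploy apps through the Developer Center.
To install BlueKing monitoring or log search, first install bkdata with ./bk_install bkdata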
[root@rbtnode1 install]#

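I finally stumbled my way to a fix for a problem that had puzzled me for a long time. Going forward I need to strengthen my Python and shell scripting skills.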
