KubeSphere Troubleshooting in Practice

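Overview: I have recently been using KubeSphere from QingCloud. The user experience is excellent: it supports private deployment, has no infrastructure dependency and no Kubernetes dependency, can be deployed across physical machines, virtual machines, and cloud platforms, and can manage Kubernetes clusters of different versions and from different vendors. It adds a layer on top of Kubernetes that provides role-based access control, DevOps pipelines for quickly setting up CI/CD, built-in common tools such as Harbor, GitLab, Jenkins, and SonarQube, and full application lifecycle management based on OpenPitrix, covering development, testing, release, upgrade, and removal. In my own experience it works very well.
As with any open-source project, some bugs are inevitable. Below are the troubleshooting approaches I arrived at for the problems I ran into. Many thanks to the QingCloud community for their technical assistance. Anyone interested in Kubernetes may want to try this home-grown platform; the experience is silky smooth, and Rancher users are welcome to try it and compare.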

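1. Cleaning up exited containers

After a cluster has been running for a while, some containers end up in the Exited state because of errors. They need to be cleaned up promptly to free disk space, and the cleanup can be set up as a scheduled job.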

docker ps -aq -f status=exited | xargs -r docker rm
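
Since the cleanup is meant to run on a schedule, a small wrapper that is safe to call when nothing has exited may be convenient. The sketch below is one possible shape, not part of the original setup: the function name `cleanup_exited` and the `DOCKER` override variable are hypothetical, introduced so the function can be dry-run against a stub.

```shell
#!/bin/bash
# Hypothetical wrapper; DOCKER may point at a stub command for dry runs.
DOCKER="${DOCKER:-docker}"

# Remove every container whose status is "exited"; do nothing if there are none.
cleanup_exited() {
  local ids
  ids=$("$DOCKER" ps -aq -f status=exited)
  if [ -z "$ids" ]; then
    echo "no exited containers"
    return 0
  fi
  # Word splitting is intentional here: one argument per container ID.
  # shellcheck disable=SC2086
  "$DOCKER" rm $ids
}
```

Saved as a script that ends with a call to `cleanup_exited`, this could be scheduled with a crontab entry such as `0 3 * * * /usr/local/bin/cleanup_exited.sh` (the path and schedule are assumptions).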

2. Cleaning up abnormal or evicted pods

  • Clean up pods in the kubesphere-devops-system namespace
kubectl get pods -n kubesphere-devops-system --no-headers | awk '$3 == "Evicted" {print $1}' | xargs -r kubectl delete pods -n kubesphere-devops-system
kubectl get pods -n kubesphere-devops-system --no-headers | awk '$3 == "CrashLoopBackOff" {print $1}' | xargs -r kubectl delete pods -n kubesphere-devops-system
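  • For convenience, a script that cleans up Evicted/CrashLoopBackOff pods in a given namespace, or exited containers
#!/bin/bash
# auth:kaliarch

# Delete every pod in namespace $1 whose STATUS is Evicted.
clear_evicted_pod() {
  local ns=$1
  kubectl get pods -n "${ns}" --no-headers | awk '$3 == "Evicted" {print $1}' \
    | xargs -r kubectl delete pods -n "${ns}"
}
# Delete every pod in namespace $1 whose STATUS is CrashLoopBackOff.
clear_crash_pod() {
  local ns=$1
  kubectl get pods -n "${ns}" --no-headers | awk '$3 == "CrashLoopBackOff" {print $1}' \
    | xargs -r kubectl delete pods -n "${ns}"
}
# Remove all containers in the Exited state on this node.
clear_exited_container() {
  docker ps -aq -f status=exited | xargs -r docker rm
}

echo "1.clear evicted pod"
echo "2.clear crash pod"
echo "3.clear exited container"
read -r -p "Please input num:" num

case ${num} in
"1")
  read -r -p "Please input oper namespace:" ns
  clear_evicted_pod "${ns}"
  ;;
"2")
  read -r -p "Please input oper namespace:" ns
  clear_crash_pod "${ns}"
  ;;
"3")
  clear_exited_container
  ;;
*)
  # An unquoted * is the catch-all pattern; a quoted "*" only matches a literal asterisk.
  echo "input error"
  ;;
esac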
  • Clean up Evicted/CrashLoopBackOff pods in all namespaces
# List all namespaces
kubectl get ns --no-headers | awk '{print $1}'

# Clean up evicted pods
for ns in $(kubectl get ns --no-headers | awk '{print $1}'); do kubectl get pods -n "${ns}" --no-headers | awk '$3 == "Evicted" {print $1}' | xargs -r kubectl delete pods -n "${ns}"; done

# Clean up pods in CrashLoopBackOff
for ns in $(kubectl get ns --no-headers | awk '{print $1}'); do kubectl get pods -n "${ns}" --no-headers | awk '$3 == "CrashLoopBackOff" {print $1}' | xargs -r kubectl delete pods -n "${ns}"; done
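
The per-namespace loops can also be collapsed into a single listing across all namespaces. The following is a minimal sketch under assumptions: it relies on `kubectl get pods -A` (short for `--all-namespaces` in reasonably recent kubectl versions), and the helper names `pods_with_status` and `delete_pods_with_status` are hypothetical.

```shell
#!/bin/bash
# Read `kubectl get pods -A --no-headers` output on stdin and print
# "<namespace> <pod>" for every pod whose STATUS column equals $1.
# With -A the columns are: NAMESPACE NAME READY STATUS RESTARTS AGE.
pods_with_status() {
  awk -v want="$1" '$4 == want {print $1, $2}'
}

# Delete every pod in the cluster that has the given status (hypothetical helper).
delete_pods_with_status() {
  kubectl get pods -A --no-headers | pods_with_status "$1" |
    while read -r ns pod; do
      kubectl delete pod -n "$ns" "$pod"
    done
}
```

Calling `delete_pods_with_status Evicted` would then cover every namespace in one pass. Matching on the STATUS column rather than grepping the whole line avoids touching pods that merely have the status string in their name.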

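3. Migrating Docker data

The Docker data directory was not specified during installation, and the system disk is only 50 GB. As the disk filled up over time, the Docker data had to be migrated, here by using a symbolic link.
First, mount the new disk at /data: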

systemctl stop docker

mkdir -p /data/docker/  

rsync -avz /var/lib/docker/ /data/docker/  

mv /var/lib/docker /data/docker_bak

ln -s /data/docker /var/lib/

systemctl daemon-reload

systemctl start docker
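
Before Docker is started again, it can be worth confirming that the link really resolves to the new location. A minimal sketch, assuming GNU `readlink -f` is available; the helper name `link_points_to` is hypothetical.

```shell
#!/bin/bash
# Succeed only if $1 is a symbolic link that resolves to the same path as $2.
# Hypothetical helper for checking the result of the migration.
link_points_to() {
  local link=$1 target=$2
  [ -L "$link" ] && [ "$(readlink -f "$link")" = "$(readlink -f "$target")" ]
}
```

For the layout above it would be called as `link_points_to /var/lib/docker /data/docker`; a zero exit status suggests the link is in place.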

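4. KubeSphere network troubleshooting

  • Problem description:

When a container is started manually on a KubeSphere node or master, it cannot reach the public network. Is something wrong in my configuration? The cluster originally used the default Calico, and switching to Flannel did not help either. Containers in pods created by a Deployment in KubeSphere can reach the public network, but containers started manually on a node or master cannot.

The manually started container's network goes through docker0: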

root@fd1b8101475d:/# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
105: eth0@if106: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default 
    link/ether 02:42:ac:11:00:02 brd ff:ff:ff:ff:ff:ff link-netnsid 0
    inet 172.17.0.2/16 brd 172.17.255.255 scope global eth0
       valid_lft forever preferred_lft forever

Containers in pods use kube-ipvs0 for networking:

1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop qlen 1
    link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if18: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1500 qdisc noqueue
    link/ether c2:27:44:13:df:5d brd ff:ff:ff:ff:ff:ff
    inet 10.233.97.175/32 scope global eth0
       valid_lft forever preferred_lft forever
  • Solution:

Check the Docker startup configuration.


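In /etc/systemd/system/docker.service.d/docker-options.conf, remove the --iptables=false option. When this option is set to false, Docker does not write iptables rules, so containers attached to docker0 get no NAT rule for outbound traffic. After the option is removed, the file looks like this: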

[Service]
Environment="DOCKER_OPTS=  --registry-mirror=https://registry.docker-cn.com --data-root=/var/lib/docker --log-opt max-size=10m --log-opt max-file=3 --insecure-registry=harbor.devops.kubesphere.local:30280"
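
The edit itself can be scripted. The sketch below is one possible approach, not the procedure used in the original troubleshooting: both helper names are hypothetical, and the first function only prints the cleaned content so the result can be reviewed before the file is overwritten.

```shell
#!/bin/bash
# Print the contents of drop-in file $1 with the --iptables=false option removed.
# Hypothetical helper; it writes to stdout and leaves the file untouched.
strip_iptables_flag() {
  sed 's/[[:space:]]*--iptables=false//g' "$1"
}

# Succeed if drop-in file $1 still contains --iptables=false.
has_iptables_disabled() {
  grep -q -- '--iptables=false' "$1"
}
```

Once the cleaned content has been written back, the change only takes effect after systemd reloads the unit and Docker restarts, for example with `systemctl daemon-reload && systemctl restart docker`.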

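5. KubeSphere application route (Ingress) issue

In KubeSphere, application routes (Ingress) are implemented with nginx. Configuring them through the web console causes two hosts to share the same CA certificate; this can be fixed by editing the Ingress configuration file directly so that each host has its own secretName: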

⚠️ Note: the original post showed the location of the ingress controller Deployment in a screenshot, which is not preserved here.

kind: Ingress
apiVersion: extensions/v1beta1
metadata:
  name: prod-app-ingress
  namespace: prod-net-route
  resourceVersion: '8631859'
  labels:
    app: prod-app-ingress
  annotations:
    desc: Application routes for the production environment
    nginx.ingress.kubernetes.io/client-body-buffer-size: 1024m
    nginx.ingress.kubernetes.io/proxy-body-size: 2048m
    nginx.ingress.kubernetes.io/proxy-read-timeout: '3600'
    nginx.ingress.kubernetes.io/proxy-send-timeout: '1800'
    nginx.ingress.kubernetes.io/service-upstream: 'true'
spec:
  tls:
    - hosts:
        - smartms.tools.anchnet.com
      secretName: smartms-ca
    - hosts:
        - smartsds.tools.anchnet.com
      secretName: smartsds-ca
  rules:
    - host: smartms.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartms-frontend-svc
              servicePort: 80
    - host: smartsds.tools.anchnet.com
      http:
        paths:
          - path: /
            backend:
              serviceName: smartsds-frontend-svc
              servicePort: 80
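
To check whether each host is actually served its own certificate, the certificate presented for a given SNI name can be inspected with openssl. This is a minimal sketch under assumptions: the function name and its optional address argument are hypothetical, and it assumes the ingress accepts TLS on port 443.

```shell
#!/bin/bash
# Print the subject of the certificate served for SNI host name $1.
# $2 optionally overrides the address to connect to (defaults to the host name).
served_cert_subject() {
  local host=$1 addr=${2:-$1}
  echo | openssl s_client -connect "${addr}:443" -servername "${host}" 2>/dev/null |
    openssl x509 -noout -subject
}
```

Calling it once for smartms.tools.anchnet.com and once for smartsds.tools.anchnet.com would be expected to print two different subjects if each host is bound to its own secret.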

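6. Updating the Jenkins agent in KubeSphere

Depending on their use case, users may need different language versions or different tool versions. This section describes how to replace the built-in agent.

The default builder-base image does not include the sonar-scanner tool. Every agent of KubeSphere Jenkins is a Pod, so replacing a built-in agent means replacing the corresponding agent image.

Build a new agent image based on the latest kubesphere/builder-base:advanced-1.0.0.

Update the agent to the custom image: ccr.ccs.tencentyun.com/testns/base:v1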

Reference: https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/devops-admin-faq/#%E5%8D%87%E7%BA%A7-jenkins-agent-%E7%9A%84%E5%8C%85%E7%89%88%E6%9C%AC


After modifying jenkins-casc-config in KubeSphere, you need to reload the updated system configuration on the configuration-as-code page under Manage Jenkins in the Jenkins Dashboard.

Reference:

https://kubesphere.io/docs/advanced-v2.0/zh-CN/devops/jenkins-setting/#%E7%99%BB%E9%99%86-jenkins-%E9%87%8D%E6%96%B0%E5%8A%A0%E8%BD%BD


Update the base image in Jenkins.

⚠️ First modify the Jenkins configuration in KubeSphere (jenkins-casc-config).
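
If jenkins-casc-config is exported as text, the image swap can be prepared with a small filter. The sketch below is hypothetical: the function name is invented for illustration, and it assumes the built-in image name appears verbatim in the exported configuration.

```shell
#!/bin/bash
# Replace every occurrence of image $1 with image $2 in the text read from stdin.
# Dots in $1 are escaped so they match literally; $2 is assumed to contain
# neither "|" nor "&".
swap_agent_image() {
  local old new
  old=$(printf '%s' "$1" | sed 's/\./\\./g')
  new=$2
  sed "s|${old}|${new}|g"
}
```

The input would typically be the jenkins-casc-config ConfigMap exported as YAML, filtered with `swap_agent_image kubesphere/builder-base:advanced-1.0.0 ccr.ccs.tencentyun.com/testns/base:v1`. Which namespace holds the ConfigMap depends on the installation, so that step is left out here.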

7. Sending mail in DevOps

Reference: https://www.cloudbees.com/blog/mail-step-jenkins-workflow

Built-in variables:

Variable Description
BUILD_NUMBER The current build number, such as "153"
BUILD_ID The current build ID, identical to BUILD_NUMBER for builds created in 1.597+, but a YYYY-MM-DD_hh-mm-ss timestamp for older builds
BUILD_DISPLAY_NAME The display name of the current build, which is something like "#153" by default.
JOB_NAME Name of the project of this build, such as "foo" or "foo/bar". (To strip off folder paths from a Bourne shell script, try: ${JOB_NAME##*/})
BUILD_TAG String of "jenkins-${JOB_NAME}-${BUILD_NUMBER}". Convenient to put into a resource file, a jar file, etc for easier identification.
EXECUTOR_NUMBER The unique number that identifies the current executor (among executors of the same machine) that’s carrying out this build. This is the number you see in the "build executor status", except that the number starts from 0, not 1.
NODE_NAME Name of the slave if the build is on a slave, or "master" if run on master
NODE_LABELS Whitespace-separated list of labels that the node is assigned.
WORKSPACE The absolute path of the directory assigned to the build as a workspace.
JENKINS_HOME The absolute path of the directory assigned on the master node for Jenkins to store data.
JENKINS_URL Full URL of Jenkins, like http://server:port/jenkins/ (note: only available if Jenkins URL set in system configuration)
BUILD_URL Full URL of this build, like http://server:port/jenkins/job/foo/15/ (Jenkins URL must be set)
SVN_REVISION Subversion revision number that's currently checked out to the workspace, such as "12345"
SVN_URL Subversion URL that's currently checked out to the workspace.
JOB_URL Full URL of this job, like http://server:port/jenkins/job/foo/ (Jenkins URL must be set)

In the end I wrote a template tailored to my own workload; it can be used as is.

mail to: 'xuel@net.com',
          charset:'UTF-8', // or GBK/GB18030
          mimeType:'text/plain', // or text/html
          subject: "Kubesphere ${env.JOB_NAME} [${env.BUILD_NUMBER}] release OK, Running Pipeline: ${currentBuild.fullDisplayName}",
          body: """
          ---------Anchnet Devops Kubesphere Pipeline job--------------------

          Project name : ${env.JOB_NAME}
          Build number : ${env.BUILD_NUMBER}
          Scan info : URL: ${SONAR_HOST}
          Image : ${REGISTRY}/${QHUB_NAMESPACE}/${APP_NAME}:${IMAGE_TAG}
          Build detail : SUCCESSFUL: Job ${env.JOB_NAME} [${env.BUILD_NUMBER}]
          Build status : ${env.JOB_NAME} jenkins release ran normally
          Build URL : ${env.BUILD_URL}"""
