Understanding OpenShift (1): Networking - Router and Route

Date: 2019-11-06

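Understanding OpenShift (1): Networking - Router and Route

Understanding OpenShift (2): Networking - DNS (Domain Name Service)

Understanding OpenShift (3): Networking - SDN

Understanding OpenShift (4): User and Permission Management

Understanding OpenShift (5): From Docker Volume to OpenShift Persistent Volume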

** This article is based on tests with OpenShift 3.11 and Kubernetes 1.11 **

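1. Why does OpenShift need Router and Route?

As the names suggest, a Router is a router and a Route is a route configured on that router. OpenShift uses these two concepts to meet the need to access services from outside the cluster (that is, from anywhere other than the cluster nodes). I do not know why OpenShift renamed the Kubernetes Ingress to Router; to me, Ingress is the more fitting name.

The following diagram sketches the two access paths to an application running in pods: from outside through the router, and from inside through a service: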

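In the diagram above, the three pods of an application run on node1, node2 and node3. OpenShift has three layers of IP addresses:

The pod's own IP address, comparable to a virtual machine's fixed IP in OpenStack. It is meaningful only inside the cluster.
The service's IP address. A Service usually has a ClusterIP, which is also a cluster-internal IP address.
The application's external IP address, comparable to a floating IP in OpenStack, or an IDC IP (mapped to the floating IP through NAT).

Therefore, there are essentially two ways to reach an application in a pod from outside the cluster:

One is to use a proxy that translates the external IP address into the backend pod IP addresses. This is the idea behind the OpenShift router/route. The router service in OpenShift is a cluster infrastructure service that runs on specific nodes (usually infrastructure nodes) and is created and managed by the cluster administrator. It can have multiple replicas (pods). A router can hold many routes; each route uses the domain name of an incoming external HTTP request to find its list of backend pods and forwards the traffic to them. In other words, it exposes the application in the pods under an external domain name so that users can reach it from outside by that name. This is in effect a layer-7 load balancer. OpenShift implements it with HAProxy by default, and also supports other implementations such as F5.
The other is to expose the service directly outside the cluster. This approach is explained in detail in the article on Service.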

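2. How does OpenShift use HAProxy to implement router and route?

2.1 Router deployment

When an OpenShift cluster is deployed with ansible using the default configuration, an HAProxy pod runs on the cluster's infra nodes in host networking mode and listens on ports 80 and 443 of all network interfaces.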

[root@infra-node3 cloud-user]# netstat -lntp | grep haproxy
tcp        0      0 127.0.0.1:10443         0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 127.0.0.1:10444         0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 0.0.0.0:80              0.0.0.0:*               LISTEN      583/haproxy         
tcp        0      0 0.0.0.0:443             0.0.0.0:*               LISTEN      583/haproxy

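Here, ports 10443 and 10444 on 127.0.0.1 are used by HAProxy itself, as explained later.

Consequently, each infra node can run only one HAProxy pod, because these ports can be bound only once. If the scheduler cannot find a node that meets the requirements, scheduling of the router service fails: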

0/7 nodes are available: 2 node(s) didn't have free ports for the requested pod ports, 5 node(s) didn't match node selector
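
The "free ports" failure follows from host networking: the pod binds the node's own ports, and a listening port can normally be bound only once per address. The following Python sketch illustrates that general socket behaviour on a single machine. It is not OpenShift code, and the function name is made up for this example.

```python
import socket


def second_bind_succeeds(host: str = "127.0.0.1") -> bool:
    """Bind a listening socket, then try to bind a second socket to the same port."""
    first = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        first.bind((host, 0))          # port 0 asks the OS for any free port
        first.listen(1)
        port = first.getsockname()[1]
        try:
            second.bind((host, port))  # same address and port as `first`
        except OSError:
            return False               # typically EADDRINUSE ("address already in use")
        return True
    finally:
        second.close()
        first.close()


if __name__ == "__main__":
    print("second bind succeeded:", second_bind_succeeds())
```

The scheduler applies the same constraint ahead of time: a node whose ports 80 and 443 are already taken by one router pod is rejected for a second one.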

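The OpenShift HAProxy Router supports two deployment modes:

The common one is a single router service with one or more instances (pods) spread across several nodes, handling external access to all services deployed in the cluster.
The other is a sharded deployment. Here there are multiple router services, each responsible for a designated set of projects, with labels mapping routers to projects. This was introduced to address the limited performance of a single router.

OpenShift provides the oc adm router command to create a router service.

Creating a router: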

[root@master1 cloud-user]# oc adm router router2 --replicas=1 --service-account=router
info: password for stats user admin has been set to J3YyPjlbqf
--> Creating router router2 ...
    warning: serviceaccounts "router" already exists
    clusterrolebinding.authorization.openshift.io "router-router2-role" created
    deploymentconfig.apps.openshift.io "router2" created
    service "router2" created
--> Success

Detailed deployment instructions are in the official documentation: https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html

2.2 The HAProxy process in the router pod

In each pod of the router service, the openshift-router process starts an haproxy process:

UID        PID  PPID  C STIME TTY          TIME CMD
1000000+     1     0  0 Nov21 ?        00:14:27 /usr/bin/openshift-router
1000000+ 16011     1  0 12:42 ?        00:00:00 /usr/sbin/haproxy -f /var/lib/haproxy/conf/haproxy.config -p /var/lib/haproxy/run/haproxy.pid -x /var/lib/haproxy/run/haproxy.sock -sf 16004

The configuration file used by haproxy (excerpt):

global
  maxconn 20000
  daemon
  ca-base /etc/ssl
  crt-base /etc/ssl
 。。。。  

defaults
  maxconn 20000

  # Add x-forwarded-for header.

  # server openshift_backend 127.0.0.1:8080
  errorfile 503 /var/lib/haproxy/conf/error-page-503.http

。。。
  timeout http-request 10s
  timeout http-keep-alive 300s

  # Long timeout for WebSocket connections.
  timeout tunnel 1h

frontend public
    
  bind :80
  mode http
  tcp-request inspect-delay 5s
  tcp-request content accept if HTTP
  monitor-uri /_______internal_router_healthz

  # Strip off Proxy headers to prevent HTTpoxy (https://httpoxy.org/)
  http-request del-header Proxy

  # DNS labels are case insensitive (RFC 4343), we need to convert the hostname into lowercase
  # before matching, or any requests containing uppercase characters will never match.
  http-request set-header Host %[req.hdr(Host),lower]

  # check if we need to redirect/force using https.
  acl secure_redirect base,map_reg(/var/lib/haproxy/conf/os_route_http_redirect.map) -m found
  redirect scheme https if secure_redirect

  use_backend %[base,map_reg(/var/lib/haproxy/conf/os_http_be.map)]

  default_backend openshift_default

# public ssl accepts all connections and isn't checking certificates yet certificates to use will be
# determined by the next backend in the chain which may be an app backend (passthrough termination) or a backend
# that terminates encryption in this router (edge)
frontend public_ssl
    
  bind :443
  tcp-request  inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }

  # if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
  # for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
  acl sni req.ssl_sni -m found
  acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found
  use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough

  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

  # non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
  # will not be able to match a cert to an SNI host
  default_backend be_no_sni

。。。

backend be_edge_http:demoprojectone:jenkins
  mode http
  option redispatch
  option forwardfor
  balance leastconn
  timeout server  4m

  timeout check 5000ms
  http-request set-header X-Forwarded-Host %[req.hdr(host)]
  http-request set-header X-Forwarded-Port %[dst_port]
  http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
  http-request set-header X-Forwarded-Proto https if { ssl_fc }
  http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }
  http-request add-header Forwarded for=%[src];host=%[req.hdr(host)];proto=%[req.hdr(X-Forwarded-Proto)];proto-version=%[req.hdr(X-Forwarded-Proto-Version)]
  cookie 4376ea64d7d0abf11209cfe5f7cca1e7 insert indirect nocache httponly secure
  server pod:jenkins-1-84nrt:jenkins:10.128.2.13:8080 10.128.2.13:8080 cookie 8669a19afc9f0fed6824feb9fb1cf4ac weight 256

。。。

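For simplicity, only part of the configuration file is shown above. It consists of three kinds of content:

Global configuration, such as the maximum number of connections (maxconn) and the timeouts.
Frontend configuration: by default HAProxy listens on ports 443 and 80 for external https and http requests respectively.
Backend configuration, one per service, which holds much of the key content, such as the backend protocol (mode), the load-balancing method (balance), the server list (server; here these are pods, with their IP addresses and ports), certificates, and so on.

The OpenShift router function therefore needs to manage and control all three parts.

For a detailed introduction to load balancers and HAProxy, see the article "Understanding Neutron (7): How Neutron implements load balancer virtualization".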

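2.3 Managing the global configuration

OpenShift offers two ways to set or change HAProxy's global configuration:

(1) The first is to pass parameters to the oc adm router command when creating the router; for example, --max-connections sets the maximum number of connections: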

oc adm router --max-connections=200000 --ports='81:80,444:443' router3

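The resulting HAProxy will have maxconn set to 200000. The router3 service exposes ports 81 and 444, but the HAProxy pod still uses ports 80 and 443.

(2) The second is to set environment variables on dc/<router dc name>.

The official documentation at https://docs.openshift.com/container-platform/3.4/architecture/core_concepts/routes.html#haproxy-template-router has the full list of environment variables. For example, after running the following command,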

 oc set env dc/router3 ROUTER_SERVICE_HTTPS_PORT=444 ROUTER_SERVICE_HTTP_PORT=81 STATS_PORT=1937

router3 is redeployed, and the newly deployed HAProxy listens on port 444 for https and port 81 for http, with the statistics port set to 1937.

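2.4 OpenShift passthrough routes and the HAProxy backend

(1) Create a route through the OpenShift Console or the oc command that exposes the jenkins service of the sit project under the domain name sitjenkins.com.cn:

Creating the route in the web console:

Result: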

Name:                   sitjenkins.com.cn
Namespace:              sit
Labels:                 app=jenkins-ephemeral
                        template=jenkins-ephemeral-template
Annotations:            <none>
Requested Host:         sitjenkins.com.cn
Path:                   <none>
TLS Termination:        passthrough
Endpoint Port:          web

Service:        jenkins
Weight:         100 (100%)
Endpoints:      10.128.2.15:8080, 10.131.0.10:8080

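Here the service name acts as an intermediary that connects the route to the service's endpoints (that is, the pods).

(2) The HAProxy configuration file in both pods of the router service gains a new backend: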

# Secure backend, pass through
backend be_tcp:sit:sitjenkins.com.cn
  balance source

  hash-type consistent
  timeout check 5000ms
  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 weight 256 check inter 5000ms
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 weight 256 check inter 5000ms

These backend servers are in fact pods, which OpenShift finds through the service name from step (1). balance is the load-balancing strategy, explained later.
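
To make the route -> service -> endpoints -> server-lines relationship concrete, here is a minimal Python sketch that rebuilds the backend stanza above from a route's namespace, name, service and endpoint list. It only mirrors the naming pattern visible in the excerpt. The Endpoint class and the render_passthrough_backend function are hypothetical helpers and say nothing about how the router actually generates its configuration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Endpoint:
    """One pod behind the service (hypothetical helper type for this sketch)."""
    pod_name: str
    ip: str
    port: int


def render_passthrough_backend(namespace: str, route_name: str,
                               service: str, endpoints: List[Endpoint]) -> str:
    """Render a be_tcp backend stanza in the shape shown in the excerpt."""
    lines = [
        "# Secure backend, pass through",
        f"backend be_tcp:{namespace}:{route_name}",
        "  balance source",
        "  hash-type consistent",
        "  timeout check 5000ms",
    ]
    for ep in endpoints:
        addr = f"{ep.ip}:{ep.port}"
        # One server line per pod endpoint of the service.
        lines.append(
            f"  server pod:{ep.pod_name}:{service}:{addr} {addr} "
            "weight 256 check inter 5000ms"
        )
    return "\n".join(lines)


if __name__ == "__main__":
    pods = [
        Endpoint("jenkins-1-bqhfj", "10.128.2.15", 8080),
        Endpoint("jenkins-1-h2fff", "10.131.0.10", 8080),
    ]
    print(render_passthrough_backend("sit", "sitjenkins.com.cn", "jenkins", pods))
```

When a pod is added to or removed from the service, only the server lines change; the backend name stays tied to the namespace and route name.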

(3) The file /var/lib/haproxy/conf/os_sni_passthrough.map gains a new entry:

sh-4.2$ cat /var/lib/haproxy/conf/os_sni_passthrough.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ 1

(4) The file /var/lib/haproxy/conf/os_tcp_be.map gains a new entry:

sh-4.2$ cat /var/lib/haproxy/conf/os_tcp_be.map
^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_tcp:sit:sitjenkins.com.cn

(5) Using the map files above, HAProxy selects the backend added in step (2) for this route with the following logic:

frontend public_ssl  # Note: the frontend protocol is https

  bind :443  ## frontend port 443
  tcp-request  inspect-delay 5s
  tcp-request content accept if { req_ssl_hello_type 1 }

  # if the connection is SNI and the route is a passthrough don't use the termination backend, just use the tcp backend
  # for the SNI case, we also need to compare it in case-insensitive mode (by converting it to lowercase) as RFC 4343 says
  acl sni req.ssl_sni -m found ## check that the https request carries SNI
  acl sni_passthrough req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_sni_passthrough.map) -m found ## check that the hostname passed via SNI is listed in os_sni_passthrough.map
  use_backend %[req.ssl_sni,lower,map_reg(/var/lib/haproxy/conf/os_tcp_be.map)] if sni sni_passthrough ## look up the backend name in os_tcp_be.map by the SNI hostname

  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

  # non SNI requests should enter a default termination backend rather than the custom cert SNI backend since it
  # will not be able to match a cert to an SNI host
  default_backend be_no_sni

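(6) The HAProxy process is restarted so that the modified configuration file takes effect.

Some background needed to understand the configuration in (5):

SNI: TLS Server Name Indication (SNI) is an extension of the TLS protocol in which the client tells the server, at the start of the TLS handshake, the hostname it is connecting to. The server can then return the certificate for that hostname, which lets one server support multiple hostnames with their own certificates. See https://en.wikipedia.org/wiki/Server_Name_Indication for details.
OpenShift passthrough route: the SSL connection of such a route is not terminated on the router; instead the router passes the TLS connection through to the backend. This is explained later.
HAProxy support for SNI: HAProxy selects a specific backend according to the hostname carried in the SNI information. See https://www.haproxy.com/blog/enhanced-ssl-load-balancing-with-server-name-indication-sni-tls-extension/ for details.
HAProxy ACL: see https://www.haproxy.com/documentation/aloha/10-0/traffic-management/lb-layer7/acls/ for details.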
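
The router can choose a backend without decrypting anything because the SNI hostname travels in clear text in the first handshake message, the ClientHello. The Python sketch below builds a ClientHello in memory with the standard ssl module (no network connection is made) and shows that the hostname is readable in the raw bytes. The byte offsets checked are those of a TLS record carrying a ClientHello, which is what the `req_ssl_hello_type 1` test looks at. The function name is made up for this example.

```python
import ssl


def client_hello_bytes(hostname: str) -> bytes:
    """Return the raw bytes a TLS client would send first for `hostname`."""
    ctx = ssl.create_default_context()
    incoming = ssl.MemoryBIO()   # data "from the server" (stays empty here)
    outgoing = ssl.MemoryBIO()   # data the client wants to send
    tls = ctx.wrap_bio(incoming, outgoing, server_hostname=hostname)
    try:
        tls.do_handshake()
    except ssl.SSLWantReadError:
        pass                     # expected: no server is attached to answer
    return outgoing.read()


if __name__ == "__main__":
    hello = client_hello_bytes("sitjenkins.com.cn")
    print("TLS record type is handshake (0x16):", hello[0] == 0x16)
    print("handshake message type is ClientHello (1):", hello[5] == 1)
    print("hostname visible in clear text:", b"sitjenkins.com.cn" in hello)
```

A proxy that only needs to route a passthrough connection can therefore read the hostname and forward the untouched byte stream to the chosen pod.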

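From the annotated comments above, we can see that HAProxy takes the domain name sitjenkins.com.cn passed in through SNI in the https request and looks up the backend name be_tcp:sit:sitjenkins.com.cn in the os_tcp_be.map file, which corresponds to the backend in step (2).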
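
The selection logic of the public_ssl frontend can be restated as a short Python sketch. The two map tables are copied from the map files in steps (3) and (4). The map_reg and choose_backend functions are hypothetical stand-ins for HAProxy's map_reg converter and its use_backend rules, written only to show the order of the checks.

```python
import re
from typing import List, Optional, Tuple

MapTable = List[Tuple[str, str]]

# Contents of the two map files shown in steps (3) and (4).
SNI_PASSTHROUGH_MAP: MapTable = [
    (r"^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$", "1"),
]
TCP_BE_MAP: MapTable = [
    (r"^sitjenkins\.com\.cn(:[0-9]+)?(/.*)?$", "be_tcp:sit:sitjenkins.com.cn"),
]


def map_reg(table: MapTable, key: str) -> Optional[str]:
    """Return the value of the first entry whose regex matches `key`."""
    for pattern, value in table:
        if re.search(pattern, key):
            return value
    return None


def choose_backend(sni: Optional[str]) -> str:
    """Mimic the rule order of the public_ssl frontend."""
    if not sni:
        return "be_no_sni"                    # default_backend: no SNI present
    host = sni.lower()                        # the `lower` converter
    if map_reg(SNI_PASSTHROUGH_MAP, host) is not None:
        backend = map_reg(TCP_BE_MAP, host)
        if backend is not None:
            return backend                    # passthrough: plain TCP backend
    return "be_sni"                           # SNI present, not passthrough


if __name__ == "__main__":
    for name in ("SitJenkins.com.cn", "edgejenkins.com.cn", None):
        print(name, "->", choose_backend(name))
```

Uppercase letters in the requested name do not matter, because the name is lowercased before either map is consulted.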

The HAProxy used by the OpenShift router routes and load-balances by domain name; see the official documentation for details.

2.5 OpenShift edge and re-encrypt routes and HAProxy

HAProxy frontend: the frontend still listens on port 443 for external HTTPS requests:

frontend public_ssl
  bind :443
.....
  # if the route is SNI and NOT passthrough enter the termination flow
  use_backend be_sni if sni

However, when the TLS termination type is not passthrough (that is, edge or re-encrypt), the backend be_sni is used.

backend be_sni
  server fe_sni 127.0.0.1:10444 weight 1 send-prox

This backend is served by 127.0.0.1:10444 on the local host, so the traffic goes on to the frontend fe_sni:

frontend fe_sni
  # terminate ssl on edge
  bind 127.0.0.1:10444 ssl no-sslv3 crt /var/lib/haproxy/router/certs/default.pem crt-list /var/lib/haproxy/conf/cert_config.map accept-proxy
  mode http
。。。。。。

  # map to backend
  # Search from most specific to general path (host case).
  # Note: If no match, haproxy uses the default_backend, no other
  #       use_backend directives below this will be processed.
  use_backend %[base,map_reg(/var/lib/haproxy/conf/os_edge_reencrypt_be.map)]

  default_backend openshift_default

The map file:

sh-4.2$ cat /var/lib/haproxy/conf/os_edge_reencrypt_be.map
^edgejenkins\.com\.cn(:[0-9]+)?(/.*)?$ be_edge_http:sit:jenkins-edge
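
Once fe_sni has terminated TLS, the lookup key is no longer the SNI name but HAProxy's `base` sample, which is the Host header followed by the request path without the query string. That is why the map pattern allows an optional port and an optional path. A minimal Python sketch of this second-stage lookup follows. The table is copied from the map file above, and the function names are made up for this example.

```python
import re
from typing import List, Tuple

# Content of os_edge_reencrypt_be.map as shown above.
EDGE_REENCRYPT_MAP: List[Tuple[str, str]] = [
    (r"^edgejenkins\.com\.cn(:[0-9]+)?(/.*)?$", "be_edge_http:sit:jenkins-edge"),
]


def base(host_header: str, request_target: str) -> str:
    """Host header plus path, with any query string dropped."""
    path = request_target.split("?", 1)[0]
    return host_header + path


def pick_edge_backend(host_header: str, request_target: str) -> str:
    """First matching map entry wins; otherwise fall back to the default backend."""
    key = base(host_header, request_target)
    for pattern, backend in EDGE_REENCRYPT_MAP:
        if re.search(pattern, key):
            return backend
    return "openshift_default"


if __name__ == "__main__":
    print(pick_edge_backend("edgejenkins.com.cn", "/login?from=%2F"))
    print(pick_edge_backend("other.example.com", "/"))
```

Because the first match wins, the order of entries in the map file matters when several routes share a host and differ only by path.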

HAProxy backend for an edge route:

backend be_edge_http:sit:jenkins-edge
  mode http
  option redispatch
  option forwardfor
  balance leastconn

  timeout check 5000ms
  .....
  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie 71c6bd03732fa7da2f1b497b1e4c7993 weight 256 check inter 5000ms
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie fa8d7fb72a46958a7add1406e6d26cc8 weight 256 check inter 5000ms

HAProxy backend for a re-encrypt route:

# Plain http backend or backend with TLS terminated at the edge or a
# secure backend with re-encryption.
backend be_secure:sit:reencryptjenkins.com.cn
  mode http
。。。。

http-request set-header X-Forwarded-Host %[req.hdr(host)]
http-request set-header X-Forwarded-Port %[dst_port]
http-request set-header X-Forwarded-Proto http if !{ ssl_fc }
http-request set-header X-Forwarded-Proto https if { ssl_fc }
http-request set-header X-Forwarded-Proto-Version h2 if { ssl_fc_alpn -i h2 }

  server pod:jenkins-1-bqhfj:jenkins:10.128.2.15:8080 10.128.2.15:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms # the link to the backend is encrypted with ssl, and the hostname is verified
  server pod:jenkins-1-h2fff:jenkins:10.131.0.10:8080 10.131.0.10:8080 cookie ... weight 256 ssl verifyhost jenkins.sit.svc verify required ca-file /var/run/secrets/kubernetes.io/serviceaccount/service-ca.crt check inter 5000ms

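Here we can see that the connection to the backend is encrypted again. The mode is still http rather than https because HAProxy's mode only distinguishes tcp from http; encryption toward the backend is turned on by the ssl keyword on the server lines.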

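2.6 Setting and modifying route configuration

The most important route settings are the following:

(1) SSL termination type. There are three:

edge: TLS is terminated on the router, and unencrypted traffic is then forwarded to the backend pods. A TLS certificate therefore has to be installed on the router; if none is installed, the router's default certificate is used.
passthrough: encrypted traffic is sent straight to the pod and the router does no TLS termination, so no certificate or key needs to be configured on the router.
Re-encryption: a variant of edge. The router first terminates TLS with one certificate, then encrypts again with another certificate and sends the traffic to the backend pod. The entire network path is therefore encrypted.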
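
The three termination types differ in just two properties: whether the router decrypts the traffic, and whether the router-to-pod leg is encrypted. The small Python table below restates the list in that form. The TlsPath type and its field names are illustrative only; the keys are the termination values a route uses.

```python
from typing import Dict, NamedTuple


class TlsPath(NamedTuple):
    router_terminates_tls: bool    # does the router decrypt (and need a certificate)?
    router_to_pod_encrypted: bool  # is traffic encrypted between router and pod?


TERMINATION: Dict[str, TlsPath] = {
    "edge":        TlsPath(router_terminates_tls=True,  router_to_pod_encrypted=False),
    "passthrough": TlsPath(router_terminates_tls=False, router_to_pod_encrypted=True),
    "reencrypt":   TlsPath(router_terminates_tls=True,  router_to_pod_encrypted=True),
}


def encrypted_all_the_way(termination: str) -> bool:
    """True when no hop between client and pod carries plain text."""
    return TERMINATION[termination].router_to_pod_encrypted


if __name__ == "__main__":
    for name, path in TERMINATION.items():
        print(name, path, "end-to-end:", encrypted_all_the_way(name))
```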

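Setting it:

The termination type can be set when the route is created, or changed later by editing the route's termination setting.
For details, see the official documentation: https://docs.okd.io/latest/architecture/networking/routes.html#edge-termination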

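(2) Load-balancing strategy. There are also three:

roundrobin: use all backends in turn according to their weights.
leastconn: send the request to the backend with the fewest connections.
source: hash the source IP so that requests from the same source IP go to the same backend.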
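
The three strategies can be compared side by side in a few lines of Python. This is a simplified sketch of the ideas only: HAProxy's own roundrobin interleaves weighted servers more smoothly, and its source hashing uses its own hash function. The Backend class and the function names are made up for this example.

```python
import hashlib
import itertools
from typing import Iterator, List


class Backend:
    """A backend server with a weight and a live connection count."""

    def __init__(self, name: str, weight: int = 1) -> None:
        self.name = name
        self.weight = weight
        self.active = 0   # current number of open connections


def roundrobin(backends: List[Backend]) -> Iterator[Backend]:
    """Cycle through the backends, repeating each one `weight` times per round."""
    expanded = [b for b in backends for _ in range(b.weight)]
    return itertools.cycle(expanded)


def leastconn(backends: List[Backend]) -> Backend:
    """Pick the backend with the fewest active connections."""
    return min(backends, key=lambda b: b.active)


def source(backends: List[Backend], client_ip: str) -> Backend:
    """Hash the client IP so the same client always lands on the same backend."""
    digest = hashlib.sha256(client_ip.encode("ascii")).digest()
    return backends[int.from_bytes(digest[:8], "big") % len(backends)]


if __name__ == "__main__":
    a, b = Backend("pod-a", weight=1), Backend("pod-b", weight=2)
    rr = roundrobin([a, b])
    print([next(rr).name for _ in range(6)])
    a.active, b.active = 3, 1
    print(leastconn([a, b]).name)
    print(source([a, b], "192.0.2.10").name)
```

leastconn suits long-lived connections, while source gives session stickiness without cookies, which is useful for passthrough routes where the router cannot see HTTP headers.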

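Setting it:

To change the strategy for a whole router, use the ROUTER_TCP_BALANCE_SCHEME environment variable to set the strategy for all passthrough routes of that router, and ROUTER_LOAD_BALANCE_ALGORITHM to set it for the other route types.
Use the haproxy.router.openshift.io/balance annotation to set the strategy for an individual route.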

Examples:

Set the environment variable for the whole router: oc set env dc/router ROUTER_TCP_BALANCE_SCHEME=roundrobin

After the change, the router instance is redeployed and all passthrough routes use roundrobin. The default is source.

Change the strategy of a single route: oc edit route aaaa.svc.cluster.local

After the edit, the balance value in the HAProxy backend for that route is changed to leastconn.
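
How the router-wide variables and the per-route setting combine can be sketched as a small resolution function in Python. Two points are assumptions of this sketch rather than statements from the text: that the per-route value takes precedence over the router-wide variables, and that leastconn is the fallback for non-passthrough routes (it is what the edge backend excerpt shows). The function name is made up.

```python
from typing import Dict

BALANCE_ANNOTATION = "haproxy.router.openshift.io/balance"


def effective_balance(termination: str,
                      router_env: Dict[str, str],
                      route_annotations: Dict[str, str]) -> str:
    """Resolve the balance strategy for one route."""
    per_route = route_annotations.get(BALANCE_ANNOTATION)
    if per_route:
        return per_route   # assumption: per-route setting wins
    if termination == "passthrough":
        # "source" is the default stated in the text for passthrough routes.
        return router_env.get("ROUTER_TCP_BALANCE_SCHEME", "source")
    # Assumption: "leastconn" fallback, as seen in the edge backend excerpt.
    return router_env.get("ROUTER_LOAD_BALANCE_ALGORITHM", "leastconn")


if __name__ == "__main__":
    env = {"ROUTER_TCP_BALANCE_SCHEME": "roundrobin"}
    print(effective_balance("passthrough", env, {}))
    print(effective_balance("edge", env, {}))
    print(effective_balance("passthrough", env, {BALANCE_ANNOTATION: "leastconn"}))
```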

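2.7 One route splitting traffic across several backend services

This feature is often used in development and test workflows, such as A/B testing.

In the configuration below, three versions of an application are deployed behind a single route, and each service is given a different weight.

Below is the backend section of the HAProxy configuration file, which uses the roundrobin load-balancing mode: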
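
To illustrate how per-service weights translate into traffic shares under roundrobin, here is a minimal Python sketch. The service names and weights are hypothetical and are not taken from the configuration discussed above.

```python
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, float]:
    """Fraction of requests each service receives, proportional to its weight."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one service needs a positive weight")
    return {service: weight / total for service, weight in weights.items()}


if __name__ == "__main__":
    # Hypothetical A/B/C split across three versions of one application.
    weights = {"app-v1": 60, "app-v2": 30, "app-v3": 10}
    for service, share in traffic_shares(weights).items():
        print(f"{service}: {share:.0%} of requests")
```

Setting a service's weight to 0 takes it out of rotation without removing it from the route, which is convenient when a test version has to be paused.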

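3. How does the OpenShift router service achieve high availability?

The OpenShift router service supports two high-availability modes.

3.1 A single router service with multiple replicas, made highly available with DNS/LB

In this mode only one router service is deployed, and it serves all externally exposed services of the cluster. For HA, set the number of replicas to more than 1 so that pods are created on more than one server, and then put DNS round-robin or a layer-4 load balancer in front of them.

Because the HAProxy in each router pod relies on a local configuration file, these are in effect stateful containers. OpenShift uses etcd as the single store for configuration; the openshift-router process presumably uses some mechanism (notification or periodic polling) to obtain the router and route configuration from etcd, then updates the local configuration file and restarts the HAProxy process to apply it. To understand how this works in depth, read the source code.

The services on the masters also need an LB (port 8443), and the router service needs an LB too (ports 80 and 443). So one option is to use two LBs: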

The other is to use a single LB for both the master services and the router service:

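3.2 High availability through multiple router services with sharding

In this mode the administrator creates and deploys several router services, each serving one or a few projects/namespaces. Labels map routers to projects/namespaces. For the configuration details, see the official documentation at https://docs.openshift.com/container-platform/3.11/install_config/router/default_haproxy_router.html. In fact, like the sharding features of some other products (such as MySQL and memcached), this feature mainly addresses performance and cannot fully solve the high-availability problem.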
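
The label-based mapping between router shards and projects can be sketched in Python as a plain selector match. The router names, namespace names and label keys below are hypothetical, and the sketch does not model how a real router is configured with its selector.

```python
from typing import Dict, List

Labels = Dict[str, str]


def selector_matches(selector: Labels, labels: Labels) -> bool:
    """A namespace matches when it carries every key/value pair of the selector."""
    return all(labels.get(key) == value for key, value in selector.items())


def assign_namespaces(routers: Dict[str, Labels],
                      namespaces: Dict[str, Labels]) -> Dict[str, List[str]]:
    """For each router shard, list the namespaces whose routes it would serve."""
    return {
        router: sorted(ns for ns, labels in namespaces.items()
                       if selector_matches(selector, labels))
        for router, selector in routers.items()
    }


if __name__ == "__main__":
    # Hypothetical shards and project labels.
    routers = {"router-a": {"router": "shard-a"}, "router-b": {"router": "shard-b"}}
    namespaces = {
        "sit": {"router": "shard-a"},
        "uat": {"router": "shard-b"},
        "demo": {"router": "shard-a", "team": "blue"},
        "unlabeled": {},
    }
    print(assign_namespaces(routers, namespaces))
```

The "unlabeled" namespace ends up on no shard, and if a shard's only router pod fails, its namespaces lose external access. Both cases show why sharding alone spreads load but does not give high availability.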