Nginx Reverse Proxy and Its High Availability with keepalived + LVS

Date: 2019-11-12


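Characteristics of a cluster

1. High performance
A server cluster only shows its advantage once the concurrency or total request volume exceeds what a single server can handle.
2. Cost-effectiveness
For the same performance target, a cluster of computers offers a better price/performance ratio than a single large machine of equivalent computing power.
3. Scalability
4. Manageability
5. Programmability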

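Linux clusters fall into the following main categories

Load balancing cluster       (Load Balance Cluster)                 LBC or LB
High availability cluster    (High Availability Cluster)            HAC
Scientific computing cluster (High Performance Computing Cluster)   HPC
Grid computing               (Grid Computing)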

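Load balancing clusters and high availability clusters

What a load balancing cluster (Load Balance Cluster) does
    1. Spreads user requests or data traffic across nodes
    2. Keeps the business running, i.e. 7x24 availability (downtime must stay low)
Typical uses: web services, database replicas, and other application services
Typical open-source software: lvs, nginx, haproxy (and lighttpd)

What a high availability cluster (High Availability Cluster) does (recommendation: where an LB cluster will do, prefer it over an HA cluster)
    1. When one server goes down, another takes over its IP and service resources
    2. Used between load balancers, and between primary databases or primary storage nodes
Typical open-source software: keepalived, heartbeat

Open-source cluster software commonly used by internet companies: nginx, lvs, haproxy, keepalived, heartbeat
Commercial cluster hardware commonly used by internet companies: F5, NetScaler (Citrix), Radware, A10 and others, which work in a way comparable to haproxy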

Hands-on: configuring an nginx reverse proxy

Install nginx on two new servers (steps omitted)
# cat nginx.conf.default | egrep -v "#|^$" >nginx.conf

The nginx configuration is as follows
worker_processes  1;
events {
    worker_connections  1024;
}
http {
    include       mime.types;
    default_type  application/octet-stream;
    sendfile        on;
    keepalive_timeout  65;
    upstream www.server_pools {          # the name www.server_pools is arbitrary
        server 192.168.0.82 weight=1;
        server 192.168.0.83 weight=1;
        # server 192.168.0.* weight=1 backup;    # hot standby for high availability
    }
    server {                             # proxy virtual host; proxy_pass hands requests to the upstream
        listen 80;
        server_name www.gtms.org;
        location / {
            proxy_pass http://www.server_pools;
        }
    }
}

upstream (ngx_http_upstream_module) http://nginx.org/en/docs/http/ngx_http_upstream_module.html
    The nginx load-balancing module: it defines which backend nodes exist and which scheduling algorithm is used
    Example Configuration
upstream backend {
    server backend1.example.com weight=5;
    server backend2.example.com:8080;
    server unix:/tmp/backend3;

    server backup1.example.com:8080   backup;
    server backup2.example.com:8080   backup;
}

server {
    location / {
        proxy_pass http://backend;
    }
}

http_proxy (ngx_http_proxy_module) http://nginx.org/en/docs/http/ngx_http_proxy_module.html
    Forwards requests; for example proxy_pass hands them to a previously defined upstream
    Example Configuration
location / {
    proxy_pass       http://localhost:8000;
    proxy_set_header Host $host;                      # pass the Host header of the client request to the backend server
    proxy_set_header X-Forwarded-For $remote_addr;    # pass the real client IP to the backend server
}

How to make a backend Apache log the real client IP behind nginx
    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{X-Forwarded-For}i\"" common


Test from a machine whose hosts file points www.gtms.org at the proxy
[root@node87 ~]# for i in `seq 1000`;do curl www.gtms.org;sleep 1;done
82www.gtms.org
83www.gtms.org
82www.gtms.org
83www.gtms.org
82www.gtms.org
83www.gtms.org
82www.gtms.org
83www.gtms.org
82www.gtms.org
83www.gtms.org
82www.gtms.org
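
The strictly alternating 82/83 output is what round robin produces when both servers have weight=1. As an illustration only, the Python sketch below implements a smooth weighted round robin, the general approach nginx is known to use for its default balancing. The function name smooth_wrr and the 3:1 example are assumptions made for this sketch and are not taken from the nginx source.

```python
from collections import Counter


def smooth_wrr(weights, n):
    """Return n picks using smooth weighted round robin.

    weights: dict mapping server name to a positive integer weight.
    """
    current = {server: 0 for server in weights}
    total = sum(weights.values())
    picks = []
    for _ in range(n):
        # Every server gains its own weight on each round.
        for server, weight in weights.items():
            current[server] += weight
        # The server with the highest running score is chosen
        # (ties go to the server listed first).
        chosen = max(current, key=current.get)
        # The chosen server pays back the total weight so others get a turn.
        current[chosen] -= total
        picks.append(chosen)
    return picks


equal = smooth_wrr({"192.168.0.82": 1, "192.168.0.83": 1}, 6)
skewed = smooth_wrr({"192.168.0.82": 3, "192.168.0.83": 1}, 8)
print(equal)            # alternates between the two servers
print(Counter(skewed))  # 6 picks for .82, 2 picks for .83
```

With weights 3 and 1 the picks are interleaved (82, 82, 83, 82, ...) rather than sent in bursts, which is the point of the "smooth" variant.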

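upstream (ngx_http_upstream_module) in detail

nginx load balancing depends on this module. The supported proxy methods include proxy_pass (for example in front of Java application servers), fastcgi_pass and memcached_pass, and newer versions add more; this article covers proxy_pass only.
ngx_http_upstream_module lets nginx define one or more groups of backend servers. Requests for the site are then sent to a group by naming that upstream group in proxy_pass.
The syntax is "proxy_pass http://www_server_pools", where www_server_pools is the name of an upstream server group.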

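Parameters of the server directive inside an upstream block
server 192.168.0.90       weight=5;
server backend2.example.com:8080;
        # A backend real server (RS), given as an IP address or a domain name. In high-concurrency setups a domain name can be used so that DNS adds another layer of load balancing. weight sets the weight, default 1.
server 127.0.0.1:8080       max_fails=3 fail_timeout=30s;
        # max_fails=3: number of failed attempts to reach the RS, within fail_timeout, after which it is considered unavailable. What counts as a failed attempt is defined by proxy_next_upstream, fastcgi_next_upstream, uwsgi_next_upstream, scgi_next_upstream and memcached_next_upstream.
        # When the RS returns one of the conditions listed in those directives (for example 404, 502, 503), nginx passes the request on to a healthy RS. The default is 1; 2-3 is suggested for production. The source cites JD.com using 1 and ChinaCache using 10.
server backup1.example.com  backup;
        # Hot standby (RS high availability): it only receives traffic once all active RS have failed. Note that backup cannot be combined with ip_hash, and before nginx 1.3.1/1.2.2 weight could not be used with ip_hash either. (haproxy can bring the backup in once a configurable number of servers are down.)
fail_timeout=30s;
        # The window in which max_fails failures must occur, and also how long the RS is then considered unavailable before it is tried again. Default 10s; 2-3s is common. The source cites 3s at both JD.com and ChinaCache.
server backup1.example.com  down;
        # down marks the RS as permanently unavailable; it can be used together with ip_hash

max_conns=number
        # Limits the number of simultaneous connections to a single RS to prevent overload (available in open-source nginx since 1.11.5)
route=string
        # Sets the route name of the server (commercial subscription only)
slow_start=time
        # Time over which a recovered RS ramps its weight from zero back to the nominal value (commercial subscription only)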


proxy_next_upstream health check


server {
    listen    80;
    server_name www.gtms.org;
    location / {
        proxy_pass http://static_pools;
        # When a request to a backend real server fails with one of these conditions, retry it on the next server
        proxy_next_upstream     http_500 http_502 http_503 http_504 error timeout invalid_header;
    }
}
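
How max_fails and fail_timeout work together with these retries can be pictured with a small Python model. This is a simplified sketch of passive health checking, not nginx's actual bookkeeping; the class PassiveHealth and its method names are hypothetical, and times are plain numbers of seconds supplied by the caller.

```python
class PassiveHealth:
    """Simplified model of max_fails / fail_timeout for one backend."""

    def __init__(self, max_fails=1, fail_timeout=10.0):
        self.max_fails = max_fails
        self.fail_timeout = fail_timeout
        self.fails = 0
        self.window_start = None  # time of the first failure in the current window
        self.down_until = 0.0     # the backend is skipped before this time

    def available(self, now):
        return now >= self.down_until

    def record_failure(self, now):
        if self.max_fails == 0:
            return  # max_fails=0 disables the accounting
        # Start a new counting window if the previous one has expired.
        if self.window_start is None or now - self.window_start > self.fail_timeout:
            self.window_start = now
            self.fails = 0
        self.fails += 1
        if self.fails >= self.max_fails:
            # Too many failures inside the window: skip the backend for fail_timeout.
            self.down_until = now + self.fail_timeout
            self.fails = 0
            self.window_start = None

    def record_success(self, now):
        self.fails = 0
        self.window_start = None


backend = PassiveHealth(max_fails=3, fail_timeout=30.0)
for moment in (0.0, 1.0, 2.0):
    backend.record_failure(moment)
print(backend.available(10.0))  # False: still inside the 30 s penalty
print(backend.available(32.0))  # True: eligible to be tried again
```

The model shows why a small fail_timeout brings a flapping backend back quickly, while a large max_fails tolerates more errors before the backend is skipped.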

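upstream scheduling algorithms

Static: rr, wrr, ip_hash. The load balancer distributes requests by its own fixed rules without looking at the state of the backend nodes.
ip_hash provides session persistence: the client IP is hashed so that requests from one IP always reach the same node, which works around the shared-session problem. When many users sit behind one NAT address, however, the load becomes uneven.
In addition, url_hash hashes the requested URL and is mainly used in front of web caches. Because the node is chosen by taking the hash modulo the number of caches, a failed or newly added cache forces the whole cache group to be remapped, and the load on backend storage jumps.
Consistent hashing (consistent_hash) addresses this: a change in the node set disturbs only a small share of keys, and the cache group stays balanced. nginx at the time had no built-in consistent hashing, and the Taobao fork Tengine was used for it; since nginx 1.7.2 the hash directive accepts a consistent parameter.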
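
The difference between modulo hashing and consistent hashing can be made concrete with a short Python comparison. This is a generic illustration of the two techniques, not the code of nginx or Tengine; the cache names, the URL pattern and the choice of 100 virtual points per node are assumptions made for the example.

```python
import hashlib
from bisect import bisect


def h(key):
    """Stable integer hash of a string."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


def modulo_pick(key, nodes):
    # url_hash style: hash modulo the number of caches
    return nodes[h(key) % len(nodes)]


class Ring:
    """Consistent hash ring with virtual points for each node."""

    def __init__(self, nodes, replicas=100):
        self.points = sorted(
            (h(f"{node}#{i}"), node) for node in nodes for i in range(replicas)
        )
        self.hashes = [point for point, _ in self.points]

    def pick(self, key):
        # First point clockwise from the key's hash, wrapping around at the end.
        idx = bisect(self.hashes, h(key)) % len(self.points)
        return self.points[idx][1]


def moved(before, after, keys):
    """Fraction of keys whose node changes between two pick functions."""
    return sum(1 for k in keys if before(k) != after(k)) / len(keys)


urls = [f"/img/{i}.png" for i in range(1000)]
three = ["cache1", "cache2", "cache3"]
two = ["cache1", "cache2"]  # cache3 has failed

ring3, ring2 = Ring(three), Ring(two)
print("modulo moved:", moved(lambda k: modulo_pick(k, three),
                             lambda k: modulo_pick(k, two), urls))
print("ring moved:  ", moved(ring3.pick, ring2.pick, urls))
```

On the ring, only the keys that lived on the failed cache are reassigned; with the modulo scheme most keys change node, which is the sudden storage load described above.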


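Dynamic: least_conn, fair
fair distributes requests according to backend response time; nodes that respond faster are preferred.
It is not built in; the third-party nginx module upstream_fair has to be downloaded and compiled in.
Example:
upstream server_pool {
    server 192.168.0.1;
    server 192.168.0.2;
    fair;
}
least_conn distributes according to the number of connections on each backend node; the node with the fewest connections gets the next request.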

http_proxy (ngx_http_proxy_module) in detail

     proxy_bind
     proxy_buffer_size
     proxy_buffering
     proxy_buffers
     proxy_busy_buffers_size
     proxy_cache
     proxy_cache_background_update
     proxy_cache_bypass
     proxy_cache_convert_head
     proxy_cache_key
     proxy_cache_lock
     proxy_cache_lock_age
     proxy_cache_lock_timeout
     proxy_cache_max_range_offset
     proxy_cache_methods
     proxy_cache_min_uses
     proxy_cache_path
     proxy_cache_purge
     proxy_cache_revalidate
     proxy_cache_use_stale
     proxy_cache_valid
     proxy_connect_timeout
     proxy_cookie_domain
     proxy_cookie_path
     proxy_force_ranges
     proxy_headers_hash_bucket_size
     proxy_headers_hash_max_size
     proxy_hide_header
     proxy_http_version
     proxy_ignore_client_abort
     proxy_ignore_headers
     proxy_intercept_errors
     proxy_limit_rate
     proxy_max_temp_file_size
     proxy_method
     proxy_next_upstream
     proxy_next_upstream_timeout
     proxy_next_upstream_tries
     proxy_no_cache
     proxy_pass
     proxy_pass_header
     proxy_pass_request_body
     proxy_pass_request_headers
     proxy_read_timeout
     proxy_redirect
     proxy_request_buffering
     proxy_send_lowat
     proxy_send_timeout
     proxy_set_body
     proxy_set_header
     proxy_ssl_certificate
     proxy_ssl_certificate_key
     proxy_ssl_ciphers
     proxy_ssl_crl
     proxy_ssl_name
     proxy_ssl_password_file
     proxy_ssl_server_name
     proxy_ssl_session_reuse
     proxy_ssl_protocols
     proxy_ssl_trusted_certificate
     proxy_ssl_verify
     proxy_ssl_verify_depth
     proxy_store
     proxy_store_access
     proxy_temp_file_write_size
     proxy_temp_path
     Embedded Variables

http://nginx.org/en/docs/http/ngx_http_proxy_module.html

Commonly used directives

proxy_pass
Sends the user's request to the upstream server pool defined for the reverse proxy, e.g. proxy_pass http://server_pools;
proxy_set_header
proxy_set_header Host $host;
Sets a request header that is passed to the backend, here the Host header of the original request. With several virtual hosts on the backend, omitting it may return a page from the wrong virtual host.
proxy_set_header X-Forwarded-For $remote_addr;
Adds an X-Forwarded-For field to the request sent to the backend so that backend programs and logs record the real client IP rather than the proxy's IP.
proxy_connect_timeout
Defines a timeout for establishing a connection with a proxied server. It should be noted that this timeout cannot usually exceed 75 seconds.
Timeout for connecting to the backend, i.e. how long to wait for the handshake to be answered.
proxy_send_timeout
Sets a timeout for transmitting a request to the proxied server. The timeout is set only between two successive write operations, not for the transmission of the whole request.
If the proxied server does not receive anything within this time, the connection is closed.
Timeout for sending the request to the RS. It applies between two successive write operations, not to the whole transfer.
proxy_read_timeout
Defines a timeout for reading a response from the proxied server. The timeout is set only between two successive read operations, not for the transmission of the whole response.
If the proxied server does not transmit anything within this time, the connection is closed.
Timeout for reading the response from the RS once the connection is established. In practice it bounds how long a request may wait in the backend's queue before any data comes back.
proxy_buffer_size
Sets the size of the buffer used for reading the first part of the response received from the proxied server.
This part usually contains a small response header. By default, the buffer size is equal to one memory page.
This is either 4K or 8K, depending on a platform. It can be made smaller, however.
Size of the buffer for the first part of the response, which usually holds the backend's response headers.
proxy_buffers
Sets the number and size of the buffers used for reading a response from the proxied server, for a single connection.
By default, the buffer size is equal to one memory page. This is either 4K or 8K, depending on a platform.
Number and size of the buffers per connection.
proxy_busy_buffers_size
When buffering of responses from the proxied server is enabled, limits the total size of buffers that can be busy sending a response to the client while the response is not yet fully read.
In the meantime, the rest of the buffers can be used for reading the response and, if needed, buffering part of the response to a temporary file.
By default, size is limited by the size of two buffers set by the proxy_buffer_size and proxy_buffers directives.
Size of the buffers that may be busy sending to the client while the response is still being read; the source recommends twice the size of one proxy_buffers buffer.
proxy_temp_file_write_size
Limits the size of data written to a temporary file at a time, when buffering of responses from the proxied server to temporary files is enabled.
By default, size is limited by two buffers set by the proxy_buffer_size and proxy_buffers directives.
The maximum size of a temporary file is set by the proxy_max_temp_file_size directive.
Limits how much data is written to a proxy temporary file at a time.

The parameters can be loaded into the conf file with include
upstream www_server_pools {
    server 192.168.0.82 weight=3;
    server 192.168.0.83;
}
server {
    listen 80;
    server_name www.gtms.org;
    location / {
        proxy_pass http://www_server_pools;
        include proxy.conf;    # header and timeout settings live in proxy.conf
    }
}
# cat proxy.conf
proxy_set_header Host $host;
proxy_set_header X-Forwarded-For $remote_addr;
proxy_connect_timeout 60;
proxy_send_timeout 60;
proxy_read_timeout 60;
proxy_buffer_size 4k;
proxy_buffers 4 32k;
proxy_busy_buffers_size 64k;
proxy_temp_file_write_size 64k;

The proxy_pass directive

Reference: http://nginx.org/en/docs/http/ngx_http_proxy_module.html#proxy_pass
Some usage examples
1. Pass requests whose URI matches /name/ to http://127.0.0.1/remote/
location /name/ {
    proxy_pass http://127.0.0.1/remote/;    # the trailing slash matters
}
2. Pass requests whose URI matches /some/path/ to http://127.0.0.1/
location /some/path/ {
    proxy_pass http://127.0.0.1/;
}
3. Apply a rewrite rule to requests whose URI matches /name/, then pass them to http://127.0.0.1
location /name/ {
    rewrite    /name/([^/]+) /users?name=$1 break;
    proxy_pass http://127.0.0.1;
}
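
Why the URI part of proxy_pass (and its trailing slash) matters can be sketched in Python. The helper upstream_uri is hypothetical and deliberately simplified: it models only prefix locations without rewrite rules, and it ignores URI normalisation, query strings and variables.

```python
def upstream_uri(location_prefix, proxy_pass_url, request_uri):
    """Return the URL a prefix location would request upstream (simplified)."""
    scheme, rest = proxy_pass_url.split("://", 1)
    host, slash, path = rest.partition("/")
    if not slash:
        # No URI part in proxy_pass: the request URI is forwarded unchanged.
        return f"{scheme}://{host}{request_uri}"
    # URI part present: the matched location prefix is replaced by it.
    remainder = request_uri[len(location_prefix):]
    return f"{scheme}://{host}/{path}{remainder}"


print(upstream_uri("/name/", "http://127.0.0.1/remote/", "/name/a.html"))
print(upstream_uri("/some/path/", "http://127.0.0.1/", "/some/path/x"))
print(upstream_uri("/some/path/", "http://127.0.0.1", "/some/path/x"))
```

The last two calls differ only in the trailing slash: with it the /some/path/ prefix is stripped, without it the original URI reaches the backend untouched.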

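Case study
(The application cannot be split up (the domain name cannot be separated), so upstream groups are used to separate static and dynamic content)
Requests for url/upload/xx are handled by the upload server pool
Requests for url/static/xx are handled by the static server pool
All other requests go to the default dynamic server pool

Option 1: location blocks matching the directory
    upstream static_pools  { server ip:80 weight=1; }    # static_pools is the static server pool: one server at address ip, port 80
    upstream upload_pools  { server ip:80 weight=1; }    # upload_pools is the upload server pool: one server at address ip, port 80
    upstream default_pools { server ip:80 weight=1; }    # default_pools is the default dynamic server pool: one server at address ip, port 80

server {
    listen    80;
    server_name    www.gtms.org;

    location / {
        proxy_pass http://default_pools;    # default: the dynamic pool, with the most complete feature set
        include proxy.conf;
    }
    location /static/ {
        proxy_pass http://static_pools;     # matches static: go to the static servers
        include proxy.conf;
    }
    location /upload/ {
        proxy_pass http://upload_pools;     # matches upload: go to the upload servers
        include proxy.conf;
    }
}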

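Option 2: if statements matching the directory
location / {
    # proxy_pass is only valid inside a location (or an if within a location);
    # /$1 forwards the path without the /static/ or /upload/ prefix
    if ($request_uri  ~*  "^/static/(.*)$")  { proxy_pass http://static_pools/$1; }
    if ($request_uri  ~*  "^/upload/(.*)$")  { proxy_pass http://upload_pools/$1; }
    proxy_pass http://default_pools;
    include proxy.conf;
}
When to use this
Sometimes a company wants to serve a product under a single domain name rather than several. The proxy then needs rules that hand requests matching different patterns to different RS pools. Typical cases:
1. The business domain has not been split, or should not be, but static/dynamic separation or separation of several services is still wanted
2. Different client devices (for example phones and PCs) reach the same site through the same domain name, and their requests should go to different backend servers for the best user experience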


Choosing a server pool by browser with $http_user_agent
location / {
    if ($http_user_agent ~* "MSIE")   { proxy_pass http://static_pools; }
    if ($http_user_agent ~* "Chrome") { proxy_pass http://upload_pools; }
    proxy_pass http://default_pools;
}
Choosing a server pool by client device with $http_user_agent
location / {
    if ($http_user_agent ~* "android") { proxy_pass http://android_pools; }
    if ($http_user_agent ~* "iphone")  { proxy_pass http://iphone_pools; }
    proxy_pass http://pc_pools;    # default pool
    include extra/proxy.conf;
}
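
The routing decision made by these if blocks can be mirrored in a few lines of Python, which is handy for checking patterns against real User-Agent strings before touching the proxy. This is only a sketch: the pool names follow the example above (with iphone_pools spelled out), the sample User-Agent strings are made up, and it assumes that when more than one if matches, the last matching block decides.

```python
import re

# (pattern, pool) pairs in the order the if blocks appear in the location.
RULES = [
    (re.compile(r"android", re.IGNORECASE), "android_pools"),
    (re.compile(r"iphone", re.IGNORECASE), "iphone_pools"),
]


def pick_pool(user_agent, rules=RULES, default="pc_pools"):
    """Return the pool a request with this User-Agent would be sent to."""
    pool = default
    for pattern, candidate in rules:
        if pattern.search(user_agent):  # ~* is a case-insensitive regex match
            pool = candidate            # a later matching rule overrides an earlier one
    return pool


print(pick_pool("Mozilla/5.0 (Linux; Android 10; Pixel 3)"))
print(pick_pool("Mozilla/5.0 (iPhone; CPU iPhone OS 13_0 like Mac OS X)"))
print(pick_pool("Mozilla/5.0 (Windows NT 10.0; Win64; x64)"))
```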

Proxying by file extension
With location blocks
location ~ .*\.(gif|jpg|jpeg|png|bmp|swf|css|js)$ {
    proxy_pass http://static_pools;
    include proxy.conf;
}
location ~ .*\.(php|php3|php5)$ {
    proxy_pass http://dynamic_pools;
    include proxy.conf;
}
With if statements (placed inside a location block)
if ($request_uri  ~*  ".*\.(php|php5)$") {
    proxy_pass http://php_server_pools;
}
if ($request_uri  ~*  ".*\.(jsp|jsp*|do|do*)$") {
    proxy_pass http://java_server_pools;
}

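Production kernel parameters for an nginx server

See also general Linux server kernel tuning
The following tuning suits web applications such as apache, nginx and squid; special workloads may need further adjustment
Kernel tuning here means adjusting kernel parameters on Linux for the applications a server runs. There is no single standard; the settings below are a common production example, offered for reference
Add to /etc/sysctl.conf and apply with sysctl -p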
net.ipv4.tcp_fin_timeout = 2
net.ipv4.tcp_tw_reuse = 1
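# Note: tcp_tw_recycle breaks clients behind NAT and was removed in Linux 4.12; leave it out on newer kernels
net.ipv4.tcp_tw_recycle = 1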
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_keepalive_time = 600
net.ipv4.ip_local_port_range = 4000    65000
net.ipv4.tcp_max_syn_backlog = 16384
net.ipv4.tcp_max_tw_buckets = 36000
net.ipv4.route.gc_timeout = 100
net.ipv4.tcp_syn_retries = 1
net.ipv4.tcp_synack_retries = 1
net.core.somaxconn = 16384
net.core.netdev_max_backlog = 16384
net.ipv4.tcp_max_orphans = 16384
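# The following parameters tune iptables connection tracking. If the firewall modules are not loaded, sysctl -p reports errors for them, which can be ignored.
# (Newer kernels name these parameters net.netfilter.nf_conntrack_*.)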
net.ipv4.ip_conntrack_max = 25000000
net.ipv4.netfilter.ip_conntrack_max=25000000
net.ipv4.netfilter.ip_conntrack_tcp_timeout_established=180
net.ipv4.netfilter.ip_conntrack_tcp_timeout_time_wait=120
net.ipv4.netfilter.ip_conntrack_tcp_timeout_close_wait=60
net.ipv4.netfilter.ip_conntrack_tcp_timeout_fin_wait=120

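Keepalived high availability cluster

Introduction to Keepalived (www.keepalived.org)

Keepalived was originally written for LVS, to monitor the state of each service node in an LVS cluster. VRRP support was added later, so besides working with LVS it can also serve as high availability software for other services (nginx, haproxy).
VRRP stands for Virtual Router Redundancy Protocol. It was created to remove the single point of failure of static routing and keeps the network running stably without interruption.
Keepalived therefore provides both LVS cluster node healthchecks and LVS director failover.

The VRRP protocol

VRRP (Virtual Router Redundancy Protocol) was created to remove the single point of failure of static routing.
VRRP uses an election mechanism to hand the routing task to one of the VRRP routers.
VRRP communicates by IP multicast.
The master sends advertisements and the backups receive them. When a backup stops receiving the master's advertisements it starts the takeover and claims the master's resources. There can be several backups; the election is decided by priority.
VRRP advertisements can carry authentication data (keepalived's auth_type PASS or AH).

The two main uses of Keepalived: healthcheck and failover

1. LVS director failover
    HA failover: automatic failover and switchover between the LB master host and the backup host.
    It applies to a pair of load balancers (directors) deployed together.
    When the master load balancer fails, the backup load balancer automatically takes over all of its work (the VIP and the related services).
    Once the master is repaired it takes its work back, and the backup releases what it took over while the master was down; both return to their original roles.
2. LVS cluster node healthchecks
    a. LVS can be configured entirely from keepalived.conf
    b. keepalived can health-check the cluster nodes behind LVS
    RS healthcheck: the load balancer periodically checks whether each RS is available and decides whether to send it requests.
    When one or even several real servers of a virtual service fail at the same time, the load balancer removes the failed RS from the forwarding queue so that user access is not affected.
    When a failed RS is repaired, it is automatically added back to the forwarding queue and receives requests again.

How keepalived works

The members of a keepalived HA pair talk to each other over VRRP, which decides master and backup by election.
The master has a higher priority than the backup, so in normal operation it holds all resources while the backup waits. When the master fails, the backup takes over its resources and serves clients in its place.
VRRP advertisements are sent as IP multicast packets (224.0.0.18).
Only the current master keeps sending VRRP advertisements, telling the backup that it is still alive; as long as they arrive, the backup does not take over.
When the master becomes unavailable, i.e. the backup no longer hears its advertisements, the backup starts the relevant services and takes over the resources to keep the business running. According to the source, takeover can complete in under 1 second. The advertisements can be authenticated with the configured auth_type.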
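
The election and the backup's waiting time can be sketched in Python. The timer follows the formulas in the VRRPv2 specification (RFC 3768): Skew_Time = (256 - priority) / 256 seconds and Master_Down_Interval = 3 x advert_int + Skew_Time. The node names and the dictionary layout are assumptions made for this example, ties between equal priorities are ignored, and keepalived's real state machine has more cases than this.

```python
def master_down_interval(priority, advert_int=1.0):
    """Seconds a backup waits without advertisements before taking over."""
    skew_time = (256 - priority) / 256
    return 3 * advert_int + skew_time


def elect(nodes):
    """Pick the master among live nodes.

    nodes: dict mapping node name to a (priority, alive) tuple.
    Returns the name of the live node with the highest priority, or None.
    """
    alive = {name: prio for name, (prio, ok) in nodes.items() if ok}
    return max(alive, key=alive.get) if alive else None


pair = {"proxy1": (100, True), "proxy2": (50, True)}
print(elect(pair))                 # proxy1 holds the VIP
pair["proxy1"] = (100, False)      # keepalived stops on proxy1
print(elect(pair))                 # proxy2 takes over
print(master_down_interval(50))    # how long proxy2 waits first
```

With advert_int 1 and priority 50 the backup waits a little under four seconds, so sub-second takeover would need a shorter advertisement interval.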

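Installing and configuring keepalived

Install it on both nginx proxies for high availability: when one goes down, the standby takes over

# ln -s /usr/src/kernels/2.6.32-573.el6.x86_64/ /usr/src/linux
If the directory does not exist, install it with yum install kernel-devel -y. It is not needed yet; it is used later when configuring LVS, which is managed through the kernel

# yum install openssl-devel -y
# tar -xzvf keepalived-1.2.20.tar.gz
# cd keepalived-1.2.20
# ./configure
# make && make install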
Keepalived configuration
------------------------
Keepalived version         : 1.2.22
Compiler                 : gcc
Compiler flags             : -g -O2
Extra Lib                  : -lssl -lcrypto -lcrypt 
Use IPVS Framework        : Yes     ==> LVS feature
IPVS sync daemon support   : Yes       ==> LVS feature
IPVS use libnl              : No
fwmark socket support      : Yes
Use VRRP Framework       : Yes
Use VRRP VMAC            : Yes     ==> VRRP feature
Use VRRP authentication    : Yes
SNMP keepalived support    : No
SNMP checker support     : No
SNMP RFCv2 support       : No
SNMP RFCv3 support       : No
SHA1 support             : No
Use Debug flags          : No
libnl version            : None
Use IPv4 devconf         : No
Use libiptc              : No
Use libipset             : No

Set up the standard startup files
# /bin/cp /usr/local/etc/rc.d/init.d/keepalived /etc/init.d/     ==> install the init script
# /bin/cp /usr/local/etc/sysconfig/keepalived /etc/sysconfig/    ==> options for the init script (Options for keepalived)
# mkdir /etc/keepalived                                 ==> create the default configuration directory
# /bin/cp /usr/local/etc/keepalived/keepalived.conf  /etc/keepalived    ==> copy the template conf file there
# /bin/cp /usr/local/sbin/keepalived /usr/sbin

#/etc/init.d/keepalived            
Usage: /etc/init.d/keepalived {start|stop|reload|restart|condrestart|status}

# ps -ef | grep keepalived
root       8219      1  0 21:43 ?        00:00:00 keepalived -D
root       8221   8219  0 21:43 ?        00:00:00 keepalived -D
root       8222   8219  0 21:43 ?        00:00:00 keepalived -D

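# vi /etc/keepalived/keepalived.conf        # see man keepalived.conf for help on the configuration file
  1 ! Configuration File for keepalived
  2 
  3 global_defs {
  4    notification_email {
  5      acassen@firewall.loc
  6      failover@firewall.loc
  7      sysadmin@firewall.loc
  8    }
  9    notification_email_from Alexandre.Cassen@firewall.loc
 10    smtp_server 192.168.200.1
 11    smtp_connect_timeout 30     # lines 1 to 11 are rarely changed in practice
 12    router_id LVS_01            # LVS_02 on the other node; comparable to MySQL's server-id
 13    vrrp_skip_check_adv_addr
 14    vrrp_strict
 15    vrrp_garp_interval 0
 16    vrrp_gna_interval 0
 17 }
 18 (apart from lines 20 and 23, the section below is identical on both keepalived nodes)
 19 vrrp_instance VI_1 {          # name of the VRRP instance; for an active/active setup, copy this block and change it (the source puts the limit at roughly 20 instances)
 20     state MASTER              # state BACKUP on the other node
 21     interface eth0
 22     virtual_router_id 51      # virtual router ID; keep it identical on both nodes or split brain results; change it only for the extra instance of an active/active setup
 23     priority 100              # priority 50 on the other node (a gap of 50 is recommended; the higher priority takes the resources)
 24     advert_int 1              # advertisement interval of 1 second; the peer takes over when advertisements stop arriving
 25     authentication {
 26         auth_type PASS
 27         auth_pass 1111
 28     }
 29     virtual_ipaddress {       # VIP configuration
 30         192.168.0.222/24
 31     }
 32 }
The rest of the file is LVS configuration and can be deleted.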


#/etc/init.d/keepalived start
Starting keepalived:                                       [  OK  ]
#ip addr | grep 192           
    inet 192.168.0.84/24 brd 192.168.0.255 scope global eth0
    inet 192.168.0.222/24 scope global secondary eth0
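==> When keepalived on the master is stopped, the IP 192.168.0.222 floats to the BACKUP automatically. If both nodes hold it at the same time, the pair is in split brain
keepalived adds the address the same way as ip addr add 192.168.0.222 dev eth0, so ifconfig does not show it; check with ip addr

Test
== The nginx proxy serves through the VIP 192.168.0.222. When keepalived on the master proxy is stopped, the VIP quickly floats to the standby, which makes the nginx proxy service highly available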
[root@node87 ~]# for i in `seq 100`;do curl 192.168.0.222;date +%s;sleep 1;done
83www.gtms.org
1486551965
82www.gtms.org
1486551966        # gap while switching over
83www.gtms.org
1486551970
82www.gtms.org
1486551971
83www.gtms.org
1486551972
82www.gtms.org


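keepalived takes over at the server level; if only the nginx service dies, no takeover happens. The script below runs in the background and watches the nginx processes; when none are left it stops keepalived. Note the script name: it must not contain the string nginx, or the script would count itself.
# cat check_web.sh
#!/bin/sh
while true
do
    if [ `ps -ef | grep nginx | grep -v grep | wc -l` -lt 2 ]
    then
        /etc/init.d/keepalived stop
    fi
    sleep 5
done

Script for the standby that checks for split brain: if the master answers ping while the standby holds the VIP, split brain is assumed
# cat check_split_brain.sh
#!/bin/sh
while true
do
    ping -c 2 -W 3 10.0.0.7 >/dev/null 2>&1                  # real IP of the master
    if [ $? -eq 0 -a `ip add|grep 10.0.0.17|wc -l` -eq 1 ]   # VIP
    then
        echo "ha is split brain.warning."
    else
        echo "ha is ok"
    fi
    sleep 5
done

keepalived logging
By default keepalived logs to /var/log/messages
# vi /etc/sysconfig/keepalived
KEEPALIVED_OPTIONS="-D"       # change to ==> KEEPALIVED_OPTIONS="-D -S 0 -d"    # -S 0 selects syslog facility local0
# vi /etc/rsyslog.conf        # add one line
local0.*        /var/log/keepalived.log
# /etc/init.d/rsyslog  restart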
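LVS load balancing

Introduction to LVS (Linux Virtual Server)

The project was founded in May 1998 by Dr. Wensong Zhang (章文嵩)
and is one of the earliest free software projects in China.

LVS project overview    http://www.linuxvirtualserver.org/zh/lvs1.html
LVS cluster architecture    http://www.linuxvirtualserver.org/zh/lvs2.html
IP load balancing techniques in LVS clusters    http://www.linuxvirtualserver.org/zh/lvs3.html
Load scheduling in LVS clusters    http://www.linuxvirtualserver.org/zh/lvs4.html
History of IPVS (lvs)
    IPVS already existed as a kernel patch in the days of the 2.2 kernel
    From version 2.4.23 on, IPVS was a collection of patches merged into the commonly used Linux kernel releases.
    After 2.4.24, IPVS became part of the official standard Linux kernel.

Ways to manage IPVS:
    IPVS     the module that does the scheduling; it works in the kernel, so it cannot be configured directly when setting up LVS
    ipvsadm  the tool for managing IPVS; IPVS can also be managed through the keepalived configuration file


LVS key points:
    1. The component that actually performs load scheduling is IPVS, which works inside the Linux kernel.
    2. ipvsadm is the IPVS management tool that ships with LVS.
    3. keepalived manages IPVS and makes the load balancer itself highly available.
    4. Red Hat's Piranha offers a web interface for managing IPVS.

In an LVS cluster the load balancer accepts all incoming client requests for the service and decides, according to the scheduling algorithm, which cluster node should handle each request.
The load balancer (LB) is sometimes also called the LVS Director (Director).

Terminology

VIP    virtual IP address     the address the Director uses to offer the service to client computers
RIP    real IP address        the address used on the cluster nodes behind the Director; a physical IP address
DIP    Director's IP          the address the Director uses to connect to the internal and external networks; bound to the physical NIC of the load balancer
CIP    client IP address      the address of the client requesting the cluster service; it is the source IP of requests sent to the cluster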
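The four LVS modes

NAT (Network Address Translation)
DR (Direct Routing) ***** the mode internet companies use most
TUN (IP Tunneling)
FULLNAT (Full Network Address Translation)

DR mode: direct routing

Direct Routing (VS/DR)
VS/DR rewrites the destination MAC address of the request packet to send the request to a real server, and the real server returns its response directly to the client.
Like VS/TUN, VS/DR greatly improves the scalability of the cluster. DR mode has no IP tunnel overhead, and the real servers do not need to support an IP tunneling protocol,
but the scheduler LB and the real servers RS must each have a NIC on the same physical network segment, i.e. they must share one LAN.

1. Forwarding is done by rewriting the destination MAC address of the packet on the LB. Note that the source IP is still the CIP and the destination IP is still the VIP.
2. Requests pass through the scheduler, but the responses from the RS do not, so DR is very efficient under high concurrency (compared with NAT mode).
3. Because DR forwards by rewriting MAC addresses, all RS nodes and the LB must be in the same LAN (a minor drawback).
4. Pay attention to binding the VIP on the RS nodes (lo:vip/32, lo1:vip/32) and to ARP suppression.
5. The default gateway of an RS does not need to be the LB's DIP; it is simply the upstream router assigned by the IDC (when the RS has a public IP). In theory the RS only needs outbound connectivity and a public IP is not strictly required (though it is recommended).
6. Because the DR scheduler only rewrites the destination MAC address, the LB cannot change the destination port of a request (unlike NAT).
7. The LB can currently run on almost any UNIX or Linux system but not on Windows. The RS nodes may run Windows.
8. Overall DR mode is very efficient but more awkward to configure, so companies without very heavy traffic can use haproxy/nginx instead. This fits the operations principle: simple, easy to use, efficient.
Reference: with 10 to 20 million page views per day, or fewer than 10,000 concurrent requests, haproxy/nginx (or LVS NAT mode) is worth considering
9. For services exposed directly to the outside, such as web servers acting as RS nodes, public IPs on the RS are preferable. For services that are not, such as MySQL or storage RS nodes, use internal IPs only.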

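NAT mode

Through network address translation the scheduler rewrites the destination address of the request packet and, following the configured scheduling algorithm, dispatches the request to a backend real server.
When the real server's response passes back through the scheduler, the source address of the packet is rewritten before it is returned to the client, which completes the load-scheduling cycle.
1. In NAT mode both the request packets (rewritten by DNAT) and the response packets (rewritten by SNAT) have their addresses rewritten on the scheduler: requests are forwarded to the internal servers, and responses are rewritten back to the address the user originally requested.
2. Only the LB needs a public WAN IP; the LB also needs a private LAN IP to talk to the internal RS nodes.
3. The gateway of every internal RS node must be the LB's private LAN address on its physical NIC (LDIP), so that response packets are guaranteed to pass back through the LB.
4. Because both requests and responses pass through the LB, the LB becomes a bottleneck under heavy traffic; 10-20 nodes is the usual upper limit.
5. NAT mode can translate both IP and port: a user request to 10.0.1.1:80 can be forwarded by the scheduler to 10.0.1.2:8080 on an RS (DR and TUN cannot do this).
6. The RS nodes inside the NAT only need private LAN IPs.
7. Because packets pass through the scheduler in both directions, kernel forwarding must be enabled with net.ipv4.ip_forward = 1, along with forwarding in the iptables firewall (not needed for DR and TUN).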

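TUN mode

With NAT, both request and response packets must have their addresses rewritten by the scheduler, so as client requests grow the scheduler's processing capacity becomes the bottleneck.
To solve this, the scheduler forwards request packets to the real servers through an IP tunnel and the real servers return responses directly to the client, so the scheduler only handles requests. Since responses of typical network services are much larger than requests, VS/TUN can raise the maximum throughput of the cluster by a factor of 10.
1. The load balancer forwards request packets to the real server through an IP tunnel (ipip): the original packet is not rewritten, not even its destination address (the MAC included), but is encapsulated in another IP packet. The real server processes the request and responds directly to the client.
2. Because the real server responds directly to the client, the RS should ideally have a public IP for efficiency. In theory outbound connectivity is enough and a public IP is not required.
3. Because the LB only handles inbound request packets, cluster throughput can rise by 10 times or more, although tunneling adds some overhead of its own. TUN mode suits both LAN and WAN.
4. In a LAN, TUN forwards less efficiently than DR, and the system's support for IP tunnels also has to be considered.
5. Every RS must bind the VIP and suppress ARP, so configuration is complex.
6. LANs mostly use DR mode and WANs can use TUN mode, but in WAN environments request forwarding is now mostly done by proxies and schedulers such as haproxy/nginx/DNS. TUN mode is therefore rarely used in practice by companies in China. Cross-datacenter deployments either lay fibre to form one LAN or rely on DNS scheduling, and the underlying data still has to be synchronised.
7. For services exposed directly to the outside, such as web servers acting as RS nodes, public IPs are preferable. For services that are not, such as MySQL or storage RS nodes, internal IPs are preferable.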

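Summary of the three modes above

1. NAT mode:
DNAT inbound, SNAT outbound; traffic passes through LVS in both directions; ports can be changed; private network.

2. DR mode *****
The destination MAC address of the packet is rewritten; inbound traffic passes through LVS, outbound traffic goes straight back to the client; ports cannot be changed; LAN only.

3. TUN mode
The packet content is unchanged and an extra IP header is wrapped around it; inbound traffic passes through LVS, outbound traffic goes straight back to the client; ports cannot be changed; LAN/WAN.
LVS and the nodes communicate through a tunnel.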
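
What each mode changes in a request packet can be modelled with a small Python data class. This is a toy model for illustration, not how IPVS is implemented: the Packet fields, the function names, the client address 203.0.113.10 and the MAC addresses are all made up for the example.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    src_ip: str
    dst_ip: str
    dst_port: int
    dst_mac: str
    outer_dst_ip: Optional[str] = None  # set only when wrapped in an IP-in-IP tunnel


def forward_nat(pkt, rs_ip, rs_port, rs_mac):
    # NAT: destination IP (and optionally port) are rewritten to the real server.
    return replace(pkt, dst_ip=rs_ip, dst_port=rs_port, dst_mac=rs_mac)


def forward_dr(pkt, rs_mac):
    # DR: only the destination MAC changes; IPs and port stay as the client sent them.
    return replace(pkt, dst_mac=rs_mac)


def forward_tun(pkt, rs_ip):
    # TUN: the original packet is untouched and gets an outer IP header.
    return replace(pkt, outer_dst_ip=rs_ip)


request = Packet(src_ip="203.0.113.10", dst_ip="192.168.0.222",
                 dst_port=80, dst_mac="00:00:00:00:00:84")
print(forward_nat(request, "192.168.0.82", 8080, "00:00:00:00:00:82"))
print(forward_dr(request, "00:00:00:00:00:82"))
print(forward_tun(request, "192.168.0.82"))
```

In the DR and TUN results the destination is still the VIP, which is why every real server has to hold the VIP itself and must keep quiet about it in ARP.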

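Comparison of the three modes

	VS/NAT	VS/TUN	VS/DR
Real Server	config dr gw	Tunneling	Non-arp device/tie vip
Server Network	private	LAN/WAN	LAN
Server Number	low (1-20)	High (100)	High (100)
Real Server Gateway	LB	Own router	Own router
Advantage	address and port translation	works across WAN	highest performance
Drawback	bottleneck, low efficiency	needs tunnel protocol support	cannot span LANs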

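LVS scheduling algorithms:

Static scheduling algorithms: rr, wrr, dh (destination hash), sh (source hash)
Dynamic scheduling algorithms: wlc, lc, lblc, lblcr, sed, nq (the last two are not mentioned on the official site, but rr|wrr|lc|wlc|lblc|lblcr|dh|sh|sed|nq all show up when LVS is built with make).

For ordinary network services such as http, mail and MySQL, the commonly used LVS scheduling algorithms are
a.  rr   basic round robin
b.  wlc  weighted least connections
c.  wrr  weighted round robin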
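
As an illustration of weighted least connections, the Python sketch below picks the server with the lowest connection load per unit of weight. The overhead formula (active connections counted 256 times as heavily as inactive ones) is an assumption based on how the kernel's wlc scheduler is usually described; the server names and counts are invented for the example.

```python
def wlc_pick(servers):
    """Pick a real server by weighted least connections.

    servers: list of dicts with keys name, weight, active, inactive.
    Servers with weight 0 are never chosen.
    """
    def overhead(server):
        # Assumed cost model: active connections weigh far more than inactive ones.
        return server["active"] * 256 + server["inactive"]

    candidates = [s for s in servers if s["weight"] > 0]
    best = min(candidates, key=lambda s: overhead(s) / s["weight"])
    return best["name"]


pool = [
    {"name": "rs1", "weight": 1, "active": 10, "inactive": 0},
    {"name": "rs2", "weight": 3, "active": 20, "inactive": 0},
]
print(wlc_pick(pool))  # rs2: twice the connections but three times the weight
```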

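Installing and configuring LVS

Real server: 192.168.0.82 (nginx) 192.168.0.83 (nginx)
LB server: 192.168.0.84 (the VIP is 192.168.0.222, configured earlier with keepalived)

# /etc/init.d/keepalived start            # keepalived loads the ip_vs module on start; running ipvsadm also loads it automatically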
#lsmod | grep ip_vs 
ip_vs_rr                1420  3 
ip_vs                 126534  5 ip_vs_rr
libcrc32c               1246  1 ip_vs
ipv6                  335589  282 ip_vs,ip6t_REJECT,nf_conntrack_ipv6,nf_defrag_ipv6

Installing the ip_vs management tool
#yum install ipvsadm -y
#rpm -qa ipvsadm
ipvsadm-1.26-4.el6.x86_64

# ipvsadm --help
ipvsadm v1.26 2008/5/15 (compiled with popt and IPVS v1.2.1)
Usage:
  ipvsadm -A|E -t|u|f service-address [-s scheduler] [-p [timeout]] [-M netmask] [--pe persistence_engine]
  ipvsadm -D -t|u|f service-address
  ipvsadm -C
  ipvsadm -R
  ipvsadm -S [-n]
  ipvsadm -a|e -t|u|f service-address -r server-address [options]
  ipvsadm -d -t|u|f service-address -r server-address
  ipvsadm -L|l [options]
  ipvsadm -Z [-t|u|f service-address]
  ipvsadm --set tcp tcpfin udp
  ipvsadm --start-daemon state [--mcast-interface interface] [--syncid sid]
  ipvsadm --stop-daemon state
  ipvsadm -h

Commands:
Either long or short options are allowed.
  --add-service     -A        add virtual service with options
  --edit-service    -E        edit virtual service with options
  --delete-service  -D        delete virtual service
  --clear           -C        clear the whole table
  --restore         -R        restore rules from stdin
  --save            -S        save rules to stdout
  --add-server      -a        add real server with options
  --edit-server     -e        edit real server with options
  --delete-server   -d        delete real server
  --list            -L|-l     list the table
  --zero            -Z        zero counters in a service or all services
  --set tcp tcpfin udp        set connection timeout values
  --start-daemon              start connection sync daemon
  --stop-daemon               stop connection sync daemon
  --help            -h        display this help message

Options:
  --tcp-service  -t service-address   service-address is host[:port]
  --udp-service  -u service-address   service-address is host[:port]
  --fwmark-service  -f fwmark         fwmark is an integer greater than zero
  --ipv6         -6                   fwmark entry uses IPv6
  --scheduler    -s scheduler         one of rr|wrr|lc|wlc|lblc|lblcr|dh|sh|sed|nq,
                                      the default scheduler is wlc.
  --pe            engine              alternate persistence engine may be sip,
                                      not set by default.
  --persistent   -p [timeout]         persistent service (session persistence)
  --netmask      -M netmask           persistent granularity mask
  --real-server  -r server-address    server-address is host (and port)
  --gatewaying   -g                   gatewaying (direct routing) (default)
  --ipip         -i                   ipip encapsulation (tunneling)
  --masquerading -m                   masquerading (NAT)
  --weight       -w weight            capacity of real server
  --u-threshold  -x uthreshold        upper threshold of connections
  --l-threshold  -y lthreshold        lower threshold of connections
  --mcast-interface interface         multicast interface for connection sync
  --syncid sid                        syncid for connection sync (default=255)
  --connection   -c                   output of current IPVS connections
  --timeout                           output of timeout (tcp tcpfin udp)
  --daemon                            output of daemon information
  --stats                             output of statistics information
  --rate                              output of rate information
  --exact                             expand numbers (display exact values)
  --thresholds                        output of thresholds information
  --persistent-conn                   output of persistent connection info
  --nosort                            disable sorting output of service/server entries
  --sort                              does nothing, for backwards compatibility
  --ops          -o                   one-packet scheduling
  --numeric      -n                   numeric output of addresses and ports



LVS configuration steps (see ipvsadm --help)
# ipvsadm -C                     # clear any previous configuration
# ipvsadm --set 30 5 60          # ipvsadm --set tcp tcpfin udp: set the timeouts, a small optimisation; check with ipvsadm -L --timeout
# ipvsadm -A -t 192.168.0.222:80 -s rr     # add a virtual server; -t means a TCP service; -p 300 can be added for session persistence
# ipvsadm -L
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.222:http rr
# ipvsadm -a -t 192.168.0.222:80 -r 192.168.0.82:80 -g    # -r adds a real server node, -g selects direct routing mode
# ipvsadm -a -t 192.168.0.222:80 -r 192.168.0.83:80 -g

# ipvsadm -Ln        # list the table; --stats or --timeout can be added
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  192.168.0.222:80 rr
  -> 192.168.0.82:80              Route   1      0          0
  -> 192.168.0.83:80              Route   1      0          0


      

Summary of the manual LVS configuration:
ipvsadm -C
ipvsadm --set 30 5 60
ipvsadm -A -t 192.168.0.222:80  -s  rr -p 300
ipvsadm -a -t 192.168.0.222:80  -r 192.168.0.82:80 -g
ipvsadm -a -t 192.168.0.222:80  -r 192.168.0.83:80 -g
ipvsadm -Ln
Removing LVS entries:
ipvsadm -d -t 192.168.0.222:80 -r 192.168.0.82:80    # remove one real server
ipvsadm -D -t 192.168.0.222:80                        # remove the whole virtual service

Configuration on every RS node:
Bind the VIP on the RS:
ip addr add 192.168.0.222/32 dev lo label lo:1      # a /32 mask is recommended
route add -host 192.168.0.222 dev lo
Suppress ARP:
echo "1" >/proc/sys/net/ipv4/conf/lo/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/lo/arp_announce
echo "1" >/proc/sys/net/ipv4/conf/all/arp_ignore
echo "2" >/proc/sys/net/ipv4/conf/all/arp_announce


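arp_ignore: how to respond to ARP requests whose target is a local IP address
0 - (default) reply to ARP requests for any local IP address, whichever interface they arrive on
1 - reply only if the target IP address is configured on the interface the request arrived on
2 - as 1, and the sender's IP must also be in the same subnet as that interface
3 - do not reply for local addresses with host scope; only reply for global and link scope addresses
4-7 - reserved
8 - do not reply to ARP requests for any local address
arp_announce: restricts which local source IP address is announced in ARP requests sent out of an interface
0 - (default) use any local address configured on any interface (eth0, eth1, lo)
1 - try to avoid source addresses that are not in the target's subnet on this interface. This is useful when the target host expects the source of ARP requests to belong to its own subnet on the receiving interface. If no address in a matching subnet is found, the rules of level 2 apply.
2 - always use the best local address for the target. The source address of the IP packet is ignored and a local address suited to talking to the target host is chosen, preferring addresses on the outgoing interface whose subnet contains the target IP. If no suitable address is found, an address on the outgoing interface or on another interface that may receive the ARP reply is used.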
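
Why arp_ignore=1 keeps a real server silent about the VIP can be shown with a small decision function in Python. It models only levels 0 and 1 as described above; the function name should_reply and the address table are assumptions for the sketch, with the VIP bound to lo as in the RS setup.

```python
def should_reply(arp_ignore, target_ip, incoming_iface, addrs):
    """Decide whether the host answers an ARP request (levels 0 and 1 only).

    addrs: dict mapping interface name to the set of IPs configured on it.
    """
    if arp_ignore == 0:
        # Reply for any local address, whichever interface it is on.
        return any(target_ip in ips for ips in addrs.values())
    if arp_ignore == 1:
        # Reply only if the address lives on the interface the request came in on.
        return target_ip in addrs.get(incoming_iface, set())
    raise ValueError("only arp_ignore levels 0 and 1 are modelled here")


real_server = {
    "eth0": {"192.168.0.82"},
    "lo": {"127.0.0.1", "192.168.0.222"},  # VIP bound to the loopback
}
print(should_reply(0, "192.168.0.222", "eth0", real_server))  # True: would fight the LB for the VIP
print(should_reply(1, "192.168.0.222", "eth0", real_server))  # False: only the LB answers for the VIP
print(should_reply(1, "192.168.0.82", "eth0", real_server))   # True: the RIP still resolves normally
```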


Test from another machine (hosts entry: 192.168.0.222 www.gtms.org)
[root@node86 ~]# for i in `seq 100`;do curl 192.168.0.222;date +%s;sleep 1;done
82www.gtms.org
1486557567
83www.gtms.org
1486557568
82www.gtms.org
1486557569
83www.gtms.org
1486557570
82www.gtms.org
1486557571
83www.gtms.org
1486557572

Packets captured on one of the nginx servers

[root@node83 ~]# tcpdump -nnn -i eth0 -s 10000 -A host 192.168.0.222  and port 80
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 10000 bytes
06:48:20.837760 IP 192.168.0.86.48280 > 192.168.0.222.80: Flags [S], seq 1903210290, win 14600, options [mss 1460,sackOK,TS val 9037827 ecr 0,nop,wscale 6], length 0
E..<G1@.@.q....V.......Pqp.2......9..[.........
............
06:48:20.837886 IP 192.168.0.86.48280 > 192.168.0.222.80: Flags [S], seq 1903210290, win 14600, options [mss 1460,sackOK,TS val 9037827 ecr 0,nop,wscale 6], length 0
E..<G1@.@.q....V.......Pqp.2......9..[.........
............
06:48:20.837915 IP 192.168.0.222.80 > 192.168.0.86.48280: Flags [S.], seq 3324728961, ack 1903210291, win 14480, options [mss 1460,sackOK,TS val 10961765 ecr 9037827,nop,wscale 6], length 0
E..<..@.@..7.......V.P...+V.qp.3..8.f   .........
..Ce........
06:48:20.838246 IP 192.168.0.86.48280 > 192.168.0.222.80: Flags [.], ack 1, win 229, options [nop,nop,TS val 9037829 ecr 10961765], length 0
...V.......Pqp.3.+V......}.....
......Ce
06:48:20.838478 IP 192.168.0.86.48280 > 192.168.0.222.80: Flags [.], ack 1, win 229, options [nop,nop,TS val 9037829 ecr 10961765], length 0
...V.......Pqp.3.+V......}.....
......Ce
06:48:20.838514 IP 192.168.0.86.48280 > 192.168.0.222.80: Flags [P.], seq 1:177, ack 1, win 229, options [nop,nop,TS val 9037829 ecr 10961765], length 176
E...G3@.@.p\...V.......Pqp.3.+V............
......CeGET / HTTP/1.1
User-Agent: curl/7.19.7 (x86_64-redhat-linux-gnu) libcurl/7.19.7 NSS/3.19.1 Basic ECC zlib/1.2.3 libidn/1.18 libssh2/1.4.2
Host: 192.168.0.222
Accept: */*

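Uneven distribution in an LVS cluster

Possible causes of uneven load
1. When ipvs is managed through keepalived, the persistence_timeout setting; commenting it out resolves this
2. LVS's own session persistence option -p (large sites try to use cookies instead of server-side sessions)
3. The LVS scheduling algorithm
4. The keepalive setting on the backend RS servers
5. Low traffic, which makes the imbalance more visible
6. Differences in how long requests take and how many or how large the requested resources are

LVS troubleshooting checklist

1. The rules configured on the scheduler
2. VIP binding and ARP suppression on the RS nodes
3. Whether the service on the RS nodes is running normally
4. Use tools such as tcpdump and ping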

Managing ipvs through keepalived (clear the manual configuration above first)

LVS settings in the keepalived configuration file
virtual_server 192.168.0.222 80 {
    delay_loop 6                  # health check interval
    lb_algo wrr
    lb_kind DR
    nat_mask 255.255.255.0
    persistence_timeout 50        # session persistence; a long value leads to uneven load
    protocol TCP
    real_server 192.168.0.82 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 8         # connection timeout
            nb_get_retry 3            # number of retries
            delay_before_retry 3      # delay between retries
            connect_port 80
        }
    }
    real_server 192.168.0.83 80 {
        weight 1
        TCP_CHECK {
            connect_timeout 8
            nb_get_retry 3
            delay_before_retry 3
            connect_port 80
        }
    }
}
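
What a TCP_CHECK amounts to can be approximated in Python with a plain socket connect. This is a rough sketch, not keepalived's implementation: the function tcp_check is hypothetical, its parameters merely borrow the names used in the configuration, and it assumes the retry count means additional attempts after the first one.

```python
import socket
import time


def tcp_check(host, port, connect_timeout=8.0, retries=3, delay_before_retry=3.0):
    """Return True if a TCP connection to host:port succeeds within the allowed attempts."""
    for attempt in range(1 + retries):
        try:
            # A completed TCP handshake is all the check asks for.
            with socket.create_connection((host, port), timeout=connect_timeout):
                return True
        except OSError:
            # Refused, unreachable or timed out: wait, then try again if attempts remain.
            if attempt < retries:
                time.sleep(delay_before_retry)
    return False


# Example call (not executed here because it blocks on unreachable hosts):
# healthy = tcp_check("192.168.0.82", 80)
```

A real server that fails the check is taken out of the IPVS table and added back once the check succeeds again, which is the RS healthcheck behaviour described in the Keepalived section.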