Configure PostgreSQL Replication With Repmgr

Date: 2021-01-23

This article describes how to configure replication and failover for PostgreSQL 12 using the open-source repmgr tool.

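1. Environment

The cluster used throughout this article consists of four EL 8 hosts running PostgreSQL 12:

Host    IP address        Role
hwd04   192.168.120.25    primary
hwd05   192.168.120.26    standby
hwd06   192.168.120.27    standby
hwd12   192.168.120.50    witness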

2. Install the PostgreSQL packages

Install PostgreSQL 12 and the repmgr package on all nodes.

[root@hwd04 ~]# dnf -y install https://download.postgresql.org/pub/repos/yum/reporpms/EL-8-x86_64/pgdg-redhat-repo-latest.noarch.rpm
[root@hwd04 ~]# dnf -qy module disable postgresql
[root@hwd04 ~]# dnf install postgresql12-server postgresql12-contrib repmgr12

3. Configure the primary node

3.1 Initialize the PostgreSQL database

[root@hwd04 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK

3.2 Configure PostgreSQL parameters

[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf 
listen_addresses = '*' 
max_wal_senders = 10
max_replication_slots = 10
wal_level = 'replica'
wal_log_hints = on
hot_standby = on
archive_mode = on
archive_command = '/bin/true'
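
Before restarting, it can be worth confirming that the file really contains the values listed above. The following is a minimal sketch, not part of the original procedure: the helper names conf_value and check_replication_settings are made up for this example, and the parser only understands simple "name = value" lines.

```shell
#!/bin/sh
# Print the effective value of one parameter from a postgresql.conf-style file.
# The last uncommented assignment wins, as it does when PostgreSQL reads the file.
# Assumption: only "name = value" lines are handled, not "name value".
conf_value() {
    # $1 = file, $2 = parameter name
    awk -v key="$2" -v q="'" '
        /^[[:space:]]*#/ { next }              # skip comment lines
        {
            line = $0
            sub(/#.*/, "", line)               # drop trailing comments
            n = index(line, "=")
            if (n == 0) next
            k = substr(line, 1, n - 1)
            v = substr(line, n + 1)
            gsub(/[[:space:]]/, "", k)
            gsub(/^[[:space:]]+|[[:space:]]+$/, "", v)
            gsub("^" q "|" q "$", "", v)       # strip surrounding single quotes
            if (k == key) val = v
        }
        END { print val }
    ' "$1"
}

# Compare the parameters from section 3.2 against their expected values.
# Returns non-zero if any of them differs or is missing.
check_replication_settings() {
    # $1 = path to postgresql.conf
    status=0
    for pair in wal_level=replica hot_standby=on archive_mode=on wal_log_hints=on; do
        key=${pair%%=*}
        want=${pair#*=}
        got=$(conf_value "$1" "$key")
        if [ "$got" = "$want" ]; then
            echo "OK   $key = $got"
        else
            echo "FAIL $key = ${got:-<unset>} (expected $want)"
            status=1
        fi
    done
    return $status
}
```

For example: check_replication_settings /var/lib/pgsql/12/data/postgresql.conf. On a running server, psql -Atc "SHOW wal_level" reports the live value instead of the file content.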

Restart the PostgreSQL service:

[root@hwd04 ~]# systemctl enable postgresql-12.service
[root@hwd04 ~]# systemctl restart postgresql-12.service

3.3 Create the repmgr database and user

[root@hwd04 ~]# su - postgres
[postgres@hwd04 ~]$ createuser --superuser repmgr
[postgres@hwd04 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd04 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"

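Edit postgresql.conf and add the following line so that the repmgr library is loaded when PostgreSQL starts. The setting takes effect only after PostgreSQL is restarted, which is done in section 3.5.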

[root@hwd04 ~]# vi /var/lib/pgsql/12/data/postgresql.conf 
shared_preload_libraries = 'repmgr'

3.4 Configure repmgr

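The default repmgr configuration file is /etc/repmgr/12/repmgr.conf. Add the following to it on the primary and on each standby; node_id, node_name and the host in conninfo must be unique per node.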

--hwd04(primary)
[root@hwd04 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=1
node_name='hwd04'
conninfo='host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2' 
data_directory='/var/lib/pgsql/12/data'
--hwd05(standby)
[root@hwd05 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=2
node_name='hwd05'
conninfo='host=192.168.120.26 user=repmgr dbname=repmgr connect_timeout=2' 
data_directory='/var/lib/pgsql/12/data'
--hwd06(standby)
[root@hwd06 ~]# vi /etc/repmgr/12/repmgr.conf
node_id=3
node_name='hwd06'
conninfo='host=192.168.120.27 user=repmgr dbname=repmgr connect_timeout=2' 
data_directory='/var/lib/pgsql/12/data'
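
Since the three files differ only in node ID, node name and IP address, they can also be generated. The following is a minimal sketch under that assumption; render_repmgr_conf is a hypothetical helper name, and the fixed values (user, database, data directory) are copied from the files above.

```shell
#!/bin/sh
# Print a minimal repmgr.conf for one node to standard output.
render_repmgr_conf() {
    # $1 = node_id, $2 = node_name, $3 = IP address of this node
    cat <<EOF
node_id=$1
node_name='$2'
conninfo='host=$3 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'
EOF
}
```

For example, on hwd05: render_repmgr_conf 2 hwd05 192.168.120.26 > /etc/repmgr/12/repmgr.conf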

3.5 Configure pg_hba.conf on the primary node

#For Replication
local   replication     repmgr                              trust
host    replication     repmgr      127.0.0.1/32            trust
host    replication     repmgr      192.168.120.0/24        trust

local   repmgr          repmgr                              trust
host    repmgr          repmgr      127.0.0.1/32            trust
host    repmgr          repmgr      192.168.120.0/24        trust

Restart the PostgreSQL service:

[root@hwd04 ~]# systemctl restart postgresql-12.service

From each standby node, verify that the primary node is reachable:

[postgres@hwd05 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.

repmgr=# \q
[postgres@hwd06 ~]$ psql 'host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2'
psql (12.3)
Type "help" for help.

repmgr=# \q
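
The same check can be run non-interactively, which is handy in a provisioning script. This is a minimal sketch; check_primary_reachable is a hypothetical helper name, and it assumes psql is on the PATH of the postgres user.

```shell
#!/bin/sh
# Run a trivial query against the primary as the repmgr user.
# Succeeds only if the connection and authentication both work.
check_primary_reachable() {
    # $1 = IP address or host name of the primary
    if psql -X -At -c 'SELECT 1' \
            -d "host=$1 user=repmgr dbname=repmgr connect_timeout=2" >/dev/null 2>&1; then
        echo "primary $1 is reachable as user repmgr"
    else
        echo "cannot connect to primary $1 as user repmgr" >&2
        return 1
    fi
}
```

For example, on hwd05 or hwd06: check_primary_reachable 192.168.120.25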

3.6 Register the primary node with repmgr

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered

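After registration, verify the cluster status with the following command:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show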

4. Clone the standby nodes

Before the actual clone, you can do a dry run. If it reports no errors, proceed with the clone; otherwise, resolve the reported errors first and then clone.

4.1 Dry-run the standby clone

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
WARNING: data checksums are not enabled and "wal_log_hints" is "off"
DETAIL: pg_rewind requires "wal_log_hints" to be enabled
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: all prerequisites for "standby clone" are met

4.2 Clone the standbys

Run the clone once on every standby node: N standbys means N clone operations.

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.25 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.25 user=repmgr dbname=repmgr
DETAIL: current installation size is 31 MB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.25 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /var/lib/pgsql/12/data start
HINT: after starting the server, you need to register this standby with "repmgr standby register"
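
The dry run and the actual clone can be chained so that the clone only runs when the dry run passes. This is a minimal sketch, assuming that repmgr exits with a non-zero status when the dry run finds a problem; clone_standby and the REPMGR and REPMGR_CONF variables are names invented for this example.

```shell
#!/bin/sh
# Paths as used in this article; override them through the environment if needed.
REPMGR=${REPMGR:-/usr/pgsql-12/bin/repmgr}
REPMGR_CONF=${REPMGR_CONF:-/etc/repmgr/12/repmgr.conf}

# Clone a standby from the given source node, but only after a successful dry run.
clone_standby() {
    # $1 = host of the source (primary) node
    if "$REPMGR" -h "$1" -U repmgr -d repmgr -f "$REPMGR_CONF" standby clone --dry-run; then
        "$REPMGR" -h "$1" -U repmgr -d repmgr -f "$REPMGR_CONF" standby clone
    else
        echo "dry run failed; fix the reported errors before cloning" >&2
        return 1
    fi
}
```

For example, as the postgres user on each standby: clone_standby 192.168.120.25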

After the clone completes, enable and start the PostgreSQL service on each standby node:

[root@hwd05 ~]# systemctl enable postgresql-12.service
[root@hwd05 ~]# systemctl restart postgresql-12.service

4.3 Register the standby nodes with repmgr

[postgres@hwd05 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd05" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd05" (ID: 2) successfully registered
[postgres@hwd06 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf standby register
INFO: connecting to local node "hwd06" (ID: 3)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 1)
INFO: standby registration complete
NOTICE: standby node "hwd06" (ID: 3) successfully registered

After registration, check the cluster status:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster show --compact

At this point, streaming replication is fully configured.

5. Configure automatic failover

5.1 Configure PostgreSQL on the witness node (hwd12)

[root@hwd12 ~]# /usr/pgsql-12/bin/postgresql-12-setup initdb
Initializing database ... OK
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/postgresql.conf
listen_addresses = '*'
shared_preload_libraries = 'repmgr'
[root@hwd12 ~]# vi /var/lib/pgsql/12/data/pg_hba.conf 
local   replication     repmgr                              trust
host    replication     repmgr      127.0.0.1/32            trust
host    replication     repmgr      192.168.120.0/24        trust
local   repmgr          repmgr                              trust
host    repmgr          repmgr      127.0.0.1/32            trust
host    repmgr          repmgr      192.168.120.0/24        trust
[root@hwd12 ~]# systemctl enable postgresql-12.service
[root@hwd12 ~]# systemctl restart postgresql-12.service

5.2 Create the repmgr database and user

[root@hwd12 ~]# su - postgres
[postgres@hwd12 ~]$ createuser --superuser repmgr
[postgres@hwd12 ~]$ createdb --owner=repmgr repmgr
[postgres@hwd12 ~]$ psql -c "ALTER USER repmgr SET search_path TO repmgr, public;"

Test the connection from the primary node to the witness node:

[postgres@hwd04 ~]$ psql 'host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'        
psql (12.3)
Type "help" for help.

repmgr=# \q

5.3 Edit the repmgr configuration file

[root@hwd12 ~]# vi /etc/repmgr/12/repmgr.conf 
node_id=4
node_name='hwd12'
conninfo='host=192.168.120.50 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/var/lib/pgsql/12/data'

5.4 Register the witness node with repmgr

[postgres@hwd12 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf witness register -h 192.168.120.25
INFO: connecting to witness node "hwd12" (ID: 4)
INFO: connecting to primary node
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
INFO: witness registration complete
NOTICE: witness node "hwd12" (ID: 4) successfully registered

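After registration, check the cluster status again with repmgr cluster show; the witness node hwd12 (ID: 4) should now be listed.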

5.5 Edit the sudoers file on all nodes

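Add the following so that the postgres user can control the PostgreSQL and repmgrd services without a password: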

[root@hwd12 ~]# vi /etc/sudoers
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: /usr/bin/systemctl stop postgresql-12.service, /usr/bin/systemctl start postgresql-12.service, /usr/bin/systemctl restart postgresql-12.service, /usr/bin/systemctl reload postgresql-12.service, /usr/bin/systemctl start repmgr12.service, /usr/bin/systemctl stop repmgr12.service
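
Instead of editing /etc/sudoers by hand on every node, the same rules can be written to a drop-in file. This is a minimal sketch, assuming the distribution's /etc/sudoers includes /etc/sudoers.d (verify this on your hosts); write_repmgr_sudoers and the file name /etc/sudoers.d/repmgr are invented for this example.

```shell
#!/bin/sh
# Write the sudo rules from this section to the given file.
# The backslashes are sudoers line continuations; the rule is the same
# as the single long line shown above.
write_repmgr_sudoers() {
    # $1 = output file, e.g. /etc/sudoers.d/repmgr (hypothetical name)
    cat > "$1" <<'EOF'
Defaults:postgres !requiretty
postgres ALL = NOPASSWD: \
    /usr/bin/systemctl stop postgresql-12.service, \
    /usr/bin/systemctl start postgresql-12.service, \
    /usr/bin/systemctl restart postgresql-12.service, \
    /usr/bin/systemctl reload postgresql-12.service, \
    /usr/bin/systemctl start repmgr12.service, \
    /usr/bin/systemctl stop repmgr12.service
EOF
    chmod 0440 "$1"
}
```

As root: write_repmgr_sudoers /etc/sudoers.d/repmgr && visudo -cf /etc/sudoers.d/repmgr. The visudo -c check catches syntax errors before they can lock sudo.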

5.6 Configure repmgr parameters

Edit repmgr.conf on all nodes and add the following:

failover='automatic'                    
priority=60                             
connection_check_type=ping              
reconnect_attempts=6                    
reconnect_interval=10                   
promote_command='/usr/pgsql-12/bin/repmgr standby promote -f /etc/repmgr/12/repmgr.conf --log-to-file'
follow_command='/usr/pgsql-12/bin/repmgr standby follow -f /etc/repmgr/12/repmgr.conf --log-to-file --upstream-node-id=%n'
monitoring_history=yes
monitor_interval_secs=2
standby_disconnect_on_failover=true
primary_visibility_consensus=true
log_status_interval=60
service_start_command = 'sudo /usr/bin/systemctl start postgresql-12.service'
service_stop_command = 'sudo /usr/bin/systemctl stop postgresql-12.service'
service_restart_command = 'sudo /usr/bin/systemctl restart postgresql-12.service'
service_reload_command = 'sudo /usr/bin/systemctl reload postgresql-12.service'
repmgrd_service_start_command = 'sudo /usr/bin/systemctl start repmgr12.service'
repmgrd_service_stop_command = 'sudo /usr/bin/systemctl stop repmgr12.service'

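Note: the priority value must be changed on the standbys, because the default is 100 and the primary keeps the default. Here hwd05 is set to priority 60 and hwd06 to priority 40. The witness node hwd12 does not need a priority setting. The higher the priority value, the more likely the node is to be promoted to primary.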
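
Because the block above is pasted to every node and only priority differs, a small helper can adjust that one line afterwards. This is a minimal sketch; set_repmgr_param is a hypothetical helper name, and it is only meant for simple values that do not contain "|" or "&".

```shell
#!/bin/sh
# Set "name=value" in a repmgr.conf-style file: replace the existing
# assignment if there is one, otherwise append a new line.
set_repmgr_param() {
    # $1 = file, $2 = parameter name, $3 = new value
    if grep -q "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        tmp_file=$(mktemp) || return 1
        # Rewrite through a temporary file so the original keeps its owner and mode.
        sed "s|^[[:space:]]*$2[[:space:]]*=.*|$2=$3|" "$1" > "$tmp_file" &&
            cat "$tmp_file" > "$1"
        rc=$?
        rm -f "$tmp_file"
        return $rc
    else
        printf '%s=%s\n' "$2" "$3" >> "$1"
    fi
}
```

For example, on hwd06: set_repmgr_param /etc/repmgr/12/repmgr.conf priority 40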
After editing, start the repmgrd service on each node:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start --dry-run
INFO: prerequisites for starting repmgrd met
DETAIL: following command would be executed:
  sudo /usr/bin/systemctl start repmgr12.service
[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf daemon start
NOTICE: executing: "sudo /usr/bin/systemctl start repmgr12.service"
NOTICE: repmgrd was successfully started

Once repmgrd is running, the cluster events can be queried from the primary or from any standby:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -f /etc/repmgr/12/repmgr.conf cluster event --event=repmgrd_start

repmgr messages can also be found in the operating system log files.

5.7 Simulate a primary failure

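Here we stop the PostgreSQL service on hwd04 and then check in the logs whether a standby is automatically promoted to primary and whether the remaining healthy nodes reconnect to the new primary.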

Stop the service on the primary node:

[postgres@hwd04 ~]$ sudo systemctl stop postgresql-12.service

After the stop, the cluster status shows the primary node as unreachable.

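About one minute later, the log on the witness node shows that hwd05 has become the new primary and that the other nodes have reconnected to it. The delay follows from the settings in section 5.6: reconnect_attempts=6 with reconnect_interval=10 seconds gives roughly 60 seconds before failover starts.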
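
The rough waiting time before failover can be derived from repmgr.conf. This is a minimal sketch of that arithmetic; failover_detection_secs is a hypothetical helper name, the fallback values 6 and 10 are assumed to be the repmgr defaults, and the result is only an approximation because each connection attempt adds its own timeout.

```shell
#!/bin/sh
# Estimate how long repmgrd retries the primary before starting a failover:
# reconnect_attempts * reconnect_interval, in seconds.
failover_detection_secs() {
    # $1 = path to repmgr.conf
    awk -F= '
        BEGIN { attempts = 6; interval = 10 }   # assumed repmgr defaults
        { gsub(/[[:space:]]/, "", $1); gsub(/[[:space:]]/, "", $2) }
        $1 == "reconnect_attempts" { attempts = $2 }
        $1 == "reconnect_interval" { interval = $2 }
        END { print attempts * interval }
    ' "$1"
}
```

For example: failover_detection_secs /etc/repmgr/12/repmgr.conf. Lowering either parameter shortens the outage but makes a failover on a brief network glitch more likely.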

When the old primary recovers, it is not converted to a standby automatically; it runs on its own as a primary and has to be rejoined to the cluster, as follows:

[postgres@hwd04 ~]$ repmgr node service --action=stop --checkpoint
NOTICE: issuing CHECKPOINT on node "hwd04" (ID: 1) 
DETAIL: executing server command "sudo /usr/bin/systemctl stop postgresql-12.service"
[postgres@hwd04 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf -d 'host=192.168.120.26 user=repmgr dbname=repmgr' node rejoin --force-rewind
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 2
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point F/EB000028
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/pgsql-12/bin/pg_rewind -D '/var/lib/pgsql/12/data' --source-server='host=192.168.120.26 user=repmgr dbname=repmgr connect_timeout=2'"
pg_rewind: servers diverged at WAL location F/EA0000A0 on timeline 2
pg_rewind: rewinding from last common checkpoint at F/EA000028 on timeline 2
pg_rewind: Done!
NOTICE: 0 files copied to /var/lib/pgsql/12/data
NOTICE: setting node 1's upstream to node 2
WARNING: unable to ping "host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /usr/bin/systemctl start postgresql-12.service"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 2

If the rejoin fails, the old primary can be re-cloned as a standby with the force option (-F), which deletes its existing data directory:

[postgres@hwd04 ~]$ /usr/pgsql-12/bin/repmgr -h 192.168.120.26 -U repmgr -d repmgr -f /etc/repmgr/12/repmgr.conf standby clone -F
NOTICE: destination directory "/var/lib/pgsql/12/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.120.26 user=repmgr dbname=repmgr
DETAIL: current installation size is 15 GB
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
WARNING: directory "/var/lib/pgsql/12/data" exists but is not empty
NOTICE: -F/--force provided - deleting existing data directory "/var/lib/pgsql/12/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/pgsql-12/bin/pg_basebackup -l "repmgr base backup"  -D /var/lib/pgsql/12/data -h 192.168.120.26 -p 5432 -U repmgr -X stream 
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo /usr/bin/systemctl start postgresql-12.service
HINT: after starting the server, you need to re-register this standby with "repmgr standby register --force" to update the existing node record
[postgres@hwd04 ~]$ sudo systemctl start postgresql-12.service
[postgres@hwd04 ~]$ repmgr -f /etc/repmgr/12/repmgr.conf standby register -F                                                  
INFO: connecting to local node "hwd04" (ID: 1)
INFO: connecting to primary database
INFO: standby registration complete
NOTICE: standby node "hwd04" (ID: 1) successfully registered

Replication status can also be checked on the primary by querying the pg_stat_replication view:

postgres=# select pid,usesysid,usename,application_name,client_addr,client_port,state,sent_lsn,write_lsn,flush_lsn,sync_state from pg_stat_replication;
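
The LSN columns are easier to read as a byte difference. This is a minimal sketch that wraps such a query for use from the shell; replication_lag is a hypothetical helper name, and it has to run on the current primary because pg_current_wal_lsn() is not available during recovery.

```shell
#!/bin/sh
# Show, for each connected standby, how many bytes of WAL it still has to replay.
# Run as the postgres user on the current primary.
replication_lag() {
    psql -X -At -F ' ' -c "
        SELECT application_name, client_addr, state,
               pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
        FROM pg_stat_replication
        ORDER BY application_name;"
}
```

Each output line holds the standby name, its address, the replication state and the lag in bytes; a value that stays near zero means the standby is keeping up.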

5.8 Perform a manual switchover

Here hwd06 is promoted to primary. The current primary is hwd04.

First, do a dry run:

[postgres@hwd06 ~]$ repmgr standby switchover --siblings-follow --dry-run
NOTICE: checking switchover on node "hwd06" (ID: 3) in --dry-run mode
INFO: SSH connection to host "192.168.120.25" succeeded
INFO: able to execute "repmgr" on remote host "192.168.120.25"
INFO: all sibling nodes are reachable via SSH
INFO: 3 walsenders required, 10 available
INFO: demotion candidate is able to make replication connection to promotion candidate
INFO: 0 pending archive files
INFO: replication lag on this standby is 0 seconds
INFO: would pause repmgrd on node "hwd04" (ID 1)
INFO: would pause repmgrd on node "hwd05" (ID 2)
INFO: would pause repmgrd on node "hwd06" (ID 3)
INFO: would pause repmgrd on node "hwd12" (ID 4)
NOTICE: local node "hwd06" (ID: 3) would be promoted to primary; current primary "hwd04" (ID: 1) would be demoted to standby
INFO: following shutdown command would be run on node "hwd04":
"sudo /usr/bin/systemctl stop postgresql-12.service"
INFO: parameter "shutdown_check_timeout" is set to 60 seconds
INFO: prerequisites for executing STANDBY SWITCHOVER are met
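
As the dry-run output shows, the switchover relies on SSH from the promotion candidate to the other nodes, so the postgres user needs key-based SSH access between them. That setup is not covered in this article. The following is a minimal sketch for checking it up front; check_ssh is a hypothetical helper name.

```shell
#!/bin/sh
# Verify that the current user can log in to each host over SSH without a
# password prompt. BatchMode makes ssh fail instead of asking for input.
check_ssh() {
    # "$@" = hosts to test
    rc=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "ssh to $host: ok"
        else
            echo "ssh to $host: FAILED"
            rc=1
        fi
    done
    return $rc
}
```

For example, as the postgres user on hwd06: check_ssh 192.168.120.25 192.168.120.26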

The dry run reports no errors, so perform the actual switchover:

[postgres@hwd06 ~]$ repmgr standby switchover --siblings-follow 
NOTICE: executing switchover on node "hwd06" (ID: 3)
NOTICE: local node "hwd06" (ID: 3) will be promoted to primary; current primary "hwd04" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "hwd04" (ID: 1)
NOTICE: issuing CHECKPOINT on node "hwd04" (ID: 1) 
DETAIL: executing server command "sudo /usr/bin/systemctl stop postgresql-12.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location F/EB000028
NOTICE: promoting standby to primary
DETAIL: promoting server "hwd06" (ID: 3) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "hwd06" (ID: 3) was successfully promoted to primary
INFO: local node 1 can attach to rejoin target node 3
DETAIL: local node's recovery point: F/EB000028; rejoin target node's fork point: F/EB0000A0
NOTICE: setting node 1's upstream to node 3
WARNING: unable to ping "host=192.168.120.25 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo /usr/bin/systemctl start postgresql-12.service"
NOTICE: NODE REJOIN successful
DETAIL: node 1 is now attached to node 3
NOTICE: node  "hwd06" (ID: 3) promoted to primary, node "hwd04" (ID: 1) demoted to standby
NOTICE: executing STANDBY FOLLOW on 2 of 2 siblings
INFO:  node 4 received notification to follow node 3
INFO: STANDBY FOLLOW successfully executed on all reachable sibling nodes
NOTICE: switchover was successful
DETAIL: node "hwd06" is now primary and node "hwd04" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully

After the switchover, hwd06 is the primary and hwd04 is attached to it as a standby.
