Handling a "File exists" error when configuring a standby with repmgr

Date: 2021-03-02



Background

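In production we use repmgr to build high-availability clusters for PostgreSQL. While adding HA to one production database, standby clone failed with a "File exists" error and the standby could not be cloned. The same error output is reproduced in the test below.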

Cause

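The cause comes first, because readers familiar with PG and pg_basebackup can probably work out the fix from it and skip the rest. The user-created tablespace was placed in a directory under $PGDATA. repmgr's standby clone calls pg_basebackup without specifying an output format, so the default plain format is used. In plain format, pg_basebackup copies every file and directory under the primary's PGDATA and also recreates each tablespace at its original absolute path. Because that path lies inside the data directory, the same directory ends up being created twice, and the clone fails with "File exists".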
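The root cause can be checked before cloning: if a tablespace location starts with the data directory path, a plain-format clone into that same path is likely to hit "File exists". A minimal sketch in shell, assuming the two paths are already known (for example from `SHOW data_directory;` and `SELECT pg_tablespace_location(oid) FROM pg_tablespace;` on the primary). The function name `is_inside_dir` is hypothetical, and the comparison is purely textual.

```shell
#!/bin/sh
# Hypothetical helper: is_inside_dir PATH DIR
# Succeeds (exit 0) if PATH is DIR itself or lies somewhere below it.
# Purely textual: it does not resolve symlinks, "." or "..".
is_inside_dir() {
    path=${1%/}   # strip one trailing slash so "/a/b/" and "/a/b" compare alike
    dir=${2%/}
    case "$path/" in
        "$dir"/*) return 0 ;;
        *) return 1 ;;
    esac
}

# Paths from this article: the tablespace location and the data directory
# used on both nodes.
if is_inside_dir /home/postgres/data/pg_tbs/tbs_mydb /home/postgres/data; then
    echo "tablespace location is inside PGDATA; a plain-format clone is likely to fail"
fi
```

Appending the trailing slash before matching keeps `/home/postgres/data2` from being mistaken for a subdirectory of `/home/postgres/data`.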

Solutions

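Option 1: move the primary's tablespace to a directory outside PGDATA (this blocks writes).
Option 2: run standby clone into a new directory, then move the files into the actual PGDATA directory once the clone finishes.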

Option 1 requires changes on the primary, so it is not recommended in production unless the impact on applications is acceptable.

Testing option 2

On node 1

Create a tablespace and a database, then write to a table:

postgres=# create user pguser login  password 'pguser';
CREATE ROLE
postgres=# create tablespace tbs_mydb owner pguser location '/home/postgres/data/pg_tbs/tbs_mydb';
WARNING:  tablespace location should not be inside the data directory
CREATE TABLESPACE
postgres=# create  database mydb with owner=pguser template=template0 encoding='UTF8' tablespace =tbs_mydb;
CREATE DATABASE
postgres=# grant all on database mydb to pguser with grant option;
GRANT
postgres=# grant all on tablespace tbs_mydb to pguser;
GRANT
postgres=# \c mydb pguser
You are now connected to database "mydb" as user "pguser".
mydb=> create table t1 (id int);
CREATE TABLE
mydb=> insert into t1 values(1);
INSERT 0 1
mydb=> select * from t1;
 id 
----
  1
(1 row)

On node 2

First standby clone attempt: it fails with the same error as in production.

INFO: checking and correcting permissions on existing directory "/home/postgres/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/data -h 192.168.56.111 -p 6000 -U repmgr -X stream 
pg_basebackup: could not create directory "/home/postgres/data/pg_tbs": File exists
pg_basebackup: removing contents of data directory "/home/postgres/data"
pg_basebackup: changes to tablespace directories will not be undone
ERROR: unable to take a base backup of the source server
HINT: data directory ("/home/postgres/data") may need to be cleaned up manually

Change data_directory in repmgr.conf to a new directory:

data_directory='/home/postgres/repmgr'
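Switching `data_directory` back and forth by hand is easy to get wrong. A minimal sketch of a hypothetical `set_conf_param` helper (not part of repmgr) that rewrites a `key='value'` line in a file such as repmgr.conf, assuming the parameter sits at the start of its own line, as `data_directory='/home/postgres/data'` does in this setup, and that the value contains no `|`, `&` or `\` characters, which are special in a sed replacement.

```shell
#!/bin/sh
# Hypothetical helper: set_conf_param FILE KEY VALUE
# Replaces an existing "KEY = ..." line (leading spaces allowed) with KEY='VALUE',
# or appends KEY='VALUE' if no such line exists. Commented-out lines are ignored.
# Assumes VALUE contains no '|', '&' or '\'.
set_conf_param() {
    file=$1 key=$2 value=$3
    if grep -q "^[[:space:]]*$key[[:space:]]*=" "$file"; then
        tmp="$file.tmp.$$"
        # Write to a temp file and rename, avoiding the non-portable sed -i.
        sed "s|^[[:space:]]*$key[[:space:]]*=.*|$key='$value'|" "$file" > "$tmp" &&
            mv "$tmp" "$file"
    else
        printf "%s='%s'\n" "$key" "$value" >> "$file"
    fi
}
```

Usage in this walkthrough would be `set_conf_param ~/repmgr.conf data_directory /home/postgres/repmgr` before the clone and `set_conf_param ~/repmgr.conf data_directory /home/postgres/data` afterwards. The same helper could set `cluster_name` in the PostgreSQL configuration file.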

Run standby clone again; this time it succeeds:

[postgres@repmgr2 ~]$ repmgr -h 192.168.56.111 -U repmgr -d repmgr -f ~/repmgr.conf standby clone -p6000 
NOTICE: destination directory "/home/postgres/repmgr" provided
INFO: connecting to source node
DETAIL: connection string is: host=192.168.56.111 user=repmgr port=6000 dbname=repmgr
DETAIL: current installation size is 45 MB
DEBUG: 1 node records returned by source node
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: upstream_node_id determined as 111
INFO: replication slot usage not requested;  no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/home/postgres/repmgr"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
  /usr/local/pgsql/bin/pg_basebackup -l "repmgr base backup"  -D /home/postgres/repmgr -h 192.168.56.111 -p 6000 -U repmgr -X stream 
DEBUG: create_recovery_file(): creating "/home/postgres/repmgr/recovery.conf"...
DEBUG: recovery.conf line: standby_mode = 'on'

DEBUG: recovery.conf line: primary_conninfo = 'host=192.168.56.111 user=repmgr port=6000 application_name=repmgr2 connect_timeout=2'

DEBUG: recovery.conf line: recovery_target_timeline = 'latest'

NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: pg_ctl -D /home/postgres/repmgr start
HINT: after starting the server, you need to register this standby with "repmgr standby register"

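Restore the original data_directory setting in repmgr.conf and move everything under the repmgr directory into the data directory. The mv error for pg_tbs can be ignored: in plain format pg_basebackup writes each tablespace to the same absolute path it has on the source server, so /home/postgres/data/pg_tbs was already created by the clone.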

data_directory='/home/postgres/data'

[postgres@repmgr2 repmgr]$ mv * ~/data/
mv: cannot move ‘pg_tbs’ to ‘/home/postgres/data/pg_tbs’: File exists
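The `mv *` step can be made explicit about what it skips. A minimal sketch of a hypothetical `move_clone_contents` helper (not part of repmgr) that moves each top-level entry from the temporary clone directory into the real data directory and reports entries that already exist at the destination, such as `pg_tbs` here, instead of failing on them. It assumes the server is not running yet and, like `mv *`, ignores hidden files.

```shell
#!/bin/sh
# Hypothetical helper: move_clone_contents SRC DEST
# Moves each top-level entry of SRC into DEST. Entries whose name already
# exists in DEST are left in SRC and reported on stderr.
move_clone_contents() {
    src=${1%/} dest=${2%/}
    for entry in "$src"/*; do
        # Skip the literal pattern when SRC is empty (but keep symlinks).
        [ -e "$entry" ] || [ -L "$entry" ] || continue
        name=${entry##*/}
        if [ -e "$dest/$name" ] || [ -L "$dest/$name" ]; then
            echo "skipped (already in $dest): $name" >&2
        else
            mv "$entry" "$dest/" || return 1
        fi
    done
}
```

Called as `move_clone_contents /home/postgres/repmgr /home/postgres/data`, it would move everything except `pg_tbs` and print a single "skipped" line for it.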

Change the cluster_name parameter in the PostgreSQL configuration file, then start the database:

[postgres@repmgr2 data]$ pg_ctl -D /home/postgres/data/ start
waiting for server to start....2021-02-28 10:09:15.905 CST [3498] LOG:  listening on IPv4 address "0.0.0.0", port 6000
2021-02-28 10:09:15.912 CST [3498] LOG:  listening on Unix socket "/tmp/.s.PGSQL.6000"
2021-02-28 10:09:15.949 CST [3498] LOG:  redirecting log output to logging collector process
2021-02-28 10:09:15.949 CST [3498] HINT:  Future log output will appear in directory "log".
. done
server started

Register the standby; registration succeeds:

[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf standby register
INFO: connecting to local node "repmgr2" (ID: 113)
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
INFO: connecting to primary database
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID 111)
INFO: standby registration complete
NOTICE: standby node "repmgr2" (ID: 113) successfully registered

Check the cluster status:

[postgres@repmgr2 data]$ repmgr -f ../repmgr.conf cluster show
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.113 port=6000 fallback_application_name=repmgr options=-csearch_path="
DEBUG: connecting to: "user=repmgr connect_timeout=2 dbname=repmgr host=192.168.56.111 port=6000 fallback_application_name=repmgr options=-csearch_path="
 ID  | Name    | Role    | Status    | Upstream | Location | Priority | Timeline | Connection string                                                        
-----+---------+---------+-----------+----------+----------+----------+----------+---------------------------------------------------------------------------
 111 | repmgr1 | primary | * running |          | default  | 100      | 5        | host=192.168.56.111 port=6000 user=repmgr dbname=repmgr connect_timeout=2
 113 | repmgr2 | standby |   running | repmgr1  | default  | 100      | 5        | host=192.168.56.113 port=6000 user=repmgr dbname=repmgr connect_timeout=2
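When a script needs the same check as this table, one option is to scan it for nodes whose Status is not "running". A minimal sketch, assuming the pipe-separated layout shown above; the function name `nodes_not_running` is hypothetical, and the parser simply strips the `*` marker and padding from the Status column.

```shell
#!/bin/sh
# Hypothetical helper: reads "repmgr cluster show" output on stdin and prints
# the Name of every node whose Status is not "running".
# Exit status: 0 if every listed node is running, 1 otherwise.
nodes_not_running() {
    awk -F'|' '
        NF < 4 { next }                          # DEBUG lines, separator row
        {
            id = $1;     gsub(/[ \t]/, "", id)
            name = $2;   gsub(/[ \t]/, "", name)
            status = $4; gsub(/[* \t]/, "", status)   # drop "*" marker and padding
        }
        id !~ /^[0-9]+$/ { next }                # header row ("ID")
        status != "running" { print name; bad = 1 }
        END { exit (bad ? 1 : 0) }
    '
}
```

A possible use is `repmgr -f ../repmgr.conf cluster show | nodes_not_running`, which prints nothing and exits 0 for the output above.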

Test data synchronization

Insert a row on the primary:

mydb=> insert into t1 values(2);
INSERT 0 1
mydb=> select * from t1;
 id 
----
  1
  2
(2 rows)

Query on the standby:

[postgres@repmgr2 data]$ psql
psql (10.11)
Type "help" for help.

postgres=# \c mydb pguser
You are now connected to database "mydb" as user "pguser".
mydb=> select * from t1;
 id 
----
  1
  2
(2 rows)
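Right after an insert on the primary, a standby can lag for a moment, so a single query may occasionally return stale data. A minimal sketch of a hypothetical retry helper, `wait_for_output`, that reruns a command once per second until its output matches an expected value. The psql invocation in the usage note is an assumption about how you would connect: the host and port come from this setup, and authentication is not shown.

```shell
#!/bin/sh
# Hypothetical helper: wait_for_output TRIES EXPECTED CMD [ARGS...]
# Runs CMD up to TRIES times, one second apart, until its stdout equals
# EXPECTED. Returns 0 on a match, 1 if every attempt differs.
wait_for_output() {
    tries=$1 expected=$2
    shift 2
    i=1
    while :; do
        [ "$("$@" 2>/dev/null)" = "$expected" ] && return 0
        [ "$i" -ge "$tries" ] && return 1
        i=$((i + 1))
        sleep 1
    done
}
```

For example: `wait_for_output 10 2 psql -h 192.168.56.113 -p 6000 -U pguser -d mydb -Atc 'select count(*) from t1'`, where `-At` makes psql print only the bare value.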

Final thoughts

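In fact, PG already warned when the tablespace was created that it should not be placed inside the data directory, so hitting the error above means falling into a trap that others have fallen into before: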

WARNING:  tablespace location should not be inside the data directory

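If you want to try option 1, here is one approach. Note that ALTER DATABASE ... SET TABLESPACE requires that no one is connected to the database being moved, which is why the session switches to the postgres database first: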

# create a new tablespace
postgres=# create tablespace zhijian  owner pguser location '/data/pgdata/11/pg_tbs/tbs_zhijian';
CREATE TABLESPACE
# change the database's tablespace
mydb=> \c postgres postgres
postgres=# alter database mydb set  tablespace zhijian;
ALTER DATABASE
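Option 1 can be scripted with a guard against repeating the original mistake. A minimal sketch of a hypothetical `option1_sql` generator that prints the two statements above for a given database, tablespace name, owner and location, and refuses a location inside PGDATA. The directory must already exist, be empty and be owned by the PostgreSQL OS user, because CREATE TABLESPACE does not create it. The generator does not quote identifiers, so it is only meant for simple names and paths without single quotes.

```shell
#!/bin/sh
# Hypothetical helper: option1_sql DB NEW_TBS OWNER LOCATION PGDATA
# Prints CREATE TABLESPACE / ALTER DATABASE ... SET TABLESPACE statements.
# Fails (exit 1) if LOCATION is PGDATA itself or lies below it.
# Identifiers and the path are NOT escaped: pass simple values only.
option1_sql() {
    db=$1 tbs=$2 owner=$3 location=${4%/} pgdata=${5%/}
    case "$location/" in
        "$pgdata"/*)
            echo "error: $location is inside $pgdata" >&2
            return 1 ;;
    esac
    printf "CREATE TABLESPACE %s OWNER %s LOCATION '%s';\n" "$tbs" "$owner" "$location"
    printf 'ALTER DATABASE %s SET TABLESPACE %s;\n' "$db" "$tbs"
}
```

One way to apply it, with `/pgtbs/tbs_mydb_new` as a purely illustrative path: `option1_sql mydb tbs_mydb_new pguser /pgtbs/tbs_mydb_new /home/postgres/data | psql -U postgres -d postgres`. As noted above, this blocks writes to mydb while the data is moved.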