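When the database fails to start, you can usually act on the error message and take the matching corrective step. Below are ways to handle some startup errors that are serious and whose fixes are not obvious.
To simulate the error, look up where an index of the pg_class system catalog is stored on disk, then open that file in vim and change its contents arbitrarily.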
postgres=# select pg_relation_filepath('pg_class_oid_index');
 pg_relation_filepath
----------------------
 base/13219/36870
(1 row)
$ cd $PGDATA
$ vim base/13219/36870
Restart the database.
$ pg_ctl restart -m fast
waiting for server to shut down.... done
server stopped
waiting for server to start....2019-03-12 11:59:17.312 CST [5688] LOG: 00000: listening on IPv4 address "0.0.0.0", port 1921
2019-03-12 11:59:17.312 CST [5688] LOCATION: StreamServerPort, pqcomm.c:593
2019-03-12 11:59:17.312 CST [5688] LOG: 00000: listening on IPv6 address "::", port 1921
2019-03-12 11:59:17.312 CST [5688] LOCATION: StreamServerPort, pqcomm.c:593
2019-03-12 11:59:17.314 CST [5688] LOG: 00000: listening on Unix socket "./.s.PGSQL.1921"
2019-03-12 11:59:17.314 CST [5688] LOCATION: StreamServerPort, pqcomm.c:587
2019-03-12 11:59:17.400 CST [5688] LOG: 00000: redirecting log output to logging collector process
2019-03-12 11:59:17.400 CST [5688] HINT: Future log output will appear in directory "log".
2019-03-12 11:59:17.400 CST [5688] LOCATION: SysLogger_Start, syslogger.c:667
 done
server started
The database starts normally and the log shows no errors.
However, connecting to the database fails with this error:
$ psql
psql: FATAL: could not read block 1 in file "base/13219/36870": read only 32756 of 32768 bytes
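The message says PostgreSQL asked for a full 32768-byte block (so this build evidently uses a 32 KB block size; the default BLCKSZ is 8192) and got 32756 bytes, which suggests the file now ends partway through block 1. The following is a minimal sketch, assuming you can stat the file, that splits a relation file's size into whole blocks plus a partial tail. The names `describe_segment`, `describe_file` and `SegmentLayout` are hypothetical helpers, not part of PostgreSQL.

```python
import os
from dataclasses import dataclass


@dataclass
class SegmentLayout:
    full_blocks: int     # number of complete blocks in the file
    trailing_bytes: int  # bytes left over after the last complete block


def describe_segment(size_bytes: int, block_size: int = 8192) -> SegmentLayout:
    """Split a relation file size into whole blocks plus a partial tail.

    A non-zero trailing_bytes means the last block is truncated, which is
    what PostgreSQL reports as "read only X of Y bytes" when it reads it.
    """
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full, rest = divmod(size_bytes, block_size)
    return SegmentLayout(full_blocks=full, trailing_bytes=rest)


def describe_file(path: str, block_size: int = 8192) -> SegmentLayout:
    # Only stats the file; it never opens or modifies it.
    return describe_segment(os.path.getsize(path), block_size)


if __name__ == "__main__":
    # Block 1 starts at byte 32768; a short read of 32756 bytes points to
    # a file that is 32768 + 32756 bytes long.
    layout = describe_segment(32768 + 32756, block_size=32768)
    print(layout)  # SegmentLayout(full_blocks=1, trailing_bytes=32756)
```

On a live system you would call `describe_file("base/13219/36870", block_size=32768)` from inside `$PGDATA`, passing the block size your build was compiled with.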
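Because the error above was simulated, we already know which table or index is damaged. But if you hit this problem unexpectedly and cannot get into the database, you can use oid2name to identify the corresponding database and object.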
$ oid2name
All databases:
    Oid  Database Name  Tablespace
----------------------------------
  13219       postgres  pg_default
  16393           swrd  pg_default
  13218      template0  pg_default
      1      template1  pg_default
$ oid2name -f 36870
From database "postgres":
  Filenode          Table Name
------------------------------
     36870  pg_class_oid_index
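The path in the error message already encodes the two numbers oid2name needs: under `base/` the first component is the database OID and the last is the filenode (optionally followed by a fork suffix such as `_vm` and a segment suffix such as `.1`). Here is a minimal sketch that pulls those numbers out of an error line; the helper names are hypothetical, and the regular expression assumes the standard layouts `base/<dboid>/...`, `global/...` and `pg_tblspc/<tsoid>/<version dir>/<dboid>/...`.

```python
import re
from dataclasses import dataclass
from typing import List, Optional

# Any quoted path following the word "file" in a server message.
_PATH_IN_MSG = re.compile(r'file "([^"]+)"')

_REL_PATH = re.compile(
    r"^(?:base/(?P<db>\d+)"
    r"|global"
    r"|pg_tblspc/(?P<ts>\d+)/[^/]+/(?P<tsdb>\d+))"
    r"/(?P<filenode>\d+)(?:_(?P<fork>fsm|vm|init))?(?:\.(?P<seg>\d+))?$"
)


@dataclass
class RelFile:
    path: str
    database_oid: Optional[int]    # None for shared catalogs under global/
    tablespace_oid: Optional[int]  # None for base/ and global/
    filenode: int
    fork: str                      # "main", "fsm", "vm" or "init"
    segment: int                   # 0 for the first segment (no suffix)


def parse_relation_path(path: str) -> RelFile:
    m = _REL_PATH.match(path)
    if m is None:
        raise ValueError(f"not a relation file path: {path!r}")
    db = m.group("db") or m.group("tsdb")
    ts = m.group("ts")
    return RelFile(
        path=path,
        database_oid=int(db) if db else None,
        tablespace_oid=int(ts) if ts else None,
        filenode=int(m.group("filenode")),
        fork=m.group("fork") or "main",
        segment=int(m.group("seg") or 0),
    )


def paths_in_message(message: str) -> List[RelFile]:
    """Extract every relation file mentioned as file "..." in an error."""
    found = []
    for p in _PATH_IN_MSG.findall(message):
        try:
            found.append(parse_relation_path(p))
        except ValueError:
            pass  # skip quoted paths that are not relation files
    return found
```

Applied to the psql error above, `paths_in_message(...)[0]` gives database OID 13219 and filenode 36870. These match the `postgres` row of the database list and the argument passed to `oid2name -f`.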
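In my case the database could start but could not be connected to. The same method also applies when the database cannot start at all but reports a similar error.
Next, enter the database in single-user mode: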
$ postgres --single -P -d 1
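The -P option makes the server ignore system indexes when reading system tables (the indexes are still updated when those tables are modified), so a damaged catalog index is not used while you repair it.
-d 1 sets the debug level to 1. Levels range from 1 to 5; the higher the number, the more detailed the log output.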
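If you would rather script this step than type at the `backend>` prompt, the single-user backend also reads commands from standard input, with a newline ending each command by default. Below is a minimal sketch that builds the command line and pipes SQL into it. The function names are hypothetical, the server must already be stopped, and `postgres` is assumed to be on `PATH`. It relies on `--single` being the first argument and on the database name going last.

```python
import subprocess
from typing import List, Optional


def single_user_argv(
    pgdata: Optional[str] = None,
    dbname: Optional[str] = None,
    ignore_system_indexes: bool = True,
    debug_level: Optional[int] = 1,
    postgres_bin: str = "postgres",
) -> List[str]:
    """Build the command line for a PostgreSQL single-user backend."""
    argv = [postgres_bin, "--single"]      # --single must be the first argument
    if ignore_system_indexes:
        argv.append("-P")                  # do not read catalogs through system indexes
    if debug_level is not None:
        if not 1 <= debug_level <= 5:
            raise ValueError("debug level must be between 1 and 5")
        argv += ["-d", str(debug_level)]
    if pgdata is not None:
        argv += ["-D", pgdata]             # otherwise the PGDATA environment variable is used
    if dbname is not None:
        argv.append(dbname)                # defaults to the OS user name when omitted
    return argv


def run_single_user(sql: str, **kwargs) -> subprocess.CompletedProcess:
    """Pipe SQL into a single-user backend; end of input ends the session."""
    return subprocess.run(
        single_user_argv(**kwargs),
        input=sql,
        text=True,
        capture_output=True,
        check=False,
    )
```

For the case in this article the call would be something like `run_single_user("reindex table pg_class;\n", dbname="postgres")`, followed by checking `returncode` and `stderr` before running `pg_ctl start`.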
$ postgres --single -P -d 1
2019-03-12 11:17:16.677 CST [1092] DEBUG: 00000: mmap(12998148096) with MAP_HUGETLB failed, huge pages disabled: Cannot allocate memory
2019-03-12 11:17:16.677 CST [1092] LOCATION: CreateAnonymousSegment, pg_shmem.c:485
2019-03-12 11:17:16.759 CST [1092] NOTICE: 00000: database system was shut down at 2019-03-12 11:16:54 CST
2019-03-12 11:17:16.759 CST [1092] LOCATION: StartupXLOG, xlog.c:6363
2019-03-12 11:17:16.759 CST [1092] DEBUG: 00000: checkpoint record is at 2/67000028
2019-03-12 11:17:16.759 CST [1092] LOCATION: StartupXLOG, xlog.c:6646
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: redo record is at 2/67000028; shutdown true
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6724
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: next transaction ID: 0:46060157; next OID: 36864
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6728
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: next MultiXactId: 1; next MultiXactOffset: 0
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6731
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: oldest unfrozen transaction ID: 561, in database 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6734
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: oldest MultiXactId: 1, in database 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6737
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: commit timestamp Xid oldest/newest: 0/0
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupXLOG, xlog.c:6741
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: transaction ID wrap limit is 2147484208, limited by database with OID 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: SetTransactionIdLimit, varsup.c:368
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: SetMultiXactIdLimit, multixact.c:2269
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: starting up replication slots
2019-03-12 11:17:16.760 CST [1092] LOCATION: StartupReplicationSlots, slot.c:1110
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: MultiXactId wrap limit is 2147483648, limited by database with OID 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: SetMultiXactIdLimit, multixact.c:2269
2019-03-12 11:17:16.760 CST [1092] DEBUG: 00000: MultiXact member stop limit is now 4294757632 based on MultiXact 1
2019-03-12 11:17:16.760 CST [1092] LOCATION: SetOffsetVacuumLimit, multixact.c:2632
PostgreSQL stand-alone backend 11.0
backend> reindex table pg_class;
2019-03-12 11:18:34.181 CST [1092] DEBUG: 00000: building index "pg_class_oid_index" on table "pg_class" serially
2019-03-12 11:18:34.181 CST [1092] LOCATION: index_build, index.c:2297
2019-03-12 11:18:34.188 CST [1092] DEBUG: 00000: building index "pg_class_relname_nsp_index" on table "pg_class" serially
2019-03-12 11:18:34.188 CST [1092] LOCATION: index_build, index.c:2297
2019-03-12 11:18:34.191 CST [1092] DEBUG: 00000: building index "pg_class_tblspc_relfilenode_index" on table "pg_class" serially
2019-03-12 11:18:34.191 CST [1092] LOCATION: index_build, index.c:2297
backend> 2019-03-12 11:18:47.832 CST [1092] NOTICE: 00000: shutting down
2019-03-12 11:18:47.832 CST [1092] LOCATION: ShutdownXLOG, xlog.c:8459
2019-03-12 11:18:47.986 CST [1092] LOG: 00000: checkpoint starting: shutdown immediate
2019-03-12 11:18:47.986 CST [1092] LOCATION: LogCheckpointStart, xlog.c:8508
2019-03-12 11:18:47.988 CST [1092] DEBUG: 00000: performing replication slot checkpoint
2019-03-12 11:18:47.988 CST [1092] LOCATION: CheckPointReplicationSlots, slot.c:1074
2019-03-12 11:18:48.000 CST [1092] DEBUG: 00000: checkpoint sync: number=1 file=base/13219/1259 time=1.022 msec
2019-03-12 11:18:48.000 CST [1092] LOCATION: mdsync, md.c:1251
2019-03-12 11:18:48.004 CST [1092] LOG: 00000: checkpoint complete: wrote 2 buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.010 s, sync=0.001 s, total=0.019 s; sync files=1, longest=0.001 s, average=0.001 s; distance=16384 kB, estimate=16384 kB
2019-03-12 11:18:48.004 CST [1092] LOCATION: LogCheckpointEnd, xlog.c:8590
2019-03-12 11:18:48.029 CST [1092] NOTICE: 00000: database system is shut down
2019-03-12 11:18:48.029 CST [1092] LOCATION: UnlinkLockFiles, miscinit.c:860
$ pg_ctl start
waiting for server to start....2019-03-12 11:18:53.104 CST [1185] LOG: 00000: listening on IPv4 address "0.0.0.0", port 1921
2019-03-12 11:18:53.104 CST [1185] LOCATION: StreamServerPort, pqcomm.c:593
2019-03-12 11:18:53.104 CST [1185] LOG: 00000: listening on IPv6 address "::", port 1921
2019-03-12 11:18:53.104 CST [1185] LOCATION: StreamServerPort, pqcomm.c:593
2019-03-12 11:18:53.107 CST [1185] LOG: 00000: listening on Unix socket "./.s.PGSQL.1921"
2019-03-12 11:18:53.107 CST [1185] LOCATION: StreamServerPort, pqcomm.c:587
2019-03-12 11:18:53.191 CST [1185] LOG: 00000: redirecting log output to logging collector process
2019-03-12 11:18:53.191 CST [1185] HINT: Future log output will appear in directory "log".
2019-03-12 11:18:53.191 CST [1185] LOCATION: SysLogger_Start, syslogger.c:667
 done
server started
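As the log shows, the pg_class indexes have been rebuilt. After that, start the database as usual and it is back to normal. This test covered a system catalog index. For our own non-system objects, even a deleted file causes no error when the database starts or when you connect; the error appears only when the object is actually used. Unless the cause is a bad disk sector, a damaged index can usually be fixed by running REINDEX on it once the error shows up.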
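Move the old data directory aside with mv, create a new data directory, and then restore from a backup and start the database from it.
I tried searching the source code and modified a few places, but so many places raise this same type of error that I did not change them all. Below is the error reported after those few changes.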
$ /opt/pgsql11_modify/bin/psql
WARNING: could not read block 1 in file "base/13219/36864": read only 32756 of 32768 bytes
psql: FATAL: could not open file "base/13219/36864.1" (target block 131072): previous segment is only 1 blocks
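The second message is about relation segments. Each relation is split into segment files of RELSEG_SIZE blocks (1 GB by default), named `<filenode>`, `<filenode>.1`, `<filenode>.2`, and so on. As far as we can tell from `_mdfd_getseg` in PostgreSQL 11's md.c, a request for a block beyond the first segment walks the segments in order. If a preceding segment holds fewer than RELSEG_SIZE blocks, the error names the next segment file together with the short segment's size. Below is a minimal sketch that replays that arithmetic. The function names are hypothetical, and block size and segment size are parameters because both are compile-time options.

```python
from typing import List, Optional, Tuple


def blocks_per_segment(block_size: int = 8192, segment_bytes: int = 1 << 30) -> int:
    """RELSEG_SIZE expressed in blocks (defaults: 8 KB blocks, 1 GB segments)."""
    if segment_bytes % block_size:
        raise ValueError("segment size must be a multiple of the block size")
    return segment_bytes // block_size


def locate_block(block: int, per_segment: int) -> Tuple[int, int]:
    """Return (segment number, block offset inside that segment)."""
    return divmod(block, per_segment)


def segment_file(base_path: str, segno: int) -> str:
    # Segment 0 has no suffix; later segments are named <path>.<segno>.
    return base_path if segno == 0 else f"{base_path}.{segno}"


def first_missing_segment(
    target_block: int, segment_blocks: List[int], per_segment: int
) -> Optional[Tuple[int, int]]:
    """Walk segments in order, as the sketch assumes md.c does.

    segment_blocks[i] is the number of blocks currently in segment i.
    Returns (segment number that cannot be reached, blocks in the short
    previous segment), or None if every segment before the target is full.
    """
    target_seg = target_block // per_segment
    for segno in range(1, target_seg + 1):
        prev = segment_blocks[segno - 1] if segno - 1 < len(segment_blocks) else 0
        if prev < per_segment:
            return segno, prev
    return None
```

With a one-block base segment, `first_missing_segment(131072, [1], blocks_per_segment())` returns `(1, 1)`. That is the `.1` file and "only 1 blocks" from the message above, and the same result comes back with 32 KB blocks (`blocks_per_segment(32768)`), because the walk stops at the first short segment.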
This is the last resort. Once the database is up, export the data as soon as possible and restore it into another cluster.