??從PostgreSQL 9.0開始,基於預寫日志(Write Ahead Log, WAL)的可讀復制(PITR)更是成為了官方提供的異步主從復制(Master-Slave Replication)解決方案,該方案擁有如下優點:
a.使用預寫日志記錄數據庫的改動,不額外增加服務端的其他負荷。
b.當主服務器失敗(如斷電、系統崩潰、災難)時,不會丟失任何數據。
c.支持基於流和基於檔案的兩種日志傳輸方案。
d.備用服務器可作為負載均衡節點提供讀請求。
e.支持多個或多級備用服務器。
??主服務器在接受到每個事務請求時,將數據改動用預寫日志(WAL)記錄。具體而言,事務采用兩段提交(Two Phase Commit),即先將改動寫入預寫日志,然後再實際改動數據庫。這樣可以保證預寫日志的時間戳永遠不落後於數據庫,即便是正在寫入時服務器突然崩潰,重啟以後也可以依據預寫日志將數據恢復,因為預寫日志保留了比數據庫記錄中更新的版本。PostgreSQL的異步復制解決方案正是利用了預寫日志,將預寫日志從主服務器(Master Sever)傳輸到備用服務器(Standby Server),然後在備用服務器上回放(Replay)出預寫日志中記錄改動,從而實現主從復制。PostgreSQL使用了兩種方式傳輸預寫日志:存檔式(archive)和流式(streaming)。
??存檔式復制的原理是主服務器將預寫日志主動拷貝到一個安全的位置(可以直接到備用服務器,也可以是第三台服務器),同時備用服務器定期掃描這個位置,並將預寫日志拷貝到備用服務器端然後再回放。這樣即使主服務器崩潰了,備用服務器也可以從這個安全的位置獲取到一份完整的記錄,以確保任何數據不會丟失。而流式復制則簡化了這一個步驟,由主服務器直接通過TCP協議向備用服務器傳輸日志,避免了兩次復制的開銷,有利於減小備用服務器和主服務器直接的數據延時。但當主服務器崩潰時,未被傳輸到備用服務器的日志則會丟失,造成數據損失。PostgreSQL支持存檔式和流式兩種模式的混合,當兩種模式都開啟時,備用服務器會定期檢查是否有存檔已經到達指定的位置,並回放日志。一旦檢測到指定的位置沒有新的日志,則會切換到流式模式試圖直接從網絡傳輸日志,接著再檢查存檔,不斷重復這一循環。
需求:master和slave作為主從流復制,當master宕機後,slave切換為新主繼續服務;然而當master恢復故障後,作為從重新加入主從流復制。
主的基本配置請參考http://blog.csdn.net/yanggd1987/article/details/51150190,以下是需更改添加的地方。
1.創建復制用戶
postgres=# create user rep replication password 'reptest';
CREATE ROLE
postgres=# \du
List of roles
Role name | Attributes | Member of
-----------+------------------------------------------------+-----------
postgres | Superuser, Create role, Create DB, Replication | {}
rep | Replication | {}
2.設置pg_hba.conf
#允許slave訪問
vim pg_hba.conf
host replication rep 10.10.10.61/8 md5
3.設置主庫postgresql.conf
wal_level = hot_standby # write ahead log,流復制時為hot_standby
hot_standby = on
max_wal_senders = 2 # 流復制的最大連接數
wal_keep_segments = 16 # 流復制保留的最大xlog數
其中為了在備機服務器上開啟“只讀”查詢,wal_level必須設置成“hot_standby”. 但是,如果你一直不會在stand-by模式下連接到備機,你可以選擇“archive”,archive模式只會將主上的wal日志記錄備份到備用服務器上。
4.重啟數據庫
pg_ctl restart -D /data/pgsql/data
或者
pg_ctl stop -D /data/pgsql/data -m fast
pg_ctl start -D /data/pgsql/data -m fast
slave配置
1.創建數據目錄
mkdir -p /data/pgsql/data
chown -R postgres.postgres data
chmod 700 data
2.使用pg_backendup生成備庫
-bash-4.2$ pg_basebackup -D /data/pgsql/data -Fp -Xs -v -P -h 10.10.10.60 -p 5432 -U rep
Password:
transaction log start point: 0/2000020
pg_basebackup: starting background WAL receiver
20154/20154 kB (100%), 1/1 tablespace
transaction log end point: 0/20000E0
pg_basebackup: waiting for background process to finish streaming...
pg_basebackup: base backup completed
-bash-4.2$ ls /data/pgsql/data/
backup_label global pg_hba.conf pg_multixact pg_serial pg_stat_tmp pg_tblspc PG_VERSION postgresql.conf
base pg_clog pg_ident.conf pg_notify pg_snapshots pg_subtrans pg_twophase pg_xlog
此時表空間目錄和$PGDATA目錄已經復制過來了。
注意:從服務器無需初始化,因為所有的配置文件及目錄都是從主上復制過來的。
3.配置recovery.conf
-bash-4.2$ rpm -ql postgresql-server |grep recovery
/usr/share/pgsql/recovery.conf.sample
-bash-4.2$ cp /usr/share/pgsql/recovery.conf.sample /data/pgsql/data/recovery.conf
#添加如下幾行
vim recovery.conf
standby_mode = on
trigger_file = '/data/pgsql/pg.trigger'
primary_conninfo = 'host=10.10.10.60 port=5432 user=rep password=reptest'
其中:
standby_mode = on指明是否開啟服務器作為一個備機,在流復制裡,這個參數必須要開啟。
primary_conninfo 指明用於備服務器連接到主服務器的連接字符串
trigger_file 指定一個觸發文件讓備服務器感覺到它的時候就會停止流復制(即:故障轉移),不要創建這個文件。當你想主從切換的時候才需要創建它。
注意:當從切換為主完成後,recovery.conf會自動變為recovery.done
4.啟動pgsql
pg_ctl start -D /data/pgsql/data
測試流復制
1.查看主庫
-bash-4.2$ ps -ef |grep postgres
postgres 3585 23019 0 10:37 pts/0 00:00:00 psql
postgres 3730 1 0 11:23 pts/0 00:00:00 /usr/bin/postgres -D /data/pgsql/data
postgres 3731 3730 0 11:23 ? 00:00:00 postgres: logger process
postgres 3733 3730 0 11:23 ? 00:00:00 postgres: checkpointer process
postgres 3734 3730 0 11:23 ? 00:00:00 postgres: writer process
postgres 3735 3730 0 11:23 ? 00:00:00 postgres: wal writer process
postgres 3736 3730 0 11:23 ? 00:00:00 postgres: autovacuum launcher process
postgres 3737 3730 0 11:23 ? 00:00:00 postgres: stats collector process
postgres 4066 3730 0 13:40 ? 00:00:00 postgres: wal sender process rep 192.168.3.141(47146) streaming 0/3000A38
postgres 4069 23019 0 13:42 pts/0 00:00:00 ps -ef
postgres 4070 23019 0 13:42 pts/0 00:00:00 grep --color=auto postgres
root 23018 21245 0 Apr01 pts/0 00:00:00 su - postgres
postgres 23019 23018 0 Apr01 pts/0 00:00:00 -bash
確認主庫中進程有“postgres: wal sender process”
2.查看從庫
-bash-4.2$ ps -ef |grep postgres
root 694 19367 0 11:27 pts/0 00:00:00 psql -h 192.168.3.139 -U postgres -W
root 724 19367 0 11:57 pts/0 00:00:00 su - postgres
postgres 725 724 0 11:57 pts/0 00:00:00 -bash
postgres 1069 1 0 14:13 pts/0 00:00:00 /usr/bin/postgres -D /data/pgsql/data
postgres 1070 1069 0 14:13 ? 00:00:00 postgres: logger process
postgres 1071 1069 0 14:13 ? 00:00:00 postgres: startup process recovering 000000010000000000000003
postgres 1072 1069 0 14:13 ? 00:00:00 postgres: checkpointer process
postgres 1073 1069 0 14:13 ? 00:00:00 postgres: writer process
postgres 1074 1069 0 14:13 ? 00:00:00 postgres: stats collector process
postgres 1075 1069 0 14:13 ? 00:00:00 postgres: wal receiver process streaming 0/3000AD0
postgres 1078 725 0 14:15 pts/0 00:00:00 ps -ef
postgres 1079 725 0 14:15 pts/0 00:00:00 grep --color=auto postgre
確定備庫進程中有”postgres: wal receiver process”
3.執行如下命令查看快照,它返回主庫記錄點、備庫記錄點;主庫每增加一條寫入,記錄點的值就會加1。
postgres=# select txid_current_snapshot();
txid_current_snapshot
-----------------------
1894:1894:
(1 row)
postgres=# create database testdb;
CREATE DATABASE
postgres=# select txid_current_snapshot();
txid_current_snapshot
-----------------------
1895:1895:
(1 row)
4.查看主備同步狀態
postgres=# select * from pg_stat_replication;
pid | usesysid | usename | application_name | client_addr | client_hostname | client_port | backend_start | state | sent_location | write_location | flush_location | replay_locat
ion | sync_priority | sync_state
------+----------+---------+------------------+---------------+-----------------+-------------+-------------------------------+-----------+---------------+----------------+----------------+-------------
----+---------------+------------
4066 | 16385 | rep | walreceiver | 10.10.10.61 | | 47146 | 2016-04-08 13:40:53.375562+08 | streaming | 0/3000B68 | 0/3000B68 | 0/3000B68 | 0/3000B68
| 0 | async
(1 row)
字段state顯示的同步狀態有:startup(連接中)、catchup(同步中)、streaming(同步);字段sync_state顯示的模式有:async(異步)、sync(同步)、potential(雖然現在是異步模式,但是有可能升級到同步模式)
5.主庫添加數據
postgres=# create table t1(id int4,create_time timestamp(0) without time zone);
CREATE TABLE
postgres=# insert into t1 values(1,now());
INSERT 0 1
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-08 13:53:50
(1 row)
6.在從庫查看數據
postgres=# c testdb
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-08 13:53:50
(1 row)
主從庫數據同步,則流復制成功。
7.從庫插入數據
postgres=# insert into t1 values(2,now());
ERROR: cannot execute INSERT in a read-only transaction
備庫上執行insert語句會報錯,因為備庫是只讀的。
主從切換
1.模擬主機故障
-bash-4.2$ pg_ctl stop -D /data/pgsql/data -m fast
waiting for server to shut down.... done
server stopped
2.提升備庫狀態
-bash-4.2$ pg_ctl promote -D /data/pgsql/data
server promoting
3.查看狀態
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
f
(1 row)
主庫返回f,備庫返回t
原master恢復為從
我們以新master和新slave稱呼
1.模擬新的主上新增數據
#新master
postgres=# insert into t1 values(2,now());
postgres=# insert into t1 values(3,now());
2.查看兩台數據庫服務器的pg_xlog;
#新master
-bash-4.2$ ll
total 81928
-rw-------. 1 postgres postgres 16777216 Apr 8 12:45 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Apr 8 16:09 000000010000000000000003
-rw-------. 1 postgres postgres 16777216 Apr 11 09:22 000000020000000000000003
-rw-------. 1 postgres postgres 16777216 Apr 11 10:24 000000020000000000000004
-rw-------. 1 postgres postgres 16777216 Apr 12 09:16 000000020000000000000005
-rw-------. 1 postgres postgres 56 Apr 8 16:10 00000002.history
drwx------. 2 postgres postgres 4096 Apr 8 12:45 archive_status
#新slave
-bash-4.2$ ll
total 49156
-rw-------. 1 postgres postgres 16777216 Apr 8 12:12 000000010000000000000001
-rw-------. 1 postgres postgres 16777216 Apr 8 12:12 000000010000000000000002
-rw-------. 1 postgres postgres 16777216 Apr 8 15:37 000000010000000000000003
drwx------. 2 postgres postgres 4096 Apr 1 09:22 archive_status
區別是新主上多出來的日志就是我們新插入數據後生成的日志
3.配置新master的pg_hba.conf
host replication rep 10.10.10.60/8 md5
4.查看主從pg_controldata輸出
#新master
-bash-4.2$ pg_controldata /data/pgsql/data
pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 6268391932590681930
Database cluster state: in production
pg_control last modified: Mon 11 Apr 2016 04:36:25 PM CST
Latest checkpoint location: 0/5002E68
Prior checkpoint location: 0/5002DD0
Latest checkpoint's REDO location: 0/5002E30
Latest checkpoint's TimeLineID: 2
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/1899
Latest checkpoint's NextOID: 16390
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 1879
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 1899
Time of latest checkpoint: Mon 11 Apr 2016 04:36:25 PM CST
Minimum recovery ending location: 0/0
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 100
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
#新slave
-bash-4.2$ pg_controldata /data/pgsql/data
pg_control version number: 922
Catalog version number: 201204301
Database system identifier: 6268391932590681930
Database cluster state: shut down in recovery
pg_control last modified: Mon 11 Apr 2016 04:06:39 PM CST
Latest checkpoint location: 0/3014EB8
Prior checkpoint location: 0/3014EB8
Latest checkpoint's REDO location: 0/3014EB8
Latest checkpoint's TimeLineID: 1
Latest checkpoint's full_page_writes: on
Latest checkpoint's NextXID: 0/1897
Latest checkpoint's NextOID: 16390
Latest checkpoint's NextMultiXactId: 1
Latest checkpoint's NextMultiOffset: 0
Latest checkpoint's oldestXID: 1879
Latest checkpoint's oldestXID's DB: 1
Latest checkpoint's oldestActiveXID: 0
Time of latest checkpoint: Fri 08 Apr 2016 03:37:07 PM CST
Minimum recovery ending location: 0/3014F18
Backup start location: 0/0
Backup end location: 0/0
End-of-backup record required: no
Current wal_level setting: hot_standby
Current max_connections setting: 100
Current max_prepared_xacts setting: 0
Current max_locks_per_xact setting: 64
Maximum data alignment: 8
Database block size: 8192
Blocks per segment of large relation: 131072
WAL block size: 8192
Bytes per WAL segment: 16777216
Maximum length of identifiers: 64
Maximum columns in an index: 32
Maximum size of a TOAST chunk: 1996
Date/time type storage: 64-bit integers
Float4 argument passing: by value
Float8 argument passing: by value
從輸出結果我們得出Latest checkpoint’s TimeLineID 新主為2,因為我們由從切換為新master並且新插入了數據,而新slave仍為1
**注意:**pg_controldata 輸出數據庫服務的當前狀態,可以根據 “Database cluster state: ” 的信息來判斷,如果值為 “in production” 說明是主庫,如果值為 “in archive recovery” 說明是備庫。
5.配置新slave的recovery.conf
-bash-4.2$ rpm -ql postgresql-server |grep recovery
/usr/share/pgsql/recovery.conf.sample
-bash-4.2$ cp /usr/share/pgsql/recovery.conf.sample /data/pgsql/data/recovery.conf
添加如下幾行
standby_mode = on
trigger_file = '/data/pgsql/pg.trigger'
primary_conninfo = 'host=10.10.10.61 port=5432 user=rep password=rep'
#新增如下一行
recovery_target_timeline = 'latest'
6.啟動新從
pg_ctl start -D /data/pgsql/data
7.查看日志pg_log
FATAL: timeline 2 of the primary does not match recovery target timeline 1
原因:由於新主從的timelineID不一致造成
8.我們將主從pg_xlog下的多出的文件copy到從上,注意後綴為history的文件也要復制,archive_status歸檔日志可以不用復制
scp 00000002* [email protected]:/data/pgsql/data/pg_xlog
9.重啟新slave數據庫
pg_ctl restart -D /data/pgsql/data
10.查看新slave數據是否同步
-bash-4.2$ psql
psql (9.2.15)
Type "help" for help.
postgres=# select * from t1;
id | create_time
----+---------------------
1 | 2016-04-08 13:53:50
2 | 2016-04-11 10:27:23
3 | 2016-04-11 10:33:17
(3 rows)
postgres=# select pg_is_in_recovery();
pg_is_in_recovery
-------------------
t
(1 row)
由上看出新slave和新master同步,並且狀態正常,切換成功。
另外,你可以通過比較主服務器上的當前的WAL寫位置與備服務器上的最新“接收/重做”的WAL位置來計算復制的延遲。它們各自可以通過在主服務器端使用pg_current_xlog_location函數來獲取,在備服務器上通過pg_last_xlog_receive_location或者pg_last_xlog_replay_location來獲取。
#新master
postgres=# select pg_current_xlog_location();
pg_current_xlog_location
--------------------------
0/50227C8
(1 row)
#新slave
postgres=# select pg_last_xlog_receive_location();
pg_last_xlog_receive_location
-------------------------------
0/50227C8
(1 row)
postgres=# select pg_last_xlog_replay_location();
pg_last_xlog_replay_location
------------------------------
0/50227C8
(1 row)