MHA(Master HA)是一款開源的MySQL的高可用程序,它為MySQL主從復制架構提供了
automating master failover 功能。MHA在監控到master節點故障時,會提升其中
擁有最新數據的slave節點來成為新的master節點。在此期間,MHA會通過於其它從節點
獲取額外信息來避免一致性方面的問題。MHA還提供了master節點的在線切換功能,即按
需切換master/slave節點。
MHA服務有兩咱角色。MHA Mangager(管理節點)和MHA Node(數據節點):
MHA Manager:通常單獨部署在一台獨立機器上管理多個master/slave集群,每個
master/slave集群稱作一個application:
MHA node:運行在每台MySQL服務器之上(master/slave/manager),它通過監控
具務解析和清理logs功能來腳本來加快故障轉移。
MHA會提供諸多工具程序,其常見的如下所示。
Manager節點:
masterha_check_ssh:MHA依賴的SSH環境檢測工具;
masterha_check_repl:MySQL復制環境檢測工具;
masterha_mamager:MHA服務主程序;
masterha_check_status:MHA運行狀態探測工具;
masterha_master_monitor:MySQL master節點可用性監測工具;
masterha_master_swith:master節點切換工具;
masterha_conf_host:添加或刪除配置的節點;
masterha_stop:斗閉MHA服務的工具 ;
Node 節點:
save_binary_logs:保存和復制Master的二進制日志 ;
apply_diff_replay_logs:識別差異的中繼日志事件並應用於其它slave;
filter_mysqlbinlog:去除不必要的ROLLBACK事件(MHA已不再使用這個工具);
purge_replay_logs:清除中繼日志(不會阻塞SQL線程);
自定義擴展:
secondary_check_script:通過多條網絡路由檢測master的可用性;
master_ip_failover_scipt:更新application使用的masterip;
shutdown_script:強制關閉master節點;
report_scipt:發送報告;
init_conf_load_script:加載初始配置參數;
master_ip_online_change_scipt:更新master節點ip地址;
MHA對MySQL復制環境有特殊要求,例如各節點都要開啟二進制日志及中繼日志,各從節點
必須啟用其read-only skip_name_resolve innodb_file_per_table=ON 屬性,並關閉relay_log_purge
功能等,還有體集事物所必需的同步;同步主機:
#for i in {0..3} ; do ssh node$i ntpdate 172.16.0.1 ; done
本實驗境共有四個節點,其中角色分配如下;
node0 :MHA Manager(172.16.23.10); node1 :MariaDB master node2 :MariaDB slave node3 :MariaDB slave
各節點的/etc/hosts文件配置內容添加如下內容;
172.16.23.10 node0 node0.rj.com 172.16.23.11 node1 node1.rj.com 172.16.23.12 node2 node2.rj.com 172.16.23.13 node3 node3.rj.com
初始主節點的master配置; node1 # vim /etc/my.cnf innodb_file_per_table=ON 此項內容與下面一項內容一般在服務器啟動時便加上 skip_name_resolve=ON server_id=1 relay_log=relay-bin log-bin=log-bin symbolic-links=0 所有slave節點依賴的配置; node(2,3)# vim /etc/my.cnf innodb_file_per_table=ON skip_name_resolve=ON server_id=2 #注 此處的結點到了別的節點中一定要改,各節點的id必需唯一; relay-log=relay-bin log-bin=master-bin relay-log-purge=OFF read-only=ON
按上述要求分別配置好主從節點之後,按MySQL復制配置架構的配置方式將其配置完成並吂動
master節點和各slave節點,以及各slave節點啟動其IO和SQL線程,確保主從復制運行無誤
node1 mysql> GRANT REPLICATION SLAVE,REPLICATION CLIENT ON *.* TO 'rj'@'172.16.%.%' IDENTIFIED BY 'centos.123' ; node(2,3) mysql> CHANGE MASTER TO MASTER_HOST='172.16.23.11',MASTER_USER='rj',MASTER_PASSWORD='centos.123',MASTER_LOG_FILE='log-bin.000001',MASTER_LOG_POS=492; mysql> START SLAVE; mysql> SHOW SLAVE STATUS; Slave_IO_Running: Yes Slave_SQL_Running: Yes 以下兩項是否為yes
同時,也可以在主結點上創建個庫或表,查看其是否可台實現主從同步;
而後,在所有MySQL節點授權擁有管理權限的用戶可在本地網絡中有其它節點上遠程方問。
當然,此時僅需要且只能在master節點運行類似如下SQL語句即可。
mysql> GRANT ALL ON *.* to 'rjyy'@'172.16.%.%' IDENTIFIED BY 'centos.123';
三、安裝配置MHA
准備基於ssh互信通信環境
MHA集群中的各節點彼此之間均需基於ssh互信通信,以實現遠程控制及數據管理功能。簡單起見,可在Manager
節點生成密鑰對兒,並設置其可遠程連接本地主機後,將私鑰文件及authorized_keys文件復制給余下的所有結點即可。
下面的的操作在manager(node0)節點操作完成。
# for i in {0..3} ; do scp -p .ssh/id_rsa .ssh/authorized_keys node$i:/root/.ssh/; done
安裝MHA
node0 # yum install mha4mysql-manager-0.56-0.el6.noarch.rpm mha4mysql-node-0.56-0.el6.noarch.rpm # for i in {1..3}; do scp mha4mysql-node-0.56-0.el6.noarch.rpm node$i:/root/ ; done node(1,2,3) # yum -y install mha4mysql-node-0.56-0.el6.noarch.rpm [[email protected] ~]# rpm -ql mha4mysql-manager /usr/bin/masterha_check_repl /usr/bin/masterha_check_ssh /usr/bin/masterha_check_status /usr/bin/masterha_conf_host /usr/bin/masterha_manager /usr/bin/masterha_master_monitor /usr/bin/masterha_master_switch /usr/bin/masterha_secondary_check /usr/bin/masterha_stop 以上為可執行文件,就是上面所列出的命令 /usr/share/man/man1/masterha_check_repl.1.gz /usr/share/man/man1/masterha_check_ssh.1.gz /usr/share/man/man1/masterha_check_status.1.gz /usr/share/man/man1/masterha_conf_host.1.gz /usr/share/man/man1/masterha_manager.1.gz /usr/share/man/man1/masterha_master_monitor.1.gz /usr/share/man/man1/masterha_master_switch.1.gz /usr/share/man/man1/masterha_secondary_check.1.gz /usr/share/man/man1/masterha_stop.1.gz /usr/share/perl5/vendor_perl/MHA/Config.pm /usr/share/perl5/vendor_perl/MHA/DBHelper.pm /usr/share/perl5/vendor_perl/MHA/FileStatus.pm /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm /usr/share/perl5/vendor_perl/MHA/ManagerAdmin.pm /usr/share/perl5/vendor_perl/MHA/ManagerAdminWrapper.pm /usr/share/perl5/vendor_perl/MHA/ManagerConst.pm /usr/share/perl5/vendor_perl/MHA/ManagerUtil.pm /usr/share/perl5/vendor_perl/MHA/MasterFailover.pm /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm /usr/share/perl5/vendor_perl/MHA/MasterRotate.pm /usr/share/perl5/vendor_perl/MHA/SSHCheck.pm /usr/share/perl5/vendor_perl/MHA/Server.pm /usr/share/perl5/vendor_perl/MHA/ServerManager.pm
Manager節點需要為每個監控的master/slave集群提供一個專用的配置文件,而所有的master/salve集群也可共享全局
配置。全局配置文件默認為/etc/masterha_default.cnf,其為可先配置。如果僅監控一組 master/slave集群,也可
直接通過application的配置提供各服務器的默認配置信息。而每個application的配置文件路徑為自定義,本示例將使用
/etc/masterha/appl.cnf
# vim /etc/masterha/appl.cnf [server default] user=rjrj password=centos.123 manager_workdir=/data/masterha/app1 manager_log=/data/masterha/app1/manager.log remote_workdir=/data/masterha/app1 ssh_user=root repl_user=rjrj repl_password=centos.123 ping_interval=1 [server1] hostname=172.16.23.11 candidate_master=1 [server2] hostname=172.16.23.12 candidate_master=1 [server3] hostname=172.16.23.13
[[email protected] ~]# masterha_check_ssh --conf=/etc/masterha/app1.cnf Tue Feb 21 23:13:48 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Tue Feb 21 23:13:48 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Tue Feb 21 23:13:48 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Tue Feb 21 23:13:48 2017 - [info] Starting SSH connection tests.. Tue Feb 21 23:13:49 2017 - [debug] Tue Feb 21 23:13:48 2017 - [debug] Connecting via SSH from [email protected](172.16.23.11:22) to [email protected](172.16.23.12:22).. Tue Feb 21 23:13:48 2017 - [debug] ok. Tue Feb 21 23:13:48 2017 - [debug] Connecting via SSH from [email protected](172.16.23.11:22) to [email protected](172.16.23.13:22).. Tue Feb 21 23:13:48 2017 - [debug] ok. Tue Feb 21 23:13:49 2017 - [debug] Tue Feb 21 23:13:48 2017 - [debug] Connecting via SSH from [email protected](172.16.23.12:22) to [email protected](172.16.23.11:22).. Tue Feb 21 23:13:49 2017 - [debug] ok. Tue Feb 21 23:13:49 2017 - [debug] Connecting via SSH from [email protected](172.16.23.12:22) to [email protected](172.16.23.13:22).. Tue Feb 21 23:13:49 2017 - [debug] ok. Tue Feb 21 23:13:50 2017 - [debug] Tue Feb 21 23:13:49 2017 - [debug] Connecting via SSH from [email protected](172.16.23.13:22) to [email protected](172.16.23.11:22).. Tue Feb 21 23:13:49 2017 - [debug] ok. Tue Feb 21 23:13:49 2017 - [debug] Connecting via SSH from [email protected](172.16.23.13:22) to [email protected](172.16.23.12:22).. Tue Feb 21 23:13:50 2017 - [debug] ok. Tue Feb 21 23:13:50 2017 - [info] All SSH connection tests passed successfully.
最後顯示為successfully表示已經成功了
[[email protected] ~]# masterha_check_repl --conf=/etc/masterha/app1.cnf Wed Feb 22 10:18:40 2017 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping. Wed Feb 22 10:18:40 2017 - [info] Reading application default configuration from /etc/masterha/app1.cnf.. Wed Feb 22 10:18:40 2017 - [info] Reading server configuration from /etc/masterha/app1.cnf.. Wed Feb 22 10:18:40 2017 - [info] MHA::MasterMonitor version 0.56. Wed Feb 22 10:18:41 2017 - [info] GTID failover mode = 0 Wed Feb 22 10:18:41 2017 - [info] Dead Servers: Wed Feb 22 10:18:41 2017 - [info] Alive Servers: Wed Feb 22 10:18:41 2017 - [info] 172.16.23.11(172.16.23.11:3306) Wed Feb 22 10:18:41 2017 - [info] 172.16.23.12(172.16.23.12:3306) Wed Feb 22 10:18:41 2017 - [info] 172.16.23.13(172.16.23.13:3306) Wed Feb 22 10:18:41 2017 - [info] Alive Slaves: Wed Feb 22 10:18:41 2017 - [info] 172.16.23.12(172.16.23.12:3306) Version=5.5.44-MariaDB-log (oldest major version between slaves) log-bin:enabled Wed Feb 22 10:18:41 2017 - [info] Replicating from 172.16.23.11(172.16.23.11:3306) Wed Feb 22 10:18:41 2017 - [info] Primary candidate for the new Master (candidate_master is set) Wed Feb 22 10:18:41 2017 - [info] 172.16.23.13(172.16.23.13:3306) Version=5.5.44-MariaDB-log (oldest major version between slaves) log-bin:enabled Wed Feb 22 10:18:41 2017 - [info] Replicating from 172.16.23.11(172.16.23.11:3306) Wed Feb 22 10:18:41 2017 - [info] Current Alive Master: 172.16.23.11(172.16.23.11:3306) Wed Feb 22 10:18:41 2017 - [info] Checking slave configurations.. Wed Feb 22 10:18:41 2017 - [info] Checking replication filtering settings.. Wed Feb 22 10:18:41 2017 - [info] binlog_do_db= , binlog_ignore_db= Wed Feb 22 10:18:41 2017 - [info] Replication filtering check ok. Wed Feb 22 10:18:41 2017 - [info] GTID (with auto-pos) is not supported Wed Feb 22 10:18:41 2017 - [info] Starting SSH connection tests.. Wed Feb 22 10:18:43 2017 - [info] All SSH connection tests passed successfully. Wed Feb 22 10:18:43 2017 - [info] Checking MHA Node version.. Wed Feb 22 10:18:48 2017 - [info] Version check ok. Wed Feb 22 10:18:48 2017 - [info] Checking SSH publickey authentication settings on the current master.. Wed Feb 22 10:18:48 2017 - [info] HealthCheck: SSH to 172.16.23.11 is reachable. Wed Feb 22 10:18:49 2017 - [info] Master MHA Node version is 0.56. Wed Feb 22 10:18:49 2017 - [info] Checking recovery script configurations on 172.16.23.11(172.16.23.11:3306).. Wed Feb 22 10:18:49 2017 - [info] Executing command: save_binary_logs --command=test --start_pos=4 --binlog_dir=/var/lib/mysql,/var/log/mysql --output_file=/data/masterha/app1/save_binary_logs_test --manager_version=0.56 --start_file=log-bin.000003 Wed Feb 22 10:18:49 2017 - [info] Connecting to [email protected](172.16.23.11:22).. Creating /data/masterha/app1 if not exists.. Creating directory /data/masterha/app1.. done. ok. Checking output directory is accessible or not.. ok. Binlog found at /var/lib/mysql, up to log-bin.000003 Wed Feb 22 10:18:50 2017 - [info] Binlog setting check done. Wed Feb 22 10:18:50 2017 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers.. Wed Feb 22 10:18:50 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='rjyy' --slave_host=172.16.23.12 --slave_ip=172.16.23.12 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx Wed Feb 22 10:18:50 2017 - [info] Connecting to [email protected](172.16.23.12:22).. Creating directory /data/masterha/app1.. done. Checking slave recovery environment settings.. Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to relay-bin.000009 Temporary relay log file is /var/lib/mysql/relay-bin.000009 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Wed Feb 22 10:18:51 2017 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='rjyy' --slave_host=172.16.23.13 --slave_ip=172.16.23.13 --slave_port=3306 --workdir=/data/masterha/app1 --target_version=5.5.44-MariaDB-log --manager_version=0.56 --relay_log_info=/var/lib/mysql/relay-log.info --relay_dir=/var/lib/mysql/ --slave_pass=xxx Wed Feb 22 10:18:51 2017 - [info] Connecting to [email protected](172.16.23.13:22).. Creating directory /data/masterha/app1.. done. Checking slave recovery environment settings.. Opening /var/lib/mysql/relay-log.info ... ok. Relay log found at /var/lib/mysql, up to relay-bin.000007 Temporary relay log file is /var/lib/mysql/relay-bin.000007 Testing mysql connection and privileges.. done. Testing mysqlbinlog output.. done. Cleaning up test file(s).. done. Wed Feb 22 10:18:52 2017 - [info] Slaves settings check done. Wed Feb 22 10:18:52 2017 - [info] 172.16.23.11(172.16.23.11:3306) (current master) +--172.16.23.12(172.16.23.12:3306) +--172.16.23.13(172.16.23.13:3306) Wed Feb 22 10:18:52 2017 - [info] Checking replication health on 172.16.23.12.. Wed Feb 22 10:18:52 2017 - [info] ok. Wed Feb 22 10:18:52 2017 - [info] Checking replication health on 172.16.23.13.. Wed Feb 22 10:18:52 2017 - [info] ok. Wed Feb 22 10:18:52 2017 - [warning] master_ip_failover_script is not defined. Wed Feb 22 10:18:52 2017 - [warning] shutdown_script is not defined. Wed Feb 22 10:18:52 2017 - [info] Got exit code 0 (Not master dead). MySQL Replication Health is OK.
檢查管理的MySQL復制集群的連接配置參數是正常的;
# nohup masterha_manager --conf=/etc/masterha/app1.cnf > /data/master/app1/manager.log 2>&1 &
[[email protected] ~]# systemctl stop mariadb 讓主服務器停掉; [[email protected] ~]# mysql Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 50 Server version: 5.5.44-MariaDB-log MariaDB Server Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> SHOW MASTER STATUS; +-------------------+----------+--------------+------------------+ | File | Position | Binlog_Do_DB | Binlog_Ignore_DB | +-------------------+----------+--------------+------------------+ | master-bin.000005 | 245 | | | +-------------------+----------+--------------+------------------+ 1 row in set (0.00 sec) MariaDB [(none)]> SHOW SLAVE STATUS\G; Empty set (0.00 sec) ERROR: No query specified
[11:12:[email protected]~]#mysql Welcome to the MariaDB monitor. Commands end with ; or \g. Your MariaDB connection id is 33 Server version: 5.5.44-MariaDB-log MariaDB Server Copyright (c) 2000, 2015, Oracle, MariaDB Corporation Ab and others. Type 'help;' or '\h' for help. Type '\c' to clear the current input statement. MariaDB [(none)]> SHOW SLAVE STATUS\G; *************************** 1. row *************************** Slave_IO_State: Waiting for master to send event Master_Host: 172.16.23.12 Master_User: rj Master_Port: 3306 Connect_Retry: 60 Master_Log_File: master-bin.000008 Read_Master_Log_Pos: 245 Relay_Log_File: relay-bin.000002 Relay_Log_Pos: 530 Relay_Master_Log_File: master-bin.000008 Slave_IO_Running: Yes Slave_SQL_Running: Yes Replicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 0 Last_Error: Skip_Counter: 0 Exec_Master_Log_Pos: 245 Relay_Log_Space: 818 Until_Condition: None Until_Log_File: Until_Log_Pos: 0 Master_SSL_Allowed: No Master_SSL_CA_File: Master_SSL_CA_Path: Master_SSL_Cert: Master_SSL_Cipher: Master_SSL_Key: Seconds_Behind_Master: 0 Master_SSL_Verify_Server_Cert: No Last_IO_Errno: 0 Last_IO_Error: Last_SQL_Errno: 0 Last_SQL_Error: Replicate_Ignore_Server_Ids: Master_Server_Id: 2 1 row in set (0.00 sec)