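I. Starting Heartbeat
1. Starting Heartbeat on the primary node
After Heartbeat is installed, a startup script named heartbeat is created automatically in the /etc/init.d directory. Running /etc/init.d/heartbeat with no arguments prints the script's usage, as shown below: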
[root@node1 ~]# /etc/init.d/heartbeat
Usage: /etc/init.d/heartbeat {start|stop|status|restart|reload|force-reload}
Heartbeat can therefore be started with the following command:
[root@node1 ~]# service heartbeat start
or with:
[root@node1 ~]# /etc/init.d/heartbeat start
This starts the heartbeat service on the primary node.
The log output is as follows:
Feb 5 19:09:48 node1 heartbeat: [22768]: info: glib: ucast: bound send socket to device: eth0
Feb 5 19:09:48 node1 heartbeat: [22768]: info: glib: ucast: bound receive socket to device: eth0
Feb 5 19:09:48 node1 heartbeat: [22768]: info: glib: ucast: started on port 694 interface eth0 to 192.168.12.1
Feb 5 19:09:48 node1 heartbeat: [22768]: info: glib: ping heartbeat started.
Feb 5 19:09:48 node1 heartbeat: [22768]: info: glib: ping group heartbeat started.
Feb 5 19:09:48 node1 heartbeat: [22768]: info: Local status now set to: 'up'
Feb 5 19:09:49 node1 heartbeat: [22768]: info: Link 192.168.12.1:192.168.12.1 up.
Feb 5 19:09:49 node1 heartbeat: [22768]: info: Status update for node 192.168.12.1: status ping
Feb 5 19:09:49 node1 heartbeat: [22768]: info: Link group1:group1 up.
Feb 5 19:09:49 node1 heartbeat: [22768]: info: Status update for node group1: status ping
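In this part of the log, Heartbeat is initializing its configuration, for example the heartbeat interval, the UDP port used for heartbeat traffic, and the state of the ping nodes. The log then pauses for 120 seconds before Heartbeat continues its output. These 120 seconds correspond exactly to the value of the "initdead" option in ha.cf. Heartbeat then outputs the following: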
Feb 5 19:11:48 node1 heartbeat: [22768]: WARN: node node2: is dead
Feb 5 19:11:48 node1 heartbeat: [22768]: info: Comm_now_up(): updating status to active
Feb 5 19:11:48 node1 heartbeat: [22768]: info: Local status now set to: 'active'
Feb 5 19:11:48 node1 heartbeat: [22768]: info: Starting child client "/usr/local/ha/lib/heartbeat/pingd -m 100 -d 5s" (102,105)
Feb 5 19:11:49 node1 heartbeat: [22768]: WARN: No STONITH device configured.
Feb 5 19:11:49 node1 heartbeat: [22768]: WARN: Shared disks are not protected.
Feb 5 19:11:49 node1 heartbeat: [22768]: info: Resources being acquired from node2.
Feb 5 19:11:49 node1 heartbeat: [22794]: info: Starting "/usr/local/ha/lib/heartbeat/pingd -m 100 -d 5s" as uid 102 gid 105 (pid 22794)
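In this part of the log, node2 has not been started yet, so Heartbeat issues the warning "node node2: is dead". It then starts the Heartbeat plugin pingd. Because STONITH is not configured in ha.cf, the log also contains the warning "No STONITH device configured".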
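The 120-second pause seen in the log is governed by the initdead directive in ha.cf. As a minimal sketch for checking which value a node is configured with, the shell function below prints the value of a directive from an ha.cf-style file. The function name ha_directive is hypothetical, and the ha.cf path in the usage note is only an assumption based on the /usr/local/ha/etc/ha.d prefix that appears in the logs.

```shell
# Print the value of directive $1 from the ha.cf-style file $2.
# Commented-out lines are ignored because their first field starts with "#".
# If the directive appears more than once, the last value found is printed.
ha_directive() {
    awk -v key="$1" '$1 == key { val = $2 } END { if (val != "") print val }' "$2"
}
```

Usage, with an assumed file location: ha_directive initdead /usr/local/ha/etc/ha.d/ha.cf
For the cluster described here this would be expected to print 120.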
The log continues as follows:
Feb 5 19:11:50 node1 IPaddr[22966]: INFO: Resource is stopped
Feb 5 19:11:50 node1 ResourceManager[22938]: info: Running /usr/local/ha/etc/ha.d/resource.d/IPaddr 192.168.12.135 start
Feb 5 19:11:50 node1 IPaddr[23029]: INFO: Using calculated nic for 192.168.12.135: eth0
Feb 5 19:11:50 node1 IPaddr[23029]: INFO: Using calculated netmask for 192.168.12.135: 255.255.255.0
Feb 5 19:11:51 node1 pingd: [22794]: info: attrd_lazy_update: Connecting to cluster... 5 retries remaining
Feb 5 19:11:51 node1 IPaddr[23029]: INFO: eval ifconfig eth0:0 192.168.12.135 netmask 255.255.255.0 broadcast 192.168.12.255
Feb 5 19:11:51 node1 avahi-daemon[2455]: Registering new address record for 192.168.12.135 on eth0.
Feb 5 19:11:51 node1 IPaddr[23015]: INFO: Success
Feb 5 19:11:51 node1 Filesystem[23134]: INFO: Resource is stopped
Feb 5 19:11:51 node1 ResourceManager[22938]: info: Running /usr/local/ha/etc/ha.d/resource.d/Filesystem /dev/sdf1 /data1 ext3 start
Feb 5 19:11:52 node1 Filesystem[23213]: INFO: Running start for /dev/sdf1 on /data1
Feb 5 19:11:52 node1 kernel: kjournald starting. Commit interval 5 seconds
Feb 5 19:11:52 node1 kernel: EXT3 FS on sdf1, internal journal
Feb 5 19:11:52 node1 kernel: EXT3-fs: mounted filesystem with ordered data mode.
Feb 5 19:11:52 node1 Filesystem[23205]: INFO: Success
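This part of the log shows resources being checked and acquired according to the settings in the haresources file. Here that means bringing up the cluster virtual IP and mounting the disk partition.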
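To make the link between haresources and the "Running ... start" log lines concrete, here is a rough sketch that expands one haresources-style line into the resource script invocations it implies. The sample line in the usage note is reconstructed from the log above and may not match the actual file, and the function name expand_haresources is hypothetical. The sketch does not handle the shorthand in which a bare IP address stands for an IPaddr resource.

```shell
# Expand one haresources-style line into resource.d script invocations.
# $1 = line of the form "<node> <script>::<arg>::<arg> ..."
# $2 = action to append (defaults to "start")
expand_haresources() {
    local line="$1" action="${2:-start}" res
    # Drop the first word (the preferred node); the rest are resources.
    for res in ${line#* }; do
        # "::" separates the script name from each of its arguments.
        printf '%s %s\n' "$(printf '%s' "$res" | sed 's/::/ /g')" "$action"
    done
}
```

Usage: expand_haresources 'node1 IPaddr::192.168.12.135 Filesystem::/dev/sdf1::/data1::ext3'
This prints "IPaddr 192.168.12.135 start" and "Filesystem /dev/sdf1 /data1 ext3 start", which are the arguments seen in the two ResourceManager log lines.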
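At this point, running ifconfig on the primary node shows that it has automatically bound the cluster IP address. Pinging the cluster IP address 192.168.12.135 from a host outside the HA cluster confirms that the address is reachable, that is, it is now usable.
Checking the mounted partitions at the same time shows that the shared disk partition /dev/sdf1 has been mounted automatically.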
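The mount check can also be scripted instead of read by eye. The following is a small sketch, assuming mount information is supplied on standard input in the /proc/mounts layout (device in the first field, mount point in the second). The function name is_mounted is hypothetical.

```shell
# Succeed if device $1 is mounted on mount point $2.
# Reads /proc/mounts-style lines from standard input.
is_mounted() {
    awk -v dev="$1" -v mnt="$2" \
        '$1 == dev && $2 == mnt { found = 1 } END { exit !found }'
}
```

Usage on the node that should hold the resources: is_mounted /dev/sdf1 /data1 < /proc/mounts && echo "shared partition is mounted"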
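2. Starting Heartbeat on the standby node
Heartbeat is started on the standby node in the same way as on the primary node, with the following command:
[root@node2 ~]# /etc/init.d/heartbeat start
or by running:
[root@node2 ~]# service heartbeat start
This starts the heartbeat service on the standby node. Its log output mirrors that of the primary node, and "tail -f /var/log/messages" shows the following: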
Feb 19 02:52:15 node2 heartbeat: [26880]: info: Pacemaker support: false
Feb 19 02:52:15 node2 heartbeat: [26880]: info: **************************
Feb 19 02:52:15 node2 heartbeat: [26880]: info: Configuration validated. Starting heartbeat 3.0.4
Feb 19 02:52:15 node2 heartbeat: [26881]: info: heartbeat: version 3.0.4
Feb 19 02:52:15 node2 heartbeat: [26881]: info: Heartbeat generation: 1297766398
Feb 19 02:52:15 node2 heartbeat: [26881]: info: glib: UDP multicast heartbeat started for group 225.0.0.1 port 694 interface eth0 (ttl=1 loop=0)
Feb 19 02:52:15 node2 heartbeat: [26881]: info: glib: ucast: write socket priority set to IPTOS_LOWDELAY on eth0
Feb 19 02:52:15 node2 heartbeat: [26881]: info: glib: ucast: bound send socket to device: eth0
Feb 19 02:52:15 node2 heartbeat: [26881]: info: glib: ping heartbeat started.
Feb 19 02:52:15 node2 heartbeat: [26881]: info: glib: ping group heartbeat started.
Feb 19 02:52:15 node2 heartbeat: [26881]: info: Local status now set to: 'up'
Feb 19 02:52:16 node2 heartbeat: [26881]: info: Link node1:eth0 up.
Feb 19 02:52:16 node2 heartbeat: [26881]: info: Status update for node node1: status active
Feb 19 02:52:16 node2 heartbeat: [26881]: info: Link 192.168.12.1:192.168.12.1 up.
Feb 19 02:52:16 node2 heartbeat: [26881]: info: Status update for node 192.168.12.1: status ping
Feb 19 02:52:16 node2 heartbeat: [26881]: info: Link group1:group1 up.
Feb 19 02:52:16 node2 harc[26894]: info: Running /usr/local/ha/etc/ha.d//rc.d/status status
Feb 19 02:52:17 node2 heartbeat: [26881]: info: Comm_now_up(): updating status to active
Feb 19 02:52:17 node2 heartbeat: [26881]: info: Local status now set to: 'active'
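II. Testing Heartbeat's high-availability functions
How can you tell whether the HA cluster is working properly? Testing in a simulated environment is a good approach. Before a Heartbeat high-availability cluster goes into production, the following tests should be run to confirm that HA works as expected:
(1) Stopping and restarting heartbeat on the primary node normally
First, run "service heartbeat stop" on the primary node node1 to shut down its Heartbeat process cleanly. Checking the primary node's network interfaces with ifconfig should then show that it has released the cluster service IP address and unmounted the shared disk partition. On the standby node, the cluster service IP has now been taken over and the shared disk partition has been mounted automatically.
During this process, pinging the cluster service IP shows that it stays reachable throughout, with no delay or blocking. In other words, when the primary node is shut down normally, the switchover between primary and standby is seamless, and the services provided by the HA cluster keep running without interruption.
Next, start heartbeat on the primary node normally. Once it is running, the standby node automatically releases the cluster service IP and unmounts the shared disk partition, while the primary node takes over the cluster service IP and mounts the shared disk partition again. The standby node releasing the resources and the primary node binding them in fact happen in step with each other, so this is also a seamless switchover.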
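To put a number on "seamless" during the stop and restart test, the cluster IP can be probed repeatedly from a host outside the cluster while the switchover happens. This is a minimal sketch: the helper name count_failures is hypothetical, and the probe command is passed in as arguments so that any check can be used.

```shell
# Run a probe command $1 times and print how many of the runs failed.
# $1 = number of probes, remaining arguments = the probe command to run
count_failures() {
    local n="$1" failures=0 i=0
    shift
    while [ "$i" -lt "$n" ]; do
        # A non-zero exit status from the probe counts as one failure.
        "$@" >/dev/null 2>&1 || failures=$((failures + 1))
        i=$((i + 1))
    done
    echo "$failures"
}
```

Usage from an outside host, started just before stopping heartbeat on node1: count_failures 60 ping -c 1 -W 1 192.168.12.135
A result of 0 would support the observation that the cluster IP stayed reachable during the switchover.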
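(2) Unplugging the network cable on the primary node
After the cable connecting the primary node to the public network is unplugged, the Heartbeat plugin ipfail detects the loss of connectivity immediately through its ping test and then releases the resources automatically. At the same moment, the ipfail plugin on the standby node also detects the network failure on the primary node. Once the primary node has finished releasing its resources, the standby node takes over the cluster resources right away, so the network service keeps running without interruption.
Likewise, when the primary node's network recovers, the cluster resources automatically switch back from the standby node to the primary node, because the "auto_failback on" option is set.
(3) Shutting down the primary node's system
After power is removed from the primary node, the heartbeat process on the standby node is immediately notified that the primary node has shut down and begins taking over the resources. This case behaves much like a network failure on the primary node.
(4) Crashing the primary node's kernel
When the primary node's system crashes, its network stops responding as well, so the heartbeat process on the standby node immediately detects a network failure on the primary node and starts the resource switchover. Because the kernel has crashed, however, the primary node cannot release the resources it holds, such as the shared disk partition and the cluster service IP. Without something like a Stonith device, this leads to contention for those resources. With a Stonith device, the device first powers off or reboots the failed primary node, which forces it to release the cluster resources. Only after the Stonith device has completed all of its operations does the standby node obtain ownership of the primary node's resources and take them over.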
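Whether a node is running without this protection can be read from its own log, because Heartbeat prints the "No STONITH device configured" warning at startup, as in the primary node's log in Part I. Below is a minimal sketch, assuming the log lines are fed on standard input. The function name stonith_missing is hypothetical.

```shell
# Succeed if the log text on standard input contains Heartbeat's
# warning that no STONITH device is configured.
stonith_missing() {
    grep -q 'WARN: No STONITH device configured'
}
```

Usage: stonith_missing < /var/log/messages && echo "no STONITH device: a kernel crash on the primary node may lead to resource contention"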
(End)