First of all, thanks go to Jiu Ge ("酒哥") and his book Building Highly Available Linux Servers ("構建高可用的Linux服務器"); reading it and working through its example configurations made the DRBD + HeartBeat + NFS design much clearer to me.
DRBD is, in short, a network RAID-1. A cluster has two or more nodes; each node maps a local disk block device onto a local DRBD device, and the DRBD devices then keep each other synchronized over the network.
Heartbeat adds availability on top of DRBD: when a node fails, it automatically switches the DRBD device over to the backup node, re-binds the virtual IP, promotes the DRBD device to Primary, mounts the filesystem, and runs the NFS start script and other resource scripts. Since all of this happens among the back-end nodes while clients only ever talk to heartbeat's virtual IP, the failover is invisible to users.
One last gripe: installing via yum turned out to be a real trap here; from now on, unless it is unavoidable, I will install from source packages instead.
OS version: CentOS 6.3 x64 (kernel 2.6.32)
DRBD: DRBD-8.4.3
HeartBeat: from the EPEL repository (a real trap, as described below)
NFS: shipped with the distribution
HeartBeat VIP: 192.168.7.90
node1 DRBD+HeartBeat: 192.168.7.88 (drbd1.example.com)
node2 DRBD+HeartBeat: 192.168.7.89 (drbd2.example.com)
Steps marked (node1) apply to the primary node only.
Steps marked (node2) apply to the secondary node only.
Steps marked (node1,node2) apply to both nodes.
I. DRBD configuration, see: http://showerlee.blog.51cto.com/2047005/1211963
II. HeartBeat configuration
This continues from the DRBD system environment and installation above:
1. Install heartbeat (CentOS 6.3 does not ship a Heartbeat package, so it has to be downloaded from a third-party repository) (node1,node2)
# wget ftp://mirror.switch.ch/pool/1/mirror/scientificlinux/6rolling/i386/os/Packages/epel-release-6-5.noarch.rpm
# rpm -ivh epel-release-6-5.noarch.rpm
# yum --enablerepo=epel install heartbeat -y
2. Configure heartbeat
(node1)
# vi /etc/ha.d/ha.cf
---------------
# Logging
logfile /var/log/ha-log
logfacility local0
# Heartbeat interval, in seconds
keepalive 2
# Declare the peer dead after this many seconds without a heartbeat
deadtime 5
# Unicast heartbeat to the peer node's IP
ucast eth0 192.168.7.89
# When the old primary recovers, do not take resources back from the new primary
auto_failback off
# Cluster nodes
node drbd1.example.com drbd2.example.com
---------------
(node2)
# vi /etc/ha.d/ha.cf
---------------
# Logging
logfile /var/log/ha-log
logfacility local0
# Heartbeat interval, in seconds
keepalive 2
# Declare the peer dead after this many seconds without a heartbeat
deadtime 5
# Unicast heartbeat to the peer node's IP
ucast eth0 192.168.7.88
# When the old primary recovers, do not take resources back from the new primary
auto_failback off
# Cluster nodes
node drbd1.example.com drbd2.example.com
---------------
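With a single ucast path, losing eth0 alone is enough to split the cluster. Heartbeat supports additional communication paths; a hedged fragment for either node's ha.cf (the second NIC, its addressing, and the serial port are assumptions, not part of this setup):

```
# Assumed extra heartbeat paths; adjust device names/IPs to your hosts.
ucast eth1 10.0.0.89     # hypothetical dedicated heartbeat NIC
# serial /dev/ttyS0      # or a null-modem serial link
```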
Edit the two-node mutual authentication file: (node1,node2)
# vi /etc/ha.d/authkeys
--------------
auth 1
1 crc
--------------
# chmod 600 /etc/ha.d/authkeys
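Note that crc only provides integrity checking, not real authentication; if the heartbeat link is not a trusted dedicated cable, heartbeat also supports sha1/md5 keys. A sketch (the shared secret is a placeholder):

```
auth 2
2 sha1 SomeSharedSecret
```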
Edit the cluster resources file: (node1,node2)
# vi /etc/ha.d/haresources
--------------
drbd1.example.com IPaddr::192.168.7.90/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 killnfsd
--------------
Note: the IPaddr, Filesystem and other scripts referenced in this file live under /etc/ha.d/resource.d/. You can also place service start scripts of your own (e.g. mysql, www) in that directory and add the script name to /etc/ha.d/haresources, so that heartbeat starts them along with the other resources.
IPaddr::192.168.7.90/24/eth0: configures the floating VIP with the IPaddr script
drbddisk::r0: switches the DRBD resource r0 between Primary and Secondary with the drbddisk script
Filesystem::/dev/drbd0::/data::ext4: mounts and unmounts /dev/drbd0 on /data (ext4) with the Filesystem script
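To illustrate how heartbeat turns one resource entry from haresources into a resource-script invocation, here is a minimal parsing sketch (the dispatch itself is simplified; in haresources mode heartbeat appends start or stop as the final argument):

```shell
# Split one haresources entry: the token before the first "::" is the
# script name under /etc/ha.d/resource.d/, the rest are its arguments.
entry='Filesystem::/dev/drbd0::/data::ext4'
script=${entry%%::*}                                  # -> Filesystem
args=$(printf '%s' "${entry#*::}" | sed 's/::/ /g')   # -> /dev/drbd0 /data ext4
# heartbeat would effectively run:
echo "/etc/ha.d/resource.d/$script $args start"
```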
Create the killnfsd script, which restarts the NFS service:
Note: after the NFS service fails over, nfsd has to be restarted, otherwise clients accessing the exported directory get errors (still to be verified).
# vi /etc/ha.d/resource.d/killnfsd
-----------------
killall -9 nfsd; /etc/init.d/nfs restart;exit 0
-----------------
Grant execute permission:
# chmod 755 /etc/ha.d/resource.d/killnfsd
Create the drbddisk script: (node1,node2)
Note:
This is another big trap. If you are not familiar with Heartbeat's directory layout, you can get stuck here for a very long time: a default yum install of Heartbeat does not create the drbddisk script in /etc/ha.d/resource.d/, and after installation the file cannot be found anywhere else on the system either.
In my case the virtual IP was unreachable after starting Heartbeat; /var/log/ha-log contained the line
ERROR: Cannot locate resource script drbddisk
and indeed there was no drbddisk script under /etc/ha.d/resource.d/. I eventually found the code on Google, created the script, and the tests finally passed:
# vi /etc/ha.d/resource.d/drbddisk
-----------------------
#!/bin/bash
#
# This script is intended to be used as a resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###
DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
  . $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
  RES="$1"
  CMD="$2"
else
  RES="all"
  CMD="$1"
fi

## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
# http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
# http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####

drbd_set_role_from_proc_drbd()
{
  local out
  if ! test -e /proc/drbd; then
    ROLE="Unconfigured"
    return
  fi

  dev=$( $DRBDADM sh-dev $RES )
  minor=${dev#/dev/drbd}
  if [[ $minor = *[!0-9]* ]] ; then
    # sh-minor is only supported since drbd 8.3.1
    minor=$( $DRBDADM sh-minor $RES )
  fi
  if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
    ROLE=Unknown
    return
  fi

  if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
    set -- $out
    ROLE=${5%/**}
    : ${ROLE:=Unconfigured} # if it does not show up
  else
    ROLE=Unknown
  fi
}

case "$CMD" in
  start)
    # try several times, in case heartbeat deadtime
    # was smaller than drbd ping time
    try=6
    while true; do
      $DRBDADM primary $RES && break
      let "--try" || exit 1 # LSB generic error
      sleep 1
    done
    ;;
  stop)
    # heartbeat (haresources mode) will retry failed stop
    # for a number of times in addition to this internal retry.
    try=3
    while true; do
      $DRBDADM secondary $RES && break
      # We used to lie here, and pretend success for anything != 11,
      # to avoid the reboot on failed stop recovery for "simple
      # config errors" and such. But that is incorrect.
      # Don't lie to your cluster manager.
      # And don't do config errors...
      let --try || exit 1 # LSB generic error
      sleep 1
    done
    ;;
  status)
    if [ "$RES" = "all" ]; then
      echo "A resource name is required for status inquiries."
      exit 10
    fi
    ST=$( $DRBDADM role $RES )
    ROLE=${ST%/**}
    case $ROLE in
      Primary|Secondary|Unconfigured)
        # expected
        ;;
      *)
        # unexpected. whatever...
        # If we are unsure about the state of a resource, we need to
        # report it as possibly running, so heartbeat can, after failed
        # stop, do a recovery by reboot.
        # drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
        # suddenly readonly. So we retry by parsing /proc/drbd.
        drbd_set_role_from_proc_drbd
    esac
    case $ROLE in
      Primary)
        echo "running (Primary)"
        exit 0 # LSB status "service is OK"
        ;;
      Secondary|Unconfigured)
        echo "stopped ($ROLE)"
        exit 3 # LSB status "service is not running"
        ;;
      *)
        # NOTE the "running" in below message.
        # this is a "heartbeat" resource script,
        # the exit code is _ignored_.
        echo "cannot determine status, may be running ($ROLE)"
        exit 4 # LSB status "service status is unknown"
        ;;
    esac
    ;;
  *)
    echo "Usage: drbddisk [resource] {start|stop|status}"
    exit 1
    ;;
esac

exit 0
-----------------------
Grant execute permission:
# chmod 755 /etc/ha.d/resource.d/drbddisk
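To see what the status path of drbddisk does when it falls back to parsing /proc/drbd, here is a self-contained sketch of the same sed and word-splitting logic, run against a sample status line (the sample is illustrative, not taken from a live node):

```shell
# One /proc/drbd device line, as seen on a connected primary (illustrative).
sample=' 0: cs:Connected ro:Primary/Secondary ds:UpToDate/UpToDate C r-----'
minor=0
# Same extraction as drbd_set_role_from_proc_drbd: turn ":" into spaces,
# word-split, take field 5 ("Primary/Secondary"), strip the peer's role.
out=$(printf '%s\n' "$sample" | sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }")
set -- $out
role=${5%/*}
echo "$role"   # -> Primary
```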
Start the HeartBeat service on both nodes, node1 first: (node1,node2)
# service heartbeat start
# chkconfig heartbeat on
If the virtual IP 192.168.7.90 is now pingable, the configuration is working.
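Beyond ping, you can check which node currently holds the VIP: heartbeat's IPaddr script adds it as an additional address on the chosen interface. Illustrative output on the active node (the exact label, eth0 vs eth0:0, depends on the IPaddr version):

```
# ip addr show eth0 | grep 192.168.7.90
    inet 192.168.7.90/24 brd 192.168.7.255 scope global secondary eth0:0
```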
III. Configure NFS: (node1,node2)
# vi /etc/exports
-----------------
/data *(rw,no_root_squash)
-----------------
Restart the NFS services:
# service rpcbind restart
# service nfs restart
# chkconfig rpcbind on
# chkconfig nfs off
NFS is deliberately not started at boot here, because the /etc/ha.d/resource.d/killnfsd script takes care of starting NFS.
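To confirm the export is active on whichever node currently runs NFS, exportfs can list it. Illustrative output for this configuration (the exact option list depends on your distribution's defaults):

```
# exportfs -v
/data         <world>(rw,wdelay,no_root_squash)
```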
IV. Final tests
Mount the virtual IP 192.168.7.90 from another Linux client; a successful mount means NFS + DRBD + HeartBeat is complete.
# mount -t nfs 192.168.7.90:/data /tmp
# df -h
---------------
......
192.168.7.90:/data 1020M 34M 934M 4% /tmp
---------------
Testing DRBD + HeartBeat + NFS availability:
1. Copy files into the mounted /tmp directory, abruptly restart the DRBD service on the primary, and watch what happens.
In my test the transfer resumed from where it left off.
2. Reboot the Primary host normally and check whether its DRBD comes back as Primary, whether clients can mount it again, whether previously written files are still there, and whether new files can be written.
In my test it recovered normally; the client did not even need to remount the NFS share, the existing data was intact, and new files could be written directly.
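The reason the client survives without remounting is NFS retry semantics: with a hard mount (the default), in-flight requests are retransmitted until the server, now answering on the other node behind the same VIP, responds again. A hedged client-side example with the relevant options spelled out (the timeout values are illustrative, not tuned):

```
# "hard" retries indefinitely across the failover; timeo/retrans control
# how aggressively the client retransmits (tenths of a second / attempts).
# mount -t nfs -o hard,intr,timeo=50,retrans=5 192.168.7.90:/data /tmp
```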
3. When the Primary host has to be shut down for hardware repair or other reasons, how do you manually promote the Secondary to Primary?
If the machine can still boot normally, follow the steps below; if it cannot, forcibly promote the Secondary to Primary, and once the failed machine boots again, repair any split-brain as a follow-up step.
First unmount the NFS share on the client:
# umount /tmp
(node1)
Unmount the DRBD device:
# service nfs stop
# umount /data
Demote:
# drbdadm secondary r0
Check the status; it is now Secondary:
# service drbd status
-----------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:45:19
m:res cs ro ds p mounted fstype
0:r0 Connected Secondary/Secondary UpToDate/UpToDate C
-----------------
(node2)
Promote:
# drbdadm primary r0
Check the status; it is now Primary:
# service drbd status
----------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C
----------------
The DRBD directory is not mounted yet; let Heartbeat mount it:
Note: if the Heartbeat log shows the following error during the restart:
ERROR: glib: ucast: error binding socket. Retrying: Permission denied
check whether SELinux is disabled.
# service heartbeat restart
# service drbd status
-----------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext4
------------------------
HeartBeat mounted the DRBD directory successfully.
Repeat the NFS mount test on the client:
# mount -t nfs 192.168.7.90:/data /tmp
# ll /tmp
------------------
1 10 2 2222 3 4 5 6 7 8 9 lost+found orbit-root
------------------
Reboot the host that was just promoted, and check the status once it is back up:
# service drbd status
------------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 WFConnection Primary/Unknown UpToDate/DUnknown C /data ext4
------------------------
HeartBeat mounted the DRBD directory successfully, DRBD carried on seamlessly on the backup node, and the NFS client using the mount point noticed nothing of the failure.
4. After the host that crashed earlier comes back up, will it try to take the Primary resources back?
No: after the reboot it does not reclaim the resources; the roles have to be switched back manually.
Note: this is governed by the following parameter in /etc/ha.d/ha.cf:
--------------------
auto_failback off
--------------------
It means that once the failed server is healthy again, the new primary keeps the resources and the old server gives them up.
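For comparison, the opposite behaviour, where a recovered node1 automatically pulls the resources back, would be:

```
auto_failback on
```

That costs one extra switchover per failure, which is why off is usually preferred for NFS.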
5. None of the above exercised heartbeat's automatic failover. If the DRBD primary node dies in production, will the backup node take over immediately and seamlessly, and do heartbeat + drbd really deliver high availability?
First mount the NFS share on a client:
# mount -t nfs 192.168.7.90:/data /tmp
a. Simulate stopping the heartbeat service on the primary node node1: does the backup node node2 take over?
(node1)
# service drbd status
----------------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:45:19
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext4
----------------------------
# service heartbeat stop
(node2)
# service drbd status
----------------------------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext4
-----------------------------------------
The standby took over seamlessly; check whether the client can still use the NFS share:
# cd /tmp
# touch test01
# ls test01
------------------
test01
------------------
Test passed.
b. Simulate a crash of the primary (forced power-off): does the backup node node2 take over?
(node1)
Force a power-off by cutting the node1 virtual machine's power directly.
(node2)
# service drbd status
-------------------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 WFConnection Primary/Unknown UpToDate/DUnknown C /data ext4
-------------------------------
The standby took over seamlessly; check whether the client can still use the NFS share:
# cd /tmp
# touch test02
# ls test02
------------------
test02
------------------
Once node1 is back up, check the DRBD status:
# service drbd status
------------------------------
drbd driver loaded OK; device status:
version: 8.4.3 (api:1/proto:86-101)
GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06
m:res cs ro ds p mounted fstype
0:r0 Connected Primary/Secondary UpToDate/UpToDate C /data ext4
-------------------------------
node1 is connected and UpToDate again; test passed.
Note: there is a chance that node2 fails to take over when heartbeat is stopped on node1, so this setup carries some maintenance cost. I do not run this service in production; before going live, rehearse the failure scenarios thoroughly and only then deploy.
-------All done----------
This article comes from the "一路向北" blog; please keep this attribution: http://showerlee.blog.51cto.com/2047005/1212185