
Notes on DRBD + HeartBeat + NFS configuration under CentOS 6.3

First of all, thanks to Jiuge (酒哥) and his book Building High-Availability Linux Servers; reading it and working through its configuration examples made the whole DRBD + HeartBeat + NFS approach much clearer to me.

Simply put, DRBD is network RAID-1. A cluster has two or more nodes; a disk block device created on each node is mapped to a local DRBD device, and the DRBD devices then keep each node's blocks synchronized with each other over the network.
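For orientation, here is a minimal sketch of what the DRBD resource definition behind this article could look like, using the hostnames and IPs from the environment listed below. The backing partition /dev/sdb1 and port 7789 are assumptions; the real configuration is covered in the DRBD tutorial linked in part I:

-----------------
resource r0 {
  device    /dev/drbd0;
  disk      /dev/sdb1;        # backing partition (assumed; use your own)
  meta-disk internal;
  on drbd1.example.com {
    address 192.168.7.88:7789;
  }
  on drbd2.example.com {
    address 192.168.7.89:7789;
  }
}
-----------------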

Heartbeat adds availability on top of DRBD. When a node fails, it automatically switches the DRBD device over to the backup node, then runs a series of scripted steps: rebinding the virtual IP, promoting the DRBD device to primary, mounting the filesystem, and starting NFS. Since all of this happens between the back-end nodes and front-end users only ever talk to Heartbeat's virtual IP, the failover is invisible to them.

One last gripe: installing with yum turned out to be a real pain here; from now on, unless it is strictly necessary, I will install from source.

System version: CentOS 6.3 x64 (kernel 2.6.32)

DRBD: DRBD-8.4.3

HeartBeat: from the EPEL repository (a real trap)

NFS: shipped with the system

HeartBeat VIP: 192.168.7.90

node1 DRBD+HeartBeat: 192.168.7.88 (drbd1.example.com)

node2 DRBD+HeartBeat: 192.168.7.89 (drbd2.example.com)

(node1) means the step applies to the primary node only

(node2) means the step applies to the secondary node only

(node1,node2) means the step applies to both nodes

I. DRBD configuration, covered here: http://showerlee.blog.51cto.com/2047005/1211963

II. Heartbeat configuration

Continuing from the DRBD environment and installation above:

1. Install Heartbeat (CentOS 6.3 does not ship a Heartbeat package by default, so it has to come from a third-party repository) (node1,node2)

# wget ftp://mirror.switch.ch/pool/1/mirror/scientificlinux/6rolling/i386/os/Packages/epel-release-6-5.noarch.rpm

# rpm -Uvh epel-release-6-5.noarch.rpm

# yum --enablerepo=epel install heartbeat -y
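To confirm the package actually landed and is registered as a service (output omitted here; the EPEL build at the time was a 3.0.x version):

-----------------
# rpm -q heartbeat
# chkconfig --list heartbeat
-----------------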

2. Configure Heartbeat

(node1)

# vi /etc/ha.d/ha.cf

---------------

# log file
logfile         /var/log/ha-log
logfacility     local0
# heartbeat interval in seconds
keepalive       2
# seconds of silence before a node is declared dead
deadtime        5
# unicast heartbeat to the peer node's IP
ucast           eth0 192.168.7.89
# when the failed node recovers, do not fail back automatically;
# the node that took over keeps the resources
auto_failback   off
# cluster nodes (must match `uname -n`)
node            drbd1.example.com drbd2.example.com

---------------

(node2)

# vi /etc/ha.d/ha.cf

---------------

# log file
logfile         /var/log/ha-log
logfacility     local0
# heartbeat interval in seconds
keepalive       2
# seconds of silence before a node is declared dead
deadtime        5
# unicast heartbeat to the peer node's IP
ucast           eth0 192.168.7.88
# when the failed node recovers, do not fail back automatically;
# the node that took over keeps the resources
auto_failback   off
# cluster nodes (must match `uname -n`)
node            drbd1.example.com drbd2.example.com

---------------

Edit the inter-node authentication file: (node1,node2)

# vi /etc/ha.d/authkeys

--------------

auth 1

1 crc

--------------

# chmod 600 /etc/ha.d/authkeys
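Note that crc only checksums the messages and provides no real authentication, which is fine on a trusted back-to-back link. On a shared network a keyed hash is the safer choice, for example (the secret string here is a placeholder you should replace):

--------------
auth 1
1 sha1 SomeSecretKey
--------------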

Edit the cluster resources file: (node1,node2)

# vi /etc/ha.d/haresources

--------------

drbd1.example.com IPaddr::192.168.7.90/24/eth0 drbddisk::r0 Filesystem::/dev/drbd0::/data::ext4 killnfsd

--------------

Note: the IPaddr, Filesystem, and other scripts referenced in this file live in /etc/ha.d/resource.d/. You can also place service start scripts of your own there (e.g. mysql, www) and append the script name to the line in /etc/ha.d/haresources, so that the script gets started along with Heartbeat; see the sketch after the list below.

IPaddr::192.168.7.90/24/eth0: use the IPaddr script to configure the floating VIP

drbddisk::r0: use the drbddisk script to promote the DRBD resource r0 to primary on takeover and demote it on release

Filesystem::/dev/drbd0::/data::ext4: use the Filesystem script to mount and unmount the DRBD filesystem
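As an illustration of the note above, a custom resource script is just an init-style script that accepts start/stop/status. This sketch for a hypothetical www service (not part of this setup) would be saved as /etc/ha.d/resource.d/www, made executable, and appended as "www" to the haresources line:

-----------------
#!/bin/bash
# Minimal heartbeat R1 resource script: drives httpd with the resource group
case "$1" in
    start)  /etc/init.d/httpd start ;;
    stop)   /etc/init.d/httpd stop ;;
    status) /etc/init.d/httpd status ;;
esac
exit 0
-----------------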

Create the killnfsd script, used to restart the NFS service:

Note: after the NFS service fails over, the exported NFS directory would otherwise have to be re-mounted or the client reports errors (to be verified); the script forces a clean NFS restart on takeover.

# vi /etc/ha.d/resource.d/killnfsd

-----------------

killall -9 nfsd; /etc/init.d/nfs restart;exit 0

-----------------

Grant execute permission:

# chmod 755 /etc/ha.d/resource.d/killnfsd

Create the drbddisk script: (node1,node2)

Note:

This is another big trap. If you do not know Heartbeat's directory layout, you can be stuck here indefinitely: a yum-installed Heartbeat does not create the drbddisk script in /etc/ha.d/resource.d/, and the file cannot be found anywhere else on the system after installation.

I hit this myself: after starting Heartbeat, the virtual IP would not answer pings. Checking /var/log/ha-log turned up the line

ERROR: Cannot locate resource script drbddisk

and sure enough /etc/ha.d/resource.d/ contained no drbddisk script. I eventually found the code on Google, created the script, and the test finally passed:

# vi /etc/ha.d/resource.d/drbddisk

-----------------------

#!/bin/bash
#
# This script is intended to be used as a resource script by heartbeat
#
# Copyright 2003-2008 LINBIT Information Technologies
# Philipp Reisner, Lars Ellenberg
#
###

DEFAULTFILE="/etc/default/drbd"
DRBDADM="/sbin/drbdadm"

if [ -f $DEFAULTFILE ]; then
	. $DEFAULTFILE
fi

if [ "$#" -eq 2 ]; then
	RES="$1"
	CMD="$2"
else
	RES="all"
	CMD="$1"
fi

## EXIT CODES
# since this is a "legacy heartbeat R1 resource agent" script,
# exit codes actually do not matter that much as long as we conform to
#  http://wiki.linux-ha.org/HeartbeatResourceAgent
# but it does not hurt to conform to lsb init-script exit codes,
# where we can.
#  http://refspecs.linux-foundation.org/LSB_3.1.0/
#LSB-Core-generic/LSB-Core-generic/iniscrptact.html
####

drbd_set_role_from_proc_drbd()
{
	local out
	if ! test -e /proc/drbd; then
		ROLE="Unconfigured"
		return
	fi

	dev=$( $DRBDADM sh-dev $RES )
	minor=${dev#/dev/drbd}
	if [[ $minor = *[!0-9]* ]] ; then
		# sh-minor is only supported since drbd 8.3.1
		minor=$( $DRBDADM sh-minor $RES )
	fi
	if [[ -z $minor ]] || [[ $minor = *[!0-9]* ]] ; then
		ROLE=Unknown
		return
	fi

	if out=$(sed -ne "/^ *$minor: cs:/ { s/:/ /g; p; q; }" /proc/drbd); then
		set -- $out
		ROLE=${5%/**}
		: ${ROLE:=Unconfigured} # if it does not show up
	else
		ROLE=Unknown
	fi
}

case "$CMD" in
    start)
	# try several times, in case heartbeat deadtime
	# was smaller than drbd ping time
	try=6
	while true; do
		$DRBDADM primary $RES && break
		let "--try" || exit 1 # LSB generic error
		sleep 1
	done
	;;
    stop)
	# heartbeat (haresources mode) will retry failed stop
	# for a number of times in addition to this internal retry.
	try=3
	while true; do
		$DRBDADM secondary $RES && break
		# We used to lie here, and pretend success for anything != 11,
		# to avoid the reboot on failed stop recovery for "simple
		# config errors" and such. But that is incorrect.
		# Don't lie to your cluster manager.
		# And don't do config errors...
		let --try || exit 1 # LSB generic error
		sleep 1
	done
	;;
    status)
	if [ "$RES" = "all" ]; then
	    echo "A resource name is required for status inquiries."
	    exit 10
	fi
	ST=$( $DRBDADM role $RES )
	ROLE=${ST%/**}
	case $ROLE in
	Primary|Secondary|Unconfigured)
		# expected
		;;
	*)
		# unexpected. whatever...
		# If we are unsure about the state of a resource, we need to
		# report it as possibly running, so heartbeat can, after failed
		# stop, do a recovery by reboot.
		# drbdsetup may fail for obscure reasons, e.g. if /var/lock/ is
		# suddenly readonly.  So we retry by parsing /proc/drbd.
		drbd_set_role_from_proc_drbd
	esac
	case $ROLE in
	Primary)
		echo "running (Primary)"
		exit 0 # LSB status "service is OK"
		;;
	Secondary|Unconfigured)
		echo "stopped ($ROLE)"
		exit 3 # LSB status "service is not running"
		;;
	*)
		# NOTE the "running" in below message.
		# this is a "heartbeat" resource script,
		# the exit code is _ignored_.
		echo "cannot determine status, may be running ($ROLE)"
		exit 4 #  LSB status "service status is unknown"
		;;
	esac
	;;
    *)
	echo "Usage: drbddisk [resource] {start|stop|status}"
	exit 1
	;;
esac
exit 0

-----------------------

Grant execute permission:

# chmod 755 /etc/ha.d/resource.d/drbddisk

Start the HeartBeat service on both nodes, starting with node1: (node1,node2)

# service heartbeat start

# chkconfig heartbeat on

If the virtual IP 192.168.7.90 now answers pings, the configuration works.
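A quick way to verify is to ping the VIP and look for the address on the active node; on CentOS 6 the IPaddr script typically adds it as an interface alias such as eth0:0, though the exact label may vary:

---------------
# ping -c 3 192.168.7.90
# ip addr show eth0
---------------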

III. Configure NFS: (node1,node2)

# vi /etc/exports

-----------------

/data        *(rw,no_root_squash)

-----------------

Restart the NFS services:

# service rpcbind restart

# service nfs restart

# chkconfig rpcbind on

# chkconfig nfs off

NFS is set not to start at boot on purpose, because the /etc/ha.d/resource.d/killnfsd script takes care of starting NFS.
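To confirm the export is reachable through the VIP, you can query it from the client (assuming showmount is installed there; the output shown is what this configuration should produce):

-----------------
# showmount -e 192.168.7.90
Export list for 192.168.7.90:
/data *
-----------------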

IV. Final test

On another Linux client, mount the share via the virtual IP 192.168.7.90; a successful mount means NFS + DRBD + HeartBeat is complete.

# mount -t nfs 192.168.7.90:/data /tmp

# df -h

---------------

......

192.168.7.90:/data   1020M   34M  934M   4% /tmp

---------------

Testing DRBD + HeartBeat + NFS availability:

1. While copying a file into the mounted /tmp directory, abruptly restart the DRBD service on the primary and watch what happens.

In my test the transfer resumed where it left off.

2. Reboot the Primary host from a normal state; check that the DRBD primary role comes back, the client can still mount, previously written files survive, and new files can be written.

In my test everything recovered: the client did not need to remount the NFS share, the old data was intact, and new files could be written immediately.

3. When the Primary host must be shut down for hardware repair or other maintenance, how do you manually promote the Secondary to Primary?

If the machine can still boot normally, follow the steps below; if not, forcibly promote the Secondary to Primary, and once the failed machine boots again, repair any split brain afterwards.

First unmount the NFS share on the client:

# umount /tmp

(node1)

Unmount the DRBD device:

# service nfs stop

# umount /data

Demote:

# drbdadm secondary r0

Check the status; the node has been demoted:

# service drbd status

-----------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:45:19

m:res  cs         ro                   ds                 p  mounted  fstype

0:r0   Connected  Secondary/Secondary  UpToDate/UpToDate  C

-----------------

(node2)

Promote:

# drbdadm primary r0

Check the status; the node has been promoted:

# service drbd status

----------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs         ro                 ds                 p  mounted  fstype

0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C

----------------

The DRBD directory is not mounted yet; let Heartbeat do the mounting:

Note: if the Heartbeat log reports the following error during the restart:

ERROR: glib: ucast: error binding socket. Retrying: Permission denied

check whether SELinux is disabled.
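The standard CentOS 6 checks for this (setenforce 0 takes effect immediately; the config edit makes it permanent after a reboot):

-----------------
# getenforce
# setenforce 0
# vi /etc/selinux/config
SELINUX=disabled
-----------------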

# service heartbeat restart

# service drbd status

-----------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs         ro                 ds                 p  mounted  fstype

0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4

------------------------

HeartBeat mounted the DRBD directory successfully.

Repeat the NFS mount test on the client:

# mount -t nfs 192.168.7.90:/data /tmp

# ll /tmp

------------------

1  10  2  2222  3  4  5  6  7  8  9  lost+found  orbit-root

------------------

Reboot the host that was just promoted; once it is back up, check the status:

# service drbd status

------------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs            ro               ds                 p  mounted  fstype

0:r0   WFConnection  Primary/Unknown  UpToDate/DUnknown  C  /data    ext4

------------------------

HeartBeat mounted the DRBD directory, DRBD reconnected to the backup node seamlessly, and the client's NFS mount point never noticed the failure.

4. After the machine that crashed comes back up, will it try to reclaim the Primary resources?

After the reboot it does not re-acquire the resources; the roles have to be switched back manually.

Note: this behavior comes from the following parameter in /etc/ha.d/ha.cf:

--------------------

auto_failback   off

--------------------

It means that once the failed server recovers, the new primary keeps the resources and the old server gives them up.
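If you ever want to hand the resources back without touching DRBD by hand, Heartbeat ships helper scripts for exactly this; the install path varies by build (commonly /usr/share/heartbeat/ or /usr/lib64/heartbeat/), so treat these paths as assumptions:

--------------------
(on the node that currently holds the resources)
# /usr/share/heartbeat/hb_standby
(on the node that should take them over)
# /usr/share/heartbeat/hb_takeover
--------------------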

5. None of the tests above exercised heartbeat's automatic failover. If the live DRBD primary node goes down, does the backup node take over immediately; do heartbeat + drbd actually deliver high availability?

First mount the NFS share on the client:

# mount -t nfs 192.168.7.90:/data /tmp

a. Simulate a failure by stopping the heartbeat service on primary node node1; does backup node node2 take over the service?

(node1)

# service drbd status

----------------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:45:19

m:res  cs         ro                 ds                 p  mounted  fstype

0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4

----------------------------

# service heartbeat stop

(node2)

# service drbd status

----------------------------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs         ro                 ds                 p  mounted  fstype

0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4

-----------------------------------------

The standby took over seamlessly. Check whether the client can still use the NFS share:

# cd /tmp

# touch test01

# ls test01

------------------

test01

------------------

Test passed.

b. Simulate a primary node crash (forced power-off); does backup node node2 take over the service?

(node1)

Force the shutdown by cutting power to the node1 virtual machine directly.

(node2)

# service drbd status

-------------------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs            ro               ds                 p  mounted  fstype

0:r0   WFConnection  Primary/Unknown  UpToDate/DUnknown  C  /data    ext4

-------------------------------

The standby took over seamlessly. Check whether the client can still use the NFS share:

# cd /tmp

# touch test02

# ls test02

------------------

test02

------------------

Once node1 is back up again, check the DRBD status:

# service drbd status

------------------------------

drbd driver loaded OK; device status:

version: 8.4.3 (api:1/proto:86-101)

GIT-hash: 89a294209144b68adb3ee85a73221f964d3ee515 build by [email protected], 2013-05-27 20:49:06

m:res  cs         ro                 ds                 p  mounted  fstype

0:r0   Connected  Primary/Secondary  UpToDate/UpToDate  C  /data    ext4

-------------------------------

node1 has reconnected and is in the UpToDate state. Test passed.

Note: when heartbeat is stopped on node1 there is some chance that node2 fails to take over, so this setup carries a real maintenance cost. I do not run it in production; before going live, rehearse the failure scenarios thoroughly.

-------- Done --------

This article comes from the "一路向北" blog; please keep the attribution: http://showerlee.blog.51cto.com/2047005/1212185
