Setting DRBD in Primary / Primary — common commands to sync resync and make changes

As we have been setting up our farm with an NFS share, the DRBD primary/primary connection between the servers is important.

We are setting up a group of /customcommands/ that we can run to keep track of all of the common status and maintenance commands we use. But when we have to create the resource, change its structure, sync and resync, recover, grow, or move the servers, we need to document our 'Best Practices' and how we can recover.

From a base server install

apt-get install gcc make flex
wget http://oss.linbit.com/drbd/8.4/drbd-8.4.1.tar.gz
tar xvfz drbd-8.4.1.tar.gz
cd drbd-8.4.1/
./configure --prefix=/usr --localstatedir=/var --sysconfdir=/etc --with-km
make KDIR=/lib/modules/3.2.0-58-virtual/build   # KDIR must point at the build tree for the running kernel (uname -r)
make install
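Once the build finishes it is worth sanity-checking that the kernel module loads and that the userland tools agree with it on a version (a quick check; assumes the module was installed into the running kernel's module tree):

```shell
# Load the freshly built module and confirm the versions line up.
modprobe drbd
cat /proc/drbd      # first line reports the kernel module version
drbdadm --version   # userland version; should match the 8.4.1 build
```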

Set up /etc/drbd.d/disk.res

resource r0 {
    protocol C;
    syncer { rate 1000M; }
    startup {
        wfc-timeout 15;
        degr-wfc-timeout 60;
        become-primary-on both;
    }
    net {
        # allow-two-primaries requires a clustered filesystem (e.g. OCFS2)
        # so both primaries can mount simultaneously
        allow-two-primaries;
        after-sb-0pri discard-zero-changes;
        after-sb-1pri discard-secondary;
        after-sb-2pri disconnect;
        cram-hmac-alg sha1;
        shared-secret "sharedsanconfigsecret";
    }
    on server1 {
        device /dev/drbd0;
        disk /dev/xvdb;
        address 192.168.100.10:7788;
        meta-disk internal;
    }
    on server2 {
        device /dev/drbd0;
        disk /dev/xvdb;
        address 192.168.100.11:7788;
        meta-disk internal;
    }
}
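Before bringing the resource up, the config can be validated: `drbdadm dump` parses the resource definition and prints it back, so syntax errors surface early (a sketch; run it on both servers):

```shell
# Parse and echo the r0 resource definition.
# A non-zero exit status means drbd.d/disk.res has a syntax error.
drbdadm dump r0
```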

 

Set up your /etc/hosts

192.168.100.10 server1
192.168.100.11 server2

Set /etc/hostname to

server1

Reboot, verify your settings, and SAVE A DRBDVMTEMPLATE: clone your VM to a new server called server2.

On the clone, set /etc/hostname to

server2

Start drbd with /etc/init.d/drbd start. This will likely try to create the connection, but this is where we are going to 'play' to learn the commands and how we can sync, etc.

cat /proc/drbd   # shows the status of the connections
server1> drbdadm down r0   # turns off the drbd resource and connection
server2> drbdadm down r0   # turns off the drbd resource and connection
server1> drbdadm -- --force create-md r0   # creates a new set of metadata on the drive, which erases drbd's memory of any past sync status
server2> drbdadm -- --force create-md r0   # creates a new set of metadata on the drive, which erases drbd's memory of any past sync status
server1> drbdadm up r0   # turns on the drbd resource and connection; they should connect without a problem, with no memory of a past sync history
server2> drbdadm up r0   # turns on the drbd resource and connection; they should connect without a problem, with no memory of a past sync history
server1> drbdadm -- --clear-bitmap new-current-uuid r0   # creates a new 'disk sync image', essentially telling drbd that the servers are blank so no initial sync needs to be done; both servers are immediately UpToDate/UpToDate in /proc/drbd
server1> drbdadm primary r0
server2> drbdadm primary r0   # make both servers primary; now when you put a filesystem on /dev/drbd0 you will be able to read and write on both systems as though they are local
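With both nodes primary, an ordinary filesystem such as ext4 would be corrupted by simultaneous mounts; as the config comment notes, a clustered filesystem like OCFS2 is required. A minimal sketch of formatting and mounting (assumes the ocfs2-tools package and an O2CB cluster configuration are already in place; the label `sanvol` and mount point `/mnt/san` are made-up examples):

```shell
# Format ONCE, on one node only -- the other node sees the same blocks via DRBD.
mkfs.ocfs2 -L sanvol /dev/drbd0

# Mount on BOTH nodes; OCFS2's distributed lock manager coordinates access.
mkdir -p /mnt/san
mount -t ocfs2 /dev/drbd0 /mnt/san
```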

So, let's do some failure scenarios. Say we lose a server; it doesn't matter which one since they are both primaries, but in this case we will say server2 failed. Create a new VM from DRBDVMTEMPLATE, which already has drbd installed with the configuration, or build another one using the instructions above.

Open /etc/hostname and set it to

server2

Reboot. Make sure drbd is running (/etc/init.d/drbd start).

server1> watch cat /proc/drbd   # watch the status of drbd; it is very useful and telling about what is happening. You will want DRBD to be Connected Primary/Unknown UpToDate/DUnknown
server2> drbdadm down r0
server2> drbdadm wipe-md r0   # an optional step that wipes out the metadata. I have not seen that it does anything different from creating the metadata with the command below, but it is useful to know in case you want to get rid of the md on your disk
server2> drbdadm -- --force create-md r0   # makes sure that there is no partial resync data left over from the VM you cloned it from
server2> drbdadm up r0   # brings drbd on server2 back into the resource and connects them; it will immediately start syncing. You should see SyncSource Primary/Secondary UpToDate/Inconsistent on server1. For me it was going to take 22 hours for my test of 1 TB (10 MB/second)
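At 10 MB/second a 1 TB sync taking most of a day is expected. In drbd 8.4 the resync throttle can be raised at runtime without editing the config (hedged: `drbdadm disk-options --resync-rate` is the 8.4 mechanism; the 100M figure is an example, and the rate should be one your network and disks can actually sustain):

```shell
# Temporarily raise the resync rate while a full sync is running.
drbdadm disk-options --resync-rate=100M r0

# Watch progress; the ds: field should move toward UpToDate/UpToDate.
watch -n1 cat /proc/drbd
```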

Let's get funky: what happens if you stop everything in the middle of a sync?

server1> drbdadm down r0   # we shut down the drbd resource that has the most up-to-date information. On server2, /proc/drbd shows Secondary/Unknown Inconsistent/DUnknown: server2 no longer knows about server1, but server2 still knows that server2 is inconsistent. (An insertable step here could be, on server2: drbdadm down r0; drbdadm up r0, with no change to the effect)
server1> drbdadm up r0   # brings server1 back online; /proc/drbd on server1 shows SyncSource, server2 shows SyncTarget. server1 came back up as the UpToDate server, server2 was Inconsistent, and drbd figured it out

 

Where things started to go wrong and become less 'syncable' was when both servers were down and had to be brought back up separately, so that a new UUID was created on each of them independently. So let's simulate that the drbd config fell apart and we have to put it together again.

server2> drbdadm disconnect r0; drbdadm -- --force create-md r0; drbdadm connect r0   # start the sync process over
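Re-creating the metadata forces a full resync. If the two nodes have actually diverged (split brain), drbd 8.4 also lets you keep one side's data instead: demote the node whose changes you are willing to throw away and reconnect it with `--discard-my-data`. A sketch, assuming server1 holds the good copy:

```shell
# On server2 (the victim whose writes will be discarded):
drbdadm secondary r0
drbdadm connect --discard-my-data r0

# On server1 (the survivor), if it dropped the connection after the split brain:
drbdadm connect r0

# server2 resyncs from server1; once /proc/drbd shows UpToDate/UpToDate,
# promote server2 back to primary:
drbdadm primary r0   # run on server2
```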