Using MPT-Status for RAID Monitoring on a PowerEdge C6100 with PERC 6

This post outlines the steps needed to get a CLI report of the condition of your RAID volumes on a PowerEdge C6100 with a PERC 6/i RAID controller.

Verify your controller type:

cat /proc/scsi/mptsas/0

ioc0: LSISAS1068E B3, FwRev=011b0000h, Ports=1, MaxQ=277

Download the following packages:

daemonize-1.5.6-1.el5.i386.rpm
mpt-status-1.2.0-3.el5.centos.i386.rpm
lsscsi-0.17-3.el5.i386.rpm

http://dl.nux.ro/utils/mpt-status/mpt-status-1.2.0-3.el5.centos.i386.rpm

http://dl.nux.ro/utils/mpt-status/daemonize-1.5.6-1.el5.i386.rpm

http://mirror.centos.org/centos/5/os/i386/CentOS/lsscsi-0.17-3.el5.i386.rpm
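
If you are fetching them straight from the node's console, wget works (same URLs as above):

wget http://dl.nux.ro/utils/mpt-status/mpt-status-1.2.0-3.el5.centos.i386.rpm
wget http://dl.nux.ro/utils/mpt-status/daemonize-1.5.6-1.el5.i386.rpm
wget http://mirror.centos.org/centos/5/os/i386/CentOS/lsscsi-0.17-3.el5.i386.rpm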

Install mpt-status:

rpm -ivh mpt-status-1.2.0-3.el5.centos.i386.rpm daemonize-1.5.6-1.el5.i386.rpm lsscsi-0.17-3.el5.i386.rpm

modprobe mptctl

echo mptctl >> /etc/modules
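
NOTE: /etc/modules is a Debian convention; a stock CentOS-based dom0 (which is what XenServer ships) doesn't read that file at boot. A hedged alternative there, assuming the standard rc.sysinit behavior of executing /etc/rc.modules, is:

echo "modprobe mptctl" >> /etc/rc.modules
chmod +x /etc/rc.modules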

Verify your modules:

lsmod |grep mpt

mptctl             90739  0
mptsas             57560  4
mptscsih           39876  1 mptsas
mptbase            91081  3 mptctl,mptsas,mptscsih
scsi_transport_sas 27681  1 mptsas
scsi_mod          145658  7 mptctl,sg,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod

Run:

mpt-status

or, for the terse per-device listing that the script below parses:

mpt-status -n -s

You can also use: lsscsi -l
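
For reference, that -n -s output on a healthy two-disk mirror looks something like this (reconstructed from the parsed report further down, so treat the exact layout as an assumption):

vol_id:0 OPTIMAL
phys_id:1 ONLINE
phys_id:0 ONLINE
scsi_id:1 100%
scsi_id:0 100%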

This little script:

echo `mpt-status -n -s|awk '/OPTIMAL/ {print $1, "OK"}; /ONLINE/ {print $1, "OK"}; /DEGRADED/ {print $1, "FAILURE"}; /scsi/ {print $2}; /MISSING/ {print $1, "FAILURE"} '`

reports:

vol_id:0 OK phys_id:1 OK phys_id:0 OK 100% 100%

On a rebuild, it reports:

vol_id:0 FAILURE phys_id:2 OK phys_id:3 OK 75% 75%

Copy that script into a file called "check_raid" and make it executable, e.g. mode 755.
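
A minimal sketch, assuming the /customcommands path used later in this post:

mkdir -p /customcommands
vi /customcommands/check_raid    # paste the script above
chmod 755 /customcommands/check_raid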

Edit nagios-statd on parcel1. At line 20, replace "sudo /customcommands/check_raid.pl -b -w1 -c1" with the filename /customcommands/check_raid, dropping the switches and the "sudo".

So, from this:

commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","sudo /customcommands/check_raid.pl -b -w1 -c1")

To this:

commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","/customcommands/check_raid")

Port 1040 will need to be opened in XenServer. Edit /etc/sysconfig/iptables and insert this line:

-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 1040 -j ACCEPT

Restart the firewall:

service iptables restart

Output:

Flushing firewall rules: [ OK ]
Setting chains to policy ACCEPT: filter [ OK ]
Unloading iptables modules: [ OK ]
Applying iptables firewall rules: [ OK ]
Loading additional iptables modules: ip_conntrack_netbios_n[FAILED]

NOTE: The "FAILED" error above doesn't seem to be a problem.

Verify that port 1040 is open by checking the firewall status:

service iptables status

Output:

Table: filter

Chain INPUT (policy ACCEPT)
num  target               prot opt source       destination
1    ACCEPT               47   --  0.0.0.0/0    0.0.0.0/0
2    RH-Firewall-1-INPUT  all  --  0.0.0.0/0    0.0.0.0/0

Chain FORWARD (policy ACCEPT)
num  target               prot opt source       destination
1    RH-Firewall-1-INPUT  all  --  0.0.0.0/0    0.0.0.0/0

Chain OUTPUT (policy ACCEPT)
num  target               prot opt source       destination

Chain RH-Firewall-1-INPUT (2 references)
num  target     prot opt source       destination
1    ACCEPT     all  --  0.0.0.0/0    0.0.0.0/0
2    ACCEPT     icmp --  0.0.0.0/0    0.0.0.0/0    icmp type 255
3    ACCEPT     esp  --  0.0.0.0/0    0.0.0.0/0
4    ACCEPT     ah   --  0.0.0.0/0    0.0.0.0/0
5    ACCEPT     udp  --  0.0.0.0/0    224.0.0.251  udp dpt:5353
6    ACCEPT     udp  --  0.0.0.0/0    0.0.0.0/0    udp dpt:631
7    ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0    tcp dpt:631
8    ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0    tcp dpt:1040
9    ACCEPT     all  --  0.0.0.0/0    0.0.0.0/0    state RELATED,ESTABLISHED
10   ACCEPT     udp  --  0.0.0.0/0    0.0.0.0/0    state NEW udp dpt:694
11   ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0    state NEW tcp dpt:22
12   ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0    state NEW tcp dpt:80
13   ACCEPT     tcp  --  0.0.0.0/0    0.0.0.0/0    state NEW tcp dpt:443
14   REJECT     all  --  0.0.0.0/0    0.0.0.0/0    reject-with icmp-host-prohibited

Rule 8 is our new port 1040 rule.

Running nagios-statd opens port 1040 on parcel1 and listens for commands initiated by nagios-stat on the Nagios server.

On the Nagios server, in a file called "remote.orig.cfg", there are commands defined using nagios-stat. NOTE: These are from a working server and haven't been modified to work with mpt-status; some changes may need to be made. This is just an example of the interaction between the Nagios server and client.

Example:

define command{
    command_name check_remote_raid
    command_line $USER1$/nagios-stat -w $ARG1$ -c $ARG2$ -p $ARG3$ raid $HOSTADDRESS$
}

The command defined above is used in the services.cfg file.

Example:

define service{
    use matraex-template
    host_name mtx-lilac
    service_description Lilac /data Raid
    check_command check_remote_raid!1!1!1040
}

The three files needed on the C6100 node are:

/customcommands/check_raid (contents below) -rwxr-xr-x
/customcommands/nagios-statd (contents below) -rwxr-xr-x
/etc/init.d/nagios-statd (contents below) -rwxr--r--

Creating the soft links:

ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/K01nagios-statd

ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/S99nagios-statd

The -s flag creates a symbolic (soft) link; -f, if used, forces overwriting an existing link.

/rc3.d/ designates runlevel 3

So when you do this:

ls -lt /customcommands/nagios-statd /etc/init.d/nagios-statd /customcommands/check_raid /etc/rc.d/rc3.d/*nagios-statd

This is what you should see:

lrwxrwxrwx 1 root root   22 Mar 6 08:08 /etc/rc.d/rc3.d/K01nagios-statd -> ../init.d/nagios-statd
-rwxr-xr-x 1 root root  365 Mar 6 07:59 /customcommands/check_raid
lrwxrwxrwx 1 root root   22 Mar 6 07:52 /etc/rc.d/rc3.d/S99nagios-statd -> ../init.d/nagios-statd
-rwxr-xr-x 1 root root  649 Mar 6 07:51 /etc/init.d/nagios-statd
-rwxr-xr-x 1 root root 9468 Mar 5 12:05 /customcommands/nagios-statd

Script Files:

NOTE: Here's a little fix that helped me out. I had originally pasted these scripts into a DOS/Windows editor (WordPad), which added DOS-style line endings to the files, resulting in an error:

-bash: ./nagios-statd: /bin/sh^M: bad interpreter: No such file or directory

If you encounter this, do this:

Open the file in vi
Hit ":" to go into command mode
Enter "set fileformat=unix"
Then ":wq" to save and quit.
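
Alternatively, a sed one-liner (assuming GNU sed, as on CentOS) or dos2unix will strip the carriage returns:

sed -i 's/\r$//' /customcommands/nagios-statd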

/customcommands/check_raid:

#!/bin/bash

# Location of the mpt-status binary
EXECFILE=/usr/sbin/mpt-status

# Bail out with a usage message if mpt-status isn't installed
if [ ! -e $EXECFILE ] ; then
    echo
    echo "Error: $EXECFILE is not installed, please install before running"
    echo
    echo "Usage: $0"
    echo
    exit 10
fi

# Reduce the mpt-status report to a one-line OK/FAILURE summary
echo `$EXECFILE -n -s|awk '/OPTIMAL/ {print $1, "OK"}; /ONLINE/ {print $1, "OK"}; /DEGRADED/ {print $1, "FAILURE"}; /scsi/ {print $2}; /MISSING/ {print $1, "FAILURE"} '`

/customcommands/nagios-statd:

#!/usr/bin/python

import getopt, os, sys, signal, socket, SocketServer

class Functions:
    "Contains a set of methods for gathering data from the server."
    def __init__(self):
        self.nagios_statd_version = 3.09
        # As of right now, the commands are for df, who, proc, uptime, and swap.
        commandlist = {}
        commandlist['AIX'] = ("df -Ik","who | wc -l","ps ax","uptime","lsps -sl | grep -v Paging | awk '{print $2}' | cut -f1 -d%")
        commandlist['BSD/OS'] = ("df","who | wc -l","ps -ax","uptime",None)
        commandlist['CYGWIN_NT-5.0'] = ("df -P",None,"ps -s -W | awk '{printf(\"%6s%6s%3s%6s%s\\n\",$1,$2,\" S\",\" 0:00\",substr($0,22))}'",None,None)
        commandlist['CYGWIN_NT-5.1'] = commandlist['CYGWIN_NT-5.0']
        commandlist['FreeBSD'] = ("df -k","who | wc -l","ps ax","uptime","swapinfo | awk '$1!~/^Device/{print $5}'")
        commandlist['HP-UX'] = ("bdf -l","who -q | grep \"#\"","ps -el","uptime",None)
        commandlist['IRIX'] = ("df -kP","who -q | grep \"#\"","ps -e -o \"pid tty state time comm\"","/usr/bsd/uptime",None)
        commandlist['IRIX64'] = commandlist['IRIX']
        commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","/customcommands/check_raid")
        commandlist['NetBSD'] = ("df -k","who | wc -l","ps ax","uptime","swapctl -l | awk '$1!~/^Device/{print $5}'")
        commandlist['NEXTSTEP'] = ("df","who | /usr/ucb/wc -l","ps -ax","uptime",None)
        commandlist['OpenBSD'] = ("df -k","who | wc -l","ps -ax","uptime","swapctl -l | awk '$1!~/^Device/{print $5}'")
        commandlist['OSF1'] = ("df -P","who -q | grep \"#\"","ps ax","uptime",None)
        commandlist['SCO-SV'] = ("df -Bk","who -q | grep \"#\"","ps -el -o \"pid tty s time args\"","uptime",None)
        commandlist['SunOS'] = ("df -k","who -q | grep \"#\"","ps -e -o \"pid tty s time comm\"","uptime","swap -s | tr -d -s -c [:digit:][:space:] | nawk '{print ($3/($3+$4))*100}'")
        commandlist['UNIXWARE2'] = ("/usr/ucb/df","who -q | grep \"#\"","ps -el | awk '{printf(\"%6d%9s%2s%5s %s\\n\",$5,substr($0, 61, 8),$2,substr($0,69,5),substr($0,75))}'","echo `uptime`, load average: 0.00, `sar | awk '{oldidle=idle;idle=$5} END {print 100-oldidle}'`,0.00",None)
        # Now to make commandlist with the correct one for your OS.
        try:
            self.commandlist = commandlist[os.uname()[0]]
        except KeyError:
            print "Your platform isn't supported by nagios-statd - exiting."
            sys.exit(3)

    # Below are the functions that the client can call.
    def disk(self):
        return self.__run(0)
    def proc(self):
        return self.__run(2)
    def swap(self):
        return self.__run(4)
    def uptime(self):
        return self.__run(3)
    def user(self):
        return self.__run(1)
    def raid(self):
        return self.__run(5)
    def version(self):
        i = "nagios-statd " + str(self.nagios_statd_version)
        return i

    def __run(self,cmdnum):
        # Unmask SIGCHLD so popen can detect the return status (temporarily)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        outputfh = os.popen(self.commandlist[cmdnum])
        output = outputfh.read()
        returnvalue = outputfh.close()
        signal.signal(signal.SIGCHLD, signal.SIG_IGN)
        if (returnvalue):
            return "ERROR %s " % output
        else:
            return output

class NagiosStatd(SocketServer.StreamRequestHandler):
    "Handles connection initialization and data transfer (as daemon)"
    def handle(self):
        # Check to see if user is allowed
        if self.__notallowedhost():
            self.wfile.write(self.error)
            return 1
        if not hasattr(self,"generichandler"):
            self.generichandler = GenericHandler(self.rfile,self.wfile)
        self.generichandler.run()

    def __notallowedhost(self):
        "Compares list of allowed users to client's IP address."
        if hasattr(self.server,"allowedhosts") == 0:
            return 0
        for i in self.server.allowedhosts:
            if i == self.client_address[0]:  # Address is in list
                return 0
            try:  # Do an IP lookup of host in blocked list
                i_ip = socket.gethostbyname(i)
            except:
                self.error = "ERROR DNS lookup of blocked host \"%s\" failed. Denying by default." % i
                return 1
            if i_ip != i:  # If address in list isn't an IP
                if socket.getfqdn(i) == socket.getfqdn(self.client_address[0]):
                    return 0
        self.error = "ERROR Client is not among hosts allowed to connect."
        return 1

class GenericHandler:
    def __init__(self,rfile=sys.stdin,wfile=sys.stdout):
        # Create functions object
        self.functions = Functions()
        self.rfile = rfile
        self.wfile = wfile

    def run(self):
        # Get the request from the client
        line = self.rfile.readline()
        line = line.strip()
        # Check for appropriate requests from client
        if len(line) == 0:
            self.wfile.write("ERROR No function requested from client.")
            return 1
        # Call the appropriate function
        try:
            output = getattr(self.functions,line)()
        except AttributeError:
            error = "ERROR Function \"" + line + "\" does not exist."
            self.wfile.write(error)
            return 1
        except TypeError:
            error = "ERROR Function \"" + line + "\" not supported on this platform."
            self.wfile.write(error)
            return 1
        # Send output
        if output.isspace():
            error = "ERROR Function \"" + line + "\" returned no information."
            self.wfile.write(error)
            return 1
        elif output == "ERROR":
            error = "ERROR Function \"" + line + "\" exited abnormally."
            self.wfile.write(error)
        else:
            for line in output:
                self.wfile.write(line)

class ReUsingServer(SocketServer.ForkingTCPServer):
    allow_reuse_address = True

class Initialization:
    "Methods for interacting with user - initial code entry point."
    def __init__(self):
        self.port = 1040
        self.ip = ''
        # Run this through Functions initially, to make sure the platform is supported.
        i = Functions()
        del(i)

    def getoptions(self):
        "Parses command line"
        try:
            opts, args = getopt.getopt(sys.argv[1:], "a:b:ip:P:Vh", ["allowedhosts=","bindto=","inetd","port=","pid=","version","help"])
        except getopt.GetoptError, (msg, opt):
            print sys.argv[0] + ": " + msg
            print "Try '" + sys.argv[0] + " --help' for more information."
            sys.exit(3)
        for option,value in opts:
            if option in ("-a","--allowedhosts"):
                value = value.replace(" ","")
                self.allowedhosts = value.split(",")
            elif option in ("-b","--bindto"):
                self.ip = value
            elif option in ("-i","--inetd"):
                self.runfrominetd = 1
            elif option in ("-p","--port"):
                self.port = int(value)
            elif option in ("-P","--pid"):
                self.pidfile = value
            elif option in ("-V","--version"):
                self.version()
                sys.exit(3)
            elif option in ("-h","--help"):
                self.usage()

    def main(self):
        # Retrieve command line options
        self.getoptions()
        # Just splat to stdout if we're running under inetd
        if hasattr(self,"runfrominetd"):
            server = GenericHandler()
            server.run()
            sys.exit(0)
        # Check to see if the port is available
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((self.ip, self.port))
            s.close()
            del(s)
        except socket.error, (errno, msg):
            print "Unable to bind to port %s: %s - exiting." % (self.port, msg)
            sys.exit(2)
        # Detach from terminal
        if os.fork() == 0:
            # Make this the controlling process
            os.setsid()
            # Be polite and chdir to /
            os.chdir('/')
            # Try to close all open filehandles
            for i in range(0,256):
                try: os.close(i)
                except: pass
            # Redirect the offending filehandles
            sys.stdin = open('/dev/null','r')
            sys.stdout = open('/dev/null','w')
            sys.stderr = open('/dev/null','w')
            # Set the path
            os.environ["PATH"] = "/bin:/usr/bin:/usr/local/bin:/usr/sbin"
            # Reap children automatically
            signal.signal(signal.SIGCHLD, signal.SIG_IGN)
            # Save pid if user requested it
            if hasattr(self,"pidfile"):
                self.savepid(self.pidfile)
            # Create a forking TCP/IP server and start processing
            server = ReUsingServer((self.ip,self.port),NagiosStatd)
            if hasattr(self,"allowedhosts"):
                server.allowedhosts = self.allowedhosts
            server.serve_forever()
        # Get rid of the parent
        else:
            sys.exit(0)

    def savepid(self,file):
        try:
            fh = open(file,"w")
            fh.write(str(os.getpid()))
            fh.close()
        except:
            print "Unable to save PID file - exiting."
            sys.exit(2)

    def usage(self):
        print "Usage: " + sys.argv[0] + " [OPTION]"
        print "nagios-statd daemon - remote UNIX system monitoring tool for Nagios.\n"
        print "-a, --allowedhosts=HOSTS  Comma delimited list of IPs/hosts allowed to connect."
        print "-b, --bindto=IP           IP address for the daemon to bind to."
        print "-i, --inetd               Run from inetd."
        print "-p, --port=PORT           Port to listen on."
        print "-P, --pid=FILE            Save pid to FILE."
        print "-V, --version             Output version information and exit."
        print "-h, --help                Print this help and exit."
        sys.exit(3)

    def version(self):
        i = Functions()
        print "nagios-statd %.2f" % i.nagios_statd_version
        print "os.uname()[0] = %s " % os.uname()[0]
        print "Written by Nick Reinking\n"
        print "Copyright (C) 2002 Nick Reinking"
        print "This is free software. There is NO warranty; not even for MERCHANTABILITY or"
        print "FITNESS FOR A PARTICULAR PURPOSE."
        print "\nNagios is a trademark of Ethan Galstad."

if __name__ == "__main__":
    # Check to see if running Python 2.x+ / needed because getfqdn() is Python 2.0+ only
    if (int(sys.version[0]) < 2):
        print "nagios-statd requires Python version 2.0 or greater."
        sys.exit(3)
    i = Initialization()
    i.main()

/etc/init.d/nagios-statd:

#!/bin/sh
#
# This file should have uid root, gid sys and chmod 744
#
if [ ! -d /usr/bin ]
then    # /usr not mounted
    exit
fi

killproc() {    # kill the named process(es)
    pid=`/bin/ps -e |
        /bin/grep -w $1 |
        /bin/sed -e 's/^ *//' -e 's/ .*//'`
    [ "$pid" != "" ] && kill $pid
}

# Start/stop processes required for the nagios-statd server
case "$1" in
'start')
    /customcommands/nagios-statd -a <IP of Allowed Nagios Server>,<IP of Test Workstation> -p 1040
    ;;
'stop')
    killproc nagios-statd
    ;;
*)
    echo "Usage: /etc/init.d/nagios-statd { start | stop }"
    ;;
esac
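
With the runlevel 3 symlinks from earlier in place, the daemon starts at boot; to control it by hand, call the init script directly:

/etc/init.d/nagios-statd start
/etc/init.d/nagios-statd stop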

Testing:

As you can see in the init script above, I've added the IP address of a test workstation. This allows me to simply telnet to a node in the C6100 and execute one of the commands defined in this section of the /customcommands/nagios-statd script:

# Below are the functions that the client can call.
def disk(self):
    return self.__run(0)
def proc(self):
    return self.__run(2)
def swap(self):
    return self.__run(4)
def uptime(self):
    return self.__run(3)
def user(self):
    return self.__run(1)
def raid(self):
    return self.__run(5)

At your workstation: telnet <Node IP Address> 1040

When connected, the screen will be blank. Type "raid" (the screen won't echo this). When you hit Enter, you should see:

vol_id:0 OK phys_id:2 OK phys_id:3 OK 100% 100%
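
If you'd rather script the test than type into a blank telnet session, something like this should also work (assuming nc/netcat is installed on the workstation):

echo raid | nc <Node IP Address> 1040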

Now you’re ready to move on to the Nagios configuration.

Matt Long

03/06/2015

Adding and Removing Local Storage From XenServer

To add local storage in XenServer 6.x

Get your device IDs with:

ll /dev/disk/by-id

The host UUID can be copied and pasted from the General tab of your host in XenCenter.
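
If you prefer to stay on the command line, this sketch should also return the host UUID (assuming the host's name-label matches its hostname; --minimal is a standard xe flag):

xe host-list name-label=$(hostname) --minimal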

Create your storage:

xe sr-create content-type=user device-config:device=/dev/sdb host-uuid=<Place the host's UUID here> name-label="<Name your local storage here>" shared=false type=lvm

NOTE: Make sure that shared=false. If a hypervisor's local storage is marked shared, you won't be able to add that hypervisor to a pool; when a hypervisor is added to a pool, its local storage is automatically shared in that pool.

NOTE: Replace sdb in the above command with the device that you’re adding.

To remove local storage in XenServer 6.x

Go to the console in XenCenter or log in to your XenServer host via SSH.

List your storage repositories.

xe sr-list

You will see something like this:

uuid ( RO) : <The uuid number you want is here>
name-label ( RW): Local storage
name-description ( RW):
host ( RO): host.example.com
type ( RO): lvm
content-type ( RO): user

The uuid string is the storage repository UUID (SR-uuid) that you need for the next step.

Get the Physical Block Device UUID.

xe pbd-list sr-uuid=Your-UUID

The uuid ( RO) field in the output is the PBD-uuid.

Unplug the local storage.

xe pbd-unplug uuid=Your-PBD-uuid

Delete the PBD:

xe pbd-destroy uuid=your-PBD-uuid

Forget (remove) the local storage so it stops showing up as detached.

xe sr-forget uuid=your-SR-uuid

Now check in XenCenter that it has been removed.
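
Putting it all together, the removal can be scripted; a minimal sketch, assuming the SR is named "Local storage" and has a single PBD:

SR_UUID=$(xe sr-list name-label="Local storage" --minimal)
PBD_UUID=$(xe pbd-list sr-uuid=$SR_UUID --minimal)
xe pbd-unplug uuid=$PBD_UUID
xe pbd-destroy uuid=$PBD_UUID
xe sr-forget uuid=$SR_UUID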

Disk write speed testing different XenServer configurations – single disk vs mdadm vs hardware raid

In our virtual environment, one of the VM host servers has a hardware RAID controller on it, so naturally we used the hardware RAID.

The server is a Dell C6100, which uses a low-featured LSI SAS RAID controller.
One of the 'low' features is that it only allows two RAID volumes at a time; it also does not do RAID 10.

So I decided to create a RAID 1 with two SSD drives for the host, which would also hold the root operating systems for each of the guest VMs. It would be fast and redundant. Then we have up to four 1TB disks for the larger data sets. We have multiple identically configured VM hosts in our pool.

For the data drives, with only one more RAID volume available and no RAID 10, I was limited to a RAID 5, a mirror with two spares, or a JBOD. To get the most space out of the four 1TB drives, I created the RAID 5. After configuring two identical VM hosts like this, putting a DRBD primary/primary connection between the two of them, and layering an OCFS2 filesystem on top, I found I got write speeds as low as 3MB/s. I hadn't originally thought much about what speeds I would get; I just expected them to be somewhere around raw disk write speed, so I suppose I was expecting acceptable speeds between 30 and 80 MB/s. When I didn't get them, I realized I was going to have to do some simple benchmarking on my four 1TB drives to see which configuration would give me the best combination of speed and size.

A couple of environment items

  • I will mount the final drive on /data
  • I mount temporary drives in /mnt when testing
  • We use XenServer for our virtual environment; I will refer to the host as the VM Host or dom0 and to a guest VM as VM Guest or domU.
  • The final speed that we are looking for is on domU, since that is where our application will be; however, I will be doing tests in both dom0 and domU environments.
  • It is possible that the domU may be the only VM Guest, so we will also test raw disk access from domU for the data (and skip the abstraction layer provided by dom0).

So, as I test the different environments, I need to be able to create and destroy the local storage on the dom0 VM host. Here are some commands that help me do it.
I already went through XenCenter and removed all connections and virtual disks on the storage I want to remove; I had to click on the device "Local Storage 2" under the host, click the Storage tab, and make sure each was deleted. {VM Host SR Delete Process}

xe sr-list host=server1 #find and keep the uuid of the sr in my case "c2457be3-be34-f2c1-deac-7d63dcc8a55a"
xe pbd-list   sr-uuid=c2457be3-be34-f2c1-deac-7d63dcc8a55a # find and keep the uuid of the pbd connecting sr to dom0 "b8af1711-12d6-5c92-5ab2-c201d25612a9"
xe pbd-unplug  uuid=b8af1711-12d6-5c92-5ab2-c201d25612a9 #unplug the device from the sr
xe pbd-destroy uuid=b8af1711-12d6-5c92-5ab2-c201d25612a9 #destroy the devices
xe sr-forget uuid=c2457be3-be34-f2c1-deac-7d63dcc8a55a #destroy the sr

Now that the SR is destroyed, I can work on the raw disks on the dom0 and benchmark the speeds of different soft-RAID configurations from there.
Once I have made a change to the structure of the disks, I can recreate the SR, with a new name, on top of whatever solution I come up with:

xe sr-create content-type=user device-config:device=/dev/XXX host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage XXX on `cat /etc/hostname`" shared=false type=lvm

Replace the XXX placeholders with what works for you.

Most of the tests were me just running dd commands and writing down the slowest time, and then what seemed to be about the average time in MB/s. It seemed like the first write after the disk had been idle was a bit slower, and each subsequent write was faster; I am not sure whether that means an idle disk takes a moment to come up to speed. If that is the case, there are two scenarios: if the disk is often idle, the slower number applies, and if the disk is busy, the higher average number applies, so I tracked them both. The idle-disk observation was not scientific, and many of my tests did not wait long enough for the disk to go idle between runs.

The commands I ran for testing were dd writes of increasing size:

dd if=/dev/zero of=/data/speedtest.`date +%s` bs=1k count=1000 conv=fdatasync    # 1 MB
dd if=/dev/zero of=/data/speedtest.`date +%s` bs=1k count=10000 conv=fdatasync   # 10 MB
dd if=/dev/zero of=/data/speedtest.`date +%s` bs=1k count=100000 conv=fdatasync  # 100 MB
dd if=/dev/zero of=/data/speedtest.`date +%s` bs=1k count=1000000 conv=fdatasync # 1000 MB
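
To make the runs repeatable, a small loop helps; this is a minimal sketch (the /data mount point, the grep on dd's summary line, and three repetitions per size are my assumptions):

for count in 1000 10000 100000 1000000; do
    for run in 1 2 3; do
        # keep only dd's summary line, which reports the MB/s
        dd if=/dev/zero of=/data/speedtest.$$ bs=1k count=$count conv=fdatasync 2>&1 | grep copied
        rm -f /data/speedtest.$$
    done
done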

I won't get into the details of every single command I ran as I was creating the different disk configurations and environments, but I will document a couple of them.

Soft RAID 10 on dom0

dom0>mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb2 --assume-clean
dom0>mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdc1 /dev/sdd2 --assume-clean
dom0>mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/md0 /dev/md1 --assume-clean
dom0>mkfs.ext3 /dev/md10
dom0>xe sr-create content-type=user device-config:device=/dev/md10 host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage md10 on `cat /etc/hostname`" shared=false type=lvm

Dual Dom0 Mirror – Striped on DomU for an “Extended RAID 10”

dom0> {VM Host SR Delete Process} # to clean out 'Local storage md10'
dom0>mdadm --manage /dev/md2 --stop
dom0>mkfs.ext3 /dev/md0 && mkfs.ext3 /dev/md1
dom0>xe sr-create content-type=user device-config:device=/dev/md0 host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage md0 on `cat /etc/hostname`" shared=false type=lvm
dom0>xe sr-create content-type=user device-config:device=/dev/md1 host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage md1 on `cat /etc/hostname`" shared=false type=lvm
# at this point use XenCenter to add and attach disks from each of the local md0 and md1 SRs to the domU (they were attached on my systems as xvdb and xvdc)
domU>mdadm --create /dev/md10 --level=0 --raid-devices=2 /dev/xvdb /dev/xvdc
domU>mkfs.ext3 /dev/md10 && mount /dev/md10 /data

Four disk SRs from dom0, soft RAID 10 on domU

domU>umount /data
domU> mdadm --manage /dev/md10 --stop
domU> {delete md2 and md1 disks from the storage tab under your VM Host in Xen Center}
dom0> {VM Host SR Delete Process} #to clean out 'Local storage md10'
dom0>mdadm --manage /dev/md2 --stop
dom0>mdadm --manage /dev/md1 --stop
dom0>mdadm --manage /dev/md0 --stop
dom0>fdisk /dev/sda #delete partition and write (d w)
dom0>fdisk /dev/sdb #delete partition and write (d w)
dom0>fdisk /dev/sdc #delete partition and write (d w)
dom0>fdisk /dev/sdd #delete partition and write (d w)
dom0>xe sr-create content-type=user device-config:device=/dev/sda host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage sda on `cat /etc/hostname`" shared=false type=lvm
dom0>xe sr-create content-type=user device-config:device=/dev/sdb host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage sdb on `cat /etc/hostname`" shared=false type=lvm
dom0>xe sr-create content-type=user device-config:device=/dev/sdc host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage sdc on `cat /etc/hostname`" shared=false type=lvm
dom0>xe sr-create content-type=user device-config:device=/dev/sdd host-uuid=`grep -B1 -f /etc/hostname <(xe host-list)|head -n1|awk '{print $NF}'` name-label="Local storage sdd on `cat /etc/hostname`" shared=false type=lvm
domU>mdadm --create /dev/md10 -l10 --raid-devices=4 /dev/xvdb /dev/xvdc /dev/xvde /dev/xvdf
domU>mdadm --detail --scan >> /etc/mdadm/mdadm.conf 
domU>echo 100000 > /proc/sys/dev/raid/speed_limit_min #I made the resync go fast, which reduced it from 26 hours to about 3 hours
domU>mdadm --grow /dev/md10 --size=max

Creating a Bootable USB Install Thumb drive for XenServer

We have a couple of sites with XenServer VM machines, so part of our redundancy/failure plan is to be able to quickly install or reinstall a XenServer hypervisor.

There are plenty of more involved methods with setting up PXE servers, etc., but the quickest, lowest-tech method is to have a USB thumb drive on hand.

We could use one of the plethora of tools to create a bootable USB thumb drive (UNetbootin, USB to ISO, etc.), but they all seem to have problems with the XenServer ISO (OS not found, errors during install, etc.).

So I found one that works well

http://rufus.akeo.ie/

He appears to keep his software up to date. Download it and run it, select your USB drive, then check the box to 'Create a bootable disk using ISO Image' and select the image to use from your hard drive. I downloaded the ISO image from

http://xenserver.org/overview-xenserver-open-source-virtualization/download.html

– XenServer Installation ISO

Then just boot from the USB drive and the install should start.


Deleting Orphaned Disks in Citrix XenServer

While building out my virtual environment with templates ready to deploy, I created quite a few templates and snapshots.

I did a pretty good job of deleting the extras when I didn't need them any more, but in some cases, when deleting a VM I no longer needed, I forgot to check the box to delete the snapshots that went with that VM.

Under the dom0 host -> Storage tab, I could see that space was still allocated to the snapshots (usage was higher than the combined visible usage of servers and templates, and virtual allocation was way higher than it should be), but there was no place that listed the snapshots taking up the space.

When looking into ways to delete these orphaned snapshots (and the disk snapshots that went with them), I found some cumbersome command-line methods, like this old one: http://blog.appsense.com/2009/11/deleting-orphaned-disks-in-citrix-xenserver-5-5/

After a bit more digging, I found that by just clicking on the Local Storage under the host, then clicking on the 'Storage' tab there, I could see a list of all of the storage elements that are allocated. Some were for snapshots without a name; it turns out those were the orphaned ones. If a disk is allocated to a live server, the Delete button is not highlighted, so I just deleted the old nameless ones.
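
For a command-line cross-check, listing snapshot VDIs can help spot the orphans; a hedged sketch (field names as exposed by xe vdi-list on XenServer 6.x):

xe vdi-list is-a-snapshot=true params=uuid,name-label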


Resizing a VDI on XenServer using XenCenter and Commandline

Occasionally I need to change the size of a disk, perhaps to allocate more space to the OS.

To do this, I first unmount the disk inside the guest:

umount /data

Click on the domU server in XenCenter, click on the Storage tab, select the storage item you want to resize, and click 'Detach'.
Then, at the command line on one of the dom0 hosts:

 xe sr-list host=dom0hostname

Write down the UUID of the SR the virtual disk was in (we will use XXXXX-XXXXX-XXXX).

 xe vdi-list sr-uuid=XXXXX-XXXXX-XXXX

Write down the UUID of the disk that you want to resize (we will use YYYY-YYYY-YYYYY).
Also note the virtual-size parameter that is shown. VDIs cannot be shrunk, so you will need a disk size LARGER than the size displayed here.

 xe vdi-resize uuid=YYYY-YYYY-YYYYY disk-size=9887654
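
The disk-size value is in bytes by default, but xe also accepts unit suffixes; a resize to an assumed 20GiB target would look like this:

 xe vdi-resize uuid=YYYY-YYYY-YYYYY disk-size=20GiB

After reattaching and remounting the disk, the filesystem inside it still needs to be grown (e.g. resize2fs for ext3) before the guest can use the new space.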

