Using mpt-status for RAID Monitoring in a PowerEdge C6100 with PERC 6

This post outlines the steps needed to get a command-line report on the state of the RAID volumes in a PowerEdge C6100 with a PERC 6/i RAID controller.

Verify your controller type:

cat /proc/scsi/mptsas/0

ioc0: LSISAS1068E B3, FwRev=011b0000h, Ports=1, MaxQ=277
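If you want to check the controller type from a script rather than by eye, the line above can be split into fields. This is only a sketch in Python 3: the function name parse_mptsas_line is hypothetical, and the field layout is assumed from the single sample line shown above, so other firmware or driver versions may print something different.

```python
import re

# Pattern assumed from the one sample line in this post:
# "ioc0: LSISAS1068E B3, FwRev=011b0000h, Ports=1, MaxQ=277"
_IOC_RE = re.compile(r"^(?P<ioc>ioc\d+):\s*(?P<chip>\S+)\s+(?P<rev>\S+),\s*(?P<rest>.*)$")


def parse_mptsas_line(line):
    """Return a dict of fields from a /proc/scsi/mptsas/N line, or None if it does not match."""
    match = _IOC_RE.match(line.strip())
    if match is None:
        return None
    info = {
        "ioc": match.group("ioc"),
        "chip": match.group("chip"),
        "revision": match.group("rev"),
    }
    # The remaining fields are comma-separated key=value pairs.
    for pair in match.group("rest").split(","):
        key, _, value = pair.strip().partition("=")
        if key:
            info[key] = value
    return info
```

A caller could then test info["chip"] against the chip name it expects before going any further.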

Download the following packages:

daemonize-1.5.6-1.el5.i386.rpm mpt-status-1.2.0-3.el5.centos.i386.rpm lsscsi-0.17-3.el5.i386.rpm

http://dl.nux.ro/utils/mpt-status/mpt-status-1.2.0-3.el5.centos.i386.rpm

http://dl.nux.ro/utils/mpt-status/daemonize-1.5.6-1.el5.i386.rpm

http://mirror.centos.org/centos/5/os/i386/CentOS/lsscsi-0.17-3.el5.i386.rpm

Install mpt-status:

rpm -ivh mpt-status-1.2.0-3.el5.centos.i386.rpm daemonize-1.5.6-1.el5.i386.rpm lsscsi-0.17-3.el5.i386.rpm

modprobe mptctl

echo mptctl >> /etc/modules

Verify your modules:

lsmod |grep mpt

mptctl 90739 0

mptsas 57560 4

mptscsih 39876 1 mptsas

mptbase 91081 3 mptctl,mptsas,mptscsih

scsi_transport_sas 27681 1 mptsas

scsi_mod 145658 7 mptctl,sg,libata,mptsas,mptscsih,scsi_transport_sas,sd_mod
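The same check can be scripted. The sketch below (Python 3) takes the text printed by lsmod and reports which of the mpt modules are missing. The function names and the REQUIRED tuple are illustrative choices, not part of mpt-status; the module names are the ones listed in the output above.

```python
# Modules seen in the lsmod output above; adjust if your system differs.
REQUIRED = ("mptctl", "mptsas", "mptscsih", "mptbase")


def loaded_modules(lsmod_text):
    """Return the set of module names found in lsmod-style output."""
    names = set()
    for line in lsmod_text.splitlines():
        fields = line.split()
        # Skip blank lines and the "Module Size Used by" header.
        if fields and fields[0] != "Module":
            names.add(fields[0])
    return names


def missing_modules(lsmod_text, required=REQUIRED):
    """Return the required modules that do not appear in the lsmod output."""
    present = loaded_modules(lsmod_text)
    return [name for name in required if name not in present]
```

To use it, capture the output of lsmod (for example with subprocess.run(["lsmod"], capture_output=True, text=True).stdout) and pass it to missing_modules; an empty list means everything needed is loaded.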

Run:

mpt-status

or, for the terse output used by the script below:

mpt-status -n -s

You can also use:

lsscsi -l

This little script:

echo `mpt-status -n -s|awk '/OPTIMAL/ {print $1, "OK"}; /ONLINE/ {print $1, "OK"}; /DEGRADED/ {print $1, "FAILURE"}; /scsi/ {print $2}; /MISSING/ {print $1, "FAILURE"} '`

reports:

vol_id:0 OK phys_id:1 OK phys_id:0 OK 100% 100%

On a rebuild, it reports:

vol_id:0 FAILURE phys_id:2 OK phys_id:3 OK 75% 75%
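To show how a monitoring check might interpret that one-line summary, here is a rough Python 3 sketch. The function names are hypothetical, and mapping any FAILURE to CRITICAL is an assumption for illustration, not what nagios-stat necessarily does. The numeric codes are the standard Nagios plugin return codes (0 OK, 2 CRITICAL, 3 UNKNOWN).

```python
def parse_check_raid(line):
    """Split the summary into (device, state) pairs and a list of percentages."""
    tokens = line.split()
    devices = []
    percents = []
    i = 0
    while i < len(tokens):
        token = tokens[i]
        if token.endswith("%"):
            # Trailing sync percentages such as "100%" or "75%".
            percents.append(int(token.rstrip("%")))
            i += 1
        elif i + 1 < len(tokens) and tokens[i + 1] in ("OK", "FAILURE"):
            devices.append((token, tokens[i + 1]))
            i += 2
        else:
            i += 1  # ignore anything unrecognised
    return devices, percents


def nagios_state(line):
    """Map the summary to a (return code, label) pair."""
    devices, _percents = parse_check_raid(line)
    if not devices:
        return 3, "UNKNOWN"
    if any(state == "FAILURE" for _name, state in devices):
        return 2, "CRITICAL"
    return 0, "OK"
```

With the two sample lines above, the healthy array maps to OK and the rebuilding array maps to CRITICAL, because the volume reports FAILURE until the rebuild completes.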

Copy that script into a file called “check_raid” and make it executable (e.g. chmod 755).

Edit nagios-statd on parcel1. At line 20, replace “sudo /customcommands/check_raid.pl -b -w1 -c1” with “/customcommands/check_raid”: the new file name, without the switches and without “sudo”.

So, from this:

commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","sudo /customcommands/check_raid.pl -b -w1 -c1")

To this:

commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","/customcommands/check_raid")

Port 1040 will need to be opened in XenServer. Edit /etc/sysconfig/iptables and insert this line:

-A RH-Firewall-1-INPUT -p tcp -m tcp --dport 1040 -j ACCEPT

Restart the firewall:

service iptables restart

Output:

Flushing firewall rules: [ OK ]

Setting chains to policy ACCEPT: filter [ OK ]

Unloading iptables modules: [ OK ]

Applying iptables firewall rules: [ OK ]

Loading additional iptables modules: ip_conntrack_netbios_n[FAILED]

NOTE: The “FAILED” error above doesn’t seem to be a problem.

Verify that port 1040 is open:

service iptables status

Output:

Table: filter

Chain INPUT (policy ACCEPT)

num target prot opt source destination

1 ACCEPT 47 -- 0.0.0.0/0 0.0.0.0/0

2 RH-Firewall-1-INPUT all -- 0.0.0.0/0 0.0.0.0/0

Chain FORWARD (policy ACCEPT)

num target prot opt source destination

1 RH-Firewall-1-INPUT all -- 0.0.0.0/0 0.0.0.0/0

Chain OUTPUT (policy ACCEPT)

num target prot opt source destination

Chain RH-Firewall-1-INPUT (2 references)

num target prot opt source destination

1 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0

2 ACCEPT icmp -- 0.0.0.0/0 0.0.0.0/0 icmp type 255

3 ACCEPT esp -- 0.0.0.0/0 0.0.0.0/0

4 ACCEPT ah -- 0.0.0.0/0 0.0.0.0/0

5 ACCEPT udp -- 0.0.0.0/0 224.0.0.251 udp dpt:5353

6 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 udp dpt:631

7 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:631

8 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 tcp dpt:1040

9 ACCEPT all -- 0.0.0.0/0 0.0.0.0/0 state RELATED,ESTABLISHED

10 ACCEPT udp -- 0.0.0.0/0 0.0.0.0/0 state NEW udp dpt:694

11 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:22

12 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:80

13 ACCEPT tcp -- 0.0.0.0/0 0.0.0.0/0 state NEW tcp dpt:443

14 REJECT all -- 0.0.0.0/0 0.0.0.0/0 reject-with icmp-host-prohibited

Running “nagios-statd” opens port 1040 on Parcel1 and listens for commands initiated by nagios-stat on the Nagios server.
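A firewall rule that accepts port 1040 is only half of the picture; a connection succeeds only if nagios-statd is actually listening. A quick way to check both from another machine is a plain TCP connect. This is a minimal Python 3 sketch, and port_is_open is a hypothetical helper name.

```python
import socket


def port_is_open(host, port=1040, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        # create_connection resolves the host and raises OSError on
        # refusal, timeout, or an unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Call it with the node's address, for example port_is_open("<Node IP Address>"). A False result does not say whether the firewall or the daemon is at fault, so check “service iptables status” and the process list if it fails.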

On the Nagios server, a file called “remote.orig.cfg” defines commands that use “nagios-stat”. NOTE: These are from a working server and haven’t been modified to work with mpt-status, so some changes may be needed. This is just an example of the interaction between the Nagios server and client.

Example:

define command{

command_name check_remote_raid

command_line $USER1$/nagios-stat -w $ARG1$ -c $ARG2$ -p $ARG3$ raid $HOSTADDRESS$

}

The command defined above is used in the “services.cfg” file.

Example:

define service{

use matraex-template

host_name mtx-lilac

service_description Lilac /data Raid

check_command check_remote_raid!1!1!1040

}
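The “!” characters in check_command separate the command name from its arguments, which Nagios substitutes for $ARG1$, $ARG2$ and $ARG3$ in the command_line. The Python 3 sketch below imitates that substitution so you can see what ends up being executed. It is an illustration only, not Nagios code, and the $USER1$ path and host address used in the example call are made-up placeholders.

```python
def expand_check_command(check_command, command_line, user1, hostaddress):
    """Imitate how $ARGn$, $USER1$ and $HOSTADDRESS$ are filled in."""
    # "check_remote_raid!1!1!1040" -> name plus a list of arguments.
    name, *args = check_command.split("!")
    expanded = command_line.replace("$USER1$", user1)
    expanded = expanded.replace("$HOSTADDRESS$", hostaddress)
    for number, value in enumerate(args, start=1):
        expanded = expanded.replace("$ARG%d$" % number, value)
    return name, expanded


# Example values: the plugin directory and address are placeholders.
name, line = expand_check_command(
    "check_remote_raid!1!1!1040",
    "$USER1$/nagios-stat -w $ARG1$ -c $ARG2$ -p $ARG3$ raid $HOSTADDRESS$",
    "/usr/local/nagios/libexec",
    "192.0.2.10",
)
print(line)
```

So the service above asks nagios-stat to run the “raid” function on port 1040 with warning and critical thresholds of 1.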

The three files needed on the C6100 node are:

/customcommands/check_raid (contents below) -rwxr-xr-x

/customcommands/nagios-statd (contents below) -rwxr-xr-x

/etc/init.d/nagios-statd (contents below) -rwxr--r--

Creating the soft links:

ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/K01nagios-statd

ln -s /etc/init.d/nagios-statd /etc/rc.d/rc3.d/S99nagios-statd

The -s option creates a symbolic (soft) link; -f, if used, forces an overwrite of an existing link.

/rc3.d/ designates runlevel 3

So when you do this:

ls -lt /customcommands/nagios-statd /etc/init.d/nagios-statd /customcommands/check_raid /etc/rc.d/rc3.d/*nagios-statd

This is what you should see:

lrwxrwxrwx 1 root root 22 Mar 6 08:08 /etc/rc.d/rc3.d/K01nagios-statd -> ../init.d/nagios-statd

-rwxr-xr-x 1 root root 365 Mar 6 07:59 /customcommands/check_raid

lrwxrwxrwx 1 root root 22 Mar 6 07:52 /etc/rc.d/rc3.d/S99nagios-statd -> ../init.d/nagios-statd

-rwxr-xr-x 1 root root 649 Mar 6 07:51 /etc/init.d/nagios-statd

-rwxr-xr-x 1 root root 9468 Mar 5 12:05 /customcommands/nagios-statd

Script Files:

NOTE: Here’s a little fix that helped me out. I had originally pasted these scripts into a DOS/Windows editor (wordpad) and it added DOS-type returns to the file, resulting in an error:

-bash: ./nagios-statd: /bin/sh^M: bad interpreter: No such file or directory

If you encounter this, do the following:

Open the file in vi.

Press “:” to enter command mode.

Enter “set fileformat=unix”.

Then enter “:wq” to save and quit.
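If several files picked up DOS line endings, doing this by hand in vi gets tedious. The following Python 3 sketch performs the same conversion; dos_to_unix and convert_file are hypothetical names, and the approach simply replaces every CR LF pair with LF.

```python
def dos_to_unix(data):
    """Replace DOS line endings (CR LF) with Unix line endings (LF) in bytes."""
    return data.replace(b"\r\n", b"\n")


def convert_file(path):
    """Rewrite the file in place; return True if anything changed."""
    with open(path, "rb") as handle:
        original = handle.read()
    converted = dos_to_unix(original)
    if converted == original:
        return False
    with open(path, "wb") as handle:
        handle.write(converted)
    return True
```

Running convert_file on each of the three script files should leave the “#!/bin/sh” line without the trailing ^M that caused the “bad interpreter” error.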

/customcommands/check_raid:

#!/bin/bash

EXECFILE=/usr/sbin/mpt-status

if [ ! -e "$EXECFILE" ] ; then
    echo
    echo "Error $EXECFILE is not installed, please install before running"
    echo
    echo "Usage $0";
    echo
    exit 10
fi

echo `$EXECFILE -n -s|awk '/OPTIMAL/ {print $1, "OK"}; /ONLINE/ {print $1, "OK"}; /DEGRADED/ {print $1, "FAILURE"}; /scsi/ {print $2}; /MISSING/ {print $1, "FAILURE"} '`

/customcommands/nagios-statd:

#!/usr/bin/python

import getopt, os, sys, signal, socket, SocketServer

class Functions:
    "Contains a set of methods for gathering data from the server."

    def __init__(self):
        self.nagios_statd_version = 3.09
        # As of right now, the commands are for df, who, proc, uptime, and swap.
        commandlist = {}
        commandlist['AIX'] = ("df -Ik","who | wc -l","ps ax","uptime","lsps -sl | grep -v Paging | awk '{print $2}' | cut -f1 -d%")
        commandlist['BSD/OS'] = ("df","who | wc -l","ps -ax","uptime",None)
        commandlist['CYGWIN_NT-5.0'] = ("df -P",None,"ps -s -W | awk '{printf(\"%6s%6s%3s%6s%s\\n\",$1,$2,\" S\",\" 0:00\",substr($0,22))}'",None,None)
        commandlist['CYGWIN_NT-5.1'] = commandlist['CYGWIN_NT-5.0']
        commandlist['FreeBSD'] = ("df -k","who | wc -l","ps ax","uptime","swapinfo | awk '$1!~/^Device/{print $5}'")
        commandlist['HP-UX'] = ("bdf -l","who -q | grep \"#\"","ps -el","uptime",None)
        commandlist['IRIX'] = ("df -kP","who -q | grep \"#\"","ps -e -o \"pid tty state time comm\"","/usr/bsd/uptime",None)
        commandlist['IRIX64'] = commandlist['IRIX']
        commandlist['Linux'] = ("df -P","who -q | grep \"#\"","ps ax","uptime","free | awk '$1~/^Swap:/{print ($3/$2)*100}'","/customcommands/check_raid")
        commandlist['NetBSD'] = ("df -k","who | wc -l","ps ax","uptime","swapctl -l | awk '$1!~/^Device/{print $5}'")
        commandlist['NEXTSTEP'] = ("df","who | /usr/ucb/wc -l","ps -ax","uptime",None)
        commandlist['OpenBSD'] = ("df -k","who | wc -l","ps -ax","uptime","swapctl -l | awk '$1!~/^Device/{print $5}'")
        commandlist['OSF1'] = ("df -P","who -q | grep \"#\"","ps ax","uptime",None)
        commandlist['SCO-SV'] = ("df -Bk","who -q | grep \"#\"","ps -el -o \"pid tty s time args\"","uptime",None)
        commandlist['SunOS'] = ("df -k","who -q | grep \"#\"","ps -e -o \"pid tty s time comm\"","uptime","swap -s | tr -d -s -c [:digit:][:space:] | nawk '{print ($3/($3+$4))*100}'")
        commandlist['UNIXWARE2'] = ("/usr/ucb/df","who -q | grep \"#\"","ps -el | awk '{printf(\"%6d%9s%2s%5s %s\\n\",$5,substr($0, 61, 8),$2,substr($0,69,5),substr($0,75))}'","echo `uptime`, load average: 0.00, `sar | awk '{oldidle=idle;idle=$5} END {print 100-oldidle}'`,0.00",None)
        # Now to make commandlist with the correct one for your OS.
        try:
            self.commandlist = commandlist[os.uname()[0]]
        except KeyError:
            print "Your platform isn't supported by nagios-statd - exiting."
            sys.exit(3)

    # Below are the functions that the client can call.
    def disk(self):
        return self.__run(0)

    def proc(self):
        return self.__run(2)

    def swap(self):
        return self.__run(4)

    def uptime(self):
        return self.__run(3)

    def user(self):
        return self.__run(1)

    def raid(self):
        return self.__run(5)

    def version(self):
        i = "nagios-statd " + str(self.nagios_statd_version)
        return i

    def __run(self,cmdnum):
        # Unmask SIGCHLD so popen can detect the return status (temporarily)
        signal.signal(signal.SIGCHLD, signal.SIG_DFL)
        outputfh = os.popen(self.commandlist[cmdnum])
        output = outputfh.read()
        returnvalue = outputfh.close()
        signal.signal(signal.SIGCHLD, signal.SIG_IGN)
        if (returnvalue):
            return "ERROR %s " % output
        else:
            return output

class NagiosStatd(SocketServer.StreamRequestHandler):
    "Handles connection initialization and data transfer (as daemon)"

    def handle(self):
        # Check to see if user is allowed
        if self.__notallowedhost():
            self.wfile.write(self.error)
            return 1
        if not hasattr(self,"generichandler"):
            self.generichandler = GenericHandler(self.rfile,self.wfile)
        self.generichandler.run()

    def __notallowedhost(self):
        "Compares list of allowed users to client's IP address."
        if hasattr(self.server,"allowedhosts") == 0:
            return 0
        for i in self.server.allowedhosts:
            if i == self.client_address[0]: # Address is in list
                return 0
            try: # Do an IP lookup of host in blocked list
                i_ip = socket.gethostbyname(i)
            except:
                self.error = "ERROR DNS lookup of blocked host \"%s\" failed. Denying by default." % i
                return 1
            if i_ip != i: # If address in list isn't an IP
                if socket.getfqdn(i) == socket.getfqdn(self.client_address[0]):
                    return 0
        self.error = "ERROR Client is not among hosts allowed to connect."
        return 1

class GenericHandler:

    def __init__(self,rfile=sys.stdin,wfile=sys.stdout):
        # Create functions object
        self.functions = Functions()
        self.rfile = rfile
        self.wfile = wfile

    def run(self):
        # Get the request from the client
        line = self.rfile.readline()
        line = line.strip()
        # Check for appropriate requests from client
        if len(line) == 0:
            self.wfile.write("ERROR No function requested from client.")
            return 1
        # Call the appropriate function
        try:
            output = getattr(self.functions,line)()
        except AttributeError:
            error = "ERROR Function \"" + line + "\" does not exist."
            self.wfile.write(error)
            return 1
        except TypeError:
            error = "ERROR Function \"" + line + "\" not supported on this platform."
            self.wfile.write(error)
            return 1
        # Send output
        if output.isspace():
            error = "ERROR Function \"" + line + "\" returned no information."
            self.wfile.write(error)
            return 1
        elif output == "ERROR":
            error = "ERROR Function \"" + line + "\" exited abnormally."
            self.wfile.write(error)
        else:
            for line in output:
                self.wfile.write(line)

class ReUsingServer (SocketServer.ForkingTCPServer):
    allow_reuse_address = True

class Initialization:
    "Methods for interacting with user - initial code entry point."

    def __init__(self):
        self.port = 1040
        self.ip = ''
        # Run this through Functions initially, to make sure the platform is supported.
        i = Functions()
        del(i)

    def getoptions(self):
        "Parses command line"
        try:
            opts, args = getopt.getopt(sys.argv[1:], "a:b:ip:P:Vh", ["allowedhosts=","bindto=","inetd","port=","pid=","version","help"])
        except getopt.GetoptError, (msg, opt):
            print sys.argv[0] + ": " + msg
            print "Try '" + sys.argv[0] + " --help' for more information."
            sys.exit(3)
        for option,value in opts:
            if option in ("-a","--allowedhosts"):
                value = value.replace(" ","")
                self.allowedhosts = value.split(",")
            elif option in ("-b","--bindto"):
                self.ip = value
            elif option in ("-i","--inetd"):
                self.runfrominetd = 1
            elif option in ("-p","--port"):
                self.port = int(value)
            elif option in ("-P","--pid"):
                self.pidfile = value
            elif option in ("-V","--version"):
                self.version()
                sys.exit(3)
            elif option in ("-h","--help"):
                self.usage()

    def main(self):
        # Retrieve command line options
        self.getoptions()
        # Just splat to stdout if we're running under inetd
        if hasattr(self,"runfrominetd"):
            server = GenericHandler()
            server.run()
            sys.exit(0)
        # Check to see if the port is available
        try:
            s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
            s.bind((self.ip, self.port))
            s.close()
            del(s)
        except socket.error, (errno, msg):
            print "Unable to bind to port %s: %s - exiting." % (self.port, msg)
            sys.exit(2)
        # Detach from terminal
        if os.fork() == 0:
            # Make this the controlling process
            os.setsid()
            # Be polite and chdir to /
            os.chdir('/')
            # Try to close all open filehandles
            for i in range(0,256):
                try: os.close(i)
                except: pass
            # Redirect the offending filehandles
            sys.stdin = open('/dev/null','r')
            sys.stdout = open('/dev/null','w')
            sys.stderr = open('/dev/null','w')
            # Set the path
            os.environ["PATH"] = "/bin:/usr/bin:/usr/local/bin:/usr/sbin"
            # Reap children automatically
            signal.signal(signal.SIGCHLD, signal.SIG_IGN)
            # Save pid if user requested it
            if hasattr(self,"pidfile"):
                self.savepid(self.pidfile)
            # Create a forking TCP/IP server and start processing
            server = ReUsingServer((self.ip,self.port),NagiosStatd)
            if hasattr(self,"allowedhosts"):
                server.allowedhosts = self.allowedhosts
            server.serve_forever()
        # Get rid of the parent
        else:
            sys.exit(0)

    def savepid(self,file):
        try:
            fh = open(file,"w")
            fh.write(str(os.getpid()))
            fh.close()
        except:
            print "Unable to save PID file - exiting."
            sys.exit(2)

    def usage(self):
        print "Usage: " + sys.argv[0] + " [OPTION]"
        print "nagios-statd daemon - remote UNIX system monitoring tool for Nagios.\n"
        print "-a, --allowedhosts=HOSTS Comma delimited list of IPs/hosts allowed to connect."
        print "-b, --bindto=IP IP address for the daemon to bind to."
        print "-i, --inetd Run from inetd."
        print "-p, --port=PORT Port to listen on."
        print "-P, --pid=FILE Save pid to FILE."
        print "-V, --version Output version information and exit."
        print " -h, --help Print this help and exit."
        sys.exit(3)

    def version(self):
        i = Functions()
        print "nagios-statd %.2f" % i.nagios_statd_version
        print "os.uname()[0] = %s " % os.uname()[0]
        print "Written by Nick Reinking\n"
        print "Copyright (C) 2002 Nick Reinking"
        print "This is free software. There is NO warranty; not even for MERCHANTABILITY or"
        print "FITNESS FOR A PARTICULAR PURPOSE."
        print "\nNagios is a trademark of Ethan Galstad."

if __name__ == "__main__":
    # Check to see if running Python 2.x+ / needed because getfqdn() is Python 2.0+ only
    if (int(sys.version[0]) < 2):
        print "nagios-statd requires Python version 2.0 or greater."
        sys.exit(3)
    i = Initialization()
    i.main()

/etc/init.d/nagios-statd:

#!/bin/sh
#
# This file should have uid root, gid sys and chmod 744
#
if [ ! -d /usr/bin ]
then # /usr not mounted
    exit
fi

killproc() { # kill the named process(es)
    pid=`/bin/ps -e |
         /bin/grep -w $1 |
         /bin/sed -e 's/^ *//' -e 's/ .*//'`
    [ "$pid" != "" ] && kill $pid
}

# Start/stop processes required for netsaint_statd server
case "$1" in
'start')
    /customcommands/nagios-statd -a <IP of Allowed Nagios Server>,<IP of Test Workstation> -p 1040
    ;;
'stop')
    killproc nagios-statd
    ;;
*)
    echo "Usage: /etc/init.d/nagios-statd { start | stop }"
    ;;
esac

Testing:

As you can see in the script file above, I’ve added the IP Address of a test workstation. This will allow me to simply telnet to a node in the C6100 and execute one of the commands defined in this section of the /customcommands/nagios-statd script:

    # Below are the functions that the client can call.
    def disk(self):
        return self.__run(0)

    def proc(self):
        return self.__run(2)

    def swap(self):
        return self.__run(4)

    def uptime(self):
        return self.__run(3)

    def user(self):
        return self.__run(1)

    def raid(self):
        return self.__run(5)

At your workstation, run: telnet <Node IP Address> 1040

When connected, the screen will be blank.

Type “raid”. The screen won’t echo this.

When you hit Enter, you should see:

vol_id:0 OK phys_id:2 OK phys_id:3 OK 100% 100%
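The telnet test can also be done from a short script, which is handy when checking several nodes. This Python 3 sketch sends the function name followed by a newline and reads until the daemon closes the connection, which is how the handler in the script above behaves. The name query_statd is hypothetical, and the workstation's IP still has to be in the daemon's -a list.

```python
import socket


def query_statd(host, function="raid", port=1040, timeout=5.0):
    """Send one function name to nagios-statd and return its reply as text."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        # The daemon reads a single line, so terminate the request with "\n".
        sock.sendall(function.encode("ascii") + b"\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # the daemon closes the connection when done
                break
            chunks.append(data)
    return b"".join(chunks).decode("ascii", "replace").strip()
```

Calling query_statd("<Node IP Address>") should return the same summary line as the telnet session; other function names such as “disk” or “uptime” can be passed as the second argument.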

Now you’re ready to move on to the Nagios configuration.

Matt Long

03/06/2015