COMMANDDUMP – Monitor File for error – Ding if found

An elusive error was occurring that we needed to be notified of immediately. The fastest way to catch it was to run the following script at a bash command prompt, so that when the error happened the script would beep until we stopped it.

while true; do ret=$(tail -n 1 error-error-main.log | grep -i FATAL); if [ "$ret" != "" ]; then echo "$ret"; echo -en "\0007"; fi; sleep 1; done
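The same check is easier to reuse as a small function. This is only a sketch, not the exact script we ran: the function name check_last_line is made up for illustration, and it rings the bell with printf '\a' rather than echo -en.

```shell
# One polling pass: if the last line of FILE contains PATTERN
# (case-insensitive), print that line, ring the terminal bell and return 0.
# Returns non-zero when there is no match.
check_last_line() {
    file=$1
    pattern=${2:-FATAL}
    line=$(tail -n 1 "$file" | grep -i -e "$pattern") || return 1
    printf '%s\n' "$line"
    printf '\a'
}

# Usage (runs until interrupted with Ctrl-C):
#   while true; do check_last_line error-error-main.log FATAL; sleep 1; done
```

Like the one-liner, this only looks at the last line of the file, so it keeps beeping until something else is written to the log or you stop it.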

Load problems after disk replacement on an OCFS2 and DRBD system

Notes on investigating a complex issue. It was resolved, though not with a concise explanation; the notes are kept so that we can pick the investigation back up if it happens again.

Recently, we had a disk failure on one of two SAN servers that use MD, OCFS2 and DRBD to keep the two servers synchronized.

We will call the two systems A and B.

 

The disk was replaced on System A, which required a reboot for the system to recognize the new disk. We then had to --re-add the disk to the MD array, and once that was done the disk started to rebuild. The OCFS2 and DRBD layers did not seem to have any trouble resyncing quickly once the servers had rebuilt; the layers of redundancy made it fairly painless. However, the load on System B went up to 2.0+ and on System A up to 7.0+!

This slowed down System B significantly and made System A completely unusable.

I tried a number of different tools to debug this:

  • top
  • iostat -x 1
  • iotop
  • lsof
  • atop

The dynamics of how we use the redundant SANs should be taken into account here.

We mount System B on an application server via NFS, and reads and writes are done to System B. This makes it odd that System A is having such a hard time keeping up: it only has to handle the DRBD and OCFS2 communication in order to stay synced. (System B is handling reads and writes, whereas System A only has to handle writes on the DRBD layer when changes are made.) iotop shows this traffic at between 5 and 40 K/s, which seemed minimal.

Nothing points directly to what is causing the 7+ load on System A. The top two processes seem to be drbd_r_r0 and o2hb-XXXXXX, which account for minimal amounts of reads and writes.

The command to run on a disk to see what is happening is

#iotop -oa

This command shows only the processes that have actually done some disk reads or writes (-o), and it shows the totals cumulatively (-a), so you can easily see what is using the I/O on the system. From this I figured out that the majority of the writes on the system were going to the system drive.

What I found from this is that iotop does not show the activity that is occurring at the DRBD / OCFS2 level. On System B, where the NFS drive was connected, I could see nfsd writing multiple MB of data when I wrote to the NFS drive (cat /dev/zero > tmpfile), but I would see only 100K or so written to drbd on System B and nothing on System A, even though I could see the file on System A.

I looked at the CPU load on System A while running the huge write, and it increased by about 1 (from 7+ to 8+), so it was doing some work; iotop just did not show it.

So I looked to iostat to find out if it would let me see the writes to the actual devices in the MD array.

I ran

#iostat -x 5

This let me see what was being written to the devices. The disk utilization on System A and System B was similar (about 10% per drive in the MD array), and the await time on System B was a bit higher than on System A. When I ran this test, the load went up to about 7 on all servers (application server, System A and System B). Stopping the write made the load on the application server and on System B go back down.
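To avoid eyeballing the full iostat table, the per-device utilization can be filtered with awk. This is a rough sketch that assumes sysstat's iostat -x layout, where %util is the last column of each device row; the function name busy_devices and the default threshold of 10 are my own choices.

```shell
# Read `iostat -x` output on stdin and print "device %util" for every
# device row whose %util is at or above the threshold (default 10).
# Assumption: %util is the last column, as in sysstat's iostat -x.
busy_devices() {
    threshold=${1:-10}
    awk -v t="$threshold" '
        # device rows start with a name, not a number (skips the avg-cpu values)
        NF > 2 && $1 !~ /^[0-9.]+$/ && $NF ~ /^[0-9.]+$/ && $NF + 0 >= t {
            print $1, $NF
        }'
}

# Usage: three 5-second reports, showing only drives at 10% or more
#   LC_ALL=C iostat -x 5 3 | busy_devices 10
```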

While this did not give me the cause, it helped me see that disk writes on System A are trackable through iostat. Since no writes are occurring when I run iostat -x 5, I have to assume that some other kind of overhead is causing the high load. With nothing else I felt I could test, I just rebooted System A.

Lo and behold, the load dropped; writing and deleting huge files was no longer an issue. The only thing I could think of was that a large amount of traffic was being transferred back and forth to some ‘zombie’ server or something. (I had attempted to restart ocfs2 and drbd, and the system wouldn’t allow that either, which seems to indicate a problem with something being held open by a zombie process.)

In the end, this is the best description of the problem I can give. While this is not a real resolution, I am publishing it so that when this issue comes up in the future, we will be able to investigate three different possibilities in order to get closer to figuring out the true issue.

  1. Investigate the network traffic (using ntop for traffic,   tcpdump for contents,  and eth for total stats and possible errors)
  2. Disconnect / Reconnect the drbd and ocfs2 pair to stop the synchronization and watch the load balance to see if that is related to the issue.
  3. Attempt to start and stop the drbd and ocfs2 processes and debug any problems with that process. (watch the traffic or other errors related to those processes)
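For the first item, a quick way to check for interface errors without any extra tools is to read the kernel's counters. This sketch assumes the Linux /proc/net/dev layout (two header lines, then one row per interface with receive counters followed by transmit counters); the function name iface_errors is made up for illustration.

```shell
# Print error and drop counters for every network interface.
# Assumption: Linux /proc/net/dev format; an alternate file can be
# passed as the first argument.
iface_errors() {
    dev_file=${1:-/proc/net/dev}
    awk 'NR > 2 {
        sub(/:/, " ")    # "eth0:123" -> "eth0 123" so the fields line up
        print $1, "rx_errs=" $4, "rx_drop=" $5, "tx_errs=" $12, "tx_drop=" $13
    }' "$dev_file"
}

# Usage: run it twice a few seconds apart and compare the numbers
#   iface_errors; sleep 10; iface_errors
```

Counters that keep climbing between runs would point at the network link rather than the disks.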

Find out which PHP packages are installed on ubuntu / debian

As we have moved or upgraded sites from one server to another, we have sometimes needed to know which PHP5 packages were installed on one server (servera), so that we could make sure the same dependencies were met on another server (serverb).

To do this we can run a simple command line on servera

servera# echo `dpkg -l|awk '$1 ~ /ii/ && $2 ~ /php5/{print $2}'`
libapache2-mod-php5 php5 php5-cli php5-common php5-curl php5-gd php5-mcrypt php5-mysql php5-pgsql 

and then we copy the output and paste it after the apt-get install command on serverb

serverb# apt-get install libapache2-mod-php5 php5 php5-cli php5-common php5-curl php5-gd php5-mcrypt php5-mysql php5-pgsql 

Don't forget to reload Apache, as some packages do not reload it automatically

serverb# /etc/init.d/apache2 reload
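If serverb already has some PHP packages, it can help to install only what is missing. A minimal sketch, assuming the package list from each server has been saved to a file with one name per line; the function names and the file names servera.txt and serverb.txt are hypothetical.

```shell
# List installed php5 packages, one per line (same filter as above).
list_php_packages() {
    dpkg -l | awk '$1 ~ /ii/ && $2 ~ /php5/ { print $2 }'
}

# Print the packages named in the first file that are absent from the second.
# grep -v -x -F -f: drop every line that exactly matches a line of file 2.
missing_packages() {
    grep -vxFf "$2" "$1"
}

# Usage:
#   servera# list_php_packages > servera.txt
#   serverb# list_php_packages > serverb.txt
#   serverb# apt-get install $(missing_packages servera.txt serverb.txt)
```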

Utility – Bulk Convert the Unix Timestamp in log messages To a Readable Date

I have often run into the need to convert a large list of timestamps from Unix Timestamp to a readable date.

Often this comes up when I receive an error message from a server, or when I am reviewing log files which only use Unix timestamps.

So I created a simple utility: just paste in your text from the log file, and the utility will look for a timestamp at the start of each line and convert it to a date.

While this might be useful at some point as an automated process,  for now I just use it when I need it.

I am documenting the tool here with a link for myself (or any one else that may need it) so that it is simple to find.

http://matraex.com/batch-timestamp-to-date.php

A likely future upgrade to this utility is to find Unix timestamps anywhere in the text and convert them, instead of only at the start of the line.
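For times when the log is already on a server, the same idea can be sketched in shell. This is not the code behind the utility above, just an illustration of the approach: it assumes GNU date (for -d "@seconds"), treats a leading run of 9 or 10 digits as a timestamp, and prints dates in UTC. The function name convert_timestamps is made up.

```shell
# Read log lines on stdin; if a line starts with a 9 or 10 digit number,
# replace that number with a readable UTC date. Other lines pass through.
# Assumption: GNU date, which accepts -d "@<seconds since epoch>".
convert_timestamps() {
    while IFS= read -r line; do
        ts=${line%%[!0-9]*}    # leading run of digits, possibly empty
        if [ "${#ts}" -ge 9 ] && [ "${#ts}" -le 10 ]; then
            printf '%s%s\n' "$(date -u -d "@$ts" '+%Y-%m-%d %H:%M:%S')" "${line#"$ts"}"
        else
            printf '%s\n' "$line"
        fi
    done
}

# Usage:
#   convert_timestamps < error.log | less
```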

 

 

One Line WordPress Install

To install the latest version of WordPress to your current working directory in Linux you can run this command

#wget -O - https://wordpress.org/latest.tar.gz |tar --strip-components=1 -xvzf - wordpress/

Just make sure you are in your install directory when you run it

#cd /var/www/html
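If you would rather keep the download and reuse it, the unpack step can be split out. A small sketch, assuming GNU tar; the function name unpack_wordpress_into is made up, and the target directory is passed explicitly so there is no need to cd first.

```shell
# Unpack a WordPress tarball into DEST without the leading "wordpress/"
# directory (--strip-components=1 drops the first path component).
unpack_wordpress_into() {
    tarball=$1
    dest=${2:-.}
    tar --strip-components=1 -xzf "$tarball" -C "$dest"
}

# Usage:
#   wget -O /tmp/latest.tar.gz https://wordpress.org/latest.tar.gz
#   unpack_wordpress_into /tmp/latest.tar.gz /var/www/html
```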


my btmp file is huge on linux, what do I do

The /var/log/btmp file tracks all of the failed login attempts on your machine. If it is huge, it probably means someone is trying to brute force their way into your computer.

The file is binary, so you cannot just view it; you have to use

#lastb|less

Most likely you will find that someone has been repeatedly attempting to hack your computer. Consider setting up a firewall which limits the IP addresses that are allowed to log in to your SSH port.
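To see who the worst offenders are, the lastb output can be summarized per source address. This is a sketch that assumes the usual lastb column layout (user, terminal, host, then the date), so the host is the third field; the function name top_offenders is made up.

```shell
# Read `lastb` output on stdin and print "count host" for the hosts with
# the most failed logins, highest first (default: top 10).
# Assumption: the remote host is the third whitespace-separated column.
top_offenders() {
    awk 'NF >= 3 && $1 != "btmp" { print $3 }' |
        sort | uniq -c | sort -rn |
        awk '{ print $1, $2 }' |
        head -n "${1:-10}"
}

# Usage:
#   lastb | top_offenders 20
```

The addresses at the top of that list are good candidates for a firewall rule.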

You could also install DenyHosts

#apt-get install denyhosts

One issue that can occur is that if you are getting attacked, the log gets too large.

Most likely your logrotate.conf file has a /var/log/btmp entry in it. Update that entry to rotate and compress the log file more frequently (see the logrotate documentation).

The Linux find command is awesomely powerful!

At least I think it is awesome. Here are a couple of useful commands which highlight some of its more powerful features. (These are just ones I used recently; as soon as you start chaining sed, awk, sort and uniq, the commands get even more powerful.)

Change the ownership of all files which do not have the correct ownership (useful to me when doing a server migration where the postfix user was uid 102 and changed to uid 105).
This command also lists the details of each file (-ls) before it runs the chown command on it.

find . -not -uid 105 -ls -exec chown postfix {} \;

Get a list of all of the files that have been modified in the last 20 minutes

find . -mmin -20

Find all log files older than 60 days and their sizes. I use awk to keep a running total of the sizes in the first column.

find /data/webs/ -path '*/logs/*' -name '*log' -mtime +60 -exec du {} \; |awk '{t+=$1; print t" "$0}'

Often I just turn around and delete these files if I do not need them. The command above helps me know how much space I would be recovering, and whether there are any HUGE file size offenders.

find /data/webs/ -path '*/logs/*' -name '*log' -mtime +60 -delete
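When I only want the grand total rather than a running sum, find can print the sizes itself. A sketch assuming GNU find (for -printf); the function name old_logs_total_bytes is made up, and unlike du it reports file sizes in bytes rather than disk blocks.

```shell
# Print the combined size in bytes of log files under DIR that are
# older than DAYS days (default 60). Same filters as the commands above.
# Assumption: GNU find, which supports -printf '%s\n' (size in bytes).
old_logs_total_bytes() {
    dir=$1
    days=${2:-60}
    find "$dir" -path '*/logs/*' -name '*log' -type f -mtime +"$days" -printf '%s\n' |
        awk '{ total += $1 } END { print total + 0 }'
}

# Usage:
#   old_logs_total_bytes /data/webs/ 60
```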

 

How to skip certain directories when running updatedb

To skip certain directories when running updatedb, edit

/etc/updatedb.conf

and add the directories you want to skip to the PRUNEPATHS configuration variable (PRUNEFS is for filesystem types, not paths)

PRUNEPATHS="/tmp /my/giant/directory"

That is it. Run updatedb again and it will skip the listed directories; in my case updatedb ran much faster.
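If you do this on several servers, the edit can be scripted. A rough sketch, assuming the mlocate-style /etc/updatedb.conf where directory paths live in a double-quoted PRUNEPATHS line (PRUNEFS holds filesystem types), and GNU sed for -i. The function name add_prunepath is made up, and the directory must not contain | or & characters.

```shell
# Append DIR to the PRUNEPATHS="..." line of an updatedb.conf file.
# Assumptions: GNU sed (-i edits in place), PRUNEPATHS is on one line and
# double-quoted, DIR contains no "|" or "&".
add_prunepath() {
    dir=$1
    conf=${2:-/etc/updatedb.conf}
    sed -i "s|^PRUNEPATHS=\"\(.*\)\"|PRUNEPATHS=\"\1 $dir\"|" "$conf"
}

# Usage (as root), then rebuild the database:
#   add_prunepath /my/giant/directory && updatedb
```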

 

Quick script to install WordPress from the Linux command line

I find that it is much faster to download and install WordPress from the command line in Linux than to upload it over FTP.

By running the following script in a new directory, you will:

  • download the latest version of WordPress
  • untar / unzip it
  • move the files into the current directory
  • cleanup the unused empty directory
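  • and update the ownership of all of the files to match the directory you are already in.

# download and unpack the latest WordPress release
wget https://wordpress.org/latest.tar.gz
tar -xvzf latest.tar.gz
# move the files up into the current directory, then clean up
mv wordpress/* .
rm -rf wordpress/ latest.tar.gz
# give everything the same owner and group as the current directory
chown -R "$(stat --printf '%u:%g' .)" ./*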

AWK script to show number of apache hits per minute

Documenting this script to save me the time of rewriting it again and again on different servers.

 tail -n100000 www.access.log|awk '/09\/Apr/{print $4}'|awk -F'[' '{print $2}'|awk -F':' '{print $1":"$2":"$3}' |sort -n|uniq -c

This shows output like this

 21 09/Apr/2015:12:48
 21 09/Apr/2015:12:49
 21 09/Apr/2015:12:50
 21 09/Apr/2015:12:51
 21 09/Apr/2015:12:52
 711 09/Apr/2015:12:53
 1371 09/Apr/2015:12:54
 1903 09/Apr/2015:12:55
 2082 09/Apr/2015:12:56
 2256 09/Apr/2015:12:57
 2123 09/Apr/2015:12:58
 1951 09/Apr/2015:12:59
 1589 09/Apr/2015:13:00
 1427 09/Apr/2015:13:01
 811 09/Apr/2015:13:02
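The three awk calls can also be collapsed into one. This is a sketch of the same idea rather than a drop-in replacement: it assumes the common/combined Apache log format, where field 4 looks like [09/Apr/2015:12:48:01, and the function name hits_per_minute is made up.

```shell
# Read an Apache access log on stdin and print "hits dd/Mon/yyyy:HH:MM"
# for every minute of the given day (e.g. 09/Apr).
# Assumption: field 4 is the timestamp, e.g. [09/Apr/2015:12:48:01
hits_per_minute() {
    day=$1
    awk -v day="$day" '
        index($4, "[" day "/") == 1 {
            minute = substr($4, 2, 17)    # dd/Mon/yyyy:HH:MM
            count[minute]++
        }
        END { for (m in count) print count[m], m }
    ' | sort -k2
}

# Usage:
#   tail -n100000 www.access.log | hits_per_minute 09/Apr
```

Passing the day as an argument saves editing the pattern every time I reuse it.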
