Hacking a corrupt VHD on Xen in order to access InnoDB MySQL information

A client ran into a corrupted .vhd file for the data drive of a Xen server in a pool. We helped them restore from a backup; however, some items had not been backed up properly, and our task was to see if we could somehow recover the data from the corrupted drive.

First, we had to find the raw file for the drive. To do this we looked at the Local Storage -> General tab in XenCenter to find the UUID of the failing disk.

When we tried to attach the failing disk we got this error:

Attaching virtual disk 'xxxxxx' to VM 'xxxx'
The attempt to load the VDI failed

So we know that the Xen servers / pool reject loading the corrupted VHD, and I came up with a way to try to access the data.

After much research I came across a tool published by twindb.com called 'undrop tool for innodb'. The idea is that even after you drop or delete InnoDB files on your system, there are still markers on disk which allow code to parse what 'used' to be on the system. They claimed some level of this worked for corrupted file systems.

The documentation was poor and it took a long time to figure out; however, they claimed to have 24-hour support, so I thought I would call and just pay them to sort out the issue. They took a while and didn't call back before I had sorted it out myself. All of the documentation they did have linked to a github account, but the link was dead. I searched and found a couple of other people out there who had forked it before twindb took it down. I suspect they run more of a service business now, helping people resolve the issue, and don't want to support the code. Since this code worked for our needs, I have forked it so that we can make it permanently available: https://github.com/matraexinc/undrop-for-innodb

First step was for me to copy the .vhd to a working directory

# cp -a 3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd /tmp/restore_vhd/orig.vhd
# cd /tmp/restore_vhd/
# git clone https://github.com/matraexinc/undrop-for-innodb
# cd undrop-for-innodb
# apt-get install bison flex
# apt-get install libmysqld-dev  # this was not mentioned anywhere, however an important file is quietly not compiled without it
# make  # build the stream_parser / c_parser / sys_parser tools
# mv * ../.  # move all of the compiled files into your working directory
# cd ../
# ./stream_parser -f orig.vhd  # here is the magic – their code goes through and finds all of the ibdata1 pages and markers and creates data you can start to work through
# mv pages-orig.vhd pages-ibdata1  # the program created an organized set of pages for you, and the next tools need to find them at pages-ibdata1
# ./recover_dictionary.sh  # this runs mysql as root and creates a database named 'test' which has a listing of all of the databases, tables and indexes it found

This was where I had to come up with a custom solution in order to process the large volume of customer databases. I used a PHP script to run the following commands for the many databases that needed to be restored. For each database and table you must run a command that corresponds to an 'index' file the previous commands created, so you must loop through each of them.

 

select c.name as tablename, a.id as indexid
from SYS_INDEXES a
join SYS_TABLES c on (a.TABLE_ID = c.ID);
 

This returns a list of the tables and their associated indexes. Using it, you must generate commands which:

  1. generate a CREATE statement for the table you are restoring, and
  2. generate a LOAD DATA INFILE statement and its associated data file.

# sys_parser -h localhost -u username -p password -d test tablenamefromsql

This generates the CREATE statement for the table; save it to a createtable.sql file and execute it against your database to restore the table structure.

# c_parser -5 -o data.load -f pages-ibdata1/FIL_PAGE_INDEX/00000017493.page -t createtable.sql

This outputs a "LOAD DATA INFILE 'data.load' ..." statement; pipe it to mysql to restore your data.
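The loop our PHP wrapper implemented can be sketched in shell. The table/index pairs, file names and credentials below are hypothetical stand-ins for the output of the SYS_INDEXES/SYS_TABLES query:

```shell
# Hypothetical sample mapping; in practice, export "tablename indexid" pairs
# from the 'test' dictionary database built by recover_dictionary.sh.
printf 'wp_posts\t17493\nwp_users\t17500\n' > tables_indexes.txt

: > restore_commands.sh
while read -r table index; do
    # stream_parser names each page file after the index id, zero-padded to 11 digits
    page=$(printf 'pages-ibdata1/FIL_PAGE_INDEX/%011d.page' "$index")
    echo "./sys_parser -h localhost -u username -p password -d test $table > $table.create.sql" >> restore_commands.sh
    echo "./c_parser -5 -o $table.load -f $page -t $table.create.sql" >> restore_commands.sh
done < tables_indexes.txt
```

Review restore_commands.sh, then run it: each pair of lines produces a CREATE statement file and a data file to load for one table.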


I found one example where the CREATE statement was not properly created, for table_id 754. It appears the sys_parser code relies on indexes, and in this case the client's table did not have one (not even a primary key), so no CREATE statement was produced and the import did not continue. To work around this, I manually inserted a fake primary key record for one of the columns into the dictionary database:

mysql> INSERT INTO SYS_INDEXES SET id=10000009, table_id=754, name='PRIMARY', N_FIELDS=1, Type=3, SPACE=0, PAGE_NO=400000000;
mysql> INSERT INTO SYS_FIELDS SET INDEX_ID=10000009, POS=0, COL_NAME='myprimaryfield';

Then I was able to run the sys_parser command which then created the statement.


An Idea that Did not work ….

The idea was to create a new HDD device at /dev/xvdX, create a new filesystem on it, and mount it. Then, using a tool such as dd or qemu-img, overwrite the already mounted device with the contents of the VHD. While the contents are corrupted, the hope was that we could explore them as best we could.

So the command I ran was:

# qemu-img convert -p -f vpc -O raw /var/run/sr-mount/f40f93af-ae36-147b-880a-729692279845/3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd /dev/xvde

Where 3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd is the name of the corrupted file / UUID on the Xen dom0 server and f40f93af-ae36-147b-880a-729692279845 is the UUID of the Storage / SR it was located on.

 

The command took a while to complete (it had to convert 50 GB), but the contents of the VHD started to show up as I ran find commands on the mounted directory. During the transfer the results were sporadic, as the partition was only partially built; after it completed, I had access to about 50% of the data.

An Idea that Did not work (2) ….

This was not good enough to get the files the client needed. I had a suspicion that the qemu-img convert command may have dropped some data that was still available, so I figured I would try another, somewhat similar approach that is actually a bit simpler.

This time I created another disk on the same local storage and found it using the xe vdi-list command on the dom0.

#xe vdi-list name-label=disk_for_copyingover

This showed me that the UUID of this file was 'fd959935-63c7-4415-bde0-e11a133a50c0.vhd'.

I found it on disk and executed a cat from the corrupted VHD file into the new VDI's file while it was mounted:

cat 3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd > ../8c5ecc86-9df9-fd72-b300-a40ace668c9b/fd959935-63c7-4415-bde0-e11a133a50c0.vhd

Where 3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd is the name of the corrupted file / UUID on the Xen dom0 server and fd959935-63c7-4415-bde0-e11a133a50c0.vhd is the name of the VDI we created to copy over.

 

This method completely corrupted the mounted drive, so I scrapped this method.

Next up:  

Try some file-partition recovery tools:

I started with testdisk (apt-get install testdisk) and ran it directly against the VHD file:

testdisk 3f204a06-ba18-42ab-ad28-84ca3a73d397.vhd

Wordfence – CPU issue with exhaustive scans – COMMANDDUMP

Wordfence has some default scans which run hourly.   On many systems this works well.  In at least one case we found a situation where Wordfence was running hourly scans on some VERY large libraries at the same time on multiple sites on the same server.

A fix was implemented for this, but in the time it took us to recognize the issue, we came up with the following command, which kills the CPU hog so the WordPress websites remain usable.

 

 kill `apachectl fullstatus|grep wordfence_doScan|awk '{print $2}'`
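The PID extraction in the kill command can be sanity-checked on a captured fullstatus line before pointing it at kill. The sample line and PID below are hypothetical; column 2 of the fullstatus process table is the PID:

```shell
# Hypothetical captured line from `apachectl fullstatus` during a scan
sample='0-0 12345 0/12/73 W 2.03 12 0 0.0 0.00 0.00 GET /wp-admin/admin-ajax.php?action=wordfence_doScan HTTP/1.1'

# Same pipeline as the kill command, minus the kill: grep the scan requests,
# print the PID column
echo "$sample" | grep wordfence_doScan | awk '{print $2}' > wf_pids.txt
cat wf_pids.txt
```

In production the pipeline feeds these PIDs straight to kill; dry-running it like this first shows exactly which processes would be targeted.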

You can tell the issue is occurring by running some of these investigative commands:

  • apachectl fullstatus|grep wordfence – shows how many concurrent scans are running
  • mysqladmin processlist|grep wf – shows the number of insert / update / select commands against Wordfence tables
  • vmstat 1 – run a monitor on your system to see how active it is
  • uptime – see your 1, 5 and 15 minute load averages

 

mysqldump fails with ‘help_topic’ is marked as crashed and should be repaired

mysql table is marked crashed,  but status is still ‘OK’

The Problem

We got an error when attempting to dump and back up the MySQL database. It would fail, saying the table was crashed and needed to be repaired.

mysqldump: Error 1194: Table 'help_topic' is marked as crashed and should be repaired when dumping table `help_topic` at row: 507

This most likely could apply to any table.

The analysis

We ran a check against the table and it showed only a warning:

#mysqlcheck -c mysql help_topic
mysql.help_topic
warning : Size of datafile is: 484552 Should be: 482440
status : OK 

So it seems odd that we have a problem when the status says OK; the datafile size mismatch is the real problem.

The solution

Even though the status says OK, the table has to be repaired, so we run:

#mysqlcheck -r mysql help_topic

Voila!  No more error.

 

(This fix can be used on any table)

 

mysql database dump restore – awk script create one script per table


I have used the following script to take a huge MySQL dump, containing multiple databases and multiple tables, and create individual script files for creating each table and for inserting the records into that table.

 

This approach lets you create one single large dump file for all databases on a server, yet when an incremental restore is needed, the large dump can be quickly split into files that restore a smaller incremental change.

 

The script keys on the marker lines mysqldump writes, such as -- Current Database: `49erstrivia`, to detect database boundaries:
awk 'BEGIN { TABLE="table_not_set" }
{
    # "-- Current Database: `name`" lines mark the start of each database
    if ($1=="--" && /Current Database:/)
    {
        CURRENTDB=$NF
        gsub("`","",CURRENTDB)
        inserted=0
        print CURRENTDB
    }
    if ($1=="CREATE" && $2=="TABLE")
    {
        TABLE=$3
        gsub("`","",TABLE)
        inserted=0
    }
    if ($1!="INSERT")
    {
        # non-INSERT lines are split into "before" and "after" files,
        # depending on whether this table's INSERTs have been seen yet
        if (!inserted)
            print $0 > ("mysql." CURRENTDB "." TABLE ".beforeinsert")
        else
            print $0 > ("mysql." CURRENTDB "." TABLE ".afterinsert")
    }
    else
    {
        print $0 > ("mysql." CURRENTDB "." TABLE ".insert")
        inserted=1
    }
}
'
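A tiny end-to-end check of the splitter. The mini dump below is hypothetical; the awk program is repeated in a heredoc so the demo is self-contained:

```shell
# Save the splitter as a file (same logic as the program above)
cat > split_dump.awk <<'AWK'
BEGIN { TABLE="table_not_set" }
{
    if ($1=="--" && /Current Database:/) { CURRENTDB=$NF; gsub("`","",CURRENTDB); inserted=0; print CURRENTDB }
    if ($1=="CREATE" && $2=="TABLE")     { TABLE=$3; gsub("`","",TABLE); inserted=0 }
    if ($1!="INSERT") {
        if (!inserted) print $0 > ("mysql." CURRENTDB "." TABLE ".beforeinsert")
        else           print $0 > ("mysql." CURRENTDB "." TABLE ".afterinsert")
    } else {
        print $0 > ("mysql." CURRENTDB "." TABLE ".insert")
        inserted=1
    }
}
AWK

# A minimal, hypothetical dump with one database, one table, one row
cat > mini_dump.sql <<'SQL'
-- Current Database: `db1`
CREATE TABLE `t1` (
  id int
);
INSERT INTO `t1` VALUES (1);
-- dump footer
SQL

awk -f split_dump.awk mini_dump.sql
ls mysql.db1.t1.*
```

To restore just db1.t1 you would then concatenate its three pieces in order and pipe them to mysql, e.g. cat mysql.db1.t1.beforeinsert mysql.db1.t1.insert mysql.db1.t1.afterinsert | mysql db1.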

Debugging Multiple MySQL query in a stored procedure – unusually long runtime


I was recently asked to look into an issue where a client had moved a MySQL database from one server to another, and a stored procedure (which internally ran 2000+ queries) was taking 90 seconds where it had previously taken only 0.6 seconds.

What follows is an ongoing journal of the debugging, troubleshooting and discovery as we walked through addressing the issues. Along the way we had lots of small wins, and at the end we had one major win which basically allowed us to undo all of the other changes.

The result is a documented discussion of many different troubleshooting methods, as well as solutions to different issues. I find these types of posts useful to me in the future, as they help me shortcut similar issues without having to come up with the same troubleshooting steps again.

Here is a list of commands I ran to help evaluate the overall server:

  • top – to determine whether there was significant load and adequate free memory (in this case the load average hovered between 0.2 and 1, and there was adequate memory)
  • mysqladmin variables|grep datadir – the location of the MySQL data directory
  • mysqladmin variables|grep storage – to find out which storage engine is being used (InnoDB or MyISAM)
  • mysqladmin variables|grep binlog – to find out what synchronization or replication the server might be doing
  • df -h – to determine what disks are mounted and used to store your MySQL instance
  • cat /proc/mdstat – to find out whether there are any software RAID instances
  • du -h /var/lib/mysql – to show the total size of the databases
  • ls -lt /var/lib/mysql/*/* – to find out the size of tables and which tables and logs are updated most often (in my case there were no large tables being updated during the last few days)
  • echo 'desc tblname'|mysql databasename – to ensure there is an index on the column I am updating with
  • dd if=/dev/zero of=/tmp/test bs=1048576 count=2048 – to check the write speed (I have had to run this command multiple times before to get accurate disk write speeds, because of file-system caching)
  • dd if=/tmp/test of=/dev/null bs=1048576 – to check the read speed
  • iostat -x 1 – to watch the disk write speed and usage (run this in another window to watch dd run); -x gives more table-based output
  • echo "show create procedure procedurename"|mysql databasename – to evaluate any stored procedures that are causing the issues
  • wc -l /var/lib/mysql/mysql-bin.* – to find out how big the binlogs are (if any), to help determine what kind of overhead is required to keep them up to date.

I created two connections to the server so I could evaluate what was happening as I executed the 180 second run time of the query.

  • Based on the output of the dd commands, the write speed on the /var/lib/mysql partition is about double (60+ MB/s) that of the /home dir. They are separate partitions, but we are not doing any software RAID, so I assume some sort of hardware RAID with striping.

I additionally started to dig into the stored procedure that was having the issue, adding debugging statements and commenting out functionality to evaluate the time each piece takes.

  • As I ran the stored procedure, I also ran wc -l /var/lib/mysql/mysql-bin.000current and watched the binlog grow during the run. It turned out the binlog file only grew by less than 100 KB during the time that hundreds of queries were running.
  • Next I evaluated the contents of the binlog: it was actually adding thousands of lines, several for each 'update' query inside the stored procedure.

Now I recognized that, somehow, inside the stored procedure, the system was syncing the results of each update to disk before the procedure was done.

  • The server I am running on is 5.5.xx, so there are none of the advanced binlog write-delay features found in 5.7, and I had to come up with another solution.

By using START TRANSACTION and COMMIT, I could avoid writing / committing the transaction to disk until the very end of the stored procedure, so I added these to the beginning and end.

This was IT! By using transactions I avoided having to evaluate and reconfigure the server (which is running a master-slave configuration for backup), and I was able to reduce SP run times from 180 seconds to sub-second.
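A sketch of what the change looks like. The procedure name and body here are hypothetical; only the START TRANSACTION / COMMIT bracketing is the actual fix:

```shell
# Write the reworked procedure definition to a file, then feed it to the
# mysql client. The binlog/flush cost is paid once at COMMIT instead of
# once per UPDATE statement.
cat > wrap_in_txn.sql <<'SQL'
DELIMITER //
CREATE PROCEDURE apply_updates()   -- hypothetical procedure name
BEGIN
    START TRANSACTION;             -- added at the beginning
    -- ... the 2000+ existing UPDATE statements run here ...
    COMMIT;                        -- added at the end
END //
DELIMITER ;
SQL
```

The file would then be applied with something like mysql databasename < wrap_in_txn.sql (after dropping the old procedure).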

One important thing to note about the execution time: each of the 2000+ update statements does not actually end up making any changes to the database. Perhaps it is much easier for MySQL to write the binlog after the transaction is complete, because the server can determine that the NET CHANGE is ... nothing.

 

Digging Deeper

It turns out that even though I found a way to make the multiple SQL statements go faster on the new server, we still have other uses of the same statements which require a faster run time (we are talking about a single query taking 0.15 seconds on the new server versus 0.01 seconds on the old server). Here are some additional commands I ran (on both the old and new servers) to figure out the differences and why it may be running a bit slower on the new one:

  • tune2fs -l /dev/mapper/DEVICE |grep -i block
  • mysql.databasename> stop slave; – tried temporarily to determine whether the running replication was causing a delay – this did not stop the writes to the binlog.
  • mysqladmin variables|grep innodb_buffer_pool_size – confirms the buffer is big enough to fit the entire set of tables in server memory
  • mysql.databasename> SET sql_log_bin = 0; – stops the current connection from logging to the binlog – helps evaluate how long an update takes when the binlog is not being written
  • mysql.databasename> SET sql_log_bin = 1; – reset this to 1 so that future updates are binlogged.

I noticed with this testing that sql_log_bin = 0 reduced the update time by 50%! I have two possible explanations, which may not be mutually exclusive:

  • That the disk seek and write speed is slower on the new server to the point that the additional work that has to be done by writing to the bin file is slowing the server down.
  • That the server is simply computationally slower and CPU work is simply slower,  so this additional work takes longer.

Related to sql_log_bin is a variable I found which also seems to make a difference (run time from 0.15 seconds to 0.10): sync_binlog. I set it to 0, which apparently means MySQL does not have to wait until the binlog is synced to the file system before it returns.

Some notes about what I feel comfortable ruling out:

  • There is adequate memory, and memory is not being swapped around by this very mild update. In fact, I can run the exact same update statement a dozen times, to where there is actually NO change happening to the database.
  • When an update is made, no change is immediately pushed to a slave; the only thing that happens is a write to the binlog file.

So since no change is happening to the database, I wonder: what is causing the overhead? Why would turning off sql_log_bin (0) help anything if we are not actually logging a change?

I decided to look into the binlog to see what was being tracked,  even though there is no real need to do anything on the slave server.

  • mysqlbinlog mysql-bin.0000CURRENT|grep MYTABLENAME

The result showed me a plethora of updates to the same table, even though the changes made no difference to the underlying database. With more research I found that binlog_format accepts either 'STATEMENT' or 'ROW'. It seems that with 'ROW' the actual database changes are appended to the log file, instead of the statements which generated them.

  • mysql> SET binlog_format = 'ROW';
  • mysql> SET binlog_format = 'STATEMENT';

I immediately noticed a change in the amount of time it took to run the statements.  However the time it took is smaller ONLY when there is NOT an update to the database.

To dig deeper and decide which of the two methods is better, we would need a more in-depth analysis of how many of the updates actually change the database and how many do not. At this point I am not yet comfortable permanently changing the binlog format from STATEMENT to ROW, because I am not confident the log files will be properly replicated. I would want a more robust test setup where I evaluate not just a single statement but a group of them, so that I could determine which update method is more optimal.

Version Differences

As we look into the difference between the update statement on the new server and the old, we can quickly tell there is an issue with how quickly things are written to disk: when we remove some of the features which write to the disk, the queries run more quickly. To break this down into something we can verify, we stopped all other services which might write to the disk and, on a second screen, ran:

  • iostat 1 – this shows the statistics of how many writes are happening within one second

We found that there were no writes until we ran an update statement

  • update TABLENAME set column='xxxx' where id = 10;

This updates a single row, and we were able to see the number of bytes written to the disk. We could run the query multiple times and see it write to the disk each time. However, when we ran the same test on the original server, it would write to the disk the first time the update statement ran, but not on the second or subsequent runs. Logically this makes sense: the row / column does not need to be updated the second time, because the column already = 'xxxx' for id = 10.

With this in mind, I came up with a test to confirm that our two servers behave differently for the same query. Since we are running InnoDB, we can check stats on how many times InnoDB has written to the disk, and see whether the counters increment as we make updates.

Old Server – MySQL 5.0

  • echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036031 OS file writes, 6213811 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='1234' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036035 OS file writes, 6213815 OS fsyncs

Note that the first time the value is set, the number of OS writes increments, but the SECOND time it does not, because there is no actual update to the data.

However, if I run the exact same set of updates on the new server, the number of file writes increments EACH time an update statement is run, regardless of whether it represents a change in the data.

New Server – MySQL 5.5

  • echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27188 OS file writes, 9678 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27190 OS file writes, 9680 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27192 OS file writes, 9682 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='1234' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27194 OS file writes, 9684 OS fsyncs
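The sed in the commands above is there because SHOW ENGINE INNODB STATUS comes back as one big value containing literal "\n" sequences; the sed converts them to real newlines before grep picks out the FILE I/O line. Demonstrated here on a captured sample (the counter values are hypothetical):

```shell
# A captured fragment of the raw status output, with literal \n sequences
status='FILE I/O\n--------\n29744505 OS file reads, 10036031 OS file writes, 6213811 OS fsyncs\nPending normal aio reads: 0'

# Split on the literal \n sequences, then keep only the counters line
printf '%s' "$status" | sed -e 's/\\n/\n/g' | grep 'OS file' > osfile.txt
cat osfile.txt
```

Diffing this line before and after an update is what tells you whether the statement actually hit the disk.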

Basically this means that unless we can find out why the update is writing each time, we are not going to be able to reduce the query time on the new server. It seems this issue is either:

  1. Built into the new version innodb engine
  2. Built into the new version of MySQL
  3. Either way,  it could be an option configurable via my.cnf

As an aside, this means we could potentially write SQL which DOES NOT select a row if it does not need to update it. For example, the following statement has somewhat of a redundancy in it, but it results in NOT touching the row in the way we are seeing rows touched on the new server:

  • update TABLENAME set column='xxxx' where id = 10 and column <> 'xxxx';

I started looking into a couple of innodb variables to try to find some way of addressing this.  This article seemed to help and led me to the ultimate solution – 8.5.8 Optimizing InnoDB Disk I/O

The article suggested setting the innodb_flush_method parameter to O_DSYNC. I updated this, and my queries that had been running at 50 ms immediately went down to 20 ms, consistently. Thinking about this, we potentially have a small possibility of data loss between the time InnoDB receives the update and when it is flushed to disk; for the type of data we are tracking, that is acceptable. Elsewhere on the page I found another variable which seemed to go one step further: innodb_flush_log_at_trx_commit. This setting intuitively controls whether the log must be flushed to the drive each time a transaction is committed. I am unsure what the default behavior was on the old 5.0 server, however by setting innodb_flush_log_at_trx_commit to 0 I was able to get the query time down to 0.00.

Basically this means that, after an update query runs, InnoDB is allowed to return immediately, before the change is flushed and stored to disk. This addresses our concerns completely.

Run this command to see mysql variables that control how things are flushed.

  • mysqladmin variables|grep flush
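For reference, this is roughly how the two settings discussed above would land in the server config. A sketch; both trade a little crash durability for speed, which was acceptable for this workload but is worth re-checking for your own:

```ini
# /etc/my.cnf
[mysqld]
# allow InnoDB to return before the log is flushed to disk on each commit
innodb_flush_log_at_trx_commit = 0
# do not force an fsync of the binlog on every commit
sync_binlog = 0
```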

So drastic was the change from setting innodb_flush_log_at_trx_commit to 0 that I decided to undo some of the other testing I had done, to see how performance was without it.

  1. When I restored innodb_flush_method to the default (NULL), the time to update stayed at 0.00.
  2. The OS file writes DO NOT always increment immediately; they will increment after a second or two. I could confirm this by looking at the number of "srv_master_thread log flush and writes" in the InnoDB status immediately before and immediately after an update statement: they would be the same, however if I waited a couple of seconds they would increment.
    1. echo "show engine innodb status"|mysql radius|sed -e's/\\n/\n/g'|grep srv_master_thread
  3. I restored sync_binlog from 0 back to 1 in /etc/my.cnf to determine whether syncing the binlog makes a difference. This is where I ran into a bit of a discrepancy. I have actually been testing with two separate servers this whole time, one under load and one not; until now everything behaved the same, however changing the sync_binlog variable affects the performance differently on the two servers.
    1. On the server WITHOUT load, sync_binlog can be 1 and the server still has an update time of 0.00.
    2. On the server WITH load, sync_binlog must be 0 to keep the update time at 0.00.
      This makes some sense, as a server without much load may have more file-system cache or other resources available to quickly respond to a single infrequent disk write. This is likely the same principle as having to run dd a couple of times to get past the FS cache.

 

Create correct user privileges / grant to use for mysqldump


To make sure the backup user you will use for dumping MySQL databases has the correct permissions:

mysql>  GRANT SELECT, SHOW VIEW, RELOAD, REPLICATION CLIENT, EVENT, TRIGGER ON *.* TO 'backup'@'localhost';
mysql>  GRANT LOCK TABLES ON *.* TO 'backup'@'localhost';

Thanks to this post for the help

 
