Debugging multiple MySQL queries in a stored procedure – unusually long runtime

I was recently asked to look into an issue where a client had moved a MySQL database from one server to another, and the run time of a stored procedure (which internally ran 2000+ queries) had jumped to 90 seconds, where it had previously taken only 0.6 seconds.

What follows is an ongoing journal of the debugging, troubleshooting and discovery as we walked through the process of addressing issues. Along the way we had lots of small wins, and at the end we had one major win which effectively allowed us to undo all of the other changes.

The result is a documented discussion of many different troubleshooting methods, as well as solutions to several distinct issues. I find this type of post useful to my future self, since it lets me shortcut similar issues without having to come up with the same troubleshooting steps again.

Here is a list of commands I ran to evaluate the overall state of the server:

  • top – to determine whether there was significant load and adequate free memory (in this case the load average hovered between 0.2 and 1, and there was adequate memory)
  • mysqladmin variables|grep datadir – to find the location of the MySQL data directory
  • mysqladmin variables|grep storage – to find out which storage engine is in use (InnoDB or MyISAM)
  • mysqladmin variables|grep binlog – to find out what binary logging or replication the server might be doing
  • df -h – to determine which disks are mounted and which one stores your MySQL instance
  • cat /proc/mdstat – to find out if there are any software RAID arrays
  • du -h /var/lib/mysql – to show the total size of the databases
  • ls -lt /var/lib/mysql/*/* – to find out the size of tables and which tables and logs are updated most often (in my case no large tables had been updated during the last few days)
  • echo 'desc tblname'|mysql databasename – to ensure there is an index on the column I am updating against
  • dd if=/dev/zero of=/tmp/test bs=1048576 count=2048 – to check the write speed (I have had to run this command multiple times before in order to get accurate disk write speeds, because of file system caching)
  • dd if=/tmp/test of=/dev/null bs=1048576 – to check the read speed
  • iostat -x 1 – to watch the disk write speed and usage (run this in another window while dd runs); -x gives more detailed, table-based output
  • echo "show create procedure procedurename"|mysql databasename – to evaluate the stored procedure that is causing the issue
  • wc -l /var/lib/mysql/mysql-bin.* – to find out how big the binlogs are (if any), to help determine what kind of overhead MySQL incurs keeping them up to date.
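The two dd checks from the list above can be wrapped in a small script for repeated runs. This is a sketch: the file path and sizes here are illustrative (for a meaningful result, write a file larger than RAM, or repeat the run, so the file system cache cannot absorb it).

```shell
#!/bin/sh
# Sketch of the dd-based throughput check described above.
# TESTFILE and the sizes are illustrative, not the values used on the server.
TESTFILE=/tmp/ddtest
BS=1048576        # 1 MiB blocks
COUNT=8           # 8 MiB here for illustration; use 2048+ in practice

# Write test: stream zeros onto the target partition
dd if=/dev/zero of="$TESTFILE" bs=$BS count=$COUNT 2>&1 | tail -n 1

# Read test: read the same file back
dd if="$TESTFILE" of=/dev/null bs=$BS 2>&1 | tail -n 1

rm -f "$TESTFILE"
```

Running it while `iostat -x 1` is open in a second window shows whether the throughput dd reports matches what the disk is actually doing.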

I created two connections to the server so I could watch what was happening while the 180-second query ran.

  • Based on the output of the dd commands, the write speed on the /var/lib/mysql partition is about double (60+ MB/s) that of the /home directory. They are separate partitions, and since we are not doing any software RAID, I assume some sort of hardware RAID with striping.

I additionally started to dig into the stored procedure itself, adding debugging statements and commenting out functionality to evaluate how long each part takes.

  • As the stored procedure ran, I also ran wc -l /var/lib/mysql/mysql-bin.000current and watched the binlog grow during the run, although it only grew by less than 100 K while hundreds of queries were executing.
  • Next I evaluated the contents of the binlogs, and it turns out thousands of lines were being added, multiple lines every time one of the 'update' queries inside the stored procedure ran.

At this point I recognized that, somewhere inside the stored procedure, the system was syncing the results of each update to disk before the procedure was done.

  • The server I am running on is 5.5.xx, so the more advanced binlog write-delay features from 5.7 are not available, and I had to come up with another solution.

By using START TRANSACTION and COMMIT, I could avoid writing / committing the transaction to disk until the very end of the stored procedure, so I added these at the beginning and end.
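A minimal sketch of the change, with hypothetical table and procedure names (the real procedure runs 2000+ updates between the two statements):

```sql
-- Sketch only: names are hypothetical. Wrapping the body in a single
-- transaction defers the binlog write and disk sync to the final COMMIT.
DELIMITER $$
CREATE PROCEDURE run_batch_updates()
BEGIN
  START TRANSACTION;

  -- ... the 2000+ UPDATE statements run here ...
  UPDATE some_table SET some_column = 'xxxx' WHERE id = 10;

  COMMIT;
END$$
DELIMITER ;
```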

This was IT! By using transactions I avoided having to evaluate and reconfigure the server (which is running a master-slave configuration for backup), and I was able to reduce the stored procedure's run time from 180 seconds to sub-second.

One important thing to note about the execution time: the update statements it runs (more than 2000 of them) do not actually end up making any changes to the database. Perhaps it is much easier for MySQL to write to the binlog after the transaction is complete, because the server is then able to determine that the NET CHANGE is... nothing.


Digging Deeper

It turns out that even though I found a way to make the batch of SQL statements run faster on the new server, we still have other uses of the same SQL statements which require a faster run time (we are talking about a single query which takes 0.15 seconds on the new server and 0.01 seconds on the old server). Here are some additional commands I ran (on both the old and new servers) to figure out the differences and why it may be running slower on the new server.

  • tune2fs -l /dev/mapper/DEVICE |grep -i block – to compare file system block sizes between the two servers
  • mysql.databasename> stop slave; – I tried this temporarily to determine whether the running replication was causing a delay; it did not stop the writes to the binlog.
  • mysqladmin variables|grep innodb_buffer_pool_size – to confirm that the buffer pool is big enough to fit the entire set of tables in server memory
  • mysql.databasename> SET sql_log_bin = 0; – this stops the current connection from logging to the binlog, which helps evaluate how long the update takes when the binlog is not being written
  • mysql.databasename> SET sql_log_bin = 1; – reset this to 1 so that future updates are binlogged.
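The toggling above can be combined into one session so the same statement is timed with and without binary logging. This is a sketch with hypothetical table and column names; note that SET sql_log_bin requires the SUPER privilege, and the mysql client prints the elapsed time after each statement.

```sql
-- Sketch: compare the same UPDATE with and without binlog writes
-- for this session only. Table/column names are hypothetical.
SET sql_log_bin = 0;
UPDATE some_table SET some_column = 'xxxx' WHERE id = 10;  -- binlog off

SET sql_log_bin = 1;
UPDATE some_table SET some_column = 'xxxx' WHERE id = 10;  -- binlog on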

I noticed with this testing that sql_log_bin = 0 reduced the update time by 50%! I have two possible explanations for this, which may not be mutually exclusive:

  • That the disk seek and write speed on the new server is slow enough that the additional work of writing to the bin file is slowing the server down.
  • That the server is simply computationally slower, so this additional CPU work takes longer.

Related to sql_log_bin is a variable I found which also seems to make a difference (from a run time of 0.15 seconds down to 0.10): sync_binlog. I set it to 0, which apparently means MySQL does not have to wait until the binlog is synced to the file system before it returns.

Some notes about what I feel comfortable ruling out:

  • There is adequate memory, and memory is not being swapped around by this very mild update. In fact, I can run the exact same update statement a dozen times, in which case there is actually NO change happening to the database.
  • When an update is made to the database, no change is immediately pushed to a slave; the only change that happens is a write to the binlog file.

So since no change is happening to the database, I wonder: what is causing the overhead? Why would turning off sql_log_bin help anything if we are not actually logging a change?

I decided to look into the binlog to see what was being tracked,  even though there is no real need to do anything on the slave server.

  • mysqlbinlog mysql-bin.0000CURRENT|grep MYTABLENAME

The result showed a plethora of updates to the same table, even though the changes made no difference to the underlying data. With more research I found that binlog_format accepts either 'STATEMENT' or 'ROW'. It seems that with 'ROW', the actual database changes are appended to the log file, instead of the statements which generate the changes.

  • mysql> SET binlog_format = 'ROW';
  • mysql> SET binlog_format = 'STATEMENT';

I immediately noticed a change in the amount of time it took to run the statements. However, the time is smaller ONLY when the statement does NOT actually update the database.

To dig deeper here and find out which of the two methods is better, we will need a more in-depth analysis of how many of the updates actually modify the database and how many do not. At this point I am not yet comfortable changing the binlog format from statement to row permanently, because I am not confident that the log files will be properly replicated. I would want a more robust test setup where I am evaluating not just a single statement but a group of them, so that I could determine which method of updating is more optimal.

Version Differences

As we look into the difference between the update statement on the new server and the old server, we can quickly tell that there is an issue with how quickly things are written to disk: when we remove some of the features which write to the disk, the queries run more quickly. To break this down into something we can verify, we stopped all other services that might write to the disk and, in a second screen, ran:

  • iostat 1 – this shows statistics on how many writes happen each second

We found that there were no writes until we ran an update statement:

  • update TABLENAME set column='xxxx' where id = 10;

This updates a single row, and we were able to see the number of bytes written to disk. We could run the query multiple times and see it write to the disk each time. However, when we ran the same test on the original server, it would write to the disk the first time the update statement ran, but not on the second or subsequent runs. Logically this makes sense: the row does not need to be updated the second time because the column already equals 'xxxx' for id = 10.

With this in mind, here is a test I came up with to confirm that our two servers behave differently for the same query. Since we are running InnoDB, we can check some statistics about how many times InnoDB has written to disk, and watch whether the counters increment as we make updates.
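To make the before/after comparison easier to script, the counter can be extracted with a small helper. This is a sketch; the function name is mine, and it assumes the input is the 'OS file' line that SHOW ENGINE INNODB STATUS prints.

```shell
# Hypothetical helper: pull the "OS file writes" counter out of the
# 'OS file' line of SHOW ENGINE INNODB STATUS output. On that line
# the writes counter is the 5th whitespace-separated field.
os_file_writes() {
  awk '/OS file reads/ {print $5}'
}

# Usage against a live server (credentials/socket assumed configured):
#   echo "show engine innodb status" | mysql databasename \
#     | sed -e 's/\\n/\n/g' | os_file_writes
```

Capturing the value before and after an update, then comparing the two numbers, shows in one step whether the update actually hit the disk.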

Old Server – MySQL 5.0

  • echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036031 OS file writes, 6213811 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036033 OS file writes, 6213813 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='1234' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    29744505 OS file reads, 10036035 OS file writes, 6213815 OS fsyncs

Note that the first time the value is set, the number of OS writes increments, but the SECOND time it does not, because there is no actual update to the data.

However, if I run the exact same set of updates on the new server, the number of file writes increments EACH time an update statement is run, regardless of whether the statement represents a change in the data.

New Server – MySQL 5.5

  • echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27188 OS file writes, 9678 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27190 OS file writes, 9680 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='xxxx' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27192 OS file writes, 9682 OS fsyncs
  • [root@auth1 ~]# echo "update TABLENAME set column='1234' where id = 10;"|mysql databasename
  • [root@auth1 ~]# echo "show engine innodb status"|mysql databasename|sed -e's/\\n/\n/g'|grep 'OS file'
    28371 OS file reads, 27194 OS file writes, 9684 OS fsyncs

Basically this means that unless we can find out why the update is writing each time, we are not going to be able to reduce the query time on the new server. It seems this issue is either:

  1. Built into the new version innodb engine
  2. Built into the new version of MySQL
  3. Either way,  it could be an option configurable via my.cnf

As an aside, this means we could potentially write SQL which DOES NOT touch a row if it does not need to update it. For example, note that the following SQL statement has somewhat of a redundancy in it, but it results in NOT touching the row in the way we are seeing rows touched on the new server:

  • update TABLENAME set column='xxxx' where id = 10 and column <> 'xxxx'

I started looking into a couple of InnoDB variables to try to find some way of addressing this. This article seemed to help and led me to the ultimate solution – 8.5.8 Optimizing InnoDB Disk I/O

The article suggested setting the innodb_flush_method parameter to O_DSYNC. I updated this, and my queries that had been running at 50 ms immediately and consistently went down to 20 ms. Thinking about this, we potentially open a small window of data loss between the time InnoDB receives the update and when it is flushed to disk. In our case, with the type of data we are tracking, that is acceptable. Elsewhere on the page I found another variable which seemed to go one step further: innodb_flush_log_at_trx_commit. This setting controls whether the log must be flushed to the drive each time a transaction is committed. The option did not even exist in 5.0, so I am unsure of the default behavior there; however, when setting innodb_flush_log_at_trx_commit to 0, I was able to get the query time down to 0.00.

Basically this means that, after an update query runs, InnoDB is allowed to return immediately, before the change is flushed to disk. This addresses our concerns completely.

Run this command to see the MySQL variables that control how things are flushed:

  • mysqladmin variables|grep flush
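For reference, the settings from this investigation can be made persistent in /etc/my.cnf. This is a sketch of the values we ended up testing, not a recommendation for every workload; the durability trade-off described above applies (a crash can lose roughly the last second of transactions with these settings).

```ini
# /etc/my.cnf – sketch of the flush-related settings discussed above
[mysqld]
# 0 = write and flush the InnoDB log roughly once per second instead
#     of at every transaction commit (fast, small crash-loss window)
innodb_flush_log_at_trx_commit = 0

# 0 = let the operating system decide when to sync the binlog to disk
sync_binlog = 0
```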

So drastic was the change from setting innodb_flush_log_at_trx_commit to 0 that I decided to undo some of the other changes I had made, to see how performance fared without them.

  1. When I restored innodb_flush_method to the default (NULL), the time to update stayed at 0.00.
  2. The OS file writes DO NOT always increment immediately; however, they will increment after a second or two. I could confirm this by looking at the srv_master_thread log flush and writes counter in the InnoDB status immediately before and immediately after an update statement: the values would be the same, but if I waited a couple of seconds, they would increment.
    1. echo "show engine innodb status"|mysql radius|sed -e's/\\n/\n/g'|grep srv_master_thread
  3. I restored sync_binlog from 0 back to 1 in /etc/my.cnf to determine whether syncing the binlog makes a difference. This is where I ran into a bit of a discrepancy. I have actually been testing with two separate servers this whole time, one under load and one not. Until now everything behaved the same on both; however, changing the sync_binlog variable affects performance differently on the two servers.
    1. On the server WITHOUT load, sync_binlog can be 1 and the server still has an update time of 0.00.
    2. On the server WITH load, sync_binlog must be 0 to keep the update time at 0.00.
      This makes some sense: a server without much load may have more file system cache or other resources available to quickly absorb a single, infrequent disk write. This is likely the same principle that forces us to run dd a couple of times to get past the FS cache.