Tuesday, January 29, 2008

MyISAM to InnoDB Conversion: Converting almost a billion records

So my challenge last week was to convert almost a billion records in 37 databases and 74 tables. Oh, and I wanted to do it only during the night (between 12:30AM and 6AM) and in the least number of nights.

This was not my first massive MyISAM to InnoDB migration, nonetheless, I still manged to learn a few things. We're at a stage where MyISAM just isn't the solution any more. Our plans are to go almost 100% InnoDB and this migration project completes yet another massive step towards that direction.

The first night was depressing. I applied every trick up my sleeve but only ended up getting 9.48% complete. Second night was only half depressing allowing me to complete ~ 33%. Third night was a charm and I got ~42.5% done. I would have finished the project that night but there was some storage issue and so I completed the last ~15% on the fourth day. Of course, optimizations and the method of conversion I selected really saved the day in the end.

Now that this has been accomplished, the next step will be to cover another 700-900 million records in a different cluster. Oh well, I will start doing that this weekend.

As far as performance is concerned, I couldn't be more happier. Load average on all servers in this cluster is well under 0.30 and each server is happily doing upto 1800 queries per second.

Note: Indexes were not dropped. Primary key did not consist of an integer. There was a composite primary key on all tables. The servers continued to handle production traffic on all databases on which migration was not being performed.

Update: What I did in a nutshell, for very large tables, was:

1. set innodb_flush_log_at_trx_commit=0 in my.cnf and restart
2. create new table tbl_innodb
3. set sort_buffer_size and read_rnd_buffer_size
SET SESSION sort_buffer_size=512*1024*1024; 
SET SESSION read_rnd_buffer_size=512*1024*1024;

4. then insert into tbl_innodb selecting from tbl_myisam and sorting by primary key
5. rename tbl_innodb to tbl

Of course, you need to have sufficient memory or adjust the buffers accordingly. But doing this ended up working great for me.

Update2: Regarding read_rnd_buffer_size, MySQL Manual says,
"If you are performing GROUP BY or ORDER BY operations on tables that are much larger than your available memory, you should increase the value of read_rnd_buffer_size to speed up the reading of rows following sorting operations."

Peter Zaitsev has also written about read_rnd_buffer_size.

Update3: My system (Solaris 10) had 10G of memory and read/writes were done to mounts residing on a SAN.

Update4: Best case scenario was 9600 records / second on average. Worst case was 500 records / second on average.


Sykosoft said...

Just curious if your load was lower or higher when using MyISAM? What was the reason behind the conversion, lock contention, for for some other reason?

Curious, what size were these tables?

Random info, we're using a 4gb fiber channel SAN (raid 6), with 32gb of RAM per database server, and 2xQuad Core Intel Xeon 5300 series CPU's.

Michael S. Moody
Sr. Systems Engineer
Global Systems Consulting
Web: http://www.GlobalSystemsConsulting.com

Todd said...

Hi Frank,

I'm curious how you managed this while processing queries. Doesn't the select * from myisam_tbl end up locking it for the duration of the insert?


Frank said...

Hi Michael,

Our load was much higher with MyISAM due to table locking issues which was the primary reason for conversion. DELETEs on MyISAM in a heavy production environment like ours are extremely expensive.

The total size of all tables was several hundred GB.

We're using RAID 5 (3PAR).

Frank said...

Hi Todd,

We shut down traffic to affected databases and then performed the migration. You may also lock tables and/or rename before and after migration,

John Dzilvelis said...

Interesting Blog Frank. I have a few questions.

Do you use one tablespace or file-per-table?

How do you manage io for tables in the database? I've struggled with this in innodb for large tables. I'm not sure if the raid configuration has anything to do with this. I'm using raid 10 and I noticed you use raid 6. Does that help innodb performance?

Finally, Do you have much of a problem with defragmenting the tablespace? I've run into this problem a bit with large tables with a high level of update/delete ops.

Great blog..its a good resource.

Andy said...

This is probably a silly question, but why do you need the order by here?:

4. then insert into tbl_innodb selecting from tbl_myisam and sorting by primary key

if I just select * from myisam into innodb everything looks ok, but I don't have an easy way to verify the whole table. my key is an int autonumber id, so I guess I am getting the order by for free?