I have noticed a lot of queries of SproutSearch's main database table are getting slow as SproutSearch passes 8 million indexed blogs. I finally decided to do something about it after trying to add alter this table. I attempted to add a column that keeps track of the date and time of each blog's most recent post. The alter table command ran for at least 8 hours, and then MySQL either crashed or the admins killed my process. I attempted this a second time without making a new index, which also failed.
I figured I would just create a new table with the extra column and write a program to slowly copy everything over. The first version of this PHP program queried 10,000 rows of data from the old table and inserted them one by one into the new table. I set up a cron job to run this every 10 minutes. Once the new table started getting big, the cron jobs were overlapping, some records were not copied, and copy processes started backing up.
It dawned on me that I'd better learn something about MySQL optimization. I read some online articles and decided to try using mysqli_multi_query to copy the records. That would reduce the network overhead. The program ran several times faster but I wanted to look into other methods. I tried using prepared statements, which wasn't much better.
I found this excellent article (
http://www.informit.com/articles/article.asp?p=377652&seqNum=4&rl=1) which said if I use the insert format like:
insert into table (column1, column2) values(val1, val2), (val1, val2)...
MySQL wouldn't have to flush the index after every insert. I made my program create a giant insert statement in this format. I tried it out and it only took a few seconds when the new table was empty. I modified the program to run 10 batches of 10,000 records, which would take a few minutes. This program has been running for a few days, and all my data is finally in the new table.
I am still having problems with the table being locked during lengthy select statements. It causes certain pages to hang for a long time. I am now copying all the data from a MYISAM table to a INNODB table because it has row level locking.