Mailing List

Enter your Email


Powered by FeedBlitz

RSS Feed


By TwitterButtons.com

Links

Blogshares Links

Beginner's Guide to BlogShares
A guide about the BlogShares fantasy blog stock market.
Scared Bunny
BlogShares Price Tracker
This program that archives information about the BlogShares fantasy stock market. You can view graphs of any industry, and analyze your portfolio.

SproutWorks Projects

Digg Archive
A new experimental Digg page.
AJAX Pixel Editor
A Collaborative pixel editor currently in development.
Web promotion links
These tools help you get visitors on your website.
SproutPics
My photography Site
SproutZoo
My zoo photographs
Tag Cloud
A summary of tagged articles.
Found Photos
An automated page that thumbnails photos from another site.
SproutSearch
I designed this blog indexing tool, and it has accumulated over 6 million blogs so far.
Products
Some of the programs I've written.
RSS Feeds
RSS Feeds from the SproutWorks Forums
SproutTree Demo
A demo of a tree-drawing PHP script.
My Gallery

SproutWorks Chat
A chat room I programmed, most likely empty.
Link Exchange - Link Directory - Web Hosting

Sign In

Username:
Password:
Remember Me

sprout man
Forums/

sproutworks
October 16th, 2006 3:18 AM PST
I have noticed a lot of queries of SproutSearch's main database table are getting slow as SproutSearch passes 8 million indexed blogs. I finally decided to do something about it after trying to add alter this table. I attempted to add a column that keeps track of the date and time of each blog's most recent post. The alter table command ran for at least 8 hours, and then MySQL either crashed or the admins killed my process. I attempted this a second time without making a new index, which also failed.

I figured I would just create a new table with the extra column and write a program to slowly copy everything over. The first version of this PHP program queried 10,000 rows of data from the old table and inserted them one by one into the new table. I set up a cron job to run this every 10 minutes. Once the new table started getting big, the cron jobs were overlapping, some records were not copied, and copy processes started backing up.

It dawned on me that I'd better learn something about MySQL optimization. I read some online articles and decided to try using mysqli_multi_query to copy the records. That would reduce the network overhead. The program ran several times faster but I wanted to look into other methods. I tried using prepared statements, which wasn't much better.

I found this excellent article (http://www.informit.com/articles/article.asp?p=377652&seqNum=4&rl=1) which said if I use the insert format like:

insert into table (column1, column2) values(val1, val2), (val1, val2)...

MySQL wouldn't have to flush the index after every insert. I made my program create a giant insert statement in this format. I tried it out and it only took a few seconds when the new table was empty. I modified the program to run 10 batches of 10,000 records, which would take a few minutes. This program has been running for a few days, and all my data is finally in the new table.

I am still having problems with the table being locked during lengthy select statements. It causes certain pages to hang for a long time. I am now copying all the data from a MYISAM table to a INNODB table because it has row level locking.