Speeding up series of complex INSERTS
On 13.05.2005 17:27:26, Kevin Zembower wrote:

This is my first experience working with a large database. I'm trying to insert over 320,000 records of a bibliographic database of medical articles. If anyone would like to test my system, the program is available at http://www.jhuccp.org/temp/KZ/pop2sql.pl, the SQL statements to set up the database are at http://www.jhuccp.org/temp/KZ/popline.sql, and a file consisting of a header record and 1,000 data records is at http://www.jhuccp.org/temp/KZ/popline.dmp.gz (it must be decompressed before use). The program is run by specifying the name of the data file to process on the command line, e.g. "./pop2sql.pl popline.dmp". The only unusual module used by the program is Text::xSV.
When testing this program with 1,000 data records, I was processing about 570 records per minute. At that rate, all 320,000 records would have been processed in about 9 hours, which was acceptable to me. However, I started it running yesterday at about 15:40 and it's still running now, almost 20 hours later. Right now, this is the condition of the database:
Database popline running on localhost

Table                    Records     Type     Size
abstract                 220,038     MyISAM   517.9 MB
abstractword             0           MyISAM   1.0 KB
abstractword_article     0           MyISAM   1.0 KB
article                  243,817     MyISAM   66.5 MB
author                   164,213     MyISAM   7.2 MB
author_article           418,227     MyISAM   3.6 MB
clientgroup              0           MyISAM   1.0 KB
community                0           MyISAM   1.0 KB
corpname                 2,750       MyISAM   376.7 KB
corpname_article         20,477      MyISAM   181.0 KB
country                  0           MyISAM   1.0 KB
findingstype             0           MyISAM   1.0 KB
findingstype_article     0           MyISAM   1.0 KB
journal                  45,032      MyISAM   5.2 MB
journal_article          149,919     MyISAM   1.3 MB
keyword                  2,109       MyISAM   73.8 KB
keyword_article          2,393,673   MyISAM   20.5 MB
languguage               3           MyISAM   4.1 KB
popreporttopic           0           MyISAM   1.0 KB
popreporttopic_article   0           MyISAM   1.0 KB
20 table(s)              3,660,258   --       622.8 MB
As you can see, only 243,817 articles (records) have been inserted so far. Records are currently going in at about 104 per minute; at this rate, it'll take roughly another 12 hours to complete this run.
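For anyone checking the estimates, the arithmetic works out like this (figures taken from the numbers above; sketched in Python purely for the calculation, since the actual loader is Perl):

```python
# Test run: ~570 records/minute extrapolated over all 320,000 records.
total = 320_000
test_rate = 570                      # records per minute in the 1,000-record test
est_hours = total / test_rate / 60
print(round(est_hours, 1))           # → 9.4, matching the ~9-hour estimate

# Current run: 243,817 rows done, inserting at ~104 records/minute.
remaining = total - 243_817
hours_left = remaining / 104 / 60
print(round(hours_left, 1))          # → 12.2 more hours at the current rate
```

The striking part is the slowdown itself: 570 rows/minute on a small table dropping to 104 rows/minute as the indexed tables grow, which points at index maintenance cost rather than raw insert throughput.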
I read section 7.2.14 of the reference manual, "Speed of INSERT Statements." I use only INSERT and SELECT statements, and I don't know which of those techniques I can apply to my program. I don't think I can insert multiple rows at a time, since I need to process the input file one row at a time. Would INSERT DELAYED help here? Would it be worthwhile to rewrite my program completely to generate a series of input files that I could use with LOAD DATA INFILE? I'm guessing that the biggest speed-up would come from disabling the indexes, loading the data, then recreating the indexes with myisamchk (I can't use the DISABLE/ENABLE KEYS of MySQL 4.0, as I'm running 3.23.49). Would it be worthwhile to rewrite my program to use more than one thread at a time?
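One note on multi-row INSERTs: reading the input file one record at a time doesn't rule them out, because rows can be accumulated in memory and flushed as a single statement every few hundred records. A minimal sketch of building such a statement (in Python for illustration; the table and column names here are made up, and a real Perl script would use DBI's quote() rather than this naive escaping):

```python
def batch_insert(table, columns, rows):
    """Build one multi-row INSERT statement from a list of row tuples.

    Escaping here is deliberately simplistic (backslashes and single
    quotes only) -- illustration, not production quoting.
    """
    def quote(v):
        return "'" + str(v).replace("\\", "\\\\").replace("'", "\\'") + "'"
    values = ", ".join(
        "(" + ", ".join(quote(v) for v in row) + ")" for row in rows
    )
    return "INSERT INTO %s (%s) VALUES %s" % (table, ", ".join(columns), values)

# Accumulate rows while reading the file record by record, then flush
# every N rows (e.g. 500) with one statement instead of N statements.
sql = batch_insert("author", ["lastname", "firstname"],
                   [("Smith", "J"), ("O'Brien", "K")])
print(sql)
# → INSERT INTO author (lastname, firstname) VALUES ('Smith', 'J'), ('O\'Brien', 'K')
```

The same buffering idea applies to LOAD DATA INFILE, except the batches are written to a temporary tab-delimited file and loaded from there instead of being sent as statements.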
Any suggestions on what I can try? Any advice on diagnostics I can run to determine the bottleneck?
Thanks for all your help and suggestions on my problem.
-Kevin Zembower
-----
E. Kevin Zembower
Internet Systems Group manager
Johns Hopkins University
Bloomberg School of Public Health
Center for Communications Programs
111 Market Place, Suite 310
Baltimore, MD 21202
410-659-6139
--
MySQL Perl Mailing List
For list archives: http://lists.mysql.com/perl
To unsubscribe: http://lists.mysql.com/perl?unsub=gcdmp-msql-mysql-modules@m.gmane.org