Help needed - SELECT query optimization

Help needed - SELECT query optimization

am 26.10.2008 23:04:42 von Nitsan Bin-Nun

------=_Part_28072_8982556.1225058682620
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 7bit
Content-Disposition: inline

Hi,
I have a mysql database with one table in it, currently it contains 36,807
records and it costs 6.8MB of space, every night another thousand~ of rows
are being added to the table.
The scheme (as taken from phpmyadmin) is like follows:

CREATE TABLE IF NOT EXISTS `search` (
`id` int(11) NOT NULL auto_increment,
`time` int(10) default NULL,
`unique` varchar(255) collate utf8_unicode_ci default NULL,
`site` varchar(50) collate utf8_unicode_ci default NULL,
`url` varchar(255) collate utf8_unicode_ci default NULL,
`filename` varchar(255) collate utf8_unicode_ci default NULL,
`snippet` varchar(255) collate utf8_unicode_ci default NULL,
`tags` varchar(255) collate utf8_unicode_ci default NULL,
`password` varchar(255) collate utf8_unicode_ci default NULL,
PRIMARY KEY (`id`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci
AUTO_INCREMENT=44306 ;


insert example:

INSERT INTO `search` VALUES (null, 1225041602, '110755357', 'rapidshare', '
http://rapidshare.com/files/110755357/Taxi4.By.HuNTeR.part1. rar',
'Taxi4.By.HuNTeR.part1.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110756297', 'rapidshare', '
http://rapidshare.com/files/110756297/Taxi4.By.HuNTeR.part2. rar',
'Taxi4.By.HuNTeR.part2.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110767009', 'rapidshare', '
http://rapidshare.com/files/110767009/Taxi4.By.HuNTeR.part3. rar',
'Taxi4.By.HuNTeR.part3.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110768015', 'rapidshare', '
http://rapidshare.com/files/110768015/Taxi4.By.HuNTeR.part4. rar',
'Taxi4.By.HuNTeR.part4.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110779013', 'rapidshare', '
http://rapidshare.com/files/110779013/Taxi4.By.HuNTeR.part5. rar',
'Taxi4.By.HuNTeR.part5.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110792243', 'rapidshare', '
http://rapidshare.com/files/110792243/Taxi4.By.HuNTeR.part6. rar',
'Taxi4.By.HuNTeR.part6.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '',
'--'),(null, 1225041602, '110793721', 'rapidshare', '
http://rapidshare.com/files/110793721/Taxi4.By.HuNTeR.part7. rar',
'Taxi4.By.HuNTeR.part7.rar', 'Taxi 4 (2007) DVDRip XviD - KL', '', '--');

It should be a rapidshare links database (which updates with a PHP crawler I
wroted last Saturday).
I would like to change the snippet to title and add another column for the
snippet,
My main queries are INSERT's - a big insert (usually 10-100 links per
insert) in each hour.
My crawler is checking if the link it is about to add is already in the
database with a select query like this:
SELECT `id` FROM `tbl_name` WHERE `unique` = '%_UNIQUE_VARIABLE_%'

I'm definiatly not an database scheme expert, I'm looking to optimaize the
table for fast select queries (to check if file is already exists)
And for fast filename-based search (select * from `tbl_name` where
`filename` like '%_WHATEVER_%')

The unique colmn is used for other websites unique ID's as well so I don't
think I have any chance to crop it's length.
I probably can change the site colmn to numeric id or something (I'm just
about doing that) - but this is not the real problems, I have heard about
something called Indexes but I have no idea what this is about.

I hope I will find an answer, or a path where to look for in order to get
this table optimaized,

Thanks in Advance,
Nitsan

------=_Part_28072_8982556.1225058682620--

Re: Help needed - SELECT query optimization

am 26.10.2008 23:20:48 von dmagick

> It should be a rapidshare links database (which updates with a PHP crawler I
> wroted last Saturday).
> I would like to change the snippet to title and add another column for the
> snippet,
> My main queries are INSERT's - a big insert (usually 10-100 links per
> insert) in each hour.
> My crawler is checking if the link it is about to add is already in the
> database with a select query like this:
> SELECT `id` FROM `tbl_name` WHERE `unique` = '%_UNIQUE_VARIABLE_%'

create index unique_idx on tbl_name(unique);

mysql might work better with this index:

create index unique_idx on tbl_name(unique, id);

because in some cases it doesn't have to go back to the data file to get
the actual data, it can just read everything from the index.

> I'm definiatly not an database scheme expert, I'm looking to optimaize the
> table for fast select queries (to check if file is already exists)
> And for fast filename-based search (select * from `tbl_name` where
> `filename` like '%_WHATEVER_%')

like queries are harder to optimize.

If you put a wildcard at the front:

like '%...%';

the db can't use an index to find it, because you're saying the text can
be anywhere and for that type of search you're best off setting up
fulltext (http://dev.mysql.com/doc/refman/5.0/en/fulltext-search.html ).

If you don't put a % at the front:

like '...%';

the db use an index to find it (up until the wildcard)..

create index filename_idx on tbl_name(filename);


To understand indexing, check out my article:

http://www.designmagick.com/article/16/PostgreSQL/How-to-ind ex-a-database

(Yes it's on a postgres site but the ideas/understanding work for all
db's - and the commands should even work for mysql).

--
Postgresql & php tutorials
http://www.designmagick.com/


--
PHP Database Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php