A better select?

am 20.02.2007 17:55:49 von John Drako

This is for MySQL 5.0

I took over some php/mysql code and I'm trying to optimize it if
possible.

I have one select statement on an indexed varchar field that gets built
and used quite frequently. The field contains a series of keywords or
tags. The query is supposed to find the user's selected tags and
sometime find the rows that exclude certain tags.

The existing query is like so:

SELECT * FROM tableName WHERE tagsField LIKE '%tag1%' AND tagsField LIKE
'%tag2%' AND tagsField NOT LIKE '%tag3%' AND NOT LIKE '%tag4%';

the tags selected or excluded could reach as many as 30 making the query
long.

Is there a better way of doing this type of query?

TIA

John

Re: A better select?

am 20.02.2007 22:12:37 von Paul Lautman

Re: A better select?

am 20.02.2007 23:00:40 von John Drako

On Tue, 20 Feb 2007 16:12:37 -0500, Paul Lautman wrote
(in article <5416i5F1uq6qjU1@mid.individual.net>):

> John Drako wrote:
>> This is for MySQL 5.0
>>
>> I took over some php/mysql code and I'm trying to optimize it if
>> possible.
>>
>> I have one select statement on an indexed varchar field that gets
>> built and used quite frequently. The field contains a series of
>> keywords or tags. The query is supposed to find the user's selected
>> tags and sometime find the rows that exclude certain tags.
>>
>> The existing query is like so:
>>
>> SELECT * FROM tableName WHERE tagsField LIKE '%tag1%' AND tagsField
>> LIKE '%tag2%' AND tagsField NOT LIKE '%tag3%' AND NOT LIKE '%tag4%';
>>
>> the tags selected or excluded could reach as many as 30 making the
>> query long.
>>
>> Is there a better way of doing this type of query?
>>
>> TIA
>>
>> John
>
> First thing to say is that there is a better way of organising this
> type of data! You should not have lots of values in a single field.
> You should have a 1-to-many table where each tag occupies a separate
> row.

OK, understood. But, how would that enhance the search? How would you
write such a query?

The field containing these tags gets displayed on the site after
fetching the rows. So if each tag is in a row of its own, a function has
to be built to create one string out of all the rows that belong to a
certain article. Otherwise, we would need data replication to have a
field containing all the tags for display purposes and the other table
for search. And we would still end up with a long query.

I asked the question because in some programming languages you have fast
functions that can compare a string to an array of strings, so you
simply give it a main string and an array of strings and it give you
back whether any or all of the strings in the array can be found in the
main string. I was hoping that SQL has something similar that I couldn't
find on my own.

Re: A better select?

am 21.02.2007 00:00:59 von Paul Lautman

John Drako wrote:
> On Tue, 20 Feb 2007 16:12:37 -0500, Paul Lautman wrote
> (in article <5416i5F1uq6qjU1@mid.individual.net>):
>
>> John Drako wrote:
>>> This is for MySQL 5.0
>>>
>>> I took over some php/mysql code and I'm trying to optimize it if
>>> possible.
>>>
>>> I have one select statement on an indexed varchar field that gets
>>> built and used quite frequently. The field contains a series of
>>> keywords or tags. The query is supposed to find the user's selected
>>> tags and sometime find the rows that exclude certain tags.
>>>
>>> The existing query is like so:
>>>
>>> SELECT * FROM tableName WHERE tagsField LIKE '%tag1%' AND tagsField
>>> LIKE '%tag2%' AND tagsField NOT LIKE '%tag3%' AND NOT LIKE '%tag4%';
>>>
>>> the tags selected or excluded could reach as many as 30 making the
>>> query long.
>>>
>>> Is there a better way of doing this type of query?
>>>
>>> TIA
>>>
>>> John
>>
>> First thing to say is that there is a better way of organising this
>> type of data! You should not have lots of values in a single field.
>> You should have a 1-to-many table where each tag occupies a separate
>> row.
>
> OK, understood. But, how would that enhance the search?
I assume that the reason for all the LIKE '%something%' is that the tags are
all mixed up in there.
So you can't use an index on that field. For each of those LIKEs and for
every row, MySQL has to look through the whole row every time.

> How would you
> write such a query?
I would most likely build a query with a series of JOINs.

> The field containing these tags gets displayed on the site after
> fetching the rows. So if each tag is in a row of its own, a function
> has to be built to create one string out of all the rows that belong
> to a certain article.
Nope. If you need to put values from lots of lines together you can use
GROUP_CONCAT(). However in this case when you built the query, you could
just create the "string" using CONCAT_WS() as the values will end up on one
row (due to the JOINs)

> Otherwise, we would need data replication to
> have a field containing all the tags for display purposes and the
> other table for search.
Nope, see above
> And we would still end up with a long query.
The query will be long as you have a lot of individual criteria. No way of
avoiding that. However the performance of the query can still be greatly
enhanced.

> I asked the question because in some programming languages you have
> fast functions that can compare a string to an array of strings, so
> you simply give it a main string and an array of strings and it give
> you back whether any or all of the strings in the array can be found
> in the main string. I was hoping that SQL has something similar that
> I couldn't find on my own.

Re: A better select?

am 23.02.2007 23:25:16 von Christoph Burschka

John Drako wrote:
> This is for MySQL 5.0
>
> I took over some php/mysql code and I'm trying to optimize it if
> possible.
>
> I have one select statement on an indexed varchar field that gets built
> and used quite frequently. The field contains a series of keywords or
> tags. The query is supposed to find the user's selected tags and
> sometime find the rows that exclude certain tags.
>
> The existing query is like so:
>
> SELECT * FROM tableName WHERE tagsField LIKE '%tag1%' AND tagsField LIKE
> '%tag2%' AND tagsField NOT LIKE '%tag3%' AND NOT LIKE '%tag4%';
>
> the tags selected or excluded could reach as many as 30 making the query
> long.
>
> Is there a better way of doing this type of query?
>
> TIA
>
> John
>
As already suggested, you can greatly improve the performance by normalizing the
database a bit.

In this case, let's say you have a table like the following (I don't know the
actual other fields after all):

"articles" columns:
id int(8) primary key,
title varchar(32),
tagsField text.

What you should make from this are these two tables:

"articles" columns:
id int(8) primary key,
title varchar(32).

"tags" columns:
articleId int(8),
tag varchar(32),
primary key(articleId, tag)

In this structure, each *article* would be saved only once in articles (with its
own unique number), but there would be a separate row for each tag. Then, you
can do the above query like this:

> SELECT * FROM articles JOIN tags ON id=articleId WHERE tag IN
> ('tag1','tag2','tag3') GROUP BY id;

(Excluding certain tags is a bit more complex, but is also possible.)

--
Christoph Burschka
Math.-Techn. Assistent i.A.
-------------------------------------------------
RWTH Aachen
Rechen- und Kommunikationszentrum
Dienstgebäude Seffenter Weg 23
52074 Aachen
Tel: +49 (241) 80-20376
Fax: +49 (241) 80-29100
-------------------------------------------------
PGP: http://pgp.mit.edu:11371/pks/lookup?op=get&search=0xFF4BDDE8