Mammoth amounts of self joins

am 22.03.2006 17:56:16 von sks

I have a table that contains keywords (Varchars) each one mapped to a
product. so the database schema is just an id column, product column and
keyword column.

I want to select the products that contain x,y,z keywords. Now if this query
involves many keywords I end up with a massive amount of self joins on the
keywords table, is there a better way to do this than self joins? What would
work perhaps is multiple unions where I could discard non duplicated rows
but you can't seem to do that. Any ideas... ?

Re: Mammoth amounts of self joins

am 22.03.2006 18:41:51 von avidfan

sks wrote:

> I have a table that contains keywords (Varchars) each one mapped to a
> product. so the database schema is just an id column, product column and
> keyword column.

> I want to select the products that contain x,y,z keywords. Now if this query
> involves many keywords I end up with a massive amount of self joins on the
> keywords table, is there a better way to do this than self joins? What would
> work perhaps is multiple unions where I could discard non duplicated rows
> but you can't seem to do that. Any ideas... ?

Show us the queries you have come up with and why they don't work and
provide some sample data (does not have to be real data).

Is there more than one table involved?

What do you consider "many keywords"?

Why are you doing "self-joins"?

Do the keywords map to more than one product?

Is this for a school project or some job that you have?

Re: Mammoth amounts of self joins

am 24.03.2006 15:55:44 von sks

"noone" wrote in message
news:f287d760f97d97302a1ea56b308183f0$1@www.firstdbasource.c om...
> sks wrote:
>
>> I have a table that contains keywords (Varchars) each one mapped to a
>> product. so the database schema is just an id column, product column and
>> keyword column.
>
>> I want to select the products that contain x,y,z keywords. Now if this
>> query
>> involves many keywords I end up with a massive amount of self joins on
>> the
>> keywords table, is there a better way to do this than self joins? What
>> would
>> work perhaps is multiple unions where I could discard non duplicated rows
>> but you can't seem to do that. Any ideas... ?
>
> Show us the queries you have come up with and why they don't work and
> provide some sample data (does not have to be real data).
> Is there more than one table involved?
> What do you consider "many keywords"?
>
> Why are you doing "self-joins"?
> Do the keywords map to more than one product?
> Is this for a school project or some job that you have?

It's for a software package I write.

The keywords are used as meta data about an item. Each of the keywords is
for a particular attribute that is relevant for certain items only. This is
all definable by the user at runtime. Let's say we had a site selling DVDs,
the attributes in this case might be Director, Actor, and Genre. Each of
these attributes might have multiple values (or none).

Here is a simple schema to demonstrate

Items
--------------
id
name

Attributes
----------
id
name

Attributes_Values
----------
id
item
value

I might want to find all DVDs that had Clint Eastwood, or Tommy Lee Jones,
or Richard Burton in as Actor, and where Comedies, Action or Drama in Genre.

I could self join like this

select distinct i.* from item i join attributes_values av1 on i.id=av1.item
join attributes_values av2 on i.id=av2.item where (av1.value='Clint
Eastwood' or av1.value='Tommy Lee Jones' ......) and av2.value='Comedy' or
av2.value='Drama' ....) where i.status='Live' .... (other item related
clauses here).

This runs extremely slowly when I have more than a few attributes. 9 or 10
is quite common on some items and then you could be searching for 5 to 10
values in each attribute.

My faster solution (and the one I use ) is to select the item column from
the attribute_values table and union it for each different attribute I am
searching against, then group that result and select from it where count is
equal to the number of attributes I searched for and then join that against
the item table. It's fast but I think ugly.

I am wondering if there is an obvious solution I have missed.

Re: Mammoth amounts of self joins

am 24.03.2006 19:58:13 von Bill Karwin

"sks" wrote in message
news:GIGdnevEw4EAlbnZnZ2dnUVZ8t2dnZ2d@eclipse.net.uk...
> My faster solution (and the one I use ) is to select the item column from
> the attribute_values table and union it for each different attribute I am
> searching against, then group that result and select from it where count
> is equal to the number of attributes I searched for and then join that
> against the item table. It's fast but I think ugly.

This solution also scales much better than the self-join method. Keep in
mind that MySQL has a limit of 31 joined tables per query (or 63 if you use
a 64-bit operating system). It's likely that a 31-way join performs very
poorly.

The Entity-Attribute-Value data model (or EAV) is hard to use right, and it
violates a few principles of relational modeling and normalization.

See also my reply to today's thread with subject, "Joining data from 2
fields into the first one".

Regards,
Bill K.