Excluding records that don"t match condition

Excluding records that don"t match condition

am 24.09.2009 12:42:47 von James Fryer

I have a database of Titles (books, magazines, etc.) with a M:M relation
to Keywords.

I would like to be able to generate queries for the condition "Return
titles matching X with keywords NOT matching A".

This seems quite hard to do. Here is a minimal table structure:

CREATE TABLE Title
(
id int(10) unsigned NOT NULL,
title CHAR(2)
);
CREATE TABLE Keyword
(
id int(10) unsigned NOT NULL,
kw CHAR(1)
);
CREATE TABLE TitleKeyword
(
title_id int(10) unsigned NOT NULL,
keyword_id int(10) unsigned NOT NULL
);

# X1: A, B
# X2: B, C
# X3: C
# Y1: A, B
# Y2: B, C
# Y3: C
INSERT INTO Title (id, title) VALUES (1, 'X1'), (2, 'X2'), (3, 'X3'),
(4, 'Y1'), (5, 'Y2'), (6, 'Y3');
INSERT INTO Keyword (id, kw) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO TitleKeyword VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 3),
(4, 1), (4, 2), (5, 2), (5, 3), (6, 3);

Naively I tried this query:

SELECT DISTINCT
title
FROM
Title
JOIN
TitleKeyword ON Title.id=title_id
JOIN
Keyword ON Keyword.id=keyword_id
WHERE
title LIKE 'X%'
AND kw <> 'A'
;

but this includes X1 because it matches B as well as A. I only want X2,
X3 returned.

This works:

SELECT
title
FROM
Title
WHERE title LIKE 'X%'
AND Title.id NOT IN(
SELECT
title_id
FROM
TitleKeyword
JOIN
Keyword ON Keyword.id=keyword_id
WHERE
kw = 'A'
)
;

However, this uses subselects which I have always found slow, and there
may be many keywords (thousands), and I believe IN() is not recommended
for large lists.

So my question is, can this query be rewritten to use JOINs? It seems
hard to me because all the keywords need to be examined to eliminate the
title. On the other hand it must be a common requirement so there may be
something I have overlooked.

Many thanks in advance,

James
--
James Fryer / jim@invocrown.com / jim@cix.co.uk

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Excluding records that don"t match condition

am 24.09.2009 17:35:57 von Mike Spreitzer

I'm not sure whether the following will meet your needs. Have you
considered

SELECT title FROM Title WHERE NOT EXISTS (SELECT * FROM Keyword,
TitleKeyword WHERE Keyword.kw='A' AND Keyword.id=TitleKeyword.keyword_id
AND TitleKeyword.title_id=Title.id)

Regards,
Mike Spreitzer
SMTP: mspreitz@us.ibm.com, Lotus Notes: Mike Spreitzer/Watson/IBM
Office phone: +1-914-784-6424 (IBM T/L 863-)
AOL Instant Messaging: M1k3Sprtzr



James Fryer
09/24/09 06:42 AM

To
mysql@lists.mysql.com
cc

Subject
Excluding records that don't match condition






I have a database of Titles (books, magazines, etc.) with a M:M relation
to Keywords.

I would like to be able to generate queries for the condition "Return
titles matching X with keywords NOT matching A".

This seems quite hard to do. Here is a minimal table structure:

CREATE TABLE Title
(
id int(10) unsigned NOT NULL,
title CHAR(2)
);
CREATE TABLE Keyword
(
id int(10) unsigned NOT NULL,
kw CHAR(1)
);
CREATE TABLE TitleKeyword
(
title_id int(10) unsigned NOT NULL,
keyword_id int(10) unsigned NOT NULL
);

# X1: A, B
# X2: B, C
# X3: C
# Y1: A, B
# Y2: B, C
# Y3: C
INSERT INTO Title (id, title) VALUES (1, 'X1'), (2, 'X2'), (3, 'X3'),
(4, 'Y1'), (5, 'Y2'), (6, 'Y3');
INSERT INTO Keyword (id, kw) VALUES (1, 'A'), (2, 'B'), (3, 'C');
INSERT INTO TitleKeyword VALUES (1, 1), (1, 2), (2, 2), (2, 3), (3, 3),
(4, 1), (4, 2), (5, 2), (5, 3), (6, 3);

Naively I tried this query:

SELECT DISTINCT
title
FROM
Title
JOIN
TitleKeyword ON Title.id=title_id
JOIN
Keyword ON Keyword.id=keyword_id
WHERE
title LIKE 'X%'
AND kw <> 'A'
;

but this includes X1 because it matches B as well as A. I only want X2,
X3 returned.

This works:

SELECT
title
FROM
Title
WHERE title LIKE 'X%'
AND Title.id NOT IN(
SELECT
title_id
FROM
TitleKeyword
JOIN
Keyword ON Keyword.id=keyword_id
WHERE
kw = 'A'
)
;

However, this uses subselects which I have always found slow, and there
may be many keywords (thousands), and I believe IN() is not recommended
for large lists.

So my question is, can this query be rewritten to use JOINs? It seems
hard to me because all the keywords need to be examined to eliminate the
title. On the other hand it must be a common requirement so there may be
something I have overlooked.

Many thanks in advance,

James
--
James Fryer / jim@invocrown.com / jim@cix.co.uk

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=mspreitz@us.ibm.com




--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org

Re: Excluding records that don"t match condition

am 25.09.2009 09:40:57 von James Fryer

Mike Spreitzer wrote:
> I'm not sure whether the following will meet your needs. Have you
> considered
> SELECT title FROM Title WHERE NOT EXISTS (SELECT * FROM Keyword,
> TitleKeyword WHERE Keyword.kw='A' AND Keyword.id=TitleKeyword.keyword_id
> AND TitleKeyword.title_id=Title.id)

I believe you are right, EXISTS is what I want. Thanks,

James
--
James Fryer / jim@invocrown.com / jim@cix.co.uk

--
MySQL General Mailing List
For list archives: http://lists.mysql.com/mysql
To unsubscribe: http://lists.mysql.com/mysql?unsub=gcdmg-mysql-2@m.gmane.org