Re: Left join returns records it shouldn"t

Re: Left join returns records it shouldn"t

am 24.01.2003 13:56:01 von Alexander Keremidarski

Hello,


>
> "Legally, you could take all the conditions out of the WHERE clause and put
> them in the ON clause"
>
> In the MySQL manual mentioned about
>
> "You should generally not have any conditions in the ON part that are used to
> restrict which rows you have in the result set (there are exceptions to this
> rule)."
>
> but there is no any mention that I can't do it ..
>
> So, it's bug or MySQL feature?
>
> And what are the "exceptions to this rule"?
>
>
>


>>Description:
>
>
> The following silly query (silly because it should return no
> records since it is impossible for a.type to have two different
> values simultaneously), returns two records when it should
> return none with the enclosed test data.
>
> select a.*, b.* from dns_rec a left join dns_rec b
> on (a.zone = b.zone and
> a.rightside = b.rightside and
> a.type = 'A' and b.type = 'A' and
> a.leftside = '' and b.leftside != '')
> where b.zone is null and a.type = 'MX'
> ;
>
>
>>How-To-Repeat:
>
>
> Restore following dump, containing 19 records. Run above query.
> If you drop the index right_idx, you get the correct result, but
> this makes the REAL intended query (the above query without the
> "and a.type = 'MX'" part) run much MUCH slower than is tolerable
> (I've never had time to let it finish) for the full database
> (about 200,000 records). You get the same wrong answer if
> right_idx is an index on rightside only.
>
> This data is from a DNS database. The fields zone and rightside
> have been replaced with a hash, but the problem is still
> reproducable with the hash. The intent of the query is to
> find all the top-level (name is the same as name of the zone,
> e.g. foobar.com) A records which do not have a non-top-level
> A record pointing at the same place (e.g. www.foobar.com).
> I get back almost all the records.
>
> I have tried checking indexes and dropping the table and reloading.
> Corrupted tables or indexes do not seem to be an issue.

Indexes are not related in this case.
Here is minimalistic example:

CREATE TABLE `lj` (
`a` int(11) default NULL,
`b` int(11) default NULL,
`c` int(11) default NULL
) TYPE=MyISAM

insert into lj values(1, 2, 3);
insert into lj values(3, 1, 2);


mysql> select * from lj as l1 left join lj as l2 on (l1.a = l2.a and l1.b=l2.b);
+------+------+------+------+------+------+
| a | b | c | a | b | c |
+------+------+------+------+------+------+
| 1 | 2 | 3 | 1 | 2 | 3 |
| 3 | 1 | 2 | 3 | 1 | 2 |
+------+------+------+------+------+------+

mysql> select * from lj as l1 left join lj as l2 on (l1.a = l2.a and l1.b=l2.b and
l1.c=2);

+------+------+------+------+------+------+
| a | b | c | a | b | c |
+------+------+------+------+------+------+
| 1 | 2 | 3 | NULL | NULL | NULL |
| 3 | 1 | 2 | 3 | 1 | 2 |
+------+------+------+------+------+------+


Take close look at this result. It is correct as by definition Left Join means:

"Include in result set every row from Left table - if there is corresponding row
in Right table matching join condition use this row, if not fill values for Right
table with NULLs)"

This explain why:

mysql> select * from lj as l1 left join lj as l2 on (l1.a = l2.a and l1.b=l2.b and
l1.c=2) where l1.c=3;

+------+------+------+------+------+------+
| a | b | c | a | b | c |
+------+------+------+------+------+------+
| 1 | 2 | 3 | NULL | NULL | NULL |
+------+------+------+------+------+------+


In this case WHERE clause is applied to above result and only rows matching WHERE
clause are left into that result.

i.e. WHERE is used as a filter for rows while ON() is used for "join decision" -
which rows are to be joined or not.


I agreee MySQL manual does not cover this case very well, but it is not a bug.



Back to your original data and query.
You have classical problem with Tree structure (rcursive by nature) represented in
Relational DB table - Unordered Set of Tuples.

This is known for decades as challenging problem even at theoretical level.

> reproducable with the hash. The intent of the query is to
> find all the top-level (name is the same as name of the zone,
> e.g. foobar.com) A records which do not have a non-top-level
> A record pointing at the same place (e.g. www.foobar.com).


Probably you can achieve this with joining table one more time, but this is
shooting in a dark quess :)


Best regards

--
MySQL Conference and Expo 2003 http://www.mysql.com/events/uc2003/index.html
For technical support contracts, visit https://order.mysql.com/?ref=msal
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Alexander Keremidarski
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, Full-Time Developer
/_/ /_/\_, /___/\___\_\___/ Sofia, Bulgaria
<___/ www.mysql.com




------------------------------------------------------------ ---------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail bugs-thread13575@lists.mysql.com
To unsubscribe, e-mail