RE: MySQL Bug: Outer Joins producing incorrect result on a single

RE: MySQL Bug: Outer Joins producing incorrect result on a single

am 01.10.2002 18:06:03 von Jean-Pierre Pelletier

Hi,

Here is how I think it should work.
- Joins should be processed from left to right (in the absence of nested
joins).

- Each join is for exactly 2 table expressions.

- The table expression on the left side of a join is the result of all
preceding joins
or the first table in the from clause if there are no preceding joins.

- The table expression on the right side of a join is the single table named
unless
nested joins are used.

- I believed than to be consistent with this, in the on clause of a join,
forward references to column of tables that haven't yet been joined should
be disallowed.


These rules are for SQL-92 inner/outer join, I don't use much old style
joins with commas.
I am not sure how the joins should be ordered when old-style joins and
outer-joins are used
as in your last example.


I haven't found much documentation about the join processing
so what I know come from experience with Ms SQL Server
and more recently DB2. I have used Oracle until 8i but it didn't have
SQL-92 compliant outer joins before 9i.


Here is the only reference I found about join processing
but interestingly it's about multi-table joins and
it's published by Oracle which interpretation of the standard
shouldn't be taken lightly.

Mutable Joins are those where more than two tables are joined.
The SQL: 1999 standard assumes the tables are joined from the left to the
right,
with the join conditions only being able to reference columns relating to
the current join and any previous joins to the left
http://www.oracle-base.com/Articles/9i/ANSIISOSQLSupport.asp


The above rules applied to the 2 cases I submitted would give
what I called the right answers.


You said:
"you would get a lot of problems using LEFT join in
combination with many tables where you only want to guard against
missing rows in one the tables (which I believe is the most common way
to use LEFT join)."

It's possible to apply the rules I described without problems
that's what MS SQL Server and DB2 do and Oracle has published
their interpretation of the SQL-1999 rules to be along the same lines.

How-To-Repeat:

FIRST CASE:

CREATE TABLE Object (Object_ID INTEGER NOT NULL, Object_Table VARCHAR(6) NOT
NULL);
CREATE TABLE NMID (Appliance_ID INTEGER NOT NULL, NMID_NMID INTEGER,
Object_ID INTEGER NOT NULL);
CREATE TABLE Health (Appliance_ID INTEGER NOT NULL, Health_NMID INTEGER NOT
NULL, Health_DeviceOrPort VARCHAR(6) NOT NULL);

INSERT INTO Object(Object_ID, Object_Table) VALUES (101,'Device');
INSERT INTO Object(Object_ID, Object_Table) VALUES (201,'Device');
INSERT INTO Object(Object_ID, Object_Table) VALUES (202,'Port');
INSERT INTO Object(Object_ID, Object_Table) VALUES (302,'Port');

INSERT INTO NMID (Appliance_ID, NMID_NMID, Object_ID) VALUES (0, 1, 101);
INSERT INTO NMID (Appliance_ID, NMID_NMID, Object_ID) VALUES (0, 2, 201);
INSERT INTO NMID (Appliance_ID, NMID_NMID, Object_ID) VALUES (0, 2, 202);
INSERT INTO NMID (Appliance_ID, NMID_NMID, Object_ID) VALUES (0, 3, 302);

INSERT INTO Health (Appliance_ID, Health_NMID, Health_DeviceOrPort) VALUES
(0, 1, 'Device');
INSERT INTO Health (Appliance_ID, Health_NMID, Health_DeviceOrPort) VALUES
(0, 2, 'Device');
INSERT INTO Health (Appliance_ID, Health_NMID, Health_DeviceOrPort) VALUES
(0, 3, 'Device');
INSERT INTO Health (Appliance_ID, Health_NMID, Health_DeviceOrPort) VALUES
(0, 4, 'Device');

===> THE RIGHT JOIN REFERENCES ALL THREE TABLES, it can't be processed first

SELECT DISTINCT
H.Appliance_ID, H.Health_NMID, H.Health_DeviceOrPort
FROM
Object O

INNER JOIN NMID N
ON O.Object_ID = N.Object_ID

RIGHT OUTER JOIN Health H
ON H.Appliance_ID = N.Appliance_ID
AND H.Health_NMID = N.NMID_NMID
AND H.Health_DeviceOrPOrt = O.Object_Table
WHERE
H.Appliance_ID = 0
AND O.Object_ID IS NULL;

The following 2 rows should be returned

Appliance_ID Health_NMID Health_DeviceOrPort
------------ ----------- -------------------
0 3 Device
0 4 Device


On MYSQL, This incorrectly returns no rows.


SECOND CASE)


create table T1 (ID integer);
create table T2 (ID integer);
create table T3 (ID integer);

insert into T1 values (1);
insert into T1 values (2);
insert into T1 values (3);

insert into T2 values (2);
insert into T2 values (3);
insert into T2 values (4);

insert into T3 values (3);
insert into T3 values (4);
insert into T3 values (5);

SELECT
T1.ID AS T1,
T2.ID AS T2,
T3.ID AS T3
FROM
T2

LEFT OUTER JOIN T3
ON T2.ID = T3.ID

RIGHT OUTER JOIN T1
ON T1.ID = T2.ID
ORDER BY
T1.ID,
T2.ID,
T3.ID;

The following 3 rows should be returned

+------+------+------+
| T1 | T2 | T3 |
+------+------+------+
| 1 | NULL | NULL |
| 2 | 2 | NULL |
| 3 | 3 | 3 |
+------+------+------+

MySQL incorrectly returns these 9 rows.

+------+------+------+
| T1 | T2 | T3 |
+------+------+------+
| 1 | 2 | NULL |
| 1 | 3 | NULL |
| 1 | 4 | NULL |
| 2 | 2 | NULL |
| 2 | 3 | NULL |
| 2 | 4 | NULL |
| 3 | 2 | NULL |
| 3 | 3 | 3 |
| 3 | 4 | NULL |
+------+------+------+


Regards,
Jean-Pierre Pelletier

-----Original Message-----
From: Michael Widenius [mailto:monty@mysql.com]
Sent: Tuesday, October 01, 2002 2:04 AM
To: Jean-Pierre Pelletier
Cc: bugs@lists.mysql.com; Mark Matthews; Robert Berman; Eric Neron;
Matthew Darwin; Allan Mertner
Subject: RE: MySQL Bug: Outer Joins producing incorrect result on a
single join referencing multiple tables



Hi!

>>>>> "Jean-Pierre" == Jean-Pierre Pelletier
writes:

JP> How-To-Repeat:

JP> FIRST CASE:

JP> CREATE TABLE Object (Object_ID INTEGER NOT NULL, Object_Table VARCHAR(6)
NOT
JP> NULL);



JP> ===> THE RIGHT JOIN REFERENCES ALL THREE TABLES, it can't be processed
first

JP> SELECT DISTINCT
JP> H.Appliance_ID, H.Health_NMID, H.Health_DeviceOrPort
JP> FROM
JP> Object O

JP> INNER JOIN NMID N
JP> ON O.Object_ID = N.Object_ID

JP> RIGHT OUTER JOIN Health H
JP> ON H.Appliance_ID = N.Appliance_ID
JP> AND H.Health_NMID = N.NMID_NMID
JP> AND H.Health_DeviceOrPOrt = O.Object_Table
JP> WHERE
JP> H.Appliance_ID = 0
JP> AND O.Object_ID IS NULL;

JP> The following 2 rows should be returned

JP> Appliance_ID Health_NMID Health_DeviceOrPort
JP> ------------ ----------- -------------------
JP> 0 3 Device
JP> 0 4 Device


JP> On MYSQL, This incorrectly returns no rows.

If you run EXPLAIN on the above query you get:

+-----------------------------------------------------+
| Comment |
+-----------------------------------------------------+
| Impossible WHERE noticed after reading const tables |
+-----------------------------------------------------+
1 row in set (0.00 sec)

This is because MySQL notices that O.Object_ID can never be NULL and
thus the query is impossible.

The logic MySQL uses with RIGHT JOIN is that if the RIGHT table (in
this case table H) doesn't have a any rows that satisfies the ON
clause then we generate a NULL-row for this table. MySQL doesn't
generate any NULL-rows for the O table as this is not part of the
RIGHT join.

I don't know if this is the correct way to do this; I have always
thought if you don't have braces RIGHT JOIN and LEFT JOIN will only
generate a NULL-row for the table that is immediate before or after,
not to a set of previous tables.

I did a new check in 'SQL-99 Complete, Really' but could not find
anything that proves or disapproves this logic :(
Do you have any reference that shows how this should work when there
is more than two tables involved ?



JP> create table T1 (ID integer);
JP> create table T2 (ID integer);
JP> create table T3 (ID integer);

JP> insert into T1 values (1);
JP> insert into T1 values (2);
JP> insert into T1 values (3);

JP> insert into T2 values (2);
JP> insert into T2 values (3);
JP> insert into T2 values (4);

JP> insert into T3 values (3);
JP> insert into T3 values (4);
JP> insert into T3 values (5);

JP> SELECT
JP> T1.ID AS T1,
JP> T2.ID AS T2,
JP> T3.ID AS T3
JP> FROM
JP> T2

JP> LEFT OUTER JOIN T3
JP> ON T2.ID = T3.ID

JP> RIGHT OUTER JOIN T1
JP> ON T1.ID = T2.ID
JP> ORDER BY
JP> T1.ID,
JP> T2.ID,
JP> T3.ID;

JP> The following 3 rows should be returned

JP> +------+------+------+
JP> | T1 | T2 | T3 |
JP> +------+------+------+
JP> | 1 | NULL | NULL |
JP> | 2 | 2 | NULL |
JP> | 3 | 3 | 3 |
JP> +------+------+------+

JP> MySQL incorrectly returns these 9 rows.

JP> +------+------+------+
JP> | T1 | T2 | T3 |
JP> +------+------+------+
JP> | 1 | 2 | NULL |
JP> | 1 | 3 | NULL |
JP> | 1 | 4 | NULL |
JP> | 2 | 2 | NULL |
JP> | 2 | 3 | NULL |
JP> | 2 | 4 | NULL |
JP> | 3 | 2 | NULL |
JP> | 3 | 3 | 3 |
JP> | 3 | 4 | NULL |
JP> +------+------+------+

First, the result you gave can't be correct as T2 can never be NULL.

An EXPLAIN for the above shows the join order MySQL uses:

----------------+
| table | type | possible_keys | key | key_len | ref | rows | Extra
|
+-------+------+---------------+------+---------+------+---- --+-------------
--------------------+
| T2 | ALL | NULL | NULL | NULL | NULL | 3 | Using
temporary; Using filesort |
| T1 | ALL | NULL | NULL | NULL | NULL | 3 |
|
| T3 | ALL | NULL | NULL | NULL | NULL | 3 |
|
+-------+------+---------------+------+---------+------+---- --+-------------
--------------------+

In other words, MySQL will solve the JOIN the following way:

- Read a row from T2
- Read through all rows from T1
- Read through all rows from T3
If there is no rows matching the ON clause generate an extra
NULL-row in T3

- As T2 is used on the LEFT side of a LEFT join, we don't generate
NULL-rows for this.
- As T1 is on the RIGHT side of a RIGHT join, we don't generate NULL
rows for this.

I think the above is the only logical way to do this. If this would
not be the logic, you would get a lot of problems using LEFT join in
combination with many tables where you only want to guard against
missing rows in one the tables (which I belive is the most common way
to use LEFT join).

For example:

SELECT * from customer_group, order LEFT JOIN customer USING
(customer_id), product WHERE ....

In this case you don't want to have generated null-rows for the
customer_group, product or order tables as these is not part of the
left join.

To repeat: MySQL interpretes the above the above LEFT join to mean

- You have to read 'order' before customer.
- If there is no matching customer, generate a null row in the
customer table.
- Other tables are not affected by the LEFT join.


Regards,
Monty

--
For technical support contracts, goto https://order.mysql.com/
__ ___ ___ ____ __
/ |/ /_ __/ __/ __ \/ / Mr. Michael Widenius
/ /|_/ / // /\ \/ /_/ / /__ MySQL AB, CTO
/_/ /_/\_, /___/\___\_\___/ Helsinki, Finland
<___/ www.mysql.com

------------------------------------------------------------ ---------
Before posting, please check:
http://www.mysql.com/manual.php (the manual)
http://lists.mysql.com/ (the list archive)

To request this thread, e-mail bugs-thread12632@lists.mysql.com
To unsubscribe, e-mail