An optimum query..?

am 03.04.2008 03:39:07 von dsdevonsomer

Hello friends,
I have one simple question. I have two tables. 1 ( Table A ) has about
2.5 million rows and second one ( Table B ) has about 1 million. There
are common ID fields in both tables. I want join them on ID field and
get all rows of Table A which are not in there Table B.

When I ran following two queries, I got same result set, but time it
took was very different.

Following query took 1:35 minutes
SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
FROM [Table A] Tbl1
WHERE NOT exists (
SELECT 1 from [Table B] Tbl2 WHERE
Tbl1.UID = Tbl2.UID )

vs this one took .45 seconds.

SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
Tbl2.UID
WHERE Tbl2.UID IS NULL

Which option is better ? I have subsequent joins to another table
which has about 2 mil more rows and trying to optimize the response
time.

I appreciate all help from the community.
JB

Re: An optimum query..?

am 03.04.2008 04:21:33 von Dan Guzman

> Which option is better ? I have subsequent joins to another table
> which has about 2 mil more rows and trying to optimize the response
> time.

I would have expected both the NOT EXISTS and the LEFT JOIN...WHERE..IS NULL
to yield the same plan and performance. However, the second query you
posted is invalid (has a WHERE in the JOIN clause). Can you post the actual
table DDL and queries?

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

wrote in message
news:195e21cb-f77d-48ef-b040-bbb4c403750d@m73g2000hsh.google groups.com...
> Hello friends,
> I have one simple question. I have two tables. 1 ( Table A ) has about
> 2.5 million rows and second one ( Table B ) has about 1 million. There
> are common ID fields in both tables. I want join them on ID field and
> get all rows of Table A which are not in there Table B.
>
> When I ran following two queries, I got same result set, but time it
> took was very different.
>
> Following query took 1:35 minutes
> SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> FROM [Table A] Tbl1
> WHERE NOT exists (
> SELECT 1 from [Table B] Tbl2 WHERE
> Tbl1.UID = Tbl2.UID )
>
> vs this one took .45 seconds.
>
> SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
> Tbl2.UID
> WHERE Tbl2.UID IS NULL
>
> Which option is better ? I have subsequent joins to another table
> which has about 2 mil more rows and trying to optimize the response
> time.
>
> I appreciate all help from the community.
> JB

Re: An optimum query..?

am 03.04.2008 05:06:55 von dsdevonsomer

On Apr 2, 10:21=A0pm, "Dan Guzman" online.sbcglobal.net> wrote:
> > Which option is better ? I have subsequent joins to another table
> > which has about 2 mil more rows and trying to optimize the response
> > time.
>
> I would have expected both the NOT EXISTS and the LEFT JOIN...WHERE..IS NU=
LL
> to yield the same plan and performance. =A0However, the second query you
> posted is invalid (has a WHERE in the JOIN clause). =A0Can you post the ac=
tual
> table DDL and queries?
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVPhttp://weblogs.sqlteam.com/dang/
>
> wrote in message
>
> news:195e21cb-f77d-48ef-b040-bbb4c403750d@m73g2000hsh.google groups.com...
>
>
>
> > Hello friends,
> > I have one simple question. I have two tables. 1 ( Table A ) has about
> > 2.5 million rows and second one ( Table B ) has about 1 million. There
> > are common ID fields in both tables. I want join them on ID field and
> > get all rows of Table A which are not in there Table B.
>
> > When I ran following two queries, I got same result set, but time it
> > took was very different.
>
> > Following query took 1:35 minutes
> > SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> > NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> > FROM [Table A] Tbl1
> > WHERE NOT exists (
> > SELECT =A01 from [Table B] Tbl2 WHERE
> > Tbl1.UID =3D Tbl2.UID )
>
> > vs this one took .45 seconds.
>
> > SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> > NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> > FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =3D=

> > Tbl2.UID
> > WHERE Tbl2.UID IS NULL
>
> > Which option is better ? I have subsequent joins to another table
> > which has about 2 mil more rows and trying to optimize the response
> > time.
>
> > I appreciate all help from the community.
> > JB- Hide quoted text -
>
> - Show quoted text -

Oh,
second query is just without WHERE clause. It was an honest mistake. I
checked on client statistics and execution plan. In Left Outer join
query, Client processing time 67858, vs in NOT EXISTS query Client
processing time 72074 .

Thanks Dan..

Re: An optimum query..?

am 03.04.2008 13:57:05 von DeanGC

On Apr 2, 8:06 pm, dsdevonso...@gmail.com wrote:
> Oh,
> second query is just without WHERE clause. It was an honest mistake. I
> checked on client statistics and execution plan. In Left Outer join
> query, Client processing time 67858, vs in NOT EXISTS query Client
> processing time 72074 .

What do the query plans look like? Like Dan, I would expect that these
queries would produce the same plan.

Did you run the queries multiple times? If you ran each one only once,
the difference may be due to caching.

Re: An optimum query..?

am 03.04.2008 14:05:36 von Dan Guzman

> second query is just without WHERE clause. It was an honest mistake.

I though that was the case but just wanted be make sure.

> checked on client statistics and execution plan. In Left Outer join
> query, Client processing time 67858, vs in NOT EXISTS query Client
> processing time 72074 .

This is a symptom that the plans for the 2 queries are different. Assuming
the UID column in both tables does not allow NULLs, I would have expected
the same plan for both queries. Can you post the DDL for the tables,
including constraints and indexes?

--
Hope this helps.

Dan Guzman
SQL Server MVP
http://weblogs.sqlteam.com/dang/

wrote in message
news:950eba44-34d6-4585-aafd-ce3ae9eb7d31@8g2000hse.googlegr oups.com...
On Apr 2, 10:21 pm, "Dan Guzman" online.sbcglobal.net> wrote:
> > Which option is better ? I have subsequent joins to another table
> > which has about 2 mil more rows and trying to optimize the response
> > time.
>
> I would have expected both the NOT EXISTS and the LEFT JOIN...WHERE..IS
> NULL
> to yield the same plan and performance. However, the second query you
> posted is invalid (has a WHERE in the JOIN clause). Can you post the
> actual
> table DDL and queries?
>
> --
> Hope this helps.
>
> Dan Guzman
> SQL Server MVPhttp://weblogs.sqlteam.com/dang/
>
> wrote in message
>
> news:195e21cb-f77d-48ef-b040-bbb4c403750d@m73g2000hsh.google groups.com...
>
>
>
> > Hello friends,
> > I have one simple question. I have two tables. 1 ( Table A ) has about
> > 2.5 million rows and second one ( Table B ) has about 1 million. There
> > are common ID fields in both tables. I want join them on ID field and
> > get all rows of Table A which are not in there Table B.
>
> > When I ran following two queries, I got same result set, but time it
> > took was very different.
>
> > Following query took 1:35 minutes
> > SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> > NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> > FROM [Table A] Tbl1
> > WHERE NOT exists (
> > SELECT 1 from [Table B] Tbl2 WHERE
> > Tbl1.UID = Tbl2.UID )
>
> > vs this one took .45 seconds.
>
> > SELECT Tbl1.UID, Tbl1.[LAST NAME], Tbl1.[FIRST NAME], Tbl1.[HOUSE
> > NUMBER], Tbl1.ADDRESS, Tbl1.CITY, Tbl1.STATE
> > FROM [Table A] Tbl1 LEFT OUTER JOIN [Table B] Tbl2 WHERE ON Tbl1.UID =
> > Tbl2.UID
> > WHERE Tbl2.UID IS NULL
>
> > Which option is better ? I have subsequent joins to another table
> > which has about 2 mil more rows and trying to optimize the response
> > time.
>
> > I appreciate all help from the community.
> > JB- Hide quoted text -
>
> - Show quoted text -

Oh,
second query is just without WHERE clause. It was an honest mistake. I
checked on client statistics and execution plan. In Left Outer join
query, Client processing time 67858, vs in NOT EXISTS query Client
processing time 72074 .

Thanks Dan..

Re: An optimum query..?

am 04.04.2008 00:55:24 von Joe Celko

Here is another alternative, if you have 2005 or 2008. Since you did
not use very good data element names, you might not know about SAN
("Standard Address Number") which is used in many industries for
address data. On Oracle and other DBs, the EXCEPT is quite fast.

SELECT san, last_name, first_name, house_nbr, street_name, city_name,
state_code
FROM Addresses
WHERE san
IN
(SELECT san FROM Addresses
EXCEPT
SELECT san FROM Foobar); -- not a great table name ..