Need to find instances of duplicates within a column; joining 2 tables.
am 22.08.2007 01:33:31 von aknoch
My basic situation is this - I ONLY want duplicates, so the opposite
of DISTINCT:
I have two tables. Ordinarily, Table1ColumnA corresponds in a one to
one ratio with Table2ColumnB through a shared variable. So if I query
TableB using the shared variable, there really should only be on
record returned. In essence, if I run this and return TWO rows, it is
very bad:
select * from TableB where SharedVariable = 1234
I know how to join the tables on a single record to see if this is the
case with one record, but I need to find out how many, among possibly
millions of records this affects.
Every record in Table1ColumnA (and also the shared variable) will be
unique. There is another column in Table1 (I'll call it
Table1ColumnC) that will be duplicated if the record in Table2 is a
duplicate, so I am trying to use that to filter my results in Table1.
I am looking to see how many from Table1 map to DUPLICATE instances in
Table2.
I need to be able to say, in effect, "how many unique records in
Table1ColumnA that have a duplicate in Table1ColumnC also have a
duplicate in Table2ColumnB?"
Thanks if anyone can help!
-- aknoch
Re: Need to find instances of duplicates within a column; joining 2 tables.
am 22.08.2007 02:36:21 von Roy Harvey
I'm not sure if you can simply test each table individually or if you
have to join them, but the basic approach is the same either way.
>select * from TableB where SharedVariable = 1234
The query to test TableB alone and find ALL values of SharedVariable
that appear more than once:
SELECT SharedVariable, count(*) as Dups
FROM TableB
GROUP BY SharedVariable
HAVING count(*) > 1
If you need to see all the data in the rows involved then use the
query above to determine the set of values to search for:
SELECT TableB.*
FROM TableB
JOIN (SELECT SharedVariable, count(*) as Dups
FROM TableB
GROUP BY SharedVariable
HAVING count(*) > 1) as K
ON TableB.SharedVariable = K.SharedVariable
I hope that helps.
Roy Harvey
Beacon Falls, CT
On Tue, 21 Aug 2007 16:33:31 -0700, aknoch@gmail.com wrote:
>My basic situation is this - I ONLY want duplicates, so the opposite
>of DISTINCT:
>
>I have two tables. Ordinarily, Table1ColumnA corresponds in a one to
>one ratio with Table2ColumnB through a shared variable. So if I query
>TableB using the shared variable, there really should only be on
>record returned. In essence, if I run this and return TWO rows, it is
>very bad:
>
>select * from TableB where SharedVariable = 1234
>
>I know how to join the tables on a single record to see if this is the
>case with one record, but I need to find out how many, among possibly
>millions of records this affects.
>
>Every record in Table1ColumnA (and also the shared variable) will be
>unique. There is another column in Table1 (I'll call it
>Table1ColumnC) that will be duplicated if the record in Table2 is a
>duplicate, so I am trying to use that to filter my results in Table1.
>I am looking to see how many from Table1 map to DUPLICATE instances in
>Table2.
>
>I need to be able to say, in effect, "how many unique records in
>Table1ColumnA that have a duplicate in Table1ColumnC also have a
>duplicate in Table2ColumnB?"
>
>Thanks if anyone can help!
>
>-- aknoch