Help with a Query

am 29.11.2005 03:07:09 von Peter Lewis

Hi there

I'm trying to work out how to run a query
that checks if there are duplicated records
in a table and then have to run a query to
delete them (need to check if there are
an duplicates before I delete them)

the 3 fields to use as the test would be
product, description, dnote

any help / advice?

Brian

Re: Help with a Query

am 29.11.2005 03:40:26 von IchBin

Brian wrote:
> Hi there
>
> I'm trying to work out how to run a query
> that checks if there are duplicated records
> in a table and then have to run a query to
> delete them (need to check if there are
> an duplicates before I delete them)
>
> the 3 fields to use as the test would be
> product, description, dnote
>
> any help / advice?
>
> Brian
>
>
>

I do not know off the top of my head. You could write a simple program,
after you sort the tables then run a program to detect the duplicate rows.

I will say that after you clean it up make sure you define primary keys,
unique indexes and Foreign keys to preserve the integrity of your data.

--

Thanks in Advance...
IchBin, Pocono Lake, Pa, USA
http://weconsultants.servebeer.com/JHackerAppManager
____________________________________________________________ ______________

'If there is one, Knowledge is the "Fountain of Youth"'
-William E. Taylor, Regular Guy (1952-)

Re: Help with a Query

am 29.11.2005 12:29:18 von Stefan Rybacki

Re: Help with a Query

am 29.11.2005 13:06:21 von Hilarion

>> I'm trying to work out how to run a query
>> that checks if there are duplicated records
>> in a table and then have to run a query to
>> delete them (need to check if there are
>> an duplicates before I delete them)
>>
>> the 3 fields to use as the test would be
>> product, description, dnote
>>
>> any help / advice?
>
> Just do a grouping over those three attributes and count the group members.
>
> SELECT * FROM table GROUP BY product, description, dnote HAVING count(*)>1
>
> For deletion you could use this as subquery (mysql >4.1) or a temporary table with which
> you join in the delete statement.

The select statement above will not work in many SQL engines. You should
NOT use "*" in GROUP BY select statements. It should be something like this:

SELECT product, description, dnote, COUNT(*) AS repeat_count
FROM some_table
GROUP BY product, description, dnote
HAVING COUNT(*) > 1

Why? Because SQL standards require that only fields retrieved by
SELECT be from GROUP BY clause or be aggregate values.

I'm also assuming that Brian wants to remove duplicates, but leaving
one entry. It can't be done without using some other field in this
table which differs between the records. If there's no such field,
then the only way is to remove all entries and then reinserting one.

To Brian:
If you have, for example, a "product_id" field in the table,
then you can list all duplicate entries and also get one "product_id"
value, that should not be removed. I assumed that "product_id"
is an autoincrement and that you want to leave the oldest entry
(which will have lowest value of "product_id"). In that case
the query to list the entries would look like this:

SELECT
product,
description,
dnote,
COUNT(*) AS repeat_count,
MIN( product_id ) AS product_id_to_stay
FROM some_table
GROUP BY product, description, dnote
HAVING COUNT(*) > 1

How to use this data to delete the duplicates (and leave the one
entry) depends on your SQL engine - they differ in using the
subqueries / joins in DELETE (and UPDATE) statements.

Hilarion

Re: Help with a Query

am 29.11.2005 16:26:19 von Stefan Rybacki

Hilarion wrote:
>...
> The select statement above will not work in many SQL engines. You should
> NOT use "*" in GROUP BY select statements. It should be something like
> this:
>
> SELECT product, description, dnote, COUNT(*) AS repeat_count
> FROM some_table
> GROUP BY product, description, dnote
> HAVING COUNT(*) > 1
>
> Why? Because SQL standards require that only fields retrieved by
> SELECT be from GROUP BY clause or be aggregate values.

Hi Hilarion,

I know this, of course, I just used * because it is easier and faster to write ;)

>
>
> I'm also assuming that Brian wants to remove duplicates, but leaving
> one entry. It can't be done without using some other field in this
> table which differs between the records. If there's no such field,
> then the only way is to remove all entries and then reinserting one.

Yes. This is a way.

>
> To Brian:
> If you have, for example, a "product_id" field in the table,
> then you can list all duplicate entries and also get one "product_id"
> value, that should not be removed. I assumed that "product_id"
> is an autoincrement and that you want to leave the oldest entry
> (which will have lowest value of "product_id"). In that case
> the query to list the entries would look like this:
>
> SELECT
> product,
> description,
> dnote,
> COUNT(*) AS repeat_count,
> MIN( product_id ) AS product_id_to_stay
> FROM some_table
> GROUP BY product, description, dnote
> HAVING COUNT(*) > 1
>

Another nice way ;)

>
>...
> Hilarion

Regards
Stefan