MySQL, SELECTing items with GROUP BY, HAVING COUNT(*) > 1
am 10.10.2006 04:16:55 von michael.martinekGreetings!
I've recently been trying to do something, which apparently looks like
it may be a little odd.. I'm not finding anything in the manuals or
anywhere on the web where something similiar is being done.. so it
looks like my two options now are to post for some assistance, or find
another way to accomplish my task.
Basically, what I'm trying to do is select the number of duplicate
entries from a table using three columns. There is one primary key with
the table (and that seems to be what's throwing me off). Here's an
example:
mysql> SELECT *, COUNT(*) FROM tbBillingLineItems GROUP BY user_id,
pull_start, item_type HAVING COUNT(*) > 1;
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
| item_id | user_id | pull_start | item_type | amount | amount_per_item
| total_items | free_items | notes | COUNT(*) |
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
| 52 | 1 | 1157068800 | 1 | 0.00 | 0.01
| 2 | 5 | NULL | 8 |
| 53 | 1 | 1157068800 | 2 | 0.19 | 0.01
| 24 | 5 | NULL | 8 |
| 54 | 1 | 1157068800 | 3 | 10.00 | 10.00
| 1 | 0 | NULL | 8 |
| 55 | 2 | 1157068800 | 1 | 11.00 | 0.10
| 115 | 5 | NULL | 8 |
| 56 | 2 | 1157068800 | 2 | 4.60 | 0.02
| 236 | 5 | NULL | 8 |
| 51 | 2 | 1157068800 | 3 | 5.00 | 5.00
| 1 | 0 | NULL | 8 |
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
6 rows in set (0.00 sec)
What I'm looking to do, is pull the eight items that are being
matched.. I need to delete all but one copy of them. This is used in
the event that an application is run multiple times, it will create too
many entries into the database which need to be removed. Since the
primary key cannot be used to determine unique values, what I use
instead is the timestamp in which the entries are being run for, the
type of item it is, and the user_id the item is for.
In essence, what I'm looking for is something to delete all duplicate
entries from the table, based on user_id, pull_start, and item_type. If
I can figure out a way to get the item_id for each of these items that
were deemed duplicate (as indicated by the resulting COUNT(*)), then I
can delete them from there with a temporary table or a simple script.
One way to accomplish this that I've found was to repeat execution of
this query multiple times until there are no more duplicate items
returned.. but I'm sure there has to be a more efficient way.. I'm open
to any and all suggestions.
Thanks!
Michael Martinek