MySQL, SELECTing items with GROUP BY, HAVING COUNT(*) > 1

am 10.10.2006 04:16:55 von michael.martinek

Greetings!

I've recently been trying to do something, which apparently looks like
it may be a little odd.. I'm not finding anything in the manuals or
anywhere on the web where something similiar is being done.. so it
looks like my two options now are to post for some assistance, or find
another way to accomplish my task.

Basically, what I'm trying to do is select the number of duplicate
entries from a table using three columns. There is one primary key with
the table (and that seems to be what's throwing me off). Here's an
example:

mysql> SELECT *, COUNT(*) FROM tbBillingLineItems GROUP BY user_id,
pull_start, item_type HAVING COUNT(*) > 1;
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
| item_id | user_id | pull_start | item_type | amount | amount_per_item
| total_items | free_items | notes | COUNT(*) |
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
| 52 | 1 | 1157068800 | 1 | 0.00 | 0.01
| 2 | 5 | NULL | 8 |
| 53 | 1 | 1157068800 | 2 | 0.19 | 0.01
| 24 | 5 | NULL | 8 |
| 54 | 1 | 1157068800 | 3 | 10.00 | 10.00
| 1 | 0 | NULL | 8 |
| 55 | 2 | 1157068800 | 1 | 11.00 | 0.10
| 115 | 5 | NULL | 8 |
| 56 | 2 | 1157068800 | 2 | 4.60 | 0.02
| 236 | 5 | NULL | 8 |
| 51 | 2 | 1157068800 | 3 | 5.00 | 5.00
| 1 | 0 | NULL | 8 |
+---------+---------+------------+-----------+--------+----- ------------+-------------+------------+-------+----------+
6 rows in set (0.00 sec)

What I'm looking to do, is pull the eight items that are being
matched.. I need to delete all but one copy of them. This is used in
the event that an application is run multiple times, it will create too
many entries into the database which need to be removed. Since the
primary key cannot be used to determine unique values, what I use
instead is the timestamp in which the entries are being run for, the
type of item it is, and the user_id the item is for.

In essence, what I'm looking for is something to delete all duplicate
entries from the table, based on user_id, pull_start, and item_type. If
I can figure out a way to get the item_id for each of these items that
were deemed duplicate (as indicated by the resulting COUNT(*)), then I
can delete them from there with a temporary table or a simple script.
One way to accomplish this that I've found was to repeat execution of
this query multiple times until there are no more duplicate items
returned.. but I'm sure there has to be a more efficient way.. I'm open
to any and all suggestions.

Thanks!
Michael Martinek

Re: MySQL, SELECTing items with GROUP BY, HAVING COUNT(*) > 1

am 10.10.2006 04:43:59 von michael.martinek

Here is my scripted solution, for anyone who might be interested or
possibly running into the same wall as I am.

mysql_connect('dbHost','dbUser','dbPassword') or die('Unable to connect
to database.');
mysql_select_db('dbName') or die('Unable to select database.');

$sSQL = 'CREATE TEMPORARY TABLE dupes SELECT *, COUNT(*) - 1 AS cnt
FROM'
.' tbBillingLineItems GROUP BY user_id, pull_start, item_type'
.' HAVING COUNT(*) > 1 ORDER BY item_id';
mysql_query($sSQL);

$sSQL = 'SELECT * FROM dupes';
if ($mRes = mysql_query($sSQL)) {
while ($mRow = mysql_fetch_assoc($mRes)) {
$sSQL = sprintf('DELETE FROM tbBillingLineItems WHERE
user_id = %u AND pull_start = %u AND item_type = %u LIMIT %u',
$mRow['user_id'], $mRow['pull_start'],
$mRow['item_type'], $mRow['cnt']
);
mysql_query($sSQL);
}
mysql_free_result($mRes);
}

mysql_close();
?>

Of course, I'd prefer not to have to do it in a fashion like this.. but
if I have to, ah well. At least it works.