Counting Distinct Records
Counting Distinct Records
am 16.11.2004 20:45:45 von tfo
I am wondering whether the following two forms of SELECT statements are
logically equivalent:
SELECT COUNT( DISTINCT table.column ) ...
and
SELECT DISTINCT COUNT( * ) ...
If they are the same, then why is the latter query much slower in
postgres when applied to the same FROM and WHERE clauses?
Furthermore, is there a better way of performing this sort of operation
in postgres (or just in SQL in general)?
Thanks!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: Counting Distinct Records
am 16.11.2004 21:03:35 von sszabo
On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
> I am wondering whether the following two forms of SELECT statements are
> logically equivalent:
>
> SELECT COUNT( DISTINCT table.column ) ...
>
> and
>
> SELECT DISTINCT COUNT( * ) ...
Not in general.
The former counts how many distinct table.column values there are. The
distinct in the latter would be basically meaningless unless there's a
group by involved.
---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend
Re: Counting Distinct Records
am 16.11.2004 21:09:33 von tfo
Is there another way to accomplish what the former is doing, then?
For practical reasons, I'd like to come up with something better.
For theoretical curiosity, I'd like to know whether there's a way to
combine COUNT and DISTINCT that still allows one to reference * rather
than naming specific columns without grouping.
If I resort to GROUP BY, is there an efficient way of counting all the
groups, or would it just be something like:
SELECT COUNT ( * ) FROM ( SELECT ... GROUP BY ... );
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Nov 16, 2004, at 2:03 PM, Stephan Szabo wrote:
> On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
>
>> I am wondering whether the following two forms of SELECT statements
>> are
>> logically equivalent:
>>
>> SELECT COUNT( DISTINCT table.column ) ...
>>
>> and
>>
>> SELECT DISTINCT COUNT( * ) ...
>
> Not in general.
>
> The former counts how many distinct table.column values there are. The
> distinct in the latter would be basically meaningless unless there's a
> group by involved.
>
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly
Re: Counting Distinct Records
am 16.11.2004 23:34:31 von sszabo
On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
> Is there another way to accomplish what the former is doing, then?
The only thing I can think of is a subselect in from that uses distinct.
select count(*) from (select distinct ...) foo
That also theoretically allows you to use select distinct * inside the
subselect.
---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
Re: Counting Distinct Records
am 17.11.2004 00:02:56 von tfo
Hmm. I was more interested in using COUNT( * ) than DISTINCT *.
I want a count of all rows, but I want to be able to specify which
columns are distinct.
That's definitely an interesting approach, but testing doesn't show it
to be appreciably faster.
If I do a DISTINCT *, postgres will attempt to guarantee that there are
no duplicate values across all columns rather than a subset of columns?
Is that right?
Anyway, I was just wondering if there were any best practices out there
for counting distinct values in sets of values that might not
themselves be distinct.
Thanks for the tips so far!
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Nov 16, 2004, at 4:34 PM, Stephan Szabo wrote:
> On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
>
>> Is there another way to accomplish what the former is doing, then?
>
> The only thing I can think of is a subselect in from that uses
> distinct.
> select count(*) from (select distinct ...) foo
>
> That also theoretically allows you to use select distinct * inside the
> subselect.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html
Re: Counting Distinct Records
am 17.11.2004 15:52:54 von sszabo
On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
> Hmm. I was more interested in using COUNT( * ) than DISTINCT *.
>
> I want a count of all rows, but I want to be able to specify which
> columns are distinct.
I'm now a bit confused about exactly what you're looking for in the end.
Can you give a short example?
> That's definitely an interesting approach, but testing doesn't show it
> to be appreciably faster.
>
> If I do a DISTINCT *, postgres will attempt to guarantee that there are
> no duplicate values across all columns rather than a subset of columns?
> Is that right?
It guarantees one output row for each distinct set of column values across
all columns.
---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
Re: Counting Distinct Records
am 17.11.2004 20:27:17 von tfo
The specific problem I'm trying to solve involves a user table with
some history.
Something like this:
create table user_history (
user_id int
event_time_stamp timestamp
);
I'd like to be able to count the distinct user_ids in this table, even
if it were joined to other tables.
-tfo
--
Thomas F. O'Connell
Co-Founder, Information Architect
Sitening, LLC
http://www.sitening.com/
110 30th Avenue North, Suite 6
Nashville, TN 37203-6320
615-260-0005
On Nov 17, 2004, at 8:52 AM, Stephan Szabo wrote:
> On Tue, 16 Nov 2004, Thomas F.O'Connell wrote:
>
>> Hmm. I was more interested in using COUNT( * ) than DISTINCT *.
>>
>> I want a count of all rows, but I want to be able to specify which
>> columns are distinct.
>
> I'm now a bit confused about exactly what you're looking for in the
> end.
> Can you give a short example?
>
>> That's definitely an interesting approach, but testing doesn't show it
>> to be appreciably faster.
>>
>> If I do a DISTINCT *, postgres will attempt to guarantee that there
>> are
>> no duplicate values across all columns rather than a subset of
>> columns?
>> Is that right?
>
> It guarantees one output row for each distinct set of column values
> across
> all columns.
---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?
http://www.postgresql.org/docs/faqs/FAQ.html