SQL syntax extentions - to put postgres ahead in the race

SQL syntax extentions - to put postgres ahead in the race

am 05.08.2004 10:28:44 von ram_nathaniel

This is a multi-part message in MIME format.

------=_NextPart_000_01B9_01C47AD6.FBF146A0
Content-Type: text/plain;
charset="windows-1255"
Content-Transfer-Encoding: quoted-printable

Hi,

I am a developer working with many databases, as a part of my job. I use he=
avy SQL queries and have become somewhat of an expert in SQL, including tri=
cks and workarounds of the limitation of the language.

I feel like a lot of the workarounds could be avoided with adding a few new=
operators to the SQL language that should be rather easy to support but wo=
uld give a vast improvement and probably a leap of performance in many comm=
on queries. I write you about this hoping that you would support these oper=
ators (even though they are not in the ANSI) and thereby position PostGres =
as a leader and not just a follower! I personaly have a great interest in s=
eeing open source software making it big time! So please - forward this to =
one of your more technical developers/executives and hopefully I will get t=
o see it in the next version of PostGres:

1) The operator "of max":
suppose I have a table "grades" of 3 fields: class/student/grade where I st=
ore many grades of many students of many classes. I want to get the name of=
the highest scoring student in each class. Note that there may be many stu=
dents with the same grade, but for starters let's say there is a primary ke=
y of class+grade.

My query would now be:
select student from grades where class+'#'+grade in=20
(
select class+'#'+max(grade) from grades group by class
) a

This means working the query twice - and relying on louzy conversion of the=
grade from numerical to textual.
We could also use:

select student from grades where student in=20
(
select student from grades group by class=20
having class+'#'+grade =3D max(class+'#'+grade)
) a

This is even worse!

The optimal would be to introduce a new operator "of max" that would be use=
d as follows:

select student of max(grade) from grades group by class

simillarly one should support "of min" and "of any" (brings a representativ=
e of the group)

2) aggregated concatenation:
Traditionally the SQL language has referred from supporting order dependent=
operators from taking a role in aggregated functions. This means that sinc=
e the query: "select class, grade from grades" does not ensure the order of=
the records returned, the operation sum() is supported (i.e. select class,=
sum(grade) from grades group by class) but other operations that would be =
order dependent are not supported.
I think this approach should be revised. In many cases one would want to ge=
t a list of the student names delimited with a comma. It would be great if =
one could write:
select class, list(student, ',') from grades group by class
and get

class list
----- ----
class1 john, ruth,...




This is of course an order dependent operation so the syntax can either be:
select class, list(student, ',') from grades group by class order by studen=
t, grade
in which case the list would be ordered before the list is created, or if n=
o particular order is requested the concatenation of the names should be in=
an arbitrary order.

Well - that's all for now :-)

Good luck!

Ram

------=_NextPart_000_01B9_01C47AD6.FBF146A0
Content-Type: text/html;
charset="windows-1255"
Content-Transfer-Encoding: quoted-printable



5">




Hi,

 

I am a developer working with many databas=
es, as a=20
part of my job. I use heavy SQL queries and have become somewhat of an expe=
rt in=20
SQL, including tricks and workarounds of the limitation of the=20
language.

 

I feel like a lot of the workarounds could=
be=20
avoided with adding a few new operators to the SQL language that should be=
=20
rather easy to support but would give a vast improvement and probably a lea=
p of=20
performance in many common queries. I write you about this hoping that you =
would=20
support these operators (even though they are not in the ANSI) and thereby=
=20
position PostGres as a leader and not just a follower! I personaly hav=
e a=20
great interest in seeing open source software making it big time! So please=
-=20
forward this to one of your more technical developers/executives and hopefu=
lly I=20
will get to see it in the next version of PostGres:

 

1) The operator  "of max":
suppose=
I have a=20
table "grades" of 3 fields: class/student/grade where I store many grades o=
f=20
many students of many classes. I want to get the name of the highest scorin=
g=20
student in each class. Note that there may be many students with the same g=
rade,=20
but for starters let's say there is a primary key of class+grade.
IV>
 

My query would now be:
select student f=
rom=20
grades where class+'#'+grade in
(
   select=20
class+'#'+max(grade) from grades group by class
) a

 

This means working the query twice - and r=
elying on=20
louzy conversion of the grade from numerical to textual.
We could also=
=20
use:

 

select student from grades where student i=
n=20

(
  select student from grades group by class
  having=
=20
class+'#'+grade =3D max(class+'#'+grade)
) a

 

This is even worse!

 

The optimal would be to introduce a new op=
erator=20
"of max" that would be used as follows:

 

select student of max(grade) from grades g=
roup by=20
class

 

simillarly one should support "of min" and=
"of any"=20
(brings a representative of the group)

 

2) aggregated concatenation:
Traditiona=
lly the=20
SQL language has referred from supporting order dependent operators from ta=
king=20
a role in aggregated functions. This means that since the query: "select cl=
ass,=20
grade from grades" does not ensure the order of the records returned, the=
=20
operation sum() is supported (i.e. select class, sum(grade) from grades gro=
up by=20
class) but other operations that would be order dependent are not=20
supported.
I think this approach should be revised. In many cases one wo=
uld=20
want to get a list of the student names delimited with a comma. It would be=
=20
great if one could write:
select class, list(student, ',') from grades g=
roup=20
by class
and get

 

class  list
-----  ----
cl=
ass1=20
john, ruth,...
.
.
.

 

This is of course an order dependent opera=
tion so=20
the syntax can either be:
select class, list(student, ',') from grades g=
roup=20
by class order by student, grade
in which case the list would be ordered=
=20
before the list is created, or if no particular order is requested the=20
concatenation of the names should be in an arbitrary order.

 

Well - that's all for now :-)

 

Good luck!

 

Ram


------=_NextPart_000_01B9_01C47AD6.FBF146A0--

Re: SQL syntax extentions - to put postgres ahead in the race

am 06.08.2004 07:22:33 von sszabo

On Thu, 5 Aug 2004, Ram Nathaniel wrote:

>
> 1) The operator "of max":
> suppose I have a table "grades" of 3 fields: class/student/grade where I
> store many grades of many students of many classes. I want to get the
> name of the highest scoring student in each class. Note that there may
> be many students with the same grade, but for starters let's say there
> is a primary key of class+grade.
>
> My query would now be:
> select student from grades where class+'#'+grade in
> (
> select class+'#'+max(grade) from grades group by class
> ) a

As a side note, I'd think that something like:
select student from grades where (class,grade) in
(select class, max(grade) from grades group by class);
should avoid textual operations. I'm assuming the + above are meant to be
concatenation (||).

> The optimal would be to introduce a new operator "of max" that would be used as follows:
>
> select student of max(grade) from grades group by class

PostgreSQL provides an extension called DISTINCT ON.

Something like
select distinct on (class) student from grades order by class, grade
desc;
should get you one arbitrary student with the highest grade in his or her
class.

If you want to order by the grades, I think you need a layer around it.
If you don't care about the class order, you might consider making the
class ordering desc as well to make it easier to use a multi-column index
on (class,grade).

> 2) aggregated concatenation:

Theoretically, you should be able to do this right now in PostgreSQL with
user defined aggregates (although you can't pass a second argument
currently for the separator). I believe that an ordered subquery in FROM
will currently allow you to get an ordered aggregate, or perhaps you'd
have to turn off hash aggregation, but I think you should be able to get
it to keep the ordering.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: SQL syntax extentions - to put postgres ahead in the race

am 06.08.2004 09:07:27 von tgl

Stephan Szabo writes:
> On Thu, 5 Aug 2004, Ram Nathaniel wrote:
>> 2) aggregated concatenation:

> Theoretically, you should be able to do this right now in PostgreSQL with
> user defined aggregates (although you can't pass a second argument
> currently for the separator).

There's nothing particularly stopping us from supporting
multiple-argument aggregates, except a lack of round tuits.
(I suppose we'd want to rethink the syntax of CREATE AGGREGATE,
but otherwise it ought to be pretty straightforward.)

> I believe that an ordered subquery in FROM
> will currently allow you to get an ordered aggregate, or perhaps you'd
> have to turn off hash aggregation, but I think you should be able to get
> it to keep the ordering.

I think you would want to ORDER BY twice:

SELECT class, list(student) from
(select class, student from grades order by class, student) ss
order by class;

It looks like (at least in CVS tip) planner.c will take into account the
relative costs of doing a GroupAgg vs doing a HashAgg and re-sorting,
but I'm too tired to try it right now...

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: SQL syntax extentions - to put postgres ahead in the race

am 06.08.2004 16:30:06 von tgl

I wrote:
> There's nothing particularly stopping us from supporting
> multiple-argument aggregates, except a lack of round tuits.

BTW, you can actually fake this pretty well in 8.0, by making an
aggregate that uses a rowtype input. For example:

regression=# create type twostrings as (s1 text, s2 text);
CREATE TYPE
regression=# create function list_concat(text, twostrings) returns text as $$
regression$# select case when $1 is null then $2.s1
regression$# when $2.s1 is null then $1
regression$# else $1 || $2.s2 || $2.s1
regression$# end$$ language sql;
CREATE FUNCTION
regression=# create aggregate concat (
regression(# basetype = twostrings,
regression(# stype = text,
regression(# sfunc = list_concat);
CREATE AGGREGATE
regression=# select * from text_tbl;
f1
-------------------
doh!
hi de ho neighbor
more stuff
and more
(4 rows)

regression=# select concat((f1, '|')) from text_tbl;
concat
--------------------------------------------
doh!|hi de ho neighbor|more stuff|and more
(1 row)


This is somewhat inefficient compared to native support for
multi-argument aggregates, but at least we have something we can point
people to until we find time to make that happen.

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org