SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 04:55:11 von Josh

Folks,

I have a wierd business case. Annoyingly it has to be written in *portable*
SQL92, which means no arrays or custom aggregates. I think it may be
impossible to do in SQL which is why I thought I'd give the people on this
list a crack at it. Solver gets a free drink/lunch on me if we ever meet at
a convention.

The Problem: for each "case" there are from zero to eight "timekeepers"
authorized to work on the "case", out of a pool of 150 "timekeepers". This
data is stored vertically:

authorized_timekeepers:
case_id | timekeeper_id
213447 | 047
132113 | 021
132113 | 115
132113 | 106
etc.

But, a client's e-billing application wants to see these timekeepers displayed
in the following horizontal format:

case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
213447 | 047 | | | | | | | |
132113 | 021 | 115 | 106 | 034 | 109 | 112 | 087 |
etc.

Order does not matter for timekeepers 1-8.

This is a daunting problem because traditional crosstab solutions do not work;
timekeepers 1-8 are coming out of a pool of 150.

Can it be done? Or are we going to build this with a row-by-row procedural
loop? (to reiterate: I'm not allowed to use a custom aggregate or other
PostgreSQL "advanced feature")

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 2: you can get off all lists at once with the unregister command
(send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 05:18:36 von elein

I would use my report writer, but in any case you'd
want at least 2 separate queries, maybe three to
keep it simple and readable.

If you are allowed to use stored procedures you can
build up the output by using simple concats instead
of text aggregation (which is a procedure of simple
concats).

Using loops and subqueries you should be to construct
the heading (count distinct timekeeper_id) and then select
the data row by row concatenating results before you
send it out.

This is a non-solution which effectively hides the
aggregation in a function.

Or write it in a client perl app if you must.

You can't really do it w/o loops or aggregates.
(I wish (hope?) I were wrong about this.)

--elein

On Tue, Aug 17, 2004 at 07:55:11PM -0700, Josh Berkus wrote:
> Folks,
>
> I have a wierd business case. Annoyingly it has to be written in *portable*
> SQL92, which means no arrays or custom aggregates. I think it may be
> impossible to do in SQL which is why I thought I'd give the people on this
> list a crack at it. Solver gets a free drink/lunch on me if we ever meet at
> a convention.
>
> The Problem: for each "case" there are from zero to eight "timekeepers"
> authorized to work on the "case", out of a pool of 150 "timekeepers". This
> data is stored vertically:
>
> authorized_timekeepers:
> case_id | timekeeper_id
> 213447 | 047
> 132113 | 021
> 132113 | 115
> 132113 | 106
> etc.
>
> But, a client's e-billing application wants to see these timekeepers displayed
> in the following horizontal format:
>
> case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
> 213447 | 047 | | | | | | | |
> 132113 | 021 | 115 | 106 | 034 | 109 | 112 | 087 |
> etc.
>
> Order does not matter for timekeepers 1-8.
>
> This is a daunting problem because traditional crosstab solutions do not work;
> timekeepers 1-8 are coming out of a pool of 150.
>
> Can it be done? Or are we going to build this with a row-by-row procedural
> loop? (to reiterate: I'm not allowed to use a custom aggregate or other
> PostgreSQL "advanced feature")
>
> --
> Josh Berkus
> Aglio Database Solutions
> San Francisco
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
> (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 05:35:53 von sszabo

On Tue, 17 Aug 2004, Josh Berkus wrote:

> I have a wierd business case. Annoyingly it has to be written in *portable*
> SQL92, which means no arrays or custom aggregates. I think it may be
> impossible to do in SQL which is why I thought I'd give the people on this
> list a crack at it. Solver gets a free drink/lunch on me if we ever meet at
> a convention.
>
> The Problem: for each "case" there are from zero to eight "timekeepers"
> authorized to work on the "case", out of a pool of 150 "timekeepers". This
> data is stored vertically:
>
> authorized_timekeepers:
> case_id | timekeeper_id
> 213447 | 047
> 132113 | 021
> 132113 | 115
> 132113 | 106
> etc.
>
> But, a client's e-billing application wants to see these timekeepers displayed
> in the following horizontal format:
>
> case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
> 213447 | 047 | | | | | | | |
> 132113 | 021 | 115 | 106 | 034 | 109 | 112 | 087 |
> etc.
>
> Order does not matter for timekeepers 1-8.
>
> This is a daunting problem because traditional crosstab solutions do not work;
> timekeepers 1-8 are coming out of a pool of 150.
>
> Can it be done? Or are we going to build this with a row-by-row procedural

If you know it's max 8, I think it may be possible, but I can't think of a
way that'd be better than just writing code yourself.

Just maybe something like the following would give you three timekeepers:
select foo.case_id, foo.v1, foo.v2, (select min(timekeeper_id) from
authorized_timekeepers where
authorized_timekeepers.case_id=foo.case_id and timekeeper_id > foo.v2) as
v3 from (
select foo.case_id, foo.v1, (select min(timekeeper_id) from
authorized_timekeepers where
authorized_timekeepers.case_id=foo.case_id and timekeeper_id > foo.v1) as
v2
from
(select foo.case_id, foo.v1 from
(select foo.case_id,
(select min(timekeeper_id) from authorized_timekeepers where
authorized_timekeepers.case_id=foo.case_id) as v1
from (select distinct case_id from authorized_timekeepers) foo
) foo) foo) foo;

If that works for 3 (and I think that's standard behavior), then you
should be able to extend it to any fixed number using the pattern.

---------------------------(end of broadcast)---------------------------
TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 05:36:41 von Greg

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Names shortened to spare the line lengths:

SELECT bob.cid,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1) AS tk1,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 1) AS tk2,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 2) AS tk3,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 3) AS tk4,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 4) AS tk5,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 5) AS tk6,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 6) AS tk7,
(SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 7) AS tk8,
FROM (SELECT DISTINCT cid FROM ats) AS bob;

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200408172335
-----BEGIN PGP SIGNATURE-----

iD8DBQFBIs7AvJuQZxSWSsgRAkglAJ9mNEmOYlLPynygMmelvzlqkYoHlwCe JqTb
g5gyh9LztONPCZj32aOEuGI=
=Yy7m
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 05:41:09 von Josh

Greg, Stephan,

> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 1) AS tk2,

bzzzzz! Thanks for playing. LIMIT and OFFSET, sadly, are not SQL standard.
They're only portable to MySQL. This has to port to SQL Server and Oracle.

> If that works for 3 (and I think that's standard behavior), then you
> should be able to extend it to any fixed number using the pattern.

Hmmm... that might work. I'll have to test that the nesting doesn't kill SQL
Server (a serious danger) but it's worth a try. Performance will really suck
but fortunately we only run this bill once a month.

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 07:15:59 von mail

Josh Berkus wrote:
> The Problem: for each "case" there are from zero to eight "timekeepers"
> authorized to work on the "case", out of a pool of 150 "timekeepers". This
> data is stored vertically:
>
> authorized_timekeepers:
> case_id | timekeeper_id
> 213447 | 047
> 132113 | 021
> 132113 | 115
> 132113 | 106
> etc.
>
> But, a client's e-billing application wants to see these timekeepers displayed
> in the following horizontal format:
>
> case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
> 213447 | 047 | | | | | | | |
> 132113 | 021 | 115 | 106 | 034 | 109 | 112 | 087 |
> etc.
>
> Order does not matter for timekeepers 1-8.
>
> This is a daunting problem because traditional crosstab solutions do not work;
> timekeepers 1-8 are coming out of a pool of 150.
>
> Can it be done? Or are we going to build this with a row-by-row procedural
> loop? (to reiterate: I'm not allowed to use a custom aggregate or other
> PostgreSQL "advanced feature")
>

This is pretty much exactly how contrib/tablefunc's crosstab (non-hashed
version; crosstab(sourcesql, ncols)) works. If you really need it to be
portable, though, application layer procedural code is likely to be the
easiest and fastest way to go. crosstab just wraps the procedural code
in an SRF for you.

Joe

---------------------------(end of broadcast)---------------------------
TIP 6: Have you searched our list archives?

http://archives.postgresql.org

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 13:01:24 von Gaetano Mendola

Greg Sabino Mullane wrote:

>
> Names shortened to spare the line lengths:
>
> SELECT bob.cid,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1) AS tk1,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 1) AS tk2,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 2) AS tk3,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 3) AS tk4,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 4) AS tk5,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 5) AS tk6,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 6) AS tk7,
> (SELECT tid FROM ats WHERE cid=bob.cid LIMIT 1 OFFSET 7) AS tk8,
> FROM (SELECT DISTINCT cid FROM ats) AS bob;
>

Don't you miss for each subselect an order by tid ?

Regards
Gaetano Mendola

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 13:47:41 von Greg

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Don't you miss for each subselect an order by tid ?

No: since all the SELECTs are part of one statement, they
will have the same (pseudo-random) implicit order. Since
Josh's requirement said the order of the tids* was not important,
I can be lazy and get away with it in this case. An order by
would not hurt, of course.

* As far as PG goes, this is not an ideal abbreviation! :)
- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200408180745
-----BEGIN PGP SIGNATURE-----

iD8DBQFBI0InvJuQZxSWSsgRApW7AKCpFN6TMQ3WjcJgZse5f+Ap6/Y7RACf SSlc
MZusqEadF2xZrE4PLOhmMek=
=pnQz
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 16:48:27 von Gaetano Mendola

Greg Sabino Mullane wrote:

>
>
>>>Don't you miss for each subselect an order by tid ?
>
>
> No: since all the SELECTs are part of one statement, they
> will have the same (pseudo-random) implicit order.

Is this guaranted ?

Regards
Gaetano Mendola

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 17:13:35 von tgl

"Greg Sabino Mullane" writes:
>> Don't you miss for each subselect an order by tid ?

> No: since all the SELECTs are part of one statement, they
> will have the same (pseudo-random) implicit order.

Nope; Gaetano's right, you cannot assume that. It's entirely possible
for the planner to choose different plans depending on the OFFSET.
(Maybe not very likely, with such small offsets, but could happen.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 18:11:46 von Chris

Josh Berkus wrote:

>Folks,
>
>I have a wierd business case. Annoyingly it has to be written in *portable*
>SQL92, which means no arrays or custom aggregates. I think it may be
>impossible to do in SQL which is why I thought I'd give the people on this
>list a crack at it. Solver gets a free drink/lunch on me if we ever meet at
>a convention.
>
>
>
Might be possible. Would certainly be ugly.

>The Problem: for each "case" there are from zero to eight "timekeepers"
>authorized to work on the "case", out of a pool of 150 "timekeepers". This
>data is stored vertically:
>
>authorized_timekeepers:
>case_id | timekeeper_id
>213447 | 047
>132113 | 021
>132113 | 115
>132113 | 106
>etc.
>
>But, a client's e-billing application wants to see these timekeepers displayed
>in the following horizontal format:
>
>case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
>213447 | 047 | | | | | | | |
>132113 | 021 | 115 | 106 | 034 | 109 | 112 | 087 |
>etc.
>
>Order does not matter for timekeepers 1-8.
>
>This is a daunting problem because traditional crosstab solutions do not work;
>timekeepers 1-8 are coming out of a pool of 150.
>
>Can it be done? Or are we going to build this with a row-by-row procedural
>loop? (to reiterate: I'm not allowed to use a custom aggregate or other
>PostgreSQL "advanced feature")
>
>
>
If it can be done, it might be extremely ugly. I am thinking a massive
set of left self joins (since there could be between 0 and 8).

Something like:
select case_id FROM authorized_timekeeper t0
LEFT JOIN
(SELECT case_id, min(timekeeper_id) FROM authorized_timekeeper
GROUP BY case_id) t1
ON case_id
LEFT JOIN
(SELECT case_id, min(timekeeper_id) FROM authorized_timekeeper
WHERE timekeeper_id <> t1.timekeeper
GROUP BY case_id) t2
ON case_id
LEFT JOIN
(SELECT case_id, min(timekeeper_id) FROM authorized_timekeeper
WHERE timekeeper_id NOT IN (t1.timekeeper, t2.timekeeper)
GROUP BY case_id) t3
etc....

If this is not an option, instead I would create a series of views.
Something like:
CREATE VIEW t1 AS select case_id, min(timekeeper_id) AS tk_id
from authorized_timekeepers
group by case_id;
CREATE VIEW t2 AS select case_id, min(timekeeper_id) AS tk_id
from authorized_timekeepers
WHERE tk_id NOT IN (SELECT tk_id FROM t1)
group by case_id;
CREATE VIEW t3 AS select case_id, min(timekeeper_id) AS tk_id
FROM authorized_timekeepers
WHERE tk_id NOT IN (SELECT tk_id FROM t1)
AND tk_id NOT IN (SELECT tk_id FROM t2)
GROUP BY case_id;
Etc.
Then you do a left join among the views.

Hope that this helps.

Best Wishes,
Chris Travers

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 18:56:14 von Josh

Joe, Elein:

> This is pretty much exactly how contrib/tablefunc's crosstab (non-hashed
> version; crosstab(sourcesql, ncols)) works. If you really need it to be
> portable, though, application layer procedural code is likely to be the
> easiest and fastest way to go. crosstab just wraps the procedural code
> in an SRF for you.

No, you're missing one factor in the spec. Timekeeper_1 for case_id = 182738
is not the same timekeeper as Timekeeper_1 for case_id = 217437. That's why
traditional crosstab plans don't work.

Anyway, here's the wrap-up: I tried Stephan's idea, it works, but it's so
slow that we're going to to the procedural loop. Thanks, all!

--
Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 19:54:39 von mail

Josh Berkus wrote:
>>This is pretty much exactly how contrib/tablefunc's crosstab (non-hashed
>>version; crosstab(sourcesql, ncols)) works. If you really need it to be
>>portable, though, application layer procedural code is likely to be the
>>easiest and fastest way to go. crosstab just wraps the procedural code
>>in an SRF for you.
>
> No, you're missing one factor in the spec. Timekeeper_1 for case_id = 182738
> is not the same timekeeper as Timekeeper_1 for case_id = 217437. That's why
> traditional crosstab plans don't work.

No, I understood. E.g.

create table authorized_timekeepers (
case_id int,
timekeeper_id text
);

insert into authorized_timekeepers values(213447,'047');
insert into authorized_timekeepers values(132113,'021');
insert into authorized_timekeepers values(132113,'115');
insert into authorized_timekeepers values(132113,'106');
insert into authorized_timekeepers values(140000,'106');
insert into authorized_timekeepers values(140000,'021');
insert into authorized_timekeepers values(140000,'115');
insert into authorized_timekeepers values(140000,'108');
insert into authorized_timekeepers values(140000,'006');
insert into authorized_timekeepers values(140000,'042');
insert into authorized_timekeepers values(140000,'142');
insert into authorized_timekeepers values(140000,'064');
insert into authorized_timekeepers values(140000,'999');

select * from crosstab('select case_id, ''cat'' as cat, timekeeper_id
from authorized_timekeepers order by 1',8)
as t(case_id int, tk1 text, tk2 text, tk3 text, tk4 text, tk5 text, tk6
text, tk7 text, tk8 text);

case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
---------+-----+-----+-----+-----+-----+-----+-----+-----
132113 | 021 | 115 | 106 | | | | |
140000 | 106 | 021 | 115 | 108 | 006 | 042 | 142 | 064
213447 | 047 | | | | | | |
(3 rows)

Or even:

select * from crosstab('select case_id, ''cat'' as cat, timekeeper_id
from authorized_timekeepers order by 1',4)
as t(case_id int, tk1 text, tk2 text, tk3 text, tk4 text);

case_id | tk1 | tk2 | tk3 | tk4
---------+-----+-----+-----+-----
132113 | 021 | 115 | 106 |
140000 | 106 | 021 | 115 | 108
213447 | 047 | | |
(3 rows)

But I know that doesn't help you with portability.

Joe

---------------------------(end of broadcast)---------------------------
TIP 5: Have you checked our extensive FAQ?

http://www.postgresql.org/docs/faqs/FAQ.html

Re: SQL Challenge: Arbitrary Cross-tab

am 18.08.2004 20:52:56 von Josh

Joe,

> case_id | tk1 | tk2 | tk3 | tk4 | tk5 | tk6 | tk7 | tk8
> ---------+-----+-----+-----+-----+-----+-----+-----+-----
> 132113 | 021 | 115 | 106 | | | | |
> 140000 | 106 | 021 | 115 | 108 | 006 | 042 | 142 | 064
> 213447 | 047 | | | | | | |

Darn I wish this didn't have to be portable ....

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Re: SQL Challenge: Arbitrary Cross-tab

am 20.08.2004 04:16:50 von Greg

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

> Nope; Gaetano's right, you cannot assume that. It's entirely possible
> for the planner to choose different plans depending on the OFFSET.
> (Maybe not very likely, with such small offsets, but could happen.)

Interesting. I realized that there was a finite chance of different
plans being chosen, but it seemed neigh-impossible since there is
no WHERE clause and the offsets only vary from 0-7. What sort of
different plans would it choose, out of curiosity?
Seq-scan vs. index-scan? Are there any particular cases where the
same plan is guaranteed to be used?

- --
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200408192216
-----BEGIN PGP SIGNATURE-----

iD8DBQFBJV9xvJuQZxSWSsgRAp74AJ96mtrKC1J53y0TPqTPdq2Xost0fACg 4DnJ
7P+dgpHWBazGNE9+SR7uxLY=
=MZuM
-----END PGP SIGNATURE-----

---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
subscribe-nomail command to majordomo@postgresql.org so that your
message can get through to the mailing list cleanly

Re: SQL Challenge: Arbitrary Cross-tab

am 20.08.2004 05:25:21 von tgl

"Greg Sabino Mullane" writes:
>> Nope; Gaetano's right, you cannot assume that. It's entirely possible
>> for the planner to choose different plans depending on the OFFSET.
>> (Maybe not very likely, with such small offsets, but could happen.)

> Interesting. I realized that there was a finite chance of different
> plans being chosen, but it seemed neigh-impossible since there is
> no WHERE clause and the offsets only vary from 0-7. What sort of
> different plans would it choose, out of curiosity?

For the particular trivial case you were showing (no WHERE, no GROUP BY,
no ORDER BY, no DISTINCT, no nada) it's probably true that only a
seqscan plan would ever be chosen. I was worrying that people would
take this example and try to add "just that other little thing" to it
and get burnt.

> Are there any particular cases where the same plan is guaranteed to be
> used?

No, I'd never care to make such a guarantee. The planner is entirely
driven by cost estimates. Even if I could say something definitive
about the behavior with the default cost parameters, it wouldn't
necessary hold up when someone had taken an axe to random_page_cost
or something like that. (It's not impossible that the thing would pick
an indexscan plan for even this trivial case, were you to set
random_page_cost below 1.)

regards, tom lane

---------------------------(end of broadcast)---------------------------
TIP 8: explain analyze is your friend