How to bind to a LARGE array

on 01.05.2006 14:21:22 by bealach

Hi folks,

Apologies if this has been asked and resolved - I've spent half a day
searching various archives, but have not found anything.

My problem is the following:

I have a very large postgresql table (ca. 3 mil records) and need to
replace a nested sql query with an "iterative" one, so I end up with:



@names is a very large array - on the order of 100,000 elements.


sub prep_sql {
    my $stmt = <<EOSQL;
SELECT j.job_log_id
FROM job_log j, node n
WHERE n.name in ?        <============ (@names) should be here,
                                       but the DBI documentation
                                       says this must be a scalar
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id
EOSQL

    my @bind = (@names,'B','LIVE');  <========= can't use @names here
    return ($stmt,@bind);
} # end sub prep_sql


I know how to use SQL subqueries to do this in one go in PostgreSQL,
but I think this iterative approach might be a lot faster?

Has anyone tried the same sort of thing?

Regards,

Bealach

Re: How to bind to a LARGE array

on 02.05.2006 02:01:20 by dnm

On May 1, Bealach Na Bo scribed:

> I have a very large postgresql table (ca. 3 mil records) and need to
> replace a nested sql query with an "iterative" one [...]
>
> @names is a very large array - on the order of 100,000 elements.
> [...]

Why not create a temp table, insert @names into it, and just do a join?

HTH,
Dave

Re: How to bind to a LARGE array

on 02.05.2006 11:04:42 by Alexander

If @names were only a few records, you could write something like
select ... where name in (?,?,?,?,?)
using some Perl code to generate the placeholders ( join(',', ('?') x
scalar @names) ). But with 100,000 records, I think this would bring
DBI, DBD::Pg and the Pg SQL parser to their limits. Usually I strongly
oppose using temporary tables, but in your case I would create a
temporary table (using some unique identifier, like hostname + PID +
random number, as the table name), insert @names into that table, and
then let Pg do the job:

create table temptable2744143 ....;

insert into temptable2744143 (name) values (?);
(for each record in @names)

SELECT j.job_log_id
FROM job_log j, node n , temptable2744143 t
WHERE n.name = t.name
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id;

drop table temptable2744143;
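
Untested, but in DBI terms the whole round trip would look something like
this (assuming $dbh is your open DBD::Pg handle and @names is already
populated; a real script should also add error checking):

my $tmp = "temptable$$";    # unique-ish name built from the PID, as above

$dbh->do("CREATE TABLE $tmp (name varchar NOT NULL)");

# one insert per record in @names; wrapping these in a single
# transaction ($dbh->begin_work ... $dbh->commit) speeds up the load
my $ins = $dbh->prepare("INSERT INTO $tmp (name) VALUES (?)");
$ins->execute($_) for @names;

my $sth = $dbh->prepare(<<EOSQL);
SELECT j.job_log_id
FROM job_log j, node n, $tmp t
WHERE n.name = t.name
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id
EOSQL
$sth->execute('B', 'LIVE');

while (my ($id) = $sth->fetchrow_array) {
    print "$id\n";
}

$dbh->do("DROP TABLE $tmp");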


Where does @names come from?

Alexander

On 01.05.2006 14:21, Bealach Na Bo wrote:

> I have a very large postgresql table (ca. 3 mil records) and need to
> replace a nested sql query with an "iterative" one [...]
>
> @names is a very large array - on the order of 100,000 elements.
> [...]


--
Alexander Foken
mailto:alexander@foken.de http://www.foken.de/alexander/

Re: How to bind to a LARGE array

on 03.05.2006 10:53:58 by bealach

Thank you all for your comments. I think a temporary table is
unavoidable if I want to control the way the query is done here (as
opposed to letting the planner do its job).

The @names array is the result of an earlier DBI query.

I agree, a Perl snippet to generate scalar(@names) placeholders would
probably push the query over the maximum query size, as well as various
other limits. Probably a bad idea.

And iterating over the array, issuing one query per element of @names,
is not something I want to try.

So I'm left with temporary tables.

Bealach



On 5/2/06, Alexander Foken wrote:
> If @names were only a few records, you could write something like
> select ... where name in (?,?,?,?,?)
> using some Perl code to generate the placeholders ( join(',', ('?') x
> scalar @names) ). But with 100,000 records, I think this would bring
> DBI, DBD::Pg and the Pg SQL parser to their limits. Usually I strongly
> oppose using temporary tables, but in your case I would create a
> temporary table (using some unique identifier, like hostname + PID +
> random number, as the table name), insert @names into that table, and
> then let Pg do the job. [...]
>
> Where does @names come from?
>
> Alexander

Re: How to bind to a LARGE array

on 03.05.2006 14:10:53 by Alexander

On 03.05.2006 10:53, Bealach Na Bo wrote:

> Thank you all for your comments. I think a temporary table is
> unavoidable if I want to control the way the query is done here (as
> opposed to letting the planner do its job).
>
> The @names array is the result of an earlier DBI query.


So why don't you just combine the earlier query and this query into a
new query?

If @names comes from something like "SELECT something as names FROM a
lot of tables WHERE a lot of conditions match", merge it with the query
you posted earlier:


SELECT j.job_log_id
FROM job_log j, node n
WHERE n.name in ( SELECT something as names FROM a lot of tables
WHERE a lot of conditions match )
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id

This seems to be the cleanest and most natural way to me. It even allows
the database's query optimizer to do its job better than with two queries
that, as far as the optimizer can tell, are unrelated to each other.
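
Untested, but the merged version is then a single prepared statement along
these lines (earlier_source and e.flag below are made-up names, standing
in for whatever query actually produced @names):

my $sth = $dbh->prepare(<<EOSQL);
SELECT j.job_log_id
FROM job_log j, node n
WHERE n.name IN ( SELECT e.name FROM earlier_source e WHERE e.flag = ? )
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id
EOSQL
$sth->execute('X', 'B', 'LIVE');    # 'X' feeds the hypothetical subquery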

Temporary tables are not needed here (and cause more trouble than they
are worth) -- unless @names comes from another database with no ability
to link to the database you want to query "now". If, for example, we talk
about two Oracle DBs, even on different machines, even with different
versions of Oracle, even with different operating systems, and even on
different platforms, there is a way to connect both databases so that
the contents of one appears to be in the other. In that case, you should
ask an Oracle expert for more details about "database links". (I know
*this* setup from my former job, but I never had to work with it, so I
lack some knowledge here.) I think MSSQL has a similar capability. For
MySQL and PostgreSQL, this should at least not be impossible. I would
recommend RTFM.

>
> I agree, a Perl snippet to generate scalar(@names) placeholders would
> probably push the query over the maximum query size, as well as various
> other limits. Probably a bad idea.
>
> And iterating over the array, issuing one query per element of @names,
> is not something I want to try.
>
> So I'm left with temporary tables.

No. Rethink the way you issue the queries. You would be stuck with
temporary tables if @names came from a plain file or a device, but not
if @names comes from the same or a similar database.

Alexander

>
> Bealach
>
> [...]


--
Alexander Foken
mailto:alexander@foken.de http://www.foken.de/alexander/

Re: How to bind to a LARGE array

on 04.05.2006 01:55:53 by ron

On Wed, 3 May 2006 10:53:58 +0200, Bealach Na Bo wrote:

Hi Bealach

> I agree, a Perl snippet to generate scalar(@names) placeholders would
> probably push the query over the maximum query size, as well as
> various other limits. Probably a bad idea.

I found with Postgres V 7 I had to limit the length of the SQL to avoid
the error: 'expression too complex'. So I picked 2000 placeholders at a
time. MySQL handled up to 10,600 in one expression, so it was the
difference in max SQL length: Postgres => 32K chars, MySQL => 64K chars.
But really, both are too long to be good style.
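
That chunking looks something like this in DBI terms (untested, and
assuming $dbh and @names as in the earlier posts):

my @work = @names;    # work on a copy, so splice doesn't eat the original
while (my @chunk = splice(@work, 0, 2000)) {
    # one placeholder per name in this chunk
    my $in = join(',', ('?') x @chunk);
    my $sth = $dbh->prepare(<<EOSQL);
SELECT j.job_log_id
FROM job_log j, node n
WHERE n.name IN ($in)
AND n.type = ?
AND n.usage = ?
AND n.node_id = j.node_id
EOSQL
    $sth->execute(@chunk, 'B', 'LIVE');
    while (my ($id) = $sth->fetchrow_array) {
        print "$id\n";
    }
}

The statement is re-prepared per chunk because the last chunk is usually
shorter than 2000.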

--
Cheers
Ron Savage, ron@savage.net.au on 4/05/2006
http://savage.net.au/index.html
Let the record show: Microsoft is not an Australian company

Re: How to bind to a LARGE array

on 04.05.2006 15:41:28 by Greg

> I found with Postgres V 7 I had to limit the length of the SQL to avoid
> the error: 'expression too complex'. So I picked 2000 placeholders at a
> time. MySQL handled up to 10,600 in one expression, so it was the
> difference in max SQL length: Postgres => 32K chars, MySQL => 64K chars.
> But really, both are too long to be good style.

If you really want to cause yourself such pain in Postgres, you can now
change "max_stack_depth" inside of postgresql.conf and then HUP the server.
The default is 2048 (kilobytes), but you can set it as high as your OS can
handle.
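
For reference, the change is a one-liner in postgresql.conf (the value is
in kilobytes; pick something your OS stack limits can actually back), then
reload with pg_ctl reload or a kill -HUP:

max_stack_depth = 4096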

As others have said though, there are far more elegant solutions. :)

--
Greg Sabino Mullane greg@turnstep.com
PGP Key: 0x14964AC8 200605040938
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8

Re: How to bind to a LARGE array

on 05.05.2006 01:35:30 by ron

On Thu, 4 May 2006 13:41:28 -0000, Greg Sabino Mullane wrote:

Hi Greg

> If you really want to cause yourself such pain in Postgres, you can

In my defense (:-), I would like to point out I wasn't even trying to
stress-test Postgres in this case. One user had figured out a combination
of inputs on a CGI form which triggered this effect.
--
Cheers
Ron Savage, ron@savage.net.au on 5/05/2006
http://savage.net.au/index.html
Let the record show: Microsoft is not an Australian company