Slow pg_dump

Slow pg_dump

on 13.04.2008 02:59:09 by Ryan Wells



We're having what seem like serious performance issues with pg_dump, and I hope someone can help.

We have several tables that are used to store binary data as bytea (in this example, image files), but we're having similar time issues with text tables as well.

In my most recent test, the sample table was about 5 GB in 1644 rows, with image file sizes between 1 MB and 35 MB. The server was a 3.0 GHz P4 running WinXP, with 2 GB of RAM, the backup stored to a separate disk from the data, and little else running on the system.

We're doing the following:

pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name

In the test above, this took 1 hr 45 min to complete. Since we expect to have users with 50-100 GB of data, if not more, backup times that take nearly an entire day are unacceptable.
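(At that rate, roughly 5 GB in 105 minutes, 50 GB would take on the order of 17 hours and 100 GB closer to 35.)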

We think there must be something we're doing wrong. A search turned up a similar thread (http://archives.postgresql.org/pgsql-performance/2007-12/msg00404.php), but our numbers are so much higher than those that we must be doing something very wrong. Hopefully there's a server setting or pg_dump option we need to change, but we're open to design changes if necessary.

Can anyone who has dealt with this before advise us?

Thanks!
Ryan







Re: Slow pg_dump

on 13.04.2008 04:46:06 by Tom Lane

"Ryan Wells" writes:
> We have several tables that are used to store binary data as bytea (in
> this example image files),

Precompressed image formats, no doubt?

> pg_dump -i -h localhost -p 5432 -U postgres -F c -v -f "backupTest.backup" -t "public"."images" db_name

Try it with -Z0, or even drop the -Fc completely, since it's certainly
not very helpful on a single-table dump. Re-compressing already
compressed data is not only useless but impressively slow ...

Also, drop the -i, that's nothing but a foot-gun.
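
(For illustration, with the same table and essentially the same options, that would be something along the lines of either

  pg_dump -h localhost -p 5432 -U postgres -F c -Z0 -v -f "backupTest.backup" -t "public"."images" db_name
  pg_dump -h localhost -p 5432 -U postgres -v -f "backupTest.sql" -t "public"."images" db_name

where the first keeps the custom format with compression switched off and the second writes a plain SQL script; the output file names are just placeholders.)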

regards, tom lane


Re: Slow pg_dump

on 15.04.2008 00:35:32 by Ryan Wells

The images are stored in whatever format our users load them as, so we
don't have any control over their compression or lack thereof.

I ran pg_dump with the arguments you suggested, and my 4 GB test table
finished backing up in about 25 minutes, which seems great. The only
problem is that the resulting backup file was over 9 GB. Using -Z2
resulted in a 55-minute 6 GB backup.

Here's my interpretation of those results: the TOAST tables for our
image files are compressed by Postgres. During the backup, pg_dump
uncompresses them, and if compression is turned on, recompresses the
backup. Please correct me if I'm wrong there.

If we can't find a workable balance using pg_dump, then it looks like
our next best alternative may be a utility to handle filesystem backups,
which is a little scary for on-site, user-controlled servers.

Ryan



Re: Slow pg_dump

on 15.04.2008 01:22:40 by Phillip Smith

> I ran pg_dump with the arguments you suggested, and my 4 GB test table
> finished backing up in about 25 minutes, which seems great. The only
> problem is that the resulting backup file was over 9 GB. Using -Z2
> resulted in a 55-minute 6 GB backup.
>
> Here's my interpretation of those results: the TOAST tables for our
> image files are compressed by Postgres. During the backup, pg_dump
> uncompresses them, and if compression is turned on, recompresses the
> backup. Please correct me if I'm wrong there.
>
> If we can't find a workable balance using pg_dump, then it looks like
> our next best alternative may be a utility to handle filesystem
> backups, which is a little scary for on-site, user-controlled servers.


Re: Slow pg_dump

on 15.04.2008 01:26:10 by Phillip Smith

(Sorry, hit send too soon!)

> I ran pg_dump with the arguments you suggested, and my 4 GB test
> table finished backing up in about 25 minutes, which seems great.
> The only problem is that the resulting backup file was over 9 GB.
> Using -Z2 resulted in a 55-minute 6 GB backup.
>
> Here's my interpretation of those results: the TOAST tables for
> our image files are compressed by Postgres. During the backup,
> pg_dump uncompresses them, and if compression is turned on,
> recompresses the backup. Please correct me if I'm wrong there.
>
> If we can't find a workable balance using pg_dump, then it looks
> like our next best alternative may be a utility to handle
> filesystem backups, which is a little scary for on-site,
> user-controlled servers.

How about a post-backup compress?
pg_dump -Z0 db_name > uncompressed-backup.sql
gzip uncompressed-backup.sql
mv uncompressed-backup.sql.gz compressed-backup.sql.gz

Your backup is completed in reasonable time; you're just handling
the storage of the backup afterwards, while users can be using the
system again...
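
For what it's worth, piping straight into gzip (reusing the db_name from earlier in the thread) should give much the same result without the intermediate uncompressed file:

pg_dump -Z0 db_name | gzip > compressed-backup.sql.gz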



Re: Slow pg_dump

on 15.04.2008 02:46:07 by Tena Sakai


Hi,

I have never dealt with tables that are made
of compressed data, but I back up the database
via crontab file like this:


  <some environment variable setup>
      .
      .
  filename=`date +%G%m%d.%w`.gz
  /usr/local/pgsql/bin/pg_dumpall | gzip > /some_destination/$filename
      .
      .
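
For a single database like the one discussed above, the same pattern should work with pg_dump in place of pg_dumpall, e.g.:

  /usr/local/pgsql/bin/pg_dump db_name | gzip > /some_destination/$filename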

Hope this helps.

Tena Sakai
tsakai@gallo.ucsf.edu



Re: Slow pg_dump

on 15.04.2008 02:58:48 by Tom Lane

"Phillip Smith" writes:
>> Here's my interpretation of those results: the TOAST tables for
>> our image files are compressed by Postgres. During the backup,
>> pg_dump uncompresses them, and if compression is turned on,
>> recompresses the backup. Please correct me if I'm wrong there.

No, the TOAST tables aren't compressed, they're pretty much going to be
the raw image data (plus a bit of overhead).

What I think is happening is that COPY OUT is encoding the bytea
data fairly inefficiently (one byte could go to \\nnn, five bytes)
and the compression on the pg_dump side isn't doing very well at buying
that back.
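
(As a rough worked example: a single zero byte comes out of text-mode COPY as the five characters \\000, i.e. the bytea escape \000 with its backslash doubled by COPY, so binary-heavy rows can easily inflate several-fold, which squares with a 4 GB table producing a 9 GB uncompressed dump.)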

I experimented a bit and noticed that pg_dump -Fc is a great deal
smarter about storing large objects than big bytea fields --- it seems
to be pretty nearly one-to-one with the original data size when storing
a compressed file that was put into a large object. I dunno if it's
practical for you to switch from bytea to large objects, but in the near
term I think that's your only option if the dump file size is a
showstopper problem for you.
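
For what it's worth, a hypothetical sketch of that switch (table, column, and file names here are invented) might look like:

  # store a reference to a large object instead of the bytea itself
  psql -U postgres -d db_name -c "CREATE TABLE images_lo (id serial PRIMARY KEY, filename text, image_oid oid)"
  # lo_import() runs server-side and returns the new large object's oid
  psql -U postgres -d db_name -c "INSERT INTO images_lo (filename, image_oid) VALUES ('photo1.jpg', lo_import('/srv/images/photo1.jpg'))"

A single-table dump of such a table would presumably also want pg_dump's -b/--blobs switch so the large objects come along.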

regards, tom lane


Re: Slow pg_dump

on 15.04.2008 03:08:33 by Ryan Wells



Thanks for the info on TOAST. We're still finding our legs with Postgres after several years on MySQL.

We do have the flexibility to adjust our data types and schema if we need to. We try to keep it to a minimum, but it's doable. I'm completely open to the possibility that we just have a very inefficient DB design or that we're misusing the data types.

We'll be running some more tests looking for the sweet spot between time and size. I expect we'll find a good balance somewhere.

Thanks!
