WAL and archive disks full

on 23.08.2010 23:21:54 by Kieren Scott

Hi,

What would be the best course of action for resolving a situation whereby
your postgres instance had crashed due to the wal disk and archive wal disk
becoming 100% full? Say your backups have been failing and your 'monitoring'
had not reported it correctly.

You can't start the instance because it needs to write to the WAL disk
(which is full), but if you manually move WAL files off the WAL disk, the
archiver will fail because it can't find WAL files it needs to archive. The
instance may also still be in backup mode, because the backups had not
completed due to the disk full situation.

Being new to postgres, I'm trying to understand what actions need to be
taken to get the instance back up and running without compromising
recoverability...?

Thanks in advance.

Re: WAL and archive disks full

on 23.08.2010 23:47:57 by Kevin Grittner

Kieren Scott wrote:

> What would be the best course of action for resolving a situation
> whereby your postgres instance had crashed due to the wal disk and
> archive wal disk becoming 100% full? Say your backups have been
> failing and your 'monitoring' had not reported it correctly.
>
> You can't start the instance because it needs to write to the WAL
> disk (which is full), but if you manually move WAL files off the
> WAL disk, the archiver will fail because it can't find WAL files
> it needs to archive. The instance may also still be in backup
> mode, because the backups had not completed due to the disk full
> situation.
>
> Being new to postgres, I'm trying to understand what actions need
> to be taken to get the instance back up and running without
> compromising recoverability...?

You will get more detailed advice if you avoid hypotheticals and say
exactly what's going on and what your priorities are. For starters,
are you OK with a situation which gets your primary database running
again and lets you start over with a new base backup, or is it
critical that you continue your backup stream without having to take
a new base backup? My advice would depend on the answer to that.

Also, it would be helpful to have an idea what your various mount
points are, how big they are, and what's on them. (If there's
something else *also* on the same mount point as the WAL files, that
might make a difference.) What do you mean, exactly, when you say
your wal disk and archive wal disk are 100% full? (Are those
separate mount points? Did the archive fail to restore, thereby
building up to where the archive later began to fail, or is it a
shared drive?)

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: WAL and archive disks full

on 24.08.2010 00:16:22 by Kieren Scott

Apologies for the hypothetical scenario; I was trying to gain a greater
understanding of what actions postgres would require in order to get the
instance started without any errors (such as archiver errors because wal
files had been wrongly deleted by hand to free up space).

I'd be happy with a situation which lets us start over again with a new
base backup.

We have separate mount points for the wal and archived wal filesystems.
Nothing apart from wal files is written to those filesystems.

I noticed a situation recently whereby our backup scripts had been failing,
and the script had subsequently not been clearing down the archive wal
filesystem after a successful backup. The wal filesystem was almost full
because the archive_command couldn't copy wal files to the archive
filesystem.

Sorry it's a bit of a what-if scenario. I can envisage encountering a
situation in the future whereby we hit this problem, and I was trying to
put a plan in place for how to deal with it.

Thanks in advance.

Re: WAL and archive disks full

on 24.08.2010 00:41:59 by Kevin Grittner

Kieren Scott wrote:

> Sorry it's a bit of a what-if scenario. I can envisage
> encountering a situation in the future whereby we hit this
> problem, and I was trying to put a plan in place for how to deal
> with it.

Oh, OK. I was afraid you had actually *hit* this situation and were
being coy. No need to apologize for contingency planning! :-)

Hypothetically, in the situation where the stall originated with the
application of files from the archive, fixing that end and clearing
files from the archive directory (or moving or deleting old ones if
they were applying cleanly and just sitting there after application)
would allow the archive process to resume copying and cleaning up
files on the source database.

If someone panicked and deleted files from the pg_xlog directory,
well, the first thing is to try to make sure nobody does that. You
might be able to turn off archiving and get the server to come up.
If not, start by making a complete copy of your data directory and
all of its subdirectories while PostgreSQL is stopped, because you
may wander into trouble and want to try again. If you can't start
with archiving turned off, you might want to look at this:

http://www.postgresql.org/docs/current/interactive/app-pgresetxlog.html

Of course, you want to monitor closely to ensure your backups are
running correctly so you never need any of the above advice. ;-)

-Kevin
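As a rough illustration of the kind of monitoring Kevin means, here is a minimal Python sketch (not from the thread) that flags two early warning signs from this scenario: a growing number of files still marked `.ready` in `pg_xlog/archive_status` (meaning archive_command has not yet succeeded for them), and the WAL or archive filesystem filling up. The function names and the thresholds `max_ready=10` and `max_used_pct=80.0` are hypothetical; it assumes the `pg_xlog` directory layout discussed in this thread.

```python
import os
import shutil


def archive_backlog(pg_xlog_dir):
    """Names of files in pg_xlog still marked .ready, i.e. not yet archived."""
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    return sorted(
        name[: -len(".ready")]
        for name in os.listdir(status_dir)
        if name.endswith(".ready")
    )


def percent_used(path):
    """Percentage of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return 100.0 * usage.used / usage.total


def check(pg_xlog_dir, archive_dir, max_ready=10, max_used_pct=80.0):
    """Return a list of human-readable problems; an empty list means OK.

    The default thresholds are arbitrary placeholders, not recommendations.
    """
    problems = []
    backlog = archive_backlog(pg_xlog_dir)
    if len(backlog) > max_ready:
        # A steadily growing .ready backlog usually means archive_command
        # keeps failing (e.g. because the archive filesystem is full).
        problems.append(
            f"{len(backlog)} files waiting to be archived "
            f"(e.g. {backlog[0]}); is archive_command failing?"
        )
    for label, path in (("WAL", pg_xlog_dir), ("archive", archive_dir)):
        pct = percent_used(path)
        if pct > max_used_pct:
            problems.append(f"{label} filesystem at {path} is {pct:.1f}% full")
    return problems
```

It could be run from cron as something like `check("/var/lib/pgsql/data/pg_xlog", "/mnt/wal_archive")` (both paths hypothetical), raising an alert whenever the returned list is non-empty.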


Re: WAL and archive disks full

on 24.08.2010 01:04:49 by Kieren Scott

Thanks Kevin.

So if the wal filesystem is 100% full, can you actually start up postgres
in archiving mode (so the archive process can resume copying)? Presumably
postgres will try to write to the wal filesystem when you start it, fail
because the filesystem is full, and then just shut down/abort? Wouldn't you
have to free some space in the wal filesystem in order to get postgres up
and running?

Thanks for your help.

Re: WAL and archive disks full

on 24.08.2010 02:17:02 by Tom Lane

Kieren Scott writes:
> [ hypothetical scenario: ]
> You can't start the instance because it needs to write to the WAL disk
> (which is full), but if you manually move WAL files off the WAL disk,
> the archiver will fail because it can't find WAL files it needs to
> archive.

Uh, no, that shouldn't be a problem. You can manually move the same WAL
files that the archiver would move. Look into the
pg_xlog/archive_status subdirectory. Any WAL files that have a ".ready"
file in there can be moved to archive, and then you delete the .ready
file, and you're good to go.

Of course, if you don't have any .ready files, you're going to need to
look elsewhere for some disk space to reclaim :-(

regards, tom lane
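The manual procedure Tom describes can be scripted. The following is a minimal Python sketch, not from the thread, using a hypothetical function name `archive_ready_segments`. It assumes the server is stopped, that the archive directory has free space, and that your archive_command stores each file under its plain name (as a simple `cp %p /archive/%f` style command would); if your archive_command compresses or renames files, copy them the way it does instead.

```python
import os
import shutil


def archive_ready_segments(pg_xlog_dir, archive_dir):
    """Hand-archive every file that has a .ready marker; return their names.

    Mirrors the manual procedure: put the file in the archive, remove it
    from pg_xlog to free space, then delete its .ready marker.
    Run only while PostgreSQL is stopped.
    """
    status_dir = os.path.join(pg_xlog_dir, "archive_status")
    moved = []
    for marker in sorted(os.listdir(status_dir)):
        if not marker.endswith(".ready"):
            continue  # .done markers: already archived, leave them alone
        name = marker[: -len(".ready")]
        src = os.path.join(pg_xlog_dir, name)
        dst = os.path.join(archive_dir, name)
        # "xb" raises FileExistsError rather than overwrite an archived file.
        # Copy (not os.rename) because the two directories are on separate
        # mount points, where a rename would fail.
        with open(src, "rb") as fin, open(dst, "xb") as fout:
            shutil.copyfileobj(fin, fout)
            fout.flush()
            os.fsync(fout.fileno())  # flush the archive copy before deleting
        os.remove(src)  # frees space on the WAL filesystem
        os.remove(os.path.join(status_dir, marker))
        moved.append(name)
    return moved
```

For example, `archive_ready_segments("/var/lib/pgsql/data/pg_xlog", "/mnt/wal_archive")` (paths hypothetical). The original is removed only after a complete copy exists in the archive, so an interrupted run leaves the segment in pg_xlog with its `.ready` marker intact.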


Re: WAL and archive disks full

on 24.08.2010 11:39:56 by Kieren Scott

Thank you.
Kieren
