pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

am 22.04.2010 14:16:08 von Achilleus Mantzios

Hello,
i have this serious problem in one of our remote vessels. (comm is done by minicom to the remote satelite modem)
I think that this server was under some sort of constant resets or hardware failures.
Initially,i had this problem:
ERROR: invalid page header in block 672720 of relation "pg_toast_125716009"

This toast table corresponds to a table named "mail_message",
Table "public.mail_message"
Column | Type | Modifiers
-----------+-------------------+---------------------------- -------------------------------
msgno | mail_msgno_domain | not null default nextval('mail_msgno_sequence'::regclass)
msgsource | bytea |
Indexes:
"mail_message_key" PRIMARY KEY, btree (msgno)

(obviously the TOAST table serves the msgsource varlena) the contents of which is not of vital importance.
I tried, REINDEXING, with no success, and after that, i tried resetting the said block on disk as per this
suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981154.html

i found the oid of the table:
SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMIT 1
tableoid | ?column?
-----------+----------
125716013 | 1

(and just to verify)
SELECT relname from pg_class where oid=125716013;
relname
--------------------
pg_toast_125716009

Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1

However, after that, unfortunately i get constant postgresql server restarts with:
FATAL: segment too big
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
The connection to the server was lost. Attempting reset: Succeeded.

Is there anything i can do to savage the situation?

(one of) the hard part here is that i dont have neither physical nor network access to the server
(only ultra expensive unreliable satellite comms)

Thanks for any hints...

--
Achilleas Mantzios

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL:

am 22.04.2010 15:53:05 von cedric.villemain.debian

2010/4/22 Achilleas Mantzios :
> Hello,
> i have this serious problem in one of our remote vessels. (comm is done b=
y minicom to the remote satelite modem)
> I think that this server was under some sort of constant resets or hardwa=
re failures.
> Initially,i had this problem:
> ERROR: =A0invalid page header in block 672720 of relation "pg_toast_12571=
6009"
>
> This toast table corresponds to a table named "mail_message",
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0Table "pub=
lic.mail_message"
> =A0Column =A0 | =A0 =A0 =A0 Type =A0 =A0 =A0 =A0| =A0 =A0 =A0 =A0 =A0 =A0=
=A0 =A0 =A0 =A0 =A0 =A0 Modifiers
> -----------+-------------------+---------------------------- -------------=
------------------
> =A0msgno =A0 =A0 | mail_msgno_domain | not null default nextval('mail_msg=
no_sequence'::regclass)
> =A0msgsource | bytea =A0 =A0 =A0 =A0 =A0 =A0 |
> Indexes:
> =A0 =A0"mail_message_key" PRIMARY KEY, btree (msgno)
>
> (obviously the TOAST table serves the msgsource varlena) the contents of =
which is not of vital importance.
> I tried, REINDEXING, with no success, and after that, i tried resetting t=
he said block on disk as per this
> suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11981=
154.html
>
> i found the oid of the table:
> SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LIMI=
T 1
> =A0tableoid =A0| ?column?
> -----------+----------
> =A0125716013 | =A0 =A0 =A0 =A01
>
> (and just to verify)
> SELECT relname from pg_class where oid=3D125716013;
> =A0 =A0 =A0relname
> --------------------
> =A0pg_toast_125716009
>
> Then i did: (as i said i do not need the contents of msgsource - yet the =
contents of msgno are vital)
> dd if=3D/dev/zero of=3D/usr/local/var/lib/pgsql/data/base/125714957/12571=
6013 seek=3D672720 bs=3D8192 count=3D1

segment have 1.1GB size maximum. You have to catch in what segment the
faulty block is, and reajust the block value from the error report to
the real one in the good segment.

>
> However, after that, unfortunately i get constant postgresql server resta=
rts with:
> FATAL: =A0segment too big
> server closed the connection unexpectedly
> =A0 =A0 =A0 =A0This probably means the server terminated abnormally
> =A0 =A0 =A0 =A0before or while processing the request.
> The connection to the server was lost. Attempting reset: Succeeded.
>
> Is there anything i can do to savage the situation?
>
> (one of) the hard part here is that i dont have neither physical nor netw=
ork access to the server
> (only ultra expensive unreliable satellite comms)
>
> Thanks for any hints...
>
> --
> Achilleas Mantzios
>
> --
> Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-admin
>



--=20
C=E9dric Villemain

--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

am 22.04.2010 16:42:33 von Tom Lane

Achilleas Mantzios writes:
> Then i did: (as i said i do not need the contents of msgsource - yet the contents of msgno are vital)
> dd if=/dev/zero of=/usr/local/var/lib/pgsql/data/base/125714957/125716013 seek=672720 bs=8192 count=1

> However, after that, unfortunately i get constant postgresql server restarts with:
> FATAL: segment too big

You got the dd command wrong and made the file size change to something
it shouldn't be. I think you can use dd to truncate the file back to
what it should be (ie, exactly 1GB) but haven't had enough caffeine to
remember exactly how.

I think the underlying error is that block 672720 isn't going to be in
the first segment of the table --- you need to be zeroing something in
one of the other segments...

regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

am 22.04.2010 17:18:51 von Achilleus Mantzios

Στις Thursday 22 April 2010 16:53:05 ο/η C=C3=
=A9dric Villemain έγραψε:
> 2010/4/22 Achilleas Mantzios :
> > Hello,
> > i have this serious problem in one of our remote vessels. (comm is done=
by minicom to the remote satelite modem)
> > I think that this server was under some sort of constant resets or hard=
ware failures.
> > Initially,i had this problem:
> > ERROR:  invalid page header in block 672720 of relation "pg_toast_=
125716009"
> >
> > This toast table corresponds to a table named "mail_message",
> >                     =
           Table "public.mail_message"
> >  Column   |       Type       =
 |                   =C2=
=A0     Modifiers
> > -----------+-------------------+---------------------------- -----------=
--------------------
> >  msgno     | mail_msgno_domain | not null default nextva=
l('mail_msgno_sequence'::regclass)
> >  msgsource | bytea             |
> > Indexes:
> >    "mail_message_key" PRIMARY KEY, btree (msgno)
> >
> > (obviously the TOAST table serves the msgsource varlena) the contents o=
f which is not of vital importance.
> > I tried, REINDEXING, with no success, and after that, i tried resetting=
the said block on disk as per this
> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td119=
81154.html
> >
> > i found the oid of the table:
> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id LI=
MIT 1
> >  tableoid  | ?column?
> > -----------+----------
> >  125716013 |        1
> >
> > (and just to verify)
> > SELECT relname from pg_class where oid=3D125716013;
> >      relname
> > --------------------
> >  pg_toast_125716009
> >
> > Then i did: (as i said i do not need the contents of msgsource - yet th=
e contents of msgno are vital)
> > dd if=3D/dev/zero of=3D/usr/local/var/lib/pgsql/data/base/125714957/125=
716013 seek=3D672720 bs=3D8192 count=3D1
>=20
> segment have 1.1GB size maximum. You have to catch in what segment the
> faulty block is, and reajust the block value from the error report to
> the real one in the good segment.
>=20

Thanx,
Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr/m=
d.c
i see the the error comes from function mdnblocks

if (nblocks > ((BlockNumber) RELSEG_SIZE))
elog(FATAL, "segment too big");

That means, that some segment file is bigger than RELSEG_SIZE
At least in my system:
#define BLCKSZ 8192
#define RELSEG_SIZE (0x40000000 / BLCKSZ)
So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 byte=
s =3D 1GB)

Currently i dont have any access to the machine but tomorrow i will check t=
he file sizes.

Can anyone shed some light as to some method of identifying all the segment=
files of a table?
The first one has the same name as the tableoid.=20
How about the subsequent segments?

> >
> > However, after that, unfortunately i get constant postgresql server res=
tarts with:
> > FATAL:  segment too big
> > server closed the connection unexpectedly
> >        This probably means the server terminated ab=
normally
> >        before or while processing the request.
> > The connection to the server was lost. Attempting reset: Succeeded.
> >
> > Is there anything i can do to savage the situation?
> >
> > (one of) the hard part here is that i dont have neither physical nor ne=
twork access to the server
> > (only ultra expensive unreliable satellite comms)
> >
> > Thanks for any hints...
> >
> > --
> > Achilleas Mantzios
> >
> > --
> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-admin
> >
>=20
>=20
>=20



--=20
Achilleas Mantzios

--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

am 22.04.2010 17:25:20 von Achilleus Mantzios

Στις Thursday 22 April 2010 17:42:33 γρά=
ψατε:
> Achilleas Mantzios writes:
> > Then i did: (as i said i do not need the contents of msgsource - yet th=
e contents of msgno are vital)
> > dd if=3D/dev/zero of=3D/usr/local/var/lib/pgsql/data/base/125714957/125=
716013 seek=3D672720 bs=3D8192 count=3D1=20
>=20
> > However, after that, unfortunately i get constant postgresql server res=
tarts with:
> > FATAL: segment too big
>=20
> You got the dd command wrong and made the file size change to something
> it shouldn't be. I think you can use dd to truncate the file back to
> what it should be (ie, exactly 1GB) but haven't had enough caffeine to
> remember exactly how.
>=20
> I think the underlying error is that block 672720 isn't going to be in
> the first segment of the table --- you need to be zeroing something in
> one of the other segments...

Great, thanx a lot,
any idea about the naming of the files of those other segments?

>=20
> regards, tom lane
>=20



--=20
Achilleas Mantzios

--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL:

am 22.04.2010 18:02:00 von cedric.villemain.debian

2010/4/22 Achilleas Mantzios :
> Στις Thursday 22 April 2010 16:53:05 ο/η C=
édric Villemain έγραψε:
>> 2010/4/22 Achilleas Mantzios :
>> > Hello,
>> > i have this serious problem in one of our remote vessels. (comm is don=
e by minicom to the remote satelite modem)
>> > I think that this server was under some sort of constant resets or har=
dware failures.
>> > Initially,i had this problem:
>> > ERROR:  invalid page header in block 672720 of relation "pg_toast=
_125716009"
>> >
>> > This toast table corresponds to a table named "mail_message",
>> >                     =
           Table "public.mail_message"
>> >  Column   |       Type       =
 |                   =C2=
=A0     Modifiers
>> > -----------+-------------------+---------------------------- ----------=
---------------------
>> >  msgno     | mail_msgno_domain | not null default nextv=
al('mail_msgno_sequence'::regclass)
>> >  msgsource | bytea             |
>> > Indexes:
>> >    "mail_message_key" PRIMARY KEY, btree (msgno)
>> >
>> > (obviously the TOAST table serves the msgsource varlena) the contents =
of which is not of vital importance.
>> > I tried, REINDEXING, with no success, and after that, i tried resettin=
g the said block on disk as per this
>> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td11=
981154.html
>> >
>> > i found the oid of the table:
>> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id L=
IMIT 1
>> >  tableoid  | ?column?
>> > -----------+----------
>> >  125716013 |        1
>> >
>> > (and just to verify)
>> > SELECT relname from pg_class where oid=3D125716013;
>> >      relname
>> > --------------------
>> >  pg_toast_125716009
>> >
>> > Then i did: (as i said i do not need the contents of msgsource - yet t=
he contents of msgno are vital)
>> > dd if=3D/dev/zero of=3D/usr/local/var/lib/pgsql/data/base/125714957/12=
5716013 seek=3D672720 bs=3D8192 count=3D1
>>
>> segment have 1.1GB size maximum. You have to catch in what segment the
>> faulty block is, and reajust the block value from the error report to
>> the real one in the good segment.
>>
>
> Thanx,
> Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/smgr=
/md.c
> i see the the error comes from function mdnblocks
>
> if (nblocks > ((BlockNumber) RELSEG_SIZE))
>                     =C2=
=A0  elog(FATAL, "segment too big");
>
> That means, that some segment file is bigger than RELSEG_SIZE
> At least in my system:
> #define BLCKSZ   8192
> #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 by=
tes =3D 1GB)
>
> Currently i dont have any access to the machine but tomorrow i will check=
the file sizes.
>
> Can anyone shed some light as to some method of identifying all the segme=
nt files of a table?
> The first one has the same name as the tableoid.
> How about the subsequent segments?

Your execution of dd make your first segment bigger than expected.
Other segment have the same name with a .1 .2 etc suffix.

You have to shrink your first segment to the correct size.
check what happens, you should have now the original error.

And, I have never used it, but I think it is the purpose of
zero_damaged_pages to parameter to allow postgresql itself to zero the
bad black. (reading
src/backend/storage/buffer/bufmgr.c confirm that.

*BUT* take care that it will zero *every* bad page, perhaps not only
the one trapping an error.

In those situation, it is good to make a snapshot of the pgdata
directory, in case your fingers surf too fast on the keyboard....

If you don't want to activate zero_damage_page, then go and calculate
which block in which segment you have to zeroing.

side note, it may be usefull to have the relevant information in the
error message...

>
>> >
>> > However, after that, unfortunately i get constant postgresql server re=
starts with:
>> > FATAL:  segment too big
>> > server closed the connection unexpectedly
>> >        This probably means the server terminated a=
bnormally
>> >        before or while processing the request.
>> > The connection to the server was lost. Attempting reset: Succeeded.
>> >
>> > Is there anything i can do to savage the situation?
>> >
>> > (one of) the hard part here is that i dont have neither physical nor n=
etwork access to the server
>> > (only ultra expensive unreliable satellite comms)
>> >
>> > Thanks for any hints...
>> >
>> > --
>> > Achilleas Mantzios
>> >
>> > --
>> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-admin
>> >
>>
>>
>>
>
>
>
> --
> Achilleas Mantzios
>



--=20
Cédric Villemain

--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: pgsql 8.3.3 "Invalid page header" leads to "FATAL: segment too big"

am 23.04.2010 11:06:38 von Achilleus Mantzios

Στις Thursday 22 April 2010 19:02:00 ο/η C=C3=
=A9dric Villemain έγραψε:
> 2010/4/22 Achilleas Mantzios :
> > Στις Thursday 22 April 2010 16:53:05 ο/η =
Cédric Villemain έγραψε:
> >> 2010/4/22 Achilleas Mantzios :
> >> > Hello,
> >> > i have this serious problem in one of our remote vessels. (comm is d=
one by minicom to the remote satelite modem)
> >> > I think that this server was under some sort of constant resets or h=
ardware failures.
> >> > Initially,i had this problem:
> >> > ERROR:  invalid page header in block 672720 of relation "pg_toa=
st_125716009"
> >> >
> >> > This toast table corresponds to a table named "mail_message",
> >> >                   =C2=
=A0            Table "public.mail_message"
> >> >  Column   |       Type      =
 |                   =C2=
=A0     Modifiers
> >> > -----------+-------------------+---------------------------- --------=
-----------------------
> >> >  msgno     | mail_msgno_domain | not null default nex=
tval('mail_msgno_sequence'::regclass)
> >> >  msgsource | bytea             |
> >> > Indexes:
> >> >    "mail_message_key" PRIMARY KEY, btree (msgno)
> >> >
> >> > (obviously the TOAST table serves the msgsource varlena) the content=
s of which is not of vital importance.
> >> > I tried, REINDEXING, with no success, and after that, i tried resett=
ing the said block on disk as per this
> >> > suggestion by Tom here: http://old.nabble.com/invalid-page-header-td=
11981154.html
> >> >
> >> > i found the oid of the table:
> >> > SELECT tableoid,1 from pg_toast.pg_toast_125716009 order by chunk_id=
LIMIT 1
> >> >  tableoid  | ?column?
> >> > -----------+----------
> >> >  125716013 |        1
> >> >
> >> > (and just to verify)
> >> > SELECT relname from pg_class where oid=3D125716013;
> >> >      relname
> >> > --------------------
> >> >  pg_toast_125716009
> >> >
> >> > Then i did: (as i said i do not need the contents of msgsource - yet=
the contents of msgno are vital)
> >> > dd if=3D/dev/zero of=3D/usr/local/var/lib/pgsql/data/base/125714957/=
125716013 seek=3D672720 bs=3D8192 count=3D1
> >>
> >> segment have 1.1GB size maximum. You have to catch in what segment the
> >> faulty block is, and reajust the block value from the error report to
> >> the real one in the good segment.
> >>
> >
> > Thanx,
> > Taking a look at /usr/local/src/postgresql-8.3.3/src/backend/storage/sm=
gr/md.c
> > i see the the error comes from function mdnblocks
> >
> > if (nblocks > ((BlockNumber) RELSEG_SIZE))
> >                     =
   elog(FATAL, "segment too big");
> >
> > That means, that some segment file is bigger than RELSEG_SIZE
> > At least in my system:
> > #define BLCKSZ   8192
> > #define RELSEG_SIZE (0x40000000 / BLCKSZ)
> > So, any segment file cannot be bigger than RELSEG_SIZE blocks (or 2^30 =
bytes =3D 1GB)
> >
> > Currently i dont have any access to the machine but tomorrow i will che=
ck the file sizes.
> >
> > Can anyone shed some light as to some method of identifying all the seg=
ment files of a table?
> > The first one has the same name as the tableoid.
> > How about the subsequent segments?
>=20
> Your execution of dd make your first segment bigger than expected.
> Other segment have the same name with a .1 .2 etc suffix.
>=20
> You have to shrink your first segment to the correct size.
> check what happens, you should have now the original error.
>=20
> And, I have never used it, but I think it is the purpose of
> zero_damaged_pages to parameter to allow postgresql itself to zero the
> bad black. (reading
> src/backend/storage/buffer/bufmgr.c confirm that.
>=20
> *BUT* take care that it will zero *every* bad page, perhaps not only
> the one trapping an error.
>=20
> In those situation, it is good to make a snapshot of the pgdata
> directory, in case your fingers surf too fast on the keyboard....
>=20
> If you don't want to activate zero_damage_page, then go and calculate
> which block in which segment you have to zeroing.
>=20
> side note, it may be usefull to have the relevant information in the
> error message...
>=20


Many Thanks, Cédric Villemain and Tom
What i did was first to correct the first segment file with smth like
dd if=3D216293737 of=3D216293737.good seek=3D0 bs=3D8192 count=3D131072
which effectively truncates all but the first 131072 blocks (or 2^30 bytes =
=3D 1GB)

After that was done, and restarting postgresql backend, then i fell back to=
the situation
with the Invalid page header, as noted before.

I stopped the backend and calculated the exact segment file and offset wher=
e the problem was:
The block with the invalid header was the block with number: 672720=20
Now each segment contains at most 131072 blocks, with all but the last
containing exactly 131072 blocks.
So my problematic segment was the one with number:
672720 /131072 =3D 5
and the block offset inside this segment was:
672720 - (5*131072) =3D 17360

so i tried to zero that patricular block with

dd conv=3Dnotrunc if=3D216293737.5 of=3D216293737.5.GOOD seek=3D17360 bs=3D=
8192 count=3D1

i started postgresql and i threw out a warning about initializing this zero=
page.
After that, i reported error in header for block 672740,
i repeated the same procedure, and i was able to go further and even reinde=
x the whole database
which went fine.

I tried as the ultimate test (and a useful one at this point) to pg_dump th=
e database.
Unfortunately one table seems to be in error: the error is:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: compressed data is corrupt
pg_dump: The command was: COPY public.mail_entity (msgno, entno, entparentn=
o....

I think the initial issue of this thread is solved, i'll come back with new=
s on the pg_dump issue.

> >
> >> >
> >> > However, after that, unfortunately i get constant postgresql server =
restarts with:
> >> > FATAL:  segment too big
> >> > server closed the connection unexpectedly
> >> >        This probably means the server terminated=
abnormally
> >> >        before or while processing the request.
> >> > The connection to the server was lost. Attempting reset: Succeeded.
> >> >
> >> > Is there anything i can do to savage the situation?
> >> >
> >> > (one of) the hard part here is that i dont have neither physical nor=
network access to the server
> >> > (only ultra expensive unreliable satellite comms)
> >> >
> >> > Thanks for any hints...
> >> >
> >> > --
> >> > Achilleas Mantzios
> >> >
> >> > --
> >> > Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
> >> > To make changes to your subscription:
> >> > http://www.postgresql.org/mailpref/pgsql-admin
> >> >
> >>
> >>
> >>
> >
> >
> >
> > --
> > Achilleas Mantzios
> >
>=20
>=20
>=20
> --=20
> Cédric Villemain
>=20



--=20
Achilleas Mantzios

--=20
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin