Re: PostgreSQL server terminated by signal 11

on 28.07.2006 18:09:48 by Daniel Caune

> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, July 28, 2006 09:38
> To: Daniel Caune
> Cc: pgsql-admin@postgresql.org; pgsql-sql@postgresql.org
> Subject: Re: [SQL] PostgreSQL server terminated by signal 11
>
> "Daniel Caune" writes:
> > Program received signal SIGSEGV, Segmentation fault.
> > 0x08079e2a in slot_attisnull ()
> > (gdb) bt
> > #0 0x08079e2a in slot_attisnull ()
> > #1 0x0807a1d0 in slot_getattr ()
> > #2 0x080c6c73 in FormIndexDatum ()
> > #3 0x080c6ef1 in IndexBuildHeapScan ()
> > #4 0x0809b44d in btbuild ()
> > #5 0x0825dfdd in OidFunctionCall3 ()
> > #6 0x080c4f95 in index_build ()
> > #7 0x080c68eb in index_create ()
> > #8 0x08117e36 in DefineIndex ()
>
> Hmph. gdb is lying to you, because slot_getattr doesn't call
> slot_attisnull.
> This isn't too unusual in a non-debug build, because the symbol table is
> incomplete (no mention of non-global functions).
>
> Given that this doesn't happen right away, but only after it's been
> processing for awhile, we can assume that FormIndexDatum has been
> successfully iterated many times already, which seems to eliminate
> theories like the slot or the keycol value being bogus. I'm pretty well
> convinced now that we're looking at a problem with corrupted data. Can
> you do a SELECT * FROM (or COPY FROM) the table without error?
>
> regards, tom lane

The statement "copy gslog_event to stdout;" leads to "ERROR: invalid memory alloc request size 4294967293" after a while.

(...)
354964834 2006-07-19 10:53:42.813+00 (...)
354964835 2006-07-19 10:53:44.003+00 (...)
ERROR: invalid memory alloc request size 4294967293


I then tried "select * from gslog_event where gslog_event_id >= 354964834 and gslog_event_id <= 354964900;":

354964834 | 2006-07-19 10:53:42.813+00 | (...)
354964835 | 2006-07-19 10:53:44.003+00 | (...)
354964837 | 2006-07-19 10:53:44.113+00 | (...)
354964838 | 2006-07-19 10:53:44.223+00 | (...)
(...)
(66 rows)


The statement "select * from gslog_event;" leads to "Killed"... Ouch! The psql client just exits (the postgres server crashes too)!

The statement "select * from gslog_event where gslog_event_id <= 354964834;" passed.


I ran other tests on some other tables that contain less data but also seem to be corrupted:

copy player to stdout
ERROR: invalid memory alloc request size 1918988375

select * from player where id >= 771042 and id <= 771043;
ERROR: invalid memory alloc request size 1918988375

select max(length(username)) from player;
ERROR: invalid memory alloc request size 1918988375

select max(length(username)) from player where id <= 771042;
max
-----
15

select max(length(username)) from player where id >= 771050;
max
-----
15

select max(length(username)) from player where id >= 771044 and id <= 771050;
max
-----
13

Finally:

select * from player where id = 771043;
ERROR: invalid memory alloc request size 1918988375

select id from player where id = 771043;
id
--------
771043
(1 row)

agora=> select username from player where id = 771043;
ERROR: invalid memory alloc request size 1918988375


I'm also pretty much convinced that some data is corrupted, especially varchar values. Before dropping the corrupted rows, is there a way to read part of the corrupted data?
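
One way to look at the two sizes reported above is to view each as a raw 32-bit word, since a corrupted length field is a plausible (but unconfirmed) source of an oversized allocation request. The sketch below is in Python and is not part of the original exchange; the function name describe_alloc_size is hypothetical, and little-endian byte order is an assumption based on the 32-bit x86-style addresses in the backtrace.

```python
import struct


def describe_alloc_size(size: int) -> dict:
    """Show a reported alloc size as a raw 32-bit word (illustrative helper)."""
    raw = struct.pack("<I", size)            # 4 bytes, little-endian (assumed)
    signed = struct.unpack("<i", raw)[0]     # same bits read as a signed int
    looks_like_text = all(32 <= b < 127 for b in raw)  # printable ASCII only?
    return {
        "hex": f"0x{size:08X}",
        "signed": signed,
        "bytes": raw,
        "looks_like_text": looks_like_text,
    }


# The two values come from the error messages quoted in this message.
for size in (4294967293, 1918988375):
    print(size, describe_alloc_size(size))
```

Read this way, 4294967293 is 0xFFFFFFFD (that is, -3 as a signed integer), and 1918988375 is 0x72617057, whose four bytes are all printable ASCII. That hints that a length word may have been overwritten with other data, but it is only a hint; inspecting the actual page contents is needed to confirm anything.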

Thanks, Tom, for your great support. I'm just afraid that I wasted your time... Anyway, I'll write a FAQ that provides some information about the kind of problem we have faced.

Regards,


--
Daniel

---------------------------(end of broadcast)---------------------------
TIP 5: don't forget to increase your free space map settings

Re: [SQL] PostgreSQL server terminated by signal 11

on 28.07.2006 18:31:44 by Tom Lane

"Daniel Caune" writes:
> The statement "copy gslog_event to stdout;" leads to "ERROR: invalid memory alloc request size 4294967293" after awhile.
> ...
> I did other tests on some other tables that contain less data but that seem also corrupted:

This is a bit scary as it suggests a systemic problem. You should
definitely try to find out exactly what the corruption looks like.
It's usually not hard to home in on where the first corrupted row is
--- you do
SELECT ctid, * FROM tab LIMIT n;
and determine the largest value of n that won't trigger a failure.
The corrupted region is then just after the last ctid you see.
You can look at those blocks with "pg_filedump -i -f" and see if
anything pops out. Check the PG archives for previous discussions
of dealing with corrupted data.
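
The search for the largest working n described above can be done by bisection rather than by hand. The following is a minimal Python sketch under stated assumptions: probe is a hypothetical callable that runs "SELECT ctid, * FROM tab LIMIT n" and returns True when the query completes without error, upper is a known upper bound such as an approximate row count, and the scan order is assumed to be stable so that every n up to some point succeeds and every larger n fails.

```python
from typing import Callable, Tuple


def largest_safe_limit(probe: Callable[[int], bool], upper: int) -> int:
    """Return the largest n in [0, upper] for which probe(n) is True.

    Assumes probe is monotone: True up to some n, False for all larger n.
    """
    lo, hi = 0, upper  # LIMIT 0 reads no rows, so lo = 0 is taken as safe
    while lo < hi:
        mid = (lo + hi + 1) // 2  # round up so the loop always makes progress
        if probe(mid):
            lo = mid              # mid rows can be read; look further
        else:
            hi = mid - 1          # failure at mid; the boundary is before it
    return lo


def parse_ctid(text: str) -> Tuple[int, int]:
    """Split a ctid such as '(4711,12)' into (block number, item number)."""
    block, item = text.strip("()").split(",")
    return int(block), int(item)
```

In practice probe would have to open a fresh connection for each attempt, because a failing query may take the backend down with it. The block number from the last ctid returned at the safe limit then indicates where to start looking with pg_filedump.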

regards, tom lane
