Server Crash

on 22.04.2008 16:14:10 by Nick Hajek

All,
We experienced a crash of a PostgreSQL server which, from the log, appears
to have begun with this entry:

Log: background writer process (PID 3457) was terminated by signal 9

After the db was restarted, it operated apparently normally for about 15
minutes and then crashed again, with the log recording the same message
at the beginning of the second event. After that crash, I rebooted the
server and it has run normally since that time, although that's been
less than one hour.

System Details - PostgreSQL 8.2.4, SUSE 10.1 Linux (2.6.16), HP DL380
w/ RAID drives.

Anyone have any thoughts?

thanks,

Nick


Re: Server Crash

on 22.04.2008 17:13:09 by Scott Marlowe

On Tue, Apr 22, 2008 at 8:14 AM, Hajek, Nick wrote:
>
>
> All,
> We experienced a crash of a PostgreSQL server which, from the log, appears to
> have begun with this entry:
>
> Log: background writer process (PID 3457) was terminated by signal 9

Kill -9 is the "shoot it in the head" signal. It is not generated by
PostgreSQL in normal operation. It can be generated by "pg_ctl -m
immediate stop". At least I think that's the signal it sends.

Anyway, the most common cause of kill -9s randomly showing up on Linux
is the OOM killer.

It's quite possible you're running your machine out of memory / swap
somehow and Linux is killing the biggest, fattest process it can find,
which is pgsql.

you might wanna run vmstat 1 to see what's happening during these times.
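
For a scripted check alongside vmstat, here is a minimal Python sketch that reads the memory and swap counters Linux exposes in /proc/meminfo. The helper names (parse_meminfo, swap_free_fraction, read_meminfo) are made up for illustration; the field names SwapTotal and SwapFree are the ones the kernel reports. It only takes a snapshot, so it would have to be run repeatedly (for example from cron) to catch swap draining over time.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a {field: kilobytes} dict."""
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def swap_free_fraction(meminfo):
    """Return free swap as a fraction of total, or None if no swap is configured."""
    total = meminfo.get("SwapTotal", 0)
    if total == 0:
        return None
    return meminfo.get("SwapFree", 0) / total


def read_meminfo(path="/proc/meminfo"):
    """Read and parse the live counters (Linux only)."""
    with open(path) as handle:
        return parse_meminfo(handle.read())
```

Calling swap_free_fraction(read_meminfo()) on the server gives a value between 0.0 and 1.0; a value at or near 0.0 means swap is exhausted and the OOM killer is likely to follow.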

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: Server Crash

on 22.04.2008 17:17:03 by Ray Stell

On Tue, Apr 22, 2008 at 09:13:09AM -0600, Scott Marlowe wrote:
> On Tue, Apr 22, 2008 at 8:14 AM, Hajek, Nick wrote:
> It's quite possible you're running your machine out of memory / swap
> somehow and linux is killing the biggest, fattest process it can find,
> which is pgsql.

syslog would have something to say about that, also.
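
A rough Python sketch of that syslog check, under two assumptions: the log lives at /var/log/messages (the usual place on SUSE, but check your syslog configuration), and the kernel's OOM messages contain "out of memory" or "oom-killer". The exact wording varies between kernel versions, so the pattern is deliberately loose. The names find_oom_lines and scan_syslog are hypothetical.

```python
import re

# Loose pattern: kernel wording for OOM events differs between versions.
OOM_PATTERN = re.compile(r"out of memory|oom[-_ ]?killer", re.IGNORECASE)


def find_oom_lines(lines):
    """Return the lines (without trailing newline) that look like OOM killer activity."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


def scan_syslog(path="/var/log/messages"):
    """Scan a syslog file; the default path is an assumption, adjust as needed."""
    with open(path, errors="replace") as handle:
        return find_oom_lines(handle)
```

Printing the result of scan_syslog() and comparing the timestamps with the PostgreSQL log entry should show whether the kernel killed the process.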

Re: Server Crash

on 22.04.2008 17:39:51 by Nick Hajek

> -----Original Message-----
> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
> Sent: Tuesday, April 22, 2008 10:13 AM
> To: Hajek, Nick
> Cc: pgsql-admin@postgresql.org
> Subject: Re: [ADMIN] Server Crash
>
> On Tue, Apr 22, 2008 at 8:14 AM, Hajek, Nick
> wrote:
> >
> >
> > All,
> > We experienced a crash of a PostgreSQL server which, from the log,
> > appears to have begun with this entry:
> >
> > Log: background writer process (PID 3457) was terminated
> by signal 9
>
> Kill -9 is the "shoot it in the head" signal. It is not
> generated by postgresql in normal operation. It can be
> generated by "pg_ctl -m immediate stop" . At least I think
> that's what signal it sends.
>
> Anyway, the most common cause of kill -9s randomly showing up
> in linux is the OOM killer.
>
> It's quite possible you're running your machine out of memory
> / swap somehow and linux is killing the biggest, fattest
> process it can find, which is pgsql.
>
> you might wanna run vmstat 1 to see what's happening during
> these times.
>

Bingo. I checked the syslog and found the OOM killer and indications
that the free swap space was zero. Now I just need to find what's
eating memory. Thanks for the help.

Re: Server Crash

on 22.04.2008 18:06:58 by Tom Lane

> From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>> Kill -9 is the "shoot it in the head" signal. It is not
>> generated by postgresql in normal operation. It can be
>> generated by "pg_ctl -m immediate stop" . At least I think
>> that's what signal it sends.

Just for the archives: Postgres never generates kill -9 at all.
(Immediate stop uses SIGQUIT, instead.) When you see that in
the log, you can be sure it was a manual action or the OOM killer.

regards, tom lane
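
To map the number in a log line like the one above back to a signal name, a small Python sketch can help. It assumes the log wording "terminated by signal N" seen in this thread and uses the standard signal module; signal_name_from_log is a made-up helper name. On Linux, 9 is SIGKILL and 3 is SIGQUIT.

```python
import re
import signal


def signal_name_from_log(line):
    """Extract the signal number from a 'terminated by signal N' log line
    and return its symbolic name, or None if it cannot be determined."""
    match = re.search(r"terminated by signal (\d+)", line)
    if match is None:
        return None
    try:
        return signal.Signals(int(match.group(1))).name
    except ValueError:
        return None  # number is not a known signal on this platform
```

Fed the entry from the original post, this returns "SIGKILL", which per the above points to a manual kill -9 or the OOM killer rather than anything Postgres did itself.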

Re: Server Crash

on 22.04.2008 18:21:41 by Scott Marlowe

On Tue, Apr 22, 2008 at 10:06 AM, Tom Lane wrote:
> > From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
>
> >> Kill -9 is the "shoot it in the head" signal. It is not
> >> generated by postgresql in normal operation. It can be
> >> generated by "pg_ctl -m immediate stop" . At least I think
> >> that's what signal it sends.
>
> Just for the archives: Postgres never generates kill -9 at all.
> (Immediate stop uses SIGQUIT, instead.) When you see that in
> the log, you can be sure it was a manual action or the OOM killer.

Thanks. Just wondering, what's the difference in behavior, from
pgsql's perspective, between sigquit and sigkill? Is sigkill more
dangerous than sigquit?

Re: Server Crash

on 22.04.2008 18:32:14 by Tom Lane

"Scott Marlowe" writes:
> Thanks. Just wondering, what's the difference in behavior, from
> pgsql's perspective, between sigquit and sigkill? Is sigkill more
> dangerous than sigquit?

Yes it is, because sigkill can't be trapped --- it causes instant
process death with no chance to clean up. Not that we have backends
do a lot of cleanup after sigquit either, but at least the option
exists. The real difference is in the postmaster: kill -9 on the
postmaster is a seriously bad idea, because it gets no chance to shut
down its children.

regards, tom lane
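
The "can't be trapped" point is easy to see from any POSIX process, not just Postgres. A minimal Python sketch, assuming a Unix-like system and a call from the main thread; can_install_handler is a hypothetical helper, and this only illustrates the operating system rule, not how the postmaster itself handles signals:

```python
import signal


def can_install_handler(signum):
    """Return True if the OS lets this process install a handler for signum."""
    try:
        previous = signal.signal(signum, lambda received, frame: None)
    except (OSError, ValueError):
        # The kernel refuses handlers for SIGKILL (and SIGSTOP).
        return False
    # Put back whatever was there before so the check has no side effects.
    signal.signal(signum, previous if previous is not None else signal.SIG_DFL)
    return True
```

can_install_handler(signal.SIGQUIT) returns True, so a process at least has the option of cleaning up, while can_install_handler(signal.SIGKILL) returns False because the kernel never delivers that signal to user code.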