ERROR: XX001 (Critical and Urgent)

on 07.05.2010 10:03:08 by Siddharth Shah

Hello All,

I am getting *ERROR: XX001: could not read block 17 of relation
base/16386/2619: read only 0 of 8192 bytes* while vacuuming the database.
Both manual vacuum and autovacuum *constantly* use high CPU, cannot skip
the corrupted table, and dump this message at regular intervals.
*fsync is off*. From strace, I found that the semop call was in an
infinite loop.
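
The numbers in this error can be checked against the file on disk. The following is a minimal sketch, assuming the 8192-byte block size shown in the message and that the relation path is relative to the data directory. The helper names `block_offset` and `file_has_block` are made up for illustration and are not part of PostgreSQL.

```shell
#!/bin/sh
# Sketch: check whether a relation file is long enough to contain a block.
# BLOCK_SIZE is taken from the error message ("of 8192 bytes").
BLOCK_SIZE=8192

# Print the byte offset at which block number $1 starts.
block_offset() {
    echo $(( $1 * BLOCK_SIZE ))
}

# Succeed (exit 0) if file $1 is large enough to hold all of block $2.
file_has_block() {
    [ -r "$1" ] || return 2
    size=$(wc -c < "$1")
    [ "$size" -ge $(( ($2 + 1) * BLOCK_SIZE )) ]
}

# Example (hypothetical data directory location):
#   file_has_block "$DATADIR/base/16386/2619" 17 || echo "block 17 is missing"
```

Block 17 starts at byte 139264, so "read only 0 of 8192 bytes" is consistent with a file that ends at or before that offset.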

I have tried turning fsync on. Now the manual vacuum process still uses
high CPU, strace shows no output (possibly a deadlock), and there is no
error or warning from the postgres daemon.

Postgres Version : 8.4.3 (Migrated data from 8.4.1)

What could the issue be? Is this a consequence of database table corruption?
Can turning fsync on prevent such corruption scenarios?

Thanks,
Siddharth


Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 15:35:58 by Kevin Grittner

Siddharth Shah wrote:

> * fsync is off*

If you are running the database with fsync off and there is any sort
of unusual termination, your database will probably be corrupted. I
recommend restoring from your last good backup. If you don't have
one, recovery is going to be painful; I recommend contracting with
one of the many companies which offer PostgreSQL support. (I'm not
affiliated with any of them.)

> I have tried with making fsync on

That may help prevent further corruption, but will do nothing to
help recover from the damage already done.
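
To confirm which fsync value a configuration file actually sets, something like the sketch below may help. It assumes the setting is written as `fsync = value`, that the last uncommented entry wins, and that the default is on. `effective_fsync` is a hypothetical helper name, and the parsing is deliberately simplified (it ignores include files and abbreviated boolean spellings).

```shell
#!/bin/sh
# Sketch: report the effective fsync setting in a postgresql.conf-style file.
# Assumption: the last uncommented "fsync = ..." line wins; default is "on".
effective_fsync() {
    # Extract the value of every uncommented fsync line, keep the last one.
    value=$(sed -n 's/^[[:space:]]*fsync[[:space:]]*=[[:space:]]*\([^#[:space:]]*\).*$/\1/p' "$1" | tail -n 1)
    # Strip optional quotes and normalise case before comparing.
    case $(printf '%s' "$value" | tr -d "'" | tr '[:upper:]' '[:lower:]') in
        off|false|no|0) echo off ;;
        *)              echo on ;;
    esac
}

# Example (hypothetical path):
#   effective_fsync "$DATADIR/postgresql.conf"
```

On a running server, `SHOW fsync;` in a SQL session reports the value in effect and is the more reliable check.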

> Postgres Version : 8.4.3 (Migrated data from 8.4.1)

What do you mean by that? You installed 8.4.3 and reindexed hash
indexes?

> What can be issue ? Is it issue coming after database table
> corruption

Yes.

> Can fsync on can prevent such (corruption) scenarios ?

Yes.

-Kevin

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin

Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 15:51:29 by Siddharth Shah

Thanks Kevin.

Yes, I installed 8.4.3 and then found that DDL and DML statements
were failing to execute in some distributions,
so that is why I decided to reindex.

How can I verify that this is database corruption?

xdb=# \dt;
ERROR: index "pg_class_relname_nsp_index" contains unexpected zero page
at block 33
HINT: Please REINDEX it.
xdb=# analyse verbose pg_class_relname_nsp_index;
ANALYZE
xdb=# \dt;
ERROR: index "pg_class_relname_nsp_index" contains unexpected zero page
at block 33
HINT: Please REINDEX it.
xdb=# reindex index pg_class_relname_nsp_index;

Now the reindex is taking high CPU and postgres appears to be stuck.


I don't have any backup available. Is there any way to fix this?



Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 16:19:23 by Kevin Grittner

[rearranged to put the most critical point first]

Siddharth Shah wrote:

> I don't have any backup available, Is there any way to fix this ?

I *strongly* recommend that you shut down the database and take a
file copy of the whole data tree (everything under what -D points to
on the server startup) which you should keep until long after you
think everything is working OK again. Before you do anything else.
You are at risk of losing everything in the database, and one
misstep could put you over the edge. If this is a production
database, tell the users that it is down until further notice.
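
The file copy described above could be taken along the lines of the sketch below. It assumes the server has already been stopped (for example with `pg_ctl -D "$DATADIR" stop`), uses the presence of `postmaster.pid` as a rough signal that it has not, and does not cover tablespaces stored outside the data directory. `backup_datadir` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch: archive a stopped cluster's data directory before any repair attempt.
# Usage: backup_datadir /path/to/datadir /path/to/backup.tar
backup_datadir() {
    datadir=$1
    archive=$2
    # A running postmaster keeps postmaster.pid in the data directory;
    # refuse to copy while the file exists.
    if [ -e "$datadir/postmaster.pid" ]; then
        echo "postmaster.pid found: stop the server first" >&2
        return 1
    fi
    # Archive the whole tree, preserving permissions.
    tar -cpf "$archive" -C "$(dirname "$datadir")" "$(basename "$datadir")" || return 1
    # Confirm the archive can be read end to end.
    tar -tf "$archive" > /dev/null
}
```

Writing the archive to a different disk than the data directory keeps the copy safe if the original volume is itself failing.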

> What can be the method to verify that it's a database corruption ?

> ERROR: index "pg_class_relname_nsp_index" contains unexpected
> zero page at block 33

Getting an error like that indicates database corruption.

> HINT: Please REINDEX it.
> xdb=# reindex index pg_class_relname_nsp_index;
>
> Now INDEXing taking High CPU and postgres baffled.

That is an index on the table which describes all your tables and
indexes. It normally doesn't take a long time to reindex. You
should consider doing your recovery in single-user mode (*AFTER* you
make that copy):

http://www.postgresql.org/docs/8.4/interactive/app-postgres.html

After trying reindex in that context, please post again.
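
A single-user reindex attempt might look like the following sketch. It assumes the server is stopped and the data directory has been copied first. `reindex_single_user` and the `POSTGRES_BIN` override are hypothetical names introduced here; the `--single`, `-P` (ignore system indexes), and `-D` options are the ones documented on the page linked above.

```shell
#!/bin/sh
# Sketch: run one REINDEX in single-user mode with system indexes ignored.
# POSTGRES_BIN is a hypothetical override for the server binary location.
# Usage: reindex_single_user DATADIR DBNAME INDEXNAME
reindex_single_user() {
    datadir=$1
    dbname=$2
    index=$3
    # The single-user backend reads SQL commands from standard input.
    printf 'REINDEX INDEX %s;\n' "$index" |
        "${POSTGRES_BIN:-postgres}" --single -P -D "$datadir" "$dbname"
}

# Example, using the names from this thread:
#   reindex_single_user "$DATADIR" xdb pg_class_relname_nsp_index
```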

-Kevin


Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 17:00:38 by Siddharth Shah

Kevin Grittner wrote:
> [rearranged to put the most critical point first]
>
> Siddharth Shah wrote:
>
>
>> I don't have any backup available, Is there any way to fix this ?
>>
>
> I *strongly* recommend that you shut down the database and take a
> file copy of the whole data tree (everything under what -D points to
> on the server startup) which you should keep until long after you
> think everything is working OK again. Before you do anything else.
> You are at risk of losing everything in the database, and one
> misstep could put you over the edge. If this is a production
> database, tell the users that it is down until further notice.
>
Yes Kevin, I have taken a backup of DATADIR.
>
>
>> What can be the method to verify that it's a database corruption ?
>>
>
>
>> ERROR: index "pg_class_relname_nsp_index" contains unexpected
>> zero page at block 33
>>
>
> Getting an error like that indicates database corruption.
>
>
>> HINT: Please REINDEX it.
>> xdb=# reindex index pg_class_relname_nsp_index;
>>
>> Now INDEXing taking High CPU and postgres baffled.
>>
>
> That is an index on the table which describes all your tables and
> indexes. It normally doesn't take a long time to reindex. You
> should consider doing your recovery in single-user mode (*AFTER* you
> make that copy):
>
> http://www.postgresql.org/docs/8.4/interactive/app-postgres.html
>
> After trying reindex in that context, please post again.
>
postgres --single -P -D $DATADIR -p 5433 xdb
Same behavior in single-user mode.
>
> -Kevin
>




Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 17:09:44 by Siddharth Shah

One more point: this has been observed twice during product firmware
updates, which upgrade Postgres from 8.4.1 to 8.4.3.
Abrupt shutdowns have never led to this type of corruption, even though
fsync is always off.

Thanks,
Siddharth



Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 17:21:46 by Kevin Grittner

Siddharth Shah wrote:

>>> xdb=# reindex index pg_class_relname_nsp_index;
>>>
>>> Now INDEXing taking High CPU and postgres baffled.

>> consider doing your recovery in single-user mode

> postgres --single -P -D $DATADIR -p 5433 xdb
> Same behavior in single mode.

How long did you leave it running? Did you get any messages? Is
there anything in the log? What do CPU usage and disk usage look
like during the attempt?

-Kevin


Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 17:42:15 by Siddharth Shah

Kevin Grittner wrote:
> Siddharth Shah wrote:
>
>
>>>> xdb=# reindex index pg_class_relname_nsp_index;
>>>>
>>>> Now INDEXing taking High CPU and postgres baffled.
>>>>
>
>
>>> consider doing your recovery in single-user mode
>>>
>
>
>> postgres --single -P -D $DATADIR -p 5433 xdb
>> Same behavior in single mode.
>>
>
> How long did you leave it running? Did you get any messages? Is
> there anything in the log? What do CPU usage and disk usage look
> like during the attempt?
>
> -Kevin
>

Kevin, it starts normally, and I have successfully retrieved data from a
few tables. But I am not able to run \dS, \dT, or \dt. As you said, this is
the index for the Postgres tables and indexes.
Now when I run reindex on pg_class_relname_nsp_index, it takes 99% CPU:

  PID  PPID USER     STAT   VSZ %MEM %CPU COMMAND
13419 13418 nobody   R    39172   8%  99% postgres --single -P -D /var/db -p 5433 xdb

It has been running for 10 minutes, and there is still no output or log entry.



Re: ERROR: XX001 (Critical and Urgent)

on 07.05.2010 17:54:25 by Tom Lane

Siddharth Shah writes:
> PID PPID USER STAT VSZ %MEM %CPU COMMAND
> 13419 13418 nobody R 39172 8% 99% postgres --single -P -D
> /var/db -p 5433 xdb
> It's been running from 10 minutes still there is no output or logs.

What does "strace" show that process is doing?

regards, tom lane
