Computer suddenly failed

on 28.02.2006 12:05:15 by Peter.Jevos

Hi all,

I'd like to ask about a strange problem. I hope I chose the correct
mailing list.
I have 2 IDE disks in RAID 1 with Reiserfs.
I noticed these messages in the log:

hde: dma_timer_expiry: dma status == 0x20
hde: timeout waiting for DMA
PDC202XX: Primary channel reset.
hde: timeout waiting for DMA
hde: (__ide_dma_test_irq) called while not waiting
hde: status timeout: status=0xd0 { Busy }
PDC202XX: Primary channel reset.
hde: drive not ready for command
ide2: reset: success
hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
hde: drive not ready for command
hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }

DMA on hde had been turned off, so I turned it on again. Then I tried to
back up the files on hde, but when I ran tar the computer stopped
responding, even to sysrq, and nothing was written to the log. I had to
do a hard restart. This has happened 4 times, each time I tried to do
something with files on hde.
Now I'm afraid to do anything on that machine; unfortunately it is a
production server.
What's going on?
Thanks a lot for any answers.

BR
Pet
-
To unsubscribe from this list: send the line "unsubscribe linux-admin" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Computer suddenly failed

on 28.02.2006 12:26:22 by urgrue

Sounds like your hard drive is dead or dying. Replace hde with a new disk.


Re: Computer suddenly failed

on 28.02.2006 12:51:02 by foo

What hardware RAID controller is that?
The output of lspci and dmesg would also help.

--Adrian


RE: Computer suddenly failed

on 28.02.2006 13:29:03 by Peter.Jevos

# root : lspci
00:00.0 Host bridge: Intel Corp. 82810E DC-133 GMCH [Graphics Memory Controller Hub] (rev 03)
00:01.0 VGA compatible controller: Intel Corp. 82810E DC-133 CGC [Chipset Graphics Controller] (rev 03)
00:1e.0 PCI bridge: Intel Corp. 82801AA PCI Bridge (rev 02)
00:1f.0 ISA bridge: Intel Corp. 82801AA ISA Bridge (LPC) (rev 02)
00:1f.1 IDE interface: Intel Corp. 82801AA IDE (rev 02)
00:1f.2 USB Controller: Intel Corp. 82801AA USB (rev 02)
00:1f.3 SMBus: Intel Corp. 82801AA SMBus (rev 02)
00:1f.5 Multimedia audio controller: Intel Corp. 82801AA AC'97 Audio (rev 02)
01:0c.0 Ethernet controller: 3Com Corporation 3c905C-TX/TX-M [Tornado] (rev 78)

Peter Jevos peter.jevos@oriflame-sw.com
Oriflame Software , s.r.o - Oriflame IT services
Na Pankraci 30, Praha 4, Czech rep.
Tel. +420 225 994 456, Fax +420225994412

Re: Computer suddenly failed

on 28.02.2006 21:57:56 by Glynn Clements

Adrian C. wrote:

> What hardware RAID controller is that?

He didn't say it was hardware RAID, just "in RAID 1".

--
Glynn Clements

Re: Computer suddenly failed

on 28.02.2006 22:19:02 by Glynn Clements

Jevos, Peter wrote:

> I'd like to ask about a strange problem. I hope I chose the correct
> mailing list.
> I have 2 IDE disks in RAID 1 with Reiserfs.
> I noticed these messages in the log:
>
> hde: dma_timer_expiry: dma status == 0x20
> hde: timeout waiting for DMA
> PDC202XX: Primary channel reset.
> hde: timeout waiting for DMA
> hde: (__ide_dma_test_irq) called while not waiting
> hde: status timeout: status=0xd0 { Busy }
> PDC202XX: Primary channel reset.
> hde: drive not ready for command
> ide2: reset: success
> hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }
> hde: drive not ready for command
> hde: status error: status=0x58 { DriveReady SeekComplete DataRequest }

Your drive has died.
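
For reference, the status bytes in those messages can be decoded by hand.
The following is a minimal sketch, assuming the conventional ATA status
register bit layout and using the flag names that the log itself prints.
decode_ide_status is a hypothetical helper written for illustration, not
part of any kernel tool.

```python
# Sketch: decode an IDE/ATA status byte into the flag names seen in the log.
# Assumption: standard ATA status register layout (bit 7 = busy, bit 0 = error).

BUSY = 0x80

# (mask, name) pairs, highest bit first, named as in the kernel messages.
STATUS_FLAGS = [
    (0x40, "DriveReady"),
    (0x20, "DeviceFault"),
    (0x10, "SeekComplete"),
    (0x08, "DataRequest"),
    (0x04, "CorrectedError"),
    (0x02, "Index"),
    (0x01, "Error"),
]


def decode_ide_status(status):
    """Return the list of flag names set in an 8-bit status value."""
    # While the busy bit is set the other bits are not meaningful,
    # so only "Busy" is reported, matching the 0xd0 line in the log.
    if status & BUSY:
        return ["Busy"]
    return [name for mask, name in STATUS_FLAGS if status & mask]


for value in (0xD0, 0x58):
    print(f"status=0x{value:02x} {{ {' '.join(decode_ide_status(value))} }}")
```

So 0x58 means the drive reports itself ready and still has a data transfer
pending, while the driver no longer expects one. The decoding alone does
not say whether the drive, the cable, or the controller is at fault.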

> DMA on hde had been turned off, so I turned it on again. Then I tried to
> back up the files on hde, but when I ran tar the computer stopped
> responding, even to sysrq, and nothing was written to the log. I had to
> do a hard restart. This has happened 4 times, each time I tried to do
> something with files on hde.
> Now I'm afraid to do anything on that machine; unfortunately it is a
> production server.

Once you have replaced the drive:

1. Ensure that the drives are being cooled. Modern (i.e. large) hard
drives tend to run quite hot. In the absence of sufficient airflow,
they can easily exceed their maximum operating temperature (typically
55C). This is more of an issue with larger drives, and with multiple
drives in adjacent drive bays. In my experience, Maxtor drives tend to
run hotter than similar drives from other vendors.

2. Run a temperature-monitoring utility such as hddtemp, and ensure
that it will notify support staff if the temperature gets too high. If
the location isn't staffed 24/7, ensure that it will shut down the
system in the event that the temperature exceeds the drives' operating
limit.

[I know of a case where a cooling fan in a file server failed
overnight, and the staff turned up the following morning to find that
all 4 drives had failed after reaching temperatures of up to 63C.]
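
The policy in point 2 could be sketched roughly as follows. This is an
illustration under stated assumptions, not a finished monitoring script:
the 55C limit is the typical figure mentioned above (check the drive's
datasheet), the 5-degree warning margin is an arbitrary choice, and
classify_temperature and read_temperature are hypothetical names. The
reader function assumes hddtemp is installed and that "hddtemp -n DEVICE"
prints only the number.

```python
import subprocess

# Assumption: typical maximum operating temperature; confirm per drive model.
MAX_OPERATING_TEMP_C = 55


def classify_temperature(temp_c, limit_c=MAX_OPERATING_TEMP_C, warn_margin_c=5):
    """Map a drive temperature to an action: 'ok', 'notify' or 'shutdown'."""
    if temp_c > limit_c:
        # Limit exceeded: an unstaffed site should power the system down.
        return "shutdown"
    if temp_c >= limit_c - warn_margin_c:
        # Getting close to the limit: alert support staff.
        return "notify"
    return "ok"


def read_temperature(device):
    """Read the temperature in degrees C by running hddtemp.

    Assumes `hddtemp -n DEVICE` prints a bare integer; raises if the
    command fails or prints anything else.
    """
    result = subprocess.run(
        ["hddtemp", "-n", device],
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())
```

A cron job could call classify_temperature(read_temperature("/dev/hde"))
every few minutes, send mail on "notify", and run the system's shutdown
command on "shutdown".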

--
Glynn Clements