mdadm and TLER (Time Limited Error Recovery)

on 08.09.2009 02:35:14 by Tim Rutter

There seem to be many people asking this question and getting very vague
answers, so I'm looking for some clarification.

Is the TLER on drives like Western Digital's WD2002FYPS a problem or
benefit for mdadm(RAID5/RAID6)?
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: mdadm and TLER (Time Limited Error Recovery)

on 08.09.2009 16:17:58 by Mario Holbe

Tim Rutter wrote:
> Is the TLER on drives like Western Digital's WD2002FYPS a problem or
> benefit for mdadm(RAID5/RAID6)?

Neither, with a very slight lean toward "problem", IMHO.
A (not so little) while ago, when md did not automatically correct
read errors, the lean toward "problem" was a bit stronger ;) This could
be the reason for some of the vague answers you mentioned.

Anyways, clarification...
The only reason for TLER (Time Limited Error Recovery) is to behave
"friendly" toward RAID controllers that time out disks.
In fact, md does not time out disks the way many hardware RAID controllers do.
So, from md's point of view, TLER is useless, i.e. it has no benefit.

On the other hand, TLER leads to the disk not trying as hard to recover
from (read-)errors (i.e. get the data back) as it could - usually,
there's just no need to do it in a RAID, because another component (the
RAID controller) has a far easier way to get the data back (i.e. read it
from the other disk(s)).
Of course, there are unusual cases, such as a degraded RAID or two disks
being unable to read the same data. In these rare cases it would be
nice if the disk did as much as it could to get the data back instead
of relying on the RAID controller. This is why I think there is a very
slight lean toward "problem".


regards
Mario
--
File names are infinite in length where infinity is set to 255 characters.
-- Peter Collinson, "The Unix File System"

Re: mdadm and TLER (Time Limited Error Recovery)

on 08.09.2009 20:48:37 by Iustin Pop

On Tue, Sep 08, 2009 at 04:17:58PM +0200, Mario 'BitKoenig' Holbe wrote:
> Tim Rutter wrote:
> > Is the TLER on drives like Western Digital's WD2002FYPS a problem or
> > benefit for mdadm(RAID5/RAID6)?
>
> Neither nor with a very very small drift to "problem", IMHO.
> A (not so little) while ago when md did not automatically correct
> read-errors, the drift to "problem" was a bit less small ;) This could
> be the reason for some of the vague answers you mentioned.
>
> Anyways, clarification...
> The only reason for TLER (Time Limited Error Recovery) is to behave
> "friendly" toward RAID controllers that timeout disks.
> In fact, md does not timeout disks as many Hardware RAID controllers do.
> So, from md's point of view, TLER is useless, i.e. it has no benefit.

I'm sorry, but I disagree here. *Especially* because md is used over
normal SATA controllers most of the time, TLER is beneficial: the
drive doesn't go catatonic for minutes at a time trying to recover a bad
sector, which would (because md doesn't time out disks) cause md to hang
the whole device. TLER allows md to see the error quickly and either
attempt to rewrite the bad sector (on a read error) or retry/fail the
disk (on a write error).

Just my understanding of the md stack.
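
As a related check that is not taken from this thread: whether or not md
itself times out disks, the Linux SCSI layer (which also handles SATA disks)
applies its own per-command timeout, normally 30 seconds by default and
exposed in sysfs. If a drive's internal recovery outlasts that timeout, the
kernel gives up on the command first. Below is a minimal sketch for listing
the value per disk, assuming the usual /sys/block/<dev>/device/timeout
layout; the function name list_timeouts and its optional root argument are
hypothetical, chosen only for illustration.

```shell
#!/bin/sh
# List the kernel's per-command timeout (in seconds) for each sd* disk.
# The sysfs root is a parameter so the function can be pointed at a test tree.
list_timeouts() {
    root="${1:-/sys/block}"
    for f in "$root"/sd*/device/timeout; do
        # If the glob matched nothing, the literal pattern is left over; skip it.
        [ -r "$f" ] || continue
        dev="${f#"$root"/}"   # strip the root prefix: sdX/device/timeout
        dev="${dev%%/*}"      # keep only the device name: sdX
        printf '%s %s\n' "$dev" "$(cat "$f")"
    done
}

# Read-only: prints one "device seconds" pair per line, or nothing if no
# matching disks are present.
list_timeouts
```

The script only reads sysfs, so it is safe to run as an ordinary user.
Comparing these values with a drive's own recovery limit shows which side
gives up first.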

regards,
iustin

Re: mdadm and TLER (Time Limited Error Recovery)

on 08.09.2009 21:45:14 by Mario Holbe

Iustin Pop wrote:
> I'm sorry but I disagree here. *Especially* because md is used over
> normal SATA controllers most of the time, TLER is beneficial because the
> drive doesn't go catatonic for minutes at a time trying to recover a bad
> sector, which would (because md doesn't timeout disks) cause md to hang
> the whole device. TLER will allow md to see the error quickly and

Yes, that's right. So, as in most cases, it depends on the user's needs:
whether they prefer quicker recovery from errors or harder attempts to
correct them :)


regards
Mario
--
Good, Fast, Cheap: Pick any two (you can't have all three).
-- RFC 1925, 7a

Re: mdadm and TLER (Time Limited Error Recovery)

on 09.09.2009 03:33:56 by Maurice Hilarius

Iustin Pop wrote:
> ..
>> Anyways, clarification...
>> The only reason for TLER (Time Limited Error Recovery) is to behave
>> "friendly" toward RAID controllers that timeout disks.
>> In fact, md does not timeout disks as many Hardware RAID controllers do.
>> So, from md's point of view, TLER is useless, i.e. it has no benefit.
>>
>
> I'm sorry but I disagree here. *Especially* because md is used over
> normal SATA controllers most of the time, TLER is beneficial because the
> drive doesn't go catatonic for minutes at a time trying to recover a bad
> sector, which would (because md doesn't timeout disks) cause md to hang
> the whole device. TLER will allow md to see the error quickly and
> attempt to rewrite (read) or retry/fail the disk (write) for the bad
> sector.
>
> Just my understanding of the md stack.
>
> regards,
> iustin
>
>
I agree.
Before WD implemented this, we quite often saw cases where a
perfectly good drive got "kicked out" of a RAID as frequently as, or
even more often than, on a hardware RAID.
TLER management seems to have eliminated most of these cases.



--
Regards, Maurice

RE: mdadm and TLER (Time Limited Error Recovery)

on 09.09.2009 10:21:16 by Simon Jackson

This sounds like an interesting proposition for the RAID 1 setups that I am using. In a couple of cases I have seen unresponsive drives, retrying on a bad block, seemingly lock up my system, or at least slow its response significantly.

In my case I am using Seagate and Hitachi drives. A look at Wikipedia indicates that on Hitachi there is something called "Command Completion Time Limit" and on Seagate "Error Recovery Control".

Can anyone please tell me how I would go about setting timeout values on these types of drive? Are there utility programs to do this, or a Linux command?

Thanks Simon.

Re: mdadm and TLER (Time Limited Error Recovery)

on 09.09.2009 11:00:38 by majedb

If there's no specific utility from the manufacturer for Linux, you
might want to take a look at "sdparm".

--
Majed B.

Re: mdadm and TLER (Time Limited Error Recovery)

on 09.09.2009 13:04:37 by Mario Holbe

Simon Jackson wrote:
> In my case I am using Seagate and Hitachi drives.
> Please can anyone tell me how I would go about setting timeout values on these types of drive. Are there utility programs to do this or a Linux

Well, my Seagates have an RTL (Recovery time limit (ms)) field in the rw
(Read write error recovery) mode page.

You could try something like `sdparm -s RTL=7000 /dev/sdX' to set it to
7 seconds. I don't know if it works, I didn't test it, so use it at your
own risk! Make a backup first, it could blow up your disk or even the
universe :) Tell us if it worked :)
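
Since RTL is given in milliseconds, the conversion from seconds is a simple
multiplication by 1000. The sketch below only prints the commands rather
than running them, assuming sdparm's --get and --set options (the long
forms of -g and -s); the helper name rtl_command is hypothetical, and
/dev/sdX is a placeholder. Whether a particular drive accepts the change is
not guaranteed.

```shell
#!/bin/sh
# Build (but do not run) an sdparm command that sets the RTL field.
# RTL is in milliseconds, so the seconds value is multiplied by 1000.
rtl_command() {
    seconds="$1"
    device="$2"
    ms=$((seconds * 1000))
    printf 'sdparm --set=RTL=%s %s\n' "$ms" "$device"
}

# Reading the current value first is harmless; --get does not modify the drive.
printf 'sdparm --get=RTL %s\n' /dev/sdX

# Print the command for a 7-second limit.
rtl_command 7 /dev/sdX
```

Printing the command instead of executing it leaves the decision to actually
write to the drive with the administrator, in keeping with the "at your own
risk" warning above.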


regards
Mario
--
() Ascii Ribbon Campaign
/\ Support plain text e-mail

RE: mdadm and TLER (Time Limited Error Recovery)

on 10.09.2009 11:26:35 by Simon Jackson

Hmmm. I do not have sdparm on my system.

Linux 2.6.26-1-amd64 #1 SMP Sat Jan 10 17:57:00 UTC 2009 x86_64 GNU/Linux

Does it provide the same functionality as the hdparm utility?

Re: mdadm and TLER (Time Limited Error Recovery)

on 10.09.2009 11:39:21 by majedb

sdparm is similar to hdparm but meant to deal with SCSI devices
(including SATA drives, which Linux presents through its SCSI layer).

Your distribution's repository should have it; otherwise, download and compile it.

--
Majed B.

Re: mdadm and TLER (Time Limited Error Recovery)

on 10.09.2009 11:46:51 by Robin Hill

On Thu Sep 10, 2009 at 10:26:35AM +0100, Simon Jackson wrote:

> Hmmm. I do not have sdparm on my system.
>
> Linux 2.6.26-1-amd64 #1 SMP Sat Jan 10 17:57:00 UTC 2009 x86_64 GNU/Linux
>
> Is this the same functionality as the hdparm utility?
>
Yes - it was written for SCSI drives rather than IDE drives. With the
different markets these were targeted for, SCSI drives have
(historically) provided a lot more low-level parameters for tweaking.
Newer ATA drives have been incorporating a lot of the same functionality
though, so it'll work (to some extent) with them as well.

The homepage is at http://sg.danny.cz/sg/sdparm.html

HTH,
Robin
--
___
( ' } | Robin Hill |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |


RE: mdadm and TLER (Time Limited Error Recovery)

on 15.09.2009 17:36:00 by Simon Jackson

Did not appear to work.

$ sdparm -s "RTL=7000" /dev/sda
/dev/sda: ATA ST980818SM 3.AA
change_mode_page: failed setting page: Read write error recovery
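
The output above identifies the drive as ATA, and ATA drives may not accept
writes to this SCSI mode page. One alternative that is not mentioned in this
thread: on ATA drives that implement SCT Error Recovery Control, versions of
smartmontools that support it can query and set the limits with
`smartctl -l scterc`, where the read and write times are given in units of
100 ms. The sketch below only prints the commands; the helper name
scterc_command is hypothetical, /dev/sdX is a placeholder, and drive support
is an assumption to verify with the query form first.

```shell
#!/bin/sh
# Build (but do not run) a smartctl command that sets SCT Error Recovery
# Control. smartctl expects both limits in units of 100 ms (deciseconds),
# so the seconds values are multiplied by 10.
scterc_command() {
    read_s="$1"
    write_s="$2"
    device="$3"
    printf 'smartctl -l scterc,%s,%s %s\n' \
        "$((read_s * 10))" "$((write_s * 10))" "$device"
}

# Query form: reports the current limits without changing anything.
printf 'smartctl -l scterc %s\n' /dev/sdX

# 7 seconds for both reads and writes.
# prints: smartctl -l scterc,70,70 /dev/sdX
scterc_command 7 7 /dev/sdX
```

On many drives this setting does not survive a power cycle, so it would
typically need to be reapplied at boot if it turns out to work.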
