raid5 software vs hardware: parity calculations?
on 11.01.2007 23:44:57 by James Ralston
I'm having a discussion with a coworker concerning the cost of md's
raid5 implementation versus hardware raid5 implementations.
Specifically, he states:
> The performance [of raid5 in hardware] is so much better with the
> write-back caching on the card and the offload of the parity, it
> seems to me that the minor increase in work of having to upgrade the
> firmware if there's a buggy one is a highly acceptable trade-off to
> the increased performance. The md driver still commits you to
> longer run queues since IO calls to disk, parity calculator and the
> subsequent kflushd operations are non-interruptible in the CPU. A
> RAID card with write-back cache releases the IO operation virtually
> instantaneously.
It would seem that his comments have merit, as there appears to be
work underway to move stripe operations outside of the spinlock:
http://lwn.net/Articles/184102/
What I'm curious about is this: for real-world situations, how much
does this matter? In other words, how hard do you have to push md
raid5 before doing dedicated hardware raid5 becomes a real win?
James
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: raid5 software vs hardware: parity calculations?
on 12.01.2007 18:39:16 by dean gaudet
On Thu, 11 Jan 2007, James Ralston wrote:
> I'm having a discussion with a coworker concerning the cost of md's
> raid5 implementation versus hardware raid5 implementations.
>
> Specifically, he states:
>
> > The performance [of raid5 in hardware] is so much better with the
> > write-back caching on the card and the offload of the parity, it
> > seems to me that the minor increase in work of having to upgrade the
> > firmware if there's a buggy one is a highly acceptable trade-off to
> > the increased performance. The md driver still commits you to
> > longer run queues since IO calls to disk, parity calculator and the
> > subsequent kflushd operations are non-interruptible in the CPU. A
> > RAID card with write-back cache releases the IO operation virtually
> > instantaneously.
>
> It would seem that his comments have merit, as there appears to be
> work underway to move stripe operations outside of the spinlock:
>
> http://lwn.net/Articles/184102/
>
> What I'm curious about is this: for real-world situations, how much
> does this matter? In other words, how hard do you have to push md
> raid5 before doing dedicated hardware raid5 becomes a real win?
hardware with battery backed write cache is going to beat the software at
small write traffic latency essentially all the time but it's got nothing
to do with the parity computation.
-dean
Re: raid5 software vs hardware: parity calculations?
on 12.01.2007 21:34:43 by James Ralston
On 2007-01-12 at 09:39-08 dean gaudet wrote:
> On Thu, 11 Jan 2007, James Ralston wrote:
>
> > I'm having a discussion with a coworker concerning the cost of
> > md's raid5 implementation versus hardware raid5 implementations.
> >
> > Specifically, he states:
> >
> > > The performance [of raid5 in hardware] is so much better with
> > > the write-back caching on the card and the offload of the
> > > parity, it seems to me that the minor increase in work of having
> > > to upgrade the firmware if there's a buggy one is a highly
> > > acceptable trade-off to the increased performance. The md
> > > driver still commits you to longer run queues since IO calls to
> > > disk, parity calculator and the subsequent kflushd operations
> > > are non-interruptible in the CPU. A RAID card with write-back
> > > cache releases the IO operation virtually instantaneously.
> >
> > It would seem that his comments have merit, as there appears to be
> > work underway to move stripe operations outside of the spinlock:
> >
> > http://lwn.net/Articles/184102/
> >
> > What I'm curious about is this: for real-world situations, how
> > much does this matter? In other words, how hard do you have to
> > push md raid5 before doing dedicated hardware raid5 becomes a real
> > win?
>
> hardware with battery backed write cache is going to beat the
> software at small write traffic latency essentially all the time but
> it's got nothing to do with the parity computation.
I'm not convinced that's true. What my coworker is arguing is that the
md raid5 code holds a spinlock while it is performing this sequence of
operations:
1. executing the write
2. reading the blocks necessary for recalculating the parity
3. recalculating the parity
4. updating the parity block
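For reference, the parity math in steps 3-4 is plain XOR; for a
single-block update, new_parity = old_parity XOR old_data XOR new_data.
A toy sketch on single bytes (the values are arbitrary illustrations;
md of course operates on whole blocks, not bytes):

```shell
# Toy illustration of the RAID-5 read-modify-write parity update.
old_data=$(( 0xA5 ))
old_parity=$(( 0x3C ))
new_data=$(( 0xF0 ))
new_parity=$(( old_parity ^ old_data ^ new_data ))
printf 'new parity: 0x%02X\n' "$new_parity"   # prints 0x69
```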
My [admittedly cursory] read of the code, coupled with the link above,
leads me to believe that my coworker is correct, which is why I was
trolling for [informed] opinions about how much of a performance hit
the spinlock causes.
Re: raid5 software vs hardware: parity calculations?
on 13.01.2007 10:20:25 by Dan Williams
On 1/12/07, James Ralston wrote:
> On 2007-01-12 at 09:39-08 dean gaudet wrote:
>
> > On Thu, 11 Jan 2007, James Ralston wrote:
> >
> > > I'm having a discussion with a coworker concerning the cost of
> > > md's raid5 implementation versus hardware raid5 implementations.
> > >
> > > Specifically, he states:
> > >
> > > > The performance [of raid5 in hardware] is so much better with
> > > > the write-back caching on the card and the offload of the
> > > > parity, it seems to me that the minor increase in work of having
> > > > to upgrade the firmware if there's a buggy one is a highly
> > > > acceptable trade-off to the increased performance. The md
> > > > driver still commits you to longer run queues since IO calls to
> > > > disk, parity calculator and the subsequent kflushd operations
> > > > are non-interruptible in the CPU. A RAID card with write-back
> > > > cache releases the IO operation virtually instantaneously.
> > >
> > > It would seem that his comments have merit, as there appears to be
> > > work underway to move stripe operations outside of the spinlock:
> > >
> > > http://lwn.net/Articles/184102/
> > >
> > > What I'm curious about is this: for real-world situations, how
> > > much does this matter? In other words, how hard do you have to
> > > push md raid5 before doing dedicated hardware raid5 becomes a real
> > > win?
> >
> > hardware with battery backed write cache is going to beat the
> > software at small write traffic latency essentially all the time but
> > it's got nothing to do with the parity computation.
>
> I'm not convinced that's true.
No, it's true. md implements a write-through cache to ensure that
data reaches the disk.
>What my coworker is arguing is that md
> raid5 code spinlocks while it is performing this sequence of
> operations:
>
> 1. executing the write
not performed under the lock
> 2. reading the blocks necessary for recalculating the parity
not performed under the lock
> 3. recalculating the parity
> 4. updating the parity block
>
> My [admittedly cursory] read of the code, coupled with the link above,
> leads me to believe that my coworker is correct, which is why I was
> trolling for [informed] opinions about how much of a performance
> hit the spinlock causes.
>
The spinlock is not a source of performance loss; the reason for moving
parity calculations outside the lock is to maximize the benefit of
asynchronous xor+copy engines.
The hardware vs software raid trade-offs are well documented here:
http://linux.yyz.us/why-software-raid.html
Regards,
Dan
Re: raid5 software vs hardware: parity calculations?
on 13.01.2007 18:32:40 by Bill Davidsen
Dan Williams wrote:
> On 1/12/07, James Ralston wrote:
>> On 2007-01-12 at 09:39-08 dean gaudet wrote:
>>
>> > On Thu, 11 Jan 2007, James Ralston wrote:
>> >
>> > > I'm having a discussion with a coworker concerning the cost of
>> > > md's raid5 implementation versus hardware raid5 implementations.
>> > >
>> > > Specifically, he states:
>> > >
>> > > > The performance [of raid5 in hardware] is so much better with
>> > > > the write-back caching on the card and the offload of the
>> > > > parity, it seems to me that the minor increase in work of having
>> > > > to upgrade the firmware if there's a buggy one is a highly
>> > > > acceptable trade-off to the increased performance. The md
>> > > > driver still commits you to longer run queues since IO calls to
>> > > > disk, parity calculator and the subsequent kflushd operations
>> > > > are non-interruptible in the CPU. A RAID card with write-back
>> > > > cache releases the IO operation virtually instantaneously.
>> > >
>> > > It would seem that his comments have merit, as there appears to be
>> > > work underway to move stripe operations outside of the spinlock:
>> > >
>> > > http://lwn.net/Articles/184102/
>> > >
>> > > What I'm curious about is this: for real-world situations, how
>> > > much does this matter? In other words, how hard do you have to
>> > > push md raid5 before doing dedicated hardware raid5 becomes a real
>> > > win?
>> >
>> > hardware with battery backed write cache is going to beat the
>> > software at small write traffic latency essentially all the time but
>> > it's got nothing to do with the parity computation.
>>
>> I'm not convinced that's true.
> No, it's true. md implements a write-through cache to ensure that
> data reaches the disk.
>
>> What my coworker is arguing is that md
>> raid5 code spinlocks while it is performing this sequence of
>> operations:
>>
>> 1. executing the write
> not performed under the lock
>> 2. reading the blocks necessary for recalculating the parity
> not performed under the lock
>> 3. recalculating the parity
>> 4. updating the parity block
>>
>> My [admittedly cursory] read of the code, coupled with the link above,
>> leads me to believe that my coworker is correct, which is why I was
>> trolling for [informed] opinions about how much of a performance
>> hit the spinlock causes.
>>
> The spinlock is not a source of performance loss, the reason for
> moving parity calculations outside the lock is to maximize the benefit
> of using asynchronous xor+copy engines.
>
> The hardware vs software raid trade-offs are well documented here:
> http://linux.yyz.us/why-software-raid.html
There have been several recent threads on the list regarding software
RAID-5 performance. The reference might be updated to reflect the poor
write performance of RAID-5 until/unless significant tuning is done.
Read that as tuning obscure parameters and throwing a lot of memory into
the stripe cache. The reasons for hardware RAID should include "performance
of RAID-5 writes is usually much better than software RAID-5 with
default tuning."
--
bill davidsen
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: raid5 software vs hardware: parity calculations?
on 14.01.2007 00:23:23 by Robin Bowes
Bill Davidsen wrote:
>
> There have been several recent threads on the list regarding software
> RAID-5 performance. The reference might be updated to reflect the poor
> write performance of RAID-5 until/unless significant tuning is done.
> Read that as tuning obscure parameters and throwing a lot of memory into
> stripe cache. The reasons for hardware RAID should include "performance
> of RAID-5 writes is usually much better than software RAID-5 with
> default tuning.
Could you point me at a source of documentation describing how to
perform such tuning?
Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
SATA card configured as a single RAID6 array (~3TB available space)
Thanks,
R.
Re: raid5 software vs hardware: parity calculations?
on 14.01.2007 04:16:32 by dean gaudet
On Sat, 13 Jan 2007, Robin Bowes wrote:
> Bill Davidsen wrote:
> >
> > There have been several recent threads on the list regarding software
> > RAID-5 performance. The reference might be updated to reflect the poor
> > write performance of RAID-5 until/unless significant tuning is done.
> > Read that as tuning obscure parameters and throwing a lot of memory into
> > stripe cache. The reasons for hardware RAID should include "performance
> > of RAID-5 writes is usually much better than software RAID-5 with
> > default tuning.
>
> Could you point me at a source of documentation describing how to
> perform such tuning?
>
> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
> SATA card configured as a single RAID6 array (~3TB available space)
linux sw raid6 small write performance is bad because it reads the entire
stripe, merges the small write, and writes back the changed disks.
unlike raid5 where a small write can get away with a partial stripe read
(i.e. the smallest raid5 write will read the target disk, read the parity,
write the target, and write the updated parity)... afaik this optimization
hasn't been implemented in raid6 yet.
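A rough I/O count for a single-chunk write under the behavior described
above (an assumption-laden sketch, not a benchmark: it counts the raid6
path as reading the n-2 data chunks and writing back the changed data
chunk plus P and Q, versus the 4-I/O raid5 read-modify-write path):

```shell
# Rough per-small-write I/O counts; n is the number of disks in the array.
n=8
raid5_ios=4                    # read data chunk, read parity, write both back
raid6_ios=$(( (n - 2) + 3 ))   # read all data chunks, write data + P + Q
echo "raid5: $raid5_ios I/Os, raid6: $raid6_ios I/Os"
```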
depending on your use model you might want to go with raid5+spare.
benchmark if you're not sure.
for raid5/6 i always recommend experimenting with moving your fs journal
to a raid1 device instead (on separate spindles -- such as your root
disks).
if this is for a database or fs requiring lots of small writes then
raid5/6 are generally a mistake... raid10 is the only way to get
performance. (hw raid5/6 with nvram support can help a bit in this area,
but you just can't beat raid10 if you need lots of writes/s.)
beyond those config choices you'll want to become friendly with /sys/block
and all the myriad of subdirectories and options under there.
in particular:
/sys/block/*/queue/scheduler
/sys/block/*/queue/read_ahead_kb
/sys/block/*/queue/nr_requests
/sys/block/mdX/md/stripe_cache_size
for * = any of the component disks or the mdX itself...
some systems have an /etc/sysfs.conf you can place these settings in to
have them take effect on reboot. (sysfsutils package on debuntu)
-dean
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 12:48:38 by Michael Tokarev
dean gaudet wrote:
[]
> if this is for a database or fs requiring lots of small writes then
> raid5/6 are generally a mistake... raid10 is the only way to get
> performance. (hw raid5/6 with nvram support can help a bit in this area,
> but you just can't beat raid10 if you need lots of writes/s.)
A small nitpick.
At least some databases never do "small" I/O, at least not against the
datafiles. Oracle, for example, uses a fixed I/O block size, specified at
database (or tablespace) creation time -- by default it's 4Kb or 8Kb, but
it may be 16Kb or 32Kb as well. Now, if you make your raid array's stripe
size match the blocksize of the database, *and* ensure the files are
properly aligned on disk, it will just work, without needless reads to
calculate parity blocks during writes.
But the problem with that is it's near impossible to do.
First, even if the db writes in 32Kb blocks, the stripe size would have to
be 32Kb, which is only suitable for raid5 with 3 disks (chunk size 16Kb)
or with 5 disks (chunk size 8Kb -- this last variant is quite bad, because
an 8Kb chunk is too small). In other words, only a very limited set of
configurations will be more-or-less good.
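The arithmetic behind those combinations: a raid5 stripe is
chunk_size * (disks - 1), so matching a 32Kb database block pins the
chunk size completely:

```shell
# raid5 stripe width = chunk_kb * (n - 1); to match a 32Kb db block,
# chunk_kb must be 32 / (n - 1), which only divides evenly for a few n.
for n in 3 5; do
    echo "raid5, $n disks: chunk = $(( 32 / (n - 1) ))Kb"
done
```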
And second, most filesystems used for databases don't care about "correct"
file placement. For example, ext[23]fs, with a maximum blocksize of 4Kb,
will align files to 4Kb boundaries, not to the stripe size -- which means
a 32Kb block may be laid out with the first 4Kb at the end of one stripe
and the remaining 28Kb on the next, so for both parts a full read-write
cycle will again be needed to update the parity blocks -- the very thing
we tried to avoid by choosing the sizes in the previous step. So far,
only xfs (of the filesystems I've checked) pays attention to the stripe
size and tries to ensure files are aligned to it. (Yes, I know about
mke2fs's stride=xxx parameter, but it only affects metadata, not data.)
That's why all the above is a "small nitpick" -- i.e., in theory it IS
possible to use raid5 for a database workload in certain cases, but due
to all the gory details, it's nearly impossible to do right.
/mjt
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 16:29:10 by Bill Davidsen
Robin Bowes wrote:
> Bill Davidsen wrote:
>
>> There have been several recent threads on the list regarding software
>> RAID-5 performance. The reference might be updated to reflect the poor
>> write performance of RAID-5 until/unless significant tuning is done.
>> Read that as tuning obscure parameters and throwing a lot of memory into
>> stripe cache. The reasons for hardware RAID should include "performance
>> of RAID-5 writes is usually much better than software RAID-5 with
>> default tuning.
>>
>
> Could you point me at a source of documentation describing how to
> perform such tuning?
>
No. There has been a lot of discussion of this topic on this list, and a
trip through the archives of the last 60 days or so will let you pull
out a number of tuning tips which allow very good performance. My
concern was writing large blocks of data, 1MB per write, to RAID-5; it
didn't involve the overhead of small writes at all, which go through
other code paths and behavior.
I suppose while it's fresh in my mind I should write a script to rerun
the whole write test suite and generate some graphs, lists of
parameters, etc. If you are writing a LOT of data, you may find that
tuning the dirty_* parameters will result in better system response,
perhaps at the cost of some small total write throughput, although I
didn't notice anything significant when I tried them.
> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
> SATA card configured as a single RAID6 array (~3TB available space)
>
No hot spare(s)?
--
bill davidsen
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 17:22:27 by Robin Bowes
Bill Davidsen wrote:
> Robin Bowes wrote:
>> Bill Davidsen wrote:
>>
>>> There have been several recent threads on the list regarding software
>>> RAID-5 performance. The reference might be updated to reflect the poor
>>> write performance of RAID-5 until/unless significant tuning is done.
>>> Read that as tuning obscure parameters and throwing a lot of memory into
>>> stripe cache. The reasons for hardware RAID should include "performance
>>> of RAID-5 writes is usually much better than software RAID-5 with
>>> default tuning.
>>>
>>
>> Could you point me at a source of documentation describing how to
>> perform such tuning?
>>
> No. There has been a lot of discussion of this topic on this list, and a
> trip through the archives of the last 60 days or so will let you pull
> out a number of tuning tips which allow very good performance. My
> concern was writing large blocks of data, 1MB per write, to RAID-5, and
> didn't involve the overhead of small blocks at all, that leads through
> other code and behavior.
Actually Bill, I'm running RAID6 (my mistake for not mentioning it
explicitly before) - I found some material relating to RAID5 but nothing
on RAID6.
Are the concepts similar, or is RAID6 a different beast altogether?
>> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
>> SATA card configured as a single RAID6 array (~3TB available space)
>>
> No hot spare(s)?
I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
where a drive has failed in a RAID5+1 array and a second has failed
during the rebuild after the hot-spare had kicked in.
R.
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 18:37:15 by Bill Davidsen
Robin Bowes wrote:
> Bill Davidsen wrote:
>
>> Robin Bowes wrote:
>>
>>> Bill Davidsen wrote:
>>>
>>>
>>>> There have been several recent threads on the list regarding software
>>>> RAID-5 performance. The reference might be updated to reflect the poor
>>>> write performance of RAID-5 until/unless significant tuning is done.
>>>> Read that as tuning obscure parameters and throwing a lot of memory into
>>>> stripe cache. The reasons for hardware RAID should include "performance
>>>> of RAID-5 writes is usually much better than software RAID-5 with
>>>> default tuning.
>>>>
>>>>
>>> Could you point me at a source of documentation describing how to
>>> perform such tuning?
>>>
>>>
>> No. There has been a lot of discussion of this topic on this list, and a
>> trip through the archives of the last 60 days or so will let you pull
>> out a number of tuning tips which allow very good performance. My
>> concern was writing large blocks of data, 1MB per write, to RAID-5, and
>> didn't involve the overhead of small blocks at all, that leads through
>> other code and behavior.
>>
>
> Actually Bill, I'm running RAID6 (my mistake for not mentioning it
> explicitly before) - I found some material relating to RAID5 but nothing
> on RAID6.
>
> Are the concepts similar, or is RAID6 a different beast altogether?
>
You mentioned that before, and I think the concepts covered in the
RAID-5 discussion apply to RAID-6 as well. I don't have enough unused
drives to really test anything beyond RAID-5, so I have no particular
tuning information to share. Testing on system drives introduces too
much jitter to trust the results.
>
>>> Specifically, I have 8x500GB WD SATA drives on a Supermicro PCI-X 8-port
>>> SATA card configured as a single RAID6 array (~3TB available space)
>>>
>>>
>> No hot spare(s)?
>>
>
> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
> where a drive has failed in a RAID5+1 array and a second has failed
> during the rebuild after the hot-spare had kicked in.
--
bill davidsen
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 22:25:54 by dean gaudet
On Mon, 15 Jan 2007, Robin Bowes wrote:
> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
> where a drive has failed in a RAID5+1 array and a second has failed
> during the rebuild after the hot-spare had kicked in.
if the failures were read errors without losing the entire disk (the
typical case) then new kernels are much better -- on read error md will
reconstruct the sectors from the other disks and attempt to write it back.
you can also run monthly "checks"...
echo check >/sys/block/mdX/md/sync_action
it'll read the entire array (parity included) and correct read errors as
they're discovered.
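one way to wrap that up (a sketch, assuming a kernel whose sync_action
supports "check"; the optional second argument exists purely so the
helper can be exercised outside /sys):

```shell
# Kick off a "check" on /dev/mdX and report the current state.
# mismatch_cnt is only final once sync_action has returned to "idle",
# so re-read it after the scrub completes.
scrub() {
    md=$1
    dir=${2:-/sys/block/$md/md}
    echo check > "$dir/sync_action"
    echo "$md: action=$(cat "$dir/sync_action"), mismatch_cnt=$(cat "$dir/mismatch_cnt")"
}
```

run as root, e.g. "scrub md0" from a monthly cron job.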
-dean
Re: raid5 software vs hardware: parity calculations?
on 15.01.2007 22:32:20 by Gordon Henderson
On Mon, 15 Jan 2007, dean gaudet wrote:
> you can also run monthly "checks"...
>
> echo check >/sys/block/mdX/md/sync_action
>
> it'll read the entire array (parity included) and correct read errors as
> they're discovered.
A-Ha ... I've not been keeping up with the list for a bit - what's the
minimum kernel version for this to work?
Cheers,
Gordon
Re: raid5 software vs hardware: parity calculations?
on 16.01.2007 01:35:37 by berk walker
dean gaudet wrote:
> On Mon, 15 Jan 2007, Robin Bowes wrote:
>
>
>> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
>> where a drive has failed in a RAID5+1 array and a second has failed
>> during the rebuild after the hot-spare had kicked in.
>>
>
> if the failures were read errors without losing the entire disk (the
> typical case) then new kernels are much better -- on read error md will
> reconstruct the sectors from the other disks and attempt to write it back.
>
> you can also run monthly "checks"...
>
> echo check >/sys/block/mdX/md/sync_action
>
> it'll read the entire array (parity included) and correct read errors as
> they're discovered.
>
> -dean
Could I get a pointer as to how I can do this "check" in my FC5 [BLAG]
system? I can find no appropriate "check", nor "md" available to me.
It would be a "good thing" if I were able to find potentially weak
spots, rewrite them to good, and know that it might be time for a new drive.
All of my arrays have drives of approx the same mfg date, so the
possibility of more than one showing bad at the same time cannot be
ignored.
thanks
b-
Re: raid5 software vs hardware: parity calculations?
on 16.01.2007 01:48:54 by dean gaudet
On Mon, 15 Jan 2007, berk walker wrote:
> dean gaudet wrote:
> > echo check >/sys/block/mdX/md/sync_action
> >
> > it'll read the entire array (parity included) and correct read errors as
> > they're discovered.
>
> Could I get a pointer as to how I can do this "check" in my FC5 [BLAG] system?
> I can find no appropriate "check", nor "md" available to me. It would be a
> "good thing" if I were able to find potentially weak spots, rewrite them to
> good, and know that it might be time for a new drive.
>
> All of my arrays have drives of approx the same mfg date, so the possibility
> of more than one showing bad at the same time can not be ignored.
it should just be:
echo check >/sys/block/mdX/md/sync_action
if you don't have a /sys/block/mdX/md/sync_action file then your kernel is
too old... or you don't have /sys mounted... (or you didn't replace X with
the raid number :)
iirc there were kernel versions which had the sync_action file but didn't
yet support the "check" action (i think possibly even as recent as 2.6.17
had a small bug initiating one of the sync_actions but i forget which
one). if you can upgrade to 2.6.18.x it should work.
debian unstable (and i presume etch) will do this for all your arrays
automatically once a month.
-dean
Re: raid5 software vs hardware: parity calculations?
on 16.01.2007 04:41:23 by babydr
Hello Dean ,
On Mon, 15 Jan 2007, dean gaudet wrote:
....snip...
> it should just be:
>
> echo check >/sys/block/mdX/md/sync_action
>
> if you don't have a /sys/block/mdX/md/sync_action file then your kernel is
> too old... or you don't have /sys mounted... (or you didn't replace X with
> the raid number :)
>
> iirc there were kernel versions which had the sync_action file but didn't
> yet support the "check" action (i think possibly even as recent as 2.6.17
> had a small bug initiating one of the sync_actions but i forget which
> one). if you can upgrade to 2.6.18.x it should work.
>
> debian unstable (and i presume etch) will do this for all your arrays
> automatically once a month.
>
> -dean
Being able to run a 'check' is a good thing (tm). But without a method
to acquire statii & data back from the check, it seems rather bland. Is
there a tool/file to poll/... where data & statii can be acquired?
Tia , JimL
--
+------------------------------------------------------------------+
| James W. Laferriere | System Techniques | Give me VMS |
| Network Engineer | 663 Beaumont Blvd | Give me Linux |
| babydr@baby-dragons.com | Pacifica, CA. 94044 | only on AXP |
+------------------------------------------------------------------+
Re: raid5 software vs hardware: parity calculations?
on 16.01.2007 05:16:23 by dean gaudet
On Mon, 15 Jan 2007, Mr. James W. Laferriere wrote:
> Hello Dean ,
>
> On Mon, 15 Jan 2007, dean gaudet wrote:
> ...snip...
> > it should just be:
> >
> > echo check >/sys/block/mdX/md/sync_action
> >
> > if you don't have a /sys/block/mdX/md/sync_action file then your kernel is
> > too old... or you don't have /sys mounted... (or you didn't replace X with
> > the raid number :)
> >
> > iirc there were kernel versions which had the sync_action file but didn't
> > yet support the "check" action (i think possibly even as recent as 2.6.17
> > had a small bug initiating one of the sync_actions but i forget which
> > one). if you can upgrade to 2.6.18.x it should work.
> >
> > debian unstable (and i presume etch) will do this for all your arrays
> > automatically once a month.
> >
> > -dean
>
> Being able to run a 'check' is a good thing (tm) . But without a
> method to acquire statii & data back from the check , Seems rather bland .
> Is there a tool/file to poll/... where data & statii can be acquired ?
i'm not 100% certain what you mean, but i generally just monitor dmesg for
the md read error message (mind you the message pre-2.6.19 or .20 isn't
very informative but it's obvious enough).
there is also a file mismatch_cnt in the same directory as sync_action ...
the Documentation/md.txt (in 2.6.18) refers to it incorrectly as
mismatch_count... but anyhow why don't i just repaste the relevant portion
of md.txt.
-dean
....
Active md devices for levels that support data redundancy (1,4,5,6)
also have
sync_action
a text file that can be used to monitor and control the rebuild
process. It contains one word which can be one of:
resync - redundancy is being recalculated after unclean
shutdown or creation
recover - a hot spare is being built to replace a
failed/missing device
idle - nothing is happening
check - A full check of redundancy was requested and is
happening. This reads all blocks and checks
them. A repair may also happen for some raid
levels.
repair - A full check and repair is happening. This is
similar to 'resync', but was requested by the
user, and the write-intent bitmap is NOT used to
optimise the process.
This file is writable, and each of the strings that could be
read are meaningful for writing.
'idle' will stop an active resync/recovery etc. There is no
guarantee that another resync/recovery may not be automatically
started again, though some event will be needed to trigger
this.
'resync' or 'recovery' can be used to restart the
corresponding operation if it was stopped with 'idle'.
'check' and 'repair' will start the appropriate process
providing the current state is 'idle'.
mismatch_count
When performing 'check' and 'repair', and possibly when
performing 'resync', md will count the number of errors that are
found. The count in 'mismatch_cnt' is the number of sectors
that were re-written, or (for 'check') would have been
re-written. As most raid levels work in units of pages rather
than sectors, this may be larger than the number of actual errors
by a factor of the number of sectors in a page.
Re: raid5 software vs hardware: parity calculations?
on 16.01.2007 06:06:31 by Bill Davidsen
berk walker wrote:
>
> dean gaudet wrote:
>> On Mon, 15 Jan 2007, Robin Bowes wrote:
>>
>>
>>> I'm running RAID6 instead of RAID5+1 - I've had a couple of instances
>>> where a drive has failed in a RAID5+1 array and a second has failed
>>> during the rebuild after the hot-spare had kicked in.
>>>
>>
>> if the failures were read errors without losing the entire disk (the
>> typical case) then new kernels are much better -- on read error md
>> will reconstruct the sectors from the other disks and attempt to
>> write it back.
>>
>> you can also run monthly "checks"...
>>
>> echo check >/sys/block/mdX/md/sync_action
>>
>> it'll read the entire array (parity included) and correct read errors
>> as they're discovered.
>>
>> -dean
>
> Could I get a pointer as to how I can do this "check" in my FC5 [BLAG]
> system? I can find no appropriate "check", nor "md" available to me.
> It would be a "good thing" if I were able to find potentially weak
> spots, rewrite them to good, and know that it might be time for a new
> drive.
Grab a recent mdadm source, it's a part of that.
>
> All of my arrays have drives of approx the same mfg date, so the
> possibility of more than one showing bad at the same time can not be
> ignored.
Never can, but it is highly unlikely, given the MTBF of modern drives.
And when you consider total failures as opposed to bad sectors it gets
even smaller. There is no perfect way to avoid ever losing data, just
ways to reduce the chance to balance the cost of data loss vs. hardware.
Current Linux will rewrite bad sectors, whole drive failures are an
argument for spares.
--
bill davidsen
CTO TMR Associates, Inc
Doing interesting things with small computers since 1979