raid6 and parity calculations
on 14.09.2010 16:45:40 by Michael
Hi,
I've been looking through the drivers/md code, and I've got a few questions about the RAID6 parity calculations that have me stumped.
I can see that when recovering 1 or 2 data sections, it calls functions based on the content that we're recovering (e.g. async_gen_syndrome, async_xor, async_raid6_datap_recov, etc.). However, the length parameter is always given as STRIPE_SIZE, which from what I can tell is the same as PAGE_SIZE, which for vanilla systems like the one I'm playing with is 4096 bytes.
The thing that I can't figure out is how this interacts with the RAID6 chunk size; the array I'm playing with has the default chunk size (64kb), which I understand means that there's 64kb of data striped across each disk (bar two), then 64kb of P, then 64kb of Q for the first stripe, correct? If so, I can't figure out where the whole parity calculation is done for all 64kb. There are no loops, no recursion, or anything that would process it that I can find. I'm obviously missing something here, can anyone enlighten me?
Thanks for any advice or pointers!
Cheers,
Michael
(as a side note: I'm playing with all this as I've managed to royally screw up an array which had 2 dropped drives, by re-adding them back in (in what appears to be the wrong order). That would have been fine if the rebuild had finished completely; however, the rebuild failed a few percent in, so now I have 2 drives with "swapped" data. That is, drive A contains the data for raid member 4 for the first x%, and raid member 5 for the rest, and drive B contains the data for raid member 5 for the first x% and raid member 4 for the rest. So I'm trying to write a userspace program to manually go through the array members, inspecting each stripe, and manually doing parity calculations for a range of drive permutations to try and see what looks sensible, hence I'm trying to understand what's ON the drive to reverse engineer it.)
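[Editorial note: a minimal sketch of the kind of userspace check described above, in Python. The device paths, offsets and member numbering are placeholders, and the sketch ignores the P/Q rotation (it assumes you have already worked out which devices hold data and which holds Q for the strip under test). Because P is a plain XOR it cannot tell two orderings of the same drives apart, so the test uses the Q syndrome, as discussed later in the thread.]

    import itertools

    PAGE = 4096                                  # one page-sized block, as in the md code

    def gfmul2(b):
        """Multiply one byte by the generator {02} of GF(2^8), polynomial 0x11d."""
        b <<= 1
        return (b ^ 0x11d) & 0xff if b & 0x100 else b

    def q_syndrome(data_blocks):
        """Q = sum over data disks z of g^z * d_z; the order of the disks matters."""
        q = bytearray(PAGE)
        for i in range(PAGE):
            acc = 0
            for blk in reversed(data_blocks):    # Horner's rule over the disk index
                acc = gfmul2(acc) ^ blk[i]
            q[i] = acc
        return bytes(q)

    def read_block(path, offset):
        with open(path, "rb") as dev:
            dev.seek(offset)
            return dev.read(PAGE)

    # Hypothetical strip: members 0-3 are known good, /dev/sdX and /dev/sdY are
    # member 4 and member 5 in some unknown order, and /dev/sdQ holds Q here.
    offset = 0                                   # device offset of the strip under test
    known = [read_block(p, offset) for p in ("/dev/sdc", "/dev/sdd", "/dev/sde", "/dev/sdf")]
    candidates = {"/dev/sdX": read_block("/dev/sdX", offset),
                  "/dev/sdY": read_block("/dev/sdY", offset)}
    stored_q = read_block("/dev/sdQ", offset)

    for a, b in itertools.permutations(candidates):
        if q_syndrome(known + [candidates[a], candidates[b]]) == stored_q:
            print("%s looks like member 4 and %s like member 5 at offset %d" % (a, b, offset))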
Re: raid6 and parity calculations
on 15.09.2010 12:26:27 by NeilBrown
On Tue, 14 Sep 2010 14:45:40 +0000
"Michael Sallaway" wrote:
> Hi,
>
> I've been looking through the drivers/md code, and I've got a few questions about the RAID6 parity calculations that have me stumped.
>
> I can see that when recovering 1 or 2 data sections, it calls functions based on the content that we're recovering (e.g. async_gen_syndrome, async_xor, async_raid6_datap_recov, etc.). However, the length parameter is always given as STRIPE_SIZE, which from what I can tell is the same as PAGE_SIZE, which for vanilla systems like the one I'm playing with is 4096 bytes.
>
> The thing that I can't figure out is how this interacts with the RAID6 chunk size; the array I'm playing with has the default chunk size (64kb), which I understand means that there's 64kb of data striped across each disk (bar two), then 64kb of P, then 64kb of Q for the first stripe, correct? If so, I can't figure out where the whole parity calculation is done for all 64kb. There are no loops, no recursion, or anything that would process it that I can find. I'm obviously missing something here, can anyone enlighten me?
>
> Thanks for any advice or pointers!
It is best not to think too much about chunks. Think about strips
(not stripes).
A strip is a set of blocks, one per device, each at the same offset.
Think of page-sized blocks/strips.
Each strip has a P block, a Q block and a bunch of data blocks. Which
block is P, which is Q, and which data block each of the others is, are
all determined by the offset, the layout and the chunk size. Once you
have used the chunk size to perform that calculation, don't think about
chunks any more - just blocks and strips.
Hope that helps.
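[Editorial note: a rough illustration of the mapping Neil describes, in Python rather than the kernel's C. It assumes md's default left-symmetric RAID6 layout as I read it; the authoritative version is raid5_compute_sector() in drivers/md/raid5.c, and the constants (member count, chunk size) below are made-up examples.]

    CHUNK = 64 * 1024              # chunk size in bytes (the default mentioned above)
    PAGE = 4096                    # block/strip granularity used by the md code
    RAID_DISKS = 8                 # made-up member count: 6 data + P + Q

    def strip_roles(stripe_number, raid_disks):
        """For one chunk-row, return (p_disk, q_disk, data_disk_map).

        Sketch of the left-symmetric layout: P rotates backwards one device per
        chunk-row, Q sits on the device after P, and logical data chunk d lands
        on device (p + 2 + d) mod raid_disks.
        """
        p = raid_disks - 1 - (stripe_number % raid_disks)
        q = (p + 1) % raid_disks
        data = [(p + 2 + d) % raid_disks for d in range(raid_disks - 2)]
        return p, q, data

    def locate(array_offset):
        """Map a byte offset in the array to (device, offset_on_device, p_disk, q_disk)."""
        data_disks = RAID_DISKS - 2
        block = array_offset // PAGE                 # which page-sized block of the array
        blocks_per_chunk = CHUNK // PAGE
        chunk_number = block // blocks_per_chunk     # logical data chunk index
        stripe = chunk_number // data_disks          # chunk-row number
        d = chunk_number % data_disks                # which data chunk within the row
        p, q, data = strip_roles(stripe, RAID_DISKS)
        dev_offset = stripe * CHUNK + (block % blocks_per_chunk) * PAGE
        return data[d], dev_offset, p, q

    # e.g. where does the byte at 5 MiB live, and which devices hold P and Q there?
    print(locate(5 * 1024 * 1024))

Once roles have been assigned for a given offset, each page-sized strip can be handled entirely on its own, which is presumably why the recovery calls mentioned above only ever see STRIPE_SIZE (one page) at a time.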
>
> Cheers,
> Michael
>
>
> (as a side note: I'm playing with all this as I've managed to royally screw up an array which had 2 dropped drives, by re-adding them back in (in what appears to be the wrong order). That would have been fine if the rebuild had finished completely; however, the rebuild failed a few percent in, so now I have 2 drives with "swapped" data. That is, drive A contains the data for raid member 4 for the first x%, and raid member 5 for the rest, and drive B contains the data for raid member 5 for the first x% and raid member 4 for the rest. So I'm trying to write a userspace program to manually go through the array members, inspecting each stripe, and manually doing parity calculations for a range of drive permutations to try and see what looks sensible, hence I'm trying to understand what's ON the drive to reverse engineer it.)
Ouch... good luck.
NeilBrown
Re: raid6 and parity calculations
on 15.09.2010 17:55:24 by Michael
> -------Original Message-------
> From: Neil Brown
> To: Michael Sallaway
> Cc: linux-raid@vger.kernel.org
> Subject: Re: raid6 and parity calculations
> Sent: 15 Sep '10 10:26
> It is best not to think too much about chunks. Think about strips
> (not stripes).
> A strip is a set of blocks, one per device, each at the same offset.
> Think of page-sized blocks/strips.
> Each strip has a P block, a Q block and a bunch of data blocks. Which
> block is P, which is Q, and which data block each of the others is, are
> all determined by the offset, the layout and the chunk size. Once you
> have used the chunk size to perform that calculation, don't think about
> chunks any more - just blocks and strips.
>
Aah, perfect -- that makes sense, thanks for that.
As a sort-of follow-up question, would anyone know if the data size of a Q calculation affects the result at all? e.g. if I do a 64kb Q calculation on 10 drives of data, would that be the same as doing 16x 4kb Q calculations on sequential blocks of the same data, then concatenating them together? (I can't remember what that operation property is called....?)
I've been reading the maths of RAID6 PDF (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf), but I'm a bit too rusty to understand Galois fields, and whether the data size matters. I presume the data ordering is also critical for a Q calculation, correct? (e.g. drives have to be d0 -> d10 in order, not just random).
And, in contrast, for the P calculations, data size and input order make no difference, correct? (since it's just a simple bitwise XOR of all the inputs).
>
> Ouch... good luck.
Thanks! I'm the only one to blame, though -- it happened in the month between "getting the new system set up" and "setting up backups for the new system". So it's the only copy of the data.... whoops. :-)
Thanks for the help/advice!
Cheers,
Michael
Re: raid6 and parity calculations
on 15.09.2010 18:07:14 by Andre Noll
On Wed, Sep 15, 15:55, Michael Sallaway wrote:
> As a sort-of follow-up question, would anyone know if the data size of
> a Q calculation affects the result at all? e.g. if I do a 64kb Q
> calculation on 10 drives of data, would that be the same as doing 16x
> 4kb Q calculations on sequential blocks of the same data, then
> concatenating them together? (I can't remember what that operation
> property is called....?)
Yes, the result would be the same. In fact, byte n of Q depends only
on byte n of each of the 10 data drives.
> I've been reading the maths of RAID6 PDF
> (http://kernel.org/pub/linux/kernel/people/hpa/raid6.pdf), but I'm a
> bit too rusty to understand Galois fields, and whether the data size
> matters. I presume the data ordering is also critical for a Q
> calculation, correct? (e.g. drives have to be d0 -> d10 in order, not
> just random).
Right, order matters.
> And, in contrast, for the P calculations, data size and input order
> make no difference, correct? (since it's just a simple bitwise XOR of
> all the inputs).
Also correct.
Andre
--
The only person who always got his work done by Friday was Robinson Crusoe
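[Editorial note: to make the three answers above concrete, a small self-contained Python sketch. The generator {02} and the reduction polynomial 0x11d follow hpa's raid6.pdf; the helper names and block sizes are made up for illustration, and this is not the kernel code.]

    import os

    def gfmul2(b):
        """Multiply one byte by the generator {02} of GF(2^8), polynomial 0x11d."""
        b <<= 1
        return (b ^ 0x11d) & 0xff if b & 0x100 else b

    def p_q(blocks):
        """P and Q for a list of equal-sized data blocks; blocks[z] is data disk z."""
        size = len(blocks[0])
        p, q = bytearray(size), bytearray(size)
        for i in range(size):
            acc = 0
            for blk in reversed(blocks):      # Horner's rule: Q = sum over z of g^z * d_z
                p[i] ^= blk[i]
                acc = gfmul2(acc) ^ blk[i]
            q[i] = acc
        return bytes(p), bytes(q)

    disks = [os.urandom(8192) for _ in range(10)]   # 10 data disks, 8kb each, random contents

    # 1) Q over the whole 8kb equals the two 4kb Qs concatenated: byte n of Q
    #    depends only on byte n of each data disk.
    p_all, q_all = p_q(disks)
    _, q_lo = p_q([d[:4096] for d in disks])
    _, q_hi = p_q([d[4096:] for d in disks])
    assert q_all == q_lo + q_hi

    # 2) Order matters for Q: swapping two data disks changes the syndrome...
    p_swapped, q_swapped = p_q([disks[1], disks[0]] + disks[2:])
    assert q_swapped != q_all

    # 3) ...but not for P, which is a plain XOR and therefore order-independent.
    assert p_swapped == p_all
    print("all three properties hold")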