single threaded parity calculation ?

on 15.04.2011 21:18:48 by Simon McNair

Hi all,
I'm under the impression that the read speed of my 10x1TB RAID5 array is
limited by the 'single-threaded parity calculation'? (I'm quoting Phil
Turmel on that, and other linux-raid messages I've read seem to confirm
that terminology.) I'm running an i7 920 with irqbalance, but if
something is single-threaded or bound to a single CPU, I'm wondering what
I can do to alleviate it.

iostat reports 83MB/s for each disk, or up to 830MB/s aggregate across all
10 disks, but the maximum read speed of the array is approximately 256MB/s.
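
For reference, numbers like these can be gathered with something along the
following lines (the device names are placeholders):

  # Whole-array sequential read, bypassing the page cache:
  dd if=/dev/md0 of=/dev/null bs=1M count=8192 iflag=direct
  # Per-disk figures, watched from a second terminal while the test runs:
  iostat -xm 2 /dev/sd[b-k]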

Would it be better to have 5 (or more) partitions on each disk, create
5 RAID5 arrays (each of which would, in theory, have its own thread), and
then create a linear array over the top of them to join them together?

Yes... I know this is way overthinking it, and also potentially dangerous
to recreate, but I'm curious what the opinions are. I think I'll probably
just end up buying another 1TB drive and making it an 11-disk RAID6
instead. I want maximum space, maximum speed and maximum redundancy ;-).

TIA :-)

Simon



Re: single threaded parity calculation ?

on 15.04.2011 22:54:39 by Phil Turmel

On 04/15/2011 03:18 PM, Simon McNair wrote:
> Hi all,

> I'm under the impression that the read speed of my 10x1TB RAID5 array
> is limited by the 'single-threaded parity calculation' ? (I'm quoting
> Phil Turmel on that and other linux-raid messages I've read seem to
> confirm that terminology) I'm running an i7 920 with irqbalance but
> if something is single threaded or single CPU bound I'm wondering
> what I can do to alleviate it.

I'm not intimately familiar with the code, but I would've thought the single-threaded limitation wouldn't apply to reading from a clean array.

> iostat reports 83MB/s for each disk, running up to 830MB/s for all 10
> disks, but the max read speed of the array is approx 256MB/s.

But this means I'm probably wrong, unless your chunk size is just too small for your setup. If I recall correctly, it was 64K. Consider increasing it and retesting to see if it helps.
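
If you do retest, something along these lines is the usual approach (device
names are placeholders; an in-place chunk reshape needs a reasonably recent
mdadm/kernel, and you'd want good backups before trying it):

  # Check the current chunk size:
  mdadm --detail /dev/md0 | grep 'Chunk Size'
  # Reshape in place to a 512K chunk; the backup file must live on a
  # device outside the array, and the reshape itself is slow:
  mdadm --grow /dev/md0 --chunk=512 --backup-file=/root/md0-reshape.bak
  # Or, for a throwaway comparison, build a scratch array with a larger
  # chunk and benchmark that instead:
  mdadm --create /dev/md1 --level=5 --raid-devices=10 --chunk=512 /dev/sd[b-k]2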

> Would it be better to have 5 (or more) partitions on each disk,
> create 5xraid5 arrays (each of which would in theory have a separate
> thread) and then create a linear array over the top of them to join
> them together ?

Probably not. If you use linear on top, any given file is likely to reside in just one of the underlying raids, and will appear to operate at the same speed as you have now. Streaming multiple files, if they reside in different underlying raids, could go faster computationally, but will suffer from extra seeks just like the plain raid5.

If the 83MB/s is the speed data can be pulled off the platters, a single 8ms seek displaces 664KB of data transfer.

If you put raid0 on top of your underlying raids, you will suffer from excess seeks all the time.

> yes...I know this is way overthinking and also a potentially
> dangerous to recreate, but I'm curious what the opinions are. I
> think I'll probably just end up buying another 1TB drive and making
> it an 11 disk RAID6 instead. I want maximum space, maximum speed and
> maximum redundancy ;-).

Heh. Pick any two. :-)

Max Space & Speed == raid0 w/ lots of spindles.
Max Space & Redundancy == raid6 w/ lots of spindles.
Max Speed & Redundancy == raid10 w/ lots of spindles.

There's a common theme there...
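
In mdadm terms, roughly (member devices are placeholders):

  mdadm --create /dev/md0 --level=0  --raid-devices=10 /dev/sd[b-k]1  # space + speed
  mdadm --create /dev/md0 --level=6  --raid-devices=10 /dev/sd[b-k]1  # space + redundancy
  mdadm --create /dev/md0 --level=10 --raid-devices=10 /dev/sd[b-k]1  # speed + redundancy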

> TIA :-)


Phil

Re: single threaded parity calculation ?

on 15.04.2011 23:28:56 by Phil Turmel

A couple more thoughts...

On 04/15/2011 04:54 PM, Phil Turmel wrote:
> On 04/15/2011 03:18 PM, Simon McNair wrote:
>> Hi all,
>
>> I'm under the impression that the read speed of my 10x1TB RAID5 array
>> is limited by the 'single-threaded parity calculation' ? (I'm quoting
>> Phil Turmel on that and other linux-raid messages I've read seem to
>> confirm that terminology) I'm running an i7 920 with irqbalance but
>> if something is single threaded or single CPU bound I'm wondering
>> what I can do to alleviate it.
>
> I'm not intimately familiar with the code, but I would've thought the single-threaded limitation wouldn't apply to reading from a clean array.

I'm also curious what top has to say while you are running your tests
(with the CPUs shown separately).
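
Something like this would show it (mpstat is part of the same sysstat
package as iostat, assuming it's installed):

  # Per-CPU utilisation, refreshed every 2 seconds, while the read test runs:
  mpstat -P ALL 2
  # or run top interactively and press '1' to break out the individual CPUs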

>> iostat reports 83MB/s for each disk, running up to 830MB/s for all 10
>> disks, but the max read speed of the array is approx 256MB/s.
>
> But this means I'm probably wrong, unless your chunk size is just too small for your setup. If I recall correctly, it was 64K. Consider increasing it and retesting to see if it helps.
>
>> Would it be better to have 5 (or more) partitions on each disk,
>> create 5xraid5 arrays (each of which would in theory have a separate
>> thread) and then create a linear array over the top of them to join
>> them together ?
>
> Probably not. If you use linear on top, any given file is likely to reside in just one of the underlying raids, and will appear to operate at the same speed as you have now. Streaming multiple files, if they reside in different underlying raids, could go faster computationally, but will suffer from extra seeks just like the plain raid5.
>
> If the 83MB/s is the speed data can be pulled off the platters, a single 8ms seek displaces 664KB of data transfer.
>
> If you put raid0 on top of your underlying raids, you will suffer from excess seeks all the time.

I believe I've mentioned to you privately that I do partition my disks and
create multiple raids across them.

Specifically:

1) a small raid1 across all disks to be /boot, so any drive that ends up as the 1st BIOS drive will boot the system.
2) a medium-sized raid10 across all disks to be an LVM group for /, /home, and /usr. This is the workhorse.
3) a large raid5 (or 6) across all disks to be an LVM group for media files and other bulk items.

And if drive sizes are mixed,
4) a series of LVM PVs in remaining space on each drive, combined into an LVM group containing /tmp and any other low-value volumes I might need space for.

Note that the #1 partitions are idle once booted, and the #3 partitions
mostly do large sequential operations. That leaves the bulk of the IOPS
for the raid10 and the non-redundant scratch space. This setup is a
compromise, of course, but I find it suitable for desktops and modest
multi-purpose servers.
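
Very roughly, the skeleton looks like this (device names, sizes, and
volume names are placeholders, not a recipe):

  # Assumes each disk already carries three partitions:
  # #1 small, #2 medium, #3 large.
  # /boot: metadata 1.0 keeps the superblock at the end so bootloaders can read it.
  mdadm --create /dev/md0 --level=1  --raid-devices=10 --metadata=1.0 /dev/sd[b-k]1
  mdadm --create /dev/md1 --level=10 --raid-devices=10 /dev/sd[b-k]2   # workhorse
  mdadm --create /dev/md2 --level=6  --raid-devices=10 /dev/sd[b-k]3   # bulk storage
  pvcreate /dev/md1 /dev/md2
  vgcreate vg_sys  /dev/md1
  vgcreate vg_bulk /dev/md2
  lvcreate -L 20G  -n root  vg_sys
  lvcreate -L 100G -n home  vg_sys
  lvcreate -l 100%FREE -n media vg_bulk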

>> yes...I know this is way overthinking and also a potentially
>> dangerous to recreate, but I'm curious what the opinions are. I
>> think I'll probably just end up buying another 1TB drive and making
>> it an 11 disk RAID6 instead. I want maximum space, maximum speed and
>> maximum redundancy ;-).
>
> Heh. Pick any two. :-)
>
> Max Space & Speed == raid0 w/ lots of spindles.
> Max Space & Redundancy == raid6 w/ lots of spindles.
> Max Speed & Redundancy == raid10 w/ lots of spindles.

Note that my preferred partitioning mixes approaches, with the bulk of the activity occurring under option 3.

YMMV, and there's no single right answer.

> There's a common theme there...
>
>> TIA :-)
>
>
> Phil

Phil

Re: single threaded parity calculation ?

on 15.04.2011 23:44:38 by Simon McNair

Thanks Phil,
Now I just need to figure out the best way to back up my data,
repartition, and reinstall the OS without it (or, more specifically, me)
getting confused about what I'm doing. I didn't know until recently that
you could have multiple partitions. The whole parted partition-sizing
business is a bit scary, and I'm never certain I could partition a 'fresh'
disk properly, or even figure out which drive to pull if one failed (I
wish HDDs had an activity LED ;-). It's OK for the caddies, as I can just
dd to null and see which caddy lights up, but for the drives on the
motherboard it's a case of looking up serial numbers.
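
For what it's worth, serial numbers can usually be matched to device nodes
along these lines (/dev/sdb is just a placeholder):

  # Persistent names that embed the serial number:
  ls -l /dev/disk/by-id/
  # Or query an individual drive directly:
  hdparm -I /dev/sdb | grep -i 'serial number'
  smartctl -i /dev/sdb | grep -i serial
  # And the caddy trick still works for any drive: read it flat out and
  # watch which activity light stays on:
  dd if=/dev/sdb of=/dev/null bs=1M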

Cheers
Simon


Re: single threaded parity calculation ?

on 15.04.2011 23:46:09 by NeilBrown

On Fri, 15 Apr 2011 16:54:39 -0400 Phil Turmel wrote:

> On 04/15/2011 03:18 PM, Simon McNair wrote:
> > Hi all,
>
> > I'm under the impression that the read speed of my 10x1TB RAID5 array
> > is limited by the 'single-threaded parity calculation' ? (I'm quoting
> > Phil Turmel on that and other linux-raid messages I've read seem to
> > confirm that terminology) I'm running an i7 920 with irqbalance but
> > if something is single threaded or single CPU bound I'm wondering
> > what I can do to alleviate it.
>
> I'm not intimately familiar with the code, but I would've thought the single-threaded limitation wouldn't apply to reading from a clean array.

If the array is not degraded then read requests are simply divided into
chunks and sent to the appropriate disk.
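
As a rough illustration of that mapping (this assumes the default
left-symmetric layout and is only a sketch, not lifted from the md code):

  # Which member disk serves a given data chunk of a clean 10-disk RAID5.
  disks=10
  chunk=1234                               # logical data-chunk index
  stripe=$(( chunk / (disks - 1) ))        # stripe containing that chunk
  within=$(( chunk % (disks - 1) ))        # position within the stripe
  parity=$(( disks - 1 - stripe % disks )) # parity rotates 'left' each stripe
  member=$(( (parity + 1 + within) % disks ))
  echo "data chunk $chunk -> member disk $member (parity for this stripe on disk $parity)"

In other words, for plain reads of a clean array each chunk is fetched
straight from one member; the parity maths only comes into play for writes
or when the array is degraded.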


>
> > iostat reports 83MB/s for each disk, running up to 830MB/s for all 10
> > disks, but the max read speed of the array is approx 256MB/s.
>
> But this means I'm probably wrong, unless your chunk size is just too small for your setup. If I recall correctly, it was 64K. Consider increasing it and retesting to see if it helps.

Agreed.
Large chunks tend to be better for large reads.

Still, 256MB/s is awfully slow if the hardware can really sustain 830MB/s,
i.e. reading all 10 drives simultaneously at full speed. I would expect to
get close to 90% of that (since 10% of each disk holds parity, which you
don't read but still have to seek over). So roughly 747MB/s would be
expected, yet only about a third of that is seen.

I can't explain that.

NeilBrown



>
> > Would it be better to have 5 (or more) partitions on each disk,
> > create 5xraid5 arrays (each of which would in theory have a separate
> > thread) and then create a linear array over the top of them to join
> > them together ?
>
> Probably not. If you use linear on top, any given file is likely to reside in just one of the underlying raids, and will appear to operate at the same speed as you have now. Streaming multiple files, if they reside in different underlying raids, could go faster computationally, but will suffer from extra seeks just like the plain raid5.
>
> If the 83MB/s is the speed data can be pulled off the platters, a single 8ms seek displaces 664KB of data transfer.
>
> If you put raid0 on top of your underlying raids, you will suffer from excess seeks all the time.
>
> > yes...I know this is way overthinking and also a potentially
> > dangerous to recreate, but I'm curious what the opinions are. I
> > think I'll probably just end up buying another 1TB drive and making
> > it an 11 disk RAID6 instead. I want maximum space, maximum speed and
> > maximum redundancy ;-).
>
> Heh. Pick any two. :-)
>
> Max Space & Speed == raid0 w/ lots of spindles.
> Max Space & Redundancy == raid6 w/ lots of spindles.
> Max Speed & Redundancy == raid10 w/ lots of spindles.
>
> There's a common theme there...
>
> > TIA :-)
>
>
> Phil

Re: single threaded parity calculation ?

on 16.04.2011 15:45:18 by Drew

> iostat reports 83MB/s for each disk, running up to 830MB/s for all 10 disks,
> but the max read speed of the array is approx 256MB/s.

This may be a dumb question, but what motherboard and disk controllers
are you using, and how are they set up?

That 256MB/s feels to me like you may be bumping up against a bandwidth
limitation of your bus. PCI (32-bit/33MHz) is limited to 133MB/s max, and
PCIe (1.0/2.0) is limited to 250/500MB/s per lane. To sustain 830MB/s you
need at least a PCIe 1.0 x4 or PCI-X 133 controller card, both of which
are rated for (theoretically) about 1GB/s.
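
One quick thing to check is the negotiated link width the card actually got
(the 03:00.0 bus address below is a placeholder):

  lspci                                          # find the controller's bus address
  lspci -vv -s 03:00.0 | grep -E 'LnkCap|LnkSta'
  # If LnkSta reports "Width x1" rather than "x4", a PCIe 1.0 card would be
  # capped near 250MB/s -- in the same ballpark as the 256MB/s observed.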

As an aside, for reads on a RAID-5 the parity blocks aren't read, so the
theoretical maximum in this case is 9x83MB/s, or 747MB/s. ;-)


--
Drew

"Nothing in life is to be feared. It is only to be understood."
--Marie Curie

Re: single threaded parity calculation ?

on 16.04.2011 21:07:14 by Simon McNair

Drew,
Thanks for the response. Phil is already intimately familiar with this,
but it's an Asus P7T Deluxe V2, and the card is a PCIe x4 Supermicro
8-port AOC-SASLP-MV8.
Simon
