RAID-10 initial sync is CPU-limited
on 03.01.2011 17:32:13 by Jan Kasprzak
Hello, Linux md developers!
I am trying to build a new md-based RAID-10 array out of 24 disks,
but it seems that the initial sync of the array is heavily limited
by CPU:
During the resync only 1-2 CPUs are busy (one for md1_raid10 thread
which uses 100 % of a single CPU, and one for md1_resync thread, which
uses about 80 % of a single CPU).
Are there plans to make this process more parallel? I can imagine
that for near-copies algorithm there can be a separate thread for each
pair of disks in the RAID-10 array.
My hardware is apparently able to keep all the disks busy most
of the time (verified by running dd if=/dev/sd$i bs=1M of=/dev/null
in parallel - iostat reports 99-100 % utilization of each disk and about
55 MB/s read per disk). All the disks are connected by a four-lane
SAS controller, so the maximum theoretical throughput is 4x 3 Gbit/s
= 12 Gbit/s = 0.5 Gbit/s per disk = 62.5 MByte/s per disk.
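For the record, the parallel read test was essentially a one-liner along
these lines (the glob is a stand-in for the actual member disks), with
"iostat -kx 5" running in a second terminal to watch %util and rkB/s:

# for d in /dev/sd[b-z]; do dd if=$d bs=1M of=/dev/null & done; wait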
Here are the performance data from the initial resync:
# cat /proc/mdstat
[...]
md1 : active raid10 sdz2[23] sdy2[22] sdx2[21] sdw2[20] sdu2[19] sdt2[18] sds2[17] sdr2[16] sdq2[15] sdp2[14] sdo2[13] sdn2[12] sdm2[11] sdl2[10] sdk2[9] sdj2[8] sdi2[7] sdh2[6] sdg2[5] sdf2[4] sde2[3] sdd2[2] sdc2[1] sdb2[0]
23190484992 blocks super 1.2 512K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
[=>...................] resync = 7.3% (1713514432/23190484992) finish=796.4min speed=449437K/sec
# top
top - 23:05:31 up 8:20, 5 users, load average: 3.12, 3.29, 3.25
Tasks: 356 total, 3 running, 353 sleeping, 0 stopped, 0 zombie
Cpu(s): 0.0%us, 8.3%sy, 0.0%ni, 91.1%id, 0.0%wa, 0.0%hi, 0.6%si, 0.0%st
Mem: 132298920k total, 3528792k used, 128770128k free, 53892k buffers
Swap: 10485756k total, 0k used, 10485756k free, 818496k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12561 root 20 0 0 0 0 R 99.8 0.0 61:12.61 md1_raid10
12562 root 20 0 0 0 0 R 79.6 0.0 47:06.60 md1_resync
[...]
# iostat -kx 5
avg-cpu: %user %nice %system %iowait %steal %idle
0.00 0.00 9.54 0.00 0.00 90.46
Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
sdb 19.20 0.00 573.60 0.00 37939.20 0.00 132.28 0.50 0.87 0.30 17.26
sdc 19.60 0.00 573.20 0.00 37939.20 0.00 132.38 0.51 0.89 0.31 17.58
sdd 13.80 0.00 578.20 0.00 37888.00 0.00 131.05 0.52 0.89 0.31 18.02
sdf 19.20 0.00 572.80 0.00 37888.00 0.00 132.29 0.50 0.88 0.32 18.12
sde 12.80 0.00 579.40 0.00 37900.80 0.00 130.83 0.54 0.94 0.32 18.38
sdg 16.60 0.00 575.40 0.00 37888.00 0.00 131.69 0.53 0.93 0.33 18.76
[...]
sdy 14.40 0.00 579.20 0.00 37990.40 0.00 131.18 0.52 0.91 0.31 17.78
sdz 135.00 229.00 458.60 363.00 37990.40 37888.00 184.71 2.30 2.80 0.76 62.32
md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 06:24:37 by NeilBrown
On Mon, 3 Jan 2011 17:32:13 +0100 Jan Kasprzak wrote:
> Hello, Linux md developers!
>
> I am trying to build a new md-based RAID-10 array out of 24 disks,
> but it seems that the initial sync of the array is heavily limited
> by CPU:
>
> During the resync only 1-2 CPUs are busy (one for md1_raid10 thread
> which uses 100 % of a single CPU, and one for md1_resync thread, which
> uses about 80 % of a single CPU).
>
> Are there plans to make this process more parallel? I can imagine
> that for near-copies algorithm there can be a separate thread for each
> pair of disks in the RAID-10 array.
No, no plans to make the resync more parallel at all.
The md1_raid10 process is probably spending lots of time in memcmp and memcpy.
The way it works is to read all blocks that should be the same, see if they
are the same and if not, copy one to the others and write those others (or in
your case "that other").
In general this is cleaner and easier than always reading one device and
writing another.
It might be appropriate to special-case some layouts and do 'read one, write
other' when that is likely to be more efficient (patches welcome).
I'm surprised that md1_resync has such a high cpu usage though - it is just
scheduling read requests, not actually doing anything with data.
For a RAID10 it is perfectly safe to create with --assume-clean. If you also
add a write-intent bitmap, then you should never see a resync take much time
at all.
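For example, an untested sketch with the chunk size, layout and device count
taken from your mdstat output (the shell glob stands in for the full list of
24 partitions):

# mdadm --create /dev/md1 --level=10 --layout=n2 --chunk=512 \
        --raid-devices=24 --bitmap=internal --assume-clean /dev/sd[b-z]2

--assume-clean skips the initial resync entirely, and the internal
write-intent bitmap means any later resync only has to touch the regions
marked dirty.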
NeilBrown
>
> My hardware is apparently able to keep all the disks busy most
> of the time (verified by running dd if=/dev/sd$i bs=1M of=/dev/null
> in parallel - iostat reports 99-100 % utilization of each disk and about
> 55 MB/s read per disk). All the disks are connected by a four-lane
> SAS controller, so the maximum theoretical throughput is 4x 3 Gbit/s
> = 12 Gbit/s = 0.5 Gbit/s per disk = 62.5 MByte/s per disk.
>
> Here are the performance data from the initial resync:
>
> # cat /proc/mdstat
> [...]
> md1 : active raid10 sdz2[23] sdy2[22] sdx2[21] sdw2[20] sdu2[19] sdt2[18] sds2[17] sdr2[16] sdq2[15] sdp2[14] sdo2[13] sdn2[12] sdm2[11] sdl2[10] sdk2[9] sdj2[8] sdi2[7] sdh2[6] sdg2[5] sdf2[4] sde2[3] sdd2[2] sdc2[1] sdb2[0]
> 23190484992 blocks super 1.2 512K chunks 2 near-copies [24/24] [UUUUUUUUUUUUUUUUUUUUUUUU]
> [=>...................] resync = 7.3% (1713514432/23190484992) finish=796.4min speed=449437K/sec
>
> # top
> top - 23:05:31 up 8:20, 5 users, load average: 3.12, 3.29, 3.25
> Tasks: 356 total, 3 running, 353 sleeping, 0 stopped, 0 zombie
> Cpu(s): 0.0%us, 8.3%sy, 0.0%ni, 91.1%id, 0.0%wa, 0.0%hi, 0.6%si, 0.0%st
> Mem: 132298920k total, 3528792k used, 128770128k free, 53892k buffers
> Swap: 10485756k total, 0k used, 10485756k free, 818496k cached
>
> PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
> 12561 root 20 0 0 0 0 R 99.8 0.0 61:12.61 md1_raid10
> 12562 root 20 0 0 0 0 R 79.6 0.0 47:06.60 md1_resync
> [...]
>
> # iostat -kx 5
> avg-cpu: %user %nice %system %iowait %steal %idle
> 0.00 0.00 9.54 0.00 0.00 90.46
>
> Device: rrqm/s wrqm/s r/s w/s rkB/s wkB/s avgrq-sz avgqu-sz await svctm %util
> sdb 19.20 0.00 573.60 0.00 37939.20 0.00 132.28 0.50 0.87 0.30 17.26
> sdc 19.60 0.00 573.20 0.00 37939.20 0.00 132.38 0.51 0.89 0.31 17.58
> sdd 13.80 0.00 578.20 0.00 37888.00 0.00 131.05 0.52 0.89 0.31 18.02
> sdf 19.20 0.00 572.80 0.00 37888.00 0.00 132.29 0.50 0.88 0.32 18.12
> sde 12.80 0.00 579.40 0.00 37900.80 0.00 130.83 0.54 0.94 0.32 18.38
> sdg 16.60 0.00 575.40 0.00 37888.00 0.00 131.69 0.53 0.93 0.33 18.76
> [...]
> sdy 14.40 0.00 579.20 0.00 37990.40 0.00 131.18 0.52 0.91 0.31 17.78
> sdz 135.00 229.00 458.60 363.00 37990.40 37888.00 184.71 2.30 2.80 0.76 62.32
> md1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
>
> Thanks,
>
> -Yenya
>
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 09:29:44 by Jan Kasprzak
NeilBrown wrote:
: The md1_raid10 process is probably spending lots of time in memcmp and memcpy.
: The way it works is to read all blocks that should be the same, see if they
: are the same and if not, copy one to the others and write those others (or in
: your case "that other").
According to dmesg(8) my hardware is able to do XOR
at 9864 MB/s using generic_sse, and 2167 MB/s using int64x1. So I assume
memcmp+memcpy would not be much slower. According to /proc/mdstat, the resync
is running at 449 MB/s. So I expect just memcmp+memcpy cannot be a bottleneck
here.
: It might be appropriate to special-case some layouts and do 'read one, write
: other' when that is likely to be more efficient (patches welcome).
:
: I'm surprised that md1_resync has such a high cpu usage though - it is just
: scheduling read requests, not actually doing anything with data.
Maybe it is busy-waiting in some spinlock or whatever?
Can I test it somehow? I still have several days before I have to
put the server in question into production use.
One more question, though: current mdadm creates the RAID-10
array with 512k chunk size, while XFS supports only 256k chunks (sunit).
I think the default value should be supported by XFS (either by modifying
XFS which I don't know how hard it could be, or by changing the default
in mdadm, which is trivial).
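To illustrate (sw=12 is what I would use here, since 24 drives with 2
near-copies give 12 data-bearing chunks per stripe):

# mkfs.xfs -d su=512k,sw=12 /dev/md1    # hits the 256k sunit limit above
# mkfs.xfs -d su=256k,sw=12 /dev/md1    # accepted, but no longer matches the chunk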
Thanks,
-Yenya
--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 12:15:58 by NeilBrown
On Tue, 4 Jan 2011 09:29:44 +0100 Jan Kasprzak wrote:
> NeilBrown wrote:
> : The md1_raid10 process is probably spending lots of time in memcmp and memcpy.
> : The way it works is to read all blocks that should be the same, see if they
> : are the same and if not, copy one to the others and write those others (or in
> : your case "that other").
>
> According to dmesg(8) my hardware is able to do XOR
> at 9864 MB/s using generic_sse, and 2167 MB/s using int64x1. So I assume
> memcmp+memcpy would not be much slower. According to /proc/mdstat, the resync
> is running at 449 MB/s. So I expect just memcmp+memcpy cannot be a bottleneck
> here.
Maybe ... though I'm not at all sure that the speed-test filters out cache
effects properly...
>
> : It might be appropriate to special-case some layouts and do 'read one, write
> : other' when that is likely to be more efficient (patches welcome).
> :
> : I'm surprised that md1_resync has such a high cpu usage though - it is just
> : scheduling read requests, not actually doing anything with data.
>
> Maybe it is busy-waiting in some spinlock or whatever?
> Can I test it somehow? I still have several days before I have to
> put the server in question to the production use.
Nothing particularly useful springs to mind.
It might be interesting to try creating arrays with 2, 4, 6, 8, 10, ... devices and
see how the resync speed changes with the number of devices. You could even
graph that - I love graphs. I cannot say if it would provide useful info or
not.
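A rough and completely untested sketch of what I mean (DEVS would hold the
ordered list of member partitions, and /dev/md9 is just a scratch array name):

  DEVS="/dev/sdb2 /dev/sdc2 /dev/sdd2 ..."   # full ordered member list, abbreviated
  for n in 2 4 6 8 10 12 16 20 24; do
      mdadm --create /dev/md9 --run --level=10 --raid-devices=$n \
            $(echo $DEVS | cut -d' ' -f1-$n)
      sleep 60                               # let the resync rate settle
      grep -A 2 '^md9' /proc/mdstat          # record the speed= figure
      mdadm --stop /dev/md9
  done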
>
> One more question, though: current mdadm creates the RAID-10
> array with 512k chunk size, while XFS supports only 256k chunks (sunit).
> I think the default value should be supported by XFS (either by modifying
> XFS which I don't know how hard it could be, or by changing the default
> in mdadm, which is trivial).
I doubt that XFS would suffer from using a 256k sunit with a 512k chunk
RAID10. If you really want to know how the two numbers interact I suggest
you ask on the XFS mailing list - the man page doesn't seem particularly
helpful (and doesn't even mention a maximum).
I am unlikely to change the default chunksize, but if you find that XFS would
benefit from md arrays using at most 256K chunks, I could put a note in the
man page for mdadm....
NeilBrown
>
> Thanks,
>
> -Yenya
>
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 15:47:13 by John Robinson
On 04/01/2011 08:29, Jan Kasprzak wrote:
> NeilBrown wrote:
> : The md1_raid10 process is probably spending lots of time in memcmp and memcpy.
> : The way it works is to read all blocks that should be the same, see if they
> : are the same and if not, copy one to the others and write those others (or in
> : your case "that other").
>
> According to dmesg(8) my hardware is able to do XOR
> at 9864 MB/s using generic_sse, and 2167 MB/s using int64x1. So I assume
> memcmp+memcpy would not be much slower. According to /proc/mdstat, the resync
> is running at 449 MB/s. So I expect just memcmp+memcpy cannot be a bottleneck
> here.
I think it can. Those XOR benchmarks only tell you what the CPU core can
do internally, and don't reflect FSB/RAM bandwidth. My Core 2 Quad
3.2GHz on 1.6GHz FSB with dual-channel memory at 800MHz each (P45
chipset) has maximum memory bandwidth of about 4.5GB/s with two sticks
of RAM, according to memtest86+. With 4 sticks of RAM it's 3.5GB/s. In
real use it'll be rather less.
What you are doing with the resync is reading from two discs into RAM,
reading both from RAM into the CPU, which does the memcmp+memcpy, then
writing from the CPU into the RAM, and writing from RAM to one of the
discs. That means you're using your RAM 6 times for each chunk of data,
so the maximum resync throughput would be a sixth of your RAM's maximum
throughput - in my case, ~575MB/s - and as I say in real use I'd expect
it to be considerably less than this, and I imagine you would see this
memory saturation as high CPU usage.
One core can easily saturate the memory bandwidth, so having multiple
threads would not help at all.
I think the above may demonstrate why it may be worthwhile optimising
the resync in some circumstances to read one disc and write the other:
(a) if you memcpy it, you go through RAM 4 times instead of 6;
(b) if you can just write what you read in the first place, without
copying it so it never has to come to and from the CPU, you go through
RAM only twice;
(c) if you could get the discs/controllers to DMA the data straight from
one to the other, you'd never hit RAM at all.
In the meantime, wiping your discs with `dd if=/dev/zero of=/dev/disk` before
you create the array would only go from RAM to disc twice (once for each
disc); you can then create the array with --assume-clean.
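Something like this, say (the glob stands in for your 24 member partitions):

# for d in /dev/sd[b-z]2; do dd if=/dev/zero of=$d bs=1M & done; wait
# mdadm --create /dev/md1 --level=10 --chunk=512 --raid-devices=24 \
        --assume-clean /dev/sd[b-z]2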
Cheers,
John.
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 15:54:28 by John Robinson
On 03/01/2011 16:32, Jan Kasprzak wrote:
[...]
> My hardware is apparently able to keep all the disks busy most
> of the time (verified by running dd if=/dev/sd$i bs=1M of=/dev/null
> in parallel - iostat reports 99-100 % utilization of each disk and about
> 55 MB/s read per disk). All the disks are connected by a four-lane
> SAS controller, so the maximum theoretical throughput is 4x 3 Gbit/s
> = 12 Gbit/s = 0.5 Gbit/s per disk = 62.5 MByte/s per disk.
This is part of your limit. You're able to read from your discs at about
1320MB/s (24*55MB/s), but you're throwing the data away. Doing the
resync, you'd be reading two chunks for each one that's written, so
reading about 880MB/s and writing about 440MB/s. Modern 7200rpm discs
ought to be able to read at perhaps 125MB/s and write at over 100MB/s,
but because you're throttled by the PCIe x4 interface, you're only
getting about half of what your discs could do.
Check how fast one disc can go on its own with access to the whole PCIe
interface with a single dd invocation.
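For example (O_DIRECT so the page cache doesn't flatter the number, and a few
GB so the start-up doesn't dominate):

# dd if=/dev/sdb of=/dev/null bs=1M count=8192 iflag=direct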
Cheers,
John.
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 17:41:23 by Jan Kasprzak
John Robinson wrote:
: On 03/01/2011 16:32, Jan Kasprzak wrote:
: [...]
: > My hardware is apparently able to keep all the disks busy most
: >of the time (verified by running dd if=/dev/sd$i bs=1M of=/dev/null
: >in parallel - iostat reports 99-100 % utilization of each disk and about
: >55 MB/s read per disk). All the disks are connected by a four-lane
: >SAS controller, so the maximum theoretical throughput is 4x 3 Gbit/s
: >= 12 Gbit/s = 0.5 Gbit/s per disk = 62.5 MByte/s per disk.
:
: This is part of your limit.
Yes, I am aware of this. A single disk is able to do about
147 MB/s according to hdparm -t. However (a big "however"),
my usage pattern rarely issues big/sequential requests, and for more
random load the total throughput generated by all disks will be
much lower and the disks themselves become the bottleneck.
I have just been surprised that for initial RAID-10 resync
the bottleneck is in the (single) CPU.
: but because you're throttled by the PCIe x4 interface, you're only
: getting about half of what your discs could do.
I have not talked about PCIe x4, but SAS 4-way multichannel.
Anyway, my SAS controller is connected by PCIe 2.0 x8, which equals
to (if I read Wikipedia correctly :-) 32 Gbit/s, i.e. 4 GByte/s.
So PCIe is not a bottleneck here. SAS is, and I am aware of that.
-Yenya
--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 18:05:34 by John Robinson
On 04/01/2011 16:41, Jan Kasprzak wrote:
> John Robinson wrote:
[...]
> Yes, I am aware of this. A single disk is able to do about
> 147 MB/s according to hdparm -t. However (a big "however"),
> my usage pattern rarely issues big/sequential requests, and for more
> random load the total throughput generated by all disks will be
> much lower and the disks themselves become the bottleneck.
Sure, but doing a resync does require huge sequential reads and writes.
> I have just been surprised that for initial RAID-10 resync
> the bottleneck is in the (single) CPU.
>
> : but because you're throttled by the PCIe x4 interface, you're only
> : getting about half of what your discs could do.
>
> I have not talked about PCIe x4, but SAS 4-way multichannel.
My bad. Same effect in this situation though.
> Anyway, my SAS controller is connected by PCIe 2.0 x8, which equals
> to (if I read Wikipedia correctly :-) 32 Gbit/s, i.e. 4 GByte/s.
> So PCIe is not a bottleneck here. SAS is, and I am aware of that.
Which is why the md kernel threads appear to be using 100% of CPU:
they're blocked waiting for I/O. (And possibly RAM, per my other reply
to this thread.)
Cheers,
John.
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 18:13:24 by Jan Kasprzak
John Robinson wrote:
: > According to dmesg(8) my hardware is able to do XOR
: >at 9864 MB/s using generic_sse, and 2167 MB/s using int64x1. So I assume
: >memcmp+memcpy would not be much slower. According to /proc/mdstat, the
: >resync
: >is running at 449 MB/s. So I expect just memcmp+memcpy cannot be a
: >bottleneck
: >here.
:
: I think it can. Those XOR benchmarks only tell you what the CPU core can
: do internally, and don't reflect FSB/RAM bandwidth.
Fair enough.
: My Core 2 Quad
: 3.2GHz on 1.6GHz FSB with dual-channel memory at 800MHz each (P45
: chipset) has maximum memory bandwidth of about 4.5GB/s with two sticks
: of RAM, according to memtest86+. With 4 sticks of RAM it's 3.5GB/s. In
: real use it'll be rather less.
My system has 16 1333MHz DIMMs, so I expect the total
available bandwidth would be much higher than 6x 449 MB/s.
: One core can easily saturate the memory bandwidth, so having multiple
: threads would not help at all.
I am not sure about that, especially on NUMA systems
(my system is dual-socket Opteron 6128). I would think having at least
two threads (each one running on a core in a different socket) can help.
: (a) if you memcpy it, you go through RAM 4 times instead of 6;
Yes, I was wondering why the resync does memcpy at all instead
of passing the buffer to the other half of a mirror and doing DMA from it
as soon as memcmp fails.
: In the mean time, wiping your discs before you create the array with `dd
: if=/dev/zero of=/dev/disk` would only go from RAM to disc twice (once
: for each disc), then create the array with --assume-clean.
I think it is possible to do --assume-clean even without
cleaning the disk, provided that the resulting md device is used by a
filesystem. I don't think there is a filesystem that reads blocks which
it did not write before.
Anyway, I have tried doing "echo check > /sys/block/md1/md/sync_action",
and apparently just checking the array without writing (i.e. just memcmp
without memcpy) is sometimes able to keep the disks at 100% utilization
according to iostat. In /proc/mdstat I can see a rebuild speed of about
520 MB/s. md1_resync uses about 40-50% of a single CPU, and md1_raid10
still uses 90-100%.
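For completeness, the check run was started and watched roughly like this
(mismatch_cnt should stay at 0 on a consistent array):

# echo check > /sys/block/md1/md/sync_action
# watch -n 5 'cat /proc/mdstat /sys/block/md1/md/mismatch_cnt'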
Another possible source of the overhead is that the resync
uses page-sized chunks instead of something bigger, and relies on the
block layer to do request merging. I observe high variance of
the avgrq-sz value in iostat (varying between about 120 and 280).
Maybe this is what causes the high CPU utilization of md1_raid10?
Sincerely,
-Yenya
--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox
Re: RAID-10 initial sync is CPU-limited
on 04.01.2011 18:17:57 by Jan Kasprzak
John Robinson wrote:
: > Yes, I am aware of this. A single disk is able to do about
: >147 MB/s according to hdparm -t. However (a big "however"),
: >my usage pattern rarely issues big/sequential requests, and for more
: >random load the total throughput generated by all disks will be
: >much lower and the disks themselves become the bottleneck.
:
: Sure, but doing a resync does require huge sequential reads and writes.
Yes, but it does not need to be further throttled by using
a single CPU core.
: >Anyway, my SAS controller is connected by PCIe 2.0 x8, which equals
: >to (if I read Wikipedia correctly :-) 32 Gbit/s, i.e. 4 GByte/s.
: >So PCIe is not a bottleneck here. SAS is, and I am aware of that.
:
: Which is why the md kernel threads appear to be using 100% of CPU:
: they're blocked waiting for I/O. (And possibly RAM, per my other reply
: to this thread.)
I don't think those threads are busy-waiting for I/O,
and other kinds of waiting would not show up as 100% CPU usage.
Sincerely,
-Yenya
--
| Jan "Yenya" Kasprzak |
| GPG: ID 1024/D3498839 Fingerprint 0D99A7FB206605D7 8B35FCDE05B18A5E |
| http://www.fi.muni.cz/~kas/ Journal: http://www.fi.muni.cz/~kas/blog/ |
Please don't top post and in particular don't attach entire digests to your
mail or we'll all soon be using bittorrent to read the list. --Alan Cox