[PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 08.06.2011 09:00:45 by Namhyung Kim

If @conf->far_offset > 0, there is only 1 stripe so that we can treat
the array same as 'near' arrays. Furthermore we could calculate new
distance from the previous position even for the real 'far' array
cases if the position of given disk is already in the lowest stripe.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 14 +++++++++++---
1 files changed, 11 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6e846688962f..9ec4c5f8cd48 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -531,11 +531,19 @@ retry:
 			break;

 		/* for far > 1 always use the lowest address */
-		if (conf->far_copies > 1)
-			new_distance = r10_bio->devs[slot].addr;
-		else
+		if (conf->far_copies > 1 && conf->far_offset == 0) {
+			if (conf->mirrors[disk].head_position < conf->stride &&
+			    r10_bio->devs[slot].addr < conf->stride)
+				/* already in the lowest stripe */
+				new_distance = abs(r10_bio->devs[slot].addr -
+						   conf->mirrors[disk].head_position);
+			else
+				new_distance = r10_bio->devs[slot].addr;
+		} else {
 			new_distance = abs(r10_bio->devs[slot].addr -
 					   conf->mirrors[disk].head_position);
+		}
+
 		if (new_distance < best_dist) {
 			best_dist = new_distance;
 			best_slot = slot;
--
1.7.5.2

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 08.06.2011 09:21:57 by NeilBrown

On Wed, 8 Jun 2011 16:00:45 +0900 Namhyung Kim wrote:

> If @conf->far_offset > 0, there is only 1 stripe so that we can treat
> the array same as 'near' arrays. Furthermore we could calculate new
> distance from the previous position even for the real 'far' array
> cases if the position of given disk is already in the lowest stripe.
>
> Signed-off-by: Namhyung Kim
> ---
> drivers/md/raid10.c | 14 +++++++++++---
> 1 files changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6e846688962f..9ec4c5f8cd48 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -531,11 +531,19 @@ retry:
>  			break;
>
>  		/* for far > 1 always use the lowest address */
> -		if (conf->far_copies > 1)
> -			new_distance = r10_bio->devs[slot].addr;
> -		else
> +		if (conf->far_copies > 1 && conf->far_offset == 0) {
> +			if (conf->mirrors[disk].head_position < conf->stride &&
> +			    r10_bio->devs[slot].addr < conf->stride)
> +				/* already in the lowest stripe */
> +				new_distance = abs(r10_bio->devs[slot].addr -
> +						   conf->mirrors[disk].head_position);
> +			else
> +				new_distance = r10_bio->devs[slot].addr;
> +		} else {
>  			new_distance = abs(r10_bio->devs[slot].addr -
>  					   conf->mirrors[disk].head_position);
> +		}
> +
>  		if (new_distance < best_dist) {
>  			best_dist = new_distance;
>  			best_slot = slot;


I agree that it still makes sense to do balancing if far_offset != 0.
However there is absolutely no point in your change to the calculation of
new_distance.
You only want new_distance to contain a distance from head position if we
want to choose the device with the 'closest' head. But we don't. We want to
choose the device where the data is closest to the start of the device. So
the current value for new_distance is correct.
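
The two selection policies contrasted above can be illustrated with a small
standalone sketch. This is not the kernel code: struct copy, pick_copy() and
the two distance helpers are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one candidate copy; not the kernel's r10bio. */
struct copy {
	long long addr;          /* device address of the data on this mirror */
	long long head_position; /* last known head position of this mirror */
};

/* "Lowest address" policy: the distance is the device address itself. */
static long long dist_lowest_addr(const struct copy *c)
{
	return c->addr;
}

/* "Closest head" policy: the distance is the seek from the current head. */
static long long dist_closest_head(const struct copy *c)
{
	long long d = c->addr - c->head_position;

	return d < 0 ? -d : d;
}

/* Return the index of the copy with the smallest distance under 'dist'. */
static size_t pick_copy(const struct copy *copies, size_t n,
			long long (*dist)(const struct copy *))
{
	size_t best = 0;
	size_t i;

	assert(n > 0);
	for (i = 1; i < n; i++)
		if (dist(&copies[i]) < dist(&copies[best]))
			best = i;
	return best;
}
```

With one copy at address 100 on a disk whose head is far away, and another
copy at address 4000 on a disk whose head is right next to it, the first
policy picks the copy at address 100 and the second picks the copy at
address 4000.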

If you would like to resubmit with just the first change I'll happily apply
the patch.

If you have performed some tests and can demonstrate some cases where this
makes something faster, and can show us the results of those tests, I would
be even more happy!!!

Thanks,
NeilBrown

Re: [PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 08.06.2011 09:42:27 by Namhyung Kim

NeilBrown writes:

> On Wed, 8 Jun 2011 16:00:45 +0900 Namhyung Kim wrote:
>
>> If @conf->far_offset > 0, there is only 1 stripe so that we can treat
>> the array same as 'near' arrays. Furthermore we could calculate new
>> distance from the previous position even for the real 'far' array
>> cases if the position of given disk is already in the lowest stripe.
>>
> I agree that it still makes sense to do balancing if far_offset != 0.
> However there is absolutely no point in your change to the calculation of
> new_distance.
> You only want new_distance to contain a distance from head position if we
> want to choose the device with the 'closest' head. But we don't. We want to
> choose the device where the data is closest to the start of the device. So
> the current value for new_distance is correct.
>

Still can't understand why we choose the closest-to-the-start disk in
case we could have possible sequential access on other disk. Probably
because of the lack of my understanding how md/disk works :(


> If you would like to resubmit with just the first change I'll happily apply
> the patch.
>

OK. Will do that right soon.


> If you have performed some tests and can demonstrate some cases where this
> makes something faster, and can show us the results of those tests, I would
> be even more happy!!!
>

I wish I could. :) However, unfortunately, I don't have such a real system
to test on.

Thanks.


Re: [PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 08.06.2011 13:49:25 by Keld Simonsen

On Wed, Jun 08, 2011 at 04:42:27PM +0900, Namhyung Kim wrote:
> NeilBrown writes:
>
> > On Wed, 8 Jun 2011 16:00:45 +0900 Namhyung Kim wrote:
> >
> >> If @conf->far_offset > 0, there is only 1 stripe so that we can treat
> >> the array same as 'near' arrays. Furthermore we could calculate new
> >> distance from the previous position even for the real 'far' array
> >> cases if the position of given disk is already in the lowest stripe.
> >>
> > I agree that it still makes sense to do balancing if far_offset != 0.
> > However there is absolutely no point in your change to the calculation of
> > new_distance.
> > You only want new_distance to contain a distance from head position if we
> > want to choose the device with the 'closest' head. But we don't. We want to
> > choose the device where the data is closest to the start of the device. So
> > the current value for new_distance is correct.
> >
>
> Still can't understand why we choose the closest-to-the-start disk in
> case we could have possible sequential access on other disk. Probably
> because of the lack of my understanding how md/disk works :(

the nearest position was the case for the initial implementation of
raid10-far. But this had bad performance for an array with disks of
varying specifications. And also it led to not using the faster
outer sectors. Using the closest-to-beginning gave a speed-up of about
50% in some cases.

best regards
keld

Re: [PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 08.06.2011 16:39:31 by Namhyung Kim

Keld Jørn Simonsen writes:
> On Wed, Jun 08, 2011 at 04:42:27PM +0900, Namhyung Kim wrote:
>> Still can't understand why we choose the closest-to-the-start disk in
>> case we could have possible sequential access on other disk. Probably
>> because of the lack of my understanding how md/disk works :(
>
> the nearest position was the case for the initial implementation of
> raid10-far. But this had bad performance for an array with disks of
> varying specifications. And also it led to not using the faster
> outer sectors. Using the closest-to-beginning gave a speed-up of about
> 50% in some cases.
>

Hi Keld,

Thanks for the explanation. That means lower sectors reside on the outer
tracks/cylinders in the disk, right? The 50% seems a huge improvement I
couldn't stand against. Although my patch tried to choose
closest-to-current-head disk if the disk head is in the lowest stripe -
in the (similar) hope that it'd be on the outer tracks - I don't have
the numbers, so I'll just give up on it.

Besides, I just noticed that the rationale behind read_balance()
presumes that all components of the array are traditional disks. If we
could detect that all or some of them are not (e.g. SSDs), it would be
better to use some other criterion for read balancing IMHO,
something like nr_pending?
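
A minimal sketch of that idea, under loud assumptions: struct mirror_state,
its rotational flag and pick_mirror() are hypothetical and do not exist in
md. The sketch picks the mirror with the fewest pending requests only when
no member is a rotating disk, and otherwise falls back to the lowest
address.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-mirror state; the field names are illustrative only. */
struct mirror_state {
	int rotational;   /* nonzero for a spinning disk, 0 for e.g. an SSD */
	int nr_pending;   /* requests currently in flight on this mirror */
	long long addr;   /* device address of the requested data */
};

/* Pick a mirror index: fewest pending if all are non-rotational,
 * otherwise the one whose data sits at the lowest address. */
static size_t pick_mirror(const struct mirror_state *m, size_t n)
{
	size_t best = 0;
	size_t i;
	int all_flash = 1;

	assert(n > 0);
	for (i = 0; i < n; i++)
		if (m[i].rotational)
			all_flash = 0;

	for (i = 1; i < n; i++) {
		if (all_flash) {
			if (m[i].nr_pending < m[best].nr_pending)
				best = i;
		} else if (m[i].addr < m[best].addr) {
			best = i;
		}
	}
	return best;
}
```

How to treat a mixed array (some SSDs, some disks) is left open here; the
fallback to the lowest address is just one possible choice.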

--
Regards,
Namhyung Kim

Re: [PATCH/RFC] md/raid10: optimize read_balance() for "far copies" arrays

on 10.06.2011 16:29:34 by Bill Davidsen

Namhyung Kim wrote:
> NeilBrown writes:
>
>
>> On Wed, 8 Jun 2011 16:00:45 +0900 Namhyung Kim wrote:
>>
>>
>>> If @conf->far_offset > 0, there is only 1 stripe so that we can treat
>>> the array same as 'near' arrays. Furthermore we could calculate new
>>> distance from the previous position even for the real 'far' array
>>> cases if the position of given disk is already in the lowest stripe.
>>>
>>>
>> I agree that it still makes sense to do balancing if far_offset != 0.
>> However there is absolutely no point in your change to the calculation of
>> new_distance.
>> You only want new_distance to contain a distance from head position if we
>> want to choose the device with the 'closest' head. But we don't. We want to
>> choose the device where the data is closest to the start of the device. So
>> the current value for new_distance is correct.
>>
>>
> Still can't understand why we choose the closest-to-the-start disk in
> case we could have possible sequential access on other disk. Probably
> because of the lack of my understanding how md/disk works :(
>

This code is all based on traditional drives, where the seek time,
rotational latency, and position on the platter are all factors which
affect performance in some way. Devices like SSDs don't have these
factors (i.e. they are constants) and someday it may make sense to
rethink this code again.

Also note that "close to current" optimizes seek time, while "close to
beginning" optimizes transfer rate. Note the total lack of parameters to
tune "what you want" for a given device.
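
To make the trade-off concrete, here is a sketch of what such tuning
parameters could look like. Nothing like this exists in md: struct
balance_tunables and read_cost() are hypothetical, and the weighted sum is
only one possible way to combine the two goals.

```c
#include <assert.h>

/* Hypothetical tunables; md has no such parameters. */
struct balance_tunables {
	unsigned int seek_weight;  /* favours "close to current" */
	unsigned int start_weight; /* favours "close to beginning" */
};

/* Cost of reading 'addr' from a device whose head is at 'head_position'.
 * A lower cost means a more attractive device. */
static long long read_cost(const struct balance_tunables *t,
			   long long addr, long long head_position)
{
	long long seek = addr - head_position;

	assert(addr >= 0 && head_position >= 0);
	if (seek < 0)
		seek = -seek;
	return (long long)t->seek_weight * seek +
	       (long long)t->start_weight * addr;
}
```

Setting seek_weight to 1 and start_weight to 0 gives the pure "close to
current" behaviour, while 0 and 1 gives the "close to beginning" behaviour
that raid10 'far' layouts use today.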

--
Bill Davidsen
We are not out of the woods yet, but we know the direction and have
taken the first step. The steps are many, but finite in number, and if
we persevere we will reach our destination. -me, 2010


