[PATCH RESEND 0/5] md/raid10 changes

on 15.06.2011 04:01:59 by Namhyung Kim

Hello Neil,

This is a resend of my previous raid10 patches. Some of them have been
revised according to your comments, and some of them have not been
reviewed at all. Please take a look.

Thanks.


Namhyung Kim (5):
md/raid10: optimize read_balance() for 'far offset' arrays
md/raid10: get rid of duplicated conditional expression
md/raid10: factor out common bio handling code
md/raid10: share pages between read and write bio's during recovery
md/raid10: spread read for subordinate r10bios during recovery

drivers/md/raid10.c | 83 +++++++++++++++++++++++++++-----------------------
1 files changed, 45 insertions(+), 38 deletions(-)

--
1.7.5.2


[PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 15.06.2011 04:02:00 by Namhyung Kim

If @conf->far_offset > 0, there is only 1 stripe, so we can treat
the array the same as 'near' arrays.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index 6e846688962f..fc56bdd8c3fb 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -531,7 +531,7 @@ retry:
break;

/* for far > 1 always use the lowest address */
- if (conf->far_copies > 1)
+ if (conf->far_copies > 1 && conf->far_offset == 0)
new_distance = r10_bio->devs[slot].addr;
else
new_distance = abs(r10_bio->devs[slot].addr -
--
1.7.5.2


[PATCH 2/5] md/raid10: get rid of duplicated conditional expression

on 15.06.2011 04:02:01 by Namhyung Kim

Variable 'first' is initialized to zero and updated to @rdev->raid_disk
only if it is not negative. Thus the condition '>= first' always implies
'>= 0', so the latter check is not needed.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fc56bdd8c3fb..fcb86e86bc31 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1093,8 +1093,7 @@ static int raid10_add_disk(mddev_t *mddev, mdk_rdev_t *rdev)
if (rdev->raid_disk >= 0)
first = last = rdev->raid_disk;

- if (rdev->saved_raid_disk >= 0 &&
- rdev->saved_raid_disk >= first &&
+ if (rdev->saved_raid_disk >= first &&
conf->mirrors[rdev->saved_raid_disk].rdev == NULL)
mirror = rdev->saved_raid_disk;
else
--
1.7.5.2

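The argument above can be sanity-checked with a small standalone C program
(purely illustrative, not kernel code; first_for() is a made-up helper that
mirrors the setup in raid10_add_disk()):

#include <assert.h>

/*
 * 'first' starts at 0 and is only ever overwritten with a non-negative
 * raid_disk value, so it can never become negative.  Hence
 * saved_raid_disk >= first already implies saved_raid_disk >= 0.
 */
static int first_for(int raid_disk)
{
	int first = 0;

	if (raid_disk >= 0)
		first = raid_disk;
	return first;
}

int main(void)
{
	int raid_disk, saved;

	for (raid_disk = -1; raid_disk < 4; raid_disk++)
		for (saved = -2; saved < 4; saved++) {
			int first = first_for(raid_disk);
			int with_check = (saved >= 0 && saved >= first);
			int without_check = (saved >= first);

			/* the dropped '>= 0' test never changes the result */
			assert(with_check == without_check);
		}
	return 0;
}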

[PATCH 3/5] md/raid10: factor out common bio handling code

on 15.06.2011 04:02:02 by Namhyung Kim

When a normal-write or sync-read/write bio completes, we need to find
out the disk number the bio belongs to. Factor that common code out
into a separate function.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 44 +++++++++++++++++++++++---------------------
1 files changed, 23 insertions(+), 21 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index fcb86e86bc31..a53779ffdf89 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -244,6 +244,23 @@ static inline void update_head_pos(int slot, r10bio_t *r10_bio)
r10_bio->devs[slot].addr + (r10_bio->sectors);
}

+/*
+ * Find the disk number which triggered given bio
+ */
+static int find_bio_disk(conf_t *conf, r10bio_t *r10_bio, struct bio *bio)
+{
+ int slot;
+
+ for (slot = 0; slot < conf->copies; slot++)
+ if (r10_bio->devs[slot].bio == bio)
+ break;
+
+ BUG_ON(slot == conf->copies);
+ update_head_pos(slot, r10_bio);
+
+ return r10_bio->devs[slot].devnum;
+}
+
static void raid10_end_read_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
@@ -289,13 +306,10 @@ static void raid10_end_write_request(struct bio *bio, int error)
{
int uptodate = test_bit(BIO_UPTODATE, &bio->bi_flags);
r10bio_t *r10_bio = bio->bi_private;
- int slot, dev;
+ int dev;
conf_t *conf = r10_bio->mddev->private;

- for (slot = 0; slot < conf->copies; slot++)
- if (r10_bio->devs[slot].bio == bio)
- break;
- dev = r10_bio->devs[slot].devnum;
+ dev = find_bio_disk(conf, r10_bio, bio);

/*
* this branch is our 'one mirror IO has finished' event handler:
@@ -316,8 +330,6 @@ static void raid10_end_write_request(struct bio *bio, int error)
*/
set_bit(R10BIO_Uptodate, &r10_bio->state);

- update_head_pos(slot, r10_bio);
-
/*
*
* Let's see if all mirrored write operations have finished
@@ -1173,14 +1185,9 @@ static void end_sync_read(struct bio *bio, int error)
{
r10bio_t *r10_bio = bio->bi_private;
conf_t *conf = r10_bio->mddev->private;
- int i,d;
+ int d;

- for (i=0; i<conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- BUG_ON(i == conf->copies);
- update_head_pos(i, r10_bio);
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);

if (test_bit(BIO_UPTODATE, &bio->bi_flags))
set_bit(R10BIO_Uptodate, &r10_bio->state);
@@ -1211,18 +1218,13 @@ static void end_sync_write(struct bio *bio, int error)
r10bio_t *r10_bio = bio->bi_private;
mddev_t *mddev = r10_bio->mddev;
conf_t *conf = mddev->private;
- int i,d;
+ int d;

- for (i = 0; i < conf->copies; i++)
- if (r10_bio->devs[i].bio == bio)
- break;
- d = r10_bio->devs[i].devnum;
+ d = find_bio_disk(conf, r10_bio, bio);

if (!uptodate)
md_error(mddev, conf->mirrors[d].rdev);

- update_head_pos(i, r10_bio);
-
rdev_dec_pending(conf->mirrors[d].rdev, mddev);
while (atomic_dec_and_test(&r10_bio->remaining)) {
if (r10_bio->master_bio == NULL) {
--
1.7.5.2


[PATCH v2 4/5] md/raid10: share pages between read and write bio's during recovery

on 15.06.2011 04:02:03 by Namhyung Kim

When performing a recovery, only the first 2 slots in r10_bio are in use,
for read and write respectively. However, the pages in the write bio are
never actually used; they are simply swapped with the read bio's pages
once the read completes.

Get rid of those unused pages and share the read pages properly.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 24 +++++++++++++-----------
1 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index a53779ffdf89..dea73bdb99b8 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -123,7 +123,15 @@ static void * r10buf_pool_alloc(gfp_t gfp_flags, void *data)
for (j = 0 ; j < nalloc; j++) {
bio = r10_bio->devs[j].bio;
for (i = 0; i < RESYNC_PAGES; i++) {
- page = alloc_page(gfp_flags);
+ if (j == 1 && !test_bit(MD_RECOVERY_SYNC,
+ &conf->mddev->recovery)) {
+ /* we can share bv_page's during recovery */
+ struct bio *rbio = r10_bio->devs[0].bio;
+ page = rbio->bi_io_vec[i].bv_page;
+ get_page(page);
+ } else {
+ page = alloc_page(gfp_flags);
+ }
if (unlikely(!page))
goto out_free_pages;

@@ -1360,20 +1368,14 @@ done:
static void recovery_request_write(mddev_t *mddev, r10bio_t *r10_bio)
{
conf_t *conf = mddev->private;
- int i, d;
- struct bio *bio, *wbio;
-
+ int d;
+ struct bio *wbio;

- /* move the pages across to the second bio
+ /*
+ * share the pages with the first bio
* and submit the write request
*/
- bio = r10_bio->devs[0].bio;
wbio = r10_bio->devs[1].bio;
- for (i=0; i < wbio->bi_vcnt; i++) {
- struct page *p = bio->bi_io_vec[i].bv_page;
- bio->bi_io_vec[i].bv_page = wbio->bi_io_vec[i].bv_page;
- wbio->bi_io_vec[i].bv_page = p;
- }
d = r10_bio->devs[1].devnum;

atomic_inc(&conf->mirrors[d].rdev->nr_pending);
--
1.7.5.2

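The mechanism at work here is plain reference counting: instead of giving the
write bio its own pages and swapping them with the read bio's pages later, the
write bio takes an extra reference on the read bio's pages via get_page(), and
both references are dropped as usual when the bios are freed. A rough userspace
analogy, with a made-up refbuf type standing in for struct page:

#include <stdio.h>
#include <stdlib.h>

/* Toy stand-in for a struct page with a reference count. */
struct refbuf {
	int refs;
	char data[4096];
};

static struct refbuf *refbuf_alloc(void)
{
	struct refbuf *b = calloc(1, sizeof(*b));

	if (b)
		b->refs = 1;
	return b;
}

static void refbuf_get(struct refbuf *b)	/* plays the role of get_page() */
{
	b->refs++;
}

static void refbuf_put(struct refbuf *b)	/* plays the role of put_page() */
{
	if (--b->refs == 0)
		free(b);
}

int main(void)
{
	/* the "read bio" owns the buffer ... */
	struct refbuf *page = refbuf_alloc();

	if (!page)
		return 1;

	/* ... and the "write bio" shares it instead of allocating its own */
	refbuf_get(page);
	printf("references after sharing: %d\n", page->refs);

	refbuf_put(page);	/* write side finishes */
	refbuf_put(page);	/* read side finishes, buffer is freed here */
	return 0;
}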

[PATCH 5/5] md/raid10: spread read for subordinate r10bios during recovery

on 15.06.2011 04:02:04 by Namhyung Kim

In the current scheme, multiple read requests could be directed to
the first active disk during recovery if several disks fail at the
same time. Spreading those requests over the other in-sync disks
might be helpful.

Signed-off-by: Namhyung Kim
---
drivers/md/raid10.c | 10 +++++++---
1 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index dea73bdb99b8..d0188e49f881 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -1832,6 +1832,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
/* recovery... the complicated one */
int j, k;
+ int last_read = -1;
r10_bio = NULL;

for (i=0 ; i<conf->raid_disks; i++) {
@@ -1891,7 +1892,9 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
&sync_blocks, still_degraded);

for (j=0; j<conf->copies;j++) {
- int d = r10_bio->devs[j].devnum;
+ int c = (last_read + j + 1) % conf->copies;
+ int d = r10_bio->devs[c].devnum;
+
if (!conf->mirrors[d].rdev ||
!test_bit(In_sync, &conf->mirrors[d].rdev->flags))
continue;
@@ -1902,13 +1905,14 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
bio->bi_private = r10_bio;
bio->bi_end_io = end_sync_read;
bio->bi_rw = READ;
- bio->bi_sector = r10_bio->devs[j].addr +
+ bio->bi_sector = r10_bio->devs[c].addr +
conf->mirrors[d].rdev->data_offset;
bio->bi_bdev = conf->mirrors[d].rdev->bdev;
atomic_inc(&conf->mirrors[d].rdev->nr_pending);
atomic_inc(&r10_bio->remaining);
- /* and we write to 'i' */
+ last_read = c;

+ /* and we write to 'i' */
for (k=0; k<conf->copies; k++)
if (r10_bio->devs[k].devnum == i)
break;
--
1.7.5.2

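The rotation in the hunk above is just modular arithmetic over the copies; a
minimal standalone sketch of the selection logic (names are illustrative, and
in_sync[] stands in for the In_sync test on each mirror):

#include <stdio.h>

#define COPIES 3

int main(void)
{
	/* copy 2 is the one being rebuilt, so it is not a valid read source */
	int in_sync[COPIES] = { 1, 1, 0 };
	int last_read = -1;
	int pass, j;

	for (pass = 0; pass < 4; pass++) {
		for (j = 0; j < COPIES; j++) {
			/* start scanning just after the copy we read last time */
			int c = (last_read + j + 1) % COPIES;

			if (!in_sync[c])
				continue;
			printf("pass %d: read from copy %d\n", pass, c);
			last_read = c;
			break;
		}
	}
	return 0;
}

With two in-sync copies this alternates the read source between copy 0 and
copy 1 from one pass to the next, which is the spreading the patch aims for.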

Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 15.06.2011 04:57:15 by NeilBrown

On Wed, 15 Jun 2011 11:02:00 +0900 Namhyung Kim wrote:

> If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> the array the same as 'near' arrays.
>
> Signed-off-by: Namhyung Kim
> ---
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6e846688962f..fc56bdd8c3fb 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -531,7 +531,7 @@ retry:
> break;
>
> /* for far > 1 always use the lowest address */
> - if (conf->far_copies > 1)
> + if (conf->far_copies > 1 && conf->far_offset == 0)
> new_distance = r10_bio->devs[slot].addr;
> else
> new_distance = abs(r10_bio->devs[slot].addr -


Hi,
I realise that I said before that this part was OK but having thought about
it some more I am not convinced.
Using the 'distance' from were the head previously was is a fairly poor
heuristic. In some cases it is the best we have, but it still isn't good.

With an 'offset' layout, like with a 'far' layout, sections of the array are
in a RAID0 layout and we know that reading from a RAID0 gives good speed by
algorithmically distributing the reads evenly over all the devices.

Setting new_distance to the 'addr' means that we use the algorithmic
approach to distribute reads. Setting it to the difference between addr and
head_position uses the heuristic approach.

I much prefer the algorithmic (RAID0) approach where it is possible. So I'm
not going to apply this patch. However, if you can demonstrate real speedups
in some realistic test I will reconsider my position.

Thanks,
NeilBrown

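To make the two policies concrete, here is a condensed userspace sketch of the
choice read_balance() is making (names and numbers are illustrative, not the
actual md code):

#include <stdio.h>
#include <stdlib.h>

struct copy {
	long addr;		/* sector of this copy on its device */
	long head_position;	/* where that device's head was last left */
};

/* Pick the copy with the smallest new_distance, as read_balance() does. */
static int pick(const struct copy *c, int n, int algorithmic)
{
	long best_dist = -1;
	int best = 0;
	int i;

	for (i = 0; i < n; i++) {
		/* RAID0-style: lowest address wins; heuristic: shortest seek wins */
		long d = algorithmic ? c[i].addr
				     : labs(c[i].addr - c[i].head_position);

		if (best_dist < 0 || d < best_dist) {
			best_dist = d;
			best = i;
		}
	}
	return best;
}

int main(void)
{
	struct copy c[2] = {
		{ .addr = 1000, .head_position = 990 },
		{ .addr = 200,  .head_position = 5000 },
	};

	printf("algorithmic (lowest address) choice: copy %d\n", pick(c, 2, 1));
	printf("heuristic (shortest seek) choice:    copy %d\n", pick(c, 2, 0));
	return 0;
}

With these numbers the algorithmic policy picks copy 1 and the heuristic policy
picks copy 0, which is the behavioural change the patch would have made for
'offset' layouts.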

Re: [PATCH 5/5] md/raid10: spread read for subordinate r10bios during recovery

on 15.06.2011 05:09:03 by NeilBrown

On Wed, 15 Jun 2011 11:02:04 +0900 Namhyung Kim wrote:

> In the current scheme, multiple read request could be directed to
> the first active disk during recovery if there are several disk
> failure at the same time. Spreading those requests on other in-sync
> disks might be helpful.

I don't find this convincing either. Spreading requests over disks in a
haphazard way is not certain to improve anything and could easily cause
regressions.

For example, if I have an 'n3' array on 3 devices, this will read alternately
from the first 2 devices while rebuilding the last. This is simply a waste.
One disk would be enough to keep the rebuilding disk busy - the other should be
left for regular reads.

Again: if you can demonstrate a speed up in some configuration I'll be happy
to reconsider the patch.

Thanks,
NeilBrown


>
> Signed-off-by: Namhyung Kim
> ---
> drivers/md/raid10.c | 10 +++++++---
> 1 files changed, 7 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index dea73bdb99b8..d0188e49f881 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -1832,6 +1832,7 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> if (!test_bit(MD_RECOVERY_SYNC, &mddev->recovery)) {
> /* recovery... the complicated one */
> int j, k;
> + int last_read = -1;
> r10_bio = NULL;
>
> for (i=0 ; i<conf->raid_disks; i++) {
> @@ -1891,7 +1892,9 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> &sync_blocks, still_degraded);
>
> for (j=0; j<conf->copies;j++) {
> - int d = r10_bio->devs[j].devnum;
> + int c = (last_read + j + 1) % conf->copies;
> + int d = r10_bio->devs[c].devnum;
> +
> if (!conf->mirrors[d].rdev ||
> !test_bit(In_sync, &conf->mirrors[d].rdev->flags))
> continue;
> @@ -1902,13 +1905,14 @@ static sector_t sync_request(mddev_t *mddev, sector_t sector_nr,
> bio->bi_private = r10_bio;
> bio->bi_end_io = end_sync_read;
> bio->bi_rw = READ;
> - bio->bi_sector = r10_bio->devs[j].addr +
> + bio->bi_sector = r10_bio->devs[c].addr +
> conf->mirrors[d].rdev->data_offset;
> bio->bi_bdev = conf->mirrors[d].rdev->bdev;
> atomic_inc(&conf->mirrors[d].rdev->nr_pending);
> atomic_inc(&r10_bio->remaining);
> - /* and we write to 'i' */
> + last_read = c;
>
> + /* and we write to 'i' */
> for (k=0; k<conf->copies; k++)
> if (r10_bio->devs[k].devnum == i)
> break;


Re: [PATCH RESEND 0/5] md/raid10 changes

on 15.06.2011 05:09:54 by NeilBrown

On Wed, 15 Jun 2011 11:01:59 +0900 Namhyung Kim wrote:

> Hello Neil,
>
> This is a resend of my previous raid10 patches. Some of them are revised
> according to your comments and some of them are not reviewed at all.
> Please take a look.
>
> Thanks.
>
>
> Namhyung Kim (5):
> md/raid10: optimize read_balance() for 'far offset' arrays
> md/raid10: get rid of duplicated conditional expression
> md/raid10: factor out common bio handling code
> md/raid10: share pages between read and write bio's during recovery
> md/raid10: spread read for subordinate r10bios during recovery
>

Thanks.
I've rejected the first and last as explained separately, but the others look
good and I have added them to my tree. They will probably be submitted in
the next merge window.

Thanks,
NeilBrown


Re: [PATCH RESEND 0/5] md/raid10 changes

on 15.06.2011 05:33:01 by Namhyung Kim

2011-06-15 (Wed), 13:09 +1000, NeilBrown:
> On Wed, 15 Jun 2011 11:01:59 +0900 Namhyung Kim wrote:
>
> > Hello Neil,
> >
> > This is a resend of my previous raid10 patches. Some of them have been
> > revised according to your comments, and some of them have not been
> > reviewed at all. Please take a look.
> >
> > Thanks.
> >
> >
> > Namhyung Kim (5):
> > md/raid10: optimize read_balance() for 'far offset' arrays
> > md/raid10: get rid of duplicated conditional expression
> > md/raid10: factor out common bio handling code
> > md/raid10: share pages between read and write bio's during recovery
> > md/raid10: spread read for subordinate r10bios during recovery
> >
>
> Thanks.
> I've rejected the first and last as explained separately, but the others look
> good and I have added them to my tree. They will probably be submitted in
> the next merge window.
>
> Thanks,
> NeilBrown
>

Hi,

Thanks for reviewing them. As I said before, I don't have any realistic
test environment for those patches. They just came from code inspection,
and I (wrongly) thought they would be somewhat helpful, sorry. :)


--
Regards,
Namhyung Kim



Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 15.06.2011 08:51:44 by Keld Simonsen

On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> the array the same as 'near' arrays.

does it also work with more than 2 copies - eg 3 copies?
I think the original code just takes the available data blocks with the
lowest address.

Best regards
keld

> Signed-off-by: Namhyung Kim
> ---
> drivers/md/raid10.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
> index 6e846688962f..fc56bdd8c3fb 100644
> --- a/drivers/md/raid10.c
> +++ b/drivers/md/raid10.c
> @@ -531,7 +531,7 @@ retry:
> break;
>
> /* for far > 1 always use the lowest address */
> - if (conf->far_copies > 1)
> + if (conf->far_copies > 1 && conf->far_offset == 0)
> new_distance = r10_bio->devs[slot].addr;
> else
> new_distance = abs(r10_bio->devs[slot].addr -
> --
> 1.7.5.2
>

Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 15.06.2011 14:25:33 by Namhyung Kim

2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> > the array the same as 'near' arrays.
>
> does it also work with more than 2 copies - eg 3 copies?
> I think the original code just takes the available data blocks with the
> lowest address.
>

Hi,

Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
the diagonally adjacent chunk/disk, so it could roughly be thought of as
a 'raid0' array with reduced size - just ignore the redundant chunks here.
It was my mistake to consider it a 'near' array. :(

Therefore, it makes more sense to distribute reads over the array based
on some criteria - here, the address of the starting sector - like RAID0
does. Now I see that the same goes for the 'far copies' array exactly, so
the original code is correct.

Thanks.


--
Regards,
Namhyung Kim


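For readers unfamiliar with the layout being discussed: in an 'offset' array
each chunk's extra copies sit on the following device(s), one chunk further
into the device, which is what makes the data portion of the array look like a
RAID0 with reduced capacity. A simplified sketch of that mapping for a 2-disk,
2-copy ('o2') array, ignoring near_copies and chunk-size details (the formula
below is an illustration, not the exact code in raid10_find_phys()):

#include <stdio.h>

int main(void)
{
	const int ndev = 2, copies = 2, chunks = 4;
	char map[4][2];		/* 4 rows = chunks / ndev * copies, one column per device */
	int c, k;

	/*
	 * Copy k of data chunk c goes to device (c + k) % ndev, at
	 * per-device chunk row (c / ndev) * copies + k.
	 */
	for (c = 0; c < chunks; c++)
		for (k = 0; k < copies; k++) {
			int dev = (c + k) % ndev;
			int row = (c / ndev) * copies + k;

			map[row][dev] = 'A' + c;
		}

	printf("        dev0 dev1\n");
	for (c = 0; c < chunks / ndev * copies; c++)
		printf("row %d:   %c    %c\n", c, map[c][0], map[c][1]);
	return 0;
}

The output is A/B, B/A, C/D, D/C down the two devices: the data chunks are
laid out exactly as RAID0 would lay them out, with the mirror copies
interleaved one row below, shifted by one device.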

Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 15.06.2011 16:35:53 by Namhyung Kim

Namhyung Kim writes:
> 2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
>> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
>> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
>> > the array the same as 'near' arrays.
>>
>> does it also work with more than 2 copies - eg 3 copies?
>> I think the original code just takes the available data blocks with the
>> lowest address.
>>
>
> Hi,
>
> Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
> the diagonally adjacent chunk/disk, so it could roughly be thought of as
> a 'raid0' array with reduced size - just ignore the redundant chunks here.
> It was my mistake to consider it a 'near' array. :(
>

I'm confused again. If fo > 0 && fc > 1 && nc > 1 then it turns out to be
a near array with reduced size, no? Does it still need to be treated
as RAID0?


> Therefore, it makes more sense to distribute reads over the array based
> on some criteria - here, the address of the starting sector - like RAID0
> does. Now I see that the same goes for the 'far copies' array exactly, so
> the original code is correct.
>
> Thanks.

Re: [PATCH v2 1/5] md/raid10: optimize read_balance() for "far offset" arrays

on 16.06.2011 01:56:07 by NeilBrown

On Wed, 15 Jun 2011 23:35:53 +0900 Namhyung Kim wrote:

> Namhyung Kim writes:
> > 2011-06-15 (Wed), 08:51 +0200, Keld Jørn Simonsen:
> >> On Wed, Jun 15, 2011 at 11:02:00AM +0900, Namhyung Kim wrote:
> >> > If @conf->far_offset > 0, there is only 1 stripe, so we can treat
> >> > the array the same as 'near' arrays.
> >>
> >> does it also work with more than 2 copies - eg 3 copies?
> >> I think the original code just takes the available data blocks with the
> >> lowest address.
> >>
> >
> > Hi,
> >
> > Let me clarify this: AFAIK, a 'far offset' array saves redundant data in
> > the diagonally adjacent chunk/disk, so it could roughly be thought of as
> > a 'raid0' array with reduced size - just ignore the redundant chunks here.
> > It was my mistake to consider it a 'near' array. :(
> >
>
> I'm confused again. If fo > 0 && fc > 1 && nc > 1 then it turns out to be
> a near array with reduced size, no? Does it still need to be treated
> as RAID0?

This would be a mix of near and offset. I'm not at all sure what the "best"
read balancing approach would be. But as I don't think anyone would ever
actually use it, I don't think it really matters.

Thanks,
NeilBrown



>
>
> > Therefore, it makes more sense to distribute reads over the array based
> > on some criteria - here, the address of the starting sector - like RAID0
> > does. Now I see that the same goes for the 'far copies' array exactly, so
> > the original code is correct.
> >
> > Thanks.