the question about raid0_make_request

on 19.06.2006 03:21:23 by Liu Yang

When I read the code of raid0_make_request, I had some questions.

1. block = bio->bi_sector >> 1, which is the device offset in kilobytes.
So why do we subtract zone->zone_offset from block? The
zone->zone_offset is the zone offset relative to the mddev in sectors.

2. The code below:
x = block >> chunksize_bits;
tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
We get the underlying device from 'sector_div(x,
zone->nb_dev)'. In my opinion, the var x is the chunk number relative to
the start of the mddev. But the zones do not all have the same
zone->nb_dev, so I think we can't get the right rdev from
'sector_div(x, zone->nb_dev)'.

Why? Could you explain this to me?
Thanks!
Regards.

YangLiu
-
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: the question about raid0_make_request

on 19.06.2006 03:38:13 by NeilBrown

On Monday June 19, liudows2@gmail.com wrote:
> When I read the code of raid0_make_request, I had some questions.
>
> 1. block = bio->bi_sector >> 1, which is the device offset in kilobytes.
> So why do we subtract zone->zone_offset from block? The
> zone->zone_offset is the zone offset relative to the mddev in sectors.

zone_offset is set to 'curr_zone_offset' in create_strip_zones, and
curr_zone_offset is a sum of 'zone->size' values.
zone->size is (typically) calculated by
   (smallest->size - current_offset) * c
where 'smallest' is an rdev.
So the units of 'zone_offset' are ultimately the same as those of
rdev->size.
rdev->size is set in md.c, e.g. from
   calc_dev_size(rdev, sb->chunk_size);
which uses the value from calc_dev_sboffset, which shifts the size in
bytes right by BLOCK_SIZE_BITS, which is defined in fs.h to be 10.
So zone_offset is in kilobytes, not sectors.
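
The unit argument above can be checked with a small sketch. This is
illustrative user-space C, not the kernel code: the function names are
hypothetical, and it assumes 512-byte sectors together with the
BLOCK_SIZE_BITS value of 10 mentioned above.

```c
#include <assert.h>
#include <stdint.h>

/* Assumptions for this sketch: 512-byte sectors, 1024-byte "blocks". */
#define EXAMPLE_SECTOR_BITS 9   /* hypothetical name: log2(512)  */
#define EXAMPLE_BLOCK_BITS  10  /* mirrors BLOCK_SIZE_BITS == 10 */

/* Sector number -> offset in kilobytes, like 'block = bio->bi_sector >> 1'. */
uint64_t sector_to_kb(uint64_t sector)
{
    return sector >> (EXAMPLE_BLOCK_BITS - EXAMPLE_SECTOR_BITS);
}

/* Size in bytes -> size in kilobytes, the unit rdev->size ends up in. */
uint64_t bytes_to_kb(uint64_t bytes)
{
    return bytes >> EXAMPLE_BLOCK_BITS;
}

/* Both operands are in kilobytes, so the subtraction is meaningful. */
uint64_t zone_relative_kb(uint64_t block_kb, uint64_t zone_offset_kb)
{
    assert(block_kb >= zone_offset_kb);
    return block_kb - zone_offset_kb;
}
```

For example, sector 2048 is at 1024 KB, and a zone that starts at 512 KB
puts that block 512 KB into the zone.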

>
> 2. The code below:
> x = block >> chunksize_bits;
> tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> We get the underlying device from 'sector_div(x,
> zone->nb_dev)'. In my opinion, the var x is the chunk number relative to
> the start of the mddev. But the zones do not all have the same
> zone->nb_dev, so I think we can't get the right rdev from
> 'sector_div(x, zone->nb_dev)'.

x is the chunk number relative to the start of the current zone, not
the start of the mddev:
sector_t x = (block - zone->zone_offset) >> chunksize_bits;

taking the remainder after dividing this by the number of devices in
the current zone gives the number of the device to use.
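
A minimal sketch of the mapping as described here, in plain user-space
C. The function names are hypothetical; 'block' and 'zone_offset' are
assumed to be in kilobytes and 'chunksize_bits' to be log2 of the chunk
size in kilobytes. The '%' operator stands in for the remainder that
sector_div returns.

```c
#include <assert.h>
#include <stdint.h>

/* Chunk number relative to the start of the current zone. */
uint64_t chunk_in_zone(uint64_t block, uint64_t zone_offset,
                       unsigned chunksize_bits)
{
    assert(block >= zone_offset);
    return (block - zone_offset) >> chunksize_bits;
}

/* Index into zone->dev[]: remainder of the chunk number divided by
 * the number of devices in the zone. */
unsigned dev_index_in_zone(uint64_t chunk, unsigned nb_dev)
{
    assert(nb_dev > 0);
    return (unsigned)(chunk % nb_dev);
}
```

With 64 KB chunks (chunksize_bits = 6) and a zone starting at 320 KB,
block 394 falls in chunk 1 of the zone, so with 4 devices it maps to
index 1.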

Hope that helps.

NeilBrown

Re: the question about raid0_make_request

on 19.06.2006 09:29:30 by Liu Yang

Thanks, Neil.
I noticed that the whole code for calculating the underlying device is below:
{
	sector_t x = (block - zone->zone_offset) >> chunksize_bits;

	sector_div(x, zone->nb_dev);
	chunk = x;
	BUG_ON(x != (sector_t)chunk);

	x = block >> chunksize_bits;
	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
}
rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
	+ sect_in_chunk;

So we first set the var x to the chunk number relative to the start of
the current zone. But after that we execute 'x = block >>
chunksize_bits', which sets x to the chunk number relative to the start
of the mddev, I think. Right?
I am confused.

Thanks a lot!
Regards.

YangLiu

Re: the question about raid0_make_request

on 19.06.2006 11:33:42 by NeilBrown

On Monday June 19, liudows2@gmail.com wrote:
> Thanks, Neil.
> I noticed that the whole code for calculating the underlying device is below:
> {
> 	sector_t x = (block - zone->zone_offset) >> chunksize_bits;
>
> 	sector_div(x, zone->nb_dev);
> 	chunk = x;
> 	BUG_ON(x != (sector_t)chunk);
>
> 	x = block >> chunksize_bits;
> 	tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
> }
> rsect = (((chunk << chunksize_bits) + zone->dev_offset)<<1)
> 	+ sect_in_chunk;
>
> So we first set the var x to the chunk number relative to the start of
> the current zone. But after that we execute 'x = block >>
> chunksize_bits', which sets x to the chunk number relative to the start
> of the mddev, I think. Right?
> I am confused.

Ahhhh yes, now I remember.

Yes, you are right, this code is 'wrong' - but it is 'definitively
wrong' if that means anything ....

The line
> x = block >> chunksize_bits;
really 'should' be
> x = (block - zone->zone_offset) >> chunksize_bits;

but it isn't. That bug has been there 'forever'. You can see it in
2.0.40 at
http://lxr.linux.no/source/drivers/block/raid0.c?v=2.0.40#L201

The effect of that 'bug' is that in zones after the first, all the
blocks are shifted some number of devices to the right compared with
where you would expect them to be.
e.g. the first block in a new zone might not be on the first device in
the zone, but might be on the third, or whatever.
However the offset is consistent. Whether you read or write data,
you will find the block at the same offset. So the array works
perfectly, except not exactly how you would expect.
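
The consistent shift can be illustrated with a short sketch. It is
user-space C with hypothetical function names, and it assumes that
zone_offset is a whole number of chunks; under that assumption the
device index the code actually computes differs from the expected one
by a fixed amount for every block in the zone.

```c
#include <assert.h>
#include <stdint.h>

/* Device index you would expect: chunk number relative to the zone. */
unsigned expected_dev(uint64_t block, uint64_t zone_offset,
                      unsigned chunksize_bits, unsigned nb_dev)
{
    assert(block >= zone_offset && nb_dev > 0);
    return (unsigned)(((block - zone_offset) >> chunksize_bits) % nb_dev);
}

/* Device index the existing line 'x = block >> chunksize_bits' yields. */
unsigned actual_dev(uint64_t block, unsigned chunksize_bits, unsigned nb_dev)
{
    assert(nb_dev > 0);
    return (unsigned)((block >> chunksize_bits) % nb_dev);
}

/* Fixed per-zone shift, valid only if zone_offset is chunk-aligned. */
unsigned zone_shift(uint64_t zone_offset, unsigned chunksize_bits,
                    unsigned nb_dev)
{
    assert(nb_dev > 0);
    return (unsigned)((zone_offset >> chunksize_bits) % nb_dev);
}
```

For every block in a zone, actual_dev() equals
(expected_dev() + zone_shift()) % nb_dev, so reads and writes always
agree on where a block lives.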

Obviously we could not 'fix' this 'bug', because then anyone who has a
raid0 with multiple zones would find their array gets corrupted when
they upgrade.

If we ever wanted to support some raid0 array that was also accessed
by other software (e.g. a DDF array), we would need to make sure that
the 'bug' was fixed in that usage.

So you are right, the code is confusing, but it does provide a
reliable raid0 array.

I hope that helps.

I guess we should really put a comment in there explaining this....

NeilBrown

Re: the question about raid0_make_request

on 20.06.2006 00:23:29 by NeilBrown

On Monday June 19, liudows2@gmail.com wrote:
> We can imagine a raid0 array whose layout is drawn in the
> attachment.
> Take this for example.
> There are 3 zones in total, and their zone->nb_dev is 5, 4, 3 respectively.
> In the raid0_make_request function, the var block is the offset of the bio in
> kilobytes.
>
>
> x = block >> chunksize_bits;
> tmp_dev = zone->dev[sector_div(x, zone->nb_dev)];
>
> If block is in chunk 5, then x = block >> chunksize_bits = 5, and the
> nb_dev of zone2 is 4.
> So tmp_dev = zone->dev[sector_div(5,4)] = zone->dev[1].
> But we can see that the right result should be zone->dev[0].
> Then how does the 'bug' get the right underlying device?

When you say 'right' result, you really mean 'expected' result.

You expect the layout to be

0 1 2 3 4
5 6 7 8
9 10 11

The actual layout for Linux-md-raid0 is

0 1 2 3 4
8 5 6 7
9 10 11

Not what you would expect, but still a valid layout.
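
The two layouts above can be reproduced with a small sketch. It is
plain user-space C, and the zone table and all names are hypothetical:
it assumes three zones with 5, 4 and 3 devices holding one chunk per
device each, as in the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified zone description for this example only. */
struct example_zone {
    unsigned first_chunk; /* first chunk of the zone, counted from the array start */
    unsigned nb_chunks;   /* number of chunks in the zone */
    unsigned nb_dev;      /* number of devices in the zone */
};

/* Assumed layout: zones with 5, 4 and 3 devices, one chunk per device. */
static const struct example_zone example_zones[] = {
    { 0, 5, 5 },
    { 5, 4, 4 },
    { 9, 3, 3 },
};

/* Find the zone containing a chunk, or NULL if it is out of range. */
const struct example_zone *find_example_zone(unsigned chunk)
{
    size_t n = sizeof(example_zones) / sizeof(example_zones[0]);
    for (size_t i = 0; i < n; i++) {
        const struct example_zone *z = &example_zones[i];
        if (chunk >= z->first_chunk && chunk < z->first_chunk + z->nb_chunks)
            return z;
    }
    return NULL;
}

/* Index into zone->dev[] if the chunk number were taken relative to the zone. */
unsigned expected_slot(unsigned chunk)
{
    const struct example_zone *z = find_example_zone(chunk);
    assert(z != NULL);
    return (chunk - z->first_chunk) % z->nb_dev;
}

/* Index into zone->dev[] using the chunk number relative to the array start. */
unsigned actual_slot(unsigned chunk)
{
    const struct example_zone *z = find_example_zone(chunk);
    assert(z != NULL);
    return chunk % z->nb_dev;
}
```

For chunks 5, 6, 7 and 8, actual_slot() gives 1, 2, 3 and 0, which is
the row '8 5 6 7' in the actual layout, while expected_slot() gives
0, 1, 2 and 3.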

NeilBrown

Re: the question about raid0_make_request

on 20.06.2006 15:36:10 by Liu Yang

I understand what you mean now.
A valid layout is not always the 'right' layout.
Thanks for your help.

Regards.

YangLiu