expand raid10
On 13.04.2011 06:28:48, Roberto Spadim wrote:
hi guys, today I have a 2-disk RAID10 far array, with 2 TB per disk and 2 TB
of array space. I want to expand it to a 4-disk RAID10 far array, with 2 TB
per disk and a 4 TB array. In other words, I will add 2 more 2 TB disks and
I want 2 TB more space.
Could I do this with the RAID10 far layout? I'm using ext4 as the filesystem.
How could I expand the ext4 filesystem?
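The sequence I am hoping for would be something like this (hypothetical
device name, and assuming md can reshape the array at all, which is really
my question):

    mdadm --add /dev/md0 /dev/sdc /dev/sdd    # the two new 2 TB disks
    mdadm --grow /dev/md0 --raid-devices=4    # restripe over 4 disks, if the
                                              # level/layout supports it
    resize2fs /dev/md0                        # then grow ext4 online to fill
                                              # the enlarged array

resize2fs only grows the filesystem into space the block device already
offers, so the array step has to succeed first.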
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: expand raid10
On 13.04.2011 09:15:56, Mathias Burén wrote:
On 13 April 2011 05:28, Roberto Spadim wrote:
> Could I do this with the RAID10 far layout? I'm using ext4 as the filesystem.
> How could I expand the ext4 filesystem?
No, you cannot expand RAID10.
// M
Re: expand raid10
On 13.04.2011 12:47:26, Roberto Spadim wrote:
Could I expand RAID10 with another layout?
2011/4/13 Mathias Burén:
> No, you cannot expand RAID10.
>
> // M
--
Roberto Spadim
Spadim Technology / SPAEmpresarial
Re: expand raid10
On 13.04.2011 13:10:16, Keld Simonsen wrote:
On Wed, Apr 13, 2011 at 07:47:26AM -0300, Roberto Spadim wrote:
> Could I expand RAID10 with another layout?
My understanding is that you currently cannot expand RAID10,
but there are things in the works. Expansion of raid10,far
was not on the list from Neil; raid10,near was. But it should be fairly
easy to expand raid10,far: you can just treat one of the copies as your
reference data, and copy that data to the other raid0-like parts of the
array. I wonder if Neil thinks he could leave that as an exercise for
me to implement... I would like to be able to combine it with a
reformat to a more robust layout of raid10,far that in some cases can
survive more than one disk failure.
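As a sketch of what that would mean for far-2 (a hypothetical twelve-block
example; lowercase and uppercase are the two copies of the same blocks, "."
is free space):

    before, 2 disks:          after restriping over 4 disks:
    1: acegikBDFHJL           1: aei...DHL...
    2: bdfhjlACEGIK           2: bfj...AEI...
                              3: cgk...BFJ...
                              4: dhl...CGK...

The lowercase copy, read in disk order, is the reference data; the new
striping and the uppercase slabs are then rewritten from it for the new
disk count.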
best regards
keld
Re: expand raid10
On 13.04.2011 13:17:15, NeilBrown wrote:
On Wed, 13 Apr 2011 13:10:16 +0200 Keld Jørn Simonsen wrote:

> But it should be fairly easy to expand raid10,far. [...] I wonder if Neil
> thinks he could leave that as an exercise for me to implement...
I'm very happy for anyone to offer to implement anything.

I will of course require the code to be of reasonable quality before I accept
it, but I'm also happy to give helpful review comments and guidance.

So don't wait for permission; if you want to try implementing something, just
do it.

Equally, if there is something that I particularly want done, I won't wait
forever for someone else who says they are working on it. But RAID10 reshape
is a long way from the top of my list.

NeilBrown
Re: expand raid10
On 13.04.2011 14:34:14, Keld Simonsen wrote:
On Wed, Apr 13, 2011 at 09:17:15PM +1000, NeilBrown wrote:

> Equally, if there is something that I particularly want done, I won't wait
> forever for someone else who says they are working on it. But RAID10
> reshape is a long way from the top of my list.
Hi Neil!

Yes, that is how I understand your policy on contributions.

By RAID10 reshaping, do you also mean RAID10 expansion?
In my eyes this is quite important, and something that I have wanted for
a long time. I think it is a quite common task for many Linux MD users.

best regards
keld
Re: expand raid10
On 13.04.2011 14:34:15, David Brown wrote:
On 13/04/2011 13:17, NeilBrown wrote:

> So don't wait for permission; if you want to try implementing something,
> just do it.
> [...] But RAID10 reshape is a long way from the top of my list.
I know you have other exciting things on your to-do list - there was
lots in your roadmap thread a while back.

But I'd like to put in a word for raid10,far - it is an excellent choice
of layout for small or medium systems with a combination of redundancy
and near-raid0 speed. It is especially ideal for 2 or 3 disk systems.
The only disadvantage is that it can't be resized or re-shaped. The
algorithm suggested by Keld sounds simple to implement, but it would
leave the disks in a non-redundant state during the resize/reshape.
That would be good enough for some uses (and better than nothing), but
not good enough for all uses. It may also be scalable to include both
resizing (replacing each disk with a bigger one) and adding another disk
to the array.

Currently, it /is/ possible to get an approximate raid10,far layout that
is resizeable and reshapeable. You can divide the member disks into two
partitions and pair them off appropriately in mirrors. Then use these
mirrors to form a degraded raid5 with "parity-last" layout and a missing
last disk - this is, as far as I can see, equivalent to a raid0 layout
but can be re-shaped to more disks and resized to use bigger disks.
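For two disks, a sketch of that construction (device names and partitioning
are hypothetical; each disk is split into two equal halves sdX1 and sdX2):

    # pair each half with the other disk's other half, far-style
    mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb2
    mdadm --create /dev/md2 --level=1 --raid-devices=2 /dev/sdb1 /dev/sda2
    # stripe over the mirrors: in a degraded parity-last raid5 the missing
    # last disk is the parity disk, so this behaves like raid0 over md1
    # and md2, but unlike raid0 it can later be grown and reshaped
    mdadm --create /dev/md0 --level=5 --layout=parity-last \
          --raid-devices=3 /dev/md1 /dev/md2 missing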
Re: expand raid10
On 14.04.2011 01:28:24, NeilBrown wrote:
On Wed, 13 Apr 2011 14:34:14 +0200 Keld Jørn Simonsen wrote:

> By RAID10 reshaping, do you also mean RAID10 expansion?
Yes, by 'reshaping' I mean everything included here:
http://neil.brown.name/blog/20110216044002#11
which includes size changes.
NeilBrown
Re: expand raid10
On 14.04.2011 01:36:57, NeilBrown wrote:
On Wed, 13 Apr 2011 14:34:15 +0200 David Brown wrote:

> Currently, it /is/ possible to get an approximate raid10,far layout that
> is resizeable and reshapeable. You can divide the member disks into two
> partitions and pair them off appropriately in mirrors. Then use these
> mirrors to form a degraded raid5 with "parity-last" layout and a missing
> last disk [...]
There is an interesting idea in here....

Currently, if the devices in an md/raid array with redundancy (1,4,5,6,10)
are of different sizes, they are all treated as being the size of the
smallest device. However, this doesn't really make sense for RAID10-far.

For RAID10-far, it would make the offset where the second slab of data
appeared not be 50% of the smallest device (in the far-2 case), but 50% of
the current device.

Then replacing all the devices in a RAID10-far with larger devices would mean
that the size of the array could then be increased with no further data
rearrangement.

A lot of care would be needed to implement this, as the assumption that all
drives are only as big as the smallest is pretty deep. But it could be done
and would be sensible.

That would make point 2 of http://neil.brown.name/blog/20110216044002#11 a
lot simpler.
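As a worked example (hypothetical sizes): with far-2 on two 2 TB drives, each
drive's second slab starts at 1 TB and the array holds 2 TB. If the offset
were 50% of each individual drive instead, then recovery onto a replacement
3 TB drive would write its second slab starting at 1.5 TB; once both drives
had been replaced that way, the array could grow from 2 TB to 3 TB with no
relocation of existing data.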
NeilBrown
Re: expand raid10
On 14.04.2011 10:16:43, David Brown wrote:
On 14/04/2011 01:36, NeilBrown wrote:

> For RAID10-far, it would make the offset where the second slab of data
> appeared not be 50% of the smallest device (in the far-2 case), but 50% of
> the current device.
>
> A lot of care would be needed to implement this, as the assumption that all
> drives are only as big as the smallest is pretty deep. But it could be done
> and would be sensible.
I'd like to share an idea here for a slight change in the metadata, and
an algorithm that I think can be used for resizing raid10,far. I
apologise if I've got my terminology wrong, or if it sounds like I'm
teaching my grandmother to suck eggs.

I think you want to make a distinction between the size of the
underlying device (disk, partition, lvm device, other md raid), the size
of the components actually used, and the position of the mirror copy in
raid10.

I see it as perfectly reasonable to assume that the used component size
is the same for all devices in an array, and that this only changes when
you "grow" the array itself (assuming the underlying devices are
bigger). That's the way raid 1, 4, 5, and 6 work, and I think that
assumption would help make 10 growable. It is also, AFAIU, the reason
normal raid 0 isn't growable - because it doesn't have that restriction.
(Maybe raid0 can be made growable for cases where the component sizes
are the same?)

To make raid10,far resizeable, I think the key is that instead of
"position of second copy" being fixed at 50% of the array component
size, or 50% of the underlying device size, it should be variable. In
fact, not only should it be variable - it should consist of two (start,
length) pairs.

The issue here is that to do a safe grow after resizing the underlying
device (this being the most awkward case), the mirror copy has to be
moved rather than deleted and re-written - otherwise you lose your
redundancy. But if you keep track of two valid regions, it becomes
easier. In the most common case, growing the disk, you would start at
the end. Copy a block from the end of the component part of the mirror
to the appropriate place near the end of the new underlying device.
Update the second (start, length) pair to include this block, and the
first (start, length) pair to remove it. Repeat the process until you
have copied over everything valid and then have a device with a first
data block, then some unused space, then a mirror block, then some
unused space. Once every underlying device is in this shape, then a
"grow" is just a straight sync of the unused space (or you just mark it
in the non-sync bitmap).

Let me try to put it into a picture. I'll label all the real data
blocks by letters, and use "." for unused data blocks. Small letters
and big letters represent the same data in two copies. "*" is for
non-sync bitmap data, or data that must be synced normally (if the
non-sync bitmap functionality is not yet implemented).

The list of numbers after the disks is:
size of underlying disk, size of component, (start, length), (start, length)

We start with a raid10,far layout:

    1: acegikBDFHJL       12, 6, (6, 6), (0, 0)
    2: bdfhjlACEGIK       12, 6, (6, 6), (0, 0)

Then we assume disk 2 is grown (either it is an LVM partition, a raid
that is grown, or whatever). Thus we have:

    1: acegikBDFHJL       12, 6, (6, 6), (0, 0)
    2: bdfhjlACEGIK...... 18, 6, (6, 6), (0, 0)

Rebalancing disk 2 (which may be done as its own operation, or
automatically during a "grow" of the whole array - assuming each
component disk has enough space) goes through steps like this:

    2: bdfhjlACEGIK...... 18, 6, (6, 6), (0, 0)
    2: bdfhjlACEGIK.IK... 18, 6, (6, 6), (13, 2)
    2: bdfhjlACEG...IK... 18, 6, (6, 4), (13, 2)
    2: bdfhjlACEG.EGIK... 18, 6, (6, 4), (11, 4)
    2: bdfhjlAC...EGIK... 18, 6, (6, 2), (11, 4)
    2: bdfhjlAC.ACEGIK... 18, 6, (6, 2), (9, 6)
    2: bdfhjl...ACEGIK... 18, 6, (6, 0), (9, 6)
    2: bdfhjl...ACEGIK... 18, 6, (9, 6), (0, 0)

With the pair now being:

    1: acegikBDFHJL       12, 6, (6, 6), (0, 0)
    2: bdfhjl...ACEGIK... 18, 6, (9, 6), (0, 0)

After a similar process with disk 1 we have:

    1: acegik...BDFHJL... 18, 6, (9, 6), (0, 0)
    2: bdfhjl...ACEGIK... 18, 6, (9, 6), (0, 0)

"Grow" gives you:

    1: acegik***BDFHJL*** 18, 9, (9, 9), (0, 0)
    2: bdfhjl***ACEGIK*** 18, 9, (9, 9), (0, 0)

A similar sort of sequence is easy to imagine for shrinking partitions.
And when replacing a disk with a new one, this re-shape could easily
be combined with a hot-replace copy.

As far as I can see, this setup with the extra metadata will hold
everything consistent, safe and redundant during the whole operation.

Best regards,
David
Re: expand raid10
On 15.04.2011 18:52:03, Keld Simonsen wrote:
On Thu, Apr 14, 2011 at 09:36:57AM +1000, NeilBrown wrote:

> Then replacing all the devices in a RAID10-far with larger devices would
> mean that the size of the array could then be increased with no further
> data rearrangement.
>
> That would make point 2 of http://neil.brown.name/blog/20110216044002#11 a
> lot simpler.
Hmm, I am not sure I understand. E.g. for the simple case of growing a
2-disk raid10-far to a 3-disk or 4-disk array, how would that be done? I
think you need to rewrite the whole array. But I think you also need to do
that when growing most of the other array types.

Quoting point 2 of http://neil.brown.name/blog/20110216044002#11:

> 2/ Device size of 'far' arrays cannot be changed easily. Increasing
> device size of 'far' would require re-laying out a lot of data. We would
> need to record the 'old' and 'new' sizes, which the metadata doesn't
> currently allow. If we spent 8 bytes on this we could possibly manage a
> 'reverse reshape' style conversion here.
>
> EDIT: if we stored data on drives a little differently this could be a
> lot easier. Instead of starting the second slab of data at the same
> location on all devices, we start it an appropriate fraction into the
> size of 'this' device; then replacing all devices in a raid10-far with
> larger drives would be very effective. However just increasing the size
> of the device (e.g. using LVM) would not work very well.

I am not sure I understand the problem here. Are you saying that there
is no room in the metadata to hold info on the reshaping while it is
processed?

For a simple grow with more partitions of the same size I see problems
in just keeping the old data. I think that would damage the striping
performance.

And I don't understand what is meant by "we start it an appropriate
fraction" - what fraction would that be? E.g. growing from 2 to 3 disks?

If you want integrity of the data, understood as always having the
required number of copies available, then you could copy from the end of
the half array and then have a pointer that tells how far the process
has completed. There may be some initial problems with consistency, but
maybe there are some recovery areas in the new array data that could be
used for bootstrapping the process - once you are past an initial size,
you are not overwriting old data.

Best regards
keld
Re: expand raid10
On 18.04.2011 02:46:15, NeilBrown wrote:
On Fri, 15 Apr 2011 18:52:03 +0200 Keld Jørn Simonsen wrote:

> I am not sure I understand the problem here. Are you saying that there
> is no room in the metadata to hold info on the reshaping while it is
> processed?
No, though adding stuff to the metadata shouldn't be done lightly.

I'm saying that if we lay out the RAID10-far data on the device a little bit
differently, then making a RAID10-far make full use of the devices after
replacing all the devices becomes very easy.
> For a simple grow with more partitions of the same size I see problems
> in just keeping the old data. I think that would damage the striping
> performance.
The preceding is about increasing the size of individual drives. That is
quite different to adding more drives of the same size.

When you add more drives you certainly have to re-lay out all the stripes.
This isn't conceptually difficult - just a lot of reads and writes and some
care in writing the code to make it safe and efficient.
> And I don't understand what is meant by "we start it an appropriate
> fraction" - what fraction would that be? E.g. growing from 2 to 3 disks?
It doesn't apply to that case. It only applies to growing the size of
individual disks. For far2, the fraction would be 1/2. For far3 it would be
1/3.
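As a worked example (hypothetical sizes): with far3 on a 3 TB device, the
three slabs would start at 0, 1 TB and 2 TB; on a 6 TB replacement they
would start at 0, 2 TB and 4 TB. The offsets track each individual device's
size rather than the size of the smallest device in the array.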
> If you want integrity of the data, understood as always having the
> required number of copies available, then you could copy from the end of
> the half array and then have a pointer that tells how far the process
> has completed.
Yes. The 'pointer' would be the 'reshape_position' value in the metadata.
Data before this has been relocated. Data after this has not... At least
that is how RAID5 works. For RAID10 we might want slightly different ranges.
NeilBrown