[PATCH 0/3] Continue expansion after reboot

on 22.02.2011 15:13:15 by adam.kwolek

Currently a reshaped/expanded array is assembled but it stays in the inactive state.
These patches allow for array assembly when the array is under expansion.
An array with reshape/expansion information in its metadata is assembled
and the reshape process continues automatically.

Next step:
The problem is how to address a container operation during assembly.
1. After the first array has been reshaped, the assembly process checks whether mdmon
has set a migration for another array in the container. If so, it continues work
on the next array.

2. The assembly process performs the reshape of the currently reshaped array only.
Mdmon sets the next array up for reshape and the user manually triggers mdadm
to finish the container operation with exactly the same parameters.

The reshape can also be finished for a container operation by re-assembling the
container (this works in the current code).


BR
Adam




---

Adam Kwolek (3):
FIX: Assemble device in reshape state with new disks number
imsm: FIX: Report correct array size during reshape
imsm: FIX: initialize reshape progress as it is stored in metadata


Assemble.c | 2 ++
super-intel.c | 24 +++++++++++++++++++++++-
2 files changed, 25 insertions(+), 1 deletions(-)


[PATCH 1/3] imsm: FIX: initialize reshape progress as it is stored in metadata

on 22.02.2011 15:13:23 by adam.kwolek

The reshape process cannot be restarted because there is no checkpoint information
in mdinfo.
When the metadata is read during a reshape or a reshape restart,
reshape_progress (an mdinfo field) has to be initialized to the value
stored in the metadata. This allows the reshape to start from the
checkpoint stored in the metadata.

Signed-off-by: Adam Kwolek
---

super-intel.c | 11 ++++++++++-
1 files changed, 10 insertions(+), 1 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index e8d2c6b..1b3407a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1784,6 +1784,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
else
info->delta_disks = 0;

+ info->reshape_progress = 0;
if (map_to_analyse->map_state == IMSM_T_STATE_UNINITIALIZED ||
dev->vol.dirty) {
info->resync_start = 0;
@@ -1797,6 +1798,15 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
info->resync_start = blocks_per_unit * units;
break;
}
+ case MIGR_GEN_MIGR: {
+ __u64 blocks_per_unit = blocks_per_migr_unit(dev);
+ __u64 units = __le32_to_cpu(dev->vol.curr_migr_unit);
+
+ info->reshape_progress = blocks_per_unit * units;
+ dprintf("IMSM: General Migration checkpoint : %llu "
+ "(%llu) -> read reshape progress : %llu\n",
+ units, blocks_per_unit, info->reshape_progress);
+ }
case MIGR_VERIFY:
/* we could emulate the checkpointing of
* 'sync_action=check' migrations, but for now
@@ -1804,7 +1814,6 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
*/
case MIGR_REBUILD:
/* this is handled by container_content_imsm() */
- case MIGR_GEN_MIGR:
case MIGR_STATE_CHANGE:
/* FIXME handle other migrations */
default:
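
(For illustration, the resume position computed in the MIGR_GEN_MIGR hunk is simply the number of checkpointed migration units times the sectors per unit. A tiny standalone sketch of the arithmetic, with made-up numbers; only blocks_per_migr_unit() and curr_migr_unit are real names from the patch:)

    #include <stdio.h>

    int main(void)
    {
            /* made-up values: blocks_per_migr_unit(dev) -> 2048 sectors,
             * dev->vol.curr_migr_unit -> 100 completed units */
            unsigned long long blocks_per_unit = 2048;
            unsigned long long units = 100;

            /* the checkpoint the reshape is restarted from */
            printf("reshape_progress = %llu sectors\n",
                   blocks_per_unit * units);
            return 0;
    }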


[PATCH 2/3] imsm: FIX: Report correct array size during reshape

on 22.02.2011 15:13:31 by adam.kwolek

When a reshape is started, imsm stores the new size in the metadata.
mdadm requires the "old" size to properly initialize the restarted array.

While a reshape is in progress, getinfo_super_imsm_volume() should report
the computed array size instead of the array size stored in the metadata.

Signed-off-by: Adam Kwolek
---

super-intel.c | 13 +++++++++++++
1 files changed, 13 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 1b3407a..c101dca 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1806,6 +1806,19 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
dprintf("IMSM: General Migration checkpoint : %llu "
"(%llu) -> read reshape progress : %llu\n",
units, blocks_per_unit, info->reshape_progress);
+ unsigned long long array_blocks;
+ int used_disks;
+
+ used_disks = imsm_num_data_members(dev, 1);
+ if (used_disks > 0) {
+ array_blocks = map->blocks_per_member *
+ used_disks;
+ /* round array size down to closest MB
+ */
+ info->custom_array_size = (array_blocks
+ >> SECT_PER_MB_SHIFT)
+ << SECT_PER_MB_SHIFT;
+ }
}
case MIGR_VERIFY:
/* we could emulate the checkpointing of
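
(The size rounding in this hunk clears the low bits so the reported size is a whole number of megabytes: with 512-byte sectors, SECT_PER_MB_SHIFT is 11, i.e. 2048 sectors per MB. A standalone check of the arithmetic with a made-up member size:)

    #include <stdio.h>

    #define SECT_PER_MB_SHIFT 11    /* 2048 * 512-byte sectors = 1 MB */

    int main(void)
    {
            /* made-up example: 3 data members of 976773168 sectors each */
            unsigned long long array_blocks = 3ULL * 976773168;
            unsigned long long rounded =
                    (array_blocks >> SECT_PER_MB_SHIFT) << SECT_PER_MB_SHIFT;

            printf("%llu -> %llu sectors (rounded down to a MB boundary)\n",
                   array_blocks, rounded);
            return 0;
    }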


[PATCH 3/3] FIX: Assemble device in reshape state with new disks number

on 22.02.2011 15:13:39 by adam.kwolek

When an expansion was interrupted, assembly should include all disks
in the new array. To do this, delta_disks has to be added
to the raid disks number.

When the metadata reports all disks in content->array.raid_disks,
delta_disks should be equal to 0 (it is already included in raid_disks).

Signed-off-by: Adam Kwolek
---

Assemble.c | 2 ++
1 files changed, 2 insertions(+), 0 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index 317be8b..8d89dbe 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1508,6 +1508,8 @@ int assemble_container_content(struct supertype *st, int mdfd,

sysfs_init(content, mdfd, 0);

+ if (content->reshape_active)
+ content->array.raid_disks += content->delta_disks;
sra = sysfs_read(mdfd, 0, GET_VERSION);
if (sra == NULL || strcmp(sra->text_version, content->text_version) != 0)
if (sysfs_set_array(content, md_get_version(mdfd)) != 0) {


Re: [PATCH 0/3] Continue expansion after reboot

on 23.02.2011 04:37:56 by NeilBrown

On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek wrote:

> Currently a reshaped/expanded array is assembled but it stays in the inactive state.
> These patches allow for array assembly when the array is under expansion.
> An array with reshape/expansion information in its metadata is assembled
> and the reshape process continues automatically.
>
> Next step:
> The problem is how to address a container operation during assembly.
> 1. After the first array has been reshaped, the assembly process checks whether mdmon
> has set a migration for another array in the container. If so, it continues work
> on the next array.
>
> 2. The assembly process performs the reshape of the currently reshaped array only.
> Mdmon sets the next array up for reshape and the user manually triggers mdadm
> to finish the container operation with exactly the same parameters.
>
> The reshape can also be finished for a container operation by re-assembling the
> container (this works in the current code).
>

Yes, this is an awkward problem.

Just to be sure we are thinking about the same thing:
When restarting an array in which a migration is already underway, mdadm simply
forks and continues monitoring that migration.
However if it is an array-wide migration, then when the migration of the
first array completes, mdmon will update the metadata on the second array,
but it isn't clear how mdadm can be told to start monitoring that array.

How about this:
the imsm metadata handler should report that an array is 'undergoing
migration' if it is, or if an earlier array in the container is undergoing a
migration which will cause 'this' array to subsequently be migrated too.

So if the first array is in the middle of a 4drive->5drive conversion and
the second array is simply at '4 drives', then imsm would report (to
container_content) that the second array is actually undergoing a migration
from 4 to 5 drives, and is at the very beginning.

When mdadm assembles that second array it will fork a child to monitor it.
It will need to somehow wait for mdmon to really update the metadata before
it starts. This can probably be handled in the ->manage_reshape function.

Something along those lines would be the right way to go, I think. It avoids
any races between arrays being assembled at different times.
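
(To make the idea concrete, here is a rough sketch; the struct and helper below are invented for illustration, not existing mdadm code. The real change would live in imsm's container_content handling:)

    /* Sketch only: walking the container's volumes in order, report a
     * not-yet-started volume as "migrating from the very beginning"
     * whenever an earlier volume is mid-migration. */
    struct vol_info {
            int migrating;                    /* set from the metadata */
            int reshape_active;               /* reported to mdadm */
            int delta_disks;
            unsigned long long reshape_progress;
    };

    static void report_pending_migrations(struct vol_info *v, int nvols,
                                          int delta_disks)
    {
            int i, earlier_migrating = 0;

            for (i = 0; i < nvols; i++) {
                    if (v[i].migrating) {
                            earlier_migrating = 1;
                    } else if (earlier_migrating) {
                            /* e.g. the 4 -> 5 drive case above */
                            v[i].reshape_active = 1;
                            v[i].reshape_progress = 0;
                            v[i].delta_disks = delta_disks;
                    }
            }
    }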


> Adam Kwolek (3):
> FIX: Assemble device in reshape state with new disks number

I don't think this patch is correct. We need to configure the array with the
'old' number of devices first, then 'reshape_array' will also set the 'new'
number of devices.
What exactly was the problem you were trying to fix?


> imsm: FIX: Report correct array size during reshape
> imsm: FIX: initialize reshape progress as it is stored in metadata
>
These both look good - I have applied them. Thanks.

NeilBrown



RE: [PATCH 0/3] Continue expansion after reboot

on 23.02.2011 10:02:25 by adam.kwolek

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Wednesday, February 23, 2011 4:38 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] Continue expansion after reboot
>
> On Tue, 22 Feb 2011 15:13:15 +0100 Adam Kwolek wrote:
>
> > [...]
>
> Yes, this is an awkward problem.
>
> Just to be sure we are thinking about the same thing:
> When restarting an array in which a migration is already underway, mdadm simply
> forks and continues monitoring that migration.
> However if it is an array-wide migration, then when the migration of the
> first array completes, mdmon will update the metadata on the second array,
> but it isn't clear how mdadm can be told to start monitoring that array.
>
> How about this:
> the imsm metadata handler should report that an array is 'undergoing
> migration' if it is, or if an earlier array in the container is undergoing a
> migration which will cause 'this' array to subsequently be migrated too.
>
> So if the first array is in the middle of a 4drive->5drive conversion and
> the second array is simply at '4 drives', then imsm would report (to
> container_content) that the second array is actually undergoing a migration
> from 4 to 5 drives, and is at the very beginning.
>
> When mdadm assembles that second array it will fork a child to monitor it.
> It will need to somehow wait for mdmon to really update the metadata before
> it starts. This can probably be handled in the ->manage_reshape function.
>
> Something along those lines would be the right way to go, I think. It avoids
> any races between arrays being assembled at different times.


This looks fine to me.

>
>
> > Adam Kwolek (3):
> > FIX: Assemble device in reshape state with new disks number
>
> I don't think this patch is correct. We need to configure the array with the
> 'old' number of devices first, then 'reshape_array' will also set the 'new'
> number of devices.
> What exactly was the problem you were trying to fix?

When the array is being assembled with the old raid disk number, assembly cannot set the readOnly array state
(there is an error on the sysfs state write). The array stays in the inactive state, so nothing (no reshape) happens later.
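
(For reference, the failing step boils down to a sysfs write like the one below. This is a minimal standalone sketch, not the actual mdadm code path, which goes through sysfs_set_str(); the device name is just an example:)

    #include <fcntl.h>
    #include <stdio.h>
    #include <string.h>
    #include <unistd.h>

    int main(void)
    {
            const char *path = "/sys/block/md127/md/array_state";
            int fd = open(path, O_WRONLY);

            /* md rejects the transition while the array lacks disks,
             * so the write fails and the array stays inactive */
            if (fd < 0 || write(fd, "readonly", strlen("readonly")) < 0)
                    perror("array_state");
            if (fd >= 0)
                    close(fd);
            return 0;
    }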

I think that the array cannot be assembled with the old number of disks (the newly added disks are present as spares)
because the beginning of the array already uses the new disks. This means we would be assembling the array with an incomplete disk set.
Stripes at the beginning could be corrupted (not all disks present in the array). At this point the inactive array state is ok to keep user data safe.


I'll test whether setting the old disk number and later changing the configured number of disks and the array state resolves the problem.
I'll let you know the results.

BR
Adam

>
>
> > imsm: FIX: Report correct array size during reshape
> > imsm: FIX: initialize reshape progress as it is stored in metadata
> >
> These both look good - I have applied them. Thanks.
>
> NeilBrown
>


RE: [PATCH 0/3] Continue expansion after reboot

on 25.02.2011 16:55:01 by adam.kwolek

> -----Original Message-----
> From: Kwolek, Adam
> Sent: Wednesday, February 23, 2011 10:02 AM
> To: 'NeilBrown'
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: RE: [PATCH 0/3] Continue expansion after reboot
>
>
>
> > [...]

I've made some investigations. I've tried the assemble algorithm (as you suggested):
Conditions:
a reshape of a 3-disk raid5 array to a 4-disk raid5 array
is interrupted. The restart is invoked by the command 'mdadm -As'

1. Assemble() builds the container with the new number of disks
2. Assemble() builds the container content (the array with the /old/ 3 disks)
3. the array is set to frozen to block the monitor
4. sync_max in sysfs is set to 0, to block md until the reshape monitoring takes care of the reshape process (steps 3 and 4 are sketched below)
5. Continue_reshape() starts the reshape process
6. Continue_reshape() continues the reshape process
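
(Steps 3 and 4, roughly, as two sysfs writes; a sketch using mdadm's existing sysfs_set_str()/sysfs_set_num() helpers, with 'content' assumed to be the array's struct mdinfo and error handling omitted:)

    /* 3. freeze to keep the monitor away from the array */
    sysfs_set_str(content, NULL, "sync_action", "frozen");
    /* 4. park md: no reshape progress until mdadm raises sync_max */
    sysfs_set_num(content, NULL, "sync_max", 0);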

Problems I've met:
1. not all disks in Assembly() are added to the array (old disk number limitation)
2. setting reshape_position automatically invokes a reshape start in md on array run
3. setting the reshape position clears delta_disks in md (and other parameters, for now not important)
4. Assembly() closes the handle to the array (it has to stay open and be used in the reshape continuation)
5. the reshape continuation can require a backup file; it depends on where the expansion was interrupted.
Other reshapes can always require a backup file
6. to run the reshape, 'reshape' has to be written to sync_action.
raid5_start_reshape() is not prepared for a reshape restart (i.e. the reshape position can be 0 or the maximum array value
- it depends on the operation, grow/shrink)
7. After the array starts, the MD_RECOVERY_NEEDED flag is set, so the reshape cannot be started from mdadm.
As the array is started without all disks (old raid disks), we cannot allow such a check (???)
I've made a workaround (setting the reshape position clears this flag for external metadata)

I've started the reshape again with /all/ the new disks, but it still starts from the beginning of the array. This is a matter of finding where the checkpoint gets lost.

I've also tested my first idea:
to do as much as we can as for native metadata (the reshape is started by the array run).
Some problems are similar to the above (points 4 and 5).
The only serious problem that I've got with this is how to let md know about delta_disks.
I've resolved it by adding a special case in raid_disks_store(),
similar to native metadata, where the old disk number is guessed.
For external metadata, I store the old and then the new number of disks, and md calculates delta_disks from this sequence of raid_disks values (sketched below).
(as I remember, you do not want to expose delta_disks in sysfs)
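
(i.e. something like the following; a sketch of the idea only, since the raid_disks_store() special case is a proposed kernel change. raid_disks is the standard md sysfs attribute and 'content' a struct mdinfo:)

    /* write the old geometry first, then the new one; the special
     * case in raid_disks_store() would let md derive
     * delta_disks = new - old from the two consecutive writes */
    sysfs_set_num(content, NULL, "raid_disks", old_raid_disks);
    sysfs_set_num(content, NULL, "raid_disks", old_raid_disks + delta_disks);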

Another issue that I'm observing with both methods is the behavior of the sync_action sysfs entry. It reports reshape->idle->reshape...
This 'idle' for a very short time causes a migration cancellation. I've made a workaround in mdmon for now.

Both methods are not fully workable yet, but I think this will change on Monday.

Considering the above, I still prefer the method where we construct the array with the new number of disks.
The beginning of an /already reshaped/ array has all disks present. This is the same way md works for native arrays.

I'm waiting for your comments/questions/ideas.

BR
Adam

> [...]


Re: [PATCH 0/3] Continue expansion after reboot

on 27.02.2011 07:51:20 by NeilBrown

On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam" wrote:

>
>
> > [...]
>
> I've made some investigations. I've tried assemble algorithm (as you suggested):
> Conditions:
> reshape 3 disk raid5 array to 4 disks raid5 array
> is interrupted. Restart is invoked by command 'mdadm -As'
>
> 1. Assemble() builds container with new disks number
> 2. Assemble() builds container content (array with /old/ 3 disks)
> 3. array is set to frozen to block monitor
> 4. sync_max in sysfs is set to 0, to block md until the reshape monitoring takes care of the reshape process
> 5. Continue_reshape() starts reshape process
> 6. Continue_reshape() continues reshape process
>
> Problems I've met:
> 1. not all disks in Assembly() are added to array (old disks number limitation)

I want to fix this by getting sysfs_set_array to set up the new raid_disks
number.
It currently doesn't because the number of disks that md is to expect could
be different to the number of disks recorded in the metadata, and
"analyse_change" might be needed to resolve the difference.
A particular example is that the metadata might think a RAID0 is changing from
4 devices to 5 devices, but md needs to be told that a RAID4 is changing from
5 devices to 6 devices.
However in that case, we really need to do the 'analyse_change' before calling
sysfs_set_array anyway.

So get sysfs_set_array to set up the array fully, and find somewhere
appropriate to put a call to analyse_change ... possibly modifying
analyse_change a bit ...
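
(The RAID0 example is the takeover trick: md cannot reshape RAID0 directly, so the array is run as a degraded RAID4 with one extra, missing, parity device. A hypothetical helper just to show the mapping:)

    /* hypothetical sketch: geometry recorded in the metadata vs. the
     * geometry md has to be told for a RAID0 grow via RAID4 takeover */
    struct geo { int level; int raid_disks; };

    static struct geo md_view_of_raid0(int metadata_disks)
    {
            struct geo g;

            g.level = 4;                       /* degraded RAID4, parity missing */
            g.raid_disks = metadata_disks + 1; /* so 4 -> 5 becomes 5 -> 6 */
            return g;
    }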

> 2. setting reshape_position invokes automatically reshape start in md on array run

That shouldn't be a problem. We start the array read-only and the reshape
will not start while that is set.
So:
set 'old' shape of array,
set reshape_position
set 'new' shape of array
start array 'readonly'
set sync_max to 0
enable read/write
allow reshape to continue while monitoring it with mdadm.

Does this work, or is there something I have missed?
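
(As a sketch, that ordering expressed with mdadm's sysfs helpers; this assumes it runs inside mdadm with 'content' being the array's struct mdinfo, the attribute names are the standard md sysfs ones, and error handling is omitted:)

    #include "mdadm.h"  /* sysfs_set_num()/sysfs_set_str() */

    static void restart_reshape_sketch(struct mdinfo *content,
                                       int old_raid_disks, int new_raid_disks,
                                       unsigned long long reshape_progress)
    {
            /* 'old' shape of the array (level/layout/chunk writes omitted) */
            sysfs_set_num(content, NULL, "raid_disks", old_raid_disks);
            /* the checkpoint read from the metadata */
            sysfs_set_num(content, NULL, "reshape_position", reshape_progress);
            /* 'new' shape of the array */
            sysfs_set_num(content, NULL, "raid_disks", new_raid_disks);
            /* start read-only: the reshape thread must not make progress yet */
            sysfs_set_str(content, NULL, "array_state", "readonly");
            /* keep md parked until the monitor is in place ... */
            sysfs_set_num(content, NULL, "sync_max", 0);
            /* ... then go read/write; the sync thread wakes up and the
             * reshape continues under mdadm's monitoring */
            sysfs_set_str(content, NULL, "array_state", "active");
    }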


> 3. setting of reshape position clears delta_disks in md (and other parameters, for now not important)

That shouldn't matter ... where do we set reshape_position that it causes
a problem?

> 4. Assembly() closes handle to array (it has to be not closed and used in reshape continuation)

I'm not sure what you are getting at.... reshape continuation is handled by
Grow_continue which is passed the handle to the array. It should fork and
monitor the array in the background, so it has its own copy of the handle
???


> 5. reshape continuation can require backup file. It depends where it was interrupted during expansion,
> Other reshapes can always require backup file

Yes ... Why is this a problem?

> 6. to run reshape, 'reshape' has to be written to sync_action.
> Raid5_start_reshape() is not prepared for reshape restart (i.e reshape position can be 0 or max array value
> - it depends on operation grow/shrink)

Yes ... raid5_start_reshape isn't used for restarting a reshape.
run() will start the reshape thread, which will not run because the array is
read-only.
Once you switch the array to read-write the sync_thread should get woken up
and will continue the reshape.


I think the remainder of your email is also addressed by what I have said
above so I won't try to address specific things.

Please let me know if you see any problem with what I have outlined.

Thanks!

NeilBrown




> [...]


RE: [PATCH 0/3] Continue expansion after reboot

on 28.02.2011 14:35:33 by adam.kwolek

> -----Original Message-----
> From: NeilBrown [mailto:neilb@suse.de]
> Sent: Sunday, February 27, 2011 7:51 AM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] Continue expansion after reboot
>
> On Fri, 25 Feb 2011 15:55:01 +0000 "Kwolek, Adam" wrote:
>
> >
> >
> > [...]
> >
> > I've made some investigations. I've tried assemble algorithm (as you
> suggested):
> > Conditions:
> > reshape 3 disk raid5 array to 4 disks raid5 array
> > is interrupted. Restart is invoked by command 'mdadm -As'
> >
> > 1. Assemble() builds container with new disks number
> > 2. Assemble() builds container content (array with /old/ 3 disks)
> > 3. array is set to frozen to block monitor
> > 4. sync_max in sysfs is set to 0, to block md until the reshape monitoring
> > takes care of the reshape process
> > 5. Continue_reshape() starts reshape process
> > 6. Continue_reshape() continues reshape process
> >
> > Problems I've met:
> > 1. not all disks in Assembly() are added to array (old disks number
> limitation)
>
> I want to fix this by getting sysfs_set_array to set up the new raid_disks
> number.
> It currently doesn't because the number of disks that md is to expect could
> be different to the number of disks recorded in the metadata, and
> "analyse_change" might be needed to resolve the difference.
> A particular example is that the metadata might think a RAID0 is changing from
> 4 devices to 5 devices, but md needs to be told that a RAID4 is changing from
> 5 devices to 6 devices.
> However in that case, we really need to do the 'analyse_change' before calling
> sysfs_set_array anyway.
>
> So get sysfs_set_array to set up the array fully, and find somewhere
> appropriate to put a call to analyse_change ... possibly modifying
> analyse_change a bit ...

In the next patches, which I hope bring me closer to the final code, I will not use analyse_change.
I want to make the main thing (restore from checkpoint) workable first.

>
> > 2. setting reshape_position invokes automatically reshape start in md
> on array run
>
> That shouldn't be a problem. We start the array read-only and the reshape
> will not start while that is set.
> So:
> set 'old' shape of array,
> set reshape_position
> set 'new' shape of array
> start array 'readonly'
> set sync_max to 0
> enable read/write
> allow reshape to continue while monitoring it with mdadm.
>
> Does this work, or is there something I have missed?

Everything in mdadm seems to be ok.
One small problem is in md (raid5.c:5052):
for the grow case there is a check against the checkpoint. In my case chunk_sectors and new_chunk_sectors
are the same, so the array is not started. If I ignore the '==' case the array can be assembled.
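
(For context, the guard being referred to looks roughly like the fragment below; this is reconstructed from memory of the raid5.c run() code of that era, so treat it as approximate:)

    /* here_new/here_old are reshape_position expressed in stripes of
     * the new and old geometry; on a restart from position 0 both are
     * 0, so the '>=' below fires even though the restart is safe when
     * mdadm monitors it with a backup file */
    if (here_new >= here_old) {
            /* Reading from the same stripe as writing to - bad */
            printk(KERN_ERR "md/raid:%s: reshape_position too early for "
                   "auto-recovery - aborting.\n", mdname(mddev));
            return -EINVAL;
    }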

>
>
> > 3. setting of reshape position clears delta_disks in md (and other
> parameters, for now not important)
>
> That shouldn't matter ... where do we set reshape_position that it
> causes
> a problem?

Yes, I've found it during my tests.

>
> > 4. Assembly() closes handle to array (it has to be not closed and used
> in reshape continuation)
>
> I'm not sure what you are getting at.... reshape continuation is handled
> by
> Grow_continue which is passed the handle to the array. It should fork
> and
> monitor the array in the background, so it has its own copy of the
> handle
> ???

Assemble_container_content() closes the array handle. I haven't forked inside this function,
but probably that would be better.

>
> > 5. reshape continuation can require backup file. It depends where it
> was interrupted during expansion,
> > Other reshapes can always require backup file
>
> Yes ... Why is this a problem?

.... not a problem, rather TBD ;)
The assemble operation has no backup file specified, so a backup file name has to be generated.


>
> > 6. to run reshape, 'reshape' has to be written to sync_action.
> > Raid5_start_reshape() is not prepared for reshape restart (i.e
> reshape position can be 0 or max array value
> > - it depends on operation grow/shrink)
>
> Yes ... raid5_start_reshape isn't used for restarting a reshape.
> run() will start the reshape thread, which will not run because the
> array is
> read-only
> Once you switch the array to read-write the sync_thread should get woken
> up
> and will continue the reshape.

This is what I wanted to hear :).

>
>
> I think the remainder of your email is also addressed by what I have
> said
> above so I won't try to address specific things.
>
> Please let me know if you see any problem with what I have outlined.
>
> Thanks!
>
> NeilBrown

Everything is more or less clear. I'll prepare a few patches that allow
for reshape continuation for expansion, to get your feedback.

BR
Adam


> [...]
