[PATCH 0/3] UT and error case changes

am 14.03.2011 15:09:20 von adam.kwolek

The following series implements 2 changes:
1. Fix for unit tests failure.
UT suits 12 and 13 fails, when backup file cannot be opened
for grow operation (backup file exists already).

2. I've got proposal for handling/rollback metadata in error case.
Currently in case of error external metadata can remain in reshape state.
In some cases metadata can be automatically restored to initial state
(i.e. metadata during imsm container operation can be rolled back
when error occurs on first reshaped array before reshape is started).
For such cases, additional superswitch function can be introduced.

Metadata shouldn't be rolled back in restart case.
I'm passing restart flag to abort function in Grow.c only, as this is general rule.
In the same way array reshape state is checked.

This is proposal, so I've put no implementation in to imsm handler (no metadata update is created yet).
Please let me know your opinion. If you will like it, I'll fill out gaps in imsm code.

BR
Adam

---

Adam Kwolek (3):
imsm: Add metadata abort changes handler template
External metadata has to be restored to initial state in error case
imsm: FIX: existing backup file fails unit tests

Grow.c | 26 ++++++++++++++++++++++++++
mdadm.h | 10 ++++++++++
super-intel.c | 15 +++++++++++++++
tests/imsm-grow-template | 8 ++++++--
4 files changed, 57 insertions(+), 2 deletions(-)

--
Signature
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 1/3] imsm: FIX: existing backup file fails unit tests

am 14.03.2011 15:09:29 von adam.kwolek

During normal test execution, backup file is deleted after test execution.
If test is interrupted/broken, backup file can remain for next run.
When backup file exists before unit test run, suits 12 and 13 fails.

To avoid this remove backup file before grow is executed.

Signed-off-by: Adam Kwolek
---

tests/imsm-grow-template | 8 ++++++--
1 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tests/imsm-grow-template b/tests/imsm-grow-template
index 7c212c4..d03752d 100644
--- a/tests/imsm-grow-template
+++ b/tests/imsm-grow-template
@@ -17,8 +17,10 @@ function grow_member() {
local offset=$6
local chunk=$7
local array_size=$((comps * size))
+ local backup_imsm=/tmp/backup_imsm

- ( set -ex; mdadm --grow $member --chunk=$chunk --level=$level --backup-file=/tmp/backup_imsm )
+ rm -f $backup_imsm
+ ( set -ex; mdadm --grow $member --chunk=$chunk --level=$level --backup-file=$backup_imsm )
local status=$?
if [ $negative_test -ne 0 ]; then
if [ $status -eq 0 ]; then
@@ -71,6 +73,7 @@ done

imsm_check container $num_disks
num_disks=$((num_disks + add_to_num_disks))
+backup_imsm=/tmp/backup_imsm

# Grow each member or a container depending on the type of an operation
if [ $migration_test -ne 0 ]; then
@@ -82,7 +85,8 @@ if [ $migration_test -ne 0 ]; then
grow_member $member1 $new_num_disks $vol1_new_num_comps $vol1_new_level $vol1_comp_size $vol1_offset $vol1_new_chunk
fi
else
- ( set -x; mdadm --grow $container --raid-disks=$num_disks --backup-file=/tmp/backup_imsm )
+ rm -f $backup_imsm
+ ( set -x; mdadm --grow $container --raid-disks=$num_disks --backup-file=$backup_imsm )
grow_status=$?
if [ $negative_test -ne 0 ]; then
if [ $grow_status -eq 0 ]; then

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 2/3] External metadata has to be restored to initial state in

am 14.03.2011 15:09:38 von adam.kwolek

When any error occurs in reshape_array(), mdadm should have ability
to rollback changes. Based on metadata knowledge abort_metadata_changes()
should decide:
1. metadata abort is allowed?
In case of container operation abort can be allowed on first array only
It depends on metadata implementation.
2. If rollback is possible, execute required action to prepare update.

Returned value is:
1 if rollback succeed (send update)
0 if rollback not possible (no action)
-1 for other error

Metadata handler can be called only if reshape is not started in md.

Signed-off-by: Adam Kwolek
---

Grow.c | 26 ++++++++++++++++++++++++++
mdadm.h | 10 ++++++++++
2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/Grow.c b/Grow.c
index 40e693e..26fd7eb 100644
--- a/Grow.c
+++ b/Grow.c
@@ -594,6 +594,31 @@ static void sync_metadata(struct supertype *st)
}
}

+static int abort_metadata_changes(struct supertype *st, struct mdinfo *sra,
+ int restart)
+{
+ int ret_val = 0;
+ char action[40];
+
+ /* for not restarted external metadata only
+ */
+ if (!st->ss->external || restart)
+ return ret_val;
+
+ /* check if reshape is not started in md
+ */
+ if (sra && sysfs_get_str(sra, NULL, "sync_action", action, 40) > 0 &&
+ strncmp(action, "reshape", 7) == 0)
+ return ret_val;
+
+ if (st->ss->abort_metadata_changes)
+ ret_val = st->ss->abort_metadata_changes(st);
+ if (ret_val > 0)
+ sync_metadata(st);
+
+ return ret_val;
+}
+
static int subarray_set_num(char *container, struct mdinfo *sra, char *name, int n)
{
/* when dealing with external metadata subarrays we need to be
@@ -2187,6 +2212,7 @@ out:
exit(0);

release:
+ abort_metadata_changes(st, sra, restart);
if (orig_level != UnSet && sra) {
c = map_num(pers, orig_level);
if (c && sysfs_set_str(sra, NULL, "level", c) == 0)
diff --git a/mdadm.h b/mdadm.h
index d3ed50a..d502b13 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -727,6 +727,16 @@ extern struct superswitch {
struct supertype *st, unsigned long blocks,
int *fds, unsigned long long *offsets,
int dests, int *destfd, unsigned long long *destoffsets);
+ /* abort_metadata_changes() will rollbacks metadata changes
+ * in case of error.
+ * It can be called for not restart case
+ * and when reshape has not been started in md.
+ * Function should return values:
+ * 1 if abort succeed
+ * 0 if abort not possible
+ * -1 for other error
+ */
+ int (*abort_metadata_changes)(struct supertype *st);

/* for mdmon */
int (*open_new)(struct supertype *c, struct active_array *a,

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 3/3] imsm: Add metadata abort changes handler template

am 14.03.2011 15:09:47 von adam.kwolek

Add metadata rollback function implementation/placeholder
for imsm metadata.

Signed-off-by: Adam Kwolek
---

super-intel.c | 15 +++++++++++++++
1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 44c100b..2c2bd1e 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7364,6 +7364,20 @@ static int imsm_manage_reshape(
afd, sra, reshape, st, stripes,
fds, offsets, dests, destfd, destoffsets);
}
+
+/* imsm metadata rollback handler, manage metadata for error scenario
+ */
+static int imsm_abort_metadata_changes(struct supertype *st)
+{
+ int ret_val = -1;
+
+ dprintf("imsm: imsm_abort_metadata_changes() called\n");
+
+ /* ToDo: when possible, prepare rollback metadata update
+ */
+
+ return ret_val;
+}
#endif /* MDASSEMBLE */

struct superswitch super_imsm = {
@@ -7386,6 +7400,7 @@ struct superswitch super_imsm = {
.get_disk_controller_domain = imsm_get_disk_controller_domain,
.reshape_super = imsm_reshape_super,
.manage_reshape = imsm_manage_reshape,
+ .abort_metadata_changes = imsm_abort_metadata_changes,
#endif
.match_home = match_home_imsm,
.uuid_from_super= uuid_from_super_imsm,

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] UT and error case changes

am 14.03.2011 22:53:46 von NeilBrown

On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek wrote:

> The following series implements 2 changes:
> 1. Fix for unit tests failure.
> UT suits 12 and 13 fails, when backup file cannot be opened
> for grow operation (backup file exists already).

Thanks - applied.

>
> 2. I've got proposal for handling/rollback metadata in error case.
> Currently in case of error external metadata can remain in reshape state.
> In some cases metadata can be automatically restored to initial state
> (i.e. metadata during imsm container operation can be rolled back
> when error occurs on first reshaped array before reshape is started).
> For such cases, additional superswitch function can be introduced.
>
> Metadata shouldn't be rolled back in restart case.
> I'm passing restart flag to abort function in Grow.c only, as this is general rule.
> In the same way array reshape state is checked.
>
> This is proposal, so I've put no implementation in to imsm handler (no metadata update is created yet).
> Please let me know your opinion. If you will like it, I'll fill out gaps in imsm code.

I'm not sure.. it sounds like it might be a good idea, but I'd like to have
some concrete examples to help me think about it.

Do you have an example in mind of an error which might be detected after
the metadata has been updated, but before the reshape has actually started,
and for which reverting the metadata update is likely to be useful?

The more general case of changing your mind just after a reshape has
started and asking the reshape to revert would certainly be useful, but
md isn't capable of that in general (yet).

Thanks,
NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: [PATCH 0/3] UT and error case changes

am 15.03.2011 08:28:00 von adam.kwolek

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> owner@vger.kernel.org] On Behalf Of NeilBrown
> Sent: Monday, March 14, 2011 10:54 PM
> To: Kwolek, Adam
> Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 0/3] UT and error case changes
>
> On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek
> wrote:
>
> > The following series implements 2 changes:
> > 1. Fix for unit tests failure.
> > UT suits 12 and 13 fails, when backup file cannot be opened
> > for grow operation (backup file exists already).
>
> Thanks - applied.
>
>
> >
> > 2. I've got proposal for handling/rollback metadata in error case.
> > Currently in case of error external metadata can remain in reshape
> state.
> > In some cases metadata can be automatically restored to initial state
> > (i.e. metadata during imsm container operation can be rolled back
> > when error occurs on first reshaped array before reshape is started).
> > For such cases, additional superswitch function can be introduced.
> >
> > Metadata shouldn't be rolled back in restart case.
> > I'm passing restart flag to abort function in Grow.c only, as this is
> general rule.
> > In the same way array reshape state is checked.
> >
> > This is proposal, so I've put no implementation in to imsm handler (no
> metadata update is created yet).
> > Please let me know your opinion. If you will like it, I'll fill out
> gaps in imsm code.
>
> I'm not sure.. it sounds like it might be a good idea, but I'd like to
> have
> some concrete examples to help me think about it.

I've got this idea during correcting UT so, i.e.:
wrong/already existing/ backup file name passed by used.
Backup file verification/opening is when metadata is in reshape state already.
This error causes mdadm exit at reshape position == 0. It is too early position for reshape restart.
User has manually restore metadata information and this can be quite difficult.

Metadata rollback can be possible for container operation if:
1. external metadata case (checked in Grow.c)
2. This is action on first array in container (check in metadata handler)
2. md doesn't start reshape (checked in Grow.c)
3. Other conditions/metadata specific (checked in Grow.c)

We can also pass some additional status to reshape_container() to indicate that metadata is rolled back
and allow for container unfreeze in such case also.
This can allow mdadm to leave array/container in state that work is started from (hopefully ;)).

BR
Adam

>
> Do you have an example in mind of an error which might be detected after
> the metadata has been updated, but before the reshape has actually
> started,
> and for which reverting the metadata update is likely to be useful?
>
> The more general case of changing your mind just after a reshape has
> started and asking the reshape to revert would certainly be useful, but
> md isn't capable of that in general (yet).
>
> Thanks,
> NeilBrown
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 0/3] UT and error case changes

am 18.03.2011 03:07:04 von NeilBrown

On Tue, 15 Mar 2011 07:28:00 +0000 "Kwolek, Adam"
wrote:

>
>
> > -----Original Message-----
> > From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
> > owner@vger.kernel.org] On Behalf Of NeilBrown
> > Sent: Monday, March 14, 2011 10:54 PM
> > To: Kwolek, Adam
> > Cc: linux-raid@vger.kernel.org; Williams, Dan J; Ciechanowski, Ed;
> > Neubauer, Wojciech
> > Subject: Re: [PATCH 0/3] UT and error case changes
> >
> > On Mon, 14 Mar 2011 15:09:20 +0100 Adam Kwolek
> > wrote:
> >
> > > The following series implements 2 changes:
> > > 1. Fix for unit tests failure.
> > > UT suits 12 and 13 fails, when backup file cannot be opened
> > > for grow operation (backup file exists already).
> >
> > Thanks - applied.
> >
> >
> > >
> > > 2. I've got proposal for handling/rollback metadata in error case.
> > > Currently in case of error external metadata can remain in reshape
> > state.
> > > In some cases metadata can be automatically restored to initial state
> > > (i.e. metadata during imsm container operation can be rolled back
> > > when error occurs on first reshaped array before reshape is started).
> > > For such cases, additional superswitch function can be introduced.
> > >
> > > Metadata shouldn't be rolled back in restart case.
> > > I'm passing restart flag to abort function in Grow.c only, as this is
> > general rule.
> > > In the same way array reshape state is checked.
> > >
> > > This is proposal, so I've put no implementation in to imsm handler (no
> > metadata update is created yet).
> > > Please let me know your opinion. If you will like it, I'll fill out
> > gaps in imsm code.
> >
> > I'm not sure.. it sounds like it might be a good idea, but I'd like to
> > have
> > some concrete examples to help me think about it.
>
>
>
> I've got this idea during correcting UT so, i.e.:
> wrong/already existing/ backup file name passed by used.
> Backup file verification/opening is when metadata is in reshape state already.
> This error causes mdadm exit at reshape position == 0. It is too early position for reshape restart.
> User has manually restore metadata information and this can be quite difficult.
>
> Metadata rollback can be possible for container operation if:
> 1. external metadata case (checked in Grow.c)
> 2. This is action on first array in container (check in metadata handler)
> 2. md doesn't start reshape (checked in Grow.c)
> 3. Other conditions/metadata specific (checked in Grow.c)
>
> We can also pass some additional status to reshape_container() to indicate that metadata is rolled back
> and allow for container unfreeze in such case also.
> This can allow mdadm to leave array/container in state that work is started from (hopefully ;)).
>
> BR
> Adam
>

I think I see your point - there are some issues with some failure
possibilities at start-up.

We should of course do as much checking as possible before committing to the
reshape, and I think we do. But it is still possible that something could go
wrong.

It would probably be good to take the first backup after freezing IO, but
before updating the metadata to show that a reshape has started. That would
ensure that a crash at any time will either be able to replay the backup and
continue the reshape, or won't see the reshape at all.

It might be a bit awkward to do that with the current code, I'm not sure.

Another possibility is to at least write out a 'zeroed' backup file and
arrange that when we restart and array, if the backup file is empty, then
that is a clear sign that no reshape has started and
->update_super("_reshape_progress") could revert the reshape.

Longer term: I want to eventually teach md to be able to revert a reshape
that has already started. In that case we will need some sort of metadata
method to record that reshape is going backwards - though it may end up
looking like something we already have...

NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Something wrong with __prep_thunderdome in super-intel.c

am 22.03.2011 03:23:07 von NeilBrown

Hi Dan,

I think you were the original author of imsm_thunderdome and
__prep_thunderdome - yes?

I found a case (thank to the test suite) where it isn't working correctly,

If I have a container with 2 devices, and the second one is failed, then
imsm_thunderdome returns NULL.

__prep_thunderdome sees the first and adds it to the table of superblocks.
Then it sees the second notices the family_num and checksum are the same, and
so replaces the first with the second in the table.

Then in imsm_thunderdome, d->serial is full of 'nul', so disk_list_get
doesn't find anything so the super_table becomes empty and nothing works.

So it could be:
load_and_parse_mpb is wrong for putting the nul serial in there
__prep_thunderdome is wrong for thinking the two are equivalent
imsm_thunderdome is wrong for giving up when just one device isn't found

and I really don't know which.

You can easily reproduce with

./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01]
./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm
./mdadm /dev/md/r1 -f /dev/loop1
./mdadm -E /dev/md/imsm

and notice that nothing gets printed.
If you fail loop0 instead, it works properly.

Thanks,
NeilBrown
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Something wrong with __prep_thunderdome in super-intel.c

am 25.03.2011 03:40:46 von dan.j.williams

On Mon, 2011-03-21 at 19:23 -0700, NeilBrown wrote:
> Hi Dan,
>
> I think you were the original author of imsm_thunderdome and
> __prep_thunderdome - yes?

yes.

>
> I found a case (thank to the test suite) where it isn't working correctly,
>
> If I have a container with 2 devices, and the second one is failed, then
> imsm_thunderdome returns NULL.
>
> __prep_thunderdome sees the first and adds it to the table of superblocks.
> Then it sees the second notices the family_num and checksum are the same, and
> so replaces the first with the second in the table.
>
> Then in imsm_thunderdome, d->serial is full of 'nul', so disk_list_get
> doesn't find anything so the super_table becomes empty and nothing works.
>
> So it could be:
> load_and_parse_mpb is wrong for putting the nul serial in there
> __prep_thunderdome is wrong for thinking the two are equivalent
> imsm_thunderdome is wrong for giving up when just one device isn't found
>
> and I really don't know which.

hmm...

>
> You can easily reproduce with
>
> ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01]
> ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm
> ./mdadm /dev/md/r1 -f /dev/loop1
> ./mdadm -E /dev/md/imsm
>
> and notice that nothing gets printed.
> If you fail loop0 instead, it works properly.

The first thought is that the generation number on the good device
should be ahead of the bad, but lo and behold it isn't. We should stop
advancing the generation number on failed devices. The regression that
Krzysztof's patch somewhat addresses is that thunderdome expects that
failed devices should at least be able to look themselves up in their
own mpb. Fixing up the serial number so that other devices see the
other disk as out of date is the expectation.

Stopping metadata write outs to failed devices fixes both problems.
Krzysztof care to try the following with your recent change reverted? I
verified it addresses the problem above, but I have not had a chance to
double check the orom compatibility (although I suspect it will be ok).
This also simplifies the calling convention to mark_missing and
mark_failure.

--
Dan

diff --git a/super-intel.c b/super-intel.c
index 2b41e08..6a6f738 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev)
}

/* return true if we recorded new information */
-static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
+static int mark_failure(struct imsm_dev *dev, struct dl *dl)
{
__u32 ord;
int slot;
struct imsm_map *map;
char buf[MAX_RAID_SERIAL_LEN+3];
- unsigned int len, shift = 0;
+ struct imsm_disk *disk = &dl->disk;
+ unsigned int len, shift = 0, idx = dl->index;

/* new failures are always set in map[0] */
map = get_imsm_map(dev, 0);
@@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
return 0;

+ dl->index = -2;
sprintf(buf, "%s:0", disk->serial);
if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
shift = len - MAX_RAID_SERIAL_LEN + 1;
@@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
return 1;
}

-static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
+static void mark_missing(struct imsm_dev *dev, struct dl *dl)
{
- mark_failure(dev, disk, idx);
+ struct imsm_disk *disk = &dl->disk;
+
+ mark_failure(dev, dl);

if (disk->scsi_id == __cpu_to_le32(~(__u32)0))
return;
@@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
dprintf("imsm: mark missing\n");
end_migration(dev, map_state);
for (dl = super->missing; dl; dl = dl->next)
- mark_missing(dev, &dl->disk, dl->index);
+ mark_missing(dev, dl);
super->updates_pending++;
}

@@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
struct intel_super *super = a->container->sb;
struct imsm_dev *dev = get_imsm_dev(super, inst);
struct imsm_map *map = get_imsm_map(dev, 0);
- struct imsm_disk *disk;
+ struct dl *dl;
int failed;
__u32 ord;
__u8 map_state;
@@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
dprintf("imsm: set_disk %d:%x\n", n, state);

ord = get_imsm_ord_tbl_ent(dev, n, -1);
- disk = get_imsm_disk(super, ord_to_idx(ord));
+ dl = get_imsm_dl_disk(super, ord_to_idx(ord));

/* check for new failures */
if (state & DS_FAULTY) {
- if (mark_failure(dev, disk, ord_to_idx(ord)))
+ if (mark_failure(dev, dl))
super->updates_pending++;
}

@@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
for (du = super->missing; du; du = du->next)
if (du->index >= 0) {
set_imsm_ord_tbl_ent(map, du->index, du->index);
- mark_missing(dev_new, &du->disk, du->index);
+ mark_missing(dev_new, du);
}

return 1;

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: Something wrong with __prep_thunderdome in super-intel.c

am 25.03.2011 09:43:26 von adam.kwolek

DQoNCj4gLS0tLS1PcmlnaW5hbCBNZXNzYWdlLS0tLS0NCj4gRnJvbTogV2ls bGlhbXMsIERhbiBK
DQo+IFNlbnQ6IEZyaWRheSwgTWFyY2ggMjUsIDIwMTEgMzo0MSBBTQ0KPiBU bzogTmVpbEJyb3du
DQo+IENjOiBLd29sZWssIEFkYW07IGxpbnV4LXJhaWRAdmdlci5rZXJuZWwu b3JnOyBDaWVjaGFu
b3dza2ksIEVkOw0KPiBOZXViYXVlciwgV29qY2llY2g7IFdvamNpaywgS3J6 eXN6dG9mDQo+IFN1
YmplY3Q6IFJlOiBTb21ldGhpbmcgd3Jvbmcgd2l0aCBfX3ByZXBfdGh1bmRl cmRvbWUgaW4gc3Vw
ZXItaW50ZWwuYw0KPiANCj4gT24gTW9uLCAyMDExLTAzLTIxIGF0IDE5OjIz IC0wNzAwLCBOZWls
QnJvd24gd3JvdGU6DQo+ID4gSGkgRGFuLA0KPiA+DQo+ID4gIEkgdGhpbmsg eW91IHdlcmUgdGhl
IG9yaWdpbmFsIGF1dGhvciBvZiBpbXNtX3RodW5kZXJkb21lIGFuZA0KPiA+ ICBfX3ByZXBfdGh1
bmRlcmRvbWUgLSB5ZXM/DQo+IA0KPiB5ZXMuDQo+IA0KPiA+DQo+ID4gIEkg Zm91bmQgYSBjYXNl
ICh0aGFuayB0byB0aGUgdGVzdCBzdWl0ZSkgd2hlcmUgaXQgaXNuJ3Qgd29y a2luZw0KPiBjb3Jy
ZWN0bHksDQo+ID4NCj4gPiAgSWYgSSBoYXZlIGEgY29udGFpbmVyIHdpdGgg MiBkZXZpY2VzLCBh
bmQgdGhlIHNlY29uZCBvbmUgaXMgZmFpbGVkLA0KPiB0aGVuDQo+ID4gIGlt c21fdGh1bmRlcmRv
bWUgcmV0dXJucyBOVUxMLg0KPiA+DQo+ID4gIF9fcHJlcF90aHVuZGVyZG9t ZSBzZWVzIHRoZSBm
aXJzdCBhbmQgYWRkcyBpdCB0byB0aGUgdGFibGUgb2YNCj4gc3VwZXJibG9j a3MuDQo+ID4gIFRo
ZW4gaXQgc2VlcyB0aGUgc2Vjb25kIG5vdGljZXMgdGhlIGZhbWlseV9udW0g YW5kIGNoZWNrc3Vt
IGFyZSB0aGUNCj4gc2FtZSwgYW5kDQo+ID4gIHNvIHJlcGxhY2VzIHRoZSBm aXJzdCB3aXRoIHRo
ZSBzZWNvbmQgaW4gdGhlIHRhYmxlLg0KPiA+DQo+ID4gIFRoZW4gaW4gaW1z bV90aHVuZGVyZG9t
ZSwgZC0+c2VyaWFsIGlzIGZ1bGwgb2YgJ251bCcsIHNvDQo+IGRpc2tfbGlz dF9nZXQNCj4gPiAg
ZG9lc24ndCBmaW5kIGFueXRoaW5nIHNvIHRoZSBzdXBlcl90YWJsZSBiZWNv bWVzIGVtcHR5IGFu
ZCBub3RoaW5nDQo+IHdvcmtzLg0KPiA+DQo+ID4gIFNvIGl0IGNvdWxkIGJl Og0KPiA+ICAgICBs
b2FkX2FuZF9wYXJzZV9tcGIgaXMgd3JvbmcgZm9yIHB1dHRpbmcgdGhlIG51 bCBzZXJpYWwgaW4g
dGhlcmUNCj4gPiAgICAgX19wcmVwX3RodW5kZXJkb21lIGlzIHdyb25nIGZv ciB0aGlua2luZyB0
aGUgdHdvIGFyZSBlcXVpdmFsZW50DQo+ID4gICAgIGltc21fdGh1bmRlcmRv bWUgaXMgd3Jvbmcg
Zm9yIGdpdmluZyB1cCB3aGVuIGp1c3Qgb25lIGRldmljZSBpc24ndA0KPiBm b3VuZA0KPiA+DQo+
ID4gIGFuZCBJIHJlYWxseSBkb24ndCBrbm93IHdoaWNoLg0KPiANCj4gaG1t Li4uDQo+IA0KPiA8
Y29udGV4dCBzd2l0Y2ggb3V0IG9mIGlzY2kgZHJpdmVyIHJldmlldyBtb2Rl Pg0KPiANCj4gPg0K
PiA+IFlvdSBjYW4gZWFzaWx5IHJlcHJvZHVjZSB3aXRoDQo+ID4NCj4gPiAg IC4vbWRhZG0gLUNS
IC9kZXYvbWQvaW1zbSAtZSBpbXNtIC1uIDIgL2Rldi9sb29wWzAxXQ0KPiA+ ICAgLi9tZGFkbSAt
Q1IgL2Rldi9tZC9yMSAtbDEgLW4yIC9kZXYvbWQvaW1zbQ0KPiA+ICAgLi9t ZGFkbSAvZGV2L21k
L3IxIC1mIC9kZXYvbG9vcDENCj4gPiAgIC4vbWRhZG0gLUUgL2Rldi9tZC9p bXNtDQo+ID4NCj4g
PiBhbmQgbm90aWNlIHRoYXQgbm90aGluZyBnZXRzIHByaW50ZWQuDQo+ID4g SWYgeW91IGZhaWwg
bG9vcDAgaW5zdGVhZCwgaXQgd29ya3MgcHJvcGVybHkuDQo+IA0KPiBUaGUg Zmlyc3QgdGhvdWdo
dCBpcyB0aGF0IHRoZSBnZW5lcmF0aW9uIG51bWJlciBvbiB0aGUgZ29vZCBk ZXZpY2UNCj4gc2hv
dWxkIGJlIGFoZWFkIG9mIHRoZSBiYWQsIGJ1dCBsbyBhbmQgYmVob2xkIGl0 IGlzbid0LiAgV2Ug
c2hvdWxkIHN0b3ANCj4gYWR2YW5jaW5nIHRoZSBnZW5lcmF0aW9uIG51bWJl ciBvbiBmYWlsZWQg
ZGV2aWNlcy4gIFRoZSByZWdyZXNzaW9uIHRoYXQNCj4gS3J6eXN6dG9mJ3Mg cGF0Y2ggc29tZXdo
YXQgYWRkcmVzc2VzIGlzIHRoYXQgdGh1bmRlcmRvbWUgZXhwZWN0cyB0aGF0 DQo+IGZhaWxlZCBk
ZXZpY2VzIHNob3VsZCBhdCBsZWFzdCBiZSBhYmxlIHRvIGxvb2sgdGhlbXNl bHZlcyB1cCBpbiB0
aGVpcg0KPiBvd24gbXBiLiAgRml4aW5nIHVwIHRoZSBzZXJpYWwgbnVtYmVy IHNvIHRoYXQgb3Ro
ZXIgZGV2aWNlcyBzZWUgdGhlDQo+IG90aGVyIGRpc2sgYXMgb3V0IG9mIGRh dGUgaXMgdGhlIGV4
cGVjdGF0aW9uLg0KPiANCj4gU3RvcHBpbmcgbWV0YWRhdGEgd3JpdGUgb3V0 cyB0byBmYWlsZWQg
ZGV2aWNlcyBmaXhlcyBib3RoIHByb2JsZW1zLg0KPiBLcnp5c3p0b2YgY2Fy ZSB0byB0cnkgdGhl
IGZvbGxvd2luZyB3aXRoIHlvdXIgcmVjZW50IGNoYW5nZSByZXZlcnRlZD8g IEkNCj4gdmVyaWZp
ZWQgaXQgYWRkcmVzc2VzIHRoZSBwcm9ibGVtIGFib3ZlLCBidXQgSSBoYXZl IG5vdCBoYWQgYSBj
aGFuY2UgdG8NCj4gZG91YmxlIGNoZWNrIHRoZSBvcm9tIGNvbXBhdGliaWxp dHkgKGFsdGhvdWdo
IEkgc3VzcGVjdCBpdCB3aWxsIGJlIG9rKS4NCj4gVGhpcyBhbHNvIHNpbXBs aWZpZXMgdGhlIGNh
bGxpbmcgY29udmVudGlvbiB0byBtYXJrX21pc3NpbmcgYW5kDQo+IG1hcmtf ZmFpbHVyZS4NCj4g
DQo+IC0tDQo+IERhbg0KPiANCj4gZGlmZiAtLWdpdCBhL3N1cGVyLWludGVs LmMgYi9zdXBlci1p
bnRlbC5jDQo+IGluZGV4IDJiNDFlMDguLjZhNmY3MzggMTAwNjQ0DQo+IC0t LSBhL3N1cGVyLWlu
dGVsLmMNCj4gKysrIGIvc3VwZXItaW50ZWwuYw0KPiBAQCAtNTIxMywxMyAr NTIxMywxNCBAQCBz
dGF0aWMgaW50IGlzX3Jlc3luY2luZyhzdHJ1Y3QgaW1zbV9kZXYgKmRldikN Cj4gIH0NCj4gDQo+
ICAvKiByZXR1cm4gdHJ1ZSBpZiB3ZSByZWNvcmRlZCBuZXcgaW5mb3JtYXRp b24gKi8NCj4gLXN0
YXRpYyBpbnQgbWFya19mYWlsdXJlKHN0cnVjdCBpbXNtX2RldiAqZGV2LCBz dHJ1Y3QgaW1zbV9k
aXNrICpkaXNrLA0KPiBpbnQgaWR4KQ0KPiArc3RhdGljIGludCBtYXJrX2Zh aWx1cmUoc3RydWN0
IGltc21fZGV2ICpkZXYsIHN0cnVjdCBkbCAqZGwpDQo+ICB7DQo+ICAJX191 MzIgb3JkOw0KPiAg
CWludCBzbG90Ow0KPiAgCXN0cnVjdCBpbXNtX21hcCAqbWFwOw0KPiAgCWNo YXIgYnVmW01BWF9S
QUlEX1NFUklBTF9MRU4rM107DQo+IC0JdW5zaWduZWQgaW50IGxlbiwgc2hp ZnQgPSAwOw0KPiAr
CXN0cnVjdCBpbXNtX2Rpc2sgKmRpc2sgPSAmZGwtPmRpc2s7DQo+ICsJdW5z aWduZWQgaW50IGxl
biwgc2hpZnQgPSAwLCBpZHggPSBkbC0+aW5kZXg7DQo+IA0KPiAgCS8qIG5l dyBmYWlsdXJlcyBh
cmUgYWx3YXlzIHNldCBpbiBtYXBbMF0gKi8NCj4gIAltYXAgPSBnZXRfaW1z bV9tYXAoZGV2LCAw
KTsNCj4gQEAgLTUyMzIsNiArNTIzMyw3IEBAIHN0YXRpYyBpbnQgbWFya19m YWlsdXJlKHN0cnVj
dCBpbXNtX2RldiAqZGV2LA0KPiBzdHJ1Y3QgaW1zbV9kaXNrICpkaXNrLCBp bnQgaWR4KQ0KPiAg
CWlmIChpc19mYWlsZWQoZGlzaykgJiYgKG9yZCAmIElNU01fT1JEX1JFQlVJ TEQpKQ0KPiAgCQly
ZXR1cm4gMDsNCj4gDQo+ICsJZGwtPmluZGV4ID0gLTI7DQo+ICAJc3ByaW50 ZihidWYsICIlczow
IiwgZGlzay0+c2VyaWFsKTsNCj4gIAlpZiAoKGxlbiA9IHN0cmxlbihidWYp KSA+PSBNQVhfUkFJ
RF9TRVJJQUxfTEVOKQ0KPiAgCQlzaGlmdCA9IGxlbiAtIE1BWF9SQUlEX1NF UklBTF9MRU4gKyAx
Ow0KPiBAQCAtNTI0NCw5ICs1MjQ2LDExIEBAIHN0YXRpYyBpbnQgbWFya19m YWlsdXJlKHN0cnVj
dCBpbXNtX2RldiAqZGV2LA0KPiBzdHJ1Y3QgaW1zbV9kaXNrICpkaXNrLCBp bnQgaWR4KQ0KPiAg
CXJldHVybiAxOw0KPiAgfQ0KPiANCj4gLXN0YXRpYyB2b2lkIG1hcmtfbWlz c2luZyhzdHJ1Y3Qg
aW1zbV9kZXYgKmRldiwgc3RydWN0IGltc21fZGlzayAqZGlzaywNCj4gaW50 IGlkeCkNCj4gK3N0
YXRpYyB2b2lkIG1hcmtfbWlzc2luZyhzdHJ1Y3QgaW1zbV9kZXYgKmRldiwg c3RydWN0IGRsICpk
bCkNCj4gIHsNCj4gLQltYXJrX2ZhaWx1cmUoZGV2LCBkaXNrLCBpZHgpOw0K PiArCXN0cnVjdCBp
bXNtX2Rpc2sgKmRpc2sgPSAmZGwtPmRpc2s7DQo+ICsNCj4gKwltYXJrX2Zh aWx1cmUoZGV2LCBk
bCk7DQo+IA0KPiAgCWlmIChkaXNrLT5zY3NpX2lkID09IF9fY3B1X3RvX2xl MzIofihfX3UzMikw
KSkNCj4gIAkJcmV0dXJuOw0KPiBAQCAtNTI2OSw3ICs1MjczLDcgQEAgc3Rh dGljIHZvaWQgaGFu
ZGxlX21pc3Npbmcoc3RydWN0IGludGVsX3N1cGVyDQo+ICpzdXBlciwgc3Ry dWN0IGltc21fZGV2
ICpkZXYpDQo+ICAJZHByaW50ZigiaW1zbTogbWFyayBtaXNzaW5nXG4iKTsN Cj4gIAllbmRfbWln
cmF0aW9uKGRldiwgbWFwX3N0YXRlKTsNCj4gIAlmb3IgKGRsID0gc3VwZXIt Pm1pc3Npbmc7IGRs
OyBkbCA9IGRsLT5uZXh0KQ0KPiAtCQltYXJrX21pc3NpbmcoZGV2LCAmZGwt PmRpc2ssIGRsLT5p
bmRleCk7DQo+ICsJCW1hcmtfbWlzc2luZyhkZXYsIGRsKTsNCj4gIAlzdXBl ci0+dXBkYXRlc19w
ZW5kaW5nKys7DQo+ICB9DQo+IA0KPiBAQCAtNTQ5Nyw3ICs1NTAxLDcgQEAg c3RhdGljIHZvaWQg
aW1zbV9zZXRfZGlzayhzdHJ1Y3QgYWN0aXZlX2FycmF5ICphLA0KPiBpbnQg biwgaW50IHN0YXRl
KQ0KPiAgCXN0cnVjdCBpbnRlbF9zdXBlciAqc3VwZXIgPSBhLT5jb250YWlu ZXItPnNiOw0KPiAg
CXN0cnVjdCBpbXNtX2RldiAqZGV2ID0gZ2V0X2ltc21fZGV2KHN1cGVyLCBp bnN0KTsNCj4gIAlz
dHJ1Y3QgaW1zbV9tYXAgKm1hcCA9IGdldF9pbXNtX21hcChkZXYsIDApOw0K PiAtCXN0cnVjdCBp
bXNtX2Rpc2sgKmRpc2s7DQo+ICsJc3RydWN0IGRsICpkbDsNCj4gIAlpbnQg ZmFpbGVkOw0KPiAg
CV9fdTMyIG9yZDsNCj4gIAlfX3U4IG1hcF9zdGF0ZTsNCj4gQEAgLTU1MTIs MTEgKzU1MTYsMTEg
QEAgc3RhdGljIHZvaWQgaW1zbV9zZXRfZGlzayhzdHJ1Y3QgYWN0aXZlX2Fy cmF5DQo+ICphLCBp
bnQgbiwgaW50IHN0YXRlKQ0KPiAgCWRwcmludGYoImltc206IHNldF9kaXNr ICVkOiV4XG4iLCBu
LCBzdGF0ZSk7DQo+IA0KPiAgCW9yZCA9IGdldF9pbXNtX29yZF90YmxfZW50 KGRldiwgbiwgLTEp
Ow0KPiAtCWRpc2sgPSBnZXRfaW1zbV9kaXNrKHN1cGVyLCBvcmRfdG9faWR4 KG9yZCkpOw0KPiAr
CWRsID0gZ2V0X2ltc21fZGxfZGlzayhzdXBlciwgb3JkX3RvX2lkeChvcmQp KTsNCj4gDQo+ICAJ
LyogY2hlY2sgZm9yIG5ldyBmYWlsdXJlcyAqLw0KPiAgCWlmIChzdGF0ZSAm IERTX0ZBVUxUWSkg
ew0KPiAtCQlpZiAobWFya19mYWlsdXJlKGRldiwgZGlzaywgb3JkX3RvX2lk eChvcmQpKSkNCj4g
KwkJaWYgKG1hcmtfZmFpbHVyZShkZXYsIGRsKSkNCj4gIAkJCXN1cGVyLT51 cGRhdGVzX3BlbmRp
bmcrKzsNCj4gIAl9DQo+IA0KPiBAQCAtNjI2NSw3ICs2MjY5LDcgQEAgc3Rh dGljIGludCBhcHBs
eV90YWtlb3Zlcl91cGRhdGUoc3RydWN0DQo+IGltc21fdXBkYXRlX3Rha2Vv dmVyICp1LA0KPiAg
CWZvciAoZHUgPSBzdXBlci0+bWlzc2luZzsgZHU7IGR1ID0gZHUtPm5leHQp DQo+ICAJCWlmIChk
dS0+aW5kZXggPj0gMCkgew0KPiAgCQkJc2V0X2ltc21fb3JkX3RibF9lbnQo bWFwLCBkdS0+aW5k
ZXgsIGR1LT5pbmRleCk7DQo+IC0JCQltYXJrX21pc3NpbmcoZGV2X25ldywg JmR1LT5kaXNrLCBk
dS0+aW5kZXgpOw0KPiArCQkJbWFya19taXNzaW5nKGRldl9uZXcsIGR1KTsN Cj4gIAkJfQ0KPiAN
Cj4gIAlyZXR1cm4gMTsNCj4gDQoNCg0KVW5mb3J0dW5hdGVseSB0aGlzIGRv ZXNuJ3QgY2xvc2Ug
aXNzdWUuIA0KDQpJJ3ZlIHJlYWNoZWQgdGhlIHNhbWUgZnJvbSBJTVNNIGNv bXBhdGliaWxpdHkg
cG9pbnQgb2YgdmlldywgYnV0IGhhdmUgaW4gbWluZCB0aGF0IHVuaXQgdGVz dCBzdWl0IDExIGZh
aWxzIHVzaW5nIHRoaXMgYXBwcm9hY2guDQpQdXQgZm9jdXMgb24gdGVzdCA1 IGluIHRoaXMgc3Vp
dCAoSSdtIHVzaW5nIGl0LCBhcyB0aGlzIGlzIHRoZSBvbmx5IG9uZSB0aGF0 IGZhaWxzIGZvciBt
ZSkgZm9yIElNU00gKG5vdGUgdGhhdCBmaXJzdCB0ZXN0IHBhc3MgaXMgZm9y IG5hdGl2ZSBtZXRh
ZGF0YSkuDQpVc2luZyB5b3VyIHBhdGggdGVzdHMgNWIgYW5kIDVjIGZhaWxz LiBNeSBjb2RlIGZh
aWxzIDVjIHRlc3Qgb25seS4gSSdtIGN1cnJlbnRseSB3b3JraW5nIG9uIGl0 Lg0KDQpCUg0KQWRh
bQ0KDQo=
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Something wrong with __prep_thunderdome in super-intel.c

am 25.03.2011 19:50:27 von dan.j.williams

On 3/25/2011 1:43 AM, Kwolek, Adam wrote:
>
> Unfortunately this doesn't close issue.
>
> I've reached the same from IMSM compatibility point of view, but have in mind that unit test suit 11 fails using this approach.
> Put focus on test 5 in this suit (I'm using it, as this is the only one that fails for me) for IMSM (note that first test pass is for native metadata).
> Using your path tests 5b and 5c fails. My code fails 5c test only. I'm currently working on it.

I am seeing a different failing set (after applying Anna's patches): 1,
1a, 2, 4, 5b, 5c...

One thing too look out for is that failed disks will no longer self
identify. Prior to this mdadm would update the metadata on failed disks
to show "failed", now the disk must be viewed in relation to others to
determine if it is stale or not. This makes the mdadm -f case more in
line with the hot removed case.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Something wrong with __prep_thunderdome in super-intel.c

am 28.03.2011 03:35:09 von NeilBrown

On Thu, 24 Mar 2011 19:40:46 -0700 Dan Williams
wrote:

>
> hmm...
>
>

:-)

>
> >
> > You can easily reproduce with
> >
> > ./mdadm -CR /dev/md/imsm -e imsm -n 2 /dev/loop[01]
> > ./mdadm -CR /dev/md/r1 -l1 -n2 /dev/md/imsm
> > ./mdadm /dev/md/r1 -f /dev/loop1
> > ./mdadm -E /dev/md/imsm
> >
> > and notice that nothing gets printed.
> > If you fail loop0 instead, it works properly.
>
> The first thought is that the generation number on the good device
> should be ahead of the bad, but lo and behold it isn't. We should stop
> advancing the generation number on failed devices. The regression that
> Krzysztof's patch somewhat addresses is that thunderdome expects that
> failed devices should at least be able to look themselves up in their
> own mpb. Fixing up the serial number so that other devices see the
> other disk as out of date is the expectation.
>
> Stopping metadata write outs to failed devices fixes both problems.
> Krzysztof care to try the following with your recent change reverted? I
> verified it addresses the problem above, but I have not had a chance to
> double check the orom compatibility (although I suspect it will be ok).
> This also simplifies the calling convention to mark_missing and
> mark_failure.

Thanks for looking at this Dan.

I tried your patch, however ...

>
> diff --git a/super-intel.c b/super-intel.c
> index 2b41e08..6a6f738 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -5213,13 +5213,14 @@ static int is_resyncing(struct imsm_dev *dev)
> }
>
> /* return true if we recorded new information */
> -static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> +static int mark_failure(struct imsm_dev *dev, struct dl *dl)
> {
> __u32 ord;
> int slot;
> struct imsm_map *map;
> char buf[MAX_RAID_SERIAL_LEN+3];
> - unsigned int len, shift = 0;
> + struct imsm_disk *disk = &dl->disk;
> + unsigned int len, shift = 0, idx = dl->index;
>
> /* new failures are always set in map[0] */
> map = get_imsm_map(dev, 0);
> @@ -5232,6 +5233,7 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
> return 0;
>
> + dl->index = -2;
> sprintf(buf, "%s:0", disk->serial);
> if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
> shift = len - MAX_RAID_SERIAL_LEN + 1;
> @@ -5244,9 +5246,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> return 1;
> }
>
> -static void mark_missing(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
> +static void mark_missing(struct imsm_dev *dev, struct dl *dl)
> {
> - mark_failure(dev, disk, idx);
> + struct imsm_disk *disk = &dl->disk;
> +
> + mark_failure(dev, dl);
>
> if (disk->scsi_id == __cpu_to_le32(~(__u32)0))
> return;
> @@ -5269,7 +5273,7 @@ static void handle_missing(struct intel_super *super, struct imsm_dev *dev)
> dprintf("imsm: mark missing\n");
> end_migration(dev, map_state);
> for (dl = super->missing; dl; dl = dl->next)
> - mark_missing(dev, &dl->disk, dl->index);
> + mark_missing(dev, dl);
> super->updates_pending++;
> }
>
> @@ -5497,7 +5501,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
> struct intel_super *super = a->container->sb;
> struct imsm_dev *dev = get_imsm_dev(super, inst);
> struct imsm_map *map = get_imsm_map(dev, 0);
> - struct imsm_disk *disk;
> + struct dl *dl;
> int failed;
> __u32 ord;
> __u8 map_state;
> @@ -5512,11 +5516,11 @@ static void imsm_set_disk(struct active_array *a, int n, int state)
> dprintf("imsm: set_disk %d:%x\n", n, state);
>
> ord = get_imsm_ord_tbl_ent(dev, n, -1);
> - disk = get_imsm_disk(super, ord_to_idx(ord));
> + dl = get_imsm_dl_disk(super, ord_to_idx(ord));

This sometimes return NULL, leading to bad stuff and mdmon crashing....

So there is more to this than meets the eye...

I'll stop trying this patch.

Thanks,

NeilBrown

>
> /* check for new failures */
> if (state & DS_FAULTY) {
> - if (mark_failure(dev, disk, ord_to_idx(ord)))
> + if (mark_failure(dev, dl))
> super->updates_pending++;
> }
>
> @@ -6265,7 +6269,7 @@ static int apply_takeover_update(struct imsm_update_takeover *u,
> for (du = super->missing; du; du = du->next)
> if (du->index >= 0) {
> set_imsm_ord_tbl_ent(map, du->index, du->index);
> - mark_missing(dev_new, &du->disk, du->index);
> + mark_missing(dev_new, du);
> }
>
> return 1;
>

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Something wrong with __prep_thunderdome in super-intel.c

am 28.03.2011 04:28:45 von NeilBrown

On Fri, 25 Mar 2011 08:43:26 +0000 "Kwolek, Adam"
wrote:

>
> Unfortunately this doesn't close issue.
>
> I've reached the same from IMSM compatibility point of view, but have in mind that unit test suit 11 fails using this approach.
> Put focus on test 5 in this suit (I'm using it, as this is the only one that fails for me) for IMSM (note that first test pass is for native metadata).
> Using your path tests 5b and 5c fails. My code fails 5c test only. I'm currently working on it.

I've got it passing all migration tests with your patch.
The '5' tests have a problem that they are trying to use both dev6 and dev7
which have the same backing device and are used for multipath testing.
I change it to use dev10 instead of dev7 and everything passes.

So I'll be releasing 3.2.1 shortly with what I've got.

Thanks,
NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Something wrong with __prep_thunderdome in super-intel.c

am 28.03.2011 18:56:49 von dan.j.williams

On Sun, 2011-03-27 at 18:35 -0700, NeilBrown wrote:
> On Thu, 24 Mar 2011 19:40:46 -0700 Dan Williams
> wrote:
>
> >
>
> :-)
>
[..]
> > - disk = get_imsm_disk(super, ord_to_idx(ord));
> > + dl = get_imsm_dl_disk(super, ord_to_idx(ord));
>
> This sometimes return NULL, leading to bad stuff and mdmon crashing....
>
> So there is more to this than meets the eye...

Yes, (and I chalk this up to context switch latency), setting the index
to -2 is not correct as other paths need to be able to reference a valid
disk index until the failed device is removed via a rebuild.

> I'll stop trying this patch.

Ok, here is a proposed v2 on top of the latest devel-3.2, but I need to
play with it a bit more, and figure out what the spare migration test is
complaining about.

diff --git a/super-intel.c b/super-intel.c
index 6e12af2..e2f66aa 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -3993,7 +3993,7 @@ static int write_super_imsm(struct supertype *st, int doclose)

/* write the mpb for disks that compose raid devices */
for (d = super->disks; d ; d = d->next) {
- if (d->index < 0)
+ if (d->index < 0 || is_failed(&d->disk))
continue;
if (store_imsm_mpb(d->fd, mpb))
fprintf(stderr, "%s: failed for device %d:%d %s\n",
@@ -5218,6 +5218,8 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
__u32 ord;
int slot;
struct imsm_map *map;
+ char buf[MAX_RAID_SERIAL_LEN+3];
+ unsigned int len, shift = 0;

/* new failures are always set in map[0] */
map = get_imsm_map(dev, 0);
@@ -5230,6 +5232,11 @@ static int mark_failure(struct imsm_dev *dev, struct imsm_disk *disk, int idx)
if (is_failed(disk) && (ord & IMSM_ORD_REBUILD))
return 0;

+ sprintf(buf, "%s:0", disk->serial);
+ if ((len = strlen(buf)) >= MAX_RAID_SERIAL_LEN)
+ shift = len - MAX_RAID_SERIAL_LEN + 1;
+ strncpy((char *)disk->serial, &buf[shift], MAX_RAID_SERIAL_LEN);
+
disk->status |= FAILED_DISK;
set_imsm_ord_tbl_ent(map, slot, idx | IMSM_ORD_REBUILD);
if (map->failed_disk_num == 0xff)

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html