[PATCH 00/16] Check-pointing for external metadata

on 13.12.2010 15:44:47 by adam.kwolek

The following series should be applied on top of my previous migration patches (devel 3.2).

The patches implement:
1. fixes for OLCE, migration, takeover : patches 0001 to 0006
2. check-pointing implementation: patches from 0007 to 0016

Next steps:
1. replace spares management using functions from auto-rebuild (Krzysztof is almost done)
2. add unit tests for submitted features (the unit tests work for 3.1.4 but need rework for devel 3.2)

For feature descriptions, please see the comments in the individual patches.

---

Adam Kwolek (16):
Allow for reshape without backup file
imsm Fix: Core during rebuild on array details read
mdadm: support grow operation for external meta using checkpointing
Add mdadm->mdmon sync_max command message
mdadm: migration restart for external meta
mdadm: support backup operations for imsm
mdadm: support restore_stripes() from the given buffer
mdadm: add backup methods to superswitch
mdadm: Add IMSM migration record to intel_super
mdadm: second_map enhancement for imsm_get_map()
FIX: suspend_hi and lo set to max and then to 0
FIX: wait_backup() sometimes hangs
FIX: prepare for new spares management
FIX: remove unnecessary load_super() call
WORKAROUND: md reports idle state during reshape start - Remove
FIX: Add spares to raid0 in mdadm

Assemble.c | 10 +
Detail.c | 5
Grow.c | 462 ++++++++++++++++++++++++++++++++--
Manage.c | 1
managemon.c | 73 +++++
mdadm.h | 34 ++-
mdmon.c | 12 +
mdmon.h | 6
monitor.c | 107 ++++++++
msg.c | 33 ++
msg.h | 2
restripe.c | 49 ++--
super-intel.c | 764 ++++++++++++++++++++++++++++++++++++++++++++++++++++-----
util.c | 25 ++
14 files changed, 1448 insertions(+), 135 deletions(-)

--
Adam Kwoleka
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 01/16] FIX: Add spares to raid0 in mdadm

on 13.12.2010 15:44:56 by adam.kwolek

Restore code that was accidentally removed.

Signed-off-by: Krzysztof Wojcik
---

Manage.c | 1 +
1 files changed, 1 insertions(+), 0 deletions(-)

diff --git a/Manage.c b/Manage.c
index f8dcaaf..6562a59 100644
--- a/Manage.c
+++ b/Manage.c
@@ -860,6 +860,7 @@ int Manage_subdevs(char *devname, int fd,
sysfs_free(sra);
return 1;
}
+ ping_monitor(devnum2devname(devnum));
sysfs_free(sra);
close(container_fd);
} else if (ioctl(fd, ADD_NEW_DISK, &disc)) {


[PATCH 02/16] WORKAROUND: md reports idle state during reshape start

on 13.12.2010 15:45:04 by adam.kwolek

Remove previously inserted workaround.

Signed-off-by: Adam Kwolek
---

monitor.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/monitor.c b/monitor.c
index 0961636..cab558c 100644
--- a/monitor.c
+++ b/monitor.c
@@ -312,8 +312,7 @@ static int read_and_act(struct active_array *a)
/* finalize reshape detection
*/
if ((a->curr_action != reshape) &&
- (a->prev_action == reshape) &&
- (a->info.reshape_progress > 2)) {
+ (a->prev_action == reshape)) {
/* set reshape_not_active
* to allow for future rebuilds
*/


[PATCH 03/16] FIX: remove unnecessary load_super() call

on 13.12.2010 15:45:13 by adam.kwolek

Code cleanup. Remove unnecessary load_super() call

Signed-off-by: Adam Kwolek
---

super-intel.c | 8 --------
1 files changed, 0 insertions(+), 8 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 6697257..0a27836 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8159,14 +8159,6 @@ int imsm_manage_container_reshape(struct supertype *st, char *backup)
int array;

for (i = 0; i < mpb->num_raid_devs; i++) {
- struct intel_super *super;
-
- st->ss->load_super(st, fd, NULL);
- if (st->sb == NULL) {
- dprintf("cannot get sb\n");
- ret_val = 1;
- goto imsm_manage_container_reshape_exit;
- }
info2.devs = NULL;
super = st->sb;
super->current_vol = i;


[PATCH 04/16] FIX: prepare for new spares management

on 13.12.2010 15:45:22 by adam.kwolek

When grow uses the new spares management from auto-rebuild,
the disk list is not guaranteed to be in order.
Skip list entries that have a negative index.
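
The guard added by this patch can be illustrated with a minimal sketch. The `struct disk_entry` type and the `count_assigned()` helper are hypothetical stand-ins for the `struct dl` list in super-intel.c. The sketch assumes that a negative index marks a disk that has no slot assigned, which the patch implies but does not state.

```c
#include <stddef.h>

/* Hypothetical stand-in for the 'struct dl' disk list in super-intel.c. */
struct disk_entry {
    int index;                  /* slot in the metadata; assumed negative = not assigned */
    struct disk_entry *next;
};

/* Count the disks that have a valid index, whatever order the list is in. */
int count_assigned(const struct disk_entry *head)
{
    int n = 0;
    const struct disk_entry *d;

    for (d = head; d; d = d->next) {
        if (d->index < 0)
            continue;           /* same guard as the patch: skip unassigned disks */
        n++;
    }
    return n;
}
```

Because the guard is evaluated per entry, the result does not depend on where the unassigned disks sit in the list.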

Signed-off-by: Krzysztof Wojcik
---

super-intel.c | 3 +++
1 files changed, 3 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 0a27836..bee28bc 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7499,6 +7499,9 @@ static int imsm_reshape_array_manage_new_slots(struct intel_super *super,
int fd2;
int rv;

+ if (dl->index < 0)
+ continue;
+
dprintf("\tLooking at device %s (index = %i).\n",
dl->devname,
dl->index);


[PATCH 05/16] FIX: wait_backup() sometimes hangs

on 13.12.2010 15:45:30 by adam.kwolek

When a reshape finishes, wait_backup() sometimes sees the following state:
array state : reshape
sync_completed : 0
This causes one more loop iteration, which then hangs in select()
because sync_completed never changes again.

This fix extends the wait_backup() interface and, for external metadata, pings the monitor
to speed up the array state change, so that the test sees an up-to-date state.

Other possible fixes:
- add a watch on 'sync_action' to the select() call to detect the end of the reshape
- or break out of the loop when sync_completed == 0
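
The termination test that wait_backup() performs after pinging the monitor can be sketched as follows. `reshape_finished()` is a hypothetical helper, not part of mdadm. It only mirrors the `strncmp(action, "reshape", 7)` comparison visible in the Grow.c hunk of this patch, and it assumes the sysfs value may carry a trailing newline.

```c
#include <string.h>

/*
 * Hypothetical helper: return 1 when the sync_action string no longer
 * reports a reshape, 0 while the reshape is still running.
 * Comparing only the first 7 characters tolerates a trailing newline.
 */
int reshape_finished(const char *sync_action)
{
    return strncmp(sync_action, "reshape", 7) != 0;
}
```

The check is only useful if the state it reads is current, which is why the patch pings the monitor before reading sync_action.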

Signed-off-by: Adam Kwolek
---

Grow.c | 51 ++++++++++++++++++++++++++++++++++-----------------
mdadm.h | 3 ++-
mdmon.c | 3 ++-
super-intel.c | 2 +-
4 files changed, 39 insertions(+), 20 deletions(-)

diff --git a/Grow.c b/Grow.c
index a01051f..2ca1b6f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -453,11 +453,13 @@ static __u32 bsb_csum(char *buf, int len)
return __cpu_to_le32(csum);
}

-static int child_shrink(int afd, struct mdinfo *sra, unsigned long blocks,
+static int child_shrink(struct supertype *st,
+ int afd, struct mdinfo *sra, unsigned long blocks,
int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
int dests, int *destfd, unsigned long long *destoffsets);
-static int child_same_size(int afd, struct mdinfo *sra, unsigned long blocks,
+static int child_same_size(struct supertype *st,
+ int afd, struct mdinfo *sra, unsigned long blocks,
int *fds, unsigned long long *offsets,
unsigned long long start,
int disks, int chunk, int level, int layout, int data,
@@ -1910,17 +1912,17 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
mlockall(MCL_FUTURE);

if (odata < ndata)
- done = child_grow(fd, sra, stripes,
+ done = child_grow(st, fd, sra, stripes,
fdlist, offsets,
odisks, ochunk, array.level, olayout, odata,
d - odisks, fdlist+odisks, offsets+odisks);
else if (odata > ndata)
- done = child_shrink(fd, sra, stripes,
+ done = child_shrink(st, fd, sra, stripes,
fdlist, offsets,
odisks, ochunk, array.level, olayout, odata,
d - odisks, fdlist+odisks, offsets+odisks);
else
- done = child_same_size(fd, sra, stripes,
+ done = child_same_size(st, fd, sra, stripes,
fdlist, offsets,
0,
odisks, ochunk, array.level, olayout, odata,
@@ -2120,7 +2122,8 @@ static int grow_backup(struct mdinfo *sra,
* every works.
*/
/* FIXME return value is often ignored */
-static int wait_backup(struct mdinfo *sra,
+static int wait_backup(struct supertype *st,
+ struct mdinfo *sra,
unsigned long long offset, /* per device */
unsigned long long blocks, /* per device */
unsigned long long blocks2, /* per device - hack */
@@ -2155,6 +2158,15 @@ static int wait_backup(struct mdinfo *sra,
close(fd);
return -1;
}
+ if (st && st->ss->external) {
+ int container_dev = (st->container_dev != NoMdDev
+ ? st->container_dev : st->devnum);
+ char *container = devnum2devname(container_dev);
+ if (container) {
+ ping_monitor(container);
+ free(container);
+ }
+ }
if (sysfs_get_str(sra, NULL, "sync_action",
action, 20) > 0 &&
strncmp(action, "reshape", 7) != 0)
@@ -2281,7 +2293,7 @@ static void validate(int afd, int bfd, unsigned long long offset)
}
}

-int child_grow(int afd, struct mdinfo *sra,
+int child_grow(struct supertype *st, int afd, struct mdinfo *sra,
unsigned long stripes, int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
int dests, int *destfd, unsigned long long *destoffsets)
@@ -2299,7 +2311,8 @@ int child_grow(int afd, struct mdinfo *sra,
dests, destfd, destoffsets,
0, &degraded, buf);
validate(afd, destfd[0], destoffsets[0]);
- wait_backup(sra, 0, stripes * (chunk / 512), stripes * (chunk / 512),
+ wait_backup(st, sra, 0, stripes * (chunk / 512),
+ stripes * (chunk / 512),
dests, destfd, destoffsets,
0);
sysfs_set_num(sra, NULL, "suspend_lo", (stripes * (chunk/512)) * data);
@@ -2309,7 +2322,8 @@ int child_grow(int afd, struct mdinfo *sra,
return 1;
}

-static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
+static int child_shrink(struct supertype *st,
+ int afd, struct mdinfo *sra, unsigned long stripes,
int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
int dests, int *destfd, unsigned long long *destoffsets)
@@ -2326,7 +2340,8 @@ static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
sysfs_set_str(sra, NULL, "sync_action", "reshape");
sysfs_set_num(sra, NULL, "suspend_lo", 0);
sysfs_set_num(sra, NULL, "suspend_hi", 0);
- rv = wait_backup(sra, 0, start - stripes * (chunk/512), stripes * (chunk/512),
+ rv = wait_backup(st, sra, 0, start - stripes * (chunk/512),
+ stripes * (chunk/512),
dests, destfd, destoffsets, 0);
if (rv < 0)
return 0;
@@ -2336,7 +2351,7 @@ static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
dests, destfd, destoffsets,
0, &degraded, buf);
validate(afd, destfd[0], destoffsets[0]);
- wait_backup(sra, start, stripes*(chunk/512), 0,
+ wait_backup(st, sra, start, stripes*(chunk/512), 0,
dests, destfd, destoffsets, 0);
sysfs_set_num(sra, NULL, "suspend_lo", (stripes * (chunk/512)) * data);
free(buf);
@@ -2345,7 +2360,7 @@ static int child_shrink(int afd, struct mdinfo *sra, unsigned long stripes,
return 1;
}

-int child_same_size(int afd,
+int child_same_size(struct supertype *st, int afd,
struct mdinfo *sra, unsigned long stripes,
int *fds, unsigned long long *offsets,
unsigned long long start,
@@ -2384,7 +2399,7 @@ int child_same_size(int afd,
start += stripes * 2; /* where to read next */
size = sra->component_size / (chunk/512);
while (start < size) {
- if (wait_backup(sra, (start-stripes*2)*(chunk/512),
+ if (wait_backup(st, sra, (start-stripes*2)*(chunk/512),
stripes*(chunk/512), 0,
dests, destfd, destoffsets,
part) < 0)
@@ -2402,12 +2417,14 @@ int child_same_size(int afd,
part = 1 - part;
validate(afd, destfd[0], destoffsets[0]);
}
- if (wait_backup(sra, (start-stripes*2) * (chunk/512), stripes * (chunk/512), 0,
+ if (wait_backup(st, sra, (start-stripes*2) * (chunk/512),
+ stripes * (chunk/512), 0,
dests, destfd, destoffsets,
part) < 0)
return 0;
sysfs_set_num(sra, NULL, "suspend_lo", ((start-stripes)*(chunk/512)) * data);
- wait_backup(sra, (start-stripes) * (chunk/512), tailstripes * (chunk/512), 0,
+ wait_backup(st, sra, (start-stripes) * (chunk/512),
+ tailstripes * (chunk/512), 0,
dests, destfd, destoffsets,
1-part);
sysfs_set_num(sra, NULL, "suspend_lo", (size*(chunk/512)) * data);
@@ -2829,7 +2846,7 @@ int Grow_continue(int mdfd, struct supertype *st, struct mdinfo *info,
close(mdfd);
mlockall(MCL_FUTURE);
if (info->delta_disks < 0)
- done = child_shrink(-1, info, stripes,
+ done = child_shrink(st, -1, info, stripes,
fds, offsets,
info->array.raid_disks,
info->array.chunk_size,
@@ -2843,7 +2860,7 @@ int Grow_continue(int mdfd, struct supertype *st, struct mdinfo *info,
*/
unsigned long long start = info->reshape_progress / ndata;
start /= (info->array.chunk_size/512);
- done = child_same_size(-1, info, stripes,
+ done = child_same_size(st, -1, info, stripes,
fds, offsets,
start,
info->array.raid_disks,
diff --git a/mdadm.h b/mdadm.h
index ceffb81..1fb1cbc 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -500,7 +500,8 @@ extern int reshape_open_backup_file(char *backup,
extern unsigned long compute_backup_blocks(int nchunk, int ochunk,
unsigned int ndata, unsigned int odata);
extern struct mdinfo *sysfs_get_unused_spares(int container_fd, int fd);
-extern int child_grow(int afd, struct mdinfo *sra, unsigned long stripes,
+extern int child_grow(struct supertype *st,
+ int afd, struct mdinfo *sra, unsigned long stripes,
int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
int dests, int *destfd, unsigned long long *destoffsets);
diff --git a/mdmon.c b/mdmon.c
index ebadff7..85890de 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -559,7 +559,8 @@ int reshape_open_backup_file(char *backup_file,
return -1;
}

-int child_grow(int afd, struct mdinfo *sra,
+int child_grow(struct supertype *st,
+ int afd, struct mdinfo *sra,
unsigned long stripes, int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
int dests, int *destfd, unsigned long long *destoffsets)
diff --git a/super-intel.c b/super-intel.c
index bee28bc..0896f1d 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7893,7 +7893,7 @@ int imsm_child_grow(struct supertype *st,
sra->new_chunk = sra->array.chunk_size;

stripes = blocks / (sra->array.chunk_size/512) / odata;
- child_grow(validate_fd, sra, stripes,
+ child_grow(st, validate_fd, sra, stripes,
fdlist, offsets,
odisks, sra->array.chunk_size,
sra->array.level, sra->array.layout, odata,


[PATCH 06/16] FIX: suspend_hi and lo set to max and then to 0

on 13.12.2010 15:45:40 by adam.kwolek

After a reshape, suspend_lo and suspend_hi should be reset to their start values.
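
The order of the writes in this patch (suspend_lo = suspend_hi, then suspend_hi = 0, then suspend_lo = 0) can be modelled with a small sketch. It assumes that the suspended region is the half-open range [suspend_lo, suspend_hi); that is an assumption about the kernel side, not something this patch states. `struct suspend_window` and both helpers are hypothetical.

```c
/* Hypothetical in-memory model of the two sysfs attributes. */
struct suspend_window {
    unsigned long long lo;
    unsigned long long hi;
};

/* Assumption: I/O is suspended for sectors in [lo, hi); empty when lo >= hi. */
int window_is_empty(const struct suspend_window *w)
{
    return w->lo >= w->hi;
}

/*
 * Replay the sequence from the patch: lo = hi, hi = 0, lo = 0.
 * Returns 1 if the window was empty after every single step.
 */
int reset_suspend_window(struct suspend_window *w)
{
    int ok = 1;

    w->lo = w->hi;              /* close the window at its upper end */
    ok &= window_is_empty(w);
    w->hi = 0;
    ok &= window_is_empty(w);
    w->lo = 0;                  /* both values are back at 0 */
    ok &= window_is_empty(w);
    return ok;
}
```

Under the same assumption, writing suspend_lo = 0 first would briefly leave the range [0, suspend_hi) suspended, which this ordering avoids.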

Signed-off-by: Adam Kwolek
---

super-intel.c | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 0896f1d..1f96cbc 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -7750,6 +7750,7 @@ int imsm_grow_manage_size(struct supertype *st,
unsigned long long size;
int container_fd;
unsigned long long current_size = 0;
+ unsigned long long suspend_value;

/* finalize current volume reshape
* for external meta size has to be managed by mdadm
@@ -7793,6 +7794,14 @@ int imsm_grow_manage_size(struct supertype *st,
"set size to %llu\n",
current_size, size);
sysfs_set_num(sra, NULL, "array_size", size);
+ /* manage suspend_* entries
+ * set suspend_lo to suspend_hi value
+ * and then push both to 0
+ */
+ sysfs_get_ll(sra, NULL, "suspend_hi", &suspend_value);
+ sysfs_set_num(sra, NULL, "suspend_lo", suspend_value);
+ sysfs_set_num(sra, NULL, "suspend_hi", 0);
+ sysfs_set_num(sra, NULL, "suspend_lo", 0);

ret_val = 1;


[PATCH 07/16] mdadm: second_map enhancement for imsm_get_map()

on 13.12.2010 15:45:49 by adam.kwolek

Allow map-related operations on a given map: first or second.
Reshape-specific functionality requires explicit access to either map.

Until now, the active map was chosen according to the current volume status.
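
The selector semantics introduced here can be sketched in isolation. The structures below are hypothetical, simplified stand-ins for `struct imsm_dev` and `struct imsm_map`, and the rejection of other selector values is an addition of the sketch, not of the patch.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the IMSM structures. */
struct demo_map {
    int id;
};

struct demo_dev {
    int migr_state;             /* non-zero while a migration is in progress */
    struct demo_map maps[2];    /* maps[0] = first map, maps[1] = second map */
};

/*
 * second_map ==  0: first map
 * second_map ==  1: second map
 * second_map == -1: choose according to migr_state (the previous behaviour)
 */
const struct demo_map *select_map(const struct demo_dev *dev, int second_map)
{
    if (second_map == -1)
        return &dev->maps[dev->migr_state ? 1 : 0];
    if (second_map == 0 || second_map == 1)
        return &dev->maps[second_map];
    return NULL;                /* invalid selector; check added by this sketch */
}
```

Callers that pass -1 keep the old behaviour, so only the reshape paths that need a fixed map have to name one.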

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

super-intel.c | 78 +++++++++++++++++++++++++++++++++------------------------
1 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 1f96cbc..5971991 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -546,23 +546,35 @@ static struct imsm_dev *get_imsm_dev(struct intel_super *super, __u8 index)
return NULL;
}

-static __u32 get_imsm_ord_tbl_ent(struct imsm_dev *dev, int slot)
+/*
+ * for second_map:
+ * == 0 get first map
+ * == 1 get second map
+ * == -1 than get map according to the current migr_state
+ */
+static __u32 get_imsm_ord_tbl_ent(struct imsm_dev *dev,
+ int slot,
+ int second_map)
{
struct imsm_map *map;

- if (dev->vol.migr_state)
- map = get_imsm_map(dev, 1);
- else
- map = get_imsm_map(dev, 0);
+ if (second_map == -1) {
+ if (dev->vol.migr_state)
+ map = get_imsm_map(dev, 1);
+ else
+ map = get_imsm_map(dev, 0);
+ } else {
+ map = get_imsm_map(dev, second_map);
+ }

/* top byte identifies disk under rebuild */
return __le32_to_cpu(map->disk_ord_tbl[slot]);
}

#define ord_to_idx(ord) (((ord) << 8) >> 8)
-static __u32 get_imsm_disk_idx(struct imsm_dev *dev, int slot)
+static __u32 get_imsm_disk_idx(struct imsm_dev *dev, int slot, int second_map)
{
- __u32 ord = get_imsm_ord_tbl_ent(dev, slot);
+ __u32 ord = get_imsm_ord_tbl_ent(dev, slot, second_map);

return ord_to_idx(ord);
}
@@ -770,13 +782,13 @@ static void print_imsm_dev(struct imsm_dev *dev, char *uuid, int disk_idx)
printf(" Members : %d\n", map->num_members);
printf(" Slots : [");
for (i = 0; i < map->num_members; i++) {
- ord = get_imsm_ord_tbl_ent(dev, i);
+ ord = get_imsm_ord_tbl_ent(dev, i, -1);
printf("%s", ord & IMSM_ORD_REBUILD ? "_" : "U");
}
printf("]\n");
slot = get_imsm_disk_slot(map, disk_idx);
if (slot >= 0) {
- ord = get_imsm_ord_tbl_ent(dev, slot);
+ ord = get_imsm_ord_tbl_ent(dev, slot, -1);
printf(" This Slot : %d%s\n", slot,
ord & IMSM_ORD_REBUILD ? " (out-of-sync)" : "");
} else
@@ -1414,12 +1426,12 @@ static __u32 num_stripes_per_unit_rebuild(struct imsm_dev *dev)
return num_stripes_per_unit_resync(dev);
}

-static __u8 imsm_num_data_members(struct imsm_dev *dev)
+static __u8 imsm_num_data_members(struct imsm_dev *dev, int second_map)
{
/* named 'imsm_' because raid0, raid1 and raid10
* counter-intuitively have the same number of data disks
*/
- struct imsm_map *map = get_imsm_map(dev, 0);
+ struct imsm_map *map = get_imsm_map(dev, second_map);

switch (get_imsm_raid_level(map)) {
case 0:
@@ -1502,7 +1514,7 @@ static __u64 blocks_per_migr_unit(struct imsm_dev *dev)
*/
stripes_per_unit = num_stripes_per_unit_resync(dev);
migr_chunk = migr_strip_blocks_resync(dev);
- disks = imsm_num_data_members(dev);
+ disks = imsm_num_data_members(dev, 0);
blocks_per_unit = stripes_per_unit * migr_chunk * disks;
stripe = __le32_to_cpu(map->blocks_per_strip) * disks;
segment = blocks_per_unit / stripe;
@@ -1635,7 +1647,7 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
dmap[i] = 0;
if (i < info->array.raid_disks) {
struct imsm_disk *dsk;
- j = get_imsm_disk_idx(dev, i);
+ j = get_imsm_disk_idx(dev, i, -1);
dsk = get_imsm_disk(super, j);
if (dsk && (dsk->status & CONFIGURED_DISK))
dmap[i] = 1;
@@ -1744,7 +1756,7 @@ static void getinfo_super_imsm(struct supertype *st, struct mdinfo *info, char *
* (catches single-degraded vs double-degraded)
*/
for (j = 0; j < map->num_members; j++) {
- __u32 ord = get_imsm_ord_tbl_ent(dev, i);
+ __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
__u32 idx = ord_to_idx(ord);

if (!(ord & IMSM_ORD_REBUILD) &&
@@ -3419,7 +3431,7 @@ static int add_to_super_imsm_volume(struct supertype *st, mdu_disk_info_t *dk,
/* Check the device has not already been added */
slot = get_imsm_disk_slot(map, dl->index);
if (slot >= 0 &&
- (get_imsm_ord_tbl_ent(dev, slot) & IMSM_ORD_REBUILD) == 0) {
+ (get_imsm_ord_tbl_ent(dev, slot, -1) & IMSM_ORD_REBUILD) == 0) {
fprintf(stderr, Name ": %s has been included in this array twice\n",
devname);
return 1;
@@ -3675,7 +3687,7 @@ static int create_array(struct supertype *st, int dev_idx)
imsm_copy_dev(&u->dev, dev);
inf = get_disk_info(u);
for (i = 0; i < map->num_members; i++) {
- int idx = get_imsm_disk_idx(dev, i);
+ int idx = get_imsm_disk_idx(dev, i, -1);

disk = get_imsm_disk(super, idx);
serialcpy(inf[i].serial, disk->serial);
@@ -4562,8 +4574,8 @@ static struct mdinfo *container_content_imsm(struct supertype *st, char *subarra
__u32 ord;

skip = 0;
- idx = get_imsm_disk_idx(dev, slot);
- ord = get_imsm_ord_tbl_ent(dev, slot);
+ idx = get_imsm_disk_idx(dev, slot, 0);
+ ord = get_imsm_ord_tbl_ent(dev, slot, 0);
for (d = super->disks; d ; d = d->next)
if (d->index == idx)
break;
@@ -4658,7 +4670,7 @@ static __u8 imsm_check_degraded(struct intel_super *super, struct imsm_dev *dev,
int insync = insync;

for (i = 0; i < map->num_members; i++) {
- __u32 ord = get_imsm_ord_tbl_ent(dev, i);
+ __u32 ord = get_imsm_ord_tbl_ent(dev, i, -1);
int idx = ord_to_idx(ord);
struct imsm_disk *disk;

@@ -4943,7 +4955,7 @@ static void imsm_set_disk(struct active_array *a, int n, int state)

dprintf("imsm: set_disk %d:%x\n", n, state);

- ord = get_imsm_ord_tbl_ent(dev, n);
+ ord = get_imsm_ord_tbl_ent(dev, n, -1);
disk = get_imsm_disk(super, ord_to_idx(ord));

/* check for new failures */
@@ -5050,7 +5062,7 @@ static void imsm_sync_metadata(struct supertype *container)
static struct dl *imsm_readd(struct intel_super *super, int idx, struct active_array *a)
{
struct imsm_dev *dev = get_imsm_dev(super, a->info.container_member);
- int i = get_imsm_disk_idx(dev, idx);
+ int i = get_imsm_disk_idx(dev, idx, -1);
struct dl *dl;

for (dl = super->disks; dl; dl = dl->next)
@@ -5071,7 +5083,7 @@ static struct dl *imsm_add_spare(struct intel_super *super, int slot,
struct mdinfo *additional_test_list)
{
struct imsm_dev *dev = get_imsm_dev(super, a->info.container_member);
- int idx = get_imsm_disk_idx(dev, slot);
+ int idx = get_imsm_disk_idx(dev, slot, -1);
struct imsm_super *mpb = super->anchor;
struct imsm_map *map;
unsigned long long pos;
@@ -5338,7 +5350,7 @@ static int disks_overlap(struct intel_super *super, int idx, struct imsm_update_
int j;

for (i = 0; i < map->num_members; i++) {
- disk = get_imsm_disk(super, get_imsm_disk_idx(dev, i));
+ disk = get_imsm_disk(super, get_imsm_disk_idx(dev, i, -1));
for (j = 0; j < new_map->num_members; j++)
if (serialcmp(disk->serial, inf[j].serial) == 0)
return 1;
@@ -5619,7 +5631,7 @@ update_reshape_exit:
end_migration(dev, map_1->map_state);
/* array size rollback
*/
- used_disks = imsm_num_data_members(dev);
+ used_disks = imsm_num_data_members(dev, 0);
if (used_disks) {
array_blocks = map_1->blocks_per_member * used_disks;
/* round array size down to closest MB
@@ -5753,7 +5765,7 @@ update_reshape_exit:
struct dl *dl;
unsigned int found;
int failed;
- int victim = get_imsm_disk_idx(dev, u->slot);
+ int victim = get_imsm_disk_idx(dev, u->slot, -1);
int i;

for (dl = super->disks; dl; dl = dl->next)
@@ -5776,7 +5788,8 @@ update_reshape_exit:
for (i = 0; i < map->num_members; i++) {
if (i == u->slot)
continue;
- disk = get_imsm_disk(super, get_imsm_disk_idx(dev, i));
+ disk = get_imsm_disk(super,
+ get_imsm_disk_idx(dev, i, -1));
if (!disk || is_failed(disk))
failed++;
}
@@ -6251,7 +6264,7 @@ static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned ind
/* update ord entries being careful not to propagate
* ord-flags to the first map
*/
- ord = get_imsm_ord_tbl_ent(dev, j);
+ ord = get_imsm_ord_tbl_ent(dev, j, -1);

if (ord_to_idx(ord) <= index)
continue;
@@ -6427,11 +6440,11 @@ static int update_level_imsm(struct supertype *st, struct mdinfo *info,
idx = -1;
for (newdi = info->devs; newdi; newdi = newdi->next) {
if ((dl->major != newdi->disk.major) ||
- (dl->minor != newdi->disk.minor) ||
- (newdi->disk.raid_disk < 0))
+ (dl->minor != newdi->disk.minor) ||
+ (newdi->disk.raid_disk < 0))
continue;
slot = get_imsm_disk_slot(map, dl->index);
- idx = get_imsm_ord_tbl_ent(dev_new, slot);
+ idx = get_imsm_ord_tbl_ent(dev_new, slot, 0);
tmp_ord_tbl[newdi->disk.raid_disk] = idx;
break;
}
@@ -6606,7 +6619,7 @@ int imsm_reshape_is_allowed_on_container(struct supertype *st,
ret_val = 0;
break;
}
- used_disks = imsm_num_data_members(dev);
+ used_disks = imsm_num_data_members(dev, 0);
dprintf("read raid_disks = %i\n", used_disks);
dprintf("read requested disks = %i\n", geo->raid_disks);
array_blocks = map->blocks_per_member * used_disks;
@@ -7140,8 +7153,7 @@ calculate_size_only:
/* calculate new size
*/
if (new_map != NULL) {
-
- used_disks = imsm_num_data_members(upd_devs);
+ used_disks = imsm_num_data_members(upd_devs, 0);
if (used_disks) {
array_blocks = new_map->blocks_per_member * used_disks;
/* round array size down to closest MB


[PATCH 08/16] mdadm: Add IMSM migration record to intel_super

on 13.12.2010 15:45:57 by adam.kwolek

Add support for the IMSM migration record structure.
The IMSM migration record is stored on the first two disks of an IMSM volume during migration.

Add functions for reading and writing the migration record; they will be used by the following check-pointing patches.
Clear the migration record every time MIGR_GEN_MIGR is started.
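
In the diff of this patch, load_imsm_migr_rec() and write_imsm_migr_rec() address the record as the last 512-byte block of the device (`dsize - 512`). A minimal sketch of that addressing follows. It uses plain stdio on a regular file in place of mdadm's `dev_open()` and `get_dev_size()`, and the helper names are hypothetical.

```c
#include <stdio.h>

#define MIGR_REC_SIZE 512       /* one block, as in the patch */

/* Offset of the record: the last 512-byte block; -1 if the device is too small. */
long long migr_rec_offset(unsigned long long dev_size)
{
    if (dev_size < MIGR_REC_SIZE)
        return -1;
    return (long long)(dev_size - MIGR_REC_SIZE);
}

/*
 * Read the record block into buf (at least MIGR_REC_SIZE bytes).
 * Returns 0 on success, -1 on any error.
 * Note: fseek() takes a long, so this sketch is limited to sizes that fit in one.
 */
int read_migr_rec(FILE *f, unsigned long long dev_size, unsigned char *buf)
{
    long long off = migr_rec_offset(dev_size);

    if (off < 0)
        return -1;
    if (fseek(f, (long)off, SEEK_SET) != 0)
        return -1;
    if (fread(buf, 1, MIGR_REC_SIZE, f) != MIGR_REC_SIZE)
        return -1;
    return 0;
}
```

The patch itself allocates the buffer with posix_memalign(); the sketch leaves alignment out because stdio does not need it.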

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

super-intel.c | 172 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
1 files changed, 167 insertions(+), 5 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index 5971991..82371d5 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -195,6 +195,37 @@ struct bbm_log {
static char *map_state_str[] = { "normal", "uninitialized", "degraded", "failed" };
#endif

+#define UNIT_SRC_NORMAL 0 /* Source data for curr_migr_unit must
+ * be recovered using srcMap */
+#define UNIT_SRC_IN_CP_AREA 1 /* Source data for curr_migr_unit has
+ * already been migrated and must
+ * be recovered from checkpoint area */
+struct migr_record {
+ __u32 rec_status; /* Status used to determine how to restart
+ * migration in case it aborts
+ * in some fashion */
+ __u32 curr_migr_unit; /* 0..numMigrUnits-1 */
+ __u32 family_num; /* Family number of MPB
+ * containing the RaidDev
+ * that is migrating */
+ __u32 ascending_migr; /* True if migrating in increasing
+ * order of lbas */
+ __u32 blocks_per_unit; /* Num disk blocks per unit of operation */
+ __u32 dest_depth_per_unit; /* Num member blocks each destMap
+ * member disk
+ * advances per unit-of-operation */
+ __u32 ckpt_area_pba; /* Pba of first block of ckpt copy area */
+ __u32 dest_1st_member_lba; /* First member lba on first
+ * stripe of destination */
+ __u32 num_migr_units; /* Total num migration units-of-op */
+ __u32 post_migr_vol_cap; /* Size of volume after
+ * migration completes */
+ __u32 post_migr_vol_cap_hi; /* Expansion space for LBA64 */
+ __u32 ckpt_read_disk_num; /* Which member disk in destSubMap[0] the
+ * migration ckpt record was read from
+ * (for recovered migrations) */
+} __attribute__ ((__packed__));
+
static __u8 migr_type(struct imsm_dev *dev)
{
if (dev->vol.migr_type == MIGR_VERIFY &&
@@ -240,6 +271,10 @@ struct intel_super {
void *buf; /* O_DIRECT buffer for reading/writing metadata */
struct imsm_super *anchor; /* immovable parameters */
};
+ union {
+ void *migr_rec_buf; /* buffer for I/O operations */
+ struct migr_record *migr_rec; /* migration record */
+ };
size_t len; /* size of the 'buf' allocation */
void *next_buf; /* for realloc'ing buf from the manager */
size_t next_len;
@@ -1553,6 +1588,104 @@ static int imsm_level_to_layout(int level)
return UnSet;
}

+/*
+ * load_imsm_migr_rec - read imsm migration record
+ */
+__attribute__((unused))
+static int load_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
+{
+ unsigned long long dsize;
+ struct mdinfo *sd;
+ struct dl *dl;
+ char nm[30];
+ int retval = -1;
+ int fd = -1;
+
+ for (sd = info->devs ; sd ; sd = sd->next) {
+ /* read only from one of the first two slots */
+ if (sd->disk.raid_disk > 1)
+ continue;
+ sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);
+ fd = dev_open(nm, O_RDONLY);
+ if (fd >= 0)
+ break;
+ }
+ if (fd < 0) {
+ for (dl = super->disks; dl; dl = dl->next) {
+ /* read only from one of the first two slots */
+ if (dl->index > 1)
+ continue;
+ sprintf(nm, "%d:%d", dl->major, dl->minor);
+ fd = dev_open(nm, O_RDONLY);
+ if (fd >= 0)
+ break;
+ }
+ }
+ if (fd < 0)
+ goto out;
+ get_dev_size(fd, NULL, &dsize);
+ if (lseek64(fd, dsize - 512, SEEK_SET) < 0) {
+ fprintf(stderr,
+ Name ": Cannot seek to anchor block: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (read(fd, super->migr_rec_buf, 512) != 512) {
+ fprintf(stderr,
+ Name ": Cannot read migr record block: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ retval = 0;
+ out:
+ if (fd >= 0)
+ close(fd);
+ return retval;
+}
+
+/*
+ * write_imsm_migr_rec - write imsm migration record
+ */
+__attribute__((unused))
+static int write_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
+{
+ unsigned long long dsize;
+ struct mdinfo *sd;
+ char nm[30];
+ int fd = -1;
+ int retval = -1;
+
+ for (sd = info->devs ; sd ; sd = sd->next) {
+ /* write only to the first two slots */
+ if (sd->disk.raid_disk > 1)
+ continue;
+ sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);
+ fd = dev_open(nm, O_RDWR);
+ if (fd < 0)
+ continue;
+ get_dev_size(fd, NULL, &dsize);
+ if (lseek64(fd, dsize - 512, SEEK_SET) < 0) {
+ fprintf(stderr,
+ Name ": Cannot seek to anchor block: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ if (write(fd, super->migr_rec_buf, 512) != 512) {
+ fprintf(stderr,
+ Name ": Cannot write migr record block: %s\n",
+ strerror(errno));
+ goto out;
+ }
+ close(fd);
+ fd = -1;
+ }
+ retval = 0;
+ out:
+ if (fd >= 0)
+ close(fd);
+ return retval;
+}
+
static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info, char *dmap)
{
struct intel_super *super = st->sb;
@@ -2265,8 +2398,11 @@ load_imsm_disk(int fd, struct intel_super *super, char *devname, int keep_fd)
* map1state=normal)
* 4/ Rebuild (migr_state=1 migr_type=MIGR_REBUILD map0state=normal
* map1state=degraded)
+ * 5/ Migration (mig_state=1 migr_type=MIGR_GEN_MIGR map0state=normal
+ * map1state=normal)
*/
-static void migrate(struct imsm_dev *dev, __u8 to_state, int migr_type)
+static void migrate(struct imsm_dev *dev, struct intel_super *super,
+ __u8 to_state, int migr_type)
{
struct imsm_map *dest;
struct imsm_map *src = get_imsm_map(dev, 0);
@@ -2289,6 +2425,10 @@ static void migrate(struct imsm_dev *dev, __u8 to_state, int migr_type)
}
}

+ if (migr_type == MIGR_GEN_MIGR)
+ /* Clear migration record */
+ memset(super->migr_rec, 0, sizeof(struct migr_record));
+
src->map_state = to_state;
}

@@ -2454,6 +2594,14 @@ static int load_imsm_mpb(int fd, struct intel_super *super, char *devname)

sectors = mpb_sectors(anchor) - 1;
free(anchor);
+
+ if (posix_memalign(&super->migr_rec_buf, 512, 512) != 0) {
+ fprintf(stderr, Name
+ ": %s could not allocate migr_rec buffer\n", __func__);
+ free(super->buf);
+ return 2;
+ }
+
if (!sectors) {
check_sum = __gen_imsm_checksum(super->anchor);
if (check_sum != __le32_to_cpu(super->anchor->check_sum)) {
@@ -2556,6 +2704,10 @@ static void __free_imsm(struct intel_super *super, int free_disks)
free(super->buf);
super->buf = NULL;
}
+ if (super->migr_rec_buf) {
+ free(super->migr_rec_buf);
+ super->migr_rec_buf = NULL;
+ }
if (free_disks)
free_imsm_disks(super);
free_devlist(super);
@@ -3364,6 +3516,13 @@ static int init_super_imsm(struct supertype *st, mdu_array_info_t *info,
": %s could not allocate superblock\n", __func__);
return 0;
}
+ if (posix_memalign(&super->migr_rec_buf, 512, 512) != 0) {
+ fprintf(stderr, Name
+ ": %s could not allocate migr_rec buffer\n", __func__);
+ free(super->buf);
+ free(super);
+ return 0;
+ }
memset(super->buf, 0, mpb_size);
mpb = super->buf;
mpb->mpb_size = __cpu_to_le32(mpb_size);
@@ -4887,9 +5046,9 @@ static int imsm_set_array_state(struct active_array *a, int consistent)
/* mark the start of the init process if nothing is failed */
dprintf("imsm: mark resync start\n");
if (map->map_state == IMSM_T_STATE_UNINITIALIZED)
- migrate(dev, IMSM_T_STATE_NORMAL, MIGR_INIT);
+ migrate(dev, super, IMSM_T_STATE_NORMAL, MIGR_INIT);
else
- migrate(dev, IMSM_T_STATE_NORMAL, MIGR_REPAIR);
+ migrate(dev, super, IMSM_T_STATE_NORMAL, MIGR_REPAIR);
super->updates_pending++;
}

@@ -5508,6 +5667,9 @@ static void imsm_process_update(struct supertype *st,
}
a->reshape_chunk_size = u->reshape_chunk_size;

+ /* Clear migration record */
+ memset(super->migr_rec, 0, sizeof(struct migr_record));
+
super->updates_pending++;
update_reshape_exit:
if (u->devs_mem.dev)
@@ -5806,7 +5968,7 @@ update_reshape_exit:
/* mark rebuild */
to_state = imsm_check_degraded(super, dev, failed);
map->map_state = IMSM_T_STATE_DEGRADED;
- migrate(dev, to_state, MIGR_REBUILD);
+ migrate(dev, super, to_state, MIGR_REBUILD);
migr_map = get_imsm_map(dev, 1);
set_imsm_ord_tbl_ent(map, u->slot, dl->index);
set_imsm_ord_tbl_ent(migr_map, u->slot, dl->index | IMSM_ORD_REBUILD);
@@ -7030,7 +7192,7 @@ struct imsm_update_reshape *imsm_create_metadata_update_for_reshape(
*/

to_state = imsm_check_degraded(super, old_dev, 0);
- migrate(upd_devs, to_state, MIGR_GEN_MIGR);
+ migrate(upd_devs, super, to_state, MIGR_GEN_MIGR);
/* second map length is equal to first map
* correct second map length to old value
*/


[PATCH 09/16] mdadm: add backup methods to superswitch

on 13.12.2010 15:46:08 by adam.kwolek

Add new methods to the superswitch for external metadata that provides its own backup mechanism for critical reshape data.

The new methods are:
save_backup - save critical data to the backup area
discard_backup - critical data was successfully migrated, so the current backup may be discarded
recover_backup - recover critical data during array assembly after a reshape crashed
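
For illustration only (this is not part of the patch): a minimal sketch of how a caller might drive the three hooks around one checkpoint step. All `demo_*` names and the simplified signatures are assumptions made for this example; the real prototypes are in the mdadm.h hunk of this patch. The return value 1 for "handler has no backup area of its own" is also an assumption of the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for mdadm's struct mdinfo. */
struct demo_info {
	unsigned long long reshape_progress;
};

/* Hypothetical ops table holding only the three backup hooks. */
struct demo_backup_ops {
	int (*save_backup)(struct demo_info *info, void *buf,
			   unsigned long write_offset, int length);
	void (*discard_backup)(struct demo_info *info);
	int (*recover_backup)(struct demo_info *info, void *ptr, int length);
};

/* Counters used by the demo handlers below. */
static int demo_saved, demo_discarded;

static int demo_save(struct demo_info *info, void *buf,
		     unsigned long write_offset, int length)
{
	(void)info; (void)buf; (void)write_offset; (void)length;
	demo_saved++;
	return 0;
}

static void demo_discard(struct demo_info *info)
{
	(void)info;
	demo_discarded++;
}

/* One checkpoint step.
 * Returns 0 on success, -1 if saving failed, and 1 if the handler
 * provides no save_backup hook (assumption: the caller then uses
 * some other backup method).
 */
static int demo_checkpoint(const struct demo_backup_ops *ops,
			   struct demo_info *info, void *buf,
			   unsigned long write_offset, int length)
{
	assert(ops != NULL && info != NULL);
	if (ops->save_backup == NULL)
		return 1;
	if (ops->save_backup(info, buf, write_offset, length) != 0)
		return -1;
	/* ... the protected region would be reshaped here ... */
	if (ops->discard_backup != NULL)
		ops->discard_backup(info);
	return 0;
}
```

Checking each hook for NULL before calling it lets metadata handlers that do not implement the backup methods leave the fields unset.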

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

mdadm.h | 9 +++++++++
1 files changed, 9 insertions(+), 0 deletions(-)

diff --git a/mdadm.h b/mdadm.h
index 1fb1cbc..8229b66 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -785,6 +785,15 @@ extern struct superswitch {
enum state_of_reshape request_type,
struct metadata_update **updates);

+ /* for external backup area
+ *
+ */
+ int (*save_backup)(struct supertype *st, struct mdinfo *info,
+ void *buf, unsigned long write_offset, int length);
+ void (*discard_backup)(struct supertype *st, struct mdinfo *info);
+ int (*recover_backup)(struct supertype *st, struct mdinfo *info,
+ void *ptr, int length);
+
int swapuuid; /* true if uuid is bigending rather than hostendian */
int external;
const char *name; /* canonical metadata name */


[PATCH 10/16] mdadm: support restore_stripes() from the given buffer

on 13.12.2010 15:46:16 by adam.kwolek

Currently the restore_stripes() function can restore data only from the given backup file handles, and it is used only for assembling partially reshaped arrays.
Because this function will be very helpful for the external metadata backup mechanism, add support for restoring data from a given source buffer.
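
For illustration only (not part of the patch): a minimal sketch of the "buffer if given, file otherwise" selection described above, reduced to fetching a single chunk. `demo_read_chunk` is a hypothetical helper invented for this example; the patch itself makes this choice inline in restore_stripes() based on the new src_buf argument.

```c
#include <assert.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: copy one chunk into 'out'.
 * If src_buf is non-NULL the chunk is taken from memory at
 * src_buf + read_offset; otherwise it is read from 'fd' at
 * read_offset. Returns 0 on success, -1 on any seek/read error.
 */
static int demo_read_chunk(int fd, const char *src_buf,
			   unsigned long long read_offset,
			   char *out, size_t chunk_size)
{
	if (src_buf != NULL) {
		/* read from the input buffer */
		memcpy(out, src_buf + read_offset, chunk_size);
		return 0;
	}
	/* read from the file */
	if (lseek(fd, (off_t)read_offset, SEEK_SET) != (off_t)read_offset)
		return -1;
	if (read(fd, out, chunk_size) != (ssize_t)chunk_size)
		return -1;
	return 0;
}
```

Existing callers that pass NULL for the buffer keep the file-based behaviour unchanged, which matches how the Grow.c call sites are updated in this patch.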

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

Grow.c | 4 ++--
mdadm.h | 3 ++-
restripe.c | 49 +++++++++++++++++++++++++++++++------------------
super-intel.c | 2 +-
4 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/Grow.c b/Grow.c
index 2ca1b6f..9fbdd0e 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2621,7 +2621,7 @@ int Grow_restart(struct supertype *st, struct mdinfo *info, int *fdlist, int cnt
info->new_layout,
fd, __le64_to_cpu(bsb.devstart)*512,
__le64_to_cpu(bsb.arraystart)*512,
- __le64_to_cpu(bsb.length)*512)) {
+ __le64_to_cpu(bsb.length)*512, NULL)) {
/* didn't succeed, so giveup */
if (verbose)
fprintf(stderr, Name ": Error restoring backup from %s\n",
@@ -2638,7 +2638,7 @@ int Grow_restart(struct supertype *st, struct mdinfo *info, int *fdlist, int cnt
fd, __le64_to_cpu(bsb.devstart)*512 +
__le64_to_cpu(bsb.devstart2)*512,
__le64_to_cpu(bsb.arraystart2)*512,
- __le64_to_cpu(bsb.length2)*512)) {
+ __le64_to_cpu(bsb.length2)*512, NULL)) {
/* didn't succeed, so giveup */
if (verbose)
fprintf(stderr, Name ": Error restoring second backup from %s\n",
diff --git a/mdadm.h b/mdadm.h
index 8229b66..45c6b3c 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -515,7 +515,8 @@ extern int save_stripes(int *source, unsigned long long *offsets,
extern int restore_stripes(int *dest, unsigned long long *offsets,
int raid_disks, int chunk_size, int level, int layout,
int source, unsigned long long read_offset,
- unsigned long long start, unsigned long long length);
+ unsigned long long start, unsigned long long length,
+ char *src_buf);

#ifndef Sendmail
#define Sendmail "/usr/lib/sendmail -t"
diff --git a/restripe.c b/restripe.c
index d33dbba..066eafd 100644
--- a/restripe.c
+++ b/restripe.c
@@ -536,11 +536,10 @@ int save_stripes(int *source, unsigned long long *offsets,
fdisk[0], fdisk[1], bufs);
}
}
-
- for (i=0; i<nwrites; i++)
- if (write(dest[i], buf, len) != len)
- return -1;
-
+ if (dest)
+ for (i = 0; i < nwrites; i++)
+ if (write(dest[i], buf, len) != len)
+ return -1;
length -= len;
start += len;
}
@@ -561,7 +560,8 @@ int save_stripes(int *source, unsigned long long *offsets,
int restore_stripes(int *dest, unsigned long long *offsets,
int raid_disks, int chunk_size, int level, int layout,
int source, unsigned long long read_offset,
- unsigned long long start, unsigned long long length)
+ unsigned long long start, unsigned long long length,
+ char *src_buf)
{
char *stripe_buf;
char **stripes = malloc(raid_disks * sizeof(char*));
@@ -579,13 +579,17 @@ int restore_stripes(int *dest, unsigned long long *offsets,
}
if (stripe_buf == NULL || stripes == NULL || blocks == NULL
|| zero == NULL) {
- free(stripe_buf);
- free(stripes);
- free(blocks);
- free(zero);
+ if (stripe_buf != NULL)
+ free(stripe_buf);
+ if (stripes != NULL)
+ free(stripes);
+ if (blocks != NULL)
+ free(blocks);
+ if (zero != NULL)
+ free(zero);
return -2;
}
- for (i=0; i<raid_disks; i++)
+ for (i = 0; i < raid_disks; i++)
stripes[i] = stripe_buf + i * chunk_size;
while (length > 0) {
unsigned int len = data_disks * chunk_size;
@@ -594,15 +598,24 @@ int restore_stripes(int *dest, unsigned long long *offsets,
int syndrome_disks;
if (length < len)
return -3;
- for (i=0; i < data_disks; i++) {
+ for (i = 0; i < data_disks; i++) {
int disk = geo_map(i, start/chunk_size/data_disks,
raid_disks, level, layout);
- if ((unsigned long long)lseek64(source, read_offset, 0)
- != read_offset)
- return -1;
- if (read(source, stripes[disk],
- chunk_size) != chunk_size)
- return -1;
+ if (src_buf == NULL) {
+ /* read from file */
+ if (lseek64(source,
+ read_offset, 0) != (off64_t)read_offset)
+ return -1;
+ if (read(source,
+ stripes[disk],
+ chunk_size) != chunk_size)
+ return -1;
+ } else {
+ /* read from input buffer */
+ memcpy(stripes[disk],
+ src_buf + read_offset,
+ chunk_size);
+ }
read_offset += chunk_size;
}
/* We have the data, now do the parity */
diff --git a/super-intel.c b/super-intel.c
index 82371d5..349e583 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1656,7 +1656,7 @@ static int write_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
int retval = -1;

for (sd = info->devs ; sd ; sd = sd->next) {
- /* read only from one of the first two slots */
+ /* write only to the first two slots */
if (sd->disk.raid_disk > 1)
continue;
sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);


[PATCH 11/16] mdadm: support backup operations for imsm

on 13.12.2010 15:46:25 by adam.kwolek

Add support for the following operations:
save_backup() - save critical data stripes to the Migration Copy Area and
update the current migration unit status.
Use restore_stripes() to form a destination stripe
and to write it to the Copy Area.
save_backup() initializes the migration record at the
beginning of the reshape.

discard_backup() - critical data was successfully migrated by the kernel.
Update the current unit status in the migration record.

recover_backup() - recover critical data from the Migration Copy Area
while assembling an array.
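
For illustration only (not part of the patch): a minimal sketch of the unit bookkeeping that save_backup() and discard_backup() share. The `demo_*` names and the status values are assumptions for this example, and the fields are kept in host byte order here, whereas the patch stores them little-endian in the on-disk migration record.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical status values standing in for the patch's
 * UNIT_SRC_NORMAL / UNIT_SRC_IN_CP_AREA constants. */
enum demo_rec_status {
	DEMO_SRC_NORMAL = 0,
	DEMO_SRC_IN_CP_AREA = 1
};

/* Hypothetical, host-endian subset of the migration record. */
struct demo_migr_rec {
	uint32_t blocks_per_unit;
	uint32_t dest_depth_per_unit;
	uint32_t curr_migr_unit;
	uint32_t dest_1st_member_lba;
	uint32_t rec_status;
};

/* Recompute the current unit (1-based) and its destination LBA from
 * the reshape progress, then record the new status.
 * Returns -1 if blocks_per_unit is 0 (record not initialized).
 */
static int demo_update_unit(struct demo_migr_rec *rec,
			    unsigned long long reshape_progress,
			    uint32_t status)
{
	if (rec->blocks_per_unit == 0)
		return -1;
	rec->curr_migr_unit =
		(uint32_t)(reshape_progress / rec->blocks_per_unit) + 1;
	rec->dest_1st_member_lba =
		(rec->curr_migr_unit - 1) * rec->dest_depth_per_unit;
	rec->rec_status = status;
	return 0;
}
```

In this sketch, saving a backup would call the helper with the "in copy area" status and discarding it would call the helper with the "normal" status, so the record always says whether the Copy Area holds data that recovery must replay.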

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

mdmon.c | 9 ++
super-intel.c | 267 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++
2 files changed, 274 insertions(+), 2 deletions(-)

diff --git a/mdmon.c b/mdmon.c
index 85890de..8afb4de 100644
--- a/mdmon.c
+++ b/mdmon.c
@@ -575,3 +575,12 @@ void reshape_free_fdlist(int *fdlist,
;
}

+int restore_stripes(int *dest, unsigned long long *offsets,
+ int raid_disks, int chunk_size, int level, int layout,
+ int source, unsigned long long read_offset,
+ unsigned long long start, unsigned long long length,
+ char *src_buf)
+{
+ return 1;
+}
+
diff --git a/super-intel.c b/super-intel.c
index 349e583..b328828 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -1591,7 +1591,6 @@ static int imsm_level_to_layout(int level)
/*
* load_imsm_migr_rec - read imsm migration record
*/
-__attribute__((unused))
static int load_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
{
unsigned long long dsize;
@@ -1646,7 +1645,6 @@ static int load_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
/*
* write_imsm_migr_rec - write imsm migration record
*/
-__attribute__((unused))
static int write_imsm_migr_rec(struct intel_super *super, struct mdinfo *info)
{
unsigned long long dsize;
@@ -6448,6 +6446,264 @@ static void imsm_delete(struct intel_super *super, struct dl **dlp, unsigned ind
__free_imsm_disk(dl);
}
}
+
+int open_backup_targets(struct mdinfo *info, int raid_disks, int *raid_fds)
+{
+ struct mdinfo *sd;
+
+ for (sd = info->devs ; sd ; sd = sd->next) {
+ if (sd->disk.state & (1<<MD_DISK_FAULTY)) {
+ dprintf("disk is faulty!!\n");
+ continue;
+ }
+
+ if ((sd->disk.raid_disk >= raid_disks) ||
+ (sd->disk.raid_disk < 0)) {
+ raid_fds[sd->disk.raid_disk] = -1;
+ continue;
+ }
+ char *dn = map_dev(sd->disk.major,
+ sd->disk.minor, 1);
+ raid_fds[sd->disk.raid_disk] = dev_open(dn, O_RDWR);
+ if (raid_fds[sd->disk.raid_disk] < 0) {
+ fprintf(stderr, "cannot open component\n");
+ return -1;
+ }
+ }
+ return 0;
+}
+
+#define RAID_DISK_RESERVED_BLOCKS_IMSM_HI 417
+
+void init_migr_record_imsm(struct intel_super *super, struct mdinfo *info,
+ unsigned blocks_per_unit)
+{
+ struct migr_record *migr_rec = super->migr_rec;
+ int new_data_disks, prev_data_disks;
+ long long unsigned new_array_sectors;
+ int prev_stripe_sectors, new_stripe_sectors;
+ unsigned long long dsize, dev_sectors;
+ long long unsigned min_dev_sectors = -1LLU;
+ struct mdinfo *sd;
+ char nm[30];
+ int fd;
+
+ memset(migr_rec, 0, sizeof(struct migr_record));
+ migr_rec->family_num = __cpu_to_le32(super->anchor->family_num);
+
+ migr_rec->ascending_migr =
+ __cpu_to_le32((info->delta_disks > 0) ? 1 : 0);
+
+ prev_data_disks = info->array.raid_disks;
+ if ((info->array.level == 5) || (info->array.level == 4))
+ prev_data_disks--;
+ new_data_disks = info->array.raid_disks + info->delta_disks;
+ if ((info->new_level == 5) || (info->new_level == 4))
+ new_data_disks--;
+
+ new_array_sectors = info->component_size;
+ new_array_sectors &= ~(unsigned long long)((info->new_chunk / 512) - 1);
+ new_array_sectors *= new_data_disks;
+ new_array_sectors = (new_array_sectors >> SECT_PER_MB_SHIFT)
+ << SECT_PER_MB_SHIFT;
+
+ migr_rec->post_migr_vol_cap = __cpu_to_le32(new_array_sectors);
+ migr_rec->post_migr_vol_cap_hi = __cpu_to_le32(new_array_sectors >> 32);
+
+ prev_stripe_sectors = info->array.chunk_size/512 * prev_data_disks;
+ new_stripe_sectors = info->new_chunk/512 * new_data_disks;
+
+ new_array_sectors =
+ info->component_size * new_data_disks / blocks_per_unit;
+ migr_rec->num_migr_units = __cpu_to_le32(new_array_sectors);
+ migr_rec->dest_depth_per_unit =
+ __cpu_to_le32(blocks_per_unit / new_data_disks);
+ migr_rec->blocks_per_unit = __cpu_to_le32(blocks_per_unit);
+
+ /* Find the smallest dev */
+ for (sd = info->devs ; sd ; sd = sd->next) {
+ sprintf(nm, "%d:%d", sd->disk.major, sd->disk.minor);
+ fd = dev_open(nm, O_RDONLY);
+ if (fd < 0)
+ continue;
+ get_dev_size(fd, NULL, &dsize);
+ dev_sectors = dsize / 512;
+ if (dev_sectors < min_dev_sectors)
+ min_dev_sectors = dev_sectors;
+ close(fd);
+ }
+ migr_rec->ckpt_area_pba = __cpu_to_le32(min_dev_sectors -
+ RAID_DISK_RESERVED_BLOCKS_IMSM_HI);
+ return;
+}
+
+int save_backup_imsm(struct supertype *st, struct mdinfo *info,
+ void *buf, unsigned long write_offset,
+ int length)
+{
+ int rv = -1;
+ struct intel_super *super = st->sb;
+ unsigned long long *target_offsets = NULL;
+ int *targets = NULL;
+ int new_disks, new_odata;
+ int i;
+
+ if (info->reshape_progress == 0)
+ init_migr_record_imsm(super, info, length/512);
+
+ new_disks = info->array.raid_disks + info->delta_disks;
+ new_odata = new_disks;
+ if ((info->new_level == 5) || (info->new_level == 4))
+ new_odata--;
+
+ targets = malloc(new_disks * sizeof(int));
+ if (!targets)
+ goto abort;
+
+ target_offsets = malloc(new_disks * sizeof(unsigned long long));
+ if (!target_offsets)
+ goto abort;
+
+ for (i = 0; i < new_disks; i++) {
+ targets[i] = -1;
+ target_offsets[i] = (unsigned long long)
+ __le32_to_cpu(super->migr_rec->ckpt_area_pba) * 512;
+ target_offsets[i] -= write_offset / new_odata;
+ }
+
+ open_backup_targets(info, new_disks, targets);
+
+ if (restore_stripes(targets, /* list of dest devices */
+ target_offsets, /* migration record offsets */
+ new_disks,
+ info->new_chunk,
+ info->new_level,
+ info->new_layout,
+ 0, /* source backup file descriptor */
+ 0, /* input buf offset
+ * always 0: buf is already offset */
+ write_offset,
+ info->new_chunk * new_odata,
+ buf) != 0) {
+ fprintf(stderr, Name ": Error restoring stripes\n");
+ goto abort;
+ }
+
+ super->migr_rec->curr_migr_unit =
+ __cpu_to_le32(info->reshape_progress /
+ __le32_to_cpu(super->migr_rec->blocks_per_unit) + 1);
+ super->migr_rec->rec_status = __cpu_to_le32(UNIT_SRC_IN_CP_AREA);
+ super->migr_rec->dest_1st_member_lba =
+ __cpu_to_le32((__le32_to_cpu(super->migr_rec->curr_migr_unit ) - 1)
+ * __le32_to_cpu(super->migr_rec->dest_depth_per_unit));
+
+ rv = write_imsm_migr_rec(super, info);
+ abort:
+ if (targets) {
+ for (i = 0; i < new_disks; i++)
+ if (targets[i] >= 0)
+ close(targets[i]);
+ free(targets);
+ }
+ if (target_offsets)
+ free(target_offsets);
+
+ return rv;
+}
+
+void discard_backup_imsm(struct supertype *st, struct mdinfo *info)
+{
+ struct intel_super *super = st->sb;
+ load_imsm_migr_rec(super, info);
+ if (__le32_to_cpu(super->migr_rec->blocks_per_unit) == 0) {
+ dprintf("ERROR: blocks_per_unit = 0!!!\n");
+ return;
+ }
+
+ super->migr_rec->curr_migr_unit =
+ __cpu_to_le32(info->reshape_progress /
+ __le32_to_cpu(super->migr_rec->blocks_per_unit) + 1);
+ super->migr_rec->rec_status = __cpu_to_le32(UNIT_SRC_NORMAL);
+ super->migr_rec->dest_1st_member_lba =
+ __cpu_to_le32((__le32_to_cpu(super->migr_rec->curr_migr_unit ) - 1)
+ * __le32_to_cpu(super->migr_rec->dest_depth_per_unit));
+ write_imsm_migr_rec(super, info);
+}
+
+int recover_backup_imsm(struct supertype *st, struct mdinfo *info,
+ void *ptr, int length)
+{
+ struct intel_super *super = st->sb;
+ unsigned long long read_offset;
+ unsigned long long write_offset;
+ unsigned unit_len;
+ int *targets = NULL;
+ int new_disks, i;
+ char *buf = NULL;
+ int retval = 1;
+
+ if (__le32_to_cpu(super->migr_rec->rec_status) == UNIT_SRC_NORMAL)
+ return 0;
+ if (__le32_to_cpu(super->migr_rec->curr_migr_unit)
+ >= __le32_to_cpu(super->migr_rec->num_migr_units))
+ return 0;
+
+ new_disks = info->array.raid_disks + info->delta_disks;
+
+ read_offset = (unsigned long long)
+ __le32_to_cpu(super->migr_rec->ckpt_area_pba) * 512;
+
+ write_offset = ((unsigned long long)
+ __le32_to_cpu(super->migr_rec->dest_1st_member_lba) +
+ info->data_offset) * 512;
+
+ unit_len = __le32_to_cpu(super->migr_rec->dest_depth_per_unit) * 512;
+ if (posix_memalign((void **)&buf, 512, unit_len) != 0)
+ goto abort;
+ targets = malloc(new_disks * sizeof(int));
+ if (!targets)
+ goto abort;
+
+ open_backup_targets(info, new_disks, targets);
+
+ for (i = 0; i < new_disks; i++) {
+ if (lseek64(targets[i], read_offset, SEEK_SET) < 0) {
+ fprintf(stderr,
+ Name ": Cannot seek to block: %s\n",
+ strerror(errno));
+ goto abort;
+ }
+ if (read(targets[i], buf, unit_len) != unit_len) {
+ fprintf(stderr,
+ Name ": Cannot read copy area block: %s\n",
+ strerror(errno));
+ goto abort;
+ }
+ if (lseek64(targets[i], write_offset, SEEK_SET) < 0) {
+ fprintf(stderr,
+ Name ": Cannot seek to block: %s\n",
+ strerror(errno));
+ goto abort;
+ }
+ if (write(targets[i], buf, unit_len) != unit_len) {
+ fprintf(stderr,
+ Name ": Cannot restore block: %s\n",
+ strerror(errno));
+ goto abort;
+ }
+ }
+ retval = 0;
+abort:
+ if (targets) {
+ for (i = 0; i < new_disks; i++)
+ if (targets[i])
+ close(targets[i]);
+ free(targets);
+ }
+ if (buf)
+ free(buf);
+ return retval;
+}
#endif /* MDASSEMBLE */

static char disk_by_path[] = "/dev/disk/by-path/";
@@ -8615,6 +8871,13 @@ struct superswitch super_imsm = {
.manage_reshape = imsm_manage_reshape,
.reshape_array = imsm_reshape_array,

+ /* for external backup area
+ *
+ */
+ .save_backup = save_backup_imsm,
+ .discard_backup = discard_backup_imsm,
+ .recover_backup = recover_backup_imsm,
+
.external = 1,
.name = "imsm",


[PATCH 12/16] mdadm: migration restart for external meta

on 13.12.2010 15:46:34 by adam.kwolek

Add support for assembling partially migrated arrays with external metadata.
Note that if Raid0 was in use during the migration, it should be changed to
Raid4 during assembly (see check_mpb_migr_compatibility and switch_raid0_configuration).

getinfo_super_imsm_volume() reads the migration record and initializes the mdadm reshape-specific structures.
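
For illustration only (not part of the patch): a minimal sketch of the two calculations getinfo_super_imsm_volume() performs when it restarts a migration. The `demo_*` helpers are hypothetical and take host-endian values; the patch reads the same quantities from little-endian metadata fields.

```c
#include <assert.h>
#include <stdint.h>

/* Round blocks_per_member down to a multiple of blocks_per_strip.
 * The mask form used here (and in the patch) is only correct when
 * blocks_per_strip is a non-zero power of two, which the assert
 * checks.
 */
static uint32_t demo_align_member(uint32_t blocks_per_member,
				  uint32_t blocks_per_strip)
{
	assert(blocks_per_strip != 0 &&
	       (blocks_per_strip & (blocks_per_strip - 1)) == 0);
	return blocks_per_member & ~(blocks_per_strip - 1);
}

/* Reshape progress in blocks: units completed times unit size.
 * The cast widens before multiplying so the product cannot wrap
 * at 32 bits.
 */
static unsigned long long demo_reshape_progress(uint32_t blocks_per_unit,
						uint32_t curr_migr_unit)
{
	return (unsigned long long)blocks_per_unit * curr_migr_unit;
}
```

For example, a member of 1000 blocks with 128-block strips is trimmed to 896 blocks, the largest multiple of 128 that fits.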

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

Assemble.c | 10 +++
super-intel.c | 218 +++++++++++++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 221 insertions(+), 7 deletions(-)

diff --git a/Assemble.c b/Assemble.c
index ac489e8..7293ee6 100644
--- a/Assemble.c
+++ b/Assemble.c
@@ -1429,6 +1429,16 @@ int assemble_container_content(struct supertype *st, int mdfd,
close(mdfd);
return 1;
}
+
+ if (content->reshape_active) {
+ sysfs_set_num(sra, NULL, "reshape_position",
+ content->reshape_progress);
+ sysfs_set_num(sra, NULL, "chunk_size", content->new_chunk);
+ sysfs_set_num(sra, NULL, "layout", content->new_layout);
+ sysfs_set_num(sra, NULL, "raid_disks",
+ content->array.raid_disks + content->delta_disks);
+ }
+
if (sra)
sysfs_free(sra);

diff --git a/super-intel.c b/super-intel.c
index b328828..351fbbc 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -916,6 +916,7 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
printf(" Orig Family : %08x\n", __le32_to_cpu(mpb->orig_family_num));
printf(" Family : %08x\n", __le32_to_cpu(mpb->family_num));
printf(" Generation : %08x\n", __le32_to_cpu(mpb->generation_num));
+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
printf(" UUID : %s\n", nbuf + 5);
@@ -943,6 +944,7 @@ static void examine_super_imsm(struct supertype *st, char *homehost)
struct imsm_dev *dev = __get_imsm_dev(mpb, i);

super->current_vol = i;
+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
print_imsm_dev(dev, nbuf + 5, super->disks->index);
@@ -966,6 +968,7 @@ static void brief_examine_super_imsm(struct supertype *st, int verbose)
return;
}

+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
printf("ARRAY metadata=imsm UUID=%s\n", nbuf + 5);
@@ -983,12 +986,14 @@ static void brief_examine_subarrays_imsm(struct supertype *st, int verbose)
if (!super->anchor->num_raid_devs)
return;

+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
for (i = 0; i < super->anchor->num_raid_devs; i++) {
struct imsm_dev *dev = get_imsm_dev(super, i);

super->current_vol = i;
+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf1, ':');
printf("ARRAY /dev/md/%.16s container=%s member=%d UUID=%s\n",
@@ -1003,6 +1008,7 @@ static void export_examine_super_imsm(struct supertype *st)
struct mdinfo info;
char nbuf[64];

+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
printf("MD_METADATA=imsm\n");
@@ -1016,6 +1022,7 @@ static void detail_super_imsm(struct supertype *st, char *homehost)
struct mdinfo info;
char nbuf[64];

+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
printf("\n UUID : %s\n", nbuf + 5);
@@ -1025,6 +1032,7 @@ static void brief_detail_super_imsm(struct supertype *st)
{
struct mdinfo info;
char nbuf[64];
+ info.devs = NULL;
getinfo_super_imsm(st, &info, NULL);
fname_from_uuid(st, &info, nbuf, ':');
printf(" UUID=%s", nbuf + 5);
@@ -1693,6 +1701,8 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
struct dl *dl;
char *devname;
int map_disks = info->array.raid_disks;
+ __u32 blocks_per_member;
+ __u32 blocks_per_strip;

if (map == NULL)
return;
@@ -1703,7 +1713,13 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
info->container_member = super->current_vol;
info->array.raid_disks = map->num_members;
info->array.level = get_imsm_raid_level(map);
- info->array.layout = imsm_level_to_layout(info->array.level);
+ if (info->array.level == 4) {
+ map->raid_level = 5;
+ info->array.level = 5;
+ info->array.layout = ALGORITHM_PARITY_N;
+ } else {
+ info->array.layout = imsm_level_to_layout(info->array.level);
+ }
info->array.md_minor = -1;
info->array.ctime = 0;
info->array.utime = 0;
@@ -1721,7 +1737,15 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
}

info->data_offset = __le32_to_cpu(map->pba_of_lba0);
- info->component_size = __le32_to_cpu(map->blocks_per_member);
+ /* FIXME: For some unknown reason sometimes in a volume created by
+ * IMSM blocks_per_member is not a multiple of blocks_per strip.
+ * Fix blocks_per_member here:
+ */
+ blocks_per_member = __le32_to_cpu(map->blocks_per_member);
+ blocks_per_strip = __le16_to_cpu(map->blocks_per_strip);
+ blocks_per_member &= ~(blocks_per_strip - 1);
+ info->component_size = blocks_per_member;
+
memset(info->uuid, 0, sizeof(info->uuid));
info->recovery_start = MaxSector;
info->reshape_active = (prev_map != NULL);
@@ -1749,7 +1773,45 @@ static void getinfo_super_imsm_volume(struct supertype *st, struct mdinfo *info,
*/
case MIGR_REBUILD:
/* this is handled by container_content_imsm() */
- case MIGR_GEN_MIGR:
+ case MIGR_GEN_MIGR: {
+ int data_members;
+
+ load_imsm_migr_rec(super, info);
+
+ info->reshape_progress = (unsigned long long)
+ __le32_to_cpu(super->migr_rec->blocks_per_unit) *
+ __le32_to_cpu(super->migr_rec->curr_migr_unit);
+
+ /* set previous and new map configurations */
+ prev_map = get_imsm_map(dev, 1);
+ info->array.raid_disks = prev_map->num_members;
+ info->new_level = info->array.level;
+ info->array.level = get_imsm_raid_level(prev_map);
+ info->new_layout = info->array.layout;
+ info->array.layout =
+ imsm_level_to_layout(info->array.level);
+ info->array.chunk_size =
+ __le16_to_cpu(prev_map->blocks_per_strip) << 9;
+ info->new_chunk =
+ __le16_to_cpu(map->blocks_per_strip) << 9;
+
+ if (info->array.level == 4) {
+ prev_map->raid_level = 5;
+ info->array.level = 5;
+ info->array.layout = ALGORITHM_PARITY_N;
+ }
+
+ /* IMSM FIX for blocks_per_member */
+ blocks_per_strip =
+ __le16_to_cpu(prev_map->blocks_per_strip);
+ blocks_per_member &= ~(blocks_per_strip - 1);
+ info->component_size = blocks_per_member;
+
+ /* Calculate previous array size */
+ data_members = imsm_num_data_members(dev, 1);
+ info->custom_array_size =
+ blocks_per_member * data_members;
+ }
case MIGR_STATE_CHANGE:
/* FIXME handle other migrations */
default:
@@ -2524,6 +2586,123 @@ struct bbm_log *__get_imsm_bbm_log(struct imsm_super *mpb)
return ptr;
}

+/* Switches an N-disk Raid0 map configuration to (N+1)-disk Raid4
+ */
+void switch_raid0_configuration(struct imsm_super *mpb, struct imsm_map *map)
+{
+ __u8 *src, *dst;
+ int bytes_to_copy;
+
+ /* get the pointer to the rest of the metadata */
+ src = (__u8 *)map + sizeof_imsm_map(map);
+
+ /* change the level and disk number to be compatible with IMSM */
+ map->raid_level = 4;
+ map->num_members++;
+
+ /* get the updated pointer to the rest of the metadata */
+ dst = (__u8 *)map + sizeof_imsm_map(map);
+ /* Now move the rest of the metadata to be properly aligned */
+ bytes_to_copy = mpb->mpb_size - (src - (__u8 *)mpb);
+ if (bytes_to_copy > 0)
+ memmove(dst, src, bytes_to_copy);
+ /* Now insert new entry to the map */
+ set_imsm_ord_tbl_ent(map, map->num_members - 1/*slot*/,
+ mpb->num_disks | IMSM_ORD_REBUILD);
+ /* update size */
+ mpb->mpb_size += sizeof(__u32);
+}
+
+/* Make sure that in case of migration in progress we'll convert raid
+ * personalities so we could continue migrating
+ */
+void convert_raid_personalities(struct intel_super *super)
+{
+ struct imsm_super *mpb = super->anchor;
+ struct imsm_map *map;
+ struct imsm_disk *newMissing;
+ int i, map_modified = 0;
+ int bytes_to_copy;
+ __u8 *src, *dst;
+
+ for (i = 0; i < super->anchor->num_raid_devs; i++) {
+ struct imsm_dev *dev_iter = __get_imsm_dev(super->anchor, i);
+
+ map_modified = 0;
+ if (dev_iter &&
+ dev_iter->vol.migr_state == 1 &&
+ dev_iter->vol.migr_type == MIGR_GEN_MIGR) {
+ /* This device is migrating, check for raid0 levels */
+ map = get_imsm_map(dev_iter, 0);
+ if (map->raid_level == 0) {
+ /* Map0: Migrating raid0 detected
+ * lets switch it to level4
+ */
+ switch_raid0_configuration(mpb, map);
+ map_modified++;
+ }
+ map = get_imsm_map(dev_iter, 1);
+ if (map->raid_level == 0) {
+ /* Map1: Migrating raid0 detected
+ * lets switch it to level4
+ */
+ switch_raid0_configuration(mpb, map);
+ map_modified++;
+ }
+ }
+ }
+
+ if (map_modified > 0) {
+ /* Add missing device to the MPB disk table */
+ src = (__u8 *)mpb->disk + sizeof(struct imsm_disk)
+ *mpb->num_disks;
+ mpb->num_disks++;
+ dst = (__u8 *)mpb->disk + sizeof(struct imsm_disk)
+ *mpb->num_disks;
+
+ /* Now move the rest of the metadata to be properly aligned */
+ bytes_to_copy = mpb->mpb_size - (src - (__u8 *)mpb);
+ if (bytes_to_copy > 0)
+ memmove(dst, src, bytes_to_copy);
+
+ /* Update mpb size */
+ mpb->mpb_size += sizeof(struct imsm_disk);
+
+ /* Now fill in the new missing disk fields */
+ newMissing = (struct imsm_disk *)src;
+ sprintf((char *)newMissing->serial, "%s", "MISSING DISK");
+ /* copy the device size from the first disk */
+ newMissing->total_blocks = mpb->disk[0].total_blocks;
+ newMissing->scsi_id = 0x0;
+ newMissing->status = FAILED_DISK;
+ }
+}
+
+/* Check for unsupported migration features:
+ * migration optimization area
+ */
+int check_mpb_migr_compatibility(struct intel_super *super)
+{
+ struct imsm_map *map0, *map1;
+ int i;
+
+ for (i = 0; i < super->anchor->num_raid_devs; i++) {
+ struct imsm_dev *dev_iter = __get_imsm_dev(super->anchor, i);
+
+ if (dev_iter &&
+ dev_iter->vol.migr_state == 1 &&
+ dev_iter->vol.migr_type == MIGR_GEN_MIGR) {
+ /* This device is migrating */
+ map0 = get_imsm_map(dev_iter, 0);
+ map1 = get_imsm_map(dev_iter, 1);
+ if (map0->pba_of_lba0 != map1->pba_of_lba0)
+ /* migration optimization area was used */
+ return -1;
+ }
+ }
+ return 0;
+}
+
static void __free_imsm(struct intel_super *super, int free_disks);

/* load_imsm_mpb - read matrix metadata
@@ -2642,6 +2821,21 @@ static int load_imsm_mpb(int fd, struct intel_super *super, char *devname)
return 3;
}

+ /* Check for unsupported migration features */
+ if (check_mpb_migr_compatibility(super) != 0) {
+ if (devname)
+ fprintf(stderr,
+ Name ": Unsupported migration detected on %s\n",
+ devname);
+
+ return 4;
+ }
+
+ /* Now make sure that in case of migration
+ * we'll convert raid personalities
+ */
+ convert_raid_personalities(super);
+
/* FIXME the BBM log is disk specific so we cannot use this global
* buffer for all disks. Ok for now since we only look at the global
* bbm_log_size parameter to gate assembly
@@ -4662,6 +4856,8 @@ static void update_recovery_start(struct imsm_dev *dev, struct mdinfo *array)
rebuild->recovery_start = units * blocks_per_migr_unit(dev);
}

+static int recover_backup_imsm(struct supertype *st, struct mdinfo *info,
+ void *ptr, int length);

static struct mdinfo *container_content_imsm(struct supertype *st, char *subarray)
{
@@ -4788,8 +4984,16 @@ static struct mdinfo *container_content_imsm(struct supertype *st, char *subarra
info_d->data_offset = __le32_to_cpu(map->pba_of_lba0);
info_d->component_size = __le32_to_cpu(map->blocks_per_member);
}
- /* now that the disk list is up-to-date fixup recovery_start */
- update_recovery_start(dev, this);
+ if (this) {
+ /* now that the disk list
+ * is up-to-date fixup recovery_start */
+ update_recovery_start(dev, this);
+
+ /* check for reshape */
+ if (this->reshape_active == 1)
+ recover_backup_imsm(st, this, NULL, 0);
+ }
+
rest = this;
}

@@ -6630,8 +6834,8 @@ void discard_backup_imsm(struct supertype *st, struct mdinfo *info)
write_imsm_migr_rec(super, info);
}

-int recover_backup_imsm(struct supertype *st, struct mdinfo *info,
- void *ptr, int length)
+static int recover_backup_imsm(struct supertype *st, struct mdinfo *info,
+ void *ptr, int length)
{
struct intel_super *super = st->sb;
unsigned long long read_offset;


[PATCH 13/16] Add mdadm->mdmon sync_max command message

on 13.12.2010 15:46:43 by adam.kwolek

Currently only metadata_update messages can be sent from mdadm to mdmon over a socket.
The external metadata reshape implementation also needs support for sending a sync_max command.

A new message type, "cmd_message", is defined.
cmd_message is a generic structure that allows different types of commands to be defined and sent from mdadm to mdmon.

cmd_message and update_message are distinguished by different start magic numbers sent through the socket.

This patch defines only one type of cmd_message:
'SET_SYNC_MAX'
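
For illustration only (not part of the patch): a minimal sketch of telling the two message kinds apart by their start magic and then dispatching a command. The magic values and all `demo_*` names are invented for this example and are not the numbers used by msg.c; the 0/1 return convention mirrors how read_sock() treats the result of receive_message() in this patch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical start magics; NOT the values used by mdadm. */
#define DEMO_MAGIC_UPDATE 0x55504454u
#define DEMO_MAGIC_CMD    0x434d4421u

enum demo_cmd_type {
	DEMO_SET_SYNC_MAX
};

/* Same layout idea as the patch's struct cmd_message. */
struct demo_cmd_message {
	enum demo_cmd_type type;
	int devnum;
	union {
		unsigned long long new_sync_max;
	} msg_buf;
};

/* Classify a received frame by its leading magic number.
 * Returns 0 for a metadata update, 1 for a command, -1 otherwise.
 */
static int demo_classify(const unsigned char *wire, size_t len)
{
	uint32_t magic;

	if (len < sizeof(magic))
		return -1;
	memcpy(&magic, wire, sizeof(magic));
	if (magic == DEMO_MAGIC_UPDATE)
		return 0;
	if (magic == DEMO_MAGIC_CMD)
		return 1;
	return -1;
}

/* Apply a command addressed to array 'devnum'.
 * Returns 0 if handled, -1 if it targets another array or has an
 * unknown type.
 */
static int demo_handle(const struct demo_cmd_message *msg, int devnum,
		       unsigned long long *sync_max)
{
	if (msg->devnum != devnum)
		return -1;
	switch (msg->type) {
	case DEMO_SET_SYNC_MAX:
		*sync_max = msg->msg_buf.new_sync_max;
		return 0;
	}
	return -1;
}
```

Keeping the command payload in a union leaves room for further command types without changing the framing, which appears to be the intent of the generic cmd_message structure.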

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

managemon.c | 39 +++++++++++++++++++++++++++++++++++++--
mdadm.h | 18 ++++++++++++++++++
msg.c | 33 +++++++++++++++++++++++++++++++--
msg.h | 2 ++
util.c | 25 +++++++++++++++++++++++++
5 files changed, 113 insertions(+), 4 deletions(-)

diff --git a/managemon.c b/managemon.c
index 7ff49ab..c675d71 100644
--- a/managemon.c
+++ b/managemon.c
@@ -864,13 +864,36 @@ static void handle_message(struct supertype *container, struct metadata_update *
}
}

+static void handle_command(struct supertype *container, struct cmd_message *msg)
+{
+ struct active_array *a;
+
+ /* Search for a member of this container */
+ for (a = container->arrays; a; a = a->next)
+ if (msg->devnum == a->devnum)
+ break;
+
+ if (!a)
+ return;
+
+ /* check command msg type */
+ switch (msg->type) {
+ case SET_SYNC_MAX:
+ /* Add SET_SYNC_MAX handler here */
+ break;
+ }
+}
+
void read_sock(struct supertype *container)
{
int fd;
struct metadata_update msg;
+ struct mdmon_update *update;
+ struct cmd_message *cmd_msg;
int terminate = 0;
long fl;
int tmo = 3; /* 3 second timeout before hanging up the socket */
+ int rv;

fd = accept(container->sock, NULL, NULL);
if (fd < 0)
@@ -884,7 +907,9 @@ void read_sock(struct supertype *container)
msg.buf = NULL;

/* read and validate the message */
- if (receive_message(fd, &msg, tmo) == 0) {
+ rv = receive_message(fd, &msg, tmo);
+ if (rv == 0) {
+ /* metadata update */
handle_message(container, &msg);
if (msg.len == 0) {
/* ping reply with version */
@@ -894,8 +919,18 @@ void read_sock(struct supertype *container)
terminate = 1;
} else if (ack(fd, tmo) < 0)
terminate = 1;
- } else
+ } else if (rv == 1) {
+ /* mdmon_update received */
+ update = (struct mdmon_update *)&msg;
+ cmd_msg = (struct cmd_message *)(update->buf);
+ handle_command(container, cmd_msg);
+
+ free(msg.buf);
+ if (ack(fd, tmo) < 0)
+ terminate = 1;
+ } else {
terminate = 1;
+ }

} while (!terminate);

diff --git a/mdadm.h b/mdadm.h
index 45c6b3c..de5d642 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -811,6 +811,23 @@ struct metadata_update {
struct metadata_update *next;
};

+struct mdmon_update {
+ int len;
+ char *buf;
+};
+
+enum cmd_type {
+ SET_SYNC_MAX,
+};
+
+struct cmd_message {
+ enum cmd_type type;
+ int devnum;
+ union {
+ unsigned long long new_sync_max;
+ } msg_buf;
+};
+
/* A supertype holds a particular collection of metadata.
* It identifies the metadata type by the superswitch, and the particular
* sub-version of that metadata type.
@@ -1141,6 +1158,7 @@ extern int add_disk(int mdfd, struct supertype *st,
extern int remove_disk(int mdfd, struct supertype *st,
struct mdinfo *sra, struct mdinfo *info);
extern int set_array_info(int mdfd, struct supertype *st, struct mdinfo *info);
+extern int send_mdmon_cmd(struct supertype *st, struct mdmon_update *update);
unsigned long long min_recovery_start(struct mdinfo *array);

extern char *human_size(long long bytes);
diff --git a/msg.c b/msg.c
index 5511ecd..7f0f1f8 100644
--- a/msg.c
+++ b/msg.c
@@ -32,6 +32,7 @@
#include "mdmon.h"

static const __u32 start_magic = 0x5a5aa5a5;
+static const __u32 start_magic_cmd = 0x6b6bb6b6;
static const __u32 end_magic = 0xa5a55a5a;

static int send_buf(int fd, const void* buf, int len, int tmo)
@@ -93,14 +94,42 @@ int send_message(int fd, struct metadata_update *msg, int tmo)
return rv;
}

+int send_message_cmd(int fd, struct mdmon_update *update, int tmo)
+{
+ __s32 len = update->len;
+ int rv;
+
+ rv = send_buf(fd, &start_magic_cmd, 4, tmo);
+ rv = rv ?: send_buf(fd, &len, 4, tmo);
+ if (len > 0)
+ rv = rv ?: send_buf(fd, update->buf, update->len, tmo);
+ rv = rv ?: send_buf(fd, &end_magic, 4, tmo);
+
+ return rv;
+}
+
+/*
+ * return:
+ * 0 - metadata_update received
+ * 1 - mdmon_update received
+ * -1 - error case
+ */
int receive_message(int fd, struct metadata_update *msg, int tmo)
{
__u32 magic;
__s32 len;
int rv;
+ int msg_type;

rv = recv_buf(fd, &magic, 4, tmo);
- if (rv < 0 || magic != start_magic)
+ if (rv < 0)
+ return -1;
+
+ if (magic == start_magic)
+ msg_type = 0;
+ else if (magic == start_magic_cmd)
+ msg_type = 1;
+ else
return -1;
rv = recv_buf(fd, &len, 4, tmo);
if (rv < 0 || len > MSG_MAX_LEN)
@@ -122,7 +151,7 @@ int receive_message(int fd, struct metadata_update *msg, int tmo)
return -1;
}
msg->len = len;
- return 0;
+ return msg_type;
}

int ack(int fd, int tmo)
diff --git a/msg.h b/msg.h
index 1f916de..046f7c4 100644
--- a/msg.h
+++ b/msg.h
@@ -20,9 +20,11 @@

struct mdinfo;
struct metadata_update;
+struct mdmon_update;

extern int receive_message(int fd, struct metadata_update *msg, int tmo);
extern int send_message(int fd, struct metadata_update *msg, int tmo);
+extern int send_message_cmd(int fd, struct mdmon_update *update, int tmo);
extern int ack(int fd, int tmo);
extern int wait_reply(int fd, int tmo);
extern int connect_monitor(char *devname);
diff --git a/util.c b/util.c
index ebeaa16..09961be 100644
--- a/util.c
+++ b/util.c
@@ -1877,6 +1877,31 @@ int flush_metadata_updates(struct supertype *st)
return 0;
}

+int send_mdmon_cmd(struct supertype *st, struct mdmon_update *update)
+{
+ int sfd;
+ char *devname;
+
+ devname = devnum2devname(st->container_dev);
+ if (devname == NULL)
+ return -1;
+ sfd = connect_monitor(devname);
+ if (sfd < 0) {
+ free(devname);
+ return -1;
+ }
+
+ send_message_cmd(sfd, update, 0);
+ wait_reply(sfd, 0);
+
+ ack(sfd, 0);
+ wait_reply(sfd, 0);
+ close(sfd);
+ st->update_tail = NULL;
+ free(devname);
+ return 0;
+}
+
void append_metadata_update(struct supertype *st, void *buf, int len)
{


[PATCH 14/16] mdadm: support grow operation for external meta using checkpointing

am 13.12.2010 15:46:52 von adam.kwolek

Assumptions for the external metadata reshape implementation:
- mdadm controls whether we are writing over live data
- mdadm advances suspend_hi, does a backup if needed, and
tells mdmon it is safe to continue by sending
a sync_max cmd_message to mdmon
- mdmon controls the sync_max sysfs entry, so the kernel won't
cross the safe position (reshape progress from metadata)
- mdmon monitors sync_completed and updates the metadata
to reflect 'sync_completed'.
- mdmon moves suspend_lo forward in line with changes in
sync_completed
- md updates/notifies sync_completed periodically, which
guides mdmon in updating the metadata periodically.

Above, "mdadm" means the background process forked by "mdadm --grow"
or "mdadm --assemble" that monitors an ongoing reshape.
A general algorithm for external metadata reshape:

<===== we are writing over live data
1. mdadm sets suspend_lo = 0, suspend_hi = 0
2. mdmon waits for a new sync_max message from mdadm
3. mdadm sets suspend_hi
4. mdadm performs the critical data backup with save_backup()
5. mdadm sends the new sync_max to mdmon
6. mdadm waits for suspend_lo to change
7. mdmon wakes up on the socket message
8. mdmon: sync_max is not MAX (we are still writing over live data),
so mdmon sets sysfs:sync_max
9. md reshapes the critical stripes
10. mdmon wakes up on the new sync_completed
11. mdmon updates the metadata using discard_backup()
12. mdmon updates suspend_lo
13. mdadm wakes up on the suspend_lo change
14.

<===== now the critical section is finished
2. mdmon waits for a new sync_max message from mdadm
3. mdadm sends the new sync_max to mdmon without a stripes backup
(this means the end of the critical section)
4. mdadm goes back to 2. until the end of the array
5. mdmon works as in the critical section
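
The handshake in the steps above can be modelled as a two-state machine on the mdmon side. The sketch below is a simplified illustration, not the code from monitor.c: struct model and both functions are hypothetical, plain variables stand in for the sysfs entries, and the scaling of suspend_lo by the number of data disks is left out. Only the enum names match the ones this patch adds to mdmon.h.

```c
#include <assert.h>
#include <stddef.h>

enum reshape_wait { wait_grow_backup, wait_md_reshape };

/* Hypothetical model: plain fields stand in for sysfs entries. */
struct model {
	enum reshape_wait waiting_for;
	unsigned long long grow_sync_max; /* last value sent by mdadm */
	unsigned long long sync_max;      /* stands in for sysfs sync_max */
	unsigned long long suspend_lo;    /* stands in for sysfs suspend_lo */
};

/* mdadm side: once the backup is saved, publish a new safe limit. */
static void mdadm_send_sync_max(struct model *m, unsigned long long v)
{
	assert(m != NULL);
	m->grow_sync_max = v;
}

/* mdmon side: one pass of the monitor loop.
 * 'completed' stands in for the kernel's sync_completed value.
 */
static void mdmon_step(struct model *m, unsigned long long completed)
{
	assert(m != NULL);
	if (m->waiting_for == wait_grow_backup) {
		/* proceed only after mdadm raised the limit */
		if (m->grow_sync_max > m->sync_max) {
			m->sync_max = m->grow_sync_max;
			m->waiting_for = wait_md_reshape;
		}
		return;
	}
	/* wait_md_reshape: md reached the limit, release the region */
	if (completed >= m->sync_max) {
		m->suspend_lo = completed;
		m->waiting_for = wait_grow_backup;
	}
}
```

In this model mdmon never raises sync_max on its own, so md cannot reshape past a position that mdadm has not yet backed up, and mdadm learns that a section is done only from the suspend_lo change.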

A new external counterpart of grow_backup() is implemented:
grow_backup_ext().
For a non-grow reshape (the number of data disks does not change) a new child_same_size_ext() function is implemented.
Both use save_stripes() to read critical data from the source array into a buffer and then write the buffer to the external backup area with save_backup().
mdmon calls discard_backup() when notified of a new sync_completed value.
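
The offset arithmetic that grow_backup_ext() uses to pick the next sync_max can be checked in isolation. The sketch below lifts the two calculations out of the patch into hypothetical helpers, data_disks() and next_sync_max(). The formulas follow the diff, while the function names and the standalone form are assumptions made for illustration.

```c
#include <assert.h>

/* Data disks for a given level, as the patch computes it:
 * levels 4 and 5 lose one disk to parity, level 6 loses two.
 */
static int data_disks(int raid_disks, int level)
{
	if (level >= 4)
		raid_disks--;
	if (level == 6)
		raid_disks--;
	return raid_disks;
}

/* Per-device sync_max, in 512-byte sectors, that covers one backed-up
 * unit of bytes_per_unit bytes starting at array byte offset
 * write_offset.
 */
static unsigned long long next_sync_max(unsigned long long write_offset,
					unsigned long long bytes_per_unit,
					int new_odata)
{
	assert(new_odata > 0);
	return write_offset / 512 / new_odata +
	       bytes_per_unit / 512 / new_odata;
}
```

For example, growing a 4-disk RAID5 to 5 disks with a 64 KiB chunk gives new_odata = 4 and bytes_per_unit = 65536 * 4 = 262144, so the first unit ends at 262144 / 512 / 4 = 128 sectors per device, which is exactly one chunk.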

Signed-off-by: Maciej Trela
Signed-off-by: Adam Kwolek
---

Grow.c | 378 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++--
managemon.c | 34 +++++
mdadm.h | 1
mdmon.h | 6 +
monitor.c | 104 ++++++++++++++++
5 files changed, 507 insertions(+), 16 deletions(-)

diff --git a/Grow.c b/Grow.c
index 9fbdd0e..02193a9 100644
--- a/Grow.c
+++ b/Grow.c
@@ -854,6 +854,12 @@ void reshape_free_fdlist(int *fdlist,
{
int i;

+ if ((fdlist == NULL) || (offsets == NULL)) {
+ dprintf(Name " Error: reshape_free_fdlist() - "\
+ "parameters verification error.\n");
+ return;
+ }
+
for (i = 0; i < size; i++)
if (fdlist[i] >= 0)
close(fdlist[i]);
@@ -1910,7 +1916,14 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
else
fd = -1;
mlockall(MCL_FUTURE);
-
+ sra->array.raid_disks = odisks;
+ sra->array.level = array.level;
+ sra->array.layout = olayout;
+ sra->array.chunk_size = ochunk;
+ sra->delta_disks = ndisks - odisks;
+ sra->new_level = (level == UnSet) ? array.level : level;
+ sra->new_layout = nlayout;
+ sra->new_chunk = nchunk;
if (odata < ndata)
done = child_grow(st, fd, sra, stripes,
fdlist, offsets,
@@ -2293,6 +2306,241 @@ static void validate(int afd, int bfd, unsigned long long offset)
}
}

+int wait_reshape_completed_ext(struct supertype *st,
+ struct mdinfo *sra,
+ unsigned long long offset /* per device */)
+{
+
+ /* Wait for resync to pass the section that was backed up
+ * then erase the backup and allow IO
+ */
+ int fd = sysfs_get_fd(sra, NULL, "suspend_lo");
+ unsigned long long completed;
+
+ struct timeval timeout;
+
+ if (fd < 0)
+ return -1;
+ timeout.tv_sec = 0;
+ timeout.tv_usec = 500000;
+ do {
+ char action[20];
+ fd_set rfds;
+ FD_ZERO(&rfds);
+ FD_SET(fd, &rfds);
+ select(fd+1, NULL, NULL, &rfds, &timeout);
+ if (sysfs_fd_get_ll(fd, &completed) < 0) {
+ close(fd);
+ return -1;
+ }
+ if (sysfs_get_str(sra, NULL, "sync_action", action, 20) > 0) {
+ if (strncmp(action, "reshape", 7) != 0) {
+ close(fd);
+ return -2;
+ }
+ } else {
+ /* takeover support: when the array goes back to raid0
+ * the sync_action sysfs entry disappears,
+ * so we have to exit in that case as well
+ */
+ if (sysfs_get_str(sra, NULL,
+ "level", action, 20) > 0) {
+ if (strncmp(action, "raid0", 5) == 0) {
+ close(fd);
+ return -2;
+ }
+ }
+ }
+ } while (completed < offset);
+ close(fd);
+
+ return 0;
+}
+
+int wait_reshape_start_ext(struct supertype *st, struct mdinfo *sra)
+{
+#define WAIT_FOR_RESHAPE_START 20
+ int wait_time = WAIT_FOR_RESHAPE_START;
+ int ret_val = -1;
+ char *container = devnum2devname(st->devnum);
+
+ if (container == NULL) {
+ dprintf("wait_reshape_start_ext: cannot find container.\n");
+ return ret_val;
+ }
+ ping_manager(container);
+ ping_monitor(container);
+ while (wait_time) {
+ char action[20];
+ dprintf("wait_reshape_start_ext Waiting for reshape state (%i)"\
+ "...\n", WAIT_FOR_RESHAPE_START - wait_time + 1);
+ if (sysfs_get_str(sra, NULL, "sync_action", action, 20) < 0) {
+ dprintf("Error: wait_reshape_start_ext cannot "\
+ "read sync_action\n");
+ break;
+ }
+ dprintf("wait_reshape_start_ext: read from sysfs: %s\n",
+ action);
+ if (strncmp(action, "reshape", 7) == 0) {
+ dprintf("wait_reshape_start_ext: reshape started.\n");
+ ret_val = 0;
+ break;
+ }
+ ping_manager(container);
+ ping_monitor(container);
+ sleep(1);
+ wait_time--;
+ }
+
+ free(container);
+ return ret_val;
+}
+
+void send_resync_max_to_mdmon(struct supertype *st,
+ struct mdinfo *sra,
+ unsigned long long resync_max)
+{
+ struct mdmon_update msg;
+ struct cmd_message cmd_msg;
+
+ cmd_msg.type = SET_SYNC_MAX;
+ cmd_msg.devnum = devname2devnum(sra->sys_name);
+ cmd_msg.msg_buf.new_sync_max = resync_max;
+ msg.buf = (void *)&cmd_msg;
+ msg.len = sizeof(cmd_msg);
+
+ send_mdmon_cmd(st, &msg);
+}
+
+int grow_backup_ext(struct supertype *st, struct mdinfo *sra,
+ unsigned long long offset, /* per device */
+ unsigned long long stripes, /* per device */
+ int *sources, unsigned long long *offsets,
+ int dests, int *destfd, unsigned long long *destoffsets,
+ int *degraded, char *buf)
+{
+ int disks = sra->array.raid_disks;
+ int chunk = sra->array.chunk_size;
+ int level = sra->array.level;
+ int layout = sra->array.layout;
+ unsigned long long new_degraded;
+ unsigned long long processed = 0;
+ unsigned long long read_offset = 0;
+ unsigned long long write_offset;
+ unsigned long long resync_max;
+ unsigned bytes_per_unit;
+ int new_disks, new_odata;
+ int odata = disks;
+ int retval = 0;
+ int rv = 0;
+ int i;
+
+ if (level >= 4)
+ odata--;
+ if (level == 6)
+ odata--;
+ sysfs_set_num(sra, NULL, "suspend_hi",
+ (offset + stripes * chunk/512) * odata);
+ /* Check that array hasn't become degraded,
+ * else we might backup the wrong data */
+ sysfs_get_ll(sra, NULL, "degraded", &new_degraded);
+ if (new_degraded != (unsigned long long)*degraded) {
+ /* check each device to ensure it is still working */
+ struct mdinfo *sd;
+ for (sd = sra->devs ; sd ; sd = sd->next) {
+ if (sd->disk.state & (1<<MD_DISK_FAULTY))
+ continue;
+ if (sd->disk.state & (1<<MD_DISK_SYNC)) {
+ char sbuf[20];
+ if (sysfs_get_str(sra,
+ sd,
+ "state",
+ sbuf, 20) < 0 ||
+ strstr(sbuf, "faulty") ||
+ strstr(sbuf, "in_sync") == NULL) {
+ /* this device is dead */
+ sd->disk.state = (1<<MD_DISK_FAULTY);
+ if (sd->disk.raid_disk >= 0 &&
+ sources[sd->disk.raid_disk] >= 0) {
+ close(sources[sd->disk.raid_disk]);
+ sources[sd->disk.raid_disk] =
+ -1;
+ }
+ }
+ }
+ }
+ *degraded = new_degraded;
+ }
+
+ for (i = 0; i < dests; i++)
+ lseek64(destfd[i], destoffsets[i], 0);
+
+ /* save critical stripes to buf */
+ for (i = 0; i < (int)stripes; i++)
+ rv |= save_stripes(sources, offsets,
+ disks, chunk, level, layout,
+ dests, destfd,
+ offset * 512 * odata + (i * chunk * odata),
+ chunk * odata,
+ buf + (i * chunk * odata));
+
+ if (rv)
+ return rv;
+
+ new_disks = disks + sra->delta_disks;
+ new_odata = new_disks;
+ if (sra->new_level >= 4)
+ new_odata--;
+ if (sra->new_level == 6)
+ new_odata--;
+
+ write_offset = offset * 512 * new_odata;
+ bytes_per_unit = sra->new_chunk * new_odata;
+ if (chunk > sra->new_chunk)
+ bytes_per_unit *= (chunk / sra->new_chunk);
+ while ((processed < stripes * chunk * odata) ||
+ (processed == 0 && stripes * chunk * odata == 0)) {
+ int dn;
+ char *devname;
+
+ /* Save critical stripes to external backup */
+ if (st->ss->save_backup)
+ st->ss->save_backup(st, sra,
+ buf + read_offset,
+ write_offset,
+ bytes_per_unit);
+
+ /* send new sync_max to mdmon */
+ resync_max = write_offset / 512 / new_odata +
+ bytes_per_unit / 512 / new_odata;
+ send_resync_max_to_mdmon(st, sra, resync_max);
+
+ /* Wait for updated suspend_lo */
+ retval = wait_reshape_completed_ext(st, sra,
+ resync_max * new_odata);
+ if (retval == -2) {
+ /* reshape has been finished
+ */
+ rv = -1;
+ break;
+ }
+
+ processed += bytes_per_unit;
+ read_offset += bytes_per_unit;
+ write_offset += bytes_per_unit;
+ sra->reshape_progress = write_offset / 512;
+
+ dn = devname2devnum(sra->text_version + 1);
+ devname = devnum2devname(dn);
+ if (devname) {
+ ping_monitor(devname);
+ free(devname);
+ }
+ }
+
+ return rv;
+}
+
int child_grow(struct supertype *st, int afd, struct mdinfo *sra,
unsigned long stripes, int *fds, unsigned long long *offsets,
int disks, int chunk, int level, int layout, int data,
@@ -2300,25 +2548,73 @@ int child_grow(struct supertype *st, int afd, struct mdinfo *sra,
{
char *buf;
int degraded = 0;
+ int ext_backup = (st->ss->save_backup) ? 1 : 0;
+ unsigned int buf_size;

- if (posix_memalign((void**)&buf, 4096, disks * chunk))
+ buf_size = (ext_backup) ? stripes * disks * chunk :
+ (unsigned int)(disks * chunk);
+ if (posix_memalign((void **)&buf, 4096, buf_size))
/* Don't start the 'reshape' */
return 0;
sysfs_set_num(sra, NULL, "suspend_hi", 0);
sysfs_set_num(sra, NULL, "suspend_lo", 0);
- grow_backup(sra, 0, stripes,
- fds, offsets, disks, chunk, level, layout,
- dests, destfd, destoffsets,
- 0, &degraded, buf);
- validate(afd, destfd[0], destoffsets[0]);
- wait_backup(st, sra, 0, stripes * (chunk / 512),
- stripes * (chunk / 512),
- dests, destfd, destoffsets,
- 0);
+ if (ext_backup) {
+ unsigned long long size;
+ unsigned long long resync_max;
+ int new_odata;
+
+ grow_backup_ext(st, sra, 0, stripes, fds,
+ offsets, dests, destfd, destoffsets,
+ &degraded, buf);
+
+ /* go through the non-critical stripes:
+ * direct mdmon to drive the process up to the next stop
+ * using an arbitrary distance between checkpoints
+ */
+
+ new_odata = disks + sra->delta_disks;
+ if (sra->new_level >= 4)
+ new_odata--;
+ if (sra->new_level == 6)
+ new_odata--;
+ size = sra->component_size;
+ stripes *= 1024 * 10;
+ resync_max = stripes;
+
+ while (resync_max < size) {
+ sysfs_set_num(sra, NULL, "suspend_hi",
+ resync_max * new_odata);
+ send_resync_max_to_mdmon(st, sra, resync_max);
+ /* Wait for updated suspend_lo */
+ if (wait_reshape_completed_ext(st, sra,
+ resync_max * new_odata) == -2)
+ /* reshape has been finished
+ */
+ break;
+ resync_max += stripes;
+ }
+
+ /* Send resync_max=MAX (-1LLU) to mdmon */
+ send_resync_max_to_mdmon(st, sra, -1LLU);
+ } else {
+ grow_backup(sra, 0, stripes,
+ fds, offsets, disks, chunk, level, layout,
+ dests, destfd, destoffsets,
+ 0, &degraded, buf);
+ validate(afd, destfd[0], destoffsets[0]);
+ wait_backup(st, sra, 0, stripes * chunk / 512,
+ stripes * chunk / 512, dests, destfd, destoffsets,
+ 0);
+ sysfs_set_num(sra,
+ NULL,
+ "suspend_lo",
+ (stripes * chunk/512) * data);
+ /* FIXME this should probably be numeric */
+ sysfs_set_str(sra, NULL, "sync_max", "max");
+ }
+
sysfs_set_num(sra, NULL, "suspend_lo", (stripes * (chunk/512)) * data);
free(buf);
- /* FIXME this should probably be numeric */
- sysfs_set_str(sra, NULL, "sync_max", "max");
return 1;
}

@@ -2360,6 +2656,55 @@ static int child_shrink(struct supertype *st,
return 1;
}

+static int child_same_size_ext(struct supertype *st, int afd,
+ struct mdinfo *sra, unsigned long stripes, int *fds,
+ unsigned long long *offsets, unsigned long long start,
+ int disks, int chunk, int level, int layout, int data,
+ int dests, int *destfd, unsigned long long *destoffsets)
+{
+ unsigned long long size;
+ unsigned long tailstripes = stripes;
+ char *buf;
+ unsigned long long speed;
+ int degraded = 0;
+ int status;
+
+ if (posix_memalign((void **)&buf, 4096, stripes * disks * chunk))
+ return 0;
+
+ sysfs_set_num(sra, NULL, "suspend_lo", 0);
+ sysfs_set_num(sra, NULL, "suspend_hi", 0);
+
+ sysfs_get_ll(sra, NULL, "sync_speed_min", &speed);
+ sysfs_set_num(sra, NULL, "sync_speed_min", 200000);
+
+ /* wait until the reshape is started by managemon
+ * - give it a chance to update the metadata */
+ if (wait_reshape_start_ext(st, sra)) {
+ dprintf("Error: Reshape not started\n");
+ free(buf);
+ return -1;
+ }
+
+ size = sra->component_size / (chunk/512);
+ while (start < size) {
+ if (start + stripes > size)
+ tailstripes = (size - start);
+
+ status = grow_backup_ext(st, sra, start*chunk/512, tailstripes,
+ fds, offsets,
+ dests, destfd, destoffsets,
+ &degraded, buf);
+ if (status == 0)
+ start += stripes;
+ else
+ break;
+ }
+ sysfs_set_num(sra, NULL, "sync_speed_min", speed);
+ free(buf);
+ return 1;
+}
+
int child_same_size(struct supertype *st, int afd,
struct mdinfo *sra, unsigned long stripes,
int *fds, unsigned long long *offsets,
@@ -2374,6 +2719,12 @@ int child_same_size(struct supertype *st, int afd,
unsigned long long speed;
int degraded = 0;

+ int ext_backup = (st->ss->save_backup) ? 1 : 0;
+
+ if (ext_backup)
+ return child_same_size_ext(st, afd, sra, stripes, fds, offsets,
+ start, disks, chunk, level, layout,
+ data, dests, destfd, destoffsets);

if (posix_memalign((void**)&buf, 4096, disks * chunk))
return 0;
@@ -2397,6 +2748,7 @@ int child_same_size(struct supertype *st, int afd,
validate(afd, destfd[0], destoffsets[0]);
part = 0;
start += stripes * 2; /* where to read next */
+
size = sra->component_size / (chunk/512);
while (start < size) {
if (wait_backup(st, sra, (start-stripes*2)*(chunk/512),
diff --git a/managemon.c b/managemon.c
index c675d71..68e9642 100644
--- a/managemon.c
+++ b/managemon.c
@@ -512,6 +512,14 @@ static void manage_member(struct mdstat_ent *mdstat,
"sync_max",
0) < 0)
status_ok = 0;
+ if (status_ok) {
+ dprintf("managemon: zero suspend_hi\n");
+ if (sysfs_set_num(&newa->info,
+ NULL,
+ "suspend_hi",
+ 0) < 0)
+ status_ok = 0;
+ }
if (status_ok && newa->reshape_raid_disks) {
dprintf("managemon: set raid_disks "\
"to %i\n",
@@ -567,6 +575,14 @@ static void manage_member(struct mdstat_ent *mdstat,
/* reshape executed
*/
dprintf("Reshape was started\n");
+ newa->old_data_disks =
+ newa->info.array.raid_disks;
+ if (newa->info.array.level == 4)
+ newa->old_data_disks--;
+ if (newa->info.array.level == 5)
+ newa->old_data_disks--;
+ if (newa->info.array.level == 6)
+ newa->old_data_disks--;
if (newa->reshape_raid_disks > 0)
newa->new_data_disks =
newa->reshape_raid_disks;
@@ -580,6 +596,9 @@ static void manage_member(struct mdstat_ent *mdstat,
newa->new_data_disks--;
if (a->info.array.level == 6)
newa->new_data_disks--;
+ newa->waiting_for = wait_grow_backup;
+ newa->grow_sync_max = 0;
+
replace_array(a->container, a, newa);
a = newa;
newa = NULL;
@@ -716,7 +735,7 @@ static void manage_new(struct mdstat_ent *mdstat,
return;

mdi = sysfs_read(-1, mdstat->devnum,
- GET_LEVEL|GET_CHUNK|GET_DISKS|GET_COMPONENT|
+ GET_LEVEL|GET_LAYOUT|GET_CHUNK|GET_DISKS|GET_COMPONENT|
GET_DEGRADED|GET_DEVS|GET_OFFSET|GET_SIZE|GET_STATE);

new = malloc(sizeof(*new));
@@ -880,6 +899,19 @@ static void handle_command(struct supertype *container, struct cmd_message *msg)
switch (msg->type) {
case SET_SYNC_MAX:
/* Add SET_SYNC_MAX handler here */
+ if (a->waiting_for == wait_grow_backup) {
+ if (msg->msg_buf.new_sync_max <= a->grow_sync_max) {
+ dprintf("%s: unexpected sync_max value: "\
+ "%llu <= %llu!\n",
+ __func__, msg->msg_buf.new_sync_max,
+ a->grow_sync_max);
+ }
+ a->grow_sync_max = msg->msg_buf.new_sync_max;
+ } else {
+ dprintf("%s: unexpected sync_max msg from mdadm!\n",
+ __func__);
+ }
+ wakeup_monitor();
break;
}
}
diff --git a/mdadm.h b/mdadm.h
index de5d642..ba179b4 100644
--- a/mdadm.h
+++ b/mdadm.h
@@ -1036,7 +1036,6 @@ extern int Grow_restart(struct supertype *st, struct mdinfo *info,
int *fdlist, int cnt, char *backup_file, int verbose);
extern int Grow_continue(int mdfd, struct supertype *st,
struct mdinfo *info, char *backup_file);
-
extern int Assemble(struct supertype *st, char *mddev,
struct mddev_ident *ident,
struct mddev_dev *devlist,
diff --git a/mdmon.h b/mdmon.h
index c463003..9339131 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -26,6 +26,8 @@ enum sync_action { idle, reshape, resync, recover, check, repair, bad_action };
enum state_of_reshape { reshape_not_active, reshape_is_starting,
reshape_in_progress, reshape_cancel_request };

+enum reshape_wait { wait_grow_backup, wait_md_reshape };
+
struct active_array {
struct mdinfo info;
struct supertype *container;
@@ -49,11 +51,15 @@ struct active_array {

enum state_of_reshape reshape_state;
int reshape_delta_disks;
+ int old_data_disks;
int new_data_disks;
int reshape_raid_disks;
int reshape_level;
int reshape_layout;
int reshape_chunk_size;
+ unsigned long long grow_sync_max; /* sync_max from mdadm Grow */
+ enum reshape_wait waiting_for; /* we can wait for grow backup event
+ or for md reshape completed */

int check_degraded; /* flag set by mon, read by manage */

diff --git a/monitor.c b/monitor.c
index cab558c..7509335 100644
--- a/monitor.c
+++ b/monitor.c
@@ -218,12 +218,17 @@ static int read_and_act(struct active_array *a)
int deactivate = 0;
struct mdinfo *mdi;
int dirty = 0;
+ long long unsigned new_sync_completed;
+ long long unsigned curr_sync_max;
+ unsigned long long safe_sync_max;
+ int signal_md_reshape = 0;

a->next_state = bad_word;
a->next_action = bad_action;

a->curr_state = read_state(a->info.state_fd);
a->curr_action = read_action(a->action_fd);
+ new_sync_completed = read_resync_start(a->sync_completed_fd);
a->info.resync_start = read_resync_start(a->resync_start_fd);
sync_completed = read_sync_completed(a->sync_completed_fd);
for (mdi = a->info.devs; mdi ; mdi = mdi->next) {
@@ -234,6 +239,103 @@ static int read_and_act(struct active_array *a)
}
}

+ if (a->curr_action == reshape && a->waiting_for == wait_grow_backup) {
+ /* We are waiting for the mdadm Grow backup to complete
+ */
+ sysfs_get_ll(&a->info, NULL, "sync_max", &curr_sync_max);
+ if (a->grow_sync_max > curr_sync_max) {
+ /* grow_sync_max was updated by mdadm:
+ * continue the reshape with md
+ */
+ signal_md_reshape = 1;
+ }
+ }
+
+ if (a->curr_action == reshape && a->waiting_for == wait_md_reshape) {
+ /* We are waiting for the md reshape to complete.
+ * note: if new_sync_completed == 0 md completed the reshape
+ */
+ if (new_sync_completed > 0) {
+ /* It is possible that sync_completed = sync_max + 2 */
+ new_sync_completed &=
+ ~(a->info.array.chunk_size / 512 - 1);
+ if (new_sync_completed * a->new_data_disks >=
+ a->info.reshape_progress) {
+ a->info.reshape_progress =
+ new_sync_completed * a->new_data_disks;
+
+ /* write_metadata: migration record */
+ a->container->ss->discard_backup(a->container,
+ &a->info);
+ }
+
+ sysfs_get_ll(&a->info,
+ NULL,
+ "sync_max",
+ &curr_sync_max);
+ if (curr_sync_max == 0)
+ /* sync_max was set to max */
+ curr_sync_max = -1LLU;
+
+ /* md confirms end of area with 0 value
+ */
+ if (new_sync_completed == 0)
+ new_sync_completed = curr_sync_max;
+
+ if (new_sync_completed >= curr_sync_max) {
+
+ if (sysfs_set_num(&a->info, NULL, "suspend_lo",
+ new_sync_completed *
+ a->new_data_disks) != 0)
+ dprintf("mdmon: setting suspend_lo() "\
+ "FAILED!\n");
+
+ a->waiting_for = wait_grow_backup;
+ if (a->grow_sync_max == -1LLU)
+ /* calculate next sync_max
+ * and wait for md*/
+ signal_md_reshape = 1;
+ }
+
+ } else {
+ /* reshape was finished. should we do something here? */
+ }
+ }
+
+ if (a->curr_action == reshape && signal_md_reshape == 1) {
+ if (a->grow_sync_max == -1LLU) {
+ /* calculate next safe sync_max for the reshape */
+ safe_sync_max =
+ a->info.reshape_progress / a->old_data_disks;
+ safe_sync_max &= ~(a->info.array.chunk_size / 512 - 1);
+ if (safe_sync_max >= a->info.component_size)
+ sysfs_set_str(&a->info,
+ NULL,
+ "sync_max",
+ "max");
+ else {
+ /* Workaround:
+ * sometimes md reports sync_completed == 2
+ * but in fact it is 0
+ */
+ if ((new_sync_completed == 2) &&
+ (safe_sync_max == 0))
+ safe_sync_max = 2;
+ sysfs_set_num(&a->info,
+ NULL,
+ "sync_max",
+ safe_sync_max);
+ }
+ } else {
+ sysfs_set_num(&a->info,
+ NULL,
+ "sync_max",
+ a->grow_sync_max);
+ }
+ /* sync_max was set. wait for md. */
+ a->waiting_for = wait_md_reshape;
+ }
+
if (a->curr_state <= inactive &&
a->prev_state > inactive) {
/* array has been stopped */
@@ -306,7 +408,7 @@ static int read_and_act(struct active_array *a)
}

if (a->curr_action == reshape)
- a->info.reshape_progress = a->info.resync_start *
+ a->info.reshape_progress = sync_completed *
a->new_data_disks;

/* finalize reshape detection


[PATCH 15/16] imsm Fix: Core during rebuild on array details read

am 13.12.2010 15:47:01 von adam.kwolek

When a rebuild/reshape is in progress, running mdadm to read array details causes a core dump.
This is due to an uninitialized devices list pointer being passed to the getinfo_super() call.
Initializing it to NULL allows the code to detect this situation.
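
The failure mode can be reproduced in miniature: malloc() returns memory with indeterminate contents, so a callee that walks a list hanging off the new structure must be handed a NULL head first. The sketch below uses hypothetical stand-in types (struct dev, struct info) and a hypothetical walker, count_devs(), in place of mdadm's struct mdinfo and getinfo_super().

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for the mdinfo device list. */
struct dev {
	struct dev *next;
};

struct info {
	struct dev *devs;
};

/* Allocate an info structure with an explicitly empty device list.
 * Without the NULL store, devs would hold whatever malloc() left there.
 */
static struct info *info_alloc(void)
{
	struct info *info = malloc(sizeof(*info));

	if (info)
		info->devs = NULL;
	return info;
}

/* Walks the list the way a getinfo-style callee might. */
static int count_devs(const struct info *info)
{
	const struct dev *d;
	int n = 0;

	assert(info != NULL);
	for (d = info->devs; d; d = d->next)
		n++;
	return n;
}
```

With the list head set to NULL the walk terminates immediately, and the allocation failure case is still handled by the caller's existing NULL check on the returned pointer.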

Signed-off-by: Adam Kwolek
---

Detail.c | 5 ++++-
1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/Detail.c b/Detail.c
index 9415628..836f153 100644
--- a/Detail.c
+++ b/Detail.c
@@ -147,7 +147,10 @@ int Detail(char *dev, int brief, int export, int test, char *homehost)
info = st->ss->container_content(st, subarray);
else {
info = malloc(sizeof(*info));
- st->ss->getinfo_super(st, info, NULL);
+ if (info) {
+ info->devs = NULL;
+ st->ss->getinfo_super(st, info, NULL);
+ }
}
if (!info)
continue;


[PATCH 16/16] Allow for reshape without backup file

am 13.12.2010 15:47:10 von adam.kwolek

When the reshape process is guarded by metadata-specific checkpointing,
a backup file is no longer necessary.
Remove the backup file requirement from the mdadm command line when reshape_super
and manage_reshape are defined for the external metadata case.
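
The condition under which the backup file may be omitted can be expressed as a single predicate over the metadata handler's hooks. The sketch below uses a reduced, hypothetical struct ss_hooks in place of mdadm's struct superswitch, with simplified function pointer signatures. The field names follow the checks in the Grow.c hunk of this patch, and can_reshape_without_backup_file() is an illustrative name.

```c
#include <assert.h>
#include <stddef.h>

/* Reduced, hypothetical view of the superswitch: only the members
 * the check reads, with simplified signatures.
 */
struct ss_hooks {
	int external;
	int (*reshape_super)(void);
	int (*manage_reshape)(void);
	int (*save_backup)(void);
	int (*discard_backup)(void);
	int (*recover_backup)(void);
};

/* Placeholder hook used only to populate the table in examples. */
static int hook_stub(void)
{
	return 0;
}

/* Returns 1 when the metadata handler can checkpoint the reshape
 * itself, so no backup file is needed; 0 otherwise.
 */
static int can_reshape_without_backup_file(const struct ss_hooks *ss)
{
	assert(ss != NULL);
	return ss->external &&
	       ss->reshape_super && ss->manage_reshape &&
	       ss->save_backup && ss->discard_backup &&
	       ss->recover_backup;
}
```

Requiring all three backup hooks together means a handler that can save a backup but cannot recover from it still falls back to the backup file requirement.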

Signed-off-by: Adam Kwolek
---

Grow.c | 33 +++++++++++++++++++++++++--------
super-intel.c | 15 +--------------
2 files changed, 26 insertions(+), 22 deletions(-)

diff --git a/Grow.c b/Grow.c
index 02193a9..66b5ff3 100644
--- a/Grow.c
+++ b/Grow.c
@@ -1741,6 +1741,15 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
goto release;
}
if (backup_file == NULL) {
+ int backup_file_required_for_external;
+
+ backup_file_required_for_external = st->ss->external &&
+ st->ss->reshape_super &&
+ st->ss->manage_reshape &&
+ st->ss->save_backup &&
+ st->ss->discard_backup &&
+ st->ss->recover_backup;
+
if (st->ss->external && !st->ss->manage_reshape) {
fprintf(stderr, Name ": %s Grow operation not supported by %s metadata\n",
devname, st->ss->name);
@@ -1748,10 +1757,14 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
break;
}
if (ndata <= odata) {
- fprintf(stderr, Name ": %s: Cannot grow - need backup-file\n",
- devname);
- rv = 1;
- break;
+ if (!backup_file_required_for_external) {
+ fprintf(stderr,
+ Name": %s: Cannot grow - "\
+ "need backup-file\n",
+ devname);
+ rv = 1;
+ break;
+ }
} else if (sra->array.spare_disks == 0) {
fprintf(stderr, Name ": %s: Cannot grow - need a spare or "
"backup-file to backup critical section\n",
@@ -1760,10 +1773,14 @@ int Grow_reshape(char *devname, int fd, int quiet, char *backup_file,
break;
}
if (d == array.raid_disks) {
- fprintf(stderr, Name ": %s: No spare device for backup\n",
- devname);
- rv = 1;
- break;
+ if (!backup_file_required_for_external) {
+ fprintf(stderr,
+ Name ": %s: No spare device "\
+ "for backup\n",
+ devname);
+ rv = 1;
+ break;
+ }
}
} else {
if (!reshape_open_backup_file(backup_file, fd, devname,
diff --git a/super-intel.c b/super-intel.c
index 351fbbc..626ecb6 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -8513,16 +8513,6 @@ int imsm_child_grow(struct supertype *st,
return ret_val;
}

- if (reshape_open_backup_file(backup, fd_in, "imsm",
- (signed)blocks,
- fdlist, offsets) == 0) {
- free(fdlist);
- free(offsets);
- ret_val = 1;
- return ret_val;
- }
- d++;
-
mlockall(MCL_FUTURE);
if (ret_val == 0) {
if (check_env("MDADM_GROW_VERIFY"))
@@ -8540,14 +8530,11 @@ int imsm_child_grow(struct supertype *st,
fdlist, offsets,
odisks, sra->array.chunk_size,
sra->array.level, sra->array.layout, odata,
- d - odisks, fdlist + odisks, offsets + odisks);
+ d - odisks, NULL, offsets + odisks);
imsm_grow_manage_size(st, sra, current_vol);
}
reshape_free_fdlist(fdlist, offsets, d);

- if (backup)
- unlink(backup);
-
return ret_val;
}
