[PATCH] FIX: Mdmon crashes after changing RAID level from 1 to 0

on 01.09.2011 15:10:34 by Lukasz Dorau

Description of the bug:
Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover).

Cause of the bug:
The managemon marks an active_array for removal from monitoring
by setting a->container to NULL (in the "manage_member" function).
Sometimes (during stress tests) this happens right when the monitor
is in the "read_and_act" function and the a->container pointer is in use.
This causes the monitor to crash.

Solution:
The active array has to be marked for removal in some way other
than setting a pointer to NULL while it may be in use.
A new field "to_remove" was added to the "active_array" structure.
It is used in the managemon to mark an array for removal
(instead of the old assignment: a->container = NULL)
and the monitor checks it to determine if the array should be removed.
The field "to_remove" is also checked in some other places
to avoid managing an array which is going to be removed.

Signed-off-by: Lukasz Dorau <lukasz.dorau@intel.com>
---
 managemon.c |    4 ++--
 mdmon.h     |    1 +
 monitor.c   |    8 ++++----
 3 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/managemon.c b/managemon.c
index d020f82..9e0a34d 100644
--- a/managemon.c
+++ b/managemon.c
@@ -461,7 +461,7 @@ static void manage_member(struct mdstat_ent *mdstat,
 	if (mdstat->level) {
 		int level = map_name(pers, mdstat->level);
 		if (level == 0 || level == LEVEL_LINEAR) {
-			a->container = NULL;
+			a->to_remove = 1;
 			wakeup_monitor();
 			return;
 		}
@@ -739,7 +739,7 @@ void manage(struct mdstat_ent *mdstat, struct supertype *container)
 		/* Looks like a member of this container */
 		for (a = container->arrays; a; a = a->next) {
 			if (mdstat->devnum == a->devnum) {
-				if (a->container)
+				if (a->container && a->to_remove == 0)
 					manage_member(mdstat, a);
 				break;
 			}
diff --git a/mdmon.h b/mdmon.h
index 6d1776f..59e1b53 100644
--- a/mdmon.h
+++ b/mdmon.h
@@ -28,6 +28,7 @@ struct active_array {
 	struct mdinfo info;
 	struct supertype *container;
 	struct active_array *next, *replaces;
+	int to_remove;
 
 	int action_fd;
 	int resync_start_fd;
diff --git a/monitor.c b/monitor.c
index 7ac5907..b002e90 100644
--- a/monitor.c
+++ b/monitor.c
@@ -479,7 +479,7 @@ static void reconcile_failed(struct active_array *aa, struct mdinfo *failed)
 	struct mdinfo *victim;
 
 	for (a = aa; a; a = a->next) {
-		if (!a->container)
+		if (!a->container || a->to_remove)
 			continue;
 		victim = find_device(a, failed->disk.major, failed->disk.minor);
 		if (!victim)
@@ -539,7 +539,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 		/* once an array has been deactivated we want to
 		 * ask the manager to discard it.
 		 */
-		if (!a->container) {
+		if (!a->container || a->to_remove) {
 			if (discard_this) {
 				ap = &(*ap)->next;
 				continue;
@@ -642,7 +642,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 			/* FIXME check if device->state_fd need to be cleared?*/
 			signal_manager();
 		}
-		if (a->container) {
+		if (a->container && !a->to_remove) {
 			is_dirty = read_and_act(a);
 			rv |= 1;
 			dirty_arrays += is_dirty;
@@ -657,7 +657,7 @@ static int wait_and_act(struct supertype *container, int nowait)
 
 	/* propagate failures across container members */
 	for (a = *aap; a ; a = a->next) {
-		if (!a->container)
+		if (!a->container || a->to_remove)
 			continue;
 		for (mdi = a->info.devs ; mdi ; mdi = mdi->next)
 			if (mdi->curr_state & DS_FAULTY)

---------------------------------------------------------------------
Intel Technology Poland sp. z o.o.
z siedziba w Gdansku
ul. Slowackiego 173
80-298 Gdansk

Sad Rejonowy Gdansk Polnoc w Gdansku,
VII Wydzial Gospodarczy Krajowego Rejestru Sadowego,
numer KRS 101882

NIP 957-07-52-316
Kapital zakladowy 200.000 zl

This e-mail and any attachments may contain confidential material for
the sole use of the intended recipient(s). Any review or distribution
by others is strictly prohibited. If you are not the intended
recipient, please contact the sender and delete all copies.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH] FIX: Mdmon crashes after changing RAID level from 1 to 0

on 03.09.2011 12:48:00 by Jan Ceuleers

On 09/01/2011 03:10 PM, Lukasz Dorau wrote:
> ---------------------------------------------------------------------
> Intel Technology Poland sp. z o.o.
> z siedziba w Gdansku
> ul. Slowackiego 173
> 80-298 Gdansk
>
> Sad Rejonowy Gdansk Polnoc w Gdansku,
> VII Wydzial Gospodarczy Krajowego Rejestru Sadowego,
> numer KRS 101882
>
> NIP 957-07-52-316
> Kapital zakladowy 200.000 zl
>
> This e-mail and any attachments may contain confidential material for
> the sole use of the intended recipient(s). Any review or distribution
> by others is strictly prohibited. If you are not the intended
> recipient, please contact the sender and delete all copies.
Hi Lukasz.

I'm not a maintainer of any kind, but I think your contributions are
unusable because of the above footer, particularly the confidentiality
clause. Can you resubmit, after reconfiguring your mail setup not to
include this footer?

Thanks, Jan

Re: [PATCH] FIX: Mdmon crashes after changing RAID level from 1 to 0

on 07.09.2011 04:41:31 by NeilBrown

On Thu, 01 Sep 2011 15:10:34 +0200 Lukasz Dorau wrote:

> Description of the bug:
> Sometimes mdmon crashes after changing RAID level from 1 to 0 (takeover).
>
> Cause of the bug:
> The managemon marks an active_array for removal from monitoring
> by setting a->container to NULL (in the "manage_member" function).
> Sometimes (during stress tests) this happens right when the monitor
> is in the "read_and_act" function and the a->container pointer is in use.
> This causes the monitor to crash.
>
> Solution:
> The active array has to be marked for removal in some way other
> than setting a pointer to NULL while it may be in use.
> A new field "to_remove" was added to the "active_array" structure.
> It is used in the managemon to mark an array for removal
> (instead of the old assignment: a->container = NULL)
> and the monitor checks it to determine if the array should be removed.
> The field "to_remove" is also checked in some other places
> to avoid managing an array which is going to be removed.
>
> Signed-off-by: Lukasz Dorau

Thanks.

I have applied this - despite the ridiculous disclaimer at the bottom :-)

NeilBrown


> ---
>  managemon.c |    4 ++--
>  mdmon.h     |    1 +
>  monitor.c   |    8 ++++----
>  3 files changed, 7 insertions(+), 6 deletions(-)
>
> diff --git a/managemon.c b/managemon.c
> index d020f82..9e0a34d 100644
> --- a/managemon.c
> +++ b/managemon.c
> @@ -461,7 +461,7 @@ static void manage_member(struct mdstat_ent *mdstat,
>  	if (mdstat->level) {
>  		int level = map_name(pers, mdstat->level);
>  		if (level == 0 || level == LEVEL_LINEAR) {
> -			a->container = NULL;
> +			a->to_remove = 1;
>  			wakeup_monitor();
>  			return;
>  		}
> @@ -739,7 +739,7 @@ void manage(struct mdstat_ent *mdstat, struct supertype *container)
>  		/* Looks like a member of this container */
>  		for (a = container->arrays; a; a = a->next) {
>  			if (mdstat->devnum == a->devnum) {
> -				if (a->container)
> +				if (a->container && a->to_remove == 0)
>  					manage_member(mdstat, a);
>  				break;
>  			}
> diff --git a/mdmon.h b/mdmon.h
> index 6d1776f..59e1b53 100644
> --- a/mdmon.h
> +++ b/mdmon.h
> @@ -28,6 +28,7 @@ struct active_array {
>  	struct mdinfo info;
>  	struct supertype *container;
>  	struct active_array *next, *replaces;
> +	int to_remove;
>
>  	int action_fd;
>  	int resync_start_fd;
> diff --git a/monitor.c b/monitor.c
> index 7ac5907..b002e90 100644
> --- a/monitor.c
> +++ b/monitor.c
> @@ -479,7 +479,7 @@ static void reconcile_failed(struct active_array *aa, struct mdinfo *failed)
>  	struct mdinfo *victim;
>
>  	for (a = aa; a; a = a->next) {
> -		if (!a->container)
> +		if (!a->container || a->to_remove)
>  			continue;
>  		victim = find_device(a, failed->disk.major, failed->disk.minor);
>  		if (!victim)
> @@ -539,7 +539,7 @@ static int wait_and_act(struct supertype *container, int nowait)
>  		/* once an array has been deactivated we want to
>  		 * ask the manager to discard it.
>  		 */
> -		if (!a->container) {
> +		if (!a->container || a->to_remove) {
>  			if (discard_this) {
>  				ap = &(*ap)->next;
>  				continue;
> @@ -642,7 +642,7 @@ static int wait_and_act(struct supertype *container, int nowait)
>  			/* FIXME check if device->state_fd need to be cleared?*/
>  			signal_manager();
>  		}
> -		if (a->container) {
> +		if (a->container && !a->to_remove) {
>  			is_dirty = read_and_act(a);
>  			rv |= 1;
>  			dirty_arrays += is_dirty;
> @@ -657,7 +657,7 @@ static int wait_and_act(struct supertype *container, int nowait)
>
>  	/* propagate failures across container members */
>  	for (a = *aap; a ; a = a->next) {
> -		if (!a->container)
> +		if (!a->container || a->to_remove)
>  			continue;
>  		for (mdi = a->info.devs ; mdi ; mdi = mdi->next)
>  			if (mdi->curr_state & DS_FAULTY)
