[PATCH 1/2] FIX: Use successfully loaded metadata only

[PATCH 1/2] FIX: Use successfully loaded metadata only

am 12.04.2011 14:51:16 von adam.kwolek

Values greater than 0, means error. We exit from loop on error
with empty super-block pointer when sd pointer is valid.
This cannot be detected by check condition as error.
For sure we shouldn't go forward with error condition.
It leads to throwing exception with core file when metadata handler
wants to access non existing super-block.

Signed-off-by: Adam Kwolek
---

Grow.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/Grow.c b/Grow.c
index a954cfd..768278f 100644
--- a/Grow.c
+++ b/Grow.c
@@ -2933,7 +2933,7 @@ int child_monitor(int afd, struct mdinfo *sra, struct reshape *reshape,
continue;
ok = st->ss->load_super(st, devfd, NULL);
close(devfd);
- if (ok >= 0)
+ if (ok == 0)
break;
}
if (!sd) {

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

[PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 12.04.2011 14:51:28 von adam.kwolek

Sometimes occurs that metadata cannot be loaded e.g. wrong check sum
It can happen due to metadata update racing with mdmon condition.
If mpb loading is tried again, it is loaded successfully.
Try to load metadata again before really giving up.

Signed-off-by: Adam Kwolek
---

super-intel.c | 10 ++++++++--
1 files changed, 8 insertions(+), 2 deletions(-)

diff --git a/super-intel.c b/super-intel.c
index dc5e34e..d23267a 100644
--- a/super-intel.c
+++ b/super-intel.c
@@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct intel_super *super, char *devname, int keep_fd
int err;

err = load_imsm_mpb(fd, super, devname);
- if (err)
- return err;
+ if (err) {
+ /* try to load mpb again,
+ * in case of mdmon race we could have more luck...
+ */
+ err = load_imsm_mpb(fd, super, devname);
+ if (err)
+ return err;
+ }
err = load_imsm_disk(fd, super, devname, keep_fd);
if (err)
return err;

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 13.04.2011 02:44:31 von dan.j.williams

On Tue, Apr 12, 2011 at 5:51 AM, Adam Kwolek wr=
ote:
> Sometimes occurs that metadata cannot be loaded e.g. wrong check sum
> It can happen due to metadata update racing with mdmon condition.
> If mpb loading is tried again, it is loaded successfully.
> Try to load metadata again before really giving up.
>
> Signed-off-by: Adam Kwolek
> ---
>
> =A0super-intel.c | =A0 10 ++++++++--
> =A01 files changed, 8 insertions(+), 2 deletions(-)
>
> diff --git a/super-intel.c b/super-intel.c
> index dc5e34e..d23267a 100644
> --- a/super-intel.c
> +++ b/super-intel.c
> @@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct intel_super =
*super, char *devname, int keep_fd
> =A0 =A0 =A0 =A0int err;
>
> =A0 =A0 =A0 =A0err =3D load_imsm_mpb(fd, super, devname);
> - =A0 =A0 =A0 if (err)
> - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> + =A0 =A0 =A0 if (err) {
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* try to load mpb again,
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0* in case of mdmon race we could hav=
e more luck...
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D load_imsm_mpb(fd, super, devnam=
e);
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (err)
> + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> + =A0 =A0 =A0 }
> =A0 =A0 =A0 =A0err =3D load_imsm_disk(fd, super, devname, keep_fd);
> =A0 =A0 =A0 =A0if (err)
> =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return err;

This is semi-duplicates the check we already do after returning from
load_and_parse_mpb in load_super_imsm_all. I'm curious, are you
hitting this path from load_super_imsm? If the container is assembled
we should be loading from the container, if the container is not
available then mdmon can't be running and checksum errors are real.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 13.04.2011 08:40:35 von adam.kwolek

> -----Original Message-----
> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
> Behalf Of Dan Williams
> Sent: Wednesday, April 13, 2011 2:45 AM
> To: Kwolek, Adam
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 2/2] imsm: FIX: Be more patient during loading
> matadata
>=20
> On Tue, Apr 12, 2011 at 5:51 AM, Adam Kwolek
> wrote:
> > Sometimes occurs that metadata cannot be loaded e.g. wrong check su=
m
> > It can happen due to metadata update racing with mdmon condition.
> > If mpb loading is tried again, it is loaded successfully.
> > Try to load metadata again before really giving up.
> >
> > Signed-off-by: Adam Kwolek
> > ---
> >
> > =A0super-intel.c | =A0 10 ++++++++--
> > =A01 files changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/super-intel.c b/super-intel.c
> > index dc5e34e..d23267a 100644
> > --- a/super-intel.c
> > +++ b/super-intel.c
> > @@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct intel_supe=
r
> *super, char *devname, int keep_fd
> > =A0 =A0 =A0 =A0int err;
> >
> > =A0 =A0 =A0 =A0err =3D load_imsm_mpb(fd, super, devname);
> > - =A0 =A0 =A0 if (err)
> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> > + =A0 =A0 =A0 if (err) {
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* try to load mpb again,
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0* in case of mdmon race we could h=
ave more luck...
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D load_imsm_mpb(fd, super, devn=
ame);
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (err)
> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> > + =A0 =A0 =A0 }
> > =A0 =A0 =A0 =A0err =3D load_imsm_disk(fd, super, devname, keep_fd);
> > =A0 =A0 =A0 =A0if (err)
> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return err;
>=20
> This is semi-duplicates the check we already do after returning from
> load_and_parse_mpb in load_super_imsm_all. I'm curious, are you
> hitting this path from load_super_imsm? If the container is assemble=
d
> we should be loading from the container, if the container is not
> available then mdmon can't be running and checksum errors are real.
>=20
> --
> Dan

My test scenario is that after boot I'm disassembling read only array a=
nd immediately new array is assembled for grow continuation.
Sometimes occurs that mdadm throws exception and core file is generated=
It shows that anchor pointer has NULL value due to CRC error.
Second reading try helps, and anchor is always read correctly.
This behavior and fact that if I put more time between array disassembl=
ing and assembling it again helps also suggest, that we have some race =
condition here.
Problem is not in currently monitored in mdmon container but rather in =
interaction with previous mdmon session that is about to close.

This patch makes that error condition never occurs in this scenario. Gr=
ow.c is fixed for correct error condition behavior also.
I can agree that both patches in this series can help for this problem =
separately, but I think both should be placed in code.


BR
Adam


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 13.04.2011 20:56:29 von dan.j.williams

2011/4/12 Kwolek, Adam :
>> -----Original Message-----
>> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
>> Behalf Of Dan Williams
>> Sent: Wednesday, April 13, 2011 2:45 AM
>> To: Kwolek, Adam
>> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed;
>> Neubauer, Wojciech
>> Subject: Re: [PATCH 2/2] imsm: FIX: Be more patient during loading
>> matadata
>>
>> On Tue, Apr 12, 2011 at 5:51 AM, Adam Kwolek
>> wrote:
>> > Sometimes occurs that metadata cannot be loaded e.g. wrong check s=
um
>> > It can happen due to metadata update racing with mdmon condition.
>> > If mpb loading is tried again, it is loaded successfully.
>> > Try to load metadata again before really giving up.
>> >
>> > Signed-off-by: Adam Kwolek
>> > ---
>> >
>> > =A0super-intel.c | =A0 10 ++++++++--
>> > =A01 files changed, 8 insertions(+), 2 deletions(-)
>> >
>> > diff --git a/super-intel.c b/super-intel.c
>> > index dc5e34e..d23267a 100644
>> > --- a/super-intel.c
>> > +++ b/super-intel.c
>> > @@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct intel_sup=
er
>> *super, char *devname, int keep_fd
>> > =A0 =A0 =A0 =A0int err;
>> >
>> > =A0 =A0 =A0 =A0err =3D load_imsm_mpb(fd, super, devname);
>> > - =A0 =A0 =A0 if (err)
>> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
>> > + =A0 =A0 =A0 if (err) {
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* try to load mpb again,
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0* in case of mdmon race we could =
have more luck...
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D load_imsm_mpb(fd, super, dev=
name);
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (err)
>> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
>> > + =A0 =A0 =A0 }
>> > =A0 =A0 =A0 =A0err =3D load_imsm_disk(fd, super, devname, keep_fd)=
;
>> > =A0 =A0 =A0 =A0if (err)
>> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return err;
>>
>> This is semi-duplicates the check we already do after returning from
>> load_and_parse_mpb in load_super_imsm_all. =A0I'm curious, are you
>> hitting this path from load_super_imsm? =A0If the container is assem=
bled
>> we should be loading from the container, if the container is not
>> available then mdmon can't be running and checksum errors are real.
>>
>
> My test scenario is that after boot I'm disassembling read only array=
and immediately new array is assembled for grow continuation.
> Sometimes occurs that mdadm throws exception and core file is generat=
ed. It shows that anchor pointer has NULL value due to CRC error.
> Second reading try helps, and anchor is always read correctly.
> This behavior and fact that if I put more time between array disassem=
bling and assembling it again helps also suggest, that we have some rac=
e condition here.

Right, so we need to fix that disassemble-to-assemble race, this patch
only "helps" :-).

> Problem is not in currently monitored in mdmon container but rather i=
n interaction with previous mdmon session that is about to close.
>
> This patch makes that error condition never occurs in this scenario.

That "never" needs to be qualified. This patch makes the race harder
to lose, but as far as I can see not "impossible" to lose.

> Grow.c is fixed for correct error condition behavior also.
> I can agree that both patches in this series can help for this proble=
m separately, but I think both should be placed in code.

Why does this existing check:

/* retry the load if we might have raced against mdmon *=
/
if (err == 3 && mdmon_running(devnum))
for (retry =3D 0; retry < 3; retry++) {
usleep(3000);
err =3D load_and_parse_mpb(dfd, s, NULL=
, 1);
if (err !=3D 3)
break;
}

..not help in this case. I suspect due to that mdmon_running()
qualification and the fact that the test is only seeing this in a
disassemble-to-assemble window. So that seems to be further evidence
that a higher level fix is needed.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 1/2] FIX: Use successfully loaded metadata only

am 14.04.2011 09:50:34 von NeilBrown

On Tue, 12 Apr 2011 14:51:16 +0200 Adam Kwolek wrote:

> Values greater than 0, means error. We exit from loop on error
> with empty super-block pointer when sd pointer is valid.
> This cannot be detected by check condition as error.
> For sure we shouldn't go forward with error condition.
> It leads to throwing exception with core file when metadata handler
> wants to access non existing super-block.
>
> Signed-off-by: Adam Kwolek

Applied, thanks.

NeilBrown


> ---
>
> Grow.c | 2 +-
> 1 files changed, 1 insertions(+), 1 deletions(-)
>
> diff --git a/Grow.c b/Grow.c
> index a954cfd..768278f 100644
> --- a/Grow.c
> +++ b/Grow.c
> @@ -2933,7 +2933,7 @@ int child_monitor(int afd, struct mdinfo *sra, struct reshape *reshape,
> continue;
> ok = st->ss->load_super(st, devfd, NULL);
> close(devfd);
> - if (ok >= 0)
> + if (ok == 0)
> break;
> }
> if (!sd) {
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 14.04.2011 09:57:01 von adam.kwolek

> -----Original Message-----
> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
> Behalf Of Dan Williams
> Sent: Wednesday, April 13, 2011 8:56 PM
> To: Kwolek, Adam
> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed;
> Neubauer, Wojciech
> Subject: Re: [PATCH 2/2] imsm: FIX: Be more patient during loading
> matadata
>
> 2011/4/12 Kwolek, Adam :
> >> -----Original Message-----
> >> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] On
> >> Behalf Of Dan Williams
> >> Sent: Wednesday, April 13, 2011 2:45 AM
> >> To: Kwolek, Adam
> >> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed;
> >> Neubauer, Wojciech
> >> Subject: Re: [PATCH 2/2] imsm: FIX: Be more patient during loading
> >> matadata
> >>
> >> On Tue, Apr 12, 2011 at 5:51 AM, Adam Kwolek
> >>
> >> wrote:
> >> > Sometimes occurs that metadata cannot be loaded e.g. wrong check
> sum
> >> > It can happen due to metadata update racing with mdmon condition.
> >> > If mpb loading is tried again, it is loaded successfully.
> >> > Try to load metadata again before really giving up.
> >> >
> >> > Signed-off-by: Adam Kwolek
> >> > ---
> >> >
> >> > super-intel.c | 10 ++++++++--
> >> > 1 files changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/super-intel.c b/super-intel.c index dc5e34e..d23267a
> >> > 100644
> >> > --- a/super-intel.c
> >> > +++ b/super-intel.c
> >> > @@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct
> intel_super
> >> *super, char *devname, int keep_fd
> >> > int err;
> >> >
> >> > err = load_imsm_mpb(fd, super, devname);
> >> > - if (err)
> >> > - return err;
> >> > + if (err) {
> >> > + /* try to load mpb again,
> >> > + * in case of mdmon race we could have more luck...
> >> > + */
> >> > + err = load_imsm_mpb(fd, super, devname);
> >> > + if (err)
> >> > + return err;
> >> > + }
> >> > err = load_imsm_disk(fd, super, devname, keep_fd);
> >> > if (err)
> >> > return err;
> >>
> >> This is semi-duplicates the check we already do after returning
> >> from load_and_parse_mpb in load_super_imsm_all. I'm curious, are
> >> you hitting this path from load_super_imsm? If the container is
> assembled
> >> we should be loading from the container, if the container is not
> >> available then mdmon can't be running and checksum errors are real.
> >>
> >
> > My test scenario is that after boot I'm disassembling read only
> > array
> and immediately new array is assembled for grow continuation.
> > Sometimes occurs that mdadm throws exception and core file is
> generated. It shows that anchor pointer has NULL value due to CRC error.
> > Second reading try helps, and anchor is always read correctly.
> > This behavior and fact that if I put more time between array
> disassembling and assembling it again helps also suggest, that we have
> some race condition here.
>
> Right, so we need to fix that disassemble-to-assemble race, this patch
> only "helps" :-).
>
> > Problem is not in currently monitored in mdmon container but rather
> > in
> interaction with previous mdmon session that is about to close.
> >
> > This patch makes that error condition never occurs in this scenario.
>
> That "never" needs to be qualified. This patch makes the race harder
> to lose, but as far as I can see not "impossible" to lose.


Grow.c looks all devices for metadata,
so I think for both patches word never is ok (without qualified)


>
> > Grow.c is fixed for correct error condition behavior also.
> > I can agree that both patches in this series can help for this
> > problem
> separately, but I think both should be placed in code.
>
> Why does this existing check:
>
> /* retry the load if we might have raced against mdmon */
> if (err == 3 && mdmon_running(devnum))
> for (retry = 0; retry < 3; retry++) {
> usleep(3000);
> err = load_and_parse_mpb(dfd, s, NULL,
> 1);
> if (err != 3)
> break;
> }
>
> ...not help in this case. I suspect due to that mdmon_running()




Probably because it is not in execution patch for my case ;) This is in load_super_imsm_all(), my case applies to load_super() /load_super_imsm()/ for particular device that is used in Grow.c:2934.
As You point yourself, similar fix /workaround/ is applied for different usage scenario, so problem is not new and can be fixed in this way.


> qualification and the fact that the test is only seeing this in a
> disassemble-to-assemble window. So that seems to be further evidence
> that a higher level fix is needed.
>
> --
> Dan


I think that mdmon-mdadm synchronization/communication/ has to be discussed again. Such problems can push it to happen.
Probably metadata loading via mdmon should be implemented (single metadata access point).
Anchor passed to mdadm should be build by mdmon /not loaded from disks/. During mdmon life it needs one metadata load
on start and then metadata snapshots should be flushed to disks only.
It will take a while /if we decide to use this idea/, but at this moment we have to have code without execution exceptions.


BR
Adam
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 14.04.2011 10:08:46 von NeilBrown

On Wed, 13 Apr 2011 11:56:29 -0700 Dan Williams om>
wrote:

> 2011/4/12 Kwolek, Adam :
> >> -----Original Message-----
> >> From: dan.j.williams@gmail.com [mailto:dan.j.williams@gmail.com] O=
n
> >> Behalf Of Dan Williams
> >> Sent: Wednesday, April 13, 2011 2:45 AM
> >> To: Kwolek, Adam
> >> Cc: neilb@suse.de; linux-raid@vger.kernel.org; Ciechanowski, Ed;
> >> Neubauer, Wojciech
> >> Subject: Re: [PATCH 2/2] imsm: FIX: Be more patient during loading
> >> matadata
> >>
> >> On Tue, Apr 12, 2011 at 5:51 AM, Adam Kwolek m>
> >> wrote:
> >> > Sometimes occurs that metadata cannot be loaded e.g. wrong check=
sum
> >> > It can happen due to metadata update racing with mdmon condition=

> >> > If mpb loading is tried again, it is loaded successfully.
> >> > Try to load metadata again before really giving up.
> >> >
> >> > Signed-off-by: Adam Kwolek
> >> > ---
> >> >
> >> > =A0super-intel.c | =A0 10 ++++++++--
> >> > =A01 files changed, 8 insertions(+), 2 deletions(-)
> >> >
> >> > diff --git a/super-intel.c b/super-intel.c
> >> > index dc5e34e..d23267a 100644
> >> > --- a/super-intel.c
> >> > +++ b/super-intel.c
> >> > @@ -2773,8 +2773,14 @@ load_and_parse_mpb(int fd, struct intel_s=
uper
> >> *super, char *devname, int keep_fd
> >> > =A0 =A0 =A0 =A0int err;
> >> >
> >> > =A0 =A0 =A0 =A0err =3D load_imsm_mpb(fd, super, devname);
> >> > - =A0 =A0 =A0 if (err)
> >> > - =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> >> > + =A0 =A0 =A0 if (err) {
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 /* try to load mpb again,
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0* in case of mdmon race we coul=
d have more luck...
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0*/
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 err =3D load_imsm_mpb(fd, super, d=
evname);
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 if (err)
> >> > + =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0 return err;
> >> > + =A0 =A0 =A0 }
> >> > =A0 =A0 =A0 =A0err =3D load_imsm_disk(fd, super, devname, keep_f=
d);
> >> > =A0 =A0 =A0 =A0if (err)
> >> > =A0 =A0 =A0 =A0 =A0 =A0 =A0 =A0return err;
> >>
> >> This is semi-duplicates the check we already do after returning fr=
om
> >> load_and_parse_mpb in load_super_imsm_all. =A0I'm curious, are you
> >> hitting this path from load_super_imsm? =A0If the container is ass=
embled
> >> we should be loading from the container, if the container is not
> >> available then mdmon can't be running and checksum errors are real=

> >>
> >
> > My test scenario is that after boot I'm disassembling read only arr=
ay and immediately new array is assembled for grow continuation.
> > Sometimes occurs that mdadm throws exception and core file is gener=
ated. It shows that anchor pointer has NULL value due to CRC error.
> > Second reading try helps, and anchor is always read correctly.
> > This behavior and fact that if I put more time between array disass=
embling and assembling it again helps also suggest, that we have some r=
ace condition here.
>=20
> Right, so we need to fix that disassemble-to-assemble race, this patc=
h
> only "helps" :-).
>=20

Yes. We need to fix it properly.
The race should be avoided by proper use of O_EXCL.
i.e.
1/ mdmon should only write metadata to devices while they are actually=
part
of the container and so the container has exclusive access (which i=
t
shares with member arrays).
2/ mdadm should only try to use a device in an array if it has O_EXCL =
access.

Which of these rules is broken - and where?


> > Problem is not in currently monitored in mdmon container but rather=
in interaction with previous mdmon session that is about to close.
> >
> > This patch makes that error condition never occurs in this scenario=

>=20
> That "never" needs to be qualified. This patch makes the race harder
> to lose, but as far as I can see not "impossible" to lose.
>=20
> > Grow.c is fixed for correct error condition behavior also.
> > I can agree that both patches in this series can help for this prob=
lem separately, but I think both should be placed in code.
>=20
> Why does this existing check:
>=20
> /* retry the load if we might have raced against mdmon=
*/
> if (err == 3 && mdmon_running(devnum))
> for (retry =3D 0; retry < 3; retry++) {
> usleep(3000);
> err =3D load_and_parse_mpb(dfd, s, NU=
LL, 1);
> if (err !=3D 3)
> break;
> }

It's pretty horrible that we need to do this, isn't it?
Maybe we need some way to synchronise with mdmon so we *know* if we hav=
e
raced or not.
i.e. mdmon keeps a count of the number of times it has updated the meta=
data.
We send a message to get the count, read, then get the count again.
The request blocks while mdmon is actually writing.
If the counts are different, we read again.
??

NeilBrown


>=20
> ...not help in this case. I suspect due to that mdmon_running()
> qualification and the fact that the test is only seeing this in a
> disassemble-to-assemble window. So that seems to be further evidence
> that a higher level fix is needed.
>=20
> --
> Dan
> --
> To unsubscribe from this list: send the line "unsubscribe linux-raid"=
in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" i=
n
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: [PATCH 2/2] imsm: FIX: Be more patient during loading matadata

am 14.04.2011 10:36:00 von dan.j.williams

On 4/14/2011 1:08 AM, NeilBrown wrote:
>> Why does this existing check:
>>
>> /* retry the load if we might have raced against mdmon */
>> if (err == 3&& mdmon_running(devnum))
>> for (retry = 0; retry< 3; retry++) {
>> usleep(3000);
>> err = load_and_parse_mpb(dfd, s, NULL, 1);
>> if (err != 3)
>> break;
>> }
>
> It's pretty horrible that we need to do this, isn't it?

Yes.

> Maybe we need some way to synchronise with mdmon so we *know* if we have
> raced or not.
> i.e. mdmon keeps a count of the number of times it has updated the metadata.
> We send a message to get the count, read, then get the count again.
> The request blocks while mdmon is actually writing.
> If the counts are different, we read again.
> ??

Assuming we are only racing against dirty transitions maybe it is enough
to poll md/array_state and if it signaled while reading the metadata we
likely need to try again (of course with an intervening ping_monitor)?

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html