FW: Backup Server RAID Array Event Notification

on 09.06.2011 20:35:59 by Leslie Rhorer

After I created a pair of two member RAID1 arrays and then added
them as members to a RAID6 array, I am now getting messages similar to the
following, complaining of "Wrong-level" issues. When I check the RAID6
array, however, it is clean and both RAID1 members are still there. When I
check both RAID1 arrays, they show clean with no events. I am running a
compare between all the data on this machine and its mirror (this is a
backup machine). So far everything looks good. What does this imply? Is
there something about which I should be worried?

-----Original Message-----
From: mdadm_monitor@satx.rr.com [mailto:mdadm_monitor@satx.rr.com]
Sent: Thursday, June 09, 2011 8:04 AM
To: leslie.rhorer@twtelecom.com; lrhorer@satx.rr.com
Subject: Backup Server RAID Array Event Notification

DeviceDisappeared /dev/md10 Wrong-Level


From mdadm:
Backup:~# mdadm -D /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Mon May 31 16:23:10 2010
Raid Level : raid6
Array Size : 14651371520 (13972.64 GiB 15003.00 GB)
Used Dev Size : 1465137152 (1397.26 GiB 1500.30 GB)
Raid Devices : 12
Total Devices : 12
Persistence : Superblock is persistent

Update Time : Thu Jun 9 13:30:54 2011
State : active
Active Devices : 12
Working Devices : 12
Failed Devices : 0
Spare Devices : 0

Layout : left-symmetric
Chunk Size : 1024K

Name : Backup:0 (local to host Backup)
UUID : 431244d6:45d9635a:e88b3de5:92f30255
Events : 436289

    Number   Major   Minor   RaidDevice State
       0       8        0        0      active sync   /dev/sda
       1       8       16        1      active sync   /dev/sdb
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf
       6       8       96        6      active sync   /dev/sdg
       7       8      112        7      active sync   /dev/sdh
       8       8      128        8      active sync   /dev/sdi
      10       8      144        9      active sync   /dev/sdj
      12       9       11       10      active sync   /dev/md11
      11       9       10       11      active sync   /dev/md10
Backup:~# mdadm -D /dev/md10
/dev/md10:
Version : 1.2
Creation Time : Wed Jun 8 00:08:16 2011
Raid Level : raid0
Array Size : 1953521664 (1863.02 GiB 2000.41 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed Jun 8 00:08:16 2011
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Chunk Size : 1024K

Name : Backup:10 (local to host Backup)
UUID : fa1ed617:d80525c4:1df692e8:0116406d
Events : 0

    Number   Major   Minor   RaidDevice State
       0       8      192        0      active sync   /dev/sdm
       1       8      208        1      active sync   /dev/sdn
Backup:~# mdadm -D /dev/md11
/dev/md11:
Version : 1.2
Creation Time : Wed Jun 8 00:08:38 2011
Raid Level : raid0
Array Size : 1953521664 (1863.02 GiB 2000.41 GB)
Raid Devices : 2
Total Devices : 2
Persistence : Superblock is persistent

Update Time : Wed Jun 8 00:08:38 2011
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0

Chunk Size : 1024K

Name : Backup:11 (local to host Backup)
UUID : 1ac704ee:8f501b33:4caee409:e384eeec
Events : 0

    Number   Major   Minor   RaidDevice State
       0       8      224        0      active sync   /dev/sdo
       1       8      240        1      active sync   /dev/sdp


--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Backup Server RAID Array Event Notification

on 09.06.2011 20:46:55 by Roman Mamedov

On Thu, 9 Jun 2011 13:35:59 -0500
"Leslie Rhorer" wrote:

>
> After I created a pair of two member RAID1 arrays and then added
> them as members to a RAID6 array, I am now getting messages similar to the
> following, complaining of "Wrong-level" issues. When I check the RAID6
> array, however, it is clean and both RAID1 members are still there. When I
> check both RAID1 arrays, they show clean with no events. I am running a
> compare between all the data on this machine and its mirror (this is a
> backup machine). So far everything looks good. What does this imply? Is
> there something about which I should be worried?

You said RAID1 twice, and your mdadm --detail doesn't agree with you and says
"raid0" twice. Maybe you mistakenly used RAID1 instead of RAID0 somewhere
else as well, and the WrongLevel message is trying to tell you that?

--
With respect,
Roman

RE: Backup Server RAID Array Event Notification

on 09.06.2011 21:01:34 by Leslie Rhorer

> -----Original Message-----
> From: Roman Mamedov [mailto:rm@romanrm.ru]
> Sent: Thursday, June 09, 2011 1:47 PM
> To: lrhorer@satx.rr.com
> Cc: linux-raid@vger.kernel.org
> Subject: Re: Backup Server RAID Array Event Notification
>
> On Thu, 9 Jun 2011 13:35:59 -0500
> "Leslie Rhorer" wrote:
>
> >
> > After I created a pair of two member RAID1 arrays and then added
> > them as members to a RAID6 array, I am now getting messages similar to the
> > following, complaining of "Wrong-level" issues. When I check the RAID6
> > array, however, it is clean and both RAID1 members are still there. When I
> > check both RAID1 arrays, they show clean with no events. I am running a
> > compare between all the data on this machine and its mirror (this is a
> > backup machine). So far everything looks good. What does this imply? Is
> > there something about which I should be worried?
>
> You said RAID1 twice, and your mdadm --detail doesn't agree with you and says
> "raid0" twice. Maybe you mistakenly used RAID1 instead of RAID0 somewhere
> else as well, and the WrongLevel message is trying to tell you that?

No, that was just a typo. (OK, three typos) I meant "RAID0". The
RAID0 members are all 1T drives. The RAID6 array is made of 1.5T members.
In order to use the 1T drives on the RAID6 array, I have to combine them
into 2T arrays, which then can be used as members of the RAID6 array. If
md10 and md11 were RAID1 arrays, they would only be 1T in extent, and could
not be members of md0.
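
The size argument above can be checked with a little shell arithmetic. This is only a sketch: the figures are the KiB values from the mdadm -D output earlier in the thread, the per-drive size is an assumption (half of md10's reported Array Size, ignoring metadata overhead), and the function name fits_in_raid6 is made up for illustration.

```shell
# Sizes in KiB, copied from the mdadm -D output quoted earlier in the thread.
raid6_used_dev_size=1465137152   # "Used Dev Size" of /dev/md0
raid0_array_size=1953521664      # "Array Size" of /dev/md10 and /dev/md11
raid6_devices=12                 # "Raid Devices" of /dev/md0

# Assumption: each 1T drive contributes half of the RAID0 array size.
drive_size=$((raid0_array_size / 2))

# A two-drive RAID1 mirror exposes roughly one drive's worth of space.
raid1_size=$drive_size

# Print "yes" if a candidate member is at least as large as the per-member
# size the RAID6 array already uses, "no" otherwise.
fits_in_raid6() {
    if [ "$1" -ge "$raid6_used_dev_size" ]; then echo yes; else echo no; fi
}

# RAID6 spends two members' worth of space on parity, so the usable
# capacity is (n - 2) members.
raid6_capacity=$(( (raid6_devices - 2) * raid6_used_dev_size ))

echo "RAID0 pair fits: $(fits_in_raid6 "$raid0_array_size")"
echo "RAID1 pair fits: $(fits_in_raid6 "$raid1_size")"
echo "RAID6 capacity:  $raid6_capacity KiB"
```

The computed capacity can be compared against the "Array Size" line that mdadm reports for /dev/md0.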

Re: Backup Server RAID Array Event Notification

on 09.06.2011 23:14:41 by NeilBrown

On Thu, 9 Jun 2011 14:01:34 -0500 "Leslie Rhorer" wrote:

> > -----Original Message-----
> > From: Roman Mamedov [mailto:rm@romanrm.ru]
> > Sent: Thursday, June 09, 2011 1:47 PM
> > To: lrhorer@satx.rr.com
> > Cc: linux-raid@vger.kernel.org
> > Subject: Re: Backup Server RAID Array Event Notification
> >
> > On Thu, 9 Jun 2011 13:35:59 -0500
> > "Leslie Rhorer" wrote:
> >
> > >
> > > After I created a pair of two member RAID1 arrays and then added
> > > them as members to a RAID6 array, I am now getting messages similar to the
> > > following, complaining of "Wrong-level" issues. When I check the RAID6
> > > array, however, it is clean and both RAID1 members are still there. When I
> > > check both RAID1 arrays, they show clean with no events. I am running a
> > > compare between all the data on this machine and its mirror (this is a
> > > backup machine). So far everything looks good. What does this imply? Is
> > > there something about which I should be worried?
> >
> > You said RAID1 twice, and your mdadm --detail doesn't agree with you and says
> > "raid0" twice. Maybe you mistakenly used RAID1 instead of RAID0 somewhere
> > else as well, and the WrongLevel message is trying to tell you that?
>
> No, that was just a typo. (OK, three typos) I meant "RAID0". The
> RAID0 members are all 1T drives. The RAID6 array is made of 1.5T members.
> In order to use the 1T drives on the RAID6 array, I have to combine them
> into 2T arrays, which then can be used as members of the RAID6 array. If
> md10 and md11 were RAID1 arrays, they would only be 1T in extent, and could
> not be members of md0.
>

"mdadm --monitor" does not monitor RAID0 or Linear arrays. There is nothing
to see. Nothing can fail, they don't rebuild, they are really just AID, not
RAID.

So if it thinks that it was asked to monitor a RAID0 it pretends that it has
disappeared with reason "Wrong Level".
So if you explicitly ask it to monitor a RAID0, it won't and it will tell you
why.

If you only implicitly ask with e.g. "mdadm --monitor --scan" with a RAID0
listing in mdadm.conf it probably shouldn't give the message as it might be
confusing... but it does.
Or maybe the message is just confusing and I should change it.

Or something.

NeilBrown
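
The rule described above can be written down as a small shell sketch. It only encodes what this message states (RAID0 and Linear arrays are not monitored); it is not taken from the mdadm source, and the function names raid_level and is_monitored_level are made up for illustration.

```shell
# Print the RAID level from "mdadm -D" style text read on stdin.
raid_level() {
    sed -n 's/^ *Raid Level : *//p' | head -n 1
}

# Succeed for levels the monitor watches; fail for raid0 and linear,
# which the explanation above says it skips.
is_monitored_level() {
    case "$1" in
        raid0|linear) return 1 ;;
        *) return 0 ;;
    esac
}

# Sample input taken from the /dev/md10 output earlier in the thread.
sample='/dev/md10:
Version : 1.2
Raid Level : raid0
State : clean'

level=$(printf '%s\n' "$sample" | raid_level)
if is_monitored_level "$level"; then
    echo "$level: monitored"
else
    echo "$level: not monitored, expect DeviceDisappeared ... Wrong-Level"
fi
```

On a live system the sample text would be replaced by the real output, for example "mdadm -D /dev/md10 | raid_level".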

RE: Backup Server RAID Array Event Notification

on 10.06.2011 03:10:11 by Leslie Rhorer

> -----Original Message-----
> From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-owner@vger.kernel.org] On Behalf Of NeilBrown
> Sent: Thursday, June 09, 2011 4:15 PM
> To: lrhorer@satx.rr.com
> Cc: 'Roman Mamedov'; linux-raid@vger.kernel.org
> Subject: Re: Backup Server RAID Array Event Notification
>
> On Thu, 9 Jun 2011 14:01:34 -0500 "Leslie Rhorer" wrote:
>
> > > -----Original Message-----
> > > From: Roman Mamedov [mailto:rm@romanrm.ru]
> > > Sent: Thursday, June 09, 2011 1:47 PM
> > > To: lrhorer@satx.rr.com
> > > Cc: linux-raid@vger.kernel.org
> > > Subject: Re: Backup Server RAID Array Event Notification
> > >
> > > On Thu, 9 Jun 2011 13:35:59 -0500
> > > "Leslie Rhorer" wrote:
> > >
> > > >
> > > > After I created a pair of two member RAID1 arrays and then added
> > > > them as members to a RAID6 array, I am now getting messages similar to the
> > > > following, complaining of "Wrong-level" issues. When I check the RAID6
> > > > array, however, it is clean and both RAID1 members are still there. When I
> > > > check both RAID1 arrays, they show clean with no events. I am running a
> > > > compare between all the data on this machine and its mirror (this is a
> > > > backup machine). So far everything looks good. What does this imply? Is
> > > > there something about which I should be worried?
> > >
> > > You said RAID1 twice, and your mdadm --detail doesn't agree with you and says
> > > "raid0" twice. Maybe you mistakenly used RAID1 instead of RAID0 somewhere
> > > else as well, and the WrongLevel message is trying to tell you that?
> >
> > No, that was just a typo. (OK, three typos) I meant "RAID0". The
> > RAID0 members are all 1T drives. The RAID6 array is made of 1.5T members.
> > In order to use the 1T drives on the RAID6 array, I have to combine them
> > into 2T arrays, which then can be used as members of the RAID6 array. If
> > md10 and md11 were RAID1 arrays, they would only be 1T in extent, and could
> > not be members of md0.
> >
>
> "mdadm --monitor" does not monitor RAID0 or Linear arrays. There is nothing
> to see. Nothing can fail, they don't rebuild, they are really just AID, not
> RAID.

Well, OK. So why does it report anything at all?
>
> So if it thinks that it was asked to monitor a RAID0 it pretends that it has
> disappeared with reason "Wrong Level".
> So if you explicitly ask it to monitor a RAID0, it won't and it will tell you
> why.

That would make sense if I had started the monitor daemon and it had
sent the e-mail, but the monitor has been running for nearly two days, since
the system was rebooted. Why send the message nearly a day after the daemon
is started, and why send it more than once (for each array)?

By the same token, why did it wait nearly 8 hours and then again
more than a day and a half after the array was created to send the messages,
instead of immediately after it was created?

This suggests I am going to be treated to a pair of spurious e-mails
every day or so telling me the device has disappeared, when it is perfectly
good. After a few months of that, what happens when one of the devices
really does disappear? We all know what happens to the system that cries,
"Wolf!" all the time.

> If you only implicitly ask with e.g. "mdadm --monitor --scan" with a RAID0
> listing in mdadm.conf it probably shouldn't give the message as it might
> be confusing... but it does.

I'm not sure I follow.

> Or maybe the message is just confusing and I should change it.
>
> Or something.

Well that's definite. :-)

RE: Backup Server RAID Array Event Notification

on 10.06.2011 19:32:00 by Leslie Rhorer

> > "mdadm --monitor" does not monitor RAID0 or Linear arrays. There is nothing
> > to see. Nothing can fail, they don't rebuild, they are really just AID, not
> > RAID.

I've been thinking about this. It's true they are just AID, but
they most certainly can fail. If the array truly disappears, perhaps
because a drive fails, I certainly should be notified of it.

> That would make sense if I had started the monitor daemon and it had
> sent the e-mail, but the monitor has been running for nearly two days, since
> the system was rebooted. Why send the message nearly a day after the daemon
> is started, and why send it more than once (for each array)?
>
> By the same token, why did it wait nearly 8 hours and then again
> more than a day and a half after the array was created to send the messages,
> instead of immediately after it was created?
>
> This suggests I am going to be treated to a pair of spurious e-mails
> every day or so telling me the device has disappeared, when it is perfectly
> good. After a few months of that, what happens when one of the devices
> really does disappear? We all know what happens to the system that cries,
> "Wolf!" all the time.

Yeah, it looks like it's going to send this message out once a day
for both arrays. Mdadm sent out another pair of e-mails at 07:44 this
morning. Is no one else seeing this with RAID0 arrays? Is there some way I
can stop it without impacting any real notifications? I could intercept the
message in the script run by mdadm to send the e-mail, but if I do, I fear I
might also incorrectly trash a real error message. I don't suppose
"Wrong-Level" would ever appear in a valid notification for a properly
configured system, would it? If not, I suppose I could grep for
"Wrong-Level" in the e-mail packet and trash it if the text is found.

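The filtering idea above could look roughly like the sketch below. It assumes the handler is the script that mdadm --monitor runs through its --program option (or the PROGRAM line in mdadm.conf), which receives the event name, the md device and an optional third word. That the third word is "Wrong-Level" for these notices is an assumption based on the "DeviceDisappeared /dev/md10 Wrong-Level" text in the e-mails. The function names and the ADMIN variable are hypothetical.

```shell
# Decide what to do with one mdadm event. Arguments mirror what the
# monitor hands to its handler: event name, md device, optional third word.
# Prints "drop" for the spurious RAID0 notice and "forward" otherwise.
classify_event() {
    event=$1
    detail=${3:-}
    if [ "$event" = "DeviceDisappeared" ] && [ "$detail" = "Wrong-Level" ]; then
        echo drop
    else
        echo forward
    fi
}

# Hypothetical handler body: mail everything that is not dropped.
# ADMIN is a placeholder address, and "mail" must exist on the system.
handle_event() {
    if [ "$(classify_event "$@")" = "forward" ]; then
        echo "$*" | mail -s "RAID event on $(hostname)" "${ADMIN:-root}"
    fi
}

# Try the classifier on three sample events without sending any mail.
classify_event DeviceDisappeared /dev/md10 Wrong-Level
classify_event DeviceDisappeared /dev/md10
classify_event Fail /dev/md0 /dev/sdc
```

Matching on both the event name and the third word, rather than grepping the whole message for "Wrong-Level", means a DeviceDisappeared event that lacks that word is still forwarded.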