how to recreate a raid5 array with n-1 drives?

on 06.05.2008 07:48:19 by Marc MERLIN

Howdy,

I had a 5-drive raid5 array that went down due to a double disk failure.
Neither drive is actually dead, but I seem to have picked the wrong one as the
'good' one.
When I brought the array back up with
mdadm --assemble --run --force /dev/md5 /dev/sd{c,d,f,g}
and then started a rebuild with:
mdadm /dev/md5 -a /dev/sdd1

I'm getting failures when a certain block from sde1 is read.

Before I forget:
There isn't a way to tell the raid subsystem not to kill a degraded array when
it hits a single disk failure (a bad block) rather than an entire bad drive, is
there? (If not, I'd love to see that added to the wishlist if it isn't there yet.)

Assuming there isn't, I did start a rebuild on sdd1 (the old drive), which
should have been rewriting the same blocks back onto themselves.

So, I'd like to bring the array back up with sdd1 instead of sde1.
Of course, I can't do that with assemble now that I've done a partial rebuild
on sdd1.

I thought I could recreate an n-1 array like so:
gargamel:~# mdadm --create /dev/md5 --level=5 --chunk=64 --layout=left-symmetric --raid-devices=5 /dev/sd{c,d,f,g}1
mdadm: You haven't given enough devices (real or missing) to create this array

In the olden days (pre-mdadm), I could bring up the array by giving 5 drives
and marking /dev/sde1 as failed-disk instead of raid-disk (or somesuch).
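
(From memory, the raidtools /etc/raidtab stanza looked roughly like this; treat
it as a sketch, and the slot numbers below are only illustrative, since they
have to match however the array was originally built:

raiddev /dev/md5
    raid-level              5
    nr-raid-disks           5
    chunk-size              64
    parity-algorithm        left-symmetric
    persistent-superblock   1
    device                  /dev/sdf1
    raid-disk               0
    device                  /dev/sde1
    failed-disk             1
    device                  /dev/sdg1
    raid-disk               2
    device                  /dev/sdc1
    raid-disk               3
    device                  /dev/sdd1
    raid-disk               4

mkraid would then build the array with that slot left out.)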

I could not find a way to do this with mdadm in the man page. How do I give
/dev/sde1 on the command line as a failed drive?

Thanks,
Marc
--
"A mouse is a device used to point at the xterm you want to type in" - A.S.R.
Microsoft is to operating systems & security ....
.... what McDonalds is to gourmet cooking
Home page: http://marc.merlins.org/

Re: how to recreate a raid5 array with n-1 drives?

on 06.05.2008 09:17:52 by Richard Scobie

Marc MERLIN wrote:

> In the olden days (pre-mdadm), I could bring up the array by giving 5 drives
> and marking /dev/sde1 as failed-disk instead of raid-disk (or somesuch).
>
> I could not find a way to do this with mdadm in the man page. How do I give
> /dev/sde1 on the command line as a failed drive?

Looking at the mdadm man page in the "CREATE MODE" section:

"To create a "degraded" array in which some devices are missing, simply
give the word "missing" in place of a device name. This will cause
mdadm to leave the corresponding slot in the array empty. For a RAID4
or RAID5 array at most one slot can be "missing"; for a RAID6 array at
most two slots. For a RAID1 array, only one real device needs to be
given. All of the others can be "missing"."

Regards,

Richard

Re: mdadm create corrupted md data?

on 07.05.2008 01:08:00 by Marc MERLIN

[please Cc me on replies, I see them faster that way]

On Mon, May 05, 2008 at 10:48:19PM -0700, Marc MERLIN wrote:
> I thought I could recreate an n-1 array like so:
> gargamel:~# mdadm --create /dev/md5 --level=5 --chunk=64 --layout=left-symmetric --raid-devices=5 /dev/sd{c,d,f,g}1
> mdadm: You haven't given enough devices (real or missing) to create this array
>
> In the olden days (pre-mdadm), I could bring up the array by giving 5 drives
> and marking /dev/sde1 as failed-disk instead of raid-disk (or somesuch).

Indeed "missing" as a device name did it, thanks Richard (I guess I can't
read when it's late).

Sad part is that recreating the device worked, but my VG on top disappeared.
I may have found a bug or misfeature.

During my first post, and up to this morning, I had:
Layout : left-symmetric
Chunk Size : 64K
md5 : active raid5 sdf1[0] sdc1[3] sdg1[2] sde1[1]
1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]

pvdisplay /dev/md5 or vgscan would find the pv and vg.

Then, I just typed this:
mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd{c,d,f,g}1 missing

It made a new md5 that vgscan doesn't find anything on.

But I'm very confused as to why
mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd{c,e,f,g}1 missing
also gives me an md5 that vgscan won't find its pv on anymore

gargamel:~# cat /proc/mdstat | grep -1 md5 | tail -n+2
md5 : active raid5 sdg1[3] sdf1[2] sde1[1] sdc1[0]
1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
gargamel:~# pvdisplay /dev/md5
No physical volume label read from /dev/md5
Failed to read physical volume "/dev/md5"

Any idea what got corrupted in my mdadm runs that caused my data to apparently be
gone now?
(good news is that I do have an up-to-date backup, but I shouldn't have to use
it, and I'd like to recover from this the way it should work, so that I can
learn from it)

It's probably a good time to give the versions I'm running:
2.6.24.5-slub-dualcore2smp-volpreempt-noticks
and
mdadm - v2.6.4 - 19th October 2007

Even if I've lost my data, I'd still like to find out what went wrong, and I'm
happy to help debug this if that's useful.

Thanks,
Marc

Re: mdadm create corrupted md data?

on 07.05.2008 07:38:27 by Dan Williams

On Tue, May 6, 2008 at 4:08 PM, Marc MERLIN wrote:
> [please Cc me on replies, I see them faster that way]
>
> On Mon, May 05, 2008 at 10:48:19PM -0700, Marc MERLIN wrote:
> > I thought I could recreate an n-1 array like so:
> > gargamel:~# mdadm --create /dev/md5 --level=5 --chunk=64 --layout=left-symmetric --raid-devices=5 /dev/sd{c,d,f,g}1
> > mdadm: You haven't given enough devices (real or missing) to create this array
> >
> > In the olden days (pre-mdadm), I could bring up the array by giving 5 drives
> > and marking /dev/sde1 as failed-disk instead of raid-disk (or somesuch).
>
> Indeed "missing" as a device name did it, thanks Richard (I guess I can't
> read when it's late).
>
> Sad part is that recreating the device worked, but my VG on top disappeared.
> I may have found a bug or misfeature.
>
> During my first post, and up to this morning, I had:
> Layout : left-symmetric
> Chunk Size : 64K
> md5 : active raid5 sdf1[0] sdc1[3] sdg1[2] sde1[1]
> 1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
>
> pvdisplay /dev/md5 or vgscan would find the pv and vg.
>
> Then, I just typed this:
> mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd{c,d,f,g}1 missing
>
> it made a new md5 that vgscan doesn't find anything on.
>
> but I'm very confused as to why
> mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sd{c,e,f,g}1 missing
> also gives me an md5 that vgscan won't find its pv on anymore
>
> gargamel:~# cat /proc/mdstat | grep -1 md5 | tail -n+2
> md5 : active raid5 sdg1[3] sdf1[2] sde1[1] sdc1[0]
> 1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
> gargamel:~# pvdisplay /dev/md5
> No physical volume label read from /dev/md5
> Failed to read physical volume "/dev/md5"
>
> Any idea what got corrupted in my mdadm runs that caused my data to apparently be
> gone now?
> (good news is that I do have an up-to-date backup, but I shouldn't have to use
> it, and I'd like to recover from this the way it should work, so that I can
> learn from it)

You may have created the array with a different disk order than when
the array was originally created. It would help if you had a dump of
the original superblocks. I'm guessing your original array might have
been the following order "/dev/sdc1 missing /dev/sde1 /dev/sdf1
/dev/sdg1" where your last attempt changed this order to "/dev/sdc1
/dev/sde1 /dev/sdf1 /dev/sdg1 missing"... however this assumes that
the device names haven't changed.
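
For future reference, something along these lines (just a sketch) captures the
original slot order before any --create is attempted:

# per-member superblock dump; each device's role/slot and the recorded
# device table show the order the array was built with
mdadm --examine /dev/sd[cdefg]1

# the same information from the array side, while it is still assembled
mdadm --detail /dev/md5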

>
> It's maybe a good time to give:
> 2.6.24.5-slub-dualcore2smp-volpreempt-noticks

2.6.24.5 also needs this patch:

http://userweb.kernel.org/~akpm/mmotm/broken-out/md-fix-raid5-repair-operations.patch

--
Dan

Re: mdadm create corrupted md data?

on 07.05.2008 08:56:57 by Marc MERLIN

On Tue, May 06, 2008 at 10:38:27PM -0700, Dan Williams wrote:
> > During my first post, and up to this morning, I had:
> > Layout : left-symmetric
> > Chunk Size : 64K
> > md5 : active raid5 sdf1[0] sdc1[3] sdg1[2] sde1[1]
> > 1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
> >
> > gargamel:~# cat /proc/mdstat | grep -1 md5 | tail -n+2
> > md5 : active raid5 sdg1[3] sdf1[2] sde1[1] sdc1[0]
> > 1953535744 blocks level 5, 64k chunk, algorithm 2 [5/4] [UUUU_]
>
> You may have created the array with a different disk order than when
> the array was originally created. It would help if you had a dump of
> the original superblocks. I'm guessing your original array might have
> been the following order "/dev/sdc1 missing /dev/sde1 /dev/sdf1
> /dev/sdg1" where your last attempt changed this order to "/dev/sdc1
> /dev/sde1 /dev/sdf1 /dev/sdg1 missing"... however this assumes that
> the device names haven't changed.

Doh! I feel so silly considering how blindingly obvious this is now that you
mention it :)

Yes, that was of course my problem; the correct order of the drives was shown
by the device numbers in the first mdstat output.
The winning command was therefore:
mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sdf1 /dev/sde1 /dev/sdg1 /dev/sdc1 missing
or
mdadm --create /dev/md5 --level=5 --raid-devices=5 /dev/sdf1 missing /dev/sdg1 /dev/sdc1 /dev/sdd1

After that, I get my pv back, and my VG.
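
(Roughly the checks I ran, from memory, so treat this as a sketch:

vgscan                  # rescans and finds the PV/VG on /dev/md5 again
pvdisplay /dev/md5      # the PV label is back
vgchange -ay dshelf2    # reactivate the VG so the LVs show up under /dev/dshelf2
)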

I did try an e2fsck -f -n -C 0 /dev/dshelf2/space when in the configuration
with sdd1 (i.e. the drive I first tried to rebuild parity on, until I found
out it was sde1 that had bad sectors), and it is showing some pretty scary
errors that probably show that my fs is mostly toast if I elect to use sdd1
instead of sde1.
Considering that sde1 is the soon to be dead drive, I guess backups is where
I go next.

I'm surprised, however, that rebuilding parity on sdd1 wasn't effectively a
no-op, since sde1 only had one bad sector about 100GB from its beginning, and
that the rebuild still caused some non-trivial FS damage.

Oh well...

If I can provide more useful info before I rebuild this array altogether,
let me know.

Thanks,
Marc

Re: mdadm create corrupted md data?

on 07.05.2008 11:29:36 by David Greaves

Marc MERLIN wrote:
> After that, I get my pv back, and my VG.
>
> I did try an e2fsck -f -n -C 0 /dev/dshelf2/space when in the configuration
> with sdd1 (i.e. the drive I first tried to rebuild parity on, until I found
> out it was sde1 that had bad sectors), and it is showing some pretty scary
> errors that probably show that my fs is mostly toast if I elect to use sdd1
> instead of sde1.
> Considering that sde1 is the soon to be dead drive, I guess backups is where
> I go next.

I would obtain a replacement drive for sde and use gnu ddrescue (dd with error
retries) to create an image on a reliable drive. Then re-create the array using
the replaced sde instead of sdd. You should have more luck that way.
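
Something along these lines (a sketch; /dev/sdX stands in for the new drive and
sde.map is just a name I picked for the ddrescue log file):

# pass 1: copy everything that reads cleanly, skipping the bad areas
ddrescue -f -n /dev/sde /dev/sdX sde.map
# pass 2: go back and retry the bad areas a few times
ddrescue -f -d -r3 /dev/sde /dev/sdX sde.map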

If you still have fsck issues then I'd suggest that the futzing around may have
caused corruption.

David

Re: mdadm create corrupted md data?

on 07.05.2008 17:08:14 by Marc MERLIN

On Wed, May 07, 2008 at 10:29:36AM +0100, David Greaves wrote:
> Marc MERLIN wrote:
> > After that, I get my pv back, and my VG.
> >
> > I did try an e2fsck -f -n -C 0 /dev/dshelf2/space when in the configuration
> > with sdd1 (i.e. the drive I first tried to rebuild parity on, until I found
> > out it was sde1 that had bad sectors), and it is showing some pretty scary
> > errors that probably show that my fs is mostly toast if I elect to use sdd1
> > instead of sde1.
> > Considering that sde1 is the soon to be dead drive, I guess backups is where
> > I go next.
>
> I would obtain a replacement drive for sde and use gnu ddrescue (dd with error
> retries) to create an image on a reliable drive. Then re-create the array using
> the replaced sde instead of sdd. You should have more luck that way.

True, I could do that too.

> If you still have fsck issues then I'd suggest that the futzing around may have
> caused corruption.

No, that should work fine. The array works fine with sde; I'm just surprised
that it is now half corrupted if I use sdd instead. I'm not sure I understand
why, but I guess that drives home the point that I should be even more careful
about which drive I select after a double failure.
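
(Next time I'll compare the superblocks before picking a survivor; something
like this, as a sketch, shows which member was updated last:

# the member with the latest update time / highest event count is the
# freshest one to keep
mdadm --examine /dev/sd[cdefg]1 | egrep '^/dev|Update Time|Events'
)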

Anyway, thanks for the suggestions.

Marc