#1: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-17 22:42:30 by Tim Bostrom

OK,

Let me start off by saying - I panicked. Rule #1 - don't panic. I
did. Sorry.

I have a RAID 5 array running on Fedora 10.
(Linux tera.teambostrom.com 2.6.27.30-170.2.82.fc10.i686 #1 SMP Mon
Aug 17 08:38:59 EDT 2009 i686 athlon i386 GNU/Linux)

5 drives in an external enclosure (AMS eSATA Venus T5). It's a
Sil4726 inside the enclosure running to a Sil3132 controller via eSATA
in the desktop. I had been running this setup for just over a year.
Was working fine. I just moved into a new home and had my server
down for a while - before I brought it back online, I got a "great
idea" to blow out the dust from the enclosure using compressed air.
When I finally brought up the array again, I noticed that drives were
missing. Tried re-adding the drives to the array and had some issues
- they seemed to get added but after a short time of rebuilding the
array, I would get a bunch of HW resets in dmesg and then the array
would kick out drives and stop.

LOG BELOW:
----------------------
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
ata8.04: configured for UDMA/33
sd 8:4:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 8:4:0:0: [sdf] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00
16 3f 98 c6
sd 8:4:0:0: [sdf] Add. Sense: Scsi parity error
end_request: I/O error, dev sdf, sector 373266495
__ratelimit: 87 callbacks suppressed
raid5:md0: read error not correctable (sector 373266432 on sdf1).
raid5:md0: read error not correctable (sector 373266440 on sdf1).
raid5:md0: read error not correctable (sector 373266448 on sdf1).
raid5:md0: read error not correctable (sector 373266456 on sdf1).
raid5:md0: read error not correctable (sector 373266464 on sdf1).
raid5:md0: read error not correctable (sector 373266472 on sdf1).
raid5:md0: read error not correctable (sector 373266480 on sdf1).
raid5:md0: read error not correctable (sector 373266488 on sdf1).
raid5:md0: read error not correctable (sector 373266496 on sdf1).
raid5:md0: read error not correctable (sector 373266504 on sdf1).
ata8: EH complete
sd 8:4:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:0:0:0: [sdb] Write Protect is off
sd 8:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 8:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:1:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:1:0:0: [sdc] Write Protect is off
sd 8:1:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 8:1:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:2:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:2:0:0: [sdd] Write Protect is off
sd 8:2:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 8:2:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:3:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:3:0:0: [sde] Write Protect is off
sd 8:3:0:0: [sde] Mode Sense: 00 3a 00 00
sd 8:3:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:4:0:0: [sdf] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:4:0:0: [sdf] Write Protect is off
sd 8:4:0:0: [sdf] Mode Sense: 00 3a 00 00
sd 8:4:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
Aborting journal on device md0.
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
RAID5 conf printout:
--- rd:5 wd:0
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__ratelimit: 57 callbacks suppressed
Buffer I/O error on device md0, logical block 122126358
lost page write due to I/O error on md0
Buffer I/O error on device md0, logical block 278462467
lost page write due to I/O error on md0
[root@tera tbostrom]# cat /proc/mdstat
Personalities : [raid6] [raid5] [raid4]
md0 : active raid5 sdb1[5](F) sdf1[6](F) sde1[7](F) sdc1[8](F) sdd1[9](F)
3907039232 blocks level 5, 256k chunk, algorithm 2 [5/0] [_____]

unused devices: <none>
[root@tera tbostrom]# mdadm -S /dev/md0
mdadm: fail to stop array /dev/md0: Device or resource busy
Perhaps a running process, mounted filesystem or active volume group?

mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : b03d6cbd:0faa8837:6e16d19a:3f7b9448
Creation Time : Sun Jul 13 22:36:44 2008
Raid Level : raid5
Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Sun Sep 13 22:12:31 2009
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : e414d7ac - correct
Events : 674075

Layout : left-symmetric
Chunk Size : 256K

Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed

md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
ata8.04: SATA link up 1.5 Gbps (SStatus 113 SControl 310)
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
ata8.04: configured for UDMA/33
sd 8:4:0:0: [sdf] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE,SUGGEST_OK
sd 8:4:0:0: [sdf] Sense Key : Aborted Command [current] [descriptor]
Descriptor sense data with sense descriptors (in hex):
72 0b 47 00 00 00 00 0c 00 0a 80 00 00 00 00 00
16 3f 98 c6
sd 8:4:0:0: [sdf] Add. Sense: Scsi parity error
end_request: I/O error, dev sdf, sector 373266495
__ratelimit: 87 callbacks suppressed
raid5:md0: read error not correctable (sector 373266432 on sdf1).
raid5:md0: read error not correctable (sector 373266440 on sdf1).
raid5:md0: read error not correctable (sector 373266448 on sdf1).
raid5:md0: read error not correctable (sector 373266456 on sdf1).
raid5:md0: read error not correctable (sector 373266464 on sdf1).
raid5:md0: read error not correctable (sector 373266472 on sdf1).
raid5:md0: read error not correctable (sector 373266480 on sdf1).
raid5:md0: read error not correctable (sector 373266488 on sdf1).
raid5:md0: read error not correctable (sector 373266496 on sdf1).
raid5:md0: read error not correctable (sector 373266504 on sdf1).
ata8: EH complete
sd 8:4:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:0:0:0: [sdb] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:0:0:0: [sdb] Write Protect is off
sd 8:0:0:0: [sdb] Mode Sense: 00 3a 00 00
sd 8:0:0:0: [sdb] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:1:0:0: [sdc] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:1:0:0: [sdc] Write Protect is off
sd 8:1:0:0: [sdc] Mode Sense: 00 3a 00 00
sd 8:1:0:0: [sdc] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:2:0:0: [sdd] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:2:0:0: [sdd] Write Protect is off
sd 8:2:0:0: [sdd] Mode Sense: 00 3a 00 00
sd 8:2:0:0: [sdd] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:3:0:0: [sde] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:3:0:0: [sde] Write Protect is off
sd 8:3:0:0: [sde] Mode Sense: 00 3a 00 00
sd 8:3:0:0: [sde] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
sd 8:4:0:0: [sdf] 1953525168 512-byte hardware sectors (1000205 MB)
sd 8:4:0:0: [sdf] Write Protect is off
sd 8:4:0:0: [sdf] Mode Sense: 00 3a 00 00
sd 8:4:0:0: [sdf] Write cache: enabled, read cache: enabled, doesn't
support DPO or FUA
Aborting journal on device md0.
md: recovery of RAID array md0
md: minimum _guaranteed_ speed: 1000 KB/sec/disk.
md: using maximum available idle IO bandwidth (but not more than
200000 KB/sec) for recovery.
md: using 128k window, over a total of 976759808 blocks.
md: resuming recovery of md0 from checkpoint.
md: md0: recovery done.
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
RAID5 conf printout:
--- rd:5 wd:0
disk 4, o:0, dev:sdf1
RAID5 conf printout:
--- rd:5 wd:0
ext3_abort called.
EXT3-fs error (device md0): ext3_journal_start_sb: Detected aborted journal
Remounting filesystem read-only
__ratelimit: 57 callbacks suppressed
Buffer I/O error on device md0, logical block 122126358
lost page write due to I/O error on md0
Buffer I/O error on device md0, logical block 278462467
lost page write due to I/O error on md0
md: md0 still in use.
md: md0 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sde1>
md: export_rdev(sde1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: md0 stopped.
md: bind<sdd1>
md: bind<sdc1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdb1>
md: md0 stopped.
md: unbind<sdb1>
md: export_rdev(sdb1)
md: unbind<sdf1>
md: export_rdev(sdf1)
md: unbind<sde1>
md: export_rdev(sde1)
md: unbind<sdc1>
md: export_rdev(sdc1)
md: unbind<sdd1>
md: export_rdev(sdd1)
md: bind<sdd1>
md: bind<sdc1>
md: bind<sde1>
md: bind<sdf1>
md: bind<sdb1>
md: kicking non-fresh sdf1 from array!
md: unbind<sdf1>
md: export_rdev(sdf1)
raid5: device sdb1 operational as raid disk 0
raid5: device sde1 operational as raid disk 3
raid5: device sdc1 operational as raid disk 2
raid5: device sdd1 operational as raid disk 1
raid5: allocated 5268kB for md0
raid5: raid level 5 set md0 active with 4 out of 5 devices, algorithm 2
RAID5 conf printout:



------------------------------------------



I popped the drives out of the enclosure and into the actual tower
case and connected each of them to its own SATA port. The HW resets
seemed to go away, but I couldn't get the array to come back online.
Then I did the stupid panic (following someone's advice I shouldn't
have).

Thinking I should just re-create the array, I did:

mdadm --create /dev/md0 --level=5 --raid-devices=5 /dev/sd[b-f]1

Stupid me again - I ignored the warning that it belongs to an array
already. I let it build for a minute or so and then tried to mount it
while rebuilding... and got error messages:

EXT3-fs: unable to read superblock
EXT3-fs: md0: couldn't mount because of unsupported optional features
(3fd18e00).

Now - I'm at a loss. I'm afraid to do anything else. I've been
viewing the FAQ and I have a few ideas, but I'm just more freaked. Is
there any hope? What should I do next without causing more trouble?


-Tim


logs below: let me know if more is needed.


--------------------------------------------------------
mdadm.conf:
# mdadm.conf written out by anaconda
#DEVICE partitions
DEVICE /dev/sd[bcdef]1

MAILADDR tbostrom@--.com

ARRAY /dev/md0 level=raid5 num-devices=5
UUID=b03d6cbd:0faa8837:6e16d19a:3f7b9448



PREVIOUS MDADM -E
------------------------

[root@tera tbostrom]# mdadm -E /dev/sdd1
/dev/sdd1:
Magic : a92b4efc
Version : 0.90.00
UUID : b03d6cbd:0faa8837:6e16d19a:3f7b9448
Creation Time : Sun Jul 13 22:36:44 2008
Raid Level : raid5
Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Sun Sep 13 22:12:31 2009
State : active
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : e414d7ac - correct
Events : 674075

Layout : left-symmetric
Chunk Size : 256K

Number Major Minor RaidDevice State
this 1 8 49 1 active sync /dev/sdd1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 49 1 active sync /dev/sdd1
2 2 8 33 2 active sync /dev/sdc1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed





CURRENT MDADM -E after my stupid mistake

[root@tera ~]# mdadm -E /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : b096b3cc:2db97ff1:59967991:b265d5ac
Creation Time : Thu Sep 17 10:35:38 2009
Raid Level : raid5
Used Dev Size : 976759936 (931.51 GiB 1000.20 GB)
Array Size : 3907039744 (3726.04 GiB 4000.81 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Thu Sep 17 10:39:04 2009
State : clean
Active Devices : 4
Working Devices : 5
Failed Devices : 1
Spare Devices : 1
Checksum : 6315e811 - correct
Events : 2

Layout : left-symmetric
Chunk Size : 64K

Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty removed
5 5 8 81 5 spare /dev/sdf1

#2: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-17 23:22:52 by Robin Hill


On Thu Sep 17, 2009 at 01:42:30PM -0700, Tim Bostrom wrote:

> [quoted text snipped]
Looking at the mdadm output, there's a couple of possible errors.
Firstly, your newly created array has a different chunksize than your
original one. Secondly, the drives may be in the wrong order. In
either case, providing you don't _actually_ have any faulty drives, then
it should be (mostly) recoverable.

Given the order you specified the drives in the create, sdf1 will be the
partition that's been trashed by the rebuild, so you'll want to leave
that out altogether for now.

You need to try to recreate the array with the correct chunk size and
with the remaining drives in different orders, running a read-only
filesystem check each time until you find the correct order.

So start with:
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing

Then repeat for every possible order of the four disks and "missing",
stopping the array each time if the mount fails.

When you've finally found the correct order, you can re-add sdf1 to get
the array back to normal.
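
(A sketch of what that search could look like as a loop; this is an
illustration, not part of Robin's message. It assumes the four surviving
members are sdb1/sdc1/sdd1/sde1, the 256K chunk from the original
superblock, and fsck.ext3 -n as the read-only check so nothing is written
to the filesystem. mdadm --create rewrites superblocks, so double-check the
device list before running anything like it.)

#!/bin/bash
# Try each ordering of the four surviving members with the trashed sdf1
# left out as "missing", then run a read-only ext3 check.  Stop as soon
# as a candidate ordering produces a clean-looking filesystem.
devs="sdb1 sdc1 sdd1 sde1"
for a in $devs; do
  for b in $devs; do
    for c in $devs; do
      for d in $devs; do
        # skip orderings that reuse a device
        uniq=$(echo "$a $b $c $d" | tr ' ' '\n' | sort -u | wc -l)
        [ "$uniq" -eq 4 ] || continue
        mdadm --stop /dev/md0 2>/dev/null
        echo "=== trying: $a $b $c $d missing ==="
        # --run suppresses the "appears to be part of a raid array" prompt
        mdadm --create /dev/md0 --run --level=5 --raid-devices=5 --chunk=256 \
              /dev/$a /dev/$b /dev/$c /dev/$d missing
        if fsck.ext3 -n /dev/md0; then
            echo ">>> candidate order found: $a $b $c $d missing"
            exit 0
        fi
      done
    done
  done
done

Because the fifth slot stays "missing", md never starts a parity resync
between attempts, so only the superblocks are rewritten each time; that is
what keeps the search (mostly) non-destructive, as noted above.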

HTH,
Robin
--
___
( ' } | Robin Hill <robin@robinhill.me.uk> |
/ / ) | Little Jim says .... |
// !! | "He fallen in de water !!" |


#3: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-17 23:35:39 by majedb

Looking at your initial examine output, it seems like the proper order is: bdce.

If the hardware resets have gone after plugging into a normal PC case,
with different SATA cables, then I'd say the cables in your external
enclosure might be the suspect here.

As Robin said, make sure you have the disks in the proper original
order as they were previously and that the chunksize is the same as
before.

This should do it: mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
(notice the order)

On Fri, Sep 18, 2009 at 12:22 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> [quoted text snipped]



--
Majed B.

#4: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-17 23:51:22 by Tim Bostrom

I think the direct SATA connections ended up making them get reversed.
sdb = sdf now
sdc = sde now
..... I think....

I labeled the drives as I pulled them out of the enclosure... I'll
make sure they match up and then try your suggestions. I just now
noticed the chunk size issue as well. <ugh>


-Tim


On Thu, Sep 17, 2009 at 2:35 PM, Majed B. <majedb@gmail.com> wrote:
> [quoted text snipped]



--
-tim

#5: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 00:11:14 by majedb

If you run mdadm --examine /dev/sda you'll be able to see the disks'
order in the array (and the position of the disk you're
querying/examining). The faulty one is previously known as sdf. You
can find its new name by running --examine on all disks, and the one
that shows all disks as healthy is sdf.
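
(A quick way to do that check in one pass; just a sketch, with the device
names assumed to still be sdb1 through sdf1:)

# Dump the interesting parts of every member's superblock side by side.
# The partition whose device table still shows all members as healthy,
# rather than one slot faulty/removed, is the one formerly known as sdf.
for dev in /dev/sd[b-f]1; do
    echo "===== $dev ====="
    mdadm --examine "$dev" | grep -E 'this|active sync|faulty|removed|spare'
done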

On Fri, Sep 18, 2009 at 12:51 AM, Tim Bostrom <tbostrom@gmail.com> wrote:
> [quoted text snipped]



--
Majed B.

#6: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 00:23:51 by Tim Bostrom

OK - I was just about to ask how you both knew that the array was out of order.


Thank you again.

-Tim

On Thu, Sep 17, 2009 at 3:11 PM, Majed B. <majedb@gmail.com> wrote:
> [quoted text snipped]



--
-tim

#7: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 01:11:32 by Tim Bostrom

I re-cabled the drives so that they show up as the same drive letter
as they were before when in the enclosure.

I then went ahead and tried your idea of restarting the array. I tried
this first:

mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bcde]1 missing

mount -o ro /dev/md0 /mnt/teradata

/var/log/messages:
-----------------
Sep 17 16:07:09 tera kernel: md: bind<sdb1>
Sep 17 16:07:09 tera kernel: md: bind<sdc1>
Sep 17 16:07:09 tera kernel: md: bind<sdd1>
Sep 17 16:07:09 tera kernel: md: bind<sde1>
Sep 17 16:07:09 tera kernel: raid5: device sde1 operational as raid disk 3
Sep 17 16:07:09 tera kernel: raid5: device sdd1 operational as raid disk 2
Sep 17 16:07:09 tera kernel: raid5: device sdc1 operational as raid disk 1
Sep 17 16:07:09 tera kernel: raid5: device sdb1 operational as raid disk 0
Sep 17 16:07:09 tera kernel: raid5: allocated 5268kB for md0
Sep 17 16:07:09 tera kernel: raid5: raid level 5 set md0 active with 4
out of 5 devices, algorithm 2
Sep 17 16:07:09 tera kernel: RAID5 conf printout:
Sep 17 16:07:09 tera kernel: --- rd:5 wd:4
Sep 17 16:07:09 tera kernel: disk 0, o:1, dev:sdb1
Sep 17 16:07:09 tera kernel: disk 1, o:1, dev:sdc1
Sep 17 16:07:09 tera kernel: disk 2, o:1, dev:sdd1
Sep 17 16:07:09 tera kernel: disk 3, o:1, dev:sde1
Sep 17 16:07:56 tera kernel: EXT3-fs error (device md0):
ext3_check_descriptors: Block bitmap for group 8064 not in group
(block 532677632)!
Sep 17 16:07:56 tera kernel: EXT3-fs: group descriptors corrupted!
--------------------------------


I then tried a few more permutations of the command:
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing

Every time I changed the order, it would still print the order the
same in the log:

Sep 17 16:02:52 tera kernel: md: bind<sdb1>
Sep 17 16:02:52 tera kernel: md: bind<sdc1>
Sep 17 16:02:52 tera kernel: md: bind<sdd1>
Sep 17 16:02:52 tera kernel: md: bind<sde1>
Sep 17 16:02:52 tera kernel: raid5: device sde1 operational as raid disk 3
Sep 17 16:02:52 tera kernel: raid5: device sdd1 operational as raid disk 2
Sep 17 16:02:52 tera kernel: raid5: device sdc1 operational as raid disk 1
Sep 17 16:02:52 tera kernel: raid5: device sdb1 operational as raid disk 0
Sep 17 16:02:52 tera kernel: raid5: allocated 5268kB for md0
Sep 17 16:02:52 tera kernel: raid5: raid level 5 set md0 active with 4
out of 5 devices, algorithm 2
Sep 17 16:02:52 tera kernel: RAID5 conf printout:
Sep 17 16:02:52 tera kernel: --- rd:5 wd:4
Sep 17 16:02:52 tera kernel: disk 0, o:1, dev:sdb1
Sep 17 16:02:52 tera kernel: disk 1, o:1, dev:sdc1
Sep 17 16:02:52 tera kernel: disk 2, o:1, dev:sdd1
Sep 17 16:02:52 tera kernel: disk 3, o:1, dev:sde1



Am I doing something wrong?




On Thu, Sep 17, 2009 at 2:22 PM, Robin Hill <robin@robinhill.me.uk> wrote:
> [quoted text snipped]



--
-tim

#8: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 01:28:53 by majedb

Before creating the array, did you re-examine the disks with mdadm and
make sure of each disk's position in the array?

After your recabling, the disk names may have changed again.

mdadm --examine /dev/sdb1

Number Major Minor RaidDevice State
this 7 8 17 7 active sync /dev/sdb1

0 0 8 113 0 active sync /dev/sdh1
1 1 8 97 1 active sync /dev/sdg1
2 2 0 0 2 faulty removed
3 3 0 0 3 faulty removed
4 4 8 33 4 active sync /dev/sdc1
5 5 8 65 5 active sync /dev/sde1
6 6 8 49 6 active sync /dev/sdd1
7 7 8 17 7 active sync /dev/sdb1

(That's the output of an array I'm working on)

Notice the first line: *this* and then the value of RaidDevice. That's
the position of the partition in the array. 0 is first, 1 is second,
and so on.

In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
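
(A small loop along those lines; purely illustrative, with the device names
assumed to be sdb1 through sde1:)

# Print the slot ("RaidDevice") each partition claims for itself, taken
# from the "this" line of mdadm --examine (5th column in 0.90 metadata output).
for dev in /dev/sd[b-e]1; do
    printf '%s -> ' "$dev"
    mdadm --examine "$dev" | awk '/^this/ { print "RaidDevice " $5 }'
done

Sorting those slot numbers gives the explicit device order to feed back
into mdadm --create.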

On Fri, Sep 18, 2009 at 2:11 AM, Tim Bostrom <tbostrom@gmail.com> wrote:
> [quoted text snipped]



--
Majed B.

#9: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 01:54:55 by Tim Bostrom

It's still showing the order that you had previously posted: [bcde]
(see log below)

It appears that trying different permutations isn't yielding any
change. I haven't tried every permutation, but are these commands
supposed to yield different effects? They seem to always build the
array as [bcde] no matter what. Or should I be swapping around the
cables on the drives?

>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
>> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
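
(A note added here for clarity, since it explains the behavior above:
/dev/sd[bdce]1 is a shell glob, and the shell expands globs into a sorted
list of matching paths before mdadm ever sees them, so all three commands
receive exactly the same argument order as /dev/sd[bcde]1. To try a
different order, the devices have to be listed out explicitly, for example:)

# All of these expand identically, because pathname expansion sorts:
echo /dev/sd[bcde]1   # /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
echo /dev/sd[bdce]1   # same four paths, same order

# To actually permute the members, spell the order out:
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1 missing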


-Tim

[root@tera ~]# mdadm --examine /dev/sdb1
/dev/sdb1:
Magic : a92b4efc
Version : 0.90.00
UUID : 9fefb6ce:dcbfe649:f456b3f0:371e8bcc
Creation Time : Thu Sep 17 16:13:45 2009
Raid Level : raid5
Used Dev Size : 976759808 (931.51 GiB 1000.20 GB)
Array Size : 3907039232 (3726.04 GiB 4000.81 GB)
Raid Devices : 5
Total Devices : 5
Preferred Minor : 0

Update Time : Thu Sep 17 16:13:45 2009
State : clean
Active Devices : 4
Working Devices : 4
Failed Devices : 1
Spare Devices : 0
Checksum : 20f1deab - correct
Events : 1

Layout : left-symmetric
Chunk Size : 256K

Number Major Minor RaidDevice State
this 0 8 17 0 active sync /dev/sdb1

0 0 8 17 0 active sync /dev/sdb1
1 1 8 33 1 active sync /dev/sdc1
2 2 8 49 2 active sync /dev/sdd1
3 3 8 65 3 active sync /dev/sde1
4 4 0 0 4 faulty



On Thu, Sep 17, 2009 at 4:28 PM, Majed B. <majedb@gmail.com> wrote:
> Before creating the array, did you re-examine the disks with mdadm and
> made sure of each disk's position in the array?
>
> After your recabling, the disk names may have changed again.
>
> mdadm --examine /dev/sdb1
>
>       Number   Major   Minor   RaidDevice State
> this     7       8       17        7      active sync   /dev/sdb1
>
>    0     0       8      113        0      active sync   /dev/sdh1
>    1     1       8       97        1      active sync   /dev/sdg1
>    2     2       0        0        2      faulty removed
>    3     3       0        0        3      faulty removed
>    4     4       8       33        4      active sync   /dev/sdc1
>    5     5       8       65        5      active sync   /dev/sde1
>    6     6       8       49        6      active sync   /dev/sdd1
>    7     7       8       17        7      active sync   /dev/sdb1
>
> (That's the output of an array I'm working on)
>
> Notice the first line: *this* and then the value of RaidDevice. That's
> the position of the partition in the array. 0 is first, 1 is second,
> and so on.
>
> In my case, the order is: sdh1,sdg1,missing,missing,sdc1,sde1,sdd1,sdb1
>
> <- much snippage ->
>
> --
>       Majed B.
>



--
-tim

#10: RE: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 03:31:04 by Guy Watkins

It is because of the way you list the drives.  Look at this command:
# echo /dev/sd[bdce]1
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1

Notice the output is not in the same order as in the command.  You should
list each disk in the order you want, like this:
mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing

I hope this helps.
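
(The difference is easy to see with brace expansion, which keeps the order
exactly as typed, while a bracket glob is expanded in sorted order by the
shell - a quick illustration:

$ echo /dev/sd[bdce]1
/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
$ echo /dev/sd{b,d,c,e}1
/dev/sdb1 /dev/sdd1 /dev/sdc1 /dev/sde1
)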

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of Tim Bostrom
} Sent: Thursday, September 17, 2009 7:55 PM
} To: linux-raid
} Subject: Re: RAID 5 array recovery - two drives errors in external
} enclosure
}
} It's still showing the order that you had previously posted: [bcde]
} (see log below)
}
} It appears that trying different permutations isn't yielding any
} change. I haven't tried every permutation, but are these commands
} supposed to yield different effects? They seem to always build the
} array as [bcde] no matter what. Or should I be swapping around the
} cables on the drives?
}
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdce]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[bdec]1 missing
} >> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sd[becd]1 missing
}
} <- much snippage ->


#11: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 05:50:21 by Tim Bostrom

I just noticed that my bootable flag is set on two of the disks.
Would that cause any issue?

Disk /dev/sdb: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x000e1d5a

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1   *           1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdc: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x323eeffc

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdd: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0xd98df0ac

   Device Boot      Start         End      Blocks   Id  System
/dev/sdd1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sde: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sde1               1      121601   976760001   fd  Linux raid autodetect

Disk /dev/sdf: 1000.2 GB, 1000204886016 bytes
255 heads, 63 sectors/track, 121601 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x0004c8a2

   Device Boot      Start         End      Blocks   Id  System
/dev/sdf1   *           1      121601   976760001   fd  Linux raid autodetect
[root@tera tbostrom]#


On Thu, Sep 17, 2009 at 6:31 PM, Guy Watkins
<linux-raid@watkins-home.com> wrote:
> It is because of the way you list the drives.  Look at this command:
> # echo /dev/sd[bdce]1
> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> Notice the output is not in the same order as in the command.  You should
> list each disk in the order you want, like this:
> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
>
> I hope this helps.
>
> <- much snippage ->
>



--
-tim

#12: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 06:19:58 by Tim Bostrom

This seemed to work, though I'm still working through the permutations
of the drive letters.

I noticed that mdadm thinks partition sde1 has an ext2 filesystem on
it.  See below:

[root@tera tbostrom]# mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1
/dev/sdd1 /dev/sdc1 /dev/sde1 missing
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sdd1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sde1 appears to contain an ext2fs file system
    size=396408836K  mtime=Mon Sep 21 03:41:16 2026
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
Continue creating array?

What gives? I tried popping sdf1 in there without creating the array
- just to see what would happen and it thinks that sdf1 has ext2 as
well.

[root@tera tbostrom]# mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdc1
/dev/sdb1 /dev/sde1 /dev/sdf1 missing
mdadm: /dev/sdc1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sdb1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sde1 appears to contain an ext2fs file system
    size=396408836K  mtime=Mon Sep 21 03:41:16 2026
mdadm: /dev/sde1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
mdadm: /dev/sdf1 appears to contain an ext2fs file system
    size=-387928064K  mtime=Wed Sep 16 16:26:42 2009
mdadm: /dev/sdf1 appears to be part of a raid array:
    level=raid5 devices=5 ctime=Thu Sep 17 20:54:33 2009
Continue creating array? no
mdadm: create aborted.


Still at a loss here. I haven't worked through all the drive
permutations. In the meantime, I'll try that. Does it make sense to
try sdf1 in the permutation since the drive letters may have changed
since moving from the enclosure? I thought I put them back in the
same order as the enclosure.

-Tim


On Thu, Sep 17, 2009 at 6:31 PM, Guy Watkins
<linux-raid@watkins-home.com> wrote:
> It is because of the way you list the drives.  Look at this command:
> # echo /dev/sd[bdce]1
> /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1
>
> Notice the output is not in the same order as in the command.  You should
> list each disk in the order you want, like this:
> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing
>
> I hope this helps.
>
> <- much snippage ->
>



--
-tim

#13: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 09:58:23 by Robin Hill


On Fri Sep 18, 2009 at 01:11:14AM +0300, Majed B. wrote:

> If you run mdadm --examine /dev/sda you'll be able to see the disks'
> order in the array (and the position of the disk you're
> querying/examining). The faulty one is previously known as sdf. You
> can find its new name by running --examine on all disks, and the one
> that shows that all disks are healthy, is sdf.
>
Ideally, yes, but having run --create already, the RAID metadata no
longer has any significance to the previous array configuration.

Cheers,
Robin
--
    ___
   ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
  / / )      | Little Jim says ....                            |
 // !!       |      "He fallen in de water !!"                 |


#14: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 10:07:06 by Robin Hill


On Thu Sep 17, 2009 at 09:19:58PM -0700, Tim Bostrom wrote:

> This seemed to work, though I'm still working through the permutations
> of the drive letters.
>
> I noticed that mdadm thinks partition sde1 has an ext2 filesystem on
> it.  See below:
>
> [root@tera tbostrom]# mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdb1
> /dev/sdd1 /dev/sdc1 /dev/sde1 missing
> mdadm: /dev/sdb1 appears to be part of a raid array:
>     level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
> mdadm: /dev/sdd1 appears to be part of a raid array:
>     level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
> mdadm: /dev/sdc1 appears to be part of a raid array:
>     level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
> mdadm: /dev/sde1 appears to contain an ext2fs file system
>     size=396408836K  mtime=Mon Sep 21 03:41:16 2026
> mdadm: /dev/sde1 appears to be part of a raid array:
>     level=raid5 devices=5 ctime=Thu Sep 17 21:13:21 2009
> Continue creating array?
>
> What gives?  I tried popping sdf1 in there without creating the array
> - just to see what would happen and it thinks that sdf1 has ext2 as
> well.
>
That would suggest that sde1 is the first disk in the array (I think).

> Still at a loss here. I haven't worked through all the drive
> permutations. In the meantime, I'll try that. Does it make sense to
> try sdf1 in the permutation since the drive letters may have changed
> since moving from the enclosure? I thought I put them back in the
> same order as the enclosure.
>
Definitely not - we know sdf1 has been re-written when you did the
initial --create after moving the drives out of the enclosure. This
means it _definitely_ has invalid data on it.

Unfortunately, without having the metadata information from the
_original_ array _after_ it has been moved from the enclosure, there's
no way of knowing what order the drives should be in. I _think_ that
sde1 will be the first disk (as it shows up as an ext2 filesystem), but
you'll really have to try every possible combination of the 5 devices
(the 4 partitions and "missing"). You may be best off scripting this (or
searching to see whether someone's already done that) - there are 120
possible combinations to try.
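
A rough sketch of such a brute-force script, assuming the surviving members
are still /dev/sd[b-e]1, the 256K chunk size, and the /mnt/teradata mount
point used earlier in the thread (check the device names and adapt before
running anything):

#!/bin/bash
# Try every ordering of the four surviving partitions plus "missing",
# create the array degraded (so no resync is started), and test-mount it
# read-only.  A successful mount is only a candidate - follow it up with a
# read-only fsck (e.g. fsck.ext3 -n /dev/md0) before trusting the order.
set -u
CANDIDATES="/dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 missing"
for a in $CANDIDATES; do
 for b in $CANDIDATES; do
  for c in $CANDIDATES; do
   for d in $CANDIDATES; do
    for e in $CANDIDATES; do
     # skip orderings that repeat a device
     [ $(printf '%s\n' "$a" "$b" "$c" "$d" "$e" | sort -u | wc -l) -eq 5 ] || continue
     echo "Trying: $a $b $c $d $e"
     mdadm -S /dev/md0 >/dev/null 2>&1
     yes | mdadm -C /dev/md0 -l 5 -n 5 -c 256 "$a" "$b" "$c" "$d" "$e" >/dev/null 2>&1
     if mount -o ro /dev/md0 /mnt/teradata 2>/dev/null; then
         echo "Candidate order: $a $b $c $d $e"
         umount /mnt/teradata
         exit 0
     fi
    done
   done
  done
 done
done
mdadm -S /dev/md0 >/dev/null 2>&1
echo "No ordering produced a mountable filesystem."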

Cheers,
Robin

--
    ___
   ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
  / / )      | Little Jim says ....                            |
 // !!       |      "He fallen in de water !!"                 |


#15: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 17:27:08 by Tim Bostrom

Is there any mdadm command that would have given any info or metadata
from the previous array? I have almost 9 months of SSH logging on my
laptop that I use to manage the array. Maybe there's some useful info
left in the log that might help.

-Tim

On Fri, Sep 18, 2009 at 1:07 AM, Robin Hill <robin@robinhill.me.uk> wrote:
> On Thu Sep 17, 2009 at 09:19:58PM -0700, Tim Bostrom wrote:
>
>> <- much snippage ->
>>
>> Does it make sense to
>> try sdf1 in the permutation since the drive letters may have changed
>> since moving from the enclosure?  I thought I put them back in the
>> same order as the enclosure.
>>
> Definitely not - we know sdf1 has been re-written when you did the
> initial --create after moving the drives out of the enclosure.  This
> means it _definitely_ has invalid data on it.
>
> Unfortunately, without having the metadata information from the
> _original_ array _after_ it has been moved from the enclosure, there's
> no way of knowing what order the drives should be in.  I _think_ that
> sde1 will be the first disk (as it shows up as an ext2 filesystem), but
> you'll really have to try every possible combination of the 5 devices
> (the 4 partitions and "missing").  You may be best off scripting this (or
> searching to see whether someone's already done that) - there are 120
> possible combinations to try.
>
> Cheers,
>    Robin
>



--
-tim

#16: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-18 17:41:36 by Robin Hill

On Fri Sep 18, 2009 at 08:27:08AM -0700, Tim Bostrom wrote:

> Is there any mdadm command that would have given any info or metadata
> from the previous array? I have almost 9 months of SSH logging on my
> laptop that I use to manage the array. Maybe there's some useful info
> left in the log that might help.
>
Not directly, no. The problem is that the mdadm commands report based
on drive/partition names as they are at the time (as that's what's
immediately useful). Unless you have anything to tie those
drive/partition names to a unique identifier for the drive (e.g.
/dev/disk/by-id, /dev/disk/by-uuid) then it's not going to be possible
to map the old drive/partition names to the current ones.

It's worth having a look anyway - see if you've got anything providing
drive serial number (smartctl output, listing of /dev/disk/by-id or
by-uuid). If the drives are different makes/models then the output of
lsscsi or /proc/scsi/scsi would also do.
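
For example (assuming smartmontools and lsscsi are installed; sdb is
just a placeholder name), any of these tie a /dev/sdX name to a
physical drive:

ls -l /dev/disk/by-id/     # udev symlinks embed the model and serial number
smartctl -i /dev/sdb       # "Serial Number:" in the identify info
lsscsi                     # one line per device with vendor/model
cat /proc/scsi/scsi        # same information, no extra package needed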

Cheers,
Robin
--
    ___
   ( ' }     |       Robin Hill        <robin@robinhill.me.uk> |
  / / )      | Little Jim says ....                            |
 // !!       |      "He fallen in de water !!"                 |


#17: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-19 01:30:46 by Tim Bostrom

THANK YOU!

To all of you who helped with advice: I finally waded through 70MB of
log files from the last 6 months on my server and figured out a way to
get the correct order.  I had a few logs of 'fdisk -l' which contained
the Disk Identifier.  I matched those up against where I had the
drives in the enclosure and the ata.#'s.  I came up with a short list
of drive letter orderings and just kept trying them with four drives
at a time, moving the _missing_ drive around.  Finally, it came up.

I'm now up with four drives and a read-only filesystem.  I'm going to
try to back up as much as possible off-disk.  Then I'm thinking that I
need to add the "missing" drive back in and re-build.

Correct?
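
I'm assuming that would be something like the following, with /dev/sde1
just standing in for whichever device the fifth drive shows up as, and
only once that drive itself checks out:

mdadm /dev/md0 --add /dev/sde1
cat /proc/mdstat                 # watch the rebuild progress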

Thank you again.

Tim

On Fri, Sep 18, 2009 at 1:17 PM, Guy Watkins <guy@watkins-home.com> wrote:
> When you did the create, it trashed 1 drive.  I don't know which one.  As
> long as you don't use that drive, your data should still be there.  By
> "trashed" I mean that it reconstructed the data on that drive, but the
> data was not correct.  As long as you always have 1 drive missing when
> doing the create, md will not write to the user area of the disks.
> Anyway, I think someone else indicated which drive that was, but maybe
> you don't really know.  If that is the case, you must try all 5 drives
> in 4 of the 5 slots!
> I think that would be 6*5*4*3*2 or 720 permutations.  Do you want a new
> list?  :)
-snip-

>
> On Fri, Sep 18, 2009 at 1:41 PM, Tim Bostrom <tbostrom@gmail.com> wrote:
>>
>> I had log output from my array before it started exhibiting issues and
>> dropping off drives in the external enclosure.  It said 256k in my old
>> mdadm -E output.
>>
>> I accidentally did a re-create of 5 drives for a few seconds with 64k
>> chunk size, but then stopped it after it wouldn't mount.
>>
>> i'm hoping I didn't completely destroy this thing.  I had a lot of
>> data on there.
>>
>> -Tim
>>
>> On Fri, Sep 18, 2009 at 9:23 AM, Guy Watkins <guy@watkins-home.com> wrote:
>> > No problem.  Hopefully you can modify the script I sent and make it
>> > generate the mdadm commands that you need.  I assumed you could
>> > anyway.  If you need, I can make the changes for you.
>> >
>> > I forgot to ask.  How do you know the correct chunk size?
>> >
>> > On Fri, Sep 18, 2009 at 12:12 PM, Tim Bostrom <tbostrom@gmail.com>
>> > wrote:
>> >>
>> >> Thank you - I was looking for a tool or script to get all the
>> >> permutations.  I'm not too good at doing it by hand.  :)
>> >>
>> >
>>
>> --
>> -tim
>
>



--
-tim

#18: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-19 01:37:59 by majedb

If you're going to take all your data off, I would suggest you do a
clean start: zero out all the disks to force the remapping of bad
sectors, then run smartctl -t offline on all disks.  After it's done
(it will take A LONG time), re-create the array, create your filesystem
on it, and put your data back.
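
Per disk, that's roughly the following (destructive, so only once the
data is safely copied off; /dev/sdb is just an example name):

dd if=/dev/zero of=/dev/sdb bs=1M    # overwrite everything; bad sectors get remapped on write
smartctl -t offline /dev/sdb         # start the offline test/data collection
smartctl -A /dev/sdb                 # afterwards, check Reallocated_Sector_Ct and friends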

On Sat, Sep 19, 2009 at 2:30 AM, Tim Bostrom <tbostrom@gmail.com> wrote:
> THANK YOU!
-snip-
> I'm now up with four drives and RO filesystem.  I'm going to try and
> backup as much as possible off-disk.  Then I'm thinking that I need to
> add the "missing" drive back in and re-build.
>
> Correct?
-snip-
--
Majed B.

#19: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-19 03:17:52 by Twigathy

2009/9/19 Majed B. <majedb@gmail.com>:
> If you're going to take all your data out, I would suggest you do a
> clean start and zero out all the disks to force the remapping of bad
> sectors, then run smartctl -t offline on all disks and after it's done
> (it will take A LONG time), create your filesystem on an array and put
> back your data.

I'd recommend running the badblocks program on each device to be put
in the array too, just to be certain that none of your disks are going
to go horribly flakey on assembly and build... I'm not sure if the
offline smart test does a full scan like badblocks would.
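
A read-only pass is just:

badblocks -sv /dev/sdb      # -s shows progress, -v is verbose; read-only by default

(badblocks -w is the thorough destructive write test - only to be run
before the array and filesystem are rebuilt.)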

T

#20: RE: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-19 03:35:33 by Guy Watkins

If you are starting over, maybe you should use RAID6?
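
For what it's worth, the create changes very little - a sketch, with the
device names and chunk size as placeholders:

mdadm --create /dev/md0 --level=6 --raid-devices=5 --chunk=256 \
      /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sde1 /dev/sdf1

Same five drives, but any two can fail (or get kicked out by a flaky
enclosure) without losing the array, at the cost of one more disk's
worth of capacity going to parity.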

} -----Original Message-----
} From: linux-raid-owner@vger.kernel.org [mailto:linux-raid-
} owner@vger.kernel.org] On Behalf Of Tom Carlson
} Sent: Friday, September 18, 2009 9:18 PM
} To: Majed B.
} Cc: Tim Bostrom; linux-raid
} Subject: Re: RAID 5 array recovery - two drives errors in external
} enclosure
}
} 2009/9/19 Majed B. <majedb@gmail.com>:
} > If you're going to take all your data out, I would suggest you do a
} > clean start and zero out all the disks to force the remapping of bad
} > sectors, then run smartctl -t offline on all disks and after it's done
} > (it will take A LONG time), create your filesystem on an array and put
} > back your data.
}
} I'd recommend running the badblocks program on each device to be put
} in the array too, just to be certain that none of your disks are going
} to go horribly flakey on assembly and build... I'm not sure if the
} offline smart test does a full scan like badblocks would.
}
} T
} --
} To unsubscribe from this list: send the line "unsubscribe linux-raid" in
} the body of a message to majordomo@vger.kernel.org
} More majordomo info at http://vger.kernel.org/majordomo-info.html


#21: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-21 06:40:30 by Tim Bostrom

Well, thank god I copied everything off the array this weekend, but strange:

I had gotten the array up finally with the correct order and missing drive:

mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdf1 /dev/sdc1 /dev/sdb1
/dev/sdd1 missing

------------
After copying everything off, I power-cycled my server and tried to
bring the array back up again using:

mdadm -A /dev/md0 /dev/sdf1 /dev/sdc1 /dev/sdb1 /dev/sdd1 missing

I received the error : mdadm: superblock on /dev/sdc1 doesn't match
others - assembly aborted.

This is strange since I had this seemingly working and was able to
copy all the data offline this weekend.
Drives haven't changed order - I haven't unplugged anything or changed
any cords.

Another issue: the command that worked before on the array:

mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdf1 /dev/sdc1 /dev/sdb1
/dev/sdd1 missing

now yields my old problem of not being able to mount:

EXT3-fs: md0: couldn't mount because of unsupported optional features
(3fd18e00).
mount: wrong fs type, bad option, bad superblock on /dev/md0,
       missing codepage or helper program, or other error
       In some cases useful info is found in syslog - try
       dmesg | tail  or so

What gives?  The drive order couldn't have changed just through a
reboot - the same drives and the same drive letters are all there.


-Tim



On Fri, Sep 18, 2009 at 6:35 PM, Guy Watkins <guy@watkins-home.com> wrote:
> If you are starting over, maybe you should use RAID6?
-snip-



--
-tim

#22: RE: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-21 06:59:31 by Guy Watkins

I think the order of your drives changed.  But I don't know how.  You
should not have done another create, because it destroys the superblocks.
You should have done an assemble, listing all drives and not using
missing.  Md should find all the correct drives.  Or if it said 1 was
wrong, just try again and don't list that one.
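
That is, something like this (device names here are just placeholders
for your four good partitions):

mdadm -A /dev/md0 /dev/sdb1 /dev/sdc1 /dev/sdd1 /dev/sdf1

or, if the array is listed in mdadm.conf, simply "mdadm --assemble --scan".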

Anyway, glad you got your data.

} -----Original Message-----
} From: Tim Bostrom [mailto:tbostrom@gmail.com]
} Sent: Monday, September 21, 2009 12:41 AM
} To: Guy Watkins
} Cc: Tom Carlson; Majed B.; linux-raid
} Subject: Re: RAID 5 array recovery - two drives errors in external
} enclosure
}
} Well, thank god I copied everything off the array this weekend, but
} strange:
-snip-


#23: Re: RAID 5 array recovery - two drives errors in external enclosure

Posted on 2009-09-21 09:05:23 by majedb

> I received the error : mdadm: superblock on /dev/sdc1 doesn't match
> others - assembly aborted.

I got that when using an older version of mdadm (2.6.7.1). According
to Neil, it's been fixed already so try using the latest version (3.0)
and do this:
mdadm -Af /dev/md0 /dev/sdf1 /dev/sdc1 /dev/sdb1 /dev/sdd1

(No need to specify a missing one).

I avoided creating another array.  No matter what the situation is, that
would be the last thing I'd ever do, because I want to keep the metadata
as intact as possible and get the most information I can out of it to
fix the situation.  And my motherboard keeps changing device names like
crazy, so if I created a new array, I might not be able to find the
original disks' order again (not easily, anyway).

On Mon, Sep 21, 2009 at 7:40 AM, Tim Bostrom <tbostrom@gmail.com> wrote:
> Well, thank god I copied everything off the array this weekend, but strange:
>
> I had gotten the array up finally with the correct order and missing drive:
>
> mdadm -C /dev/md0 -l 5 -n 5 -c 256 /dev/sdf1 /dev/sdc1 /dev/sdb1
> /dev/sdd1 missing
-snip-
--
Majed B.