Re: Maximizing failed disk replacement on a RAID5 array

on 08.06.2011 08:58:52 by Durval Menezes

Hello,

On Tue, Jun 7, 2011 at 2:35 AM, Brad Campbell wrote:
> On 07/06/11 13:03, Durval Menezes wrote:
>>
>> Hello Folks,
>>
>> Just finished the "repair". It completed OK, and over SMART the HD now
>> shows a "Reallocated_Sector_Ct" of 291 (which shows that many bad
>> sectors have been remapped), but it's also still reporting 4
>> "Current_Pending_Sector" and 4 "Offline_Uncorrectable"... which I
>> think means exactly the same thing, ie, that there are 4 "active"
>> (from the HD perspective) sectors on the drive still detected as bad
>> and not remapped.
>>
>> I've been thinking about exactly what that means, and I think that
>> these 4 sectors are either A) outside the RAID partition (not very
>> probable as this partition occupies more than 99.99% of the disk,
>> leaving just a small, less than 105MB area at the beginning), or B)
>> some kind of metadata or unused space that hasn't been read and
>> rewritten by the "repair" I've just completed. I've just done a "dd
>> bs=1024k count=105 </dev/sdc >/dev/null" to account for the
>> hypothesis A), and come out empty: no errors, and the drive still
>> shows 4 bad, unmapped sectors on SMART.
>>
>> So, by elimination, it must be either case B) above, or a bug in the
>> linux md code (which prevents it from hitting every needed block on
>> the disk), or a bug in SMART (which makes it report inexistent bad
>>
> Try running a SMART long test (smartctl -t long) and it will tell you whether
> the sectors are really bad or not.
> I've had instances where the firmware still thought that some previously
> pending sectors were still pending until I forced a test, at which time the
> drive came to its senses and they went away.
>
> I believe if you wait until the drive gets around to doing its periodic
> offline data collection you'll see the same thing, but a long test is nice
> as it will give you an actual block number for the first failure (if you
> have one)

I did it (smartctl -t long) and it completed, registering a read failure at
the very end of the disk:

SMART Self-test log structure revision number 1
Num  Test_Description    Status                  Remaining  LifeTime(hours)  LBA_of_first_error
# 1  Extended offline    Completed: read failure       10%      9942         2930273794

The SMART Attributes table still shows 4 pending/uncorrectable sectors:

197 Current_Pending_Sector  0x0012   100   100   000    Old_age   Always       -       4
198 Offline_Uncorrectable   0x0010   100   100   000    Old_age   Offline      -       4

Converting the above LBA to a 1 KiB block number, I find 2930273794/2 =
1465136897; as this is a 1.5TB HD, this first error (there are possibly
3 more) is right at the final 35GB of the media, so it's inside (near
the end) of the RAID partition:

fdisk -l /dev/sdc
Disk /dev/sdc: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 512 bytes / 512 bytes
Disk identifier: 0x6be6057c
   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1               1           1        8001    4  FAT16 <32M
/dev/sdc2   *           2          14      104422+  83  Linux
/dev/sdc3              15      182401  1465023577+  fd  Linux raid autodetect
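
The position of the failing LBA relative to the disk and partition
boundaries can be cross-checked with plain shell arithmetic. This is a
minimal sketch, not part of the original report: the variable names are
illustrative, and it assumes the classic cylinder-aligned DOS layout that
fdisk shows above (16065 sectors per cylinder, cylinders counted from 1),
so the last sector of /dev/sdc3 is taken to be 182401 * 16065 - 1. Under
those assumptions the LBA lies 3374 sectors (about 1.6 MiB) before the end
of the disk and 1730 sectors past the last sector of /dev/sdc3, so it is
worth re-checking the partition boundaries in sector units before relying
on the conclusion that the bad sector is inside the RAID partition.

```shell
#!/bin/sh
# Values copied from the smartctl and fdisk output above.
lba=2930273794            # LBA_of_first_error, in 512-byte sectors
disk_bytes=1500301910016  # total disk size reported by fdisk
sectors_per_cyl=16065     # 255 heads * 63 sectors/track
part_end_cyl=182401       # last cylinder of /dev/sdc3

disk_sectors=$((disk_bytes / 512))

# ASSUMPTION: partitions are cylinder-aligned and cylinders are counted
# from 1, so the partition's last sector is end_cyl * 16065 - 1.
part_last_sector=$((part_end_cyl * sectors_per_cyl - 1))

sectors_before_disk_end=$((disk_sectors - lba))
sectors_past_partition=$((lba - part_last_sector))

echo "disk sectors:               $disk_sectors"
echo "last sector of sdc3:        $part_last_sector"
echo "LBA to end of disk:         $sectors_before_disk_end sectors"
echo "LBA past end of partition:  $sectors_past_partition sectors"
```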

Confirming that this block is indeed returning read errors:

dd count=1 bs=1024 skip=1465136897 if=/dev/sdc of=/dev/null
[long delay]
dd: reading `/dev/sdc': Input/output error
0+0 records in
0+0 records out
0 bytes (0 B) copied, 45.1076 s, 0.0 kB/s
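
The same check can be made without the divide-by-two step by reading in
512-byte units, so that dd's skip= value is the LBA itself. A small sketch,
assuming 512-byte logical sectors (as fdisk reports above) and GNU dd,
whose iflag=direct bypasses the page cache so the read really reaches the
drive. The helper name lba_read_cmd is made up for illustration, and it
only prints the command rather than running it:

```shell
#!/bin/sh
# Print (do not run) a dd command that reads exactly one 512-byte sector.
# $1 = whole-disk device, $2 = LBA as reported by smartctl.
lba_read_cmd() {
    echo "dd if=$1 of=/dev/null bs=512 skip=$2 count=1 iflag=direct"
}

# Example: the LBA from the self-test log above.
lba_read_cmd /dev/sdc 2930273794
```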

Examining one sector before:

dd count=1 bs=1024 skip=1465136896 if=/dev/sdc | hexdump -C
00000000 92 e1 b4 d4 c6 cd 0f 33 db 7c ff a9 be c1 c1 8e |.......3.|......|
00000010 71 35 fc 55 16 c4 36 ef 59 10 db 20 22 f4 57 99 |q5.U..6.Y.. ".W.|
00000020 31 61 2b 24 e0 98 3c 94 4b 8a 17 93 23 aa e9 96 |1a+$..<.K...#...|
00000030 b0 47 7b 8f 12 c6 52 42 99 0d 72 b4 51 02 5a 8e |.G{...RB..r.Q.Z.|
00000040 c6 5a ac 86 0b a5 74 9b 13 e7 87 7a db 94 e2 7f |.Z....t....z....|
00000050 c6 42 75 ba 53 bf 7f 20 fc 9c ad 4b 8f 3c 85 64 |.Bu.S.. ...K.<.d|
00000060 3a b0 ac 41 6e 41 fb 95 03 70 24 7e 2e d5 df 8a |:..AnA...p$~....|
00000070 f9 dc d1 7d 4a 1e e1 93 9d 39 18 83 6c 9f 9f 79 |...}J....9..l..y|
00000080 53 a3 d1 fb 7f c6 bd 44 8d 0c 40 06 0a 92 f9 7e |S......D..@....~|
00000090 0c 0e 87 43 66 9d fc 12 2b 0d 7a 34 ba 84 cb 73 |...Cf...+.z4...s|
000000a0 47 3b a4 fa c9 50 d9 96 f9 50 a2 60 17 eb 7c c8 |G;...P...P.`..|.|
000000b0 42 76 59 d0 1e 06 10 a8 3b 89 74 8d b4 04 83 88 |BvY.....;.t.....|
000000c0 d7 9d 3c 82 cf 8f 7d 6e a2 b6 bf 56 06 c0 aa 7c |..<...}n...V...||
000000d0 7d 39 ae 0a 67 48 28 b5 07 fd fc ae 49 e4 7a 08 |}9..gH(.....I.z.|
000000e0 8a 37 94 e0 d3 d7 f0 f4 4c 49 3a ed b7 f4 84 95 |.7......LI:.....|
000000f0 3f 0a 4f 6c 47 62 1a f4 70 ca 14 8a 52 6d 4c 1e |?.OlGb..p...RmL.|
00000100 da 0c 29 17 c1 a4 e1 5c cb 43 e0 01 45 9c 72 7f |..)....\.C..E.r.|
00000110 78 b8 19 3f dd 35 c5 50 ff 9b 42 fb 0b d8 61 5a |x..?.5.P..B...aZ|
00000120 24 2b ae c9 45 e6 e5 e9 04 00 93 bb 53 c0 fd d6 |$+..E.......S...|
00000130 9c ab 69 98 50 f0 5e 98 0d 0b b3 dc cb cb d0 7d |..i.P.^........}|
00000140 21 70 68 e8 fb 3c 55 fd 2d c6 6c 25 86 dd 9a 4a |!ph..<U.-.l%...J|
00000150 fc e2 24 a9 fb 9a 6b be d5 e2 3b e9 a0 b1 61 ad |..$...k...;...a.|
00000160 1f 9a c8 31 86 91 c6 1f 86 9e 17 35 25 7e 77 42 |...1.......5%~wB|
00000170 37 86 b2 17 08 8e c4 cf 4e e2 64 7d 83 11 05 1e |7.......N.d}....|
00000180 6b c1 e7 5d 0f e2 c9 f9 0a 0a b1 2b 83 a1 2a a4 |k..].......+..*.|
00000190 1d f8 a6 13 2f e9 45 bb b7 e2 71 e9 69 ad 3c 47 |..../.E...q.i.<G|
000001a0 3f fa 39 7f 1e 93 0e d2 89 09 dc d2 b3 3b f8 6f |?.9..........;.o|
000001b0 21 21 72 b6 9e 9d 42 79 fb 78 3c 02 85 7b 1f 4f |!!r...By.x<..{.O|
000001c0 8b 3c 26 62 8a 58 38 a7 48 31 b9 e2 0c 0d 41 d6 |.<&b.X8.H1....A.|
000001d0 8f 43 95 f0 1f 52 3e 0e 55 8d c0 93 f7 e3 c8 79 |.C...R>.U......y|
000001e0 a2 bc 51 72 87 3c 16 c3 d0 f3 57 a8 e4 48 51 32 |..Qr.<....W..HQ2|
000001f0 00 99 3e 0e 88 a3 fa e3 00 a4 c2 cb 28 7a a1 00 |..>.........(z..|
00000200 a0 b4 1b 6d c4 2a 15 75 a3 f0 24 47 5a d6 54 74 |...m.*.u..$GZ.Tt|
00000210 d0 ad e4 92 b1 99 5d 7a 62 47 b9 54 8f 9e 15 ca |......]zbG.T....|
00000220 65 09 9e d0 d3 61 51 93 88 4a 46 1e 5c 15 07 ef |e....aQ..JF.\...|
00000230 b0 92 fa a7 e7 3d e5 36 20 67 d2 24 b7 59 ae f4 |.....=.6 g.$.Y..|
00000240 7c 26 57 90 e1 69 b5 f3 b4 1b 8e e6 07 2e 46 84 ||&W..i........F.|
1+0 records in
1+0 records out
1024 bytes (1.0 kB) copied, 5.0224e-05 s, 20.4 MB/s

Looking at one sector after the error returns similar results.

So, I don't know about you, but the above seems pretty much like data
to me (although it could also be parity).

So I have two questions:

1) can I simply skip over these sectors (using dd_rescue or multiple
dd invocations) when off-line copying the old disk to the new one,
trusting the RAID5 to reconstruct the data correctly from the other 2
disks? Or is it better to simply do the recovery the "traditional" way
(ie, "fail" the old disk, "add" the new one, and run the risk of a
possible bad sector on one of the two remaining old disks ruining the
show completely and forcing me to recover from backups [I *do* have
up-to-date backups on this array])?

2) Is there a formula, a program or anything that can tell me exactly
what is located at the above sector (ie, whether it's RAID parity or a
data sector)?
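
For question 2, the mapping can be sketched in a few lines of shell
arithmetic, under several assumptions that have to be verified against the
real array: md's default left-symmetric RAID5 layout, a known chunk size,
a known slot number for this member (both shown by mdadm --detail), and a
sector offset already made relative to the start of the member's data area
(partition start and any metadata data offset subtracted). The function
name raid5_ls_role and the sample values are hypothetical.

```shell
#!/bin/sh
# Classify a sector on one RAID5 member as parity or data.
# ASSUMPTION: left-symmetric layout (md's default for RAID5).
#   $1 = sector offset within the member's data area (512-byte sectors)
#   $2 = chunk size in sectors (e.g. 128 for 64 KiB chunks)
#   $3 = number of member disks
#   $4 = this member's slot in the array, counted from 0
# Prints "parity", or "data N" where N is the sector within the array.
raid5_ls_role() {
    stripe=$(( $1 / $2 ))
    # Parity starts on the last slot and moves one slot left per stripe.
    parity_slot=$(( ($3 - 1) - (stripe % $3) ))
    if [ "$4" -eq "$parity_slot" ]; then
        echo "parity"
    else
        # Data chunks start on the slot after parity and wrap around.
        data_idx=$(( ($4 - parity_slot - 1 + $3) % $3 ))
        array_chunk=$(( stripe * ($3 - 1) + data_idx ))
        echo "data $(( array_chunk * $2 + $1 % $2 ))"
    fi
}

# Example with placeholder values: 3 disks, 64 KiB chunks.
raid5_ls_role 128 128 3 1   # second stripe, slot 1 holds the parity chunk
raid5_ls_role 130 128 3 0   # second stripe, slot 0 holds a data chunk
```

If the array was created with a different layout or a non-zero data
offset, the formula has to be adjusted accordingly.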

Thanks,
--
Durval Menezes.

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Maximizing failed disk replacement on a RAID5 array

on 08.06.2011 09:32:26 by Brad Campbell

On 08/06/11 14:58, Durval Menezes wrote:

> 1) can I simply skip over these sectors (using dd_rescue or multiple
> dd invocations) when off-line copying the old disk to the new one,
> trusting the RAID5 to reconstruct the data correctly from the other 2

Noooooooooooo. As we stated early on, if you do that, md will have no
idea that the missing data is actually missing, as the new drive won't
return a read error for it.

Does a repair take long on your machine? I find that a few repair runs
generally get me enough rewrites to clear the dud sectors and allow an
offline clone.
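
A repair pass can be started and watched through md's sysfs interface.
The sketch below assumes the array is /dev/md0 (adjust MD_SYSFS for the
real device); the helper names are made up for illustration. Writing
"repair" to sync_action makes md read every stripe and rewrite blocks that
fail to read or whose parity does not match.

```shell
#!/bin/sh
# ASSUMPTION: the array is /dev/md0; override MD_SYSFS otherwise.
MD_SYSFS=${MD_SYSFS:-/sys/block/md0/md}

# Ask md to start a repair pass. $1 = the array's md sysfs directory.
md_start_repair() {
    echo repair > "$1/sync_action"
}

# Poll until md reports the array is idle again.
# $1 = md sysfs directory, $2 = polling interval in seconds (default 60).
md_wait_idle() {
    while [ "$(cat "$1/sync_action")" != "idle" ]; do
        sleep "${2:-60}"
    done
}

# Usage (as root), repeated as needed:
#   md_start_repair "$MD_SYSFS" && md_wait_idle "$MD_SYSFS"
#   cat "$MD_SYSFS/mismatch_cnt"
```

Checking the drive's SMART attributes again after each pass shows whether
the pending-sector count has actually dropped.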

If your dd of the old disk to the new disk aborts with an error, do
_not_ under any circumstances (well, unless you have really good
backups) do a dd_rescue and just swap the disks.
