Do I understand my RAID6 correctly?
on 12.04.2011 10:39:00 by lists
Today I had a drive fail in a customer's server.
It was part of a RAID6 which seems to have rebuilt onto a spare drive now.
Right now it looks like:
# mdadm -D /dev/md3
/dev/md3:
Version : 00.90.03
Creation Time : Thu Dec 20 17:47:07 2007
Raid Level : raid6
Array Size : 4391334912 (4187.90 GiB 4496.73 GB)
Used Dev Size : 731889152 (697.98 GiB 749.45 GB)
Raid Devices : 8
Total Devices : 9
Preferred Minor : 3
Persistence : Superblock is persistent
Update Time : Tue Apr 12 10:27:45 2011
State : clean
Active Devices : 8
Working Devices : 8
Failed Devices : 1
Spare Devices : 0
Chunk Size : 64K
UUID : e848b637:ca2bde73:9f92f3cc:128cdbad
Events : 0.47127534
Number Major Minor RaidDevice State
0 8 33 0 active sync /dev/sdc1
1 8 177 1 active sync /dev/sdl1
2 8 65 2 active sync /dev/sde1
3 8 81 3 active sync /dev/sdf1
4 8 97 4 active sync /dev/sdg1
5 8 113 5 active sync /dev/sdh1
6 8 129 6 active sync /dev/sdi1
7 8 145 7 active sync /dev/sdj1
8 8 161 - faulty spare /dev/sdk1
My question (just to be sure):
Do I understand correctly that the system has replaced the failed
/dev/sdk1 with a former spare drive (I don't know the device name right
now) and that I now have a valid RAID6 device with 8 drives in it?
So another 2 of the 8 drives could fail now without losing
data ...
correct?
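A minimal sketch of how the remaining safety margin could be read from the output above, assuming the rule that a RAID6 array stays readable as long as at least (Raid Devices - 2) members are active. The function name raid6_margin is hypothetical and not part of mdadm; the sketch only makes sense for an array that is not in the middle of a rebuild.

```shell
#!/bin/sh
# raid6_margin: hypothetical helper, not an mdadm feature.
# Reads `mdadm -D` output on stdin and prints how many more members a
# RAID6 array can lose before data is lost. Assumes no rebuild is running.
raid6_margin() {
    awk -F' : ' '
        /Raid Devices/   { raid = $2 }      # members the array is built from
        /Active Devices/ { active = $2 }    # members currently in sync
        END {
            # Fail instead of printing a misleading number if a field is missing.
            if (raid == "" || active == "") exit 1
            # RAID6 needs at least (raid - 2) active members.
            print active - (raid - 2)
        }
    '
}

# Usage (needs root and a real array, so it is left commented out):
#   mdadm -D /dev/md3 | raid6_margin
```

With the values shown above (8 Raid Devices, 8 Active Devices) this gives a margin of 2.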
I have to tell the customer what to do, and the degree of redundancy
available also determines how urgent it is to get a new drive into the
system.
I assume I would remove /dev/sdk1 from md3, swap the drive, fdisk it and
re-add sdk1 to md3 (it is already marked as failed, so the fail step isn't
necessary anymore). It would then be the new spare drive ... ?
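The steps just described could be sketched roughly as follows. This is only a sketch under assumptions: the replacement disk shows up as /dev/sdk again (the name may differ after a hot swap), /dev/sdc is used as the partition-layout template, and sfdisk is swapped in for the manual fdisk step. The helper names run, remove_failed and add_replacement are hypothetical. By default nothing is executed; the commands are only printed.

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set RUN=1 in the environment to really execute them (as root).
ARRAY=/dev/md3
DISK=/dev/sdk        # assumption: the replacement appears under the same name
TEMPLATE=/dev/sdc    # assumption: healthy member used as partition template

run() {
    if [ "${RUN:-0}" = "1" ]; then
        "$@"
    else
        echo "would run: $*"
    fi
}

# Step 1: take the already-failed member out of the array.
remove_failed() {
    run mdadm --manage "$ARRAY" --remove "${DISK}1"
}

# Step 2 happens by hand: physically swap the drive.

# Step 3: copy the partition layout, then add the new partition.
# With all 8 members active it should be picked up as a spare.
add_replacement() {
    run sh -c "sfdisk -d $TEMPLATE | sfdisk $DISK"
    run mdadm --manage "$ARRAY" --add "${DISK}1"
}
```

Call remove_failed before pulling the drive and add_replacement after the new one is in; checking /proc/mdstat or mdadm -D /dev/md3 afterwards should show whether the array accepted it.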
Thanks for refreshing my RAID-knowledge ;-)
Stefan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Do I understand my RAID6 correctly?
on 12.04.2011 11:23:36 by Yann Ormanns
Subject: Do I understand my RAID6 correctly?
From: Stefan G. Weichinger
To: Yann Ormanns
Date: 2011-04-12 11:11 (+0200)
> Array Size : 4391334912 (4187.90 GiB 4496.73 GB)
> Used Dev Size : 731889152 (697.98 GiB 749.45 GB)
> Raid Devices : 8
> Total Devices : 9
> Preferred Minor : 3
> Persistence : Superblock is persistent
>
> Active Devices : 8
> Working Devices : 8
> Failed Devices : 1
You now have 9 devices in this array: 8 active members plus the failed
disk, which mdadm still lists as a faulty spare. The capacity works out
as 8 * 750 GB = 6 TB, minus 2 * 750 GB for parity = 4.5 TB. With all 8
members active, the array can "lose" two disks without losing data, as
you already wrote.
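The same arithmetic can be checked against the exact numbers in the mdadm -D output, which reports sizes in KiB. A minimal sketch; the function name raid6_capacity_kib is a hypothetical helper, not an mdadm feature:

```shell
#!/bin/sh
# raid6_capacity_kib: hypothetical helper, not part of mdadm.
# RAID6 spends two members' worth of space on parity, so only
# (raid_devices - 2) members contribute usable capacity.
raid6_capacity_kib() {
    raid_devices=$1    # "Raid Devices" from mdadm -D
    used_dev_kib=$2    # "Used Dev Size" from mdadm -D, in KiB
    echo $(( (raid_devices - 2) * used_dev_kib ))
}

# Values taken from the output quoted in this thread.
raid6_capacity_kib 8 731889152    # prints 4391334912
```

The result matches the reported "Array Size" of 4391334912 KiB, i.e. six data disks' worth of space.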
Of course you can re-use /dev/sdk as a spare disk, but first you
should check why it failed (SMART data, for example).
You should also have a look at the drive models in use. E.g. if this array
uses 9x model XYZ from manufacturer ABC, more drives may well fail in
the near future.
If the array uses mixed models, it should not be THAT urgent - but that
depends on the importance of the data...
I've read several times about people losing their RAID6 because they did
not mix hard drive models. In that case, a manufacturing fault can have
very bad consequences.
Best regards,
Yann
Re: Do I understand my RAID6 correctly?
on 12.04.2011 12:42:26 by lists
On 12.04.2011 11:23, Yann Ormanns wrote:
> You now have 9 devices in this array (750 GB * 8 = 6 TB, 6 TB - (2 * 750 GB) =
> 4.5 TB). One of them is the failed disk. That means that this
> array can "lose" two disks without losing data, as you already wrote.
Yep, fine.
> Of course you can re-use /dev/sdk as a spare disk, but first you
> should check why it failed (SMART data, for example).
I already exported the controller logs and will look through them for SMART
info. Unfortunately the controller does not allow the use of
smartmontools; I can only use the vendor-specific ICP Storage Manager.
> You should also have a look at the drive models in use. E.g. if this array
> uses 9x model XYZ from manufacturer ABC, more drives may well fail in
> the near future.
uuuh
> If the array uses mixed models, it should not be THAT urgent - but that
> depends on the importance of the data...
> I've read several times about people losing their RAID6 because they did
> not mix hard drive models. In that case, a manufacturing fault can have
> very bad consequences.
Scary. Yes, the server uses the same model for all 9 devices.
From your domain I see that you seem to be located in Germany, so you
might know the manufacturer of the server: transtec.
I already opened a support ticket there, we still have a valid support
contract. Last time they sent a new drive, we'll see.
Thanks, Stefan
Re: Do I understand my RAID6 correctly?
on 12.04.2011 19:17:15 by Yann Ormanns
Subject: Re: Do I understand my RAID6 correctly?
From: Stefan G. Weichinger
To: Yann Ormanns
Date: 2011-04-12 19:03 (+0200)
>
> I already exported the controller logs and will look through them for SMART
> info. Unfortunately the controller does not allow the use of
> smartmontools; I can only use the vendor-specific ICP Storage Manager.
>
I recommend comparing the power-on hours and the serial numbers of the
disks. That way you can _perhaps_ predict which disk will have problems
next. Unfortunately, SMART data is not a reliable basis for predicting
hard disk failures.
For further information, you may want to take a look at this German
link:
http://www.heise.de/newsticker/meldung/Google-Studie-zur-Ausfallursache-von-Festplatten-147178.html
and/or at this document: http://labs.google.com/papers/disk_failures.pdf
>
> Scary. Yes, the server uses the same model for all 9 devices.
Yeah, that's really scary. It shows that even with a RAID6 your data
is not absolutely safe. But I have to admit that this is "only" the
worst-case scenario; the chance of it happening is really
small (but not zero).
I assume your customer keeps his backups up to date even though he
uses a RAID6?
> From your domain I see that you seem to be located in Germany, so you
> might know the manufacturer of the server: transtec.
>
No, I do not know this manufacturer, but I'm just a private user, so I
don't really have any experience with "real" servers :)
Best regards,
Yann