Recovery of failed RAID 6 and LVM

Recovery of failed RAID 6 and LVM

am 25.09.2011 09:55:04 von lists

Hi guys.


I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
Yesterday 3 of the drives failed, leaving the RAID setup broken.
Following [5] I managed to start the array and make it resync.
The problem I'm facing now is I cannot access any of the LVM partitions
[3] I have on top of my md0. Fdisk says the disk doesn't contain a valid
partition table [4].
I tried to run fsck on the LVM devices, without luck.
Does any of you have a suggestion or a method I could use to access my data, please?



[1]:
# mdadm -QD /dev/md0
/dev/md0:
Version : 1.2
Creation Time : Sat Sep 24 23:59:02 2011
Raid Level : raid6
Array Size : 5860531200 (5589.04 GiB 6001.18 GB)
Used Dev Size : 1953510400 (1863.01 GiB 2000.39 GB)
Raid Devices : 5
Total Devices : 5
Persistence : Superblock is persistent

Update Time : Sun Sep 25 09:40:20 2011
State : clean, degraded, recovering
Active Devices : 3
Working Devices : 5
Failed Devices : 0
Spare Devices : 2

Layout : left-symmetric
Chunk Size : 512K

Rebuild Status : 63% complete

Name : odin:0 (local to host odin)
UUID : be51de24:ebcc6eef:8fc41158:fc728448
Events : 10314

    Number   Major   Minor   RaidDevice   State
       0       8       65        0        active sync        /dev/sde1
       1       8       81        1        active sync        /dev/sdf1
       2       8       97        2        active sync        /dev/sdg1
       5       8      129        3        spare rebuilding   /dev/sdi1
       4       0        0        4        removed

       6       8      113        -        spare              /dev/sdh1


[2]:
# cat /proc/mdstat

Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : active raid6 sdh1[6](S) sdi1[5] sdg1[2] sdf1[1] sde1[0]
      5860531200 blocks super 1.2 level 6, 512k chunk, algorithm 2 [5/3] [UUU__]
      [=======>.............]  recovery = 36.8% (720185308/1953510400) finish=441.4min speed=46564K/sec


[3]:
# lvdisplay
Logging initialised at Sun Sep 25 09:49:11 2011
Set umask from 0022 to 0077
Finding all logical volumes
--- Logical volume ---
LV Name /dev/fridge/storage
VG Name fridge
LV UUID kIhbSq-hePX-UIVv-uuiP-iK6w-djcz-iQ3cEI
LV Write Access read/write
LV Status available
# open 0
LV Size 4.88 TiB
Current LE 1280000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:0


[4]:

# fdisk -l /dev/fridge/storage

Disk /dev/fridge/storage: 5368.7 GB, 5368709120000 bytes
255 heads, 63 sectors/track, 652708 cylinders, total 10485760000 sectors
Units = sectors of 1 * 512 = 512 bytes
Sector size (logical/physical): 512 bytes / 512 bytes
I/O size (minimum/optimal): 524288 bytes / 1572864 bytes
Disk identifier: 0x00000000

Disk /dev/fridge/storage doesn't contain a valid partition table



[5]:
http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of _raid_superblock



--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 10:39:25 von Stan Hoeppner

On 9/25/2011 2:55 AM, Marcin M. Jessa wrote:
> Hi guys.
>
>
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.

What was the hardware event that caused this situation? Did you lose
power to, or kick the data cable out of a 3-bay eSATA enclosure? Do you
have a bad 3 in 1 hot swap cage? A flaky HBA/driver?

You need to identify the cause and fix it permanently or this will
likely happen again and again.

--
Stan

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 12:07:02 von lists

On 9/25/11 10:39 AM, Stan Hoeppner wrote:
> On 9/25/2011 2:55 AM, Marcin M. Jessa wrote:
>> Hi guys.
>>
>>
>> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
>> Yesterday 3 of the drives failed working leaving the RAID setup broken.
>
> What was the hardware event that caused this situation? Did you lose
> power to, or kick the data cable out of a 3-bay eSATA enclosure? Do you
> have a bad 3 in 1 hot swap cage? A flaky HBA/driver?
>
> You need to identify the cause and fix it permanently or this will
> likely happen again and again.
>

The problem is the Seagate drives. Searching for issues with my
drives, I found that many people are complaining about the same problems as I
have - drives failing randomly [1-4].
I just ordered WD drives to replace the Seagate ones one by one in the
array. I will return the Seagate HDs to the shop.
Right now my main concern is to get the data back from the LVM
partitions and back it up...
The worst thing is that the 3 drives failed so unexpectedly and so fast that I
didn't even have a chance to set up a backup solution.
Any help would be greatly appreciated.



[1]: http://bugs.debian.org/cgi-bin/bugreport.cgi?bug=625922
[2]:
http://forums.seagate.com/t5/Barracuda-XT-Barracuda-Barracud a/ST2000DL003-Barracuda-Green-not-detected-at-BIOS/td-p/8715 4/page/7
[3]: http://www.readynas.com/forum/viewtopic.php?f=65&t=51496&p=3 06494
[4]: http://forum.qnap.com/viewtopic.php?f=182&t=39893&start=30



--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 15:15:40 von Phil Turmel

On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
> Hi guys.
>
>
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.
> Following [5] I managed to start the array and make it resync.
> The problem I'm facing now is I cannot access any of the LVM partitions [3] I have on top of my md0. Fdisk says the disk doesn't contain a valid partition table [4].
> I tried to run fsck on the lvm devices without luck.
> Has any of you a suggestion, a method I could use to access my data please?

[trim /]

> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of _raid_superblock

These instructions are horrible! If you make the slightest mistake, your data is completely hosed.

It first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information. (Did you keep that report?)
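(As an aside, capturing the complete reports takes only a couple of
commands, along these lines - device names and the output path are only
illustrative:

# mdadm -E /dev/sd[e-i]1 > /root/md0-examine.txt
# mdadm -D /dev/md0 >> /root/md0-examine.txt
# cat /proc/mdstat >> /root/md0-examine.txt

and the file should of course be kept somewhere that is not on the array
being rescued.)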

Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.

Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds. With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data. Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.
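For comparison, a sketch of what a careful re-creation looks like when it
really is the last resort - every parameter spelled out rather than left to
the defaults, and no rebuild allowed to start. The values below are only
guesses taken from the -D output earlier, which is exactly the problem:
without the original -E reports the device order, data offset and friends
are unknown.

# mdadm --stop /dev/md0
# mdadm --create /dev/md0 --assume-clean --metadata=1.2 --level=6 \
      --chunk=512 --raid-devices=5 /dev/sde1 /dev/sdf1 /dev/sdg1 missing missing
# vgchange -ay fridge
# fsck -n /dev/fridge/storage

The cycle is repeated with a different device order (or chunk size, etc.)
until fsck recognises a filesystem, and the array is never allowed to
resync in the meantime.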

There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation. And it nearly always works.
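As a rough sketch, a forced assembly attempt looks something like this
(device names taken from the reports above; not a recipe for every case):

# mdadm --stop /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdh1 /dev/sdi1
# cat /proc/mdstat

"--force" lets mdadm accept members whose event counts are slightly behind,
working from the existing superblocks instead of recreating them.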

I'm sorry, Marcin, but you shouldn't expect to get your data back. Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

Regards,

Phil

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 16:16:43 von lists

On 9/25/11 3:15 PM, Phil Turmel wrote:
> On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
[...]

>> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of _raid_superblock
>
> These instructions are horrible! If you make the slightest mistake, your data is completely hosed.

Do you know of a better howto? I was desperate, googling a lot and trying
to run different commands first in order to rebuild my raid array, but
with no luck. The only howto that got a resync started was the wikipedia
one I linked to...

> If first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information. (Did you keep that report?)

No, unfortunately I did not.

> Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.
> Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds. With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data. Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.

I tried to run mdadm --assemble --assume-clean /dev/md0 /dev/sd[f-j]1
but that AFAIR only said that the devices which were still members of
the array and still working were busy. I always stopped the array
before running it.

> There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation. And it nearly always works.

I also tried running - with no luck:
# mdadm --assemble --force --scan /dev/md0
# mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1
/dev/sdi1
# mdadm --assemble --force --run /dev/md0 /dev/sde1 /dev/sdf1
/dev/sdg1 /dev/sdi1
and
# mdadm --assemble /dev/md0 --uuid=9f1b28cb:9efcd750:324cd77a:b318ed33
--force


> I'm sorry, Marcin, but you shouldn't expect to get your data back. Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

Oh sh** ! :( Really, there is nothing that can be done? What happened
when I started resyncing? I thought the good, working drives would get
the data synced to the one drive which failed (it did not really
fail, it was up after reboot and smartctl --attributes --log=selftest
shows it's healthy).
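(For reference, a fuller check of a suspect drive would be something along
these lines - device name illustrative:

# smartctl -a /dev/sde
# smartctl -t long /dev/sde
# smartctl -l selftest /dev/sde

with the last command run once the long self-test has had time to finish.)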


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 16:41:00 von lists

On 9/25/11 3:15 PM, Phil Turmel wrote:


> I'm sorry, Marcin, but you shouldn't expect to get your data back. Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.

What I don't understand is that I still have the LVM info. It's just that
the LVs don't have a partition table stored anymore:

# pvdisplay
Logging initialised at Sun Sep 25 16:38:43 2011
Set umask from 0022 to 0077
Scanning for physical volume names
--- Physical volume ---
PV Name /dev/md0
VG Name fridge
PV Size 5.46 TiB / not usable 3.00 MiB
Allocatable yes
PE Size 4.00 MiB
Total PE 1430793
Free PE 102153
Allocated PE 1328640
PV UUID Ubx1OW-jCyN-Vcy2-4p2W-L6Qb-u6W8-cVkrE4

# vgdisplay
Logging initialised at Sun Sep 25 16:40:12 2011
Set umask from 0022 to 0077
Finding all volume groups
Finding volume group "fridge"
--- Volume group ---
VG Name fridge
System ID
Format lvm2
Metadata Areas 1
Metadata Sequence No 33
VG Access read/write
VG Status resizable
MAX LV 0
Cur LV 7
Open LV 0
Max PV 0
Cur PV 1
Act PV 1
VG Size 5.46 TiB
PE Size 4.00 MiB
Total PE 1430793
Alloc PE / Size 1328640 / 5.07 TiB
Free PE / Size 102153 / 399.04 GiB
VG UUID ZD2fsN-dFq4-PcMB-owRh-WxGs-ciK8-PPwPbd

# lvdisplay
Logging initialised at Sun Sep 25 16:40:30 2011
Set umask from 0022 to 0077
Finding all logical volumes
--- Logical volume ---
LV Name /dev/fridge/storage
VG Name fridge
LV UUID kIhbSq-hePX-UIVv-uuiP-iK6w-djcz-iQ3cEI
LV Write Access read/write
LV Status available
# open 0
LV Size 4.88 TiB
Current LE 1280000
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:0

--- Logical volume ---
LV Name /dev/fridge/webstorage
VG Name fridge
LV UUID PuCGo1-LkRa-doEI-n8qU-mqS3-20Cw-SICWPk
LV Write Access read/write
LV Status available
# open 0
LV Size 100.00 GiB
Current LE 25600
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:1

--- Logical volume ---
LV Name /dev/fridge/mailstorage
VG Name fridge
LV UUID 538TGs-fRYt-VT1n-r8jE-Uvv3-nNXl-Cf8ojP
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:2

--- Logical volume ---
LV Name /dev/fridge/web01
VG Name fridge
LV UUID NsABmI-ok5I-GCaE-yGV6-Dqp6-Qedz-jVDS6Y
LV Write Access read/write
LV Status available
# open 0
LV Size 10.00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:3

--- Logical volume ---
LV Name /dev/fridge/db01
VG Name fridge
LV UUID qa88nB-MqX8-25YN-MEqf-ln81-vNtP-w2yVMW
LV Write Access read/write
LV Status available
# open 0
LV Size 10.00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:4

--- Logical volume ---
LV Name /dev/fridge/mail01
VG Name fridge
LV UUID qxUbLd-SaDq-wCwd-Z5M6-2llk-8SJh-vTlruR
LV Write Access read/write
LV Status available
# open 0
LV Size 10.00 GiB
Current LE 2560
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:5

--- Logical volume ---
LV Name /dev/fridge/win8
VG Name fridge
LV UUID TPsBeN-Nj2o-w1mt-pkS8-d9zu-wCMm-vRv3e7
LV Write Access read/write
LV Status available
# open 0
LV Size 30.00 GiB
Current LE 7680
Segments 1
Allocation inherit
Read ahead sectors auto
- currently set to 6144
Block device 253:6



--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 18:19:31 von Phil Turmel

On 09/25/2011 10:41 AM, Marcin M. Jessa wrote:
> On 9/25/11 3:15 PM, Phil Turmel wrote:
>
>
>> I'm sorry, Marcin, but you shouldn't expect to get your data back. Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.
>
> What I don't understand is I still have the LVM info. It's just the LVs don't have partition table stored anymore:

You probably got the device order partially correct, which would put some of your data blocks in the correct location. Having the LVM metadata line up is not terribly surprising. When some drives are placed back into the correct slots, but not others, only the non-parity data on the correctly placed drives will be correct. The rebuild will destroy the parity data on those devices, and much of the data on the other devices. Your partition tables were probably among the latter.

If chunk size, data offset, or layout were also incorrect, then even fewer good data blocks will show up by chance in the correct location.

Without the original mdadm -E reports (complete), there's no way I know of to figure out what happened, much less repair it.

Phil

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 18:43:58 von Phil Turmel

On 09/25/2011 10:16 AM, Marcin M. Jessa wrote:
> On 9/25/11 3:15 PM, Phil Turmel wrote:
>> On 09/25/2011 03:55 AM, Marcin M. Jessa wrote:
> [...]
>
>>> [5]: http://en.wikipedia.org/wiki/Mdadm#Recovering_from_a_loss_of _raid_superblock
>>
>> These instructions are horrible! If you make the slightest mistake, your data is completely hosed.
>
> Do you know of a better howto ? I was desperate googling a lot, trying to run different commands first in order to rebuild my raid array, but with no luck. The only howto that started resyncing was the wikipedia one I linked to...

The mdadm(1) and md(7) manual pages are first. Next would be anything on or linked from Neil Brown's blog: http://neil.brown.name/blog/mdadm

Of course, you found this list somehow. It's the official home of mdadm development, and the primary developer, Neil Brown, is an active participant.

>> If first asks for your "mdadm -E" reports from the drives, but it has you filter them through a grep that throws away important information. (Did you keep that report?)
>
> No, unfortunately I did not.

Then there's no way to determine any of the original parameters of the array, nor the proper device order. You can't rely on the device names themselves, as modern kernels try to identify drives on multiple controllers simultaneously, and slight timing changes will change what name ends up where. Only the original superblock will have a positive ID.
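One way to at least tie the sdX name of the moment to a physical drive is to
record serial numbers, e.g. (a sketch, paths illustrative):

# ls -l /dev/disk/by-id/ | grep -v part
# smartctl -i /dev/sde | grep -i serial

That identifies the hardware, but it still cannot recover the lost array
roles.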

>> Next, it has you wipe the superblocks on the array members, destroying all possibility of future forensics.
>> Then, it has you re-create the array, but omits "--assume-clean", so the array rebuilds. With the slightest mistake in superblock type, chunk size, layout, alignment, data offset, or device order, the rebuild will trash your data. Default values for some of those have changed in mdadm from version to version, so a naive "--create" command has a good chance of getting something wrong.
>
> I tried to run mdadm --assemble --assume-clean /dev/md0 /dev/sd[f-j]1 but that AFAIR only said that the devices which still were members of the array and were still working were busy. I always stoped the array before running it.

"--assume-clean" only applies to "--create" operations, where it suppresses the starting rebuild. This gives you the opportunity to run "fsck -n" to test whether the device order and other parameters you used results in a working filesystem.

Devices can be reported busy for a variety of reasons. I would examine /proc/mdstat, the output of "dmsetup table", and the contents of /sys/block/.
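All of those are read-only checks, for example:

# cat /proc/mdstat
# dmsetup table
# ls /sys/block/md0/md/
# cat /sys/block/md0/md/array_state

A device-mapper mapping still holding md0 open, or member disks already
claimed by a half-assembled array, are common reasons for a "busy" report.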

>> There is no mention of attempting "--assemble --force" with your original superblocks, which is the correct first step in this situation. And it nearly always works.
>
> I also tried running - with no luck:
> # mdadm --assemble --force --scan /dev/md0
> # mdadm --assemble --force /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdi1
> # mdadm --assemble --force --run /dev/md0 /dev/sde1 /dev/sdf1 /dev/sdg1 /dev/sdi1
> and
> # mdadm --assemble /dev/md0 --uuid=9f1b28cb:9efcd750:324cd77a:b318ed33 --force

If these failed with "device busy", you never really tested whether assembly could have worked.

>> I'm sorry, Marcin, but you shouldn't expect to get your data back. Per your "mdadm -D" report, the rebuild was already 63% done, so the destruction of your data is certainly complete now.
>
> Oh sh** ! :( Really, there is nothing that can be done? What happened when I started resyncing? I thought the good, working drives would get the data syneced with the one of drives which failed (it did not really fail, it was up after reboot and smartctl --attributes --log=selftest shows it's healthy).

"--zero-superblock" destroys all previous knowledge of the member devices' condition, role, or history. After that, all are considered "good", with the role specified with "--create".

Phil

Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 23:40:02 von NeilBrown


On Sun, 25 Sep 2011 09:55:04 +0200 "Marcin M. Jessa" wrote:

> Hi guys.
>
>
> I have a RAID 6 setup with 5 2TB drives on Debian Wheezy [1] & [2].
> Yesterday 3 of the drives failed working leaving the RAID setup broken.
> Following [5] I managed to start the array and make it resync.

As has been noted, you seem to be quite lucky. [5] contains fairly bad advice
but it isn't obvious yet that you have lost everything.


> The problem I'm facing now is I cannot access any of the LVM partitions
> [3] I have on top of my md0. Fdisk says the disk doesn't contain a valid
> partition table [4].

You wouldn't expect an LV to contain a partition table. You would expect it
to contain a filesystem.
What does
fsck -f -n /dev/fridge/storage

show??

NeilBrown





Re: Recovery of failed RAID 6 and LVM

am 25.09.2011 23:58:00 von lists

On 9/25/11 11:40 PM, NeilBrown wrote:

[...]

> You wouldn't expect an LV to contain a partition table. You would expect it
> to contain a filesystem.

Yes, there is still data available on the LVs.
I actually managed to grab some files from one of the LVs using
foremost. But foremost is limited and creates its own directory
hierarchy with file names being changed.
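(For the record, the carving run was something along these lines - the
output directory is illustrative:

# foremost -i /dev/fridge/storage -o /mnt/recovered -t all

which is also why the names change: carvers like foremost recover file
contents by signature, not the original directory tree.)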

> What does
> fsck -f -n /dev/fridge/storage
>
> show??

# fsck -f -n /dev/fridge/storage
fsck from util-linux 2.19.1
e2fsck 1.42-WIP (02-Jul-2011)
fsck.ext2: Superblock invalid, trying backup blocks...
fsck.ext2: Bad magic number in super-block while trying to open
/dev/mapper/fridge-storage

The superblock could not be read or does not describe a correct ext2
filesystem. If the device is valid and it really contains an ext2
filesystem (and not swap or ufs or something else), then the superblock
is corrupt, and you might try running e2fsck with an alternate superblock:
e2fsck -b 8193
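One follow-up that is sometimes worth trying on ext3/ext4 is pointing e2fsck
at a backup superblock. The locations a filesystem of this geometry would
use can be listed non-destructively - a sketch, where the -n flag is what
keeps mke2fs from writing anything:

# mke2fs -n /dev/fridge/storage
# e2fsck -n -b 32768 /dev/fridge/storage

8193 only applies to filesystems with 1 KiB blocks; 32768 is the usual first
backup for 4 KiB blocks. With the underlying array scrambled it may of
course fail in exactly the same way.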


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 26.09.2011 00:18:53 von NeilBrown


On Sun, 25 Sep 2011 23:58:00 +0200 "Marcin M. Jessa" wrote:

> On 9/25/11 11:40 PM, NeilBrown wrote:
>
> [...]
>
> > You wouldn't expect an LV to contain a partition table. You would expect it
> > to contain a filesystem.
>
> Yes, there is still data available on the LVs.
> I actually managed to grab some files from one of the LVs using
> foremost. But foremost is limited and creates it's own directory
> hierarchy with file names being changed.
>
> > What does
> > fsck -f -n /dev/fridge/storage
> >
> > show??
>
> # fsck -f -n /dev/fridge/storage
> fsck from util-linux 2.19.1
> e2fsck 1.42-WIP (02-Jul-2011)
> fsck.ext2: Superblock invalid, trying backup blocks...
> fsck.ext2: Bad magic number in super-block while trying to open
> /dev/mapper/fridge-storage
>
> The superblock could not be read or does not describe a correct ext2
> filesystem. If the device is valid and it really contains an ext2
> filesystem (and not swap or ufs or something else), then the superblock
> is corrupt, and you might try running e2fsck with an alternate superblock:
> e2fsck -b 8193
>
>

Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
xfs or something else?

NeilBrown


Re: Recovery of failed RAID 6 and LVM

am 26.09.2011 00:21:48 von lists

On 9/26/11 12:18 AM, NeilBrown wrote:

> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
> xfs or something else?

It was EXT4.


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 26.09.2011 11:31:30 von NeilBrown


On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:

> On 9/26/11 12:18 AM, NeilBrown wrote:
>
> > Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
> > xfs or something else?
>
> You're giving me some hope here and then silence :)
> Why did you ask about the file system? Should I run fsck on the LV ?
>
>
>

You already did run fsck on the LV. It basically said that it didn't
recognise the filesystem at all.
I asked in case maybe it was XFS in which case a different tool would be
required.
But you said it was EXT4, so the fsck.ext2 which you used should have worked
if anything would.

It is certainly odd that the LVM info is all consistent, but the filesystem
info has disappeared. It could be that you have the chunksize or device order
wrong and so it is looking for the filesystem info in the wrong place.

Nothing else I can suggest - sorry.

NeilBrown


Re: Recovery of failed RAID 6 and LVM

am 26.09.2011 12:53:32 von lists

On 9/26/11 11:31 AM, NeilBrown wrote:
> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
>
>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>
>>> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
>>> xfs or something else?
>>
>> You're giving me some hope here and then silence :)
>> Why did you ask about the file system? Should I run fsck on the LV ?
>>
>>
>>
>
> You already did run fsck on the LV. It basically said that it didn't
> recognise the filesystem at all.
> I asked in case maybe it was XFS in which case a different tool would be
> required.
> But you said it was EXT4, so the fsck.ext2 which you used should have worked
> if anything would.
>
> It is certainly odd that the LVM info is all consistent, but the filesystem
> info has disappear. It could be that you have the chunksize or device order
> wrong and so it is looking for the filesystem info at the wrong place.
>
> Nothing else I can suggest - sorry.

Would it be worth a shot to use parted, create an msdos label and then make
a partition with an ext file system on top of it and run fsck?



--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 26.09.2011 13:10:10 von NeilBrown


On Mon, 26 Sep 2011 12:53:32 +0200 "Marcin M. Jessa" wrote:

> On 9/26/11 11:31 AM, NeilBrown wrote:

> > Nothing else I can suggest - sorry.
>
> Would it be worth a shot to use parted, create msdos label and then make
> a partition with a ext file system on top of it and run fsck?
>

I cannot imagine how doing that would actually improve your situation at
all. It sounds like you are just corrupting things more. But maybe I don't
understand what you are trying to do.

I really wouldn't write anything to any device until you had found out where
the data you want is. However as I cannot suggest how to find the data (I
really think it is beyond repair - sorry) there is still nothing I can
suggest.

NeilBrown


Re: Recovery of failed RAID 6 and LVM

am 27.09.2011 21:12:58 von lists

On 9/26/11 11:31 AM, NeilBrown wrote:
> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
>
>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>
>>> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
>>> xfs or something else?
>>
>> You're giving me some hope here and then silence :)
>> Why did you ask about the file system? Should I run fsck on the LV ?
>>
>>
>>
>
> You already did run fsck on the LV. It basically said that it didn't
> recognise the filesystem at all.
> I asked in case maybe it was XFS in which case a different tool would be
> required.

Looks like I didn't remember correctly. I ran testdisk and it reported
the file system to be XFS. What would you suggest now, Neil?


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 01:13:35 von NeilBrown


On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa" wrote:

> On 9/26/11 11:31 AM, NeilBrown wrote:
> > On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
> >
> >> On 9/26/11 12:18 AM, NeilBrown wrote:
> >>
> >>> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
> >>> xfs or something else?
> >>
> >> You're giving me some hope here and then silence :)
> >> Why did you ask about the file system? Should I run fsck on the LV ?
> >>
> >>
> >>
> >
> > You already did run fsck on the LV. It basically said that it didn't
> > recognise the filesystem at all.
> > I asked in case maybe it was XFS in which case a different tool would be
> > required.
>
> Looks like I didn't remember correctly. I ran testdisk and it reported
> the file system to be XFS. What would you suggest now Neil?
>
>

Presumably
xfs_check /dev/fridge/storage

and then maybe
xfs_repair /dev/fridge/storage

but I have no experience with XFS - I'm just reading man pages.
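A cautious order, going by the options in the xfs_repair(8) man page, would
be a no-modify pass before any real repair (sketch only):

# xfs_check /dev/fridge/storage
# xfs_repair -n /dev/fridge/storage
# xfs_repair /dev/fridge/storage

where -n reports problems without changing anything, and the last step is
only run once the -n output looks sane.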

NeilBrown



Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 04:50:58 von Stan Hoeppner

On 9/27/2011 6:13 PM, NeilBrown wrote:
> On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa" wrote:
>
>> On 9/26/11 11:31 AM, NeilBrown wrote:
>>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
>>>
>>>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>>>
>>>>> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
>>>>> xfs or something else?
>>>>
>>>> You're giving me some hope here and then silence :)
>>>> Why did you ask about the file system? Should I run fsck on the LV ?
>>>>
>>>>
>>>>
>>>
>>> You already did run fsck on the LV. It basically said that it didn't
>>> recognise the filesystem at all.
>>> I asked in case maybe it was XFS in which case a different tool would be
>>> required.
>>
>> Looks like I didn't remember correctly. I ran testdisk and it reported
>> the file system to be XFS. What would you suggest now Neil?
>>
>>
>
> Presumably
> xfs_check /dev/fridge/storage
>
> and then maybe
> xfs_repair /dev/fridge/storage
>
> but I have no experience with XFS - I'm just reading man pages.
>
> NeilBrown

Reading the thread, and the many like it over the past months/years, may
yield a clue as to why you wish to move on to something other than Linux
RAID...

--
Stan



Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 09:10:26 von lists

On 9/28/11 4:50 AM, Stan Hoeppner wrote:

> Reading the thread, and the many like it over the past months/years, may
> yield a clue as to why you wish to move on to something other than Linux
> RAID...

:) I will give it another chance.
In case of failure FreeBSD and ZFS would be another option.


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 09:51:25 von David Brown

On 28/09/2011 09:10, Marcin M. Jessa wrote:
> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>
> :) I will give it another chance.
> In case of failure FreeBSD and ZFS would be another option.
>
>

Don't forget that in the face of 3 disk drives that suddenly decide to
play silly buggers, /no/ raid system will cope well. You are not having
a problem because of Linux software raid problems - your problem is due
to bad hardware. If you had a similar situation with a hardware raid
system, it is quite unlikely that you would have had any chance of
recovering your raid. What spoiled your chances of recovery here is the
unfortunate bad advice you found on a website - but that won't happen
again, since you now know to post here before trying anything!

The key lesson to take away from this experience is to set up a backup
solution /before/ disaster strikes :-)



Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 12:38:27 von Michal Soltys

W dniu 28.09.2011 04:50, Stan Hoeppner pisze:
>
> Reading the thread, and the many like it over the past months/years, may
> yield a clue as to why you wish to move on to something other than Linux
> RAID...
>

IMHO, in almost all cases misinformation is at the end of the chain.
This might be a bit bold, but a lot of users don't even spend a few
minutes doing elementary homework like man md/mdadm/mdadm.conf and less
/usr/src/linux/Documentation/md.txt - be it in normal usage, or when
problems happen. Rumors and forum wisdom can be really damaging - not to
look far away, how many people keep believing that xfs eats your data
and fills it with 0s?

It's hard to find cases when the md driver or mdadm was really at fault for
something. For the most part the typical route is: [bottom barrel cheap
desktop ]hardware/[terribly designed sata ]cable issues -> a user
applying random googled suggestions (with shaking hands) -> really,
really bad problems. But that's not md's failure.

I'd put lots of responsibility on [big] distros as well, which have been
trying (for many years already) to turn linux into layers of gui/script
wrapped (and often buggy) experience, trying to hide any and all
technical details at all cost. But that's OT ...


Be it flexibility, recoverability (with a cooled head and after panicking
while being /away/ from the drives and md) or performance (especially
after some small adjustments - namely stripe_cache_size for write
speeds), it's hard to challenge md. And some awesome features are coming
too (e.g. http://thread.gmane.org/gmane.linux.raid/34708 ).
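For illustration, that stripe cache tweak is a one-liner against sysfs (md0
and the value are examples; it costs roughly stripe_cache_size * 4 KiB of
RAM per member drive and does not persist across reboots):

# cat /sys/block/md0/md/stripe_cache_size
# echo 4096 > /sys/block/md0/md/stripe_cache_size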

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 15:20:36 von Brad Campbell

On 28/09/11 18:38, Michal Soltys wrote:

> It's hard to find cases, when md driver or mdadm was really at fault for
> something. For the most part the typical route is: [bottom barrel cheap
> desktop ]hardware/[terribly designed sata ]cable issues -> a user
> applying random googled suggestions (with shaking hands) -> really,
> really bad problems. But that's not md's failure.

This really sums it up succinctly.

If you watched the cases of disaster that swing past linux-raid, the
ones who always walk away whistling a happy tune are the ones who stop,
think and ask for help.

The basket cases are more often than not created by people trying stuff
out because they saw it mentioned somewhere else.

I'd suggest that users of real hardware raid suffer fewer "problems"
because, as they pay a bucketload of money for their raid card, they are
far less likely to cheap out on cables, enclosures, drives and power
supplies.

Most of the tales of woe here are related to the failures associated
with commodity hardware. The 8TB I lost was entirely due to using a $15
SATA controller.


Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 18:12:08 von Stan Hoeppner

On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>
> :) I will give it another chance.
> In case of failure FreeBSD and ZFS would be another option.

I was responding to Neil's exhaustion with mdadm. I was speculating
that help threads such as yours may be a contributing factor,
requesting/requiring Neil to become Superman many times per month to try
to save some OP's bacon.

--
Stan

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 18:30:02 von lists

On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>
>>> Reading the thread, and the many like it over the past months/years, may
>>> yield a clue as to why you wish to move on to something other than Linux
>>> RAID...
>>
>> :) I will give it another chance.
>> In case of failure FreeBSD and ZFS would be another option.
>
> I was responding to Neil's exhaustion with mdadm. I was speculating that
> help threads such as yours may be a contributing factor,
> requesting/requiring Neil to become Superman many times per month to try
> to save some OP's bacon.

That's what mailing lists are for. And more will come as long as there
is no documentation on how to save your behind in case of failures like
that. Or if the docs with examples available online are utterly useless.


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 18:31:30 von Stan Hoeppner

On 9/28/2011 5:38 AM, Michal Soltys wrote:
> W dniu 28.09.2011 04:50, Stan Hoeppner pisze:
>>
>> Reading the thread, and the many like it over the past months/years, may
>> yield a clue as to why you wish to move on to something other than Linux
>> RAID...
>>
>
> IMHO, in almost all cases - at the end of the chain - is misinformation.
> While this might be a bit bold - a lot of users don't even spend a few
> minutes doing elementary homework like man md/mdadm/mdadm.conf and less
> /usr/src/linux/Documentation/md.txt. Be it normal usage, or when
> problems happen. Rumors and forum wisdom can be really damaging - not to
> look far away - how many people keep believing that xfs eats your data
> and fills it with 0s ?

Two people have responded to my comment above. Neither read it in the
proper context. I was responding to Neil. Neil wants to quit Linux
RAID. I simply alluded to the possibility that 'desperate, please save my bacon!' help
threads such as the current one may be a factor.

--
Stan

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 18:37:20 von lists

On 9/28/11 6:31 PM, Stan Hoeppner wrote:

> Two people have responded to my comment above. Neither read it in the
> proper context. I was responding to Neil. Neil wants to quit Linux RAID.
> I simply eluded that 'desperate, please save my bacon!' help threads
> such as the current one may be a factor.

Right, so you're saying we're betting on a soon to be dead horse?


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 20:56:10 von Thomas Fjellstrom

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> > On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> >> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >>> Reading the thread, and the many like it over the past months/years,
> >>> may yield a clue as to why you wish to move on to something other than
> >>> Linux RAID...
> >>>
> >> :) I will give it another chance.
> >>
> >> In case of failure FreeBSD and ZFS would be another option.
> >
> > I was responding to Neil's exhaustion with mdadm. I was speculating that
> > help threads such as yours may be a contributing factor,
> > requesting/requiring Neil to become Superman many times per month to try
> > to save some OP's bacon.
>
> That's what mailing lists are for. And more will come as long as there
> is no documentation on how to save your behind in case of failures like
> that. Or if the docs with examples available online are utterly useless.

I think that those of us that have been helped on the list should think about
contributing some wiki docs, if there's one we can edit.

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:02:26 von Thomas Fjellstrom

On September 28, 2011, Brad Campbell wrote:
> On 28/09/11 18:38, Michal Soltys wrote:
> > It's hard to find cases, when md driver or mdadm was really at fault for
> > something. For the most part the typical route is: [bottom barrel cheap
> > desktop ]hardware/[terribly designed sata ]cable issues -> a user
> > applying random googled suggestions (with shaking hands) -> really,
> > really bad problems. But that's not md's failure.
>
> This really sums it up succinctly.
>
> If you watched the cases of disaster that swing past linux-raid, the
> ones who always walk away whistling a happy tune are the ones who stop,
> think and ask for help.
>
> The basket cases are more often than not created by people trying stuff
> out because they saw it mentioned somewhere else.
>
> I'd suggest that users of real hardware raid suffer less "problems"
> because as they pay a bucketload of money for their raid card, they are
> far less likely to cheap out on cables, enclosures, drives and power
> supplies.
>
> Most of the tales of woe here are related to the failures associated
> with commodity hardware. The 8TB I lost was entirely due to using a $15
> SATA controller.
>

I completely agree. The last time I lost my array, it was because I fat-fingered
an mdadm command. Can't remember exactly what it was now, either a
reshape or a drive replacement. Now I try to be very, very careful.

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:03:52 von Thomas Fjellstrom

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
> > Two people have responded to my comment above. Neither read it in the
> > proper context. I was responding to Neil. Neil wants to quit Linux RAID.
> > I simply eluded that 'desperate, please save my bacon!' help threads
> > such as the current one may be a factor.
>
> Right, so you're saying we're betting on a soon to be dead horse?

Oh heck no. There's no way md would die if Neil left. And I doubt he'd leave
it without a maintainer. At least not under normal circumstances.

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:26:23 von lists

On 9/28/11 8:56 PM, Thomas Fjellstrom wrote:
> On September 28, 2011, Marcin M. Jessa wrote:
>> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
>>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>>>> Reading the thread, and the many like it over the past months/years,
>>>>> may yield a clue as to why you wish to move on to something other than
>>>>> Linux RAID...
>>>>>
>>>> :) I will give it another chance.
>>>>
>>>> In case of failure FreeBSD and ZFS would be another option.
>>>
>>> I was responding to Neil's exhaustion with mdadm. I was speculating that
>>> help threads such as yours may be a contributing factor,
>>> requesting/requiring Neil to become Superman many times per month to try
>>> to save some OP's bacon.
>>
>> That's what mailing lists are for. And more will come as long as there
>> is no documentation on how to save your behind in case of failures like
>> that. Or if the docs with examples available online are utterly useless.
>
> I think that those of us that have been helped on the list should think about
> contributing some wiki docs, if there's one we can edit.


I have a site, ezunix.org (a bit crippled since the crash) where I
document anything I come across that can be useful.
But after all the messages I still don't know what to do if you lose 3
drives in a 5 drive RAID6 setup ;)
I was told I was doing it wrong but never how to do it right.
And that's the case of all the mailing lists I came across before I
found that wikipedia site with incorrect instructions.



--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:29:30 von lists

On 9/28/11 9:03 PM, Thomas Fjellstrom wrote:
> On September 28, 2011, Marcin M. Jessa wrote:
>> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
>>> Two people have responded to my comment above. Neither read it in the
>>> proper context. I was responding to Neil. Neil wants to quit Linux RAID.
>>> I simply eluded that 'desperate, please save my bacon!' help threads
>>> such as the current one may be a factor.
>>
>> Right, so you're saying we're betting on a soon to be dead horse?
>
> Oh heck no. There's no way md would die if Neil left. And I doubt he'd leave
> it without a maintainer. At least not under normal circumstances.

Let's hope not. I saw the planned changes for Linux 3.1 and they look great.


--

Marcin M. Jessa

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:42:25 von Thomas Fjellstrom

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 8:56 PM, Thomas Fjellstrom wrote:
> > On September 28, 2011, Marcin M. Jessa wrote:
> >> On 9/28/11 6:12 PM, Stan Hoeppner wrote:
> >>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> >>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >>>>> Reading the thread, and the many like it over the past months/years,
> >>>>> may yield a clue as to why you wish to move on to something other
> >>>>> than Linux RAID...
> >>>>>
> >>>> :) I will give it another chance.
> >>>>
> >>>> In case of failure FreeBSD and ZFS would be another option.
> >>>
> >>> I was responding to Neil's exhaustion with mdadm. I was speculating
> >>> that help threads such as yours may be a contributing factor,
> >>> requesting/requiring Neil to become Superman many times per month to
> >>> try to save some OP's bacon.
> >>
> >> That's what mailing lists are for. And more will come as long as there
> >> is no documentation on how to save your behind in case of failures like
> >> that. Or if the docs with examples available online are utterly useless.
> >
> > I think that those of us that have been helped on the list should think
> > about contributing some wiki docs, if there's one we can edit.
>
> I have a site, ezunix.org (a bit crippled since the crash) where I
> document anything I come across that can be useful.
> But after all the messages I still don't know what to do if you lose 3
> drives in a 5 drive RAID6 setup ;)
> I was told I was doing it wrong but never how to do it right.
> And that's the case of all the mailing lists I came across before I
> found that wikipedia site with incorrect instructions.

I think they did mention how to do it right. Something like: mdadm --assemble
--force

since 3 drives at once likely means the drives are fine. I recently lost ALL of
the drives in my 7 drive raid5 array. First one was kicked, then the rest fell
offline at the same time. Because the 6 drives all fell offline at the same time,
their metadata all agreed on the current state of the array, so nothing other
than some data that was stuck in the page cache was gone. In my case, after I
ran: `mdadm --assemble --verbose /dev/md1 /dev/sd[fhijedg]` only one drive
was left out, which was the first drive to go. Then I ran: `mdadm --re-add
/dev/md1 /dev/sdi` to add that drive back, and since I use the very nice
bitmap feature, it only took my array a few minutes to resync.
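For reference, the write-intent bitmap mentioned above can be added to (or
removed from) an existing array with a single command - array name
illustrative:

# mdadm --grow --bitmap=internal /dev/md1
# mdadm --grow --bitmap=none /dev/md1

With the bitmap in place, a re-added member only resyncs the regions that
changed while it was out, rather than the whole device.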

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Recovery of failed RAID 6 and LVM

am 28.09.2011 21:43:01 von Thomas Fjellstrom

On September 28, 2011, Marcin M. Jessa wrote:
> On 9/28/11 9:03 PM, Thomas Fjellstrom wrote:
> > On September 28, 2011, Marcin M. Jessa wrote:
> >> On 9/28/11 6:31 PM, Stan Hoeppner wrote:
> >>> Two people have responded to my comment above. Neither read it in the
> >>> proper context. I was responding to Neil. Neil wants to quit Linux
> >>> RAID. I simply eluded that 'desperate, please save my bacon!' help
> >>> threads such as the current one may be a factor.
> >>
> >> Right, so you're saying we're betting on a soon to be dead horse?
> >
> > Oh heck no. There's no way md would die if Neil left. And I doubt he'd
> > leave it without a maintainer. At least not under normal circumstances.
>
> Let's hope not. I saw the planned changes for Linux 3.1 and they look
> great.

I believe he said it wasn't going to happen any time soon. Just that he was
thinking about retiring eventually.

--
Thomas Fjellstrom
thomas@fjellstrom.ca

Re: Recovery of failed RAID 6 and LVM

am 29.09.2011 01:49:10 von NeilBrown


On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner wrote:

> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
> > On 9/28/11 4:50 AM, Stan Hoeppner wrote:
> >
> >> Reading the thread, and the many like it over the past months/years, may
> >> yield a clue as to why you wish to move on to something other than Linux
> >> RAID...
> >
> > :) I will give it another chance.
> > In case of failure FreeBSD and ZFS would be another option.
>
> I was responding to Neil's exhaustion with mdadm. I was speculating
> that help threads such as yours may be a contributing factor,
> requesting/requiring Neil to become Superman many times per month to try
> to save some OP's bacon.
>

No, I don't really think they are a factor - though thanks for thinking
about it.

Obviously not all "help threads" end with a good result but quite a few do
and one has to take the rough with the smooth.
And each help thread is a potential learning experience. If I see patterns
of failure recurring it will guide and motivate me to improve md or mdadm to
make that failure mode less likely.

I think it is simply that it isn't new any more. I first started
contributing to md early in 2000, and 11 years is a long time. Not as long
as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
md so there is more room to be interested.
There have been many highlights over that time, but the ones that stick in my
memory are when others have contributed in significant ways. I really value
that, whether it is code, or review or documentation, or making a wiki or
answering mailing list questions before I do, or even putting extra time in
to reproduce a bug so we can drill down to the cause.

I figure that appearing competent, capable and in control isn't going to
attract new blood - new blood wants wide open frontiers with lots of
opportunity (I started in md when it was essentially unmaintained - I know
the attraction). So I just want to say that there is certainly room and
opportunity over here.

I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
and would aim to provide the same mix of independence and oversight as Linus
does.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 29.09.2011 11:03:09 von David Brown

On 29/09/2011 01:49, NeilBrown wrote:
> On Wed, 28 Sep 2011 11:12:08 -0500 Stan Hoeppner
> wrote:
>
>> On 9/28/2011 2:10 AM, Marcin M. Jessa wrote:
>>> On 9/28/11 4:50 AM, Stan Hoeppner wrote:
>>>
>>>> Reading the thread, and the many like it over the past months/years, may
>>>> yield a clue as to why you wish to move on to something other than Linux
>>>> RAID...
>>>
>>> :) I will give it another chance.
>>> In case of failure FreeBSD and ZFS would be another option.
>>
>> I was responding to Neil's exhaustion with mdadm. I was speculating
>> that help threads such as yours may be a contributing factor,
>> requesting/requiring Neil to become Superman many times per month to try
>> to save some OP's bacon.
>>
>
> No, I don't really think they are a factor - though thanks for thinking
> about it.
>
> Obviously not all "help threads" end with a good result but quite a few do
> and one has to take the rough with the smooth.
> And each help thread is a potential learning experience. If I see patterns
> of failure recurring it will guide and motivate me to improve md or mdadm to
> make that failure mode less likely.
>
> I think it is simply that it isn't new any more. I first started
> contributing to md early in 2000, and 11 years is a long time. Not as long
> as Mr Torvalds has worked on Linux of course, but Linux is a lot bigger than
> md so there is more room to be interested.
> There have been many highlights over that time, but the ones that stick in my
> memory are when others have contributed in significant ways. I really value
> that, whether it is code, or review or documentation, or making a wiki or
> answering mailing list questions before I do, or even putting extra time in
> to reproduce a bug so we can drill down to the cause.
>
> I figure that appearing competent, capable and in control isn't going to
> attract new blood - new blood wants wide open frontiers with lots of
> opportunity (I started in md when it was essentially unmaintained - I know
> the attraction). So I just want to say that there is certainly room and
> opportunity over here.
>
> I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> and would aim to provide the same mix of independence and oversight as Linus
> does.
>
> NeilBrown

One challenge for getting apprentices in this particular area is the
hardware cost. For someone to be able to really help you out in
development and serious testing of md, they are going to need access to
a machine with plenty of disks, preferably with hotplug bays and with a
mix of HDDs and SSDs. That's going to be out of reach for many potential
assistants - it is hard enough finding someone with the required talent,
interest and time to spend on md/mdadm. Finding people with money -
especially people using md raid professionally - should be a lot easier.
So if there is anyone out there who is willing and able to contribute
seriously to md/mdadm, but is hindered by lack of hardware, then I for
one would be willing to contribute to a fund to help out.



--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 29.09.2011 17:21:50 von Stan Hoeppner

On 9/29/2011 4:03 AM, David Brown wrote:

> One challenge for getting apprentices in this particular area is the
> hardware cost. For someone to be able to really help you out in
> development and serious testing of md, they are going to need access to
> a machine with plenty of disks, preferably with hotplug bays and with a
> mix of HDDs and SSDs. That's going to be out of reach for many potential
> assistants - it is hard enough finding someone with the required talent,
> interest and time to spend on md/mdadm. Finding people with money -
> especially people using md raid professionally - should be a lot easier.
> So if there is anyone out there who is willing and able to contribute
> seriously to md/mdadm, but is hindered by lack of hardware, then I for
> one would be willing to contribute to a fund to help out.

Almost a dozen different people from Intel have contributed code
recently. I would think such folks wouldn't have any problem getting
access to all the hardware they could need given Intel's financial
resources. Seems like a good recruiting pool.

--
Stan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 29.09.2011 19:14:23 von David Brown

On 29/09/11 17:21, Stan Hoeppner wrote:
> On 9/29/2011 4:03 AM, David Brown wrote:
>
>> One challenge for getting apprentices in this particular area is the
>> hardware cost. For someone to be able to really help you out in
>> development and serious testing of md, they are going to need access to
>> a machine with plenty of disks, preferably with hotplug bays and with a
>> mix of HDDs and SSDs. That's going to be out of reach for many potential
>> assistants - it is hard enough finding someone with the required talent,
>> interest and time to spend on md/mdadm. Finding people with money -
>> especially people using md raid professionally - should be a lot easier.
>> So if there is anyone out there who is willing and able to contribute
>> seriously to md/mdadm, but is hindered by lack of hardware, then I for
>> one would be willing to contribute to a fund to help out.
>
> Almost a dozen different people from Intel have contributed code
> recently. I would think such folks wouldn't have any problem getting
> access to all the hardware they could need given Intel's financial
> resources. Seems like a good recruiting pool.
>

I am sure you are right that these people should have access to plenty
of hardware - in particular, they should be in an ideal position to help
with testing/tuning for SSD usage. But they may not have the time to
help much, unless Intel is happy to pay them to do so. People who have
lots of time - say, a new graduate or someone "between jobs" - often
don't have the hardware. All I am saying is that /if/ there are such
people around, and they are serious about working on md, then I think it
should be possible to raise a little money to help them help us. (The
same applies to existing md developers, of course.)

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 29.09.2011 20:28:36 von dan.j.williams

On Wed, Sep 28, 2011 at 4:49 PM, NeilBrown wrote:
> [...]
>
> I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> and would aim to provide the same mix of independence and oversight as Linus
> does.
>

What if as a starting point we could get a Patchwork queue hosted
somewhere so you could at least start formally delegating incoming
patches for an apprentice to disposition?

The hardest part about maintenance is taste, and md has been thriving
on good-taste pragmatic decisions for a while now.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 30.09.2011 01:07:20 von NeilBrown

On Thu, 29 Sep 2011 11:28:36 -0700 Dan Williams wrote:

> On Wed, Sep 28, 2011 at 4:49 PM, NeilBrown wrote:
> > [...]
> >
> > I'm not about to drop md, but I would love an apprentice or two (or 3 or 4)
> > and would aim to provide the same mix of independence and oversight as Linus
> > does.
> >
>
> What if as a starting point we could get a Patchwork queue hosted
> somewhere so you could at least start formally delegating incoming
> patches for an apprentice to disposition?

I don't know much about Patchwork ... what sort of value does it add?

But I don't think much of the idea of delegation. I don't see a thriving
developer community full of people who want work delegated to them. Rather
I see a thriving developer community of people who see problems and want to
fix them and dive in and do stuff.
An apprentice who needs to have stuff delegated to them will always be an
apprentice. A master starts by doing the things their master doesn't want to
do, then moves to the things the master didn't think to do and finally
blossoms by doing the things their master didn't know how to do.

>
> The hardest part about maintenance is taste, and md has been thriving
> on good-taste pragmatic decisions for a while now.

Taste is learnt by practice. Having someone correct - or at least
highlight - your mistakes is important, but making the mistakes in the first
place is vital.


I think the starting point is simply to do. Read the code, ask a question,
suggest a design, send a patch, pick a task off the road-map (or make one up
yourself) and start work on it.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 30.09.2011 02:18:45 von dan.j.williams

On Thu, Sep 29, 2011 at 4:07 PM, NeilBrown wrote:
>> What if as a starting point we could get a Patchwork queue hosted
>> somewhere so you could at least start formally delegating incoming
>> patches for an apprentice to disposition?
>
> I don't know much about Patchwork ... what sort of value does it add?

It just makes things more transparent. It allows a submitter to have
a web interface to view the state of a patch: accepted, rejected,
under review. Allows a maintainer or a group of maintainers to see
the backlog and assign (delegate) patches between them. It also
automates the collection of Acked-by, Reviewed-by, etc tags.
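
As a rough illustration (command names and flags are from memory of the
Patchwork documentation, so treat them as assumptions), the pwclient tool that
ships with Patchwork would let a delegate work the queue something like:

  pwclient list -p linux-raid -s New     # show the pending backlog for a project
  pwclient update -s Accepted 12345      # record that patch 12345 was merged

The project name and patch ID above are made up.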

> But I don't think much of the idea of delegation. I don't see a thriving
> developer community full of people who want work delegated to them.

So I only meant "delegate" in the Patchwork parlance to make "who is
merging this md/mdadm patch" clear as the apprentice ramps up. But
this is probably too much mechanics.

It simply sounds like you want a situation similar to what happened
with git, i.e. a "Junio" to take over but you'll still be around to
course-correct and send patches.

--
Dan
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 30.09.2011 22:01:40 von lists

On 9/28/11 1:13 AM, NeilBrown wrote:
> On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa" wrote:
>
>> On 9/26/11 11:31 AM, NeilBrown wrote:
>>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
>>>
>>>> On 9/26/11 12:18 AM, NeilBrown wrote:
>>>>
>>>>> Do you remember what filesystem you had on 'storage'? Was it ext3 or ext4 or
>>>>> xfs or something else?
>>>>
>>>> You're giving me some hope here and then silence :)
>>>> Why did you ask about the file system? Should I run fsck on the LV ?
>>>>
>>>>
>>>>
>>>
>>> You already did run fsck on the LV. It basically said that it didn't
>>> recognise the filesystem at all.
>>> I asked in case maybe it was XFS in which case a different tool would be
>>> required.
>>
>> Looks like I didn't remember correctly. I ran testdisk and it reported
>> the file system to be XFS. What would you suggest now Neil?
>>
>>
>
> Presumably
> xfs_check /dev/fridge/storage
>
> and then maybe
> xfs_repair /dev/fridge/storage
>
> but I have no experience with XFS - I'm just reading man pages.
>

That didn't work, so I decided to give photorec a spin and so far it has
found and recovered lots of files [1].
Why the heck is photorec able to do that when the normal file system repair
tools are just useless?


[1]:
Disk /dev/dm-0 - 5368 GB / 5000 GiB (RO)
Partition Start End Size in sectors
No partition 0 10485759999 10485760000 [Whole disk]


Pass 2 - Reading sector 1506591634/10485760000, 843 files found
Elapsed time 10h16m59s - Estimated time for achievement 61h17m10
txt: 445 recovered
exe: 198 recovered
mpg: 62 recovered
swf: 33 recovered
tx?: 33 recovered
gif: 25 recovered
gpg: 19 recovered
bmp: 5 recovered
gz: 4 recovered
riff: 4 recovered
others: 15 recovered
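
For the record, roughly the commands involved here, as a hedged sketch
(photorec is menu-driven, so the flags below only pre-set the log and the
output directory, as far as I recall the testdisk docs; the recovery directory
is illustrative and should live on a different disk):

  # XFS attempts: check, then a no-modify dry run, then an actual repair.
  xfs_check /dev/fridge/storage
  xfs_repair -n /dev/fridge/storage    # report problems without writing anything
  xfs_repair /dev/fridge/storage

  # File carving with photorec, writing whatever it finds to another disk.
  photorec /log /d /mnt/rescue/recup_dir /dev/fridge/storage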



--

Marcin M. Jessa
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 30.09.2011 23:47:06 von Thomas Fjellstrom

On September 30, 2011, Marcin M. Jessa wrote:
> On 9/28/11 1:13 AM, NeilBrown wrote:
> On Tue, 27 Sep 2011 21:12:58 +0200 "Marcin M. Jessa" wrote:
> >> On 9/26/11 11:31 AM, NeilBrown wrote:
> >>> On Mon, 26 Sep 2011 11:05:38 +0200 "Marcin M. Jessa" wrote:
> >>>> On 9/26/11 12:18 AM, NeilBrown wrote:
> >>>>> Do you remember what filesystem you had on 'storage'? Was it ext3 or
> >>>>> ext4 or xfs or something else?
> >>>>
> >>>> You're giving me some hope here and then silence :)
> >>>> Why did you ask about the file system? Should I run fsck on the LV ?
> >>>
> >>> You already did run fsck on the LV. It basically said that it didn't
> >>> recognise the filesystem at all.
> >>> I asked in case maybe it was XFS in which case a different tool would
> >>> be required.
> >>
> >> Looks like I didn't remember correctly. I ran testdisk and it reported
> >> the file system to be XFS. What would you suggest now Neil?
> >
> > Presumably
> >
> > xfs_check /dev/fridge/storage
> >
> > and then maybe
> >
> > xfs_repair /dev/fridge/storage
> >
> > but I have no experience with XFS - I'm just reading man pages.
>
> That didn't work, so I decided to give photorec a spin and so far it has
> found and recovered lots of files [1].
> Why the heck is photorec able to do that when the normal file system repair
> tools are just useless?


The file system metadata was trashed, and photorec looks at the data only,
looking for file headers and trying to pull out contiguous files.
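
A toy illustration of that kind of signature scan (assumes GNU grep with PCRE
support, only samples the first 64 MiB, and uses the JPEG start-of-image marker
as an example; photorec does the same sort of thing for many file types across
the whole device):

  # Byte offsets of candidate JPEG headers (ff d8 ff) within the sampled window.
  dd if=/dev/fridge/storage bs=1M count=64 2>/dev/null \
    | grep -obUaP '\xff\xd8\xff' | head -n 3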

>
> [1]:
> Disk /dev/dm-0 - 5368 GB / 5000 GiB (RO)
> Partition Start End Size in sectors
> No partition 0 10485759999 10485760000 [Whole disk]
>
>
> Pass 2 - Reading sector 1506591634/10485760000, 843 files found
> Elapsed time 10h16m59s - Estimated time for achievement 61h17m10
> txt: 445 recovered
> exe: 198 recovered
> mpg: 62 recovered
> swf: 33 recovered
> tx?: 33 recovered
> gif: 25 recovered
> gpg: 19 recovered
> bmp: 5 recovered
> gz: 4 recovered
> riff: 4 recovered
> others: 15 recovered


--
Thomas Fjellstrom
thomas@fjellstrom.ca
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 01.10.2011 00:30:53 von lists

On 9/30/11 11:47 PM, Thomas Fjellstrom wrote:

> The file system metadata was trashed, and photorec looks at the data only,
> looking for file headers and trying to pull out contiguous files.

I'd pay for a tool if only it could restore directories and file names
as well...

--

Marcin M. Jessa
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Recovery of failed RAID 6 and LVM

am 05.10.2011 04:15:30 von NeilBrown

On Thu, 29 Sep 2011 17:18:45 -0700 "Williams, Dan J" wrote:


> It simply sounds like you want a situation similar to what happened
> with git, i.e. a "Junio" to take over but you'll still be around to
> course-correct and send patches.

That isn't necessary in the first instance. It could possibly (hopefully)
reach that stage, but taking over as maintainer is a big ask and not
something to be expected in a hurry.
I'm happy to start small.

NeilBrown

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html