Crash during raid6 reshape, now cannot restart?

On 10.12.2010 18:05:47 by Phil Genera

I had a power failure during a large raid6 reshape (6->8 disks) on one
of my arm systems last night, and can't seem to get it going again.

I did this:
# mdadm --grow --backup-file=./backup.mdadm --array-size=8 /dev/md0

which (I've now noticed) didn't seem to write a backup file. There was
a read error during the reshape, but it claimed recovery:
Dec 9 20:48:07 love kernel: sd 2:0:0:0: [sda] Unhandled sense code
Dec 9 20:48:07 love kernel: sd 2:0:0:0: [sda] Result: hostbyte=DID_OK driverbyte=DRIVER_SENSE
Dec 9 20:48:07 love kernel: sd 2:0:0:0: [sda] Sense Key : Medium Error [current]
Dec 9 20:48:07 love kernel: sd 2:0:0:0: [sda] Add. Sense: Unrecovered read error
Dec 9 20:48:07 love kernel: sd 2:0:0:0: [sda] CDB: Read(10): 28 00 00 02 09 60 00 00 20 00
Dec 9 20:48:07 love kernel: end_request: I/O error, dev sda, sector 133472
Dec 9 20:48:08 love kernel: raid5:md0: read error corrected (8 sectors at 133472 on sda)
Dec 9 20:48:08 love kernel: raid5:md0: read error corrected (8 sectors at 133480 on sda)
Dec 9 20:48:08 love kernel: raid5:md0: read error corrected (8 sectors at 133488 on sda)
Dec 9 20:48:08 love kernel: raid5:md0: read error corrected (8 sectors at 133496 on sda)

Some time during the night, the electricity went away, and on reboot I get this:

raid5: reshape_position too early for auto-recovery - aborting.

as well as when I try to assemble the array manually. There's nothing
critical I don't have backed up, but there's a lot of TV on there I
was planning to watch :).

Any good ideas? I'd sure appreciate some help. I'm guessing this is
just a crash in the critical section, and without a backup file I'm
screwed. I'm surprised the backup file is still needed 200gb into the
reshape though. Thanks!


Versions & status:

# cat /proc/mdstat
Personalities : [raid1] [raid6] [raid5] [raid4]
md0 : inactive sdg[0] sdj[7] sdi[6] sdf[5] sde[4] sdd[3] sdc[2] sdh[1]
      3125690368 blocks super 0.91

# uname -a
Linux love 2.6.32-5-kirkwood #1 Sun Oct 31 11:19:32 UTC 2010 armv5tel GNU/Linux
# mdadm --version
mdadm - v3.1.4 - 31st August 2010


More details (and --examine of all disks attached):

# mdadm --detail /dev/md0
/dev/md0:
        Version : 0.91
  Creation Time : Fri Oct  9 09:32:08 2009
     Raid Level : raid6
  Used Dev Size : 390711296 (372.61 GiB 400.09 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Dec 10 05:52:35 2010
          State : active, Not Started
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0

         Layout : left-symmetric
     Chunk Size : 64K

  Delta Devices : 2, (6->8)

           UUID : 81ddccd8:5abf5b03:181548d9:47e92625
         Events : 0.1048248

    Number   Major   Minor   RaidDevice State
       0       8       96        0      active sync   /dev/sdg
       1       8      112        1      active sync   /dev/sdh
       2       8       32        2      active sync   /dev/sdc
       3       8       48        3      active sync   /dev/sdd
       4       8       64        4      active sync   /dev/sde
       5       8       80        5      active sync   /dev/sdf
       6       8      128        6      active sync   /dev/sdi
       7       8      144        7      active sync   /dev/sdj

--
Phil

[Attachment: examine.txt (mdadm --examine of all disks). Fields that are identical on all eight superblocks are shown once for /dev/sdc; for the other disks only the Checksum and "this" lines differ.]

/dev/sdc:
          Magic : a92b4efc
        Version : 0.91.00
           UUID : 81ddccd8:5abf5b03:181548d9:47e92625
  Creation Time : Fri Oct  9 09:32:08 2009
     Raid Level : raid6
  Used Dev Size : 390711296 (372.61 GiB 400.09 GB)
     Array Size : 2344267776 (2235.67 GiB 2400.53 GB)
   Raid Devices : 8
  Total Devices : 8
Preferred Minor : 0

  Reshape pos'n : 306680832 (292.47 GiB 314.04 GB)
  Delta Devices : 2 (6->8)

    Update Time : Fri Dec 10 05:52:35 2010
          State : clean
 Active Devices : 8
Working Devices : 8
 Failed Devices : 0
  Spare Devices : 0
       Checksum : b9936c7a - correct
         Events : 1048248

         Layout : left-symmetric
     Chunk Size : 64K

      Number   Major   Minor   RaidDevice State
this     2       8        0        2      active sync   /dev/sda

   0     0       8       64        0      active sync   /dev/sde
   1     1       8       80        1      active sync   /dev/sdf
   2     2       8        0        2      active sync   /dev/sda
   3     3       8       16        3      active sync   /dev/sdb
   4     4       8       32        4      active sync   /dev/sdc
   5     5       8       48        5      active sync   /dev/sdd
   6     6       8       96        6      active sync   /dev/sdg
   7     7       8      112        7      active sync   /dev/sdh

/dev/sdd:
       Checksum : b9936c8c - correct
this     3       8       16        3      active sync   /dev/sdb
/dev/sde:
       Checksum : b9936c9e - correct
this     4       8       32        4      active sync   /dev/sdc
/dev/sdf:
       Checksum : b9936cb0 - correct
this     5       8       48        5      active sync   /dev/sdd
/dev/sdg:
       Checksum : b9936cb6 - correct
this     0       8       64        0      active sync   /dev/sde
/dev/sdh:
       Checksum : b9936cc8 - correct
this     1       8       80        1      active sync   /dev/sdf
/dev/sdi:
       Checksum : b9936ce2 - correct
this     6       8       96        6      active sync   /dev/sdg
/dev/sdj:
       Checksum : b9936cf4 - correct
this     7       8      112        7      active sync   /dev/sdh

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Crash during raid6 reshape, now cannot restart?

On 10.12.2010 21:43:05 by NeilBrown

On Fri, 10 Dec 2010 09:05:47 -0800 Phil Genera wrote:

> raid5: reshape_position too early for auto-recovery - aborting.

Something must be going wrong with the math in raid5:

        if (mddev->delta_disks < 0
            ? (here_new * mddev->new_chunk_sectors <=
               here_old * mddev->chunk_sectors)
            : (here_new * mddev->new_chunk_sectors >=
               here_old * mddev->chunk_sectors)) {
                /* Reading from the same stripe as writing to - bad */
                printk(KERN_ERR "raid5: reshape_position too early for "
                       "auto-recovery - aborting.\n");
                return -EINVAL;
        }

There, 'here_new * new_chunk_sectors' must be overflowing. So the size of the
array must only just fit into sector_t.
On an ARMv5 system you would need to have CONFIG_LBD set - do you know if it is?

I guess I need to make that code more robust when sector_t doesn't have lots
more bits than the size of the device...

If you can compile your own kernel, you should be able to get it to work
easily. If not ... complain to whoever provided you with a kernel.

NeilBrown



Re: Crash during raid6 reshape, now cannot restart?

On 10.12.2010 23:02:53 by NeilBrown

On Sat, 11 Dec 2010 07:43:05 +1100 Neil Brown wrote:

> >
> > raid5: reshape_position too early for auto-recovery - aborting.
>
> Something must be going wrong with the math in raid5:
>
>         if (mddev->delta_disks < 0
>             ? (here_new * mddev->new_chunk_sectors <=
>                here_old * mddev->chunk_sectors)
>             : (here_new * mddev->new_chunk_sectors >=
>                here_old * mddev->chunk_sectors)) {
>                 /* Reading from the same stripe as writing to - bad */
>                 printk(KERN_ERR "raid5: reshape_position too early for "
>                        "auto-recovery - aborting.\n");
>                 return -EINVAL;
>         }
>
> There, 'here_new * new_chunk_sectors' must be overflowing. So the size of the
> array must only just fit into sector_t.
> On an ARMv5 system you would need to have CONFIG_LBD set - do you know if it is?
>
> I guess I need to make that code more robust when sector_t doesn't have lots
> more bits than the size of the device...
>
> If you can compile your own kernel, you should be able to get it to work
> easily. If not ... complain to whoever provided you with a kernel.
>

No ... I take that back. here_new is the result of dividing the
reshape_position by chunk_sectors times the number of data disks.
So multiplying by chunk_sectors again is not going to cause an overflow.

So I have no idea what is going on here.... maybe a compiler bug?

If you compile your own kernel, I would put some printk's in
drivers/md/raid5.c just before the above code to see what the values of the
variables are, and to see what the results of the multiplications will be.

NeilBrown


Re: Crash during raid6 reshape, now cannot restart?

On 10.12.2010 23:11:50 by Phil Genera

On Fri, Dec 10, 2010 at 14:02, Neil Brown wrote:
> On Sat, 11 Dec 2010 07:43:05 +1100 Neil Brown wrote:
>
>> >
>> > raid5: reshape_position too early for auto-recovery - aborting.
>>
>> Something must be going wrong with the math in raid5:
>>
>>         if (mddev->delta_disks < 0
>>             ? (here_new * mddev->new_chunk_sectors <=
>>                here_old * mddev->chunk_sectors)
>>             : (here_new * mddev->new_chunk_sectors >=
>>                here_old * mddev->chunk_sectors)) {
>>                 /* Reading from the same stripe as writing to - bad */
>>                 printk(KERN_ERR "raid5: reshape_position too early for "
>>                        "auto-recovery - aborting.\n");
>>                 return -EINVAL;
>>         }
>>
>> There, 'here_new * new_chunk_sectors' must be overflowing. So the size of the
>> array must only just fit into sector_t.
>> On an ARMv5 system you would need to have CONFIG_LBD set - do you know if it is?

Looking at what I think is the relevant kirkwood kernel config, I
only see CONFIG_LBDAF here:
http://merkel.debian.org/~jurij/

which, on further investigation, is the new name of CONFIG_LBD. So I
believe it is set.

>> I guess I need to make that code more robust when sector_t doesn't have lots
>> more bits than the size of the device...
>>
>> If you can compile your own kernel, you should be able to get it to work
>> easily. If not ... complain to whoever provided you with a kernel.

That'd be the nice folks working on debian squeeze.

> No ... I take that back. here_new is the result of dividing the
> reshape_position by chunk_sectors times the number of data disks.
> So multiplying by chunk_sectors again is not going to cause an overflow.
>
> So I have no idea what is going on here.... maybe a compiler bug?
>
> If you compile your own kernel, I would put some printk's in
> drivers/md/raid5.c just before the above code to see what the values of the
> variables are, and to see what the results of the multiplications will be.

I'm happy to try my own kernel, but first I think I'll try pulling the
disks and putting them on an x86 machine to see if it's actually
platform-related. If that works, I can keep the broken array around
for a bit to troubleshoot on. Thanks!

--
Phil

Re: Crash during raid6 reshape, now cannot restart?

On 12.12.2010 04:12:37 by Phil Genera

On Sat, Dec 11, 2010 at 11:06, Phil Genera wrote:
> On Fri, Dec 10, 2010 at 14:11, Phil Genera wrote:
>> I'm happy to try my own kernel, but first I think I'll try pulling the
>> disks and putting them on an x86 machine to see if it's actually
>> platform-related. If that works, I can keep the broken array around
>> for a bit to troubleshoot on. Thanks!
>
> Gave it a shot on an x86 machine today (with an oddly old mdadm, from
> Ubuntu Lucid), and mdadm segfaulted. I guess there's no avoiding
> building some kernel modules.
>
> details:
>
> $ mdadm --version
> mdadm - v2.6.7.1 - 15th October 2008

I built mdadm 3.1.4 from source, and got the array reshaping again on
x86. I'm still trying to build a new raid456.ko on this tiny ARM box
with some extra debugging in it.

--
Phil