segfault in mdadm v2.6.4

On 04.04.2008 at 07:03:48, from Brett Dikeman

On a debian/testing system, under both 2.6.22 and 2.6.24, I've been
trying to set up a 4-drive RAID6 array. I see the following segfault
listed in /var/log/messages, and it's appeared each time I've
assembled the array. Two drives are on the onboard SATA; two are on
external USB-SATA bridges (this is not permanent, just what I had
available for migrating off another array.)

Also, the array's initial "resync" hung after a few hours, much to my
great annoyance (it's a 10+ hour process; I was 3 hours in.) It took
the entire device with it- I couldn't unmount the filesystem. I
eventually tracked it to one of the four drives, on the USB<->SATA
bridge; it wasn't responding, and the other 3 drives seemed fine. It
then took the entire system with it a few minutes later; all the
running daemons stopped responding and I couldn't get a shell. After
waiting half an hour for a device timeout, etc., I unplugged the hung
drive since I didn't have anything on the array. No change, no kernel
messages logged before or after, not even when the USB device
'disappeared.' I gave up and power-cycled the box.

I'd appreciate being cc'd on followups- though I will be checking the
archives. I'm happy to provide additional info and run tests.

Thanks!
Brett

Apr 4 00:36:46 frank kernel: md: md1 stopped.
Apr 4 00:36:46 frank kernel: md: unbind
Apr 4 00:36:46 frank kernel: md: export_rdev(sdc2)
Apr 4 00:36:46 frank kernel: md: unbind
Apr 4 00:36:46 frank kernel: md: export_rdev(sdd2)
Apr 4 00:36:46 frank kernel: md: bind
Apr 4 00:36:46 frank kernel: md: bind
Apr 4 00:36:46 frank kernel: md: bind
Apr 4 00:36:46 frank kernel: md: bind
Apr 4 00:36:46 frank kernel: xor: automatically using best checksumming function: generic_sse
Apr 4 00:36:46 frank kernel: generic_sse: 3086.000 MB/sec
Apr 4 00:36:46 frank kernel: xor: using function: generic_sse (3086.000 MB/sec)
Apr 4 00:36:46 frank kernel: async_tx: api initialized (sync-only)
Apr 4 00:36:46 frank kernel: raid6: int64x1 693 MB/s
Apr 4 00:36:46 frank kernel: raid6: int64x2 922 MB/s
Apr 4 00:36:46 frank kernel: raid6: int64x4 1083 MB/s
Apr 4 00:36:46 frank kernel: raid6: int64x8 794 MB/s
Apr 4 00:36:46 frank kernel: raid6: sse2x1 1268 MB/s
Apr 4 00:36:46 frank kernel: raid6: sse2x2 1828 MB/s
Apr 4 00:36:46 frank kernel: raid6: sse2x4 1929 MB/s
Apr 4 00:36:46 frank kernel: raid6: using algorithm sse2x4 (1929 MB/s)
Apr 4 00:36:46 frank kernel: md: raid6 personality registered for level 6
Apr 4 00:36:46 frank kernel: md: raid5 personality registered for level 5
Apr 4 00:36:46 frank kernel: md: raid4 personality registered for level 4
Apr 4 00:36:46 frank kernel: raid5: device sdc2 operational as raid disk 0
Apr 4 00:36:46 frank kernel: raid5: device sdf2 operational as raid disk 3
Apr 4 00:36:46 frank kernel: raid5: device sde2 operational as raid disk 2
Apr 4 00:36:46 frank kernel: raid5: device sdd2 operational as raid disk 1
Apr 4 00:36:46 frank kernel: raid5: allocated 4274kB for md1
Apr 4 00:36:46 frank kernel: raid5: raid level 6 set md1 active with 4 out of 4 devices, algorithm 2
Apr 4 00:36:46 frank kernel: RAID5 conf printout:
Apr 4 00:36:46 frank kernel: --- rd:4 wd:4
Apr 4 00:36:46 frank kernel: disk 0, o:1, dev:sdc2
Apr 4 00:36:46 frank kernel: disk 1, o:1, dev:sdd2
Apr 4 00:36:46 frank kernel: disk 2, o:1, dev:sde2
Apr 4 00:36:46 frank kernel: disk 3, o:1, dev:sdf2
Apr 4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c rsp 7fffcdd18fa0 error 4

--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: segfault in mdadm v2.6.4

On 04.04.2008 at 12:05:16, from Christian Pernegger

> Apr 4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c rsp
> 7fffcdd18fa0 error 4

The segfault after assemble is more or less "normal" with the Debian
version right now. Of course I haven't been able to reproduce it with
any consistency since I reported the bug :(

You might want to post the exact array config and the mdadm command line used.

C.

Re: segfault in mdadm v2.6.4

On 04.04.2008 at 18:01:08, from dan.j.williams


On Fri, Apr 4, 2008 at 3:05 AM, Christian Pernegger wrote:
> > Apr 4 00:36:46 frank kernel: mdadm[3954]: segfault at 0 rip 412d2c rsp
> > 7fffcdd18fa0 error 4
>
> The segfault after assemble is more or less "normal" with the Debian
> version right now. Of course I haven't been able to reproduce it with
> any consistency since I reported the bug :(
>

There is a known issue with arrays in the "read-auto" state and "mdadm
--monitor". The attached patch addresses this.

What is more concerning is the usb-storage hangs. Do you have logs
from when it hung?

--
Dan

Attachment: mdstat-fix-level-parsing.patch

mdadm: fix segfault, /proc/mdstat parsing of 'level'

From: Dan Williams <dan.j.williams@intel.com>

If the array is in 'read-auto' mode /proc/mdstat will have a string
like:

	"active(auto-read-only)"

The parsing code does not recognize this as "active" so it does not set
->level.  This leads to a segfault in --monitor mode (Monitor.c:405).

Signed-off-by: Dan Williams <dan.j.williams@intel.com>
---

 mdstat.c |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)


diff --git a/mdstat.c b/mdstat.c
index 335e1e5..1dcd709 100644
--- a/mdstat.c
+++ b/mdstat.c
@@ -165,7 +165,8 @@ struct mdstat_ent *mdstat_read(int hold, int start)
 		for (w=dl_next(line); w!= line ; w=dl_next(w)) {
 			int l = strlen(w);
 			char *eq;
-			if (strcmp(w, "active")==0)
+			if (strncmp(w, "active", strlen("active"))==0)
+			/* strncmp to catch the "active(auto-read-only)" case */
 				ent->active = 1;
 			else if (strcmp(w, "inactive")==0)
 				ent->active = 0;

Re: segfault in mdadm v2.6.4

On 04.04.2008 at 18:14:26, from Brett Dikeman

> There is a known issue with arrays in the "read-auto" state and "mdadm
> --monitor". The attached patch addresses this.

Ah- I forgot that, yes, mdadm monitor was running. I was *wondering* what
mdadm process was hanging around to segfault- duh! :-)


> What is more concerning is the usb-storage hangs. Do you have logs
> from when it hung?

Nope- absolutely nothing useful from dmesg or /var/log/messages. The
array simply grinds to a halt. I think it might be the USB bridge, since
this morning I woke up to find the same drive as previously mentioned,
with its access light on continuously (didn't happen the first time.) It
did get further along in the sync- 66.7%, very roughly twice as far as the
first time.

/proc/mdstat keeps getting updated during all this; the (obviously
averaged) rebuild rate average drops steadily.

I don't have any info handy on the USB device, but it's an older
SATA<->USB/eSATA AMS Venus (expensive but otherwise nice case. Has a
silent 80mm fan in it that doesn't move much hair, but does keep drives
cooler.) It dates back to when manufacturers were still making "L"
"eSATA" ports (grr.) The other USB bridge is a Vantec multi-interface
bare adapter. That one seemed fine.

Ask away with things you'd like me to try- I'll get more info this
evening, and I might try picking up a second one of the Vantec interfaces
if I can (they're cheap- and have proven endlessly useful.)
Unfortunately, my intended destination for this array uses Adaptec 1205SA
PCI cards, which idiotically don't support drives over 500GB and hang
during their POST :(

Sometimes I wish for the days of ISA and IRQs...grrrr!

Brett

Re: segfault in mdadm v2.6.4

On 04.04.2008 at 18:43:02, from dan.j.williams

On Fri, Apr 4, 2008 at 9:14 AM, Brett Dikeman wrote:
> > There is a known issue with arrays in the "read-auto" state and "mdadm
> > --monitor". The attached patch addresses this.
>
> Ah- I forgot that, yes, mdadm monitor was running. I was *wondering* what
> mdadm process was hanging around to segfault- duh! :-)
>
>
>
> > What is more concerning is the usb-storage hangs. Do you have logs
> > from when it hung?
>
> Nope- absolutely nothing useful from dmesg or /var/log/messages. The
> array simply grinds to a halt. I think it might be the USB bridge, since
> this morning I woke up to find the same drive as previously mentioned,
> with its access light on continuously (didn't happen the first time.) It
> did get further along in the sync- 66.7%, very roughly twice as far as the
> first time.
>
> /proc/mdstat keeps getting updated during all this; the (obviously
> averaged) rebuild rate average drops steadily.
>
> I don't have any info handy on the USB device, but it's an older
> SATA<->USB/eSATA AMS Venus (expensive but otherwise nice case. Has a
> silent 80mm fan in it that doesn't move much hair, but does keep drives
> cooler.) It dates back to when manufacturers were still making "L"
> "eSATA" ports (grr.) The other USB bridge is a Vantec multi-interface
> bare adapter. That one seemed fine.
>
> Ask away with things you'd like me to try- I'll get more info this
> evening, and I might try picking up a second one of the Vantec interfaces
> if I can (they're cheap- and have proven endlessly useful.)
> Unfortunately, my intended destination for this array uses Adaptec 1205SA
> PCI cards, which idiotically don't support drives over 500GB and hang
> during their POST :(
>

The output from sysrq-w after the hang might shed some light. I have
copied linux-usb [1] in case they recognize the devices you mentioned.

--
Dan

[1]: original report http://marc.info/?l=linux-raid&m=120728545823965&w=2