mdadm 3.1.4 - hanging on cat /proc/mdstat

on 11.07.2011 20:41:21 by Sandra Escandor

Hello all,

I'm facing an issue where only one RAID disk (in a RAID10) appears to be
failing, but the whole RAID becomes unusable: when I issue
cat /proc/mdstat, the system hangs. We had to recover by restarting the
system, after which the failed disk was listed as removed in the output
of "mdadm --detail /dev/md126". But the RAID should still have been
usable with only one disk failing. Does anyone know what I should do to
work around this issue?

Some preliminary info:
The RAID10 was built using the Intel Matrix Storage Manager (IMSM)
metadata format, using the commands:
1. "sudo mdadm -A /dev/md0 /dev/sd[b-g]" - in order to assemble the IMSM
container of the /dev/sd[b-g] devices.
2. "sudo mdadm -I /dev/md0" - in order to put the RAID member disks into
the container.
- Using mdadm 3.1.4 with kernel 2.6.32-5-amd64.

I've looked through the output of kern.log, and this is how I interpret
it:

1. It appears that there is some unhandled error that occurs with one of
the RAID member disks - /dev/sdc. ("I/O error, dev sdc, sector
1053765632")

Jul 8 14:57:19 ecs-1u kernel: [ 8753.699973] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699975] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699977] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 30 00 00 03 68 00
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699982] end_request: I/O error,
dev sdc, sector 1053765632
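
For reference, the sector in the I/O error can be cross-checked against
the logged CDB. Below is a minimal Python sketch, assuming the standard
SCSI WRITE(10) layout (opcode 0x2a, big-endian LBA in bytes 2-5,
transfer length in bytes 7-8). decode_write10 is a hypothetical helper
name used only for illustration, not part of any tool.

```python
def decode_write10(cdb_hex):
    """Return (lba, transfer_length) from a 10-byte SCSI WRITE(10) CDB.

    cdb_hex is the hex dump as printed by the kernel, e.g.
    "2a 00 3e cf 30 00 00 03 68 00".
    """
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2-5: logical block address
    length = int.from_bytes(cdb[7:9], "big")  # bytes 7-8: transfer length in blocks
    return lba, length


lba, length = decode_write10("2a 00 3e cf 30 00 00 03 68 00")
print(lba, length)
```

For the CDB above this gives LBA 1053765632, which matches the sector
reported by end_request, and a transfer length of 872 blocks.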


2. md starts a recovery for the RAID array. The RAID10 conf printout
looks like the following:

Jul 8 14:57:23 ecs-1u kernel: [ 8758.163655] md: recovery of RAID array
md126
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163660] md: minimum _guaranteed_
speed: 1000 KB/sec/disk.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163662] md: using maximum
available idle IO bandwidth (but not more than 200000 KB/sec) for
recovery.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163672] md: using 128k window,
over a total of 732572288 blocks.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163675] md: resuming recovery of
md126 from checkpoint.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163677] md: md126: recovery done.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296414] RAID10 conf printout:
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296416] --- wd:3 rd:4
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296417] disk 0, wo:0, o:1,
dev:sdb
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296419] disk 1, wo:1, o:0,
dev:sdc
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296420] disk 2, wo:0, o:1,
dev:sdd
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296421] disk 3, wo:0, o:1,
dev:sde
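
The conf printout can also be read mechanically. The sketch below is an
illustration only: it assumes (from my reading of the raid10 driver, so
treat it as unverified) that "wd" is the number of working disks, "rd"
the number of RAID disks, "wo" is set when a member is not in sync, and
"o" is cleared when a member is faulty. parse_conf is a hypothetical
helper name.

```python
import re

# Excerpt as it appears in the post, including the mail client's line wrapping.
LOG = """\
RAID10 conf printout:
--- wd:3 rd:4
disk 0, wo:0, o:1,
dev:sdb
disk 1, wo:1, o:0,
dev:sdc
disk 2, wo:0, o:1,
dev:sdd
disk 3, wo:0, o:1,
dev:sde
"""


def parse_conf(text):
    """Return (working, total, disks) parsed from a RAID10 conf printout."""
    summary = re.search(r"wd:(\d+)\s+rd:(\d+)", text)
    if summary is None:
        raise ValueError("no 'wd:N rd:M' summary line found")
    disks = [
        {"slot": int(slot), "wo": int(wo), "o": int(o), "dev": dev}
        # \s+ tolerates the wrapped "o:1,<newline>dev:sdb" form seen in the mail.
        for slot, wo, o, dev in re.findall(
            r"disk (\d+), wo:(\d+), o:(\d+),\s+dev:(\w+)", text)
    ]
    return int(summary.group(1)), int(summary.group(2)), disks


working, total, disks = parse_conf(LOG)
# Assumed meaning: o == 0 marks a member the driver no longer treats as usable.
not_ok = [d["dev"] for d in disks if d["o"] == 0]
print(f"{working}/{total} working; not operational: {not_ok}")
```

On this excerpt it reports 3 of 4 disks working and flags only sdc.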

3. But then another unhandled error occurs, and it looks like something
is causing the md126_raid10 task to block.

Jul 8 14:58:17 ecs-1u kernel: [ 8812.088705] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088710] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088714] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 63 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088723] end_request: I/O error,
dev sdc, sector 1053778688
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088775] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088776] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088778] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 67 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088781] end_request: I/O error,
dev sdc, sector 1053779712
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088817] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088818] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088820] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 6b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088823] end_request: I/O error,
dev sdc, sector 1053780736
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088859] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088860] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088862] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 6f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088865] end_request: I/O error,
dev sdc, sector 1053781760
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088909] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088910] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088912] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 73 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088916] end_request: I/O error,
dev sdc, sector 1053782784
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089014] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089015] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089017] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 77 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089020] end_request: I/O error,
dev sdc, sector 1053783808
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089121] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089122] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089124] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 7b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089127] end_request: I/O error,
dev sdc, sector 1053784832
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089236] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089237] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089239] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 7f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089243] end_request: I/O error,
dev sdc, sector 1053785856
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089344] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089345] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089347] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 83 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089351] end_request: I/O error,
dev sdc, sector 1053786880
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089441] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089443] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089444] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 87 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089448] end_request: I/O error,
dev sdc, sector 1053787904
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089536] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089537] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089538] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 8b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089542] end_request: I/O error,
dev sdc, sector 1053788928
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089631] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089632] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089634] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 8f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089637] end_request: I/O error,
dev sdc, sector 1053789952
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041839] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041867] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041905] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041908] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041911] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041913] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041915] Call Trace:
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041925] [] ?
sync_page+0x0/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041929] [] ?
io_schedule+0x73/0xb7
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041931] [] ?
sync_page+0x41/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041933] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041935] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041938] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041943] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041948] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041953] [] ?
read_tsc+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041955] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041958] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041961] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041963] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041965] [] ?
finish_wait+0x35/0x60
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041967] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041969] [] ?
shrink_list+0x528/0x767
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041971] [] ?
shrink_zone+0x280/0x342
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041975] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041977] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041979] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041981] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041983] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041985] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041987] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041994] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041998] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042001] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042003] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042005] [] ?
copy_process+0xd7/0x115f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042007] [] ?
do_fork+0x157/0x31e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042009] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042012] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042014] [] ?
kthread+0x0/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042015] [] ?
child_rip+0x0/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042017] [] ?
kthreadd+0xb1/0xec
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042021] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042022] [] ?
child_rip+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042024] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042028] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042029] [] ?
kthreadd+0x0/0xec
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042031] [] ?
child_rip+0x0/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042076] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042101] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042138] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042140] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042143] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042145] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042147] Call Trace:
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042150] [] ?
sprintf+0x51/0x59
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042152] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042154] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042156] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042158] [] ?
default_wake_function+0x0/0x9
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042163] [] ?
kthread_create+0x93/0x121
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042167] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042172] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042175] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042178] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042181] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042184] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042187] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042190] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042191] [] ?
thread_return+0x79/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042194] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042196] [] ?
thread_return+0xd6/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042197] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042200] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042202] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042205] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042206] [] ?
kthread+0x79/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042208] [] ?
child_rip+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042210] [] ?
kthread+0x0/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042211] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963652] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963680] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963718] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963721] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963723] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963726] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963728] Call Trace:
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963737] [] ?
sync_page+0x0/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963742] [] ?
io_schedule+0x73/0xb7
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963744] [] ?
sync_page+0x41/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963746] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963748] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963752] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963755] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963760] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963765] [] ?
read_tsc+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963766] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963769] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963772] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963774] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963776] [] ?
finish_wait+0x35/0x60
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963778] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963780] [] ?
shrink_list+0x528/0x767
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963783] [] ?
shrink_zone+0x280/0x342
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963786] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963788] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963790] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963792] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963794] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963796] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963798] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963804] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963808] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963812] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963813] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963816] [] ?
copy_process+0xd7/0x115f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963818] [] ?
do_fork+0x157/0x31e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963820] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963822] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963824] [] ?
kthread+0x0/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963825] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963827] [] ?
kthreadd+0xb1/0xec
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963831] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963833] [] ?
child_rip+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963835] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963838] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963840] [] ?
kthreadd+0x0/0xec
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963842] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963886] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963911] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963949] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963951] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963953] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963955] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963957] Call Trace:
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963961] [] ?
sprintf+0x51/0x59
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963963] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963965] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963967] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963969] [] ?
default_wake_function+0x0/0x9
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963973] [] ?
kthread_create+0x93/0x121
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963977] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963982] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963985] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963988] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963991] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963994] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963997] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963999] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964001] [] ?
thread_return+0x79/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964003] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964005] [] ?
thread_return+0xd6/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964007] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964010] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964012] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964014] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964016] [] ?
kthread+0x79/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964018] [] ?
child_rip+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964019] [] ?
kthread+0x0/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964021] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885452] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885477] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885515] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885517] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885519] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885521] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885523] Call Trace:
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885527] [] ?
sync_page+0x0/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885529] [] ?
io_schedule+0x73/0xb7
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885531] [] ?
sync_page+0x41/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885533] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885535] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885537] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885539] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885542] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885544] [] ?
read_tsc+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885545] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885547] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885549] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885551] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885553] [] ?
finish_wait+0x35/0x60
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885554] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885556] [] ?
shrink_list+0x528/0x767
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885559] [] ?
shrink_zone+0x280/0x342
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885561] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885563] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885565] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885567] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885568] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885570] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885573] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885575] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885577] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885579] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885581] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885583] [] ?
copy_process+0xd7/0x115f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885585] [] ?
do_fork+0x157/0x31e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885587] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885589] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885590] [] ?
kthread+0x0/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885592] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885594] [] ?
kthreadd+0xb1/0xec
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885596] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885598] [] ?
child_rip+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885600] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885602] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885603] [] ?
kthreadd+0x0/0xec
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885605] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885616] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885641] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885678] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885681] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885683] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885685] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885687] Call Trace:
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885689] [] ?
sprintf+0x51/0x59
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885691] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885692] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885694] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885696] [] ?
default_wake_function+0x0/0x9
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885699] [] ?
kthread_create+0x93/0x121
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885702] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885705] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885708] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885711] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885714] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885716] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885719] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885721] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885723] [] ?
thread_return+0x79/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885725] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885727] [] ?
thread_return+0xd6/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885728] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885731] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885733] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885736] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885738] [] ?
kthread+0x79/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885739] [] ?
child_rip+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885741] [] ?
kthread+0x0/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885742] [] ?
child_rip+0x0/0x20

.....

Jul 8 15:07:22 ecs-1u kernel: [ 9356.807402] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807427] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807465] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807467] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807469] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807471] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807473] Call Trace:
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807475] [] ?
sprintf+0x51/0x59
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807477] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807479] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807481] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807483] [] ?
default_wake_function+0x0/0x9
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807485] [] ?
kthread_create+0x93/0x121
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807488] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807491] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807494] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807497] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807500] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807503] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807506] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807508] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807510] [] ?
thread_return+0x79/0xe0
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807511] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807513] [] ?
thread_return+0xd6/0xe0
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807515] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807518] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807520] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807522] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807524] [] ?
kthread+0x79/0x81
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807526] [] ?
child_rip+0xa/0x20
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807527] [] ?
kthread+0x0/0x81
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807529] [] ?
child_rip+0x0/0x20
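
To get an overview of a long kern.log excerpt like the one above, the
I/O errors and hung-task reports can be tallied with a short script.
This is only a sketch: the regular expressions are written against the
message formats shown in this post, summarize is a hypothetical helper
name, and SAMPLE holds four entries copied from the log above.

```python
import re
from collections import Counter, defaultdict

# \s+ is used where the mail client wrapped a log line.
IO_ERR = re.compile(r"end_request: I/O error,\s+dev (\w+), sector (\d+)")
HUNG = re.compile(
    r"INFO: task\s+(\S+):(\d+)\s+blocked for more than (\d+) seconds")


def summarize(text):
    """Return ({dev: [sectors...]}, Counter({task: hung_reports}))."""
    sectors = defaultdict(list)
    for dev, sector in IO_ERR.findall(text):
        sectors[dev].append(int(sector))
    hung = Counter(name for name, _pid, _secs in HUNG.findall(text))
    return dict(sectors), hung


SAMPLE = """\
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088723] end_request: I/O error,
dev sdc, sector 1053778688
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088781] end_request: I/O error,
dev sdc, sector 1053779712
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041839] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042076] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
"""

sectors, hung = summarize(SAMPLE)
for dev, secs in sectors.items():
    # Distance between consecutive failed sectors, to spot sequential writes.
    strides = sorted({b - a for a, b in zip(secs, secs[1:])})
    print(dev, len(secs), "errors, strides:", strides)
for task, count in hung.items():
    print(task, "reported hung", count, "time(s)")
```

Run against the full log (for example the contents of kern.log read into
a string), the same function would show whether any device other than
sdc reported errors and which tasks were reported as blocked.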

4. Eventually, the server was restarted because cat /proc/mdstat just
kept hanging.

Jul 12 00:11:06 ecs-1u kernel: [300990.576353] md: ioctl lock
interrupted, reason -4, cmd -2142762735
Jul 12 00:15:16 ecs-1u kernel: [301240.301494] md: ioctl lock
interrupted, reason -4, cmd -2142762735
Jul 12 00:17:35 ecs-1u kernel: [301379.418775] md: ioctl lock
interrupted, reason -4, cmd -2142762735
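
The numbers in the "ioctl lock interrupted" lines can be unpacked. The
sketch below assumes the generic Linux ioctl number layout used on x86
(2 direction bits, 14 size bits, 8 type bits, 8 number bits);
decode_ioctl is a hypothetical helper name. I believe, but have not
verified against this kernel's headers, that type 9 / number 0x11 with a
72-byte argument corresponds to GET_ARRAY_INFO in linux/raid/md_u.h.

```python
import errno


def decode_ioctl(cmd):
    """Split a Linux ioctl number into its fields (generic x86 layout)."""
    cmd &= 0xFFFFFFFF  # the kernel printed the value as a signed 32-bit int
    return {
        "raw": hex(cmd),
        "dir": (cmd >> 30) & 0x3,       # 1 = write, 2 = read, 3 = both
        "size": (cmd >> 16) & 0x3FFF,   # size of the argument in bytes
        "type": (cmd >> 8) & 0xFF,      # "magic" byte; md uses its major, 9
        "nr": cmd & 0xFF,               # command number within that type
    }


fields = decode_ioctl(-2142762735)
print(fields)

# "reason -4" is a negated errno value; look up its symbolic name.
reason_name = errno.errorcode[4]
print(reason_name)
```

The command decodes to 0x80480911 (read direction, type 9, number 0x11,
72-byte argument), and reason -4 is -EINTR, which would be consistent
with a userspace query on the array being interrupted by a signal while
waiting for the md lock.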
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

RE: mdadm 3.1.4 - hanging on cat /proc/mdstat

on 12.07.2011 14:01:46 by Sandra Escandor

Sorry for top-posting - I have some additional info that could shed some
light.

One more question: If only one SATA disk (Western Digital
WD7500BPKT-00PK4T0) were to have this failed command and this SATA disk
belonged to a RAID10, shouldn't we still be able to use the RAID with
the remaining disks, and not have to reboot?
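
For anyone reading the libata output that follows, the "cmd" lines can
be decoded into an LBA and a length. This is a sketch under the
assumption that libata prints the taskfile as
command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device
and that, for WRITE FPDMA QUEUED (0x61), the sector count lives in the
feature registers and the tag in the upper bits of nsect. decode_fpdma
is a hypothetical helper name.

```python
import re

# Twelve hex bytes in three slash-separated groups, then the tag number.
TASKFILE = re.compile(
    r"cmd\s+(\w\w)/(\w\w):(\w\w):(\w\w):(\w\w):(\w\w)/"
    r"(\w\w):(\w\w):(\w\w):(\w\w):(\w\w)/(\w\w)\s+tag\s+(\d+)")


def decode_fpdma(line):
    """Decode one libata 'cmd ...' line for an NCQ write (assumed layout)."""
    m = TASKFILE.search(line)
    if m is None:
        raise ValueError("no taskfile found in line")
    (command, feature, nsect, lbal, lbam, lbah,
     hob_feature, _hob_nsect, hob_lbal, hob_lbam, hob_lbah,
     _device) = (int(x, 16) for x in m.groups()[:12])
    if command != 0x61:
        raise ValueError("not WRITE FPDMA QUEUED")
    # 48-bit LBA: the "hob" (high order byte) registers hold the upper half.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    sectors = hob_feature << 8 | feature
    return {
        "lba": lba,
        "sectors": sectors,
        "bytes": sectors * 512,        # assumes 512-byte logical sectors
        "tag_in_nsect": nsect >> 3,    # NCQ tag as encoded in the taskfile
        "tag": int(m.group(13)),       # tag as printed by libata
    }


info = decode_fpdma(
    "ata3.00: cmd\n61/00:30:80:37:3f/04:00:44:00:00/40 tag 6 ncq 524288 out")
print(info)
```

For the first failed command this yields 1024 sectors (524288 bytes,
matching the "ncq 524288 out" field) at LBA 1144993664, with tag 6 in
both places, so the assumed field order at least agrees with the log.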

Jul 8 14:48:06 ecs-1u kernel: [ 8200.901003] ata3.00: exception Emask
0x0 SAct 0x1ffc0 SErr 0x0 action 0x6 frozen
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901052] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901082] ata3.00: cmd
61/00:30:80:37:3f/04:00:44:00:00/40 tag 6 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901083] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901163] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901183] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901207] ata3.00: cmd
61/00:38:80:3b:3f/04:00:44:00:00/40 tag 7 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901208] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901282] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901302] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901326] ata3.00: cmd
61/00:40:80:3f:3f/04:00:44:00:00/40 tag 8 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901327] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901400] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901420] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901444] ata3.00: cmd
61/00:48:80:43:3f/04:00:44:00:00/40 tag 9 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901445] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901525] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901545] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901569] ata3.00: cmd
61/00:50:80:47:3f/04:00:44:00:00/40 tag 10 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901570] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901644] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901664] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901688] ata3.00: cmd
61/00:58:80:4b:3f/04:00:44:00:00/40 tag 11 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901689] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901763] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901783] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901807] ata3.00: cmd
61/00:60:80:4f:3f/04:00:44:00:00/40 tag 12 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901808] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901882] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901902] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901926] ata3.00: cmd
61/00:68:80:53:3f/04:00:44:00:00/40 tag 13 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.901927] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902000] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902020] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902044] ata3.00: cmd
61/00:70:80:57:3f/04:00:44:00:00/40 tag 14 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902045] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902119] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902139] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902163] ata3.00: cmd
61/00:78:80:5b:3f/04:00:44:00:00/40 tag 15 ncq 524288 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902164] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902238] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902257] ata3.00: failed command:
WRITE FPDMA QUEUED
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902281] ata3.00: cmd
61/10:80:70:ef:37/00:00:26:00:00/40 tag 16 ncq 8192 out
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902282] res
40/00:00:00:00:00/00:00:00:00:00/00 Emask 0x4 (timeout)
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902356] ata3.00: status: { DRDY }
Jul 8 14:48:06 ecs-1u kernel: [ 8200.902378] ata3: hard resetting link
Jul 8 14:48:11 ecs-1u kernel: [ 8206.257532] ata3: link is slow to
respond, please be patient (ready=0)
Jul 8 14:48:16 ecs-1u kernel: [ 8210.902508] ata3: COMRESET failed
(errno=-16)
Jul 8 14:48:16 ecs-1u kernel: [ 8210.902535] ata3: hard resetting link
Jul 8 14:48:21 ecs-1u kernel: [ 8216.259007] ata3: link is slow to
respond, please be patient (ready=0)
Jul 8 14:48:21 ecs-1u kernel: [ 8216.762685] ata3: SATA link up 3.0
Gbps (SStatus 123 SControl 300)
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769012] ata3.00: configured for
UDMA/133
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769019] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769024] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769028] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769032] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769036] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769041] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769045] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769049] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769054] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769058] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769060] ata3.00: device reported
invalid CHS sector 0
Jul 8 14:48:21 ecs-1u kernel: [ 8216.769078] ata3: EH complete
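
For anyone reading the ata3 lines above, the "cmd" taskfile dump can be
decoded to see which sectors and queue tags were outstanding when the
timeout hit. The following is a minimal Python sketch, not part of the
original report. It assumes the usual libata field order
(command/feature:nsect:lbal:lbam:lbah/hob_feature:hob_nsect:hob_lbal:hob_lbam:hob_lbah/device)
and 512-byte logical sectors; the function names are made up for
illustration.

```python
import re

# Matches the "cmd" taskfile dump that libata prints for a failed command.
# \s+ after "cmd" also covers the line wrap seen in the mail above.
CMD_RE = re.compile(
    r"cmd\s+([0-9a-f]{2})/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})"
    r"/((?:[0-9a-f]{2}:){4}[0-9a-f]{2})/([0-9a-f]{2})"
)


def decode_ncq_cmd(text):
    """Decode one libata 'cmd' taskfile dump for an NCQ command (hypothetical helper)."""
    m = CMD_RE.search(text)
    if m is None:
        raise ValueError("no libata cmd taskfile found")
    command = int(m.group(1), 16)
    feature, nsect, lbal, lbam, lbah = (int(b, 16) for b in m.group(2).split(":"))
    hob_feature, hob_nsect, hob_lbal, hob_lbam, hob_lbah = (
        int(b, 16) for b in m.group(3).split(":")
    )
    # 48-bit LBA: the "hob" registers carry bits 24..47.
    lba = (hob_lbah << 40 | hob_lbam << 32 | hob_lbal << 24
           | lbah << 16 | lbam << 8 | lbal)
    # For NCQ commands the sector count lives in the feature registers,
    # and the queue tag sits in bits 7:3 of the sector-count register.
    sectors = hob_feature << 8 | feature
    tag = nsect >> 3
    return {"command": command, "lba": lba, "sectors": sectors, "tag": tag}


def tags_in_sact(sact):
    """List the NCQ tags whose bits are set in an SAct bitmask."""
    return [bit for bit in range(32) if sact >> bit & 1]


first = decode_ncq_cmd(
    "ata3.00: cmd\n61/00:30:80:37:3f/04:00:44:00:00/40 tag 6 ncq 524288 out"
)
print(first, first["sectors"] * 512)
print(tags_in_sact(0x1ffc0))
```

Under those assumptions the first failed command decodes to tag 6 and
1024 sectors (524288 bytes), which agrees with the "tag 6 ncq 524288"
text on the same log line, and SAct 0x1ffc0 covers tags 6 through 16,
matching the eleven failed commands listed.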



-----Original Message-----
From: linux-raid-owner@vger.kernel.org
[mailto:linux-raid-owner@vger.kernel.org] On Behalf Of Sandra Escandor
Sent: Monday, July 11, 2011 2:41 PM
To: linux-raid@vger.kernel.org
Subject: mdadm 3.1.4 - hanging on cat /proc/mdstat

Hello all,

I'm facing an issue where it appears that only one RAID disk (on a
RAID10) is failing, but the whole RAID becomes unusable - when issuing a
cat /proc/mdstat, the system hangs. We actually had to recover by
restarting the system - then the failed disk was listed as removed in
output of "mdadm --detail /dev/md126". But the RAID should still have been
usable with only one disk failing - does anyone know what I should do to
work around this issue?

Some preliminary info:
RAID10 was built using Intel matrix storage manager metadata format,
using the commands:
1. "sudo mdadm -A /dev/md0 /dev/sd[b-g]" - in order to assemble the IMSM
container of the /dev/sd[b-g] devices.
2. "sudo mdadm -I /dev/md0" - in order to put the RAID member disks into
the container.
-Using mdadm 3.1.4 with kernel 2.6.32-5-amd64.

I've looked through the output of kern.log, and the following is what I
have interpreted:

1. It appears that there is some unhandled error that occurs with one of
the RAID member disks - /dev/sdc. ("I/O error, dev sdc, sector
1053765632")

Jul 8 14:57:19 ecs-1u kernel: [ 8753.699973] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699975] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699977] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 30 00 00 03 68 00
Jul 8 14:57:19 ecs-1u kernel: [ 8753.699982] end_request: I/O error,
dev sdc, sector 1053765632
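
As a cross-check, the CDB bytes in the log above can be decoded to
confirm that the failing write starts at the sector named in the
end_request line. This is a small Python sketch, assuming the standard
WRITE(10) layout (opcode 0x2a, 4-byte big-endian LBA in bytes 2-5,
2-byte transfer length in bytes 7-8); decode_write10 is an illustrative
name, not an existing tool.

```python
def decode_write10(cdb_hex):
    """Decode a SCSI WRITE(10) CDB given as space-separated hex bytes."""
    cdb = bytes.fromhex(cdb_hex)
    if len(cdb) != 10 or cdb[0] != 0x2A:
        raise ValueError("not a WRITE(10) CDB")
    lba = int.from_bytes(cdb[2:6], "big")     # bytes 2..5: logical block address
    blocks = int.from_bytes(cdb[7:9], "big")  # bytes 7..8: transfer length in blocks
    return lba, blocks


lba, blocks = decode_write10("2a 00 3e cf 30 00 00 03 68 00")
print(lba, blocks)  # 1053765632 872
```

The decoded LBA equals the sector in the "I/O error, dev sdc" message,
so the error is reported at the first block of that write.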


2. md starts a recovery for the RAID array. The RAID10 conf printout
looks like the following:

Jul 8 14:57:23 ecs-1u kernel: [ 8758.163655] md: recovery of RAID array
md126
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163660] md: minimum _guaranteed_
speed: 1000 KB/sec/disk.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163662] md: using maximum
available idle IO bandwidth (but not more than 200000 KB/sec) for
recovery.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163672] md: using 128k window,
over a total of 732572288 blocks.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163675] md: resuming recovery of
md126 from checkpoint.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.163677] md: md126: recovery done.
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296414] RAID10 conf printout:
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296416] --- wd:3 rd:4
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296417] disk 0, wo:0, o:1,
dev:sdb
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296419] disk 1, wo:1, o:0,
dev:sdc
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296420] disk 2, wo:0, o:1,
dev:sdd
Jul 8 14:57:23 ecs-1u kernel: [ 8758.296421] disk 3, wo:0, o:1,
dev:sde
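
The conf printout above can be summarised mechanically. The sketch
below is Python and rests on an assumption about the fields: as far as I
can tell from the raid10 driver's print_conf(), "wd" is the number of
working disks, "rd" the number of raid disks, "wo" is set when a member
is not in sync, and "o" is set when a member is not faulty. The helper
name is hypothetical.

```python
import re

SUMMARY_RE = re.compile(r"--- wd:(\d+) rd:(\d+)")
# \s+ before "dev:" tolerates the line wrap introduced by the mail client.
DISK_RE = re.compile(r"disk (\d+), wo:(\d), o:(\d),\s+dev:(\w+)")


def parse_conf_printout(text):
    """Return (working, total, disks) from a RAID10 conf printout."""
    summary = SUMMARY_RE.search(text)
    if summary is None:
        raise ValueError("no 'wd:/rd:' summary line found")
    working, total = (int(x) for x in summary.groups())
    disks = [
        {"slot": int(slot), "wo": int(wo), "o": int(o), "dev": dev}
        for slot, wo, o, dev in DISK_RE.findall(text)
    ]
    return working, total, disks


sample = """--- wd:3 rd:4
disk 0, wo:0, o:1,
dev:sdb
disk 1, wo:1, o:0,
dev:sdc
disk 2, wo:0, o:1,
dev:sdd
disk 3, wo:0, o:1,
dev:sde
"""
working, total, disks = parse_conf_printout(sample)
not_operational = [d["dev"] for d in disks if d["o"] == 0]
print(working, total, not_operational)
```

Read this way, the printout says three of four members are working and
only sdc is marked as not operational, which is consistent with a single
failed disk.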

3. But then another unhandled error occurs, and it looks like something
is causing the md126_raid10 task to block.

Jul 8 14:58:17 ecs-1u kernel: [ 8812.088705] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088710] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088714] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 63 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088723] end_request: I/O error,
dev sdc, sector 1053778688
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088775] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088776] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088778] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 67 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088781] end_request: I/O error,
dev sdc, sector 1053779712
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088817] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088818] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088820] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 6b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088823] end_request: I/O error,
dev sdc, sector 1053780736
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088859] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088860] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088862] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 6f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088865] end_request: I/O error,
dev sdc, sector 1053781760
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088909] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088910] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088912] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 73 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.088916] end_request: I/O error,
dev sdc, sector 1053782784
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089014] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089015] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089017] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 77 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089020] end_request: I/O error,
dev sdc, sector 1053783808
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089121] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089122] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089124] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 7b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089127] end_request: I/O error,
dev sdc, sector 1053784832
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089236] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089237] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089239] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 7f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089243] end_request: I/O error,
dev sdc, sector 1053785856
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089344] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089345] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089347] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 83 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089351] end_request: I/O error,
dev sdc, sector 1053786880
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089441] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089443] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089444] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 87 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089448] end_request: I/O error,
dev sdc, sector 1053787904
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089536] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089537] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089538] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 8b 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089542] end_request: I/O error,
dev sdc, sector 1053788928
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089631] sd 2:0:0:0: [sdc]
Unhandled error code
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089632] sd 2:0:0:0: [sdc] Result:
hostbyte=DID_OK driverbyte=DRIVER_TIMEOUT
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089634] sd 2:0:0:0: [sdc] CDB:
Write(10): 2a 00 3e cf 8f 00 00 04 00 00
Jul 8 14:58:17 ecs-1u kernel: [ 8812.089637] end_request: I/O error,
dev sdc, sector 1053789952
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041839] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041867] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041905] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041908] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041911] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041913] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041915] Call Trace:
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041925] [] ?
sync_page+0x0/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041929] [] ?
io_schedule+0x73/0xb7
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041931] [] ?
sync_page+0x41/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041933] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041935] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041938] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041943] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041948] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041953] [] ?
read_tsc+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041955] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041958] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041961] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041963] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041965] [] ?
finish_wait+0x35/0x60
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041967] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041969] [] ?
shrink_list+0x528/0x767
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041971] [] ?
shrink_zone+0x280/0x342
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041975] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041977] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041979] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041981] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041983] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041985] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041987] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041994] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:01:22 ecs-1u kernel: [ 8997.041998] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042001] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042003] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042005] [] ?
copy_process+0xd7/0x115f
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042007] [] ?
do_fork+0x157/0x31e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042009] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042012] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042014] [] ?
kthread+0x0/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042015] [] ?
child_rip+0x0/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042017] [] ?
kthreadd+0xb1/0xec
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042021] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042022] [] ?
child_rip+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042024] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042028] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042029] [] ?
kthreadd+0x0/0xec
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042031] [] ?
child_rip+0x0/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042076] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042101] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042138] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042140] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042143] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042145] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042147] Call Trace:
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042150] [] ?
sprintf+0x51/0x59
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042152] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042154] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042156] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042158] [] ?
default_wake_function+0x0/0x9
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042163] [] ?
kthread_create+0x93/0x121
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042167] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042172] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042175] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042178] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042181] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042184] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042187] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042190] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042191] [] ?
thread_return+0x79/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042194] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042196] [] ?
thread_return+0xd6/0xe0
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042197] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042200] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042202] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042205] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042206] [] ?
kthread+0x79/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042208] [] ?
child_rip+0xa/0x20
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042210] [] ?
kthread+0x0/0x81
Jul 8 15:01:22 ecs-1u kernel: [ 8997.042211] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963652] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963680] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963718] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963721] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963723] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963726] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963728] Call Trace:
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963737] [] ?
sync_page+0x0/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963742] [] ?
io_schedule+0x73/0xb7
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963744] [] ?
sync_page+0x41/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963746] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963748] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963752] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963755] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963760] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963765] [] ?
read_tsc+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963766] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963769] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963772] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963774] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963776] [] ?
finish_wait+0x35/0x60
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963778] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963780] [] ?
shrink_list+0x528/0x767
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963783] [] ?
shrink_zone+0x280/0x342
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963786] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963788] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963790] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963792] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963794] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963796] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963798] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963804] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963808] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963812] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963813] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963816] [] ?
copy_process+0xd7/0x115f
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963818] [] ?
do_fork+0x157/0x31e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963820] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963822] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963824] [] ?
kthread+0x0/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963825] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963827] [] ?
kthreadd+0xb1/0xec
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963831] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963833] [] ?
child_rip+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963835] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963838] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963840] [] ?
kthreadd+0x0/0xec
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963842] [] ?
child_rip+0x0/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963886] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963911] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963949] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963951] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963953] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963955] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963957] Call Trace:
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963961] [] ?
sprintf+0x51/0x59
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963963] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963965] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963967] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963969] [] ?
default_wake_function+0x0/0x9
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963973] [] ?
kthread_create+0x93/0x121
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963977] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963982] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963985] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963988] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963991] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963994] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963997] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.963999] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964001] [] ?
thread_return+0x79/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964003] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964005] [] ?
thread_return+0xd6/0xe0
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964007] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964010] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964012] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964014] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964016] [] ?
kthread+0x79/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964018] [] ?
child_rip+0xa/0x20
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964019] [] ?
kthread+0x0/0x81
Jul 8 15:03:22 ecs-1u kernel: [ 9116.964021] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885452] INFO: task kthreadd:2
blocked for more than 120 seconds.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885477] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885515] kthreadd D
0000000000000000 0 2 0 0x00000000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885517] ffff8801bf13aa60
0000000000000046 0000000000000000 ffff8801bf11d000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885519] 0000000000000400
0000000000003737 000000000000f9e0 ffff8801bf067fd8
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885521] 0000000000015780
0000000000015780 ffff88033f028710 ffff88033f028a08
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885523] Call Trace:
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885527] [] ?
sync_page+0x0/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885529] [] ?
io_schedule+0x73/0xb7
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885531] [] ?
sync_page+0x41/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885533] [] ?
__wait_on_bit+0x41/0x70
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885535] [] ?
wait_on_page_bit+0x6b/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885537] [] ?
wake_bit_function+0x0/0x23
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885539] [] ?
shrink_page_list+0x14e/0x623
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885542] [] ?
del_timer_sync+0xc/0x16
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885544] [] ?
read_tsc+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885545] [] ?
schedule_timeout+0xad/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885547] [] ?
ktime_get_ts+0x68/0xb2
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885549] [] ?
delayacct_end+0x74/0x7f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885551] [] ?
isolate_pages_global+0x1a0/0x20f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885553] [] ?
finish_wait+0x35/0x60
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885554] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885556] [] ?
shrink_list+0x528/0x767
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885559] [] ?
shrink_zone+0x280/0x342
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885561] [] ?
zone_statistics+0x3c/0x5d
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885563] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885565] [] ?
zone_reclaim+0x276/0x357
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885567] [] ?
isolate_pages_global+0x0/0x20f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885568] [] ?
zone_watermark_ok+0x20/0xb1
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885570] [] ?
get_page_from_freelist+0x1ff/0x760
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885573] [] ?
__alloc_pages_nodemask+0x11c/0x5f4
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885575] [] ?
cpumask_next_and+0x2a/0x3a
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885577] [] ?
find_busiest_group+0x9ae/0xa1e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885579] [] ?
alloc_pid+0x26e/0x390
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885581] [] ?
__get_free_pages+0x9/0x46
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885583] [] ?
copy_process+0xd7/0x115f
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885585] [] ?
do_fork+0x157/0x31e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885587] [] ?
finish_task_switch+0x3a/0xaf
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885589] [] ?
kernel_thread+0x82/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885590] [] ?
kthread+0x0/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885592] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885594] [] ?
kthreadd+0xb1/0xec
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885596] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885598] [] ?
child_rip+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885600] [] ?
early_idt_handler+0x0/0x71
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885602] [] ?
do_set_mempolicy+0x128/0x13a
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885603] [] ?
kthreadd+0x0/0xec
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885605] [] ?
child_rip+0x0/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885616] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885641] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885678] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885681] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885683] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885685] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885687] Call Trace:
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885689] [] ?
sprintf+0x51/0x59
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885691] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885692] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885694] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885696] [] ?
default_wake_function+0x0/0x9
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885699] [] ?
kthread_create+0x93/0x121
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885702] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885705] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885708] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885711] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885714] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885716] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885719] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885721] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885723] [] ?
thread_return+0x79/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885725] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885727] [] ?
thread_return+0xd6/0xe0
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885728] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885731] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885733] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885736] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885738] [] ?
kthread+0x79/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885739] [] ?
child_rip+0xa/0x20
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885741] [] ?
kthread+0x0/0x81
Jul 8 15:05:22 ecs-1u kernel: [ 9236.885742] [] ?
child_rip+0x0/0x20

.....

Jul 8 15:07:22 ecs-1u kernel: [ 9356.807402] INFO: task
md126_raid10:3493 blocked for more than 120 seconds.
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807427] "echo 0 >
/proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807465] md126_raid10 D
0000000000000000 0 3493 2 0x00000000
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807467] ffff88033f02b880
0000000000000046 0000000000000000 0000000a00000006
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807469] 0000006cffffffff
ffff880006e0fa98 000000000000f9e0 ffff88033df07fd8
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807471] 0000000000015780
0000000000015780 ffff88033e79aa60 ffff88033e79ad58
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807473] Call Trace:
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807475] [] ?
sprintf+0x51/0x59
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807477] [] ?
select_task_rq_fair+0x472/0x836
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807479] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807481] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807483] [] ?
default_wake_function+0x0/0x9
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807485] [] ?
kthread_create+0x93/0x121
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807488] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807491] [] ?
__kmalloc+0x12f/0x141
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807494] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807497] [] ?
md_do_sync+0x0/0xaf6 [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807500] [] ?
md_register_thread+0x96/0xcc [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807503] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807506] [] ?
flush_pending_writes+0x13/0x8a [raid10]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807508] [] ?
raid10d+0x42/0xade [raid10]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807510] [] ?
thread_return+0x79/0xe0
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807511] [] ?
apic_timer_interrupt+0xe/0x20
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807513] [] ?
thread_return+0xd6/0xe0
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807515] [] ?
schedule_timeout+0x2e/0xdd
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807518] [] ?
md_thread+0xf1/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807520] [] ?
autoremove_wake_function+0x0/0x2e
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807522] [] ?
md_thread+0x0/0x10f [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807524] [] ?
kthread+0x79/0x81
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807526] [] ?
child_rip+0xa/0x20
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807527] [] ?
kthread+0x0/0x81
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807529] [] ?
child_rip+0x0/0x20
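
Because the mail client wrapped every log line, the traces above are hard to scan. The following is a minimal sketch, not part of the original report, that re-joins the wrapped lines and lists the frames in order. The regular expressions and the helper names `unwrap` and `frames` are my own assumptions about this log layout; the sample is a subset of the frames from the 15:07:22 trace.

```python
import re

# Syslog lines in this post start with "Mon D HH:MM:SS host kernel:".
STAMP = re.compile(r"^[A-Z][a-z]{2} +\d+ \d\d:\d\d:\d\d \S+ kernel:")
# A stack frame looks like "symbol+0xOFF/0xSIZE", optionally "[module]".
FRAME = re.compile(r"([A-Za-z_]\w*)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")

def unwrap(text):
    """Join mail-wrapped continuation lines back onto their log line."""
    lines = []
    for raw in text.splitlines():
        raw = raw.strip()
        if not raw:
            continue
        if STAMP.match(raw) or not lines:
            lines.append(raw)
        else:
            lines[-1] += " " + raw
    return lines

def frames(text):
    """Return (symbol, module) pairs in the order they appear."""
    out = []
    for line in unwrap(text):
        m = FRAME.search(line)
        if m:
            out.append((m.group(1), m.group(2) or "kernel"))
    return out

sample = """\
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807481] [] ?
wait_for_common+0xde/0x15b
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807485] [] ?
kthread_create+0x93/0x121
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807494] [] ?
md_register_thread+0x22/0xcc [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807503] [] ?
md_check_recovery+0x3fd/0x4b9 [md_mod]
Jul 8 15:07:22 ecs-1u kernel: [ 9356.807508] [] ?
raid10d+0x42/0xade [raid10]
"""

for sym, mod in frames(sample):
    print(f"{mod:8} {sym}")
```

Read bottom-up, this subset suggests that raid10d called md_check_recovery, which tried to register a new md thread and is waiting in kthread_create for that thread to be created.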

4. Eventually, the server had to be restarted because it just hangs on
cat /proc/mdstat:

Jul 12 00:11:06 ecs-1u kernel: [300990.576353] md: ioctl lock
interrupted, reason -4, cmd -2142762735
Jul 12 00:15:16 ecs-1u kernel: [301240.301494] md: ioctl lock
interrupted, reason -4, cmd -2142762735
Jul 12 00:17:35 ecs-1u kernel: [301379.418775] md: ioctl lock
interrupted, reason -4, cmd -2142762735
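
The cmd value in these messages is an ioctl number printed as a signed integer, so it can be decoded. Below is a minimal sketch, assuming the standard x86 ioctl bit layout from asm-generic/ioctl.h; the helper name `decode_ioctl` is hypothetical.

```python
# Bit layout of a Linux ioctl number on x86 (asm-generic/ioctl.h):
# nr: bits 0-7, type: bits 8-15, size: bits 16-29, dir: bits 30-31.
DIRS = {0: "none", 1: "write", 2: "read", 3: "read/write"}

def decode_ioctl(cmd):
    cmd &= 0xFFFFFFFF  # the log prints the value as a signed 32-bit int
    return {
        "hex": hex(cmd),
        "nr": cmd & 0xFF,
        "type": (cmd >> 8) & 0xFF,
        "size": (cmd >> 16) & 0x3FFF,
        "dir": DIRS[(cmd >> 30) & 0x3],
    }

info = decode_ioctl(-2142762735)
print(info)
# {'hex': '0x80480911', 'nr': 17, 'type': 9, 'size': 72, 'dir': 'read'}
```

Type 9 is the md major number, and a 72-byte read with nr 0x11 matches GET_ARRAY_INFO as defined in linux/raid/md_u.h, if I am reading that header correctly. Reason -4 is -EINTR, so these lines look like array-info queries that were interrupted by a signal while waiting for the array lock, which would be consistent with the hang described above.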
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html