2.6.32.28 - md resync + pvmove - crash

on 07.05.2011 12:39:07 by Nikola Ciprich

Hi,
first, I'm sorry for crossposting and also CCing stable@; if that's not OK, please let me know.
Anyway, we've experienced a hang of a system running 2.6.32.28.
After upgrading to 2.6.32 and replacing a failed disk, an md resync started.
Then, when the technician started pvmove, some deadlock must have occurred, because all disk requests started to hang and the whole system had to be rebooted...

here's the backtrace:

[ 1229.645028] alg: No test for stdrng (krng)
[ 1229.668172] alg: No test for authenc(hmac(sha1),cbc(des3_ede)) (authenc(hmac(sha1-generic),cbc(des3_ede-generic)))
[ 1531.585167] md: bind
[ 1531.927846] raid1: raid set md2 active with 1 out of 2 mirrors
[ 1531.934613] md2: detected capacity change from 0 to 2000133029888
[ 1549.850444] md1: bitmap file is out of date (0 < 439231) -- forcing full recovery
[ 1549.858719] md1: bitmap file is out of date, doing full recovery
[ 1550.068105] md1: bitmap initialized from disk: read 11/11 pages, set 357576 bits
[ 1550.076054] created bitmap (175 pages) for device md1
[ 1561.449841] md2: unknown partition table
[ 1561.501645] md2: bitmap file is out of date (0 < 4) -- forcing full recovery
[ 1561.509999] md2: bitmap file is out of date, doing full recovery
[ 1562.158515] md2: bitmap initialized from disk: read 15/15 pages, set 476869 bits
[ 1562.167764] created bitmap (233 pages) for device md2
[ 2400.956019] INFO: task kjournald:1038 blocked for more than 120 seconds.
[ 2400.963280] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2400.971356] kjournald D ffff8800016ac400 0 1038 2 0x00000000
[ 2400.978621] ffff88003cc33c60 0000000000000046 ffff88003cc33bd0 ffffffff8119ba6f
[ 2400.986513] 0000000000013780 ffff88003f9746b0 ffff88003f9745f0 ffff88003ea2c5f0
[ 2400.994426] ffff88003f9749a0 ffff88003cc33fd8 ffff88003d65b000 ffff880035600a00
[ 2401.002415] Call Trace:
[ 2401.005024] [] ? blk_unplug+0x2f/0xa0
[ 2401.010530] [] ? ktime_get_ts+0xa4/0xd0
[ 2401.016182] [] io_schedule+0x6e/0xc0
[ 2401.021643] [] sync_buffer+0x3e/0x50
[ 2401.027029] [] __wait_on_bit+0x55/0x80
[ 2401.032638] [] ? sync_buffer+0x0/0x50
[ 2401.038177] [] ? sync_buffer+0x0/0x50
[ 2401.043659] [] out_of_line_wait_on_bit+0x78/0x90
[ 2401.050129] [] ? wake_bit_function+0x0/0x30
[ 2401.056143] [] __wait_on_buffer+0x26/0x30
[ 2401.062077] [] journal_commit_transaction+0x657/0x13c0 [jbd]
[ 2401.069693] [] ? try_to_del_timer_sync+0x44/0x110
[ 2401.076212] [] ? _spin_unlock_irqrestore+0x1d/0x50
[ 2401.082831] [] kjournald+0xe3/0x260 [jbd]
[ 2401.088708] [] ? autoremove_wake_function+0x0/0x40
[ 2401.095369] [] ? kjournald+0x0/0x260 [jbd]
[ 2401.101337] [] kthread+0x8e/0xa0
[ 2401.106354] [] child_rip+0xa/0x20
[ 2401.111477] [] ? kthread+0x0/0xa0
[ 2401.116598] [] ? child_rip+0x0/0x20
[ 2401.121893] INFO: task flush-253:2:3168 blocked for more than 120 seconds.
[ 2401.128983] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2401.137114] flush-253:2 D 0000000000000002 0 3168 2 0x00000000
[ 2401.144318] ffff88002c245a40 0000000000000046 ffff880035601600 ffff88002f621840
[ 2401.152248] 0000000000013780 ffff88003ceb9810 ffff88003ceb9750 ffff88003ea2c5f0
[ 2401.160169] ffff88003ceb9b00 ffff88002c245fd8 ffff88002c245a00 ffff880035601600
[ 2401.168048] Call Trace:
[ 2401.170608] [] ? ktime_get_ts+0xa4/0xd0
[ 2401.176303] [] io_schedule+0x6e/0xc0
[ 2401.181723] [] sync_page+0x36/0x50
[ 2401.186970] [] __wait_on_bit_lock+0x4e/0xa0
[ 2401.192991] [] ? sync_page+0x0/0x50
[ 2401.198287] [] __lock_page+0x65/0x70
[ 2401.203687] [] ? wake_bit_function+0x0/0x30
[ 2401.209687] [] write_cache_pages+0x3d6/0x490
[ 2401.215802] [] ? __writepage+0x0/0x40
[ 2401.221291] [] generic_writepages+0x22/0x30
[ 2401.227327] [] do_writepages+0x26/0x30
[ 2401.232965] [] writeback_single_inode+0xa4/0x290
[ 2401.239412] [] writeback_inodes_wb+0x2d2/0x420
[ 2401.245715] [] wb_writeback+0x126/0x1e0
[ 2401.251360] [] wb_do_writeback+0x1a4/0x1c0
[ 2401.257287] [] bdi_writeback_task+0x35/0xd0
[ 2401.263317] [] ? bdi_start_fn+0x0/0xf0
[ 2401.268886] [] bdi_start_fn+0x81/0xf0
[ 2401.274370] [] ? bdi_start_fn+0x0/0xf0
[ 2401.279947] [] kthread+0x8e/0xa0
[ 2401.285000] [] child_rip+0xa/0x20
[ 2401.290120] [] ? kthread+0x0/0xa0
[ 2401.295247] [] ? child_rip+0x0/0x20
[ 2401.300586] INFO: task reiserfs/0:3204 blocked for more than 120 seconds.
[ 2401.307590] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[ 2401.315682] reiserfs/0 D ffff880016fdad48 0 3204 2 0x00000000
[ 2401.322884] ffff88002f1b1d10 0000000000000046 ffff88000180dda0 ffff88000180dec0
[ 2401.330754] 0000000000013780 ffff88003ea180c0 ffff88003ea18000 ffff88002f43aea0
[ 2401.338683] ffff88003ea183b0 ffff88002f1b1fd8 ffff88002f1b1cd0 ffffffff81048960
[ 2401.346684] Call Trace:
[ 2401.349252] [] ? update_curr+0xb0/0x170
[ 2401.354983] [] __mutex_lock_slowpath+0x107/0x310
[ 2401.361480] [] mutex_lock+0x27/0x50
[ 2401.366791] [] flush_commit_list+0x137/0x6d0
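
To see at a glance which tasks are stuck, the hung-task headers in a log like the one above can be pulled out with a few lines of Python. This is only a minimal sketch: the function name blocked_tasks and the regular expression are assumptions based solely on the message format visible in this log, not on any kernel tool.

```python
import re

# Matches the hung-task header as it appears in this log, e.g.
# "[ 2400.956019] INFO: task kjournald:1038 blocked for more than 120 seconds."
# The task name may itself contain ':' or '/', so the PID is taken as the
# last run of digits before " blocked".
HUNG_TASK = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+INFO: task (?P<name>.+):(?P<pid>\d+) "
    r"blocked for more than (?P<secs>\d+) seconds\."
)


def blocked_tasks(log_text):
    """Return (timestamp, task name, pid) for each hung-task header found."""
    found = []
    for line in log_text.splitlines():
        m = HUNG_TASK.search(line)
        if m:
            found.append((float(m.group("ts")), m.group("name"), int(m.group("pid"))))
    return found


# Headers copied from the log above, with wrapped lines joined.
sample = """\
[ 2400.956019] INFO: task kjournald:1038 blocked for more than 120 seconds.
[ 2401.121893] INFO: task flush-253:2:3168 blocked for more than 120 seconds.
[ 2401.300586] INFO: task reiserfs/0:3204 blocked for more than 120 seconds.
"""

for ts, name, pid in blocked_tasks(sample):
    print(f"{ts:.6f} {name} (pid {pid})")
```

Here that lists kjournald, flush-253:2 and reiserfs/0, i.e. journal commit, writeback and the reiserfs workqueue all waiting within the same second.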

I can't rule out a hardware problem with 100% certainty, but this system had been running 2.6.27.x rock solid for years until then.
Can somebody see something interesting in those backtraces?
If I can provide further information, I'll be glad to assist...
BR
nik


-- 
-------------------------------------
Ing. Nikola CIPRICH
LinuxBox.cz, s.r.o.
28. rijna 168, 709 01 Ostrava

tel.: +420 596 603 142
fax: +420 596 621 273
mobil: +420 777 093 799

www.linuxbox.cz

mobil servis: +420 737 238 656
email servis: servis@linuxbox.cz
-------------------------------------

_______________________________________________
stable mailing list
stable@linux.kernel.org
http://linux.kernel.org/mailman/listinfo/stable