sw raid5 hungs on resync and high IO load, 2.6.32.23
on 27.10.2010 09:35:17 by Martin Hamrle
Hi,
I'm having this issue on several boxes with several configurations.
One of them is a box with 8 drives attached to an ARC-1160 in pass-through
mode, with a sw raid5 built from these drives. There is also one drive for the OS.
During a resync or check under heavy IO load, the process tscpd (tscpd is an IO
load generator) hangs. The machine is still alive, but there are many blocked
processes.
After tscpd hangs, IO load is generated only by the resync. In the traceback you
can see blocked processes (ps, htop, cat) accessing the tscpd cmdline in
proc. Some tscpd threads are blocked while writing files to the fs on
raid5. Reading these files also blocks, while reading other files in the
filesystem is as fast as usual. This state lasted 110 minutes. After that,
all blocked processes continued their work.
I am not sure what ended this weird state. I think
it was triggered by starting to copy the kernel source onto the array.
Note that this is the first time the hung processes have woken up; I had never
waited this long before.
I think it is related to sw raid because I do not see this issue on
hw raid or on sw raid without a resync.
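To record whether a resync or check was actually running when a hang is observed, the md sysfs attribute sync_action can be read. The following is only a sketch: the array name md0 is an assumption (the thread does not name the device), and the helper md_sync_action with its optional sysfs-root argument is hypothetical, added so it can be tried against a scratch directory.

```sh
#!/bin/sh
# Print the current sync action of an md array (for example "idle",
# "resync" or "check").
#   $1: array name, e.g. md0 (assumed; not stated in the thread)
#   $2: sysfs block directory, defaults to /sys/block
md_sync_action() {
    sysfs_root="${2:-/sys/block}"
    cat "$sysfs_root/$1/md/sync_action"
}

# Typical use on the affected box:
#   md_sync_action md0
#   cat /proc/mdstat
```

Logging this value together with a timestamp each time the hang appears would make it easier to confirm that the hang only occurs while the array is not idle.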
kern.log contains the initial "INFO: task collectd:2577 blocked for more
than 120 seconds"
and two dumps produced with
echo w > /proc/sysrq-trigger
The log is located at http://files.nangu.tv/kernel/kern.log
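For anyone trying to reproduce this, the dump step can be wrapped as below. This is a minimal sketch, not the exact procedure used here: the function name dump_blocked_tasks and the output path in the usage comment are hypothetical, and the trigger path is a parameter only so the function can be exercised without root.

```sh
#!/bin/sh
# Ask the kernel to log stack traces of all blocked (uninterruptible) tasks
# by writing "w" to the sysrq trigger.
#   $1: trigger file, defaults to /proc/sysrq-trigger (needs root)
dump_blocked_tasks() {
    trigger="${1:-/proc/sysrq-trigger}"
    printf 'w\n' > "$trigger"
}

# Typical use on the affected box (as root), saving the ring buffer afterwards:
#   dump_blocked_tasks
#   dmesg > /tmp/blocked-tasks.txt    # hypothetical output path
```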
Let me know if you need more info.
Martin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
on 27.10.2010 10:01:17 by NeilBrown
On Wed, 27 Oct 2010 09:35:17 +0200
Martin Hamrle wrote:
> Hi,
>
> I'm having this issue on several boxes with several configuration.
> One of them is a box with 8 drives attached to ARC-1160 in pass through
> mode and build sw raid5 from these drives. There is also one drive to OS.
>
> During resync or check and heavy IO load, process tscpd (tscpd is IO
> load maker) hungs, the machine is still alive but there are many blocked
> processes.
> After tscpd hungs, IO load is generated only by resync. In traceback you
> can see blocked processes (ps, htop cat) accessing tscpd cmdline in
> proc. Some tscpd threads is blocked during writing files into fs on
> raid5. Reading these files is also blocking, reading other files in
> filesystem is fast as usual. This state takes 110 minutes. After that
> all blocked processes continue their work.
>
> I am not sure what is the reason of the end of the weird state. I think
> the end was caused by starting copying kernel source into array.
>
> Note that this is first time when hung processes wake up I never wait so
> long.
>
> I think that it is related to sw raid because I do not see this issue on
> hw raid or on sw raid without resync.
>
> kern.log contains initial "INFO: task collectd:2577 blocked for more
> than 120 seconds"
> and two dumps
> echo w > /proc/sysrq-trigger
>
> log is located http://files.nangu.tv/kernel/kern.log
> Let me know if you need more info.
>
When I try to access your kern.log I get
403 - Forbidden
Just include it in-line in the email.
NeilBrown
Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
on 27.10.2010 10:50:10 by Mikael Abrahamsson
On Wed, 27 Oct 2010, Martin Hamrle wrote:
> I am not sure what is the reason of the end of the weird state. I think
> the end was caused by starting copying kernel source into array.
It might be a 2.6.32 problem. Two days ago I booted Ubuntu 10.04 LTS live from
a USB stick, mounted an external USB drive, and started dd'ing my laptop drive
to the external drive. To check the progress/speed I then ran
"apt-get install sysstat" (to get iostat). This install did not complete until
the dd was over. About 40 gigs into the dd I also ran "sync", which likewise
blocked until the dd was over.
So basically, dd'ing an internal 80 gig drive to an external USB hd made two
commands ("apt-get install" and "sync") block and not complete until the
write pressure from dd was over. There might be something rotten here...
This was on a Thinkpad X200 laptop with 4 gigs of RAM.
--
Mikael Abrahamsson email: swmike@swm.pp.se
Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
on 27.10.2010 12:48:13 by Martin Hamrle
On 27.10.2010 10:01, Neil Brown wrote:
> On Wed, 27 Oct 2010 09:35:17 +0200
> Martin Hamrle wrote:
>
>> Hi,
>>
>> I'm having this issue on several boxes with several configuration.
>> One of them is a box with 8 drives attached to ARC-1160 in pass through
>> mode and build sw raid5 from these drives. There is also one drive to OS.
>>
>> During resync or check and heavy IO load, process tscpd (tscpd is IO
>> load maker) hungs, the machine is still alive but there are many blocked
>> processes.
>> After tscpd hungs, IO load is generated only by resync. In traceback you
>> can see blocked processes (ps, htop cat) accessing tscpd cmdline in
>> proc. Some tscpd threads is blocked during writing files into fs on
>> raid5. Reading these files is also blocking, reading other files in
>> filesystem is fast as usual. This state takes 110 minutes. After that
>> all blocked processes continue their work.
>>
>> I am not sure what is the reason of the end of the weird state. I think
>> the end was caused by starting copying kernel source into array.
>>
>> Note that this is first time when hung processes wake up I never wait so
>> long.
>>
>> I think that it is related to sw raid because I do not see this issue on
>> hw raid or on sw raid without resync.
>>
>> kern.log contains initial "INFO: task collectd:2577 blocked for more
>> than 120 seconds"
>> and two dumps
>> echo w> /proc/sysrq-trigger
>>
>> log is located http://files.nangu.tv/kernel/kern.log
>> Let me know if you need more info.
>>
> When I try to access your kern.log I get
>
> 403 - Forbidden
Sorry about that, it is fixed now.
> Just include it in-line in the email.
>
> NeilBrown
Re: sw raid5 hungs on resync and high IO load, 2.6.32.23
on 15.11.2010 02:51:40 by NeilBrown
On Wed, 27 Oct 2010 12:48:13 +0200
Martin Hamrle wrote:
>
> On 27.10.2010 10:01, Neil Brown wrote:
> > On Wed, 27 Oct 2010 09:35:17 +0200
> > Martin Hamrle wrote:
> >
> >> Hi,
> >>
> >> I'm having this issue on several boxes with several configuration.
> >> One of them is a box with 8 drives attached to ARC-1160 in pass through
> >> mode and build sw raid5 from these drives. There is also one drive to OS.
> >>
> >> During resync or check and heavy IO load, process tscpd (tscpd is IO
> >> load maker) hungs, the machine is still alive but there are many blocked
> >> processes.
> >> After tscpd hungs, IO load is generated only by resync. In traceback you
> >> can see blocked processes (ps, htop cat) accessing tscpd cmdline in
> >> proc. Some tscpd threads is blocked during writing files into fs on
> >> raid5. Reading these files is also blocking, reading other files in
> >> filesystem is fast as usual. This state takes 110 minutes. After that
> >> all blocked processes continue their work.
> >>
> >> I am not sure what is the reason of the end of the weird state. I think
> >> the end was caused by starting copying kernel source into array.
> >>
> >> Note that this is first time when hung processes wake up I never wait so
> >> long.
> >>
> >> I think that it is related to sw raid because I do not see this issue on
> >> hw raid or on sw raid without resync.
> >>
> >> kern.log contains initial "INFO: task collectd:2577 blocked for more
> >> than 120 seconds"
> >> and two dumps
> >> echo w> /proc/sysrq-trigger
> >>
> >> log is located http://files.nangu.tv/kernel/kern.log
> >> Let me know if you need more info.
> >>
> > When I try to access your kern.log I get
> >
> > 403 - Forbidden
> Sorry about that, it is fixed now
Thanks.
Unfortunately it doesn't really show anything interesting. Just lots of
threads waiting on locks and such, nothing that even points to a problem with
md.
However some of the back traces are missing. Notice the lines:
Oct 19 13:15:01 osn02 kernel: [72048.851702] md: using 128k window, over a total of 244198464 blocks.
Oct 19 13:38:54 osn02 kernel: 009] [] ? congestion_wait+0x66/0x80
Between those there should be quite a lot of other stack trace info, but the
kernel log buffer wasn't big enough to hold everything so some got lost.
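A rough way to spot such gaps in a saved log is to look for kernel lines whose printk timestamp field is missing or damaged, as in the second line quoted above. The sketch below assumes syslog-style lines containing "kernel: " followed by a "[seconds.microseconds]" timestamp; the function name find_truncated_lines is hypothetical, and a log written without printk timestamps would be flagged in full.

```sh
#!/bin/sh
# Print kernel log lines that lack a well-formed "[12345.678901] " timestamp
# right after "kernel: ". Such lines may mark a point where the ring buffer
# wrapped and earlier output was lost.
#   $1: path to a saved kern.log
find_truncated_lines() {
    grep 'kernel: ' "$1" | grep -Ev 'kernel: \[ *[0-9]+\.[0-9]+] '
}

# Typical use:
#   find_truncated_lines /var/log/kern.log
```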
If you boot with
log_buf_len=1M
it will make the log buffer larger so you won't lose anything. That *might*
be more helpful, but I cannot promise anything.
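After rebooting, it is worth confirming that the option actually reached the kernel; the boot parameter is spelled log_buf_len, with underscores. A small sketch, assuming the running kernel's command line is read from /proc/cmdline; the helper name get_log_buf_len is hypothetical, and the file argument exists only so it can be tried on a sample file.

```sh
#!/bin/sh
# Print the value of log_buf_len from a kernel command line, or nothing
# if the parameter is absent.
#   $1: file holding the command line, defaults to /proc/cmdline
get_log_buf_len() {
    cmdline_file="${1:-/proc/cmdline}"
    # Put each parameter on its own line, then strip the matching prefix.
    tr ' ' '\n' < "$cmdline_file" | sed -n 's/^log_buf_len=//p'
}

# Typical use:
#   get_log_buf_len
```

An empty result means the parameter was not passed, so the default buffer size is still in effect.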
NeilBrown
>
> > Just include it in-line in the email.
> >
> > NeilBrown