Sysfs update frequency

on 16.03.2010 22:32:55 by Justin Maggard

I've noticed on recent kernels that /sys/block/md?/md/sync_completed
seems to rarely get updated. What is the expected update interval?
For me, it seems to only update about once every 6% or so during the
resync. Of course, /proc/mdstat has the actual current progress.

Thanks,
-Justin
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html

Re: Sysfs update frequency

on 16.03.2010 22:52:56 by NeilBrown

On Tue, 16 Mar 2010 14:32:55 -0700
Justin Maggard wrote:

> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
> seems to rarely get updated. What is the expected update interval?
> For me, it seems to only update about once every 6% or so during the
> resync. Of course, /proc/mdstat has the actual current progress.

The expected update time is every 6% - actually 1/16 which is 6.25%.

sync_completed includes a guarantee that all blocks before this point really
have been processed. The number in /proc/mdstat is less precise. That much
of the array has been resynced, but due to the possibility of out-of-order
completion of writes they may not be a contiguous series of blocks.

Providing the guarantee (which is needed for externally-managed metadata)
requires briefly stalling the resync, so I didn't want to do it more often.
I could possibly make it time-based instead of size-based though.

Is this a problem for you?

NeilBrown
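
A minimal sketch of reading the guaranteed resync point from sysfs. It assumes the attribute holds two sector counts separated by a slash (or the word "none" when no sync is running), as the kernel's md documentation describes. The device name md0 and the function names are illustrative only and do not belong to any md tool.

```python
from fractions import Fraction
from typing import Optional, Tuple


def parse_sync_completed(text: str) -> Optional[Tuple[int, int]]:
    """Parse 'done / total' (in sectors); return None for 'none' or unparsable input."""
    parts = text.strip().split("/")
    if len(parts) != 2:
        return None
    try:
        done, total = int(parts[0]), int(parts[1])
    except ValueError:
        return None
    if total <= 0:
        return None
    return done, total


def guaranteed_fraction(text: str) -> Optional[Fraction]:
    """Fraction of the sync range that is guaranteed to be processed."""
    parsed = parse_sync_completed(text)
    if parsed is None:
        return None
    done, total = parsed
    return Fraction(done, total)


def read_sync_completed(md_name: str = "md0") -> str:
    """Read the raw attribute; 'md0' is an assumed device name."""
    with open(f"/sys/block/{md_name}/md/sync_completed") as f:
        return f.read()


if __name__ == "__main__":
    # Sample string instead of a live array, so the sketch runs anywhere.
    sample = "1250 / 20000\n"
    print(guaranteed_fraction(sample))
```

Because the value only moves at 1/16 steps, a monitor built on it will show coarse progress; /proc/mdstat remains the place to look for a finer but less strict figure.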

Re: Sysfs update frequency

on 16.03.2010 23:25:06 by Justin Maggard

On Tue, Mar 16, 2010 at 2:52 PM, Neil Brown wrote:
> On Tue, 16 Mar 2010 14:32:55 -0700
> Justin Maggard wrote:
>
>> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
>> seems to rarely get updated. What is the expected update interval?
>> For me, it seems to only update about once every 6% or so during the
>> resync. Of course, /proc/mdstat has the actual current progress.
>
> The expected update time is every 6% - actually 1/16 which is 6.25%.
>
> sync_completed includes a guarantee that all blocks before this point really
> have been processed. The number in /proc/mdstat is less precise. That much
> of the array has been resynced, but due to the possibility of out-of-order
> completion of writes they may not be a contiguous series of blocks.
>
> Providing the guarantee (which is needed for externally-managed metadata)
> requires briefly stalling the resync, so I didn't want to do it more often.
> I could possibly make it time-based instead of size-based though.
>
> Is this a problem for you?
>

Thanks for the info. No, it's not much of a problem, really. Just
seemed strange that an array of 2TB disks could resync for an hour
with no update to sync_completed. I thought I remembered older
kernels updating a lot more frequently, but I could be wrong about
that. So I take it that point is where the resync would resume if the
system was rebooted?

-Justin
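
A rough back-of-the-envelope check of the hour-long gap, assuming the 1/16 update interval described earlier in the thread and a sync range of 2 TB. The 35 MB/s resync rate is an assumed figure chosen for illustration, not a number from the thread. One sixteenth of 2 TB is 125 GB, so at that assumed rate an update would arrive roughly once an hour.

```python
def seconds_between_updates(sync_bytes: float, rate_bytes_per_s: float,
                            checkpoints: int = 16) -> float:
    """Time needed to resync one checkpoint interval (1/checkpoints of the range)."""
    if rate_bytes_per_s <= 0 or checkpoints < 1:
        raise ValueError("rate must be positive and checkpoints at least 1")
    return sync_bytes / checkpoints / rate_bytes_per_s


if __name__ == "__main__":
    two_tb = 2e12          # 2 TB sync range (decimal terabytes)
    assumed_rate = 35e6    # assumed resync speed: 35 MB/s
    gap = seconds_between_updates(two_tb, assumed_rate)
    print(f"{two_tb / 16 / 1e9:.0f} GB per interval, "
          f"about {gap / 60:.0f} minutes between updates")
```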

Re: Sysfs update frequency

on 17.03.2010 00:03:08 by Michael Evans

On Tue, Mar 16, 2010 at 3:25 PM, Justin Maggard wrote:
> On Tue, Mar 16, 2010 at 2:52 PM, Neil Brown wrote:
>> On Tue, 16 Mar 2010 14:32:55 -0700
>> Justin Maggard wrote:
>>
>>> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
>>> seems to rarely get updated. What is the expected update interval?
>>> For me, it seems to only update about once every 6% or so during the
>>> resync. Of course, /proc/mdstat has the actual current progress.
>>
>> The expected update time is every 6% - actually 1/16 which is 6.25%.
>>
>> sync_completed includes a guarantee that all blocks before this point really
>> have been processed. The number in /proc/mdstat is less precise. That much
>> of the array has been resynced, but due to the possibility of out-of-order
>> completion of writes they may not be a contiguous series of blocks.
>>
>> Providing the guarantee (which is needed for externally-managed metadata)
>> requires briefly stalling the resync, so I didn't want to do it more often.
>> I could possibly make it time-based instead of size-based though.
>>
>> Is this a problem for you?
>>
>
> Thanks for the info. No, it's not much of a problem, really. Just
> seemed strange that an array of 2TB disks could resync for an hour
> with no update to sync_completed. I thought I remembered older
> kernels updating a lot more frequently, but I could be wrong about
> that. So I take it that point is where the resync would resume if the
> system was rebooted?
>
> -Justin

Rather than a time basis, would it be possible to have a sysfs
parameter which could be tuned via write?

Candidates for this would be something like:

sync_flushes_per_action (fractional unit, every 1/N of the device)

OR

sync_flush_stripes
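
To make the first candidate concrete, here is a small sketch of where the guaranteed point would be published if the device were divided into N equal parts. The names sync_flushes_per_action and sync_flush_stripes are proposals from this message, not existing sysfs attributes, and the function below is purely illustrative.

```python
from typing import List


def checkpoint_positions(total_sectors: int, flushes_per_action: int) -> List[int]:
    """Sector offsets at which the guaranteed point would be updated.

    flushes_per_action models the proposed (hypothetical) tunable:
    one update after every 1/N of the device.
    """
    if total_sectors < 0 or flushes_per_action < 1:
        raise ValueError("need total_sectors >= 0 and flushes_per_action >= 1")
    return [total_sectors * k // flushes_per_action
            for k in range(1, flushes_per_action + 1)]


if __name__ == "__main__":
    # N = 16 reproduces the fixed 1/16 interval discussed in the thread.
    print(checkpoint_positions(1600, 16))
```

A larger N gives finer-grained updates at the cost of stalling the resync more often, which is the trade-off raised earlier in the thread.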

Re: Sysfs update frequency

on 20.03.2010 17:48:28 by Bill Davidsen

Neil Brown wrote:
> On Tue, 16 Mar 2010 14:32:55 -0700
> Justin Maggard wrote:
>
>
>> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
>> seems to rarely get updated. What is the expected update interval?
>> For me, it seems to only update about once every 6% or so during the
>> resync. Of course, /proc/mdstat has the actual current progress.
>>
>
> The expected update time is every 6% - actually 1/16 which is 6.25%.
>
> sync_completed includes a guarantee that all blocks before this point really
> have been processed. The number in /proc/mdstat is less precise. That much
> of the array has been resynced, but due to the possibility of out-of-order
> completion of writes they may not be a contiguous series of blocks.
>
>
Couldn't you just track the outstanding writes by LBA (or similar) and
report that the completion is one less than the lowest write still
outstanding? Since you would only do it when the user requests it, I
don't think the overhead of a list scan or similar would be a show
stopper. Or is that approach too simplistic?
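
The idea in the paragraph above can be sketched as follows. This is a toy model, not md code: it assumes writes are issued in ascending LBA order, one block at a time, and the class and method names are made up for illustration.

```python
class OutstandingWrites:
    """Toy tracker: writes are issued in ascending LBA order and may
    complete in any order."""

    def __init__(self) -> None:
        self._outstanding = set()   # LBAs issued but not yet completed
        self._next_lba = 0          # next LBA to be issued

    def issue(self) -> int:
        """Issue a write for the next LBA and remember it as outstanding."""
        lba = self._next_lba
        self._outstanding.add(lba)
        self._next_lba += 1
        return lba

    def complete(self, lba: int) -> None:
        """Record completion of a previously issued write."""
        self._outstanding.remove(lba)

    def last_contiguous_lba(self) -> int:
        """One less than the lowest outstanding LBA; -1 if nothing is done.

        The min() scan runs only on request, as suggested above.
        """
        if self._outstanding:
            return min(self._outstanding) - 1
        return self._next_lba - 1


if __name__ == "__main__":
    t = OutstandingWrites()
    for _ in range(5):
        t.issue()
    for lba in (0, 1, 3):       # write 2 is still in flight
        t.complete(lba)
    print(t.last_contiguous_lba())
```

The query itself is cheap, but the set has to be updated on every issue and every completion, whether or not anyone ever asks.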

> Providing the guarantee (which is needed for externally-managed metadata)
> requires briefly stalling the resync, so I didn't want to do it more often.
> I could possibly make it time-based instead of size-based though.
>

Is perfect accuracy needed, just as long as you don't promise to have
synced more than you have? Are you using barriers to be sure the data is
all the way to the platter, or is your stall just "to the device"
anyway? Like any snapshot of a dynamic process, by the time you get the
information it's out of date in any case, so I think a "at least this
much has moved to the device" value would serve.

--
Bill Davidsen
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein


Re: Sysfs update frequency

on 23.03.2010 04:22:21 by NeilBrown

On Sat, 20 Mar 2010 12:48:28 -0400
Bill Davidsen wrote:

> Neil Brown wrote:
> > On Tue, 16 Mar 2010 14:32:55 -0700
> > Justin Maggard wrote:
> >
> >
> >> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
> >> seems to rarely get updated. What is the expected update interval?
> >> For me, it seems to only update about once every 6% or so during the
> >> resync. Of course, /proc/mdstat has the actual current progress.
> >>
> >
> > The expected update time is every 6% - actually 1/16 which is 6.25%.
> >
> > sync_completed includes a guarantee that all blocks before this point really
> > have been processed. The number in /proc/mdstat is less precise. That much
> > of the array has been resynced, but due to the possibility of out-of-order
> > completion of writes they may not be a contiguous series of blocks.
> >
> >
> Couldn't you just track the outstanding writes by LBA (or similar) and
> report that the completion is one less than the lowest write still
> outstanding? Since you would only do it when the user requests it, I
> don't think the overhead of a list scan or similar would be a show
> stopper. Or is that approach too simplistic?

I'd have to create a data structure to which I add and remove these LBAs at a
significant rate. It isn't really worth the effort.

>
> > Providing the guarantee (which is needed for externally-managed metadata)
> > requires briefly stalling the resync, so I didn't want to do it more often.
> > I could possibly make it time-based instead of size-based though.
> >
>
> Is perfect accuracy needed, just as long as you don't promise to have
> synced more than you have? Are you using barriers to be sure the data is
> all the way to the platter, or is your stall just "to the device"
> anyway? Like any snapshot of a dynamic process, by the time you get the
> information it's out of date in any case, so I think a "at least this
> much has moved to the device" value would serve.
>

The information may be used to update metadata, so it is critical that it
doesn't say more than is true. It is safe for it to say less than is true.

A metadata update would always be preceded by a barrier so that the data on
the device is consistent.

"at least this much has moved" isn't much good if it only tells us how many
blocks, not which ones.
The value in sync_completed says "at least all the blocks up to this one have
been synced" which is exactly the information that I want.

NeilBrown
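
A small illustration of the distinction drawn above between "how many blocks" and "which ones", using a made-up set of completed block numbers. It is a sketch of the reasoning only and says nothing about how md stores this state.

```python
from typing import Set


def completed_count(done: Set[int]) -> int:
    """How many blocks have completed, regardless of position."""
    return len(done)


def contiguous_prefix(done: Set[int]) -> int:
    """Number of blocks, counted from block 0, that have all completed.

    This is the kind of value that is safe to record in metadata:
    it may say less than is true, but never more.
    """
    n = 0
    while n in done:
        n += 1
    return n


if __name__ == "__main__":
    # Out-of-order completion: block 3 is still in flight.
    done = {0, 1, 2, 4, 5}
    print(completed_count(done), contiguous_prefix(done))
```

With block 3 outstanding, five blocks have completed but only the first three are known to be in sync, so a resume point taken from the count alone would skip a block that was never written.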

Re: Sysfs update frequency

on 24.03.2010 20:49:28 by Bill Davidsen

Neil Brown wrote:
> On Sat, 20 Mar 2010 12:48:28 -0400
> Bill Davidsen wrote:
>
>
>> Neil Brown wrote:
>>
>>> On Tue, 16 Mar 2010 14:32:55 -0700
>>> Justin Maggard wrote:
>>>
>>>
>>>
>>>> I've noticed on recent kernels that /sys/block/md?/md/sync_completed
>>>> seems to rarely get updated. What is the expected update interval?
>>>> For me, it seems to only update about once every 6% or so during the
>>>> resync. Of course, /proc/mdstat has the actual current progress.
>>>>
>>>>
>>> The expected update time is every 6% - actually 1/16 which is 6.25%.
>>>
>>> sync_completed includes a guarantee that all blocks before this point really
>>> have been processed. The number in /proc/mdstat is less precise. That much
>>> of the array has been resynced, but due to the possibility of out-of-order
>>> completion of writes they may not be a contiguous series of blocks.
>>>
>>>
>>>
>> Couldn't you just track the outstanding writes by LBA (or similar) and
>> report that the completion is one less than the lowest write still
>> outstanding? Since you would only do it when the user requests it, I
>> don't think the overhead of a list scan or similar would be a show
>> stopper. Or is that approach too simplistic?
>>
>
> I'd have to create a data structure to which I add and remove these LBAs at a
> significant rate. It isn't really worth the effort.
>
>
I thought the current data on outstanding writes could be scanned.
Clearly you have the information somewhere, and while a scan item by
item is ugly and slow, it's in memory and all done only on user request,
so overall overhead is minimal.
>>> Providing the guarantee (which is needed for externally-managed metadata)
>>> requires briefly stalling the resync, so I didn't want to do it more often.
>>> I could possibly make it time-based instead of size-based though.
>>>
>>>
>> Is perfect accuracy needed, just as long as you don't promise to have
>> synced more than you have? Are you using barriers to be sure the data is
>> all the way to the platter, or is your stall just "to the device"
>> anyway? Like any snapshot of a dynamic process, by the time you get the
>> information it's out of date in any case, so I think a "at least this
>> much has moved to the device" value would serve.
>>
>>
>
> The information may be used to update metadata, so it is critical that it
> doesn't say more than is true. It is safe for it to say less than is true.
>
> A metadata update would always be preceded by a barrier so that the data on
> the device is consistent.
>
> "at least this much has moved" isn't much good if it only tells us how many
> blocks, not which ones.
> The value in sync_completed says "at least all the blocks up to this one have
> been synced" which is exactly the information that I want.
>
>
That's why I wanted the LBA of the last contiguous sector written; the
lowest LBA initiated but not completed is one greater than that.

--
Bill Davidsen
"We can't solve today's problems by using the same thinking we
used in creating them." - Einstein
