Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 12.04.2008 14:46:03 by Mark Seger

A long time ago I found a very peculiar timing bug in my open source
performance monitoring tool 'collectl' - I discovered that when glibc
went from version 2.3 to 2.4 it changed the time resolution from
microseconds to nanoseconds, going from 32 bits to 64 bits. It also
turned out that at the time the only distro to make the move to that newer glibc
was SuSE. Anyhow, that change broke Time::HiRes for any timing greater
than 4.2 seconds!

I contacted the author of HiRes and he fixed it in the 1.91 release and
things have been fine since. Then yesterday I got an email from a user
who reported unusual timing problems with inconsistent monitoring
intervals which stumped me because collectl does very precise timing,
down to usecs. After a lot of digging around I realized this was the
same problem. Furthermore I also noticed even RHEL5.1 is only using
HiRes 1.86, though I also see they're running glibc 2.5. My first fear
was this is gonna break everywhere but now I'm also thinking it may have
been glibc 2.4 specific.

What I would like to do is check the version of HiRes someone is using
along with which version of glibc they've got and warn them if there's a
problem. I do know I can get the version of HiRes via
Time::HiRes->VERSION, but don't know if there's any way to get a library
version. I know on redhat I can see a version in the library name, but
I don't know if that will always be the case on all distros. I also
don't want to put too much pain into this because things do seem to work
ok with 2.5 and so there may be a very small number of systems affected.
However I'm always looking to reduce support questions from users and
as the popularity of collectl grows I want to head off as much of this
sort of thing in the future if I can.
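
A minimal sketch of the check described above, written in Python rather than Perl purely for illustration. It assumes a Linux system where os.confstr("CS_GNU_LIBC_VERSION") reports the C library (the same value that `getconf GNU_LIBC_VERSION` prints), and it assumes that only glibc 2.4 is affected, which this thread has not confirmed. The names glibc_version, parse_glibc and should_warn are hypothetical.

```python
import os
import re


def parse_glibc(text):
    """Turn a string such as 'glibc 2.5' into (2, 5); return None if it does not match."""
    match = re.match(r"glibc (\d+)\.(\d+)", text or "")
    return (int(match.group(1)), int(match.group(2))) if match else None


def glibc_version():
    """Return (major, minor) of the running glibc, or None when it cannot be determined."""
    try:
        text = os.confstr("CS_GNU_LIBC_VERSION")
    except (AttributeError, ValueError, OSError):
        # No confstr on this platform, or the C library is not glibc.
        return None
    return parse_glibc(text)


def should_warn(hires_version, glibc):
    """Assumption: only glibc 2.4 combined with Time::HiRes older than 1.91 is a problem."""
    if glibc is None:
        return False
    # Time::HiRes version numbers are plain decimals, so a numeric compare is used here.
    return glibc == (2, 4) and float(hires_version) < 1.91


if __name__ == "__main__":
    # "1.86" is the version reported for RHEL5.1 earlier in this post.
    print(glibc_version(), should_warn("1.86", glibc_version()))
```

The HiRes version would come from Time::HiRes->VERSION on the Perl side; here it is simply passed in as a string.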

As a bonus question does anyone have any additional experiences with
versions of HiRes and glibc incompatibilities? And if so, am I right
that things are ok with 2.5?

-mark

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 13.04.2008 14:45:58 by smallpond

On Apr 12, 8:46 am, Mark Seger wrote:
> A long time ago I found a very peculiar timing bug in my open source
> performance monitoring tool 'collectl' - I discovered that when glibc
> went from version 2.3 to 2.4 it changed the time resolution from
> microseconds to nanoseconds, going from 32 bits to 64 bits. It also
> turned out that at the time the only distro to make the move to that newer glibc
> was SuSE. Anyhow, that change broke Time::HiRes for any timing greater
> than 4.2 seconds!


What call into glibc changed?

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 13.04.2008 16:18:01 by Mark Seger

smallpond wrote:
> On Apr 12, 8:46 am, Mark Seger wrote:
>> A long time ago I found a very peculiar timing bug in my open source
>> performance monitoring tool 'collectl' - I discovered that when glibc
>> went from version 2.3 to 2.4 it changed the time resolution from
>> microseconds to nanoseconds, going from 32 bits to 64 bits. It also
>> turned out that at the time the only distro to make the move to that newer glibc
>> was SuSE. Anyhow, that change broke Time::HiRes for any timing greater
>> than 4.2 seconds!
>
>
> What call into glibc changed?
I honestly don't know the details. What I do know is if you call ualarm
with a number greater than 4.2M (actually 2**32-1), it will NOT produce
the desired wait if you're using glibc 2.4. If you update HiRes to
V1.91 or greater it will. It seems that this is not a problem with
glibc 2.5 but it would be nice to hear some more confirmation about 2.5.
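
A quick arithmetic check on the numbers above, as a Python sketch. It rests on an assumption that nothing in this thread confirms: that the microsecond argument is converted to nanoseconds and then truncated to 32 bits somewhere along the way. Under that assumption the wrap point is 2**32 nanoseconds, which is about 4.29 seconds, whereas 2**32 microseconds would be more than an hour. The function names are hypothetical.

```python
UINT32_SPAN = 2**32      # number of distinct values in an unsigned 32-bit counter
NSEC_PER_USEC = 1_000


def seconds_until_wrap(units_per_second):
    """How many seconds a 32-bit counter of the given unit can hold before wrapping."""
    return UINT32_SPAN / units_per_second


def wrapped_delay_seconds(usec):
    """ASSUMED model, not taken from glibc source: usec -> nsec, truncated to 32 bits."""
    nsec = (usec * NSEC_PER_USEC) % UINT32_SPAN
    return nsec / 1e9


if __name__ == "__main__":
    print(seconds_until_wrap(10**9))          # nanosecond counter
    print(seconds_until_wrap(10**6))          # microsecond counter
    print(wrapped_delay_seconds(10_000_000))  # a 10 second request under the model
```

Under this model a request below the wrap point is unchanged, and a 10 second request would fire after roughly 1.41 seconds, which would look like the inconsistent intervals described earlier. That only shows the figures are compatible with such an overflow, not that this is what glibc 2.4 does.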

The following is from the change log for HiRes:

1.91 [2006-09-28]
- ualarm() in SuSE was overflowing after ~4.2 seconds,
probably due to a glibc bug, workaround by using the
setitimer() variant if either useconds or interval >= IV_1E6
(this case seems to vary between systems: are useconds
more than 999_999 for ualarm() defined or not)

Does this help?

-mark

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 13.04.2008 22:33:26 by smallpond

On Apr 13, 10:18 am, Mark Seger wrote:
> smallpond wrote:
> > On Apr 12, 8:46 am, Mark Seger wrote:
> >> A long time ago I found a very peculiar timing bug in my open source
> >> performance monitoring tool 'collectl' - I discovered that when glibc
> >> went from version 2.3 to 2.4 it changed the time resolution from
> >> microseconds to nanoseconds, going from 32 bits to 64 bits. It also
> >> turned out that at the time the only distro to make the move to that newer glibc
> >> was SuSE. Anyhow, that change broke Time::HiRes for any timing greater
> >> than 4.2 seconds!
>
> > What call into glibc changed?
>
> I honestly don't know the details. What I do know is if you call ualarm
> with a number greater than 4.2M (actually 2**32-1), it will NOT produce
> the desired wait if you're using glibc 2.4. If you update HiRes to
> V1.91 or greater it will. It seems that this is not a problem with
> glibc 2.5 but it would be nice to hear some more confirmation about 2.5.
>
> The following is from the change log for HiRes:
>
> 1.91 [2006-09-28]
> - ualarm() in SuSE was overflowing after ~4.2 seconds,
> probably due to a glibc bug, workaround by using the
> setitimer() variant if either useconds or interval >= IV_1E6
> (this case seems to vary between systems: are useconds
> more than 999_999 for ualarm() defined or not)
>
> Does this help?
>
> -mark


The useconds_t type is only defined to support values up to 1,000,000.
Depending on undefined behavior is a mistake on the caller's part.
The AIX C library also returns an error if values >1M are passed in;
it's not a glibc bug.
ualarm is replaced by setitimer, which has seconds and microseconds.
--S
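
A minimal Python sketch of the rule in the 1.91 change log entry quoted above: fall back to the setitimer() variant when either the delay or the interval is 1,000,000 microseconds or more, and hand setitimer a seconds part plus a microseconds part. This illustrates the idea only and is not the module's actual code; split_usec and choose_call are hypothetical names.

```python
USEC_PER_SEC = 1_000_000  # the IV_1E6 threshold named in the change log


def split_usec(usec):
    """Split a microsecond count into (seconds, microseconds), as setitimer expects."""
    if usec < 0:
        raise ValueError("delay must not be negative")
    return divmod(usec, USEC_PER_SEC)


def choose_call(usec, interval=0):
    """Pick the call per the change log rule: setitimer if either value is >= 1e6."""
    if usec >= USEC_PER_SEC or interval >= USEC_PER_SEC:
        return "setitimer"
    return "ualarm"


if __name__ == "__main__":
    print(choose_call(10_000_000), split_usec(10_000_000))
    print(choose_call(250_000), split_usec(250_000))
```

With the split, the microseconds part always stays below 1,000,000, so the caller no longer depends on how ualarm treats larger values.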

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 14.04.2008 13:50:52 by Mark Seger

> The useconds_t type is only defined to support values up to 1,000,000.
> Depending on undefined behavior is a mistake on the caller's part.
> The AIX C library also returns an error if values >1M are passed in;
> it's not a glibc bug.
> ualarm is replaced by setitimer, which has seconds and microseconds.
> --S

When I first started using sigalrm in my tool many years ago, I was
testing on some redhat 7.2 systems as well as redhat 9. I'm pretty
sure HiRes only called out time in usecs and didn't specify an upper
limit and it's just worked fine ever since until glibc 2.4. I'm not
sure what was changed internally but it still works just fine now as
long as you use a newer version of the module. There must be other
timer calls that do allow you to exceed 4.2 seconds because it wouldn't
make any sense to have a timer this accurate but not for longer
durations and so maybe HiRes determined which call to make based on your
request? Or maybe it just uses a call that does allow time >4.2 seconds?

All that said, I am very impressed with the accuracy of this timer
because I can literally get my code to within a clock tick of accuracy,
something I think is very lacking in most of the standard performance
monitoring tools - since many of them don't provide long term logging or
fine-grained timestamps, people just don't realize it.

-mark

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 15.04.2008 18:07:33 by Mark Seger

> The useconds_t type is only defined to support values up to 1,000,000.
> Depending on undefined behavior is a mistake on the caller's part.
> The AIX C library also returns an error if values >1M are passed in;
> it's not a glibc bug.
> ualarm is replaced by setitimer, which has seconds and microseconds.
> --S

I've been thinking about this some more, and I guess the question in my
mind is how did this ever work? Pre HiRes 1.91, ualarm of 10 seconds
works with glibc 2.3 and doesn't work with glibc 2.4.

Has nobody else tripped over this?

-mark

Re: Time::HiRes < 1.91 and glibc 2.4 incompatibility

on 15.04.2008 20:58:11 by Martijn Lievaart

On Tue, 15 Apr 2008 12:07:33 -0400, Mark Seger wrote:

>> The useconds_t type is only defined to support values up to 1,000,000.
>> Depending on undefined behavior is a mistake on the caller's part. The
>> AIX C library also returns an error if values >1M are passed in; it's
>> not a glibc bug.
>> ualarm is replaced by setitimer, which has seconds and microseconds.
>> --S
>
> I've been thinking about this some more, and I guess the question in my
> mind is how did this ever work? Pre HiRes 1.91, ualarm of 10 seconds
> works with glibc 2.3 and doesn't work with glibc 2.4.

If something is defined only when a condition holds, it is not defined
that it will fail when the condition doesn't hold. It might even work in
case the condition is not satisfied! But depending on that undefined
behaviour might break in the next release, as was most probably the case
here.

M4