Profiling Perl with better granularity than Devel::DProf

on 08.10.2007 23:22:13 by Clint Olsen

Hi:

I have managed to get a profile with Devel::DProf, and as I suspected, my
REs from one subroutine are taking the most time, however, I would like to
know if it's possible to get a bit more information than that. For
example, finding out if any of my REs in particular are inefficient in some
way would be helpful.

I ran VTune just for giggles, and I found that I am spending most of my
time in memcpy. I was curious why that was, so I ran callgraph profiling
and got the following execution graph:

main->perlrun->Perl_runops->Perl_regexec->Perl_savepvn->memcpy

I tried to do a little Googlin' to figure out what savepvn is all about, but
I wasn't able to dig up much from reading perldebguts.

So, I've got two profiles, one that's a bit too general, and one that is
detailed but so low-level I am not sure how to easily relate it to my Perl
source code.

Any suggestions?

Thanks,

-Clint

Re: Profiling Perl with better granularity than Devel::DProf

on 08.10.2007 23:44:34 by xhoster

Clint Olsen wrote:
> Hi:
>
> I have managed to get a profile with Devel::DProf, and as I suspected, my
> REs from one subroutine are taking the most time, however, I would like
> to know if it's possible to get a bit more information than that. For
> example, finding out if any of my REs in particular are inefficient in
> some way would be helpful.
>
> I ran VTune just for giggles, and I found that I am spending most of my
> time in memcpy. I was curious why that was, so I ran callgraph profiling
> and got the following execution graph:
>
> main->perlrun->Perl_runops->Perl_regexec->Perl_savepvn->memcpy
>
> I tried to do a little Googlin' to figure out what savepvn is all about,
> but I wasn't able to dig up much from reading perldebguts.
>
> So, I've got two profiles, one that's a bit too general, and one that is
> detailed but so low-level I am not sure how to easily relate it to my
> Perl source code.
>
> Any suggestions?

What is your goal? If you just want to figure out which one out of several
regexes in that one subroutine is the culprit, you could use
Devel::SmallProf. I'm generally not all that fond of SmallProf, but this
is the kind of thing it could work well for. Sometimes it dramatically
increases the program's run time so you may want to profile a reduced input
set to shorten the profiling time, if that is possible.

Or you could move each regex into a separate dummy subroutine so that
DProf can pick them up. Or just comment out each regex, one at a time
or in a binary search pattern (assuming you can do so without affecting
downstream execution) and see which one makes the run time drop.
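
A minimal sketch of the dummy-subroutine idea, assuming a lexer that walks
the input with \G and /gc. The sub names (match_space, match_identifier,
match_number, tokenize) and the token patterns are hypothetical and are not
taken from the original program. Because each regex lives in its own named
sub, a subroutine-level profiler such as DProf can attribute time to each
pattern separately.

```perl
use strict;
use warnings;

# One named sub per pattern (hypothetical token classes).
# Each takes a reference to the input so pos() stays on the original scalar.
sub match_space      { my ($ref) = @_; return scalar($$ref =~ /\G\s+/gc) }
sub match_identifier { my ($ref) = @_; return scalar($$ref =~ /\G[A-Za-z_]\w*/gc) }
sub match_number     { my ($ref) = @_; return scalar($$ref =~ /\G\d+(?:\.\d+)?/gc) }

# Walk the input, trying each pattern sub at the current position.
sub tokenize {
    my ($text) = @_;
    my @tokens;
    pos($text) = 0;
    while (pos($text) < length $text) {
        my $start = pos($text);
        my $type;
        if    (match_space(\$text))      { next }             # skip whitespace
        elsif (match_identifier(\$text)) { $type = 'ident' }
        elsif (match_number(\$text))     { $type = 'number' }
        else  { die "Unrecognized input at offset $start\n" }
        push @tokens, [ $type, substr($text, $start, pos($text) - $start) ];
    }
    return @tokens;
}

print "$_->[0]: $_->[1]\n" for tokenize("foo 42 bar");
```

Running the script under the profiler (perl -d:DProf script.pl, then
dprofpp) should then list each match_* sub on its own line. The extra sub
call adds overhead of its own, so the absolute numbers are only a rough
guide to the relative cost of the patterns.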

Or you could look at the regexes and, knowing what kind of input they
typically operate on, take an educated guess as to which one would be
likely to be doing an exponential amount of backtracking. I'm not sure I
could explain in a compact way how to do this, but usually I know it when I
see it.

Xho


Re: Profiling Perl with better granularity than Devel::DProf

on 08.10.2007 23:59:38 by Clint Olsen

On 2007-10-08, xhoster@gmail.com wrote:
> What is your goal? If you just want to figure out which one out of
> several regexes in that one subroutine is the culprit, you could use
> Devel::SmallProf. I'm generally not all that fond of SmallProf, but this
> is the kind of thing it could work well for. Sometimes it dramatically
> increases the program's run time so you may want to profile a reduced
> input set to shorten the profiling time, if that is possible.

The goal is just checking that I'm not shooting myself in the foot for any
reason. This thing is only going to get slower as I continue to add code,
so I was just checkpointing my effort so far. My parsing takes on the
order of 10 seconds with my current input set, so I'm all set for runtime
at the moment.

> Or you could move each regex into a separate dummy subroutine so that
> DProf can pick them up. Or just comment out each regex, one at a time or
> in a binary search pattern (assuming you can do so without affecting
> downstream execution) and see which one makes the run time drop.

Yeah, I'd have to probably split them up into subs in order to get what I
want. Since this is a lexer, I cannot afford to disable some patterns
and have things still work. Of course, these kinds of transformations are
very invasive and I'd spend quite a bit of time getting it working right.
The problem with these kinds of programs is that they are a challenge to
get working right (and well), and they are very brittle.

> Or you could look at the regexes and, knowing what kind of input they
> typically operate on, take an educated guess as to which one would be
> likely to be doing an exponential amount of backtracking. I'm not sure I
> could explain in a compact way how to do this, but usually I know it when
> I see it.

Yeah, I already did this for a couple of the more difficult REs. I printed
out debug traces in (?{ }) blocks so it became clear when they were
backtracking, and I felt the pattern should match without it.
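
A rough sketch of that kind of trace, assuming a toy pattern rather than one
of the real lexer patterns. The counter $hits and the sub count_attempts are
hypothetical names. The (?{ }) block runs each time the engine gets past
"a+", so a count above one for a single match attempt suggests the engine
backed off and retried.

```perl
use strict;
use warnings;

# Package variable so the embedded code block can see it on older perls.
our $hits = 0;

# Toy pattern: a run of "a" that must reach the end of the string.
# The counter is deliberately not localized, so increments survive backtracking.
sub count_attempts {
    my ($text) = @_;
    $hits = 0;
    my $matched = $text =~ /^a+(?{ $hits++ })$/ ? 1 : 0;
    return ($matched, $hits);
}

for my $input ("aaaa", "aaaab") {
    my ($matched, $count) = count_attempts($input);
    print "$input: matched=$matched, code block ran $count time(s)\n";
}
```

The second input cannot match, so the engine gives back one "a" at a time
and re-enters the code block on each retry. Replacing the increment with a
print of pos() gives a fuller trace at the cost of much more output.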

Thanks,

-Clint

Re: Profiling Perl with better granularity than Devel::DProf

on 09.10.2007 05:02:22 by Ben Morrow

Quoth Clint Olsen:
>
> I have managed to get a profile with Devel::DProf, and as I suspected, my
> REs from one subroutine are taking the most time, however, I would like to
> know if it's possible to get a bit more information than that. For
> example, finding out if any of my REs in particular are inefficient in some
> way would be helpful.
>
> I ran VTune just for giggles, and I found that I am spending most of my
> time in memcpy. I was curious why that was, so I ran callgraph profiling
> and got the following execution graph:
>
> main->perlrun->Perl_runops->Perl_regexec->Perl_savepvn->memcpy

Just out of interest: which version of Perl is this? 5.8.8 doesn't have
a Perl_regexec, though it does have a Perl_regexec_flags and a
Perl_pregexec.

> I tried to do a little Googlin' to figure out what savepvn is all about, but
> I wasn't able to dig up much from reading perldebguts.

The Perl_* functions are documented in perlapi (perldebguts is for the
internals of the Perl-level debugger). Perl_savepvn is a simple
string-copy function. AFAICT, the only time it is called directly from
Perl_regexec_flags is when populating the $N match variables (and $`,
$', $&; but you aren't using those, are you? Check if necessary with
Devel::SawAmpersand) so you may want to see if you can reduce the number
of sets of capturing parens.

Ben

Re: Profiling Perl with better granularity than Devel::DProf

on 09.10.2007 17:53:40 by Clint Olsen

On 2007-10-09, Ben Morrow wrote:
> Just out of interest: which version of Perl is this? 5.8.8 doesn't have
> a Perl_regexec, though it does have a Perl_regexec_flags and a
> Perl_pregexec.

This is Perl 5.8.7, I believe.

> The Perl_* functions are documented in perlapi (perldebguts is for the
> internals of the Perl-level debugger). Perl_savepvn is a simple
> string-copy function. AFAICT, the only time it is called directly from
> Perl_regexec_flags is when populating the $N match variables (and $`, $',
> $&; but you aren't using those, are you? Check if necessary with
> Devel::SawAmpersand) so you may want to see if you can reduce the number
> of sets of capturing parens.

Hmm, very interesting. I was using $^N for snarfing matches inside (?{ }),
but the documentation said to avoid $& etc. for performance reasons. I
don't recall reading anything negative about $^N. I'm not sure about using
$1..n inside these kinds of blocks and whether their status would be
predictable.

Any suggestions would be much appreciated.

Thanks,

-Clint

Re: Profiling Perl with better granularity than Devel::DProf

on 09.10.2007 19:28:41 by Ben Morrow

Quoth Clint Olsen:
> On 2007-10-09, Ben Morrow wrote:
> > Just out of interest: which version of Perl is this? 5.8.8 doesn't have
> > a Perl_regexec, though it does have a Perl_regexec_flags and a
> > Perl_pregexec.
>
> This is Perl 5.8.7, I believe.

Uh... that's weird... I presume that Perl_regexec_flags somehow got
truncated, then, as there's no Perl_regexec in 5.8.7 either.

> > The Perl_* functions are documented in perlapi (perldebguts is for the
> > internals of the Perl-level debugger). Perl_savepvn is a simple
> > string-copy function. AFAICT, the only time it is called directly from
> > Perl_regexec_flags is when populating the $N match variables (and $`, $',
> > $&; but you aren't using those, are you? Check if necessary with
> > Devel::SawAmpersand) so you may want to see if you can reduce the number
> > of sets of capturing parens.
>
> Hmm, very interesting. I was using $^N for snarfing matches inside (?{ }),
> but the documentation said to avoid $& etc. for performance reasons. I
> don't recall reading anything negative about $^N. I'm not sure about using
> $1..n inside these kinds of blocks and whether their status would be
> predictable.

Sorry, I was unclear... when I said '$N', I meant $1, $2, etc., not $^N.
In any case, how you get hold of the captures later is unimportant: as
soon as a match contains ordinary capturing parens, perl has to copy the
capture every time it matches, just in case you look in $1 (or $^N)
later. If you're doing a lot of matches which fail, you may save time by
retrying the match with captures only if the pattern actually matches.
If you're actually doing a lot of capturing, then, well, you're going to
spend most of your time copying strings :).
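
A minimal sketch of the retry idea, assuming a hypothetical "name = number"
line format. The patterns $probe and $capture and the sub parse_assignment
are illustrative names only. The first pass uses non-capturing groups, and
the capturing pattern runs only on lines already known to match.

```perl
use strict;
use warnings;

# Same shape twice: (?:...) groups for the cheap probe, (...) for the captures.
my $probe   = qr/^(?:\w+)\s*=\s*(?:\d+)$/;
my $capture = qr/^(\w+)\s*=\s*(\d+)$/;

# Returns (name, value) for a matching line, or an empty list otherwise.
sub parse_assignment {
    my ($line) = @_;
    return unless $line =~ $probe;            # most lines stop here, no captures
    my ($name, $value) = $line =~ $capture;   # capture only on a known match
    return ($name, $value);
}

for my $line ("x = 42", "not an assignment") {
    my @fields = parse_assignment($line);
    print @fields ? "name=$fields[0] value=$fields[1]\n" : "no match: $line\n";
}
```

Matching lines are scanned twice, so this is only likely to pay off when
most inputs fail the probe. It is worth timing both variants on real data
before committing to it.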

Ben

Re: Profiling Perl with better granularity than Devel::DProf

on 09.10.2007 23:48:36 by Clint Olsen

On 2007-10-09, Ben Morrow wrote:
> Uh... that's weird... I presume that Perl_regexec_flags somehow got
> truncated, then, as there's no Perl_regexec in 5.8.7 either.

You are correct. The function was Perl_regexec_flags. It was truncated
with ellipses and I just didn't see them.

> Sorry, I was unclear... when I said '$N', I meant $1, $2, etc., not $^N.
> In any case, how you get hold of the captures later is unimportant: as
> soon as a match contains ordinary capturing parens, perl has to copy the
> capture every time it matches, just in case you look in $1 (or $^N)
> later. If you're doing a lot of matches which fail, you may save time by
> retrying the match with captures only if the pattern actually matches.
> If you're actually doing a lot of capturing, then, well, you're going to
> spend most of your time copying strings :).

I will definitely review and audit capture buffers to ensure that only the
necessary ones are kept.

Thanks,

-Clint