regex heck

On 03.11.2005 12:55:59 by Tom Allison

I've been playing with regexes, Benchmark, and 'slurping', and found
something I could do (if I could get it to work) but am not sure I
want to do.

Benchmark:
On my machine (the usual benchmarking caveats apply), reading a whole
file at once with

my $line = do { local $/; <$file> };

is about 2x faster than reading it line by line with

while (<$file>) {
}
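A minimal sketch of how such a comparison could be set up with the core Benchmark module's cmpthese. The file name sample.log and the helpers slurp_file and count_lines are hypothetical; the script writes a small synthetic file if none exists, and the timings it prints will vary by machine and file size.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Hypothetical file name; point this at one of your own logs.
my $path = 'sample.log';

# Slurp: locally undefine the input record separator $/ so a single
# readline returns the whole file as one string.
sub slurp_file {
    my ($name) = @_;
    open my $fh, '<', $name or die "Cannot open $name: $!";
    my $text = do { local $/; <$fh> };
    close $fh;
    return $text;
}

# Line by line: the while (<$fh>) loop from the post, counting lines
# so the loop body is not completely empty.
sub count_lines {
    my ($name) = @_;
    open my $fh, '<', $name or die "Cannot open $name: $!";
    my $count = 0;
    $count++ while <$fh>;
    close $fh;
    return $count;
}

# Write a small synthetic file if none exists, so the sketch runs as-is.
unless (-e $path) {
    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "2005/10/31/12:23:21\tN\t12345\tActive Configuration failed\n"
        for 1 .. 10_000;
    close $out;
}

# Run each reader for at least 1 CPU second and print a comparison table.
cmpthese(-1, {
    slurp => sub { slurp_file($path) },
    lines => sub { count_lines($path) },
});
```

Keep in mind that slurping trades memory for speed: the whole file has to fit in RAM at once, which matters when there is a lot of data.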

I want to check for a multi-line record and capture about five
elements from that 'paragraph'.

A paragraph starts with a dated line like:
2005/10/31/12:23:21......12345...Active Configuration failed
--or--
2005/10/31/12:23:21......12345...Configuration Request failed

It is eventually followed by a line like:
2005/10/31/12:32:54..............THREAD: Complete/12345/4435/

I was thinking this could be done in one regex, something like this
(not yet fully working):

m|^([\d:/]+)\tN\t(\d+)\t((?:Active )?Configuration (?:Request )?failed).+?THREAD: Complete/\2/(\d+)|smg;

I don't have this working yet. I was doing OK until I started trying
multi-line matching. Unless I'm attempting something obviously
impossible, I hope to sort this one out before too long.
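A minimal sketch of the single-regex idea, run against a hypothetical, non-interleaved sample. The tab-separated layout (date, N, item number, message) is an assumption taken from the \tN\t in the regex; the real log may differ. Inside a pattern, $2 would be interpolated from the *previous* successful match before the regex is compiled. The backreference \2 is what refers to this match's second group. Each optional word also carries its own trailing space, so "Configuration failed" without "Active " still matches.

```perl
use strict;
use warnings;

# Hypothetical sample: a tab-separated layout (date, N, item, message)
# is assumed from the \tN\t in the regex; the real log may differ.
my $log = join '',
    "2005/10/31/12:23:21\tN\t12345\tActive Configuration failed\n",
    "2005/10/31/12:25:00\tN\t12345\tsome other status line\n",
    "2005/10/31/12:32:54\tN\t12345\tTHREAD: Complete/12345/4435/\n";

my @records;
while ($log =~ m{
        ^([\d:/]+) \t N \t (\d+) \t                           # date, item number
        ((?:Active\ )?Configuration\ (?:Request\ )?failed)    # failure message
        .+?                                                   # shortest run, across lines via /s
        THREAD:\ Complete/\2/(\d+)                            # backreference: same item
    }smgx) {
    push @records, { date => $1, item => $2, message => $3, thread => $4 };
}

print join(', ', @{$_}{qw(date item message thread)}), "\n" for @records;
```

The /x flag only makes whitespace and # comments in the pattern insignificant (hence the escaped spaces). The comments deliberately avoid $ and @, because those would be interpolated even inside a /x comment.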

But then I realized there is another potential problem that I'm not
sure how to address. Multiple instances can interleave: item 12345 can
log an "Active Configuration" line, and before I find the "THREAD:
Complete" line for that same item, I run into an "Active
Configuration" line for item 44532. To solve this one, I probably need
the next match to start again at that second failure line
{ ((?:Active )?Configuration (?:Request )?failed) } rather than after
the end of the first match, but I haven't a clue how to do this.
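One possible approach (a sketch, not necessarily the only way) is to move the search for the THREAD line into a zero-width lookahead, (?=...). With /g, each match normally consumes everything up to the THREAD line, so a start line for 44532 lying inside that span is skipped. Inside a lookahead, the match consumes only the start line, and the next /g iteration resumes right after it. Captures made inside a positive lookahead are still available afterwards. The sample data and tab layout are hypothetical, as before.

```perl
use strict;
use warnings;

# Hypothetical interleaved sample (same assumed tab layout): item 44532
# starts before item 12345 has logged its THREAD: Complete line.
my $log = join '',
    "2005/10/31/12:23:21\tN\t12345\tActive Configuration failed\n",
    "2005/10/31/12:24:10\tN\t44532\tConfiguration Request failed\n",
    "2005/10/31/12:32:54\tN\t12345\tTHREAD: Complete/12345/4435/\n",
    "2005/10/31/12:40:02\tN\t44532\tTHREAD: Complete/44532/5120/\n";

my @records;
while ($log =~ m{
        ^([\d:/]+) \t N \t (\d+) \t
        ((?:Active\ )?Configuration\ (?:Request\ )?failed)
        # Zero-width lookahead: search ahead for the matching THREAD line
        # without consuming it, so the next /g match resumes right after
        # this start line and can still see interleaved start lines.
        (?= .*? THREAD:\ Complete/\2/(\d+) )
    }smgx) {
    push @records, { date => $1, item => $2, message => $3, thread => $4 };
}

printf "%s item %s: %s (thread %s)\n",
    @{$_}{qw(date item message thread)} for @records;
```

One cost to be aware of: a start line whose THREAD line never appears makes the lookahead scan to the end of the string, so on a very large slurped file with many unfinished items this can get slow.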

The "Olde School" approach for me would be to loop with
'while (<$file>)' and save the bits of information I find in a hash
until I have all the pieces I need for an answer. But I'm enticed by
the speed improvement, since I have a LOT of data to read through.
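For comparison, a sketch of that line-by-line hash approach. Keying the hash by item number is one assumption about how to do it; the helper name parse_log and the sample data are hypothetical. An in-memory filehandle (Perl 5.8+) stands in for a real file.

```perl
use strict;
use warnings;

# Line-by-line sketch: remember each start line in a hash keyed by item
# number, and emit a record when that item's THREAD line shows up.
# Assumes the same hypothetical tab-separated layout as the regex above.
sub parse_log {
    my ($fh) = @_;
    my (%pending, @records);
    while (my $line = <$fh>) {
        if ($line =~ m{^([\d:/]+)\tN\t(\d+)\t((?:Active )?Configuration (?:Request )?failed)}) {
            # Start of a record: store what we know so far.
            $pending{$2} = { date => $1, item => $2, message => $3 };
        }
        elsif ($line =~ m{THREAD: Complete/(\d+)/(\d+)}) {
            # End of a record: finish it only if we saw its start line.
            my $rec = delete $pending{$1} or next;
            $rec->{thread} = $2;
            push @records, $rec;
        }
    }
    return \@records;
}

# Usage with an in-memory filehandle; replace with a real file handle.
my $log = join '',
    "2005/10/31/12:23:21\tN\t12345\tActive Configuration failed\n",
    "2005/10/31/12:24:10\tN\t44532\tConfiguration Request failed\n",
    "2005/10/31/12:32:54\tN\t12345\tTHREAD: Complete/12345/4435/\n",
    "2005/10/31/12:40:02\tN\t44532\tTHREAD: Complete/44532/5120/\n";
open my $fh, '<', \$log or die "Cannot open in-memory log: $!";
my $records = parse_log($fh);
print "$_->{item} -> thread $_->{thread}\n" for @$records;
```

Interleaving is handled naturally here, since each item waits in %pending until its own THREAD line arrives. Memory stays bounded by one line plus the unfinished items, whereas slurping holds the whole file.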

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org