To extract numbers from files with Perl

on 11.11.2007 17:58:46 by lucavilla

I have thousands of files named like these:

c:\input\pumico-home.html
c:\input\ofofo-home.html
c:\input\cimaba-office.html
c:\input\plata-home.html
c:\input\plata-office.html
c:\input\zito-home.html

I need a Perl script that, only for the files that match
"c:\input\*-home.html", performs some regular expression extractions like
in these two examples:

for a "pumico-home.html" that contains:
ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz

it generates a "pumico-home-extract.txt" file that contains these
three couples of numbers, delimited by "|":
12.80|1|1.25|2|0.32|1

for a "ofofo-home.html" that contains:
lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz

it generates a "ofofo-home-extract.txt" file that contains these two
couples of numbers, delimited by "|":
7.44|3|5.11|2

Note that the numbers always come in couples, as in the examples. The
number of couples in each source file can vary from one to hundreds...


I already found the regular expressions that extract the numbers:
abcdef(\d+\.\d\d)ttt
mnopq(\d+)zzz

I'm stuck on the rest... (including file handling...)


Thanks in advance for any help

Re: To extract numbers from files with Perl

on 11.11.2007 19:14:27 by lucavilla

quasi-solution:

{local @ARGV=glob('c:/input/*-home.html'); local $^I='.extract.txt'; local $\=$/;
while( <> ){
print join'|',/([\d.]+)/g if /\d/
}
}

This is still not the solution because it puts the new file in
pumico-home.html and the old file in pumico-home.html.extract.txt

Re: To extract numbers from files with Perl

on 11.11.2007 19:17:58 by Michele Dondi

On Sun, 11 Nov 2007 08:58:46 -0800, Luca Villa wrote:

>I need a Perl script that, only for the files that match
>"c:\input\*-home.html", performs some regular expression extractions like
>in these two examples:

You can directly use glob().

>for a "pumico-home.html" that contains:
>ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz
>
>it generates a "pumico-home-extract.txt" file that contains these

perldoc -f open

>three couples of numbers, delimited by "|":
>12.80|1|1.25|2|0.32|1

local ($,,$\)=("|", "\n");
print /\d+(?:\.\d+)?/g;
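
The two lines above rely on a match in list context returning every match, and on print using $, between items and $\ after the last one. A minimal sketch of that idea follows, run against the sample string from the first post. The helper name numbers_in is made up for illustration, and the pattern picks up any number in the text, not only those between the abcdef/ttt and mnopq/zzz markers.

```perl
use strict;
use warnings;

# Sample data copied from the first post (pumico-home.html).
my $sample = 'ziritabcdef12.80tttcucurullumnopq1zzzspugnizuabcdef1.25tttcantabarramnopq2zzzlocomotoabcdef0.32tttyamazetamnopq1zzz';

# Hypothetical helper: return every integer or decimal found in a string.
sub numbers_in {
    my ($text) = @_;
    # In list context, m//g without capture groups returns all matches.
    return $text =~ /\d+(?:\.\d+)?/g;
}

{
    # $, goes between the items print receives; $\ goes after the last one.
    local ( $,, $\ ) = ( '|', "\n" );
    print numbers_in($sample);    # prints 12.80|1|1.25|2|0.32|1
}
```

Localising both variables inside a block keeps the separators from leaking into other print calls.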

>I'm stuck on the rest... (including file handling...)

That is in the docs.


Michele
--
{$_=pack'B8'x25,unpack'A8'x32,$a^=sub{pop^pop}->(map substr
(($a||=join'',map--$|x$_,(unpack'w',unpack'u','G^ ..'KYU;*EVH[.FHF2W+#"\Z*5TI/ER 256),7,249);s/[^\w,]/ /g;$ \=/^J/?$/:"\r";print,redo}#JAPH,

Re: To extract numbers from files with Perl

on 12.11.2007 02:39:46 by Tad McClellan

Luca Villa wrote:
> quasi-solution:
>
> {local @ARGV=glob('c:/input/*-home.html'); local $^I='.extract.txt'; local $\=$/;
                                                   ^^^
                                                   ^^^
That turns on inplace editing.


> while( <> ){
> print join'|',/([\d.]+)/g if /\d/
> }
> }
>
> This is still not the solution because it puts the new file in
> pumico-home.html and the old file in pumico-home.html.extract.txt


That's what inplace editing is supposed to do.

If that is not what you wanted done, then you should not have
turned on inplace editing, in which case, you would have to
handle the file naming in your own code.
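
To make the behaviour concrete, here is a small sketch that can be run safely: it creates one file in a temporary directory, edits it in place with a backup suffix, and shows which name ends up holding what. The file name, the sample content and the slurp helper are assumptions chosen for illustration only.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a throwaway directory; the file name is only an illustration.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/pumico-home.html";

open my $out, '>', $file or die "cannot write '$file': $!";
print {$out} "xxabcdef12.80tttyymnopq1zzz\n";
close $out;

{
    local @ARGV = ($file);
    local $^I   = '.extract.txt';    # backup suffix; setting it enables in-place editing
    while (<>) {
        # With $^I set, print writes to the file that is being read.
        print join( '|', /(\d+(?:\.\d+)?)/g ), "\n";
    }
}

# Hypothetical helper for inspecting the result.
sub slurp {
    my ($path) = @_;
    open my $in, '<', $path or die "cannot read '$path': $!";
    local $/;
    return scalar <$in>;
}

print "original name now holds: ", slurp($file);
print "backup holds:            ", slurp("$file.extract.txt");
```

The extracted numbers land under the original name, and the untouched HTML is kept under the name with the suffix appended, which is the reverse of what the original poster wanted.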


# untested
foreach my $fname ( glob 'c:/input/*-home.html' ) {
    # pumico-home.html becomes pumico-home-extract.txt
    (my $outname = $fname) =~ s/\.html$/-extract.txt/;
    open my $extract, '>', $outname or die "could not open '$outname' $!";

    local @ARGV = $fname;   # let <> read just this one file
    local $\ = $/;          # print appends a newline
    while( <> ){
        next unless /\d/;
        print {$extract} join( '|', /([\d.]+)/g );
    }

    close $extract;
}
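
The pattern [\d.]+ above collects every run of digits and dots, so real HTML with other numbers in it would produce extra fields. A possible variation, sketched here with the two patterns from the first post, captures only the marked values. It assumes that every abcdef...ttt value is followed by its mnopq...zzz partner, and the helper name extract_pairs is made up for illustration. It reads each file whole and writes a single line per file.

```perl
use strict;
use warnings;

# Hypothetical helper: pull out each (decimal, integer) pair using the two
# patterns from the first post and join everything with "|".
sub extract_pairs {
    my ($text) = @_;
    # Assumes every abcdef...ttt value is followed by its mnopq...zzz value.
    my @numbers = $text =~ /abcdef(\d+\.\d\d)ttt.*?mnopq(\d+)zzz/gs;
    return join '|', @numbers;
}

# The directory pattern comes from the thread; adjust it as needed.
foreach my $fname ( glob 'c:/input/*-home.html' ) {
    ( my $outname = $fname ) =~ s/\.html$/-extract.txt/;

    open my $in, '<', $fname or die "could not open '$fname': $!";
    my $html = do { local $/; <$in> };    # read the whole file at once
    close $in;
    $html = '' unless defined $html;      # an empty file yields undef

    open my $extract, '>', $outname or die "could not open '$outname': $!";
    print {$extract} extract_pairs($html), "\n";
    close $extract;
}

# Quick check against the second sample from the first post.
print extract_pairs('lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz'), "\n";
```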


--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"

Re: To extract numbers from files with Perl

on 12.11.2007 23:41:17 by Michele Dondi

On Mon, 12 Nov 2007 01:39:46 GMT, Tad McClellan wrote:

>That's what inplace editing is supposed to do.
>
>If that is not what you wanted done, then you should not have
>turned on inplace editing, in which case, you would have to
>handle the file naming in your own code.

Speaking of which, the wild feature request of the day is: ^I could
take a subref which will be passed a string (the original filename)
and should return a modified string.
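
Nothing like this exists in Perl as far as this thread shows, but the idea can be approximated in ordinary code. The sketch below is only an illustration: filter_to is a made-up helper that takes a naming subref and a per-line transform, and the demonstration runs in a temporary directory with sample data from the first post.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper, not a Perl feature: run $transform over each line of
# each file and write the results to the name returned by $rename->($file).
sub filter_to {
    my ( $rename, $transform, @files ) = @_;
    my @written;
    foreach my $file (@files) {
        my $outname = $rename->($file);
        open my $in,  '<', $file    or die "cannot read '$file': $!";
        open my $out, '>', $outname or die "cannot write '$outname': $!";
        while ( my $line = <$in> ) {
            print {$out} $transform->($line);
        }
        close $in;
        close $out or die "cannot close '$outname': $!";
        push @written, $outname;
    }
    return @written;
}

# Demonstration in a throwaway directory.
my $dir  = tempdir( CLEANUP => 1 );
my $file = "$dir/ofofo-home.html";
open my $fh, '>', $file or die "cannot write '$file': $!";
print {$fh} "lumabcdef7.44tttcimizetamnopq3zzzpupopoabcdef5.11tttpletoramnopq2zzz\n";
close $fh;

my @outputs = filter_to(
    sub { ( my $name = shift ) =~ s/\.html$/-extract.txt/; $name },
    sub { join( '|', $_[0] =~ /\d+(?:\.\d+)?/g ) . "\n" },
    $file,
);
print "wrote $_\n" for @outputs;
```

Unlike in-place editing, the source file is never renamed or overwritten here; the subref alone decides where the output goes.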


Michele