Replacement in 3GB file

on 15.03.2010 13:10:25 by Ganesh Babu N

Dear All,

I am using the following code to replace certain information in binary mode.

$s=time();
open(FH, "$ARGV[0]");
open(OUT, ">$ARGV[1]");
binmode FH;
binmode OUT;
$/=undef;
$line=<FH>;
$line=~s!(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns!$1$2!gs
while($line=~/(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns/gs);
print OUT $line;
$e=time();
$r=$e-$s;
close(FH);
close(OUT);
print "Done...\nRuntime: $r seconds";

This code loads the entire file content and then does the replacement.
If we read line by line we can avoid the out-of-memory problem, but my
replacement depends on the previous line. Below is the input:

224 /EuclidSymbol f1
(D) -22 673 sh
......
320 ns
......
221 ns

The output should be as follows:

224 /EuclidSymbol f1
(D) -22 673 sh
......
320 /EuclidSymbol f1
......
221 /EuclidSymbol f1

I tried Tie::File, but it does not load binary data. Please suggest
how I can solve the problem. My file size is around 3GB.

Regards,
Ganesh

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Replacement in 3GB file

on 15.03.2010 14:00:14 by jwkrahn

Ganesh Babu N wrote:
> Dear All,
>
> I am using the following code to replace certain information in binary mode.

Did you not like the answers you got from perlmonks.org?


John
--
The programmer is fighting against the two most
destructive forces in the universe: entropy and
human stupidity. -- Damian Conway

Re: Replacement in 3GB file

on 15.03.2010 14:06:15 by Shlomi Fish

Hi Ganesh!

First a few notes on your code.

On Monday 15 Mar 2010 14:10:25 Ganesh Babu N wrote:
> Dear All,
>
> I am using the following code to replace certain information in binary
> mode.
>
> $s=time();
> open(FH, "$ARGV[0]");
> open(OUT, ">$ARGV[1]");

Please see:

http://perl.net.au/wiki/Freenode_Sharp_Perl_FAQ#How_should_I_write_my_code.3F
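As a rough sketch of what that advice means for the two open() calls above (lexical filehandles, the three-argument form of open, and an explicit error check), something like the following could be used. It assumes the ':raw' layer is an acceptable replacement for the separate binmode calls, and the helper name open_raw_pair is made up for this example:

```perl
use strict;
use warnings;

# Hypothetical helper: open an input and an output file in raw (binary)
# mode with lexical filehandles and three-argument open. The ':raw' layer
# plays the role of the separate binmode() calls in the original code.
# Dies with the OS error message ($!) if either open fails.
sub open_raw_pair {
    my ($in_path, $out_path) = @_;
    open my $in_fh,  '<:raw', $in_path  or die "Cannot read '$in_path': $!";
    open my $out_fh, '>:raw', $out_path or die "Cannot write '$out_path': $!";
    return ($in_fh, $out_fh);
}
```

A caller would then write my ($in_fh, $out_fh) = open_raw_pair(@ARGV[0, 1]); and use $in_fh and $out_fh instead of the global FH and OUT handles.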

> binmode FH;
> binmode OUT;
> $/=undef;
> $line=<FH>;
> $line=~s!(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns!$1$2!gs
> while($line=~/(\d{3}\s(\/[^\n]*? f1)\s*([^\n]+sh\s*)+?\d{3}\s)ns/gs);

Why are you doing a /g substitution inside a loop that checks for the same
regex? The /g will replace everything. Furthermore, you have defined three
captures and use only two. One of them should be (?:...).
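To make the (?:...) suggestion concrete, here is the regex from the original post with the third, unused group made non-capturing and laid out with the /x modifier so it can carry comments. It is only a sketch that keeps the original post's assumptions (three-digit numbers, a font selector ending in " f1", at least one line ending in "sh" before the "ns" line); the names $font_ns_re and fill_ns are made up for the example:

```perl
use strict;
use warnings;

# The original pattern, rewritten with /x so it can carry comments.
# Under /x, literal spaces must be written explicitly, hence [ ] before f1.
my $font_ns_re = qr{
    (                          # group 1: everything up to the "ns"
        \d{3} \s               # e.g. "224 "
        ( / [^\n]*? [ ]f1 )    # group 2: font selector, e.g. "/EuclidSymbol f1"
        \s*
        (?: [^\n]+ sh \s* )+?  # one or more "... sh" lines, not captured
        \d{3} \s               # e.g. "320 "
    )
    ns
}xs;

# Replace each matched "ns" with the font selector captured in group 2.
sub fill_ns {
    my ($text) = @_;
    $text =~ s/$font_ns_re/$1$2/g;
    return $text;
}
```

On the three sample lines "224 /EuclidSymbol f1", "(D) -22 673 sh", "320 ns" it turns the last line into "320 /EuclidSymbol f1". Whether a single /g pass reaches every "ns" in the real data depends on how the "sh" lines are laid out, so it is worth checking against a sample of the actual file.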

> print OUT $line;
> $e=time();
> $r=$e-$s;
> close(FH);
> close(OUT);
> print "Done...\nRuntime: $r seconds";
>
> This code loads the entire file content and then does the replacement.
> If we read line by line we can avoid the out-of-memory problem, but my
> replacement depends on the previous line. Below is the input:

Then keep all the relevant previous lines in an array or string that will
serve as a state.

>
> 224 /EuclidSymbol f1
> (D) -22 673 sh
> .....
> 320 ns
> .....
> 221 ns
>
> The output should be as follows:
>
> 224 /EuclidSymbol f1
> (D) -22 673 sh
> .....
> 320 /EuclidSymbol f1
> .....
> 221 /EuclidSymbol f1
>

Do you want to replace every "ns" with "/EuclidSymbol f1"? This can be
done using a loop like this:

my $symbol;
my $new_symbol;
while (my $line = <$in_fh>)
{
    if (($new_symbol) = ($line =~ /....($symbol_re).../))
    {
        $symbol = $new_symbol;
        print {$out_fh} $line;
    }
    else
    {
        $line =~ s{^(\d+\s+)ns}{$1$symbol};
        print {$out_fh} $line;
    }
}
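A self-contained version of the loop above could look like the following. It is a sketch, not a tested solution for the real file: the pattern for the font-selection line (a number, a /Name, then f1) is an assumption based on the sample input, and the names $font_line_re and rewrite_ns, as well as the script and file names in the comment, are invented for this example. Because the font selector is captured rather than hard-coded, any font name is carried forward.

```perl
use strict;
use warnings;

# Assumed format of a font-selection line, based on the sample input:
# a number, a font name starting with "/", then "f1",
# e.g. "224 /EuclidSymbol f1". The font name itself may vary.
my $font_line_re = qr{^\d+\s+(/\S+\s+f1)\s*$};

# Copy $in_fh to $out_fh one line at a time. A "NNN ns" line gets its
# "ns" replaced by the most recent font selector seen above it; lines
# before the first font selector, and all other lines, are copied as-is.
sub rewrite_ns {
    my ($in_fh, $out_fh) = @_;
    my $symbol;    # last font selector seen (undef until the first one)
    while (my $line = <$in_fh>) {
        if ($line =~ $font_line_re) {
            $symbol = $1;
        }
        elsif (defined $symbol) {
            $line =~ s{^(\d+\s+)ns\b}{$1$symbol};
        }
        print {$out_fh} $line;
    }
    return;
}

# Example invocation (script and file names are placeholders):
#   perl rewrite_ns.pl input.ps output.ps
if (@ARGV == 2) {
    my ($in_path, $out_path) = @ARGV;
    open my $in_fh,  '<:raw', $in_path  or die "Cannot read '$in_path': $!";
    open my $out_fh, '>:raw', $out_path or die "Cannot write '$out_path': $!";
    rewrite_ns($in_fh, $out_fh);
    close $out_fh or die "Cannot close '$out_path': $!";
}
```

Only the current line and the last font selector are kept in memory, so memory use depends on the longest line rather than on the 3GB file size. If the binary data contains very long stretches without a newline, those stretches will still be read as single long lines.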

Hope it helps.

Regards,

Shlomi Fish

> I tried Tie::File, but it does not load binary data. Please suggest
> how I can solve the problem. My file size is around 3GB.
>

> Regards,
> Ganesh

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
First stop for Perl beginners - http://perl-begin.org/

Deletionists delete Wikipedia articles that they consider lame.
Chuck Norris deletes deletionists whom he considers lame.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Re: Replacement in 3GB file

on 18.03.2010 05:28:21 by Ganesh Babu N

Dear Shlomi,

/EuclidSymbol is not constant. It will vary based on the font used in the file.
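Since the font name varies, it has to be captured from each font-selection line rather than written into the pattern. A small sketch of such a capture, assuming every font-selection line has the form "NNN /FontName f1" as in the sample (the helper name font_selector is hypothetical):

```perl
use strict;
use warnings;

# Hypothetical helper: return the font selector (e.g. "/EuclidSymbol f1")
# if $line looks like "NNN /FontName f1", otherwise undef.
# The exact line format is an assumption taken from the sample input.
sub font_selector {
    my ($line) = @_;
    return $line =~ m{^\d+\s+(/\S+\s+f1)\s*$} ? $1 : undef;
}
```

Remembering the last non-undef result of font_selector while reading line by line gives the current font for each following "ns" line, whatever that font is called.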

Regards,
Ganesh

