Question on grep and reading from file

on 08.08.2007 23:11:06 by googler

Inside my Perl script, I had to check if a particular pattern appears
in a certain file or not (only a yes/no answer). I did it as below:
@matching_lines = grep { /$srchpat/ } <FH>;
print "Pattern found\n" if ($#matching_lines != -1);

I was wondering if there is a more efficient way to do this. Is it
possible to use the Unix "grep" command to do this inside my script?
If so, how? Will that be more efficient (faster)?

I have another question. Is there a way to read a particular line in a
file when I know the line number (without using a loop and reading
each line at a time)? I guess the below code would work.
@lines = <FH>;
$myline = $lines[$linenum-1];
But this will read the entire file into the array @lines and can take
up a lot of memory if the file is huge. Is there a more efficient
solution?

Thanks.

Re: Question on grep and reading from file

on 08.08.2007 23:25:28 by Mirco Wahab

googler wrote:
> Inside my Perl script, I had to check if a particular pattern appears
> in a certain file or not (only a yes/no answer). I did it as below:
> @matching_lines = grep { /$srchpat/ } <FH>;
> print "Pattern found\n" if ($#matching_lines != -1);
>
> I was wondering if there is a more efficient way to do this. Is it
> possible to use the Unix "grep" command to do this inside my script?
> If so, how? Will that be more efficient (faster)?

In almost all cases, a sequential approach will be *much*
faster on *large* files (>= 100MB), like


...
my @matching_lines;
while (<FH>) {
    push @matching_lines, $_
        if /$srchpat/;
}
print "Pattern found\n"
    if scalar @matching_lines;
...

> I have another question. Is there a way to read a particular line in a
> file when I know the line number (without using a loop and reading
> each line at a time)? I guess the below code would work.
> @lines = <FH>;
> $myline = $lines[$linenum-1];
> But this will read the entire file into the array @lines and can take
> up a lot of memory if the file is huge. Is there a more efficient
> solution?

No, not really. Besides the 'tie' approach (which is sometimes
too slow), you can always read large files quickly 'record by record'
(e.g. line by line) and check the line number via "$." ...
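
The 'tie' approach mentioned above presumably refers to the core
Tie::File module, which presents a file as an array without loading all
of it into memory. A minimal sketch, assuming that is what was meant;
the helper name nth_line_tied is made up for illustration:

```perl
use strict;
use warnings;
use Tie::File;
use Fcntl qw(O_RDONLY);

# Hypothetical helper: return line $linenum (counted from 1) of $filename,
# or undef if the file has fewer lines.
sub nth_line_tied {
    my ($filename, $linenum) = @_;
    tie my @lines, 'Tie::File', $filename, mode => O_RDONLY
        or die "Cannot tie $filename: $!";
    # Tie::File still has to scan the file up to this record internally.
    my $line = $lines[$linenum - 1];
    untie @lines;
    return $line;    # the record separator is removed by default
}
```

This keeps the array syntax from the original question, but the scan up
to the requested line still happens behind the scenes.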

Regards

M.

Re: Question on grep and reading from file

on 08.08.2007 23:35:28 by Lawrence Statton

googler writes:
> Inside my Perl script, I had to check if a particular pattern appears
> in a certain file or not (only a yes/no answer). I did it as below:
> @matching_lines = grep { /$srchpat/ } <FH>;
> print "Pattern found\n" if ($#matching_lines != -1);
>
> I was wondering if there is a more efficient way to do this.

Yes, don't look at every line in the file -- stop looking as soon as
you've found any line that matches.

#!/usr/bin/perl
use strict;
use warnings;
use List::Util qw/first/; # part of CORE in modern perls.

open my $myfile, '<', 'myfile' or die "Could not open `myfile' $!";

my $srchpat = qr/larkvomit/i;

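# Note: <$myfile> is evaluated in list context here, so the whole file is
# still read; first() only stops the pattern matching at the first hit.
print "Pattern found\n" if first { m/$srchpat/ } <$myfile>;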
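
To stop reading the file itself at the first match, and not just the
matching, an explicit loop that reads one line at a time is one option.
A minimal sketch; the helper name file_matches is made up for
illustration:

```perl
use strict;
use warnings;

# Hypothetical helper: true as soon as any line of $filename matches $pattern.
sub file_matches {
    my ($filename, $pattern) = @_;
    open my $fh, '<', $filename or die "Could not open '$filename': $!";
    my $found = 0;
    while (my $line = <$fh>) {    # scalar context: one line per iteration
        if ($line =~ $pattern) {
            $found = 1;
            last;                 # stop reading at the first match
        }
    }
    close $fh;
    return $found;
}
```

It could be called with the file name and a compiled pattern, for
example file_matches('myfile', qr/larkvomit/i).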

> Is it
> possible to use the Unix "grep" command to do this inside my script?
> If so, how? Will that be more efficient (faster)?

Could be, depending on a lot of variables -- the overhead of forking
and execing a grep process could easily swamp the gains of a machine
language grep vs. a perl loop.

>
> I have another question. Is there a way to read a particular line in a
> file when I know the line number (without using a loop and reading
> each line at a time)?

Not unless every line is exactly the same size. There are techniques
to hide the read line-by-line and counting behind a curtain, but it
has to happen.

> I guess the below code would work.
> @lines = <FH>;
> $myline = $lines[$linenum-1];
> But this will read the entire file into the array @lines and can take
> up a lot of memory if the file is huge. Is there a more efficient
> solution?
>
> Thanks.
>

my $myline;
while ( my $line = <$myfile> ) {
    if ($. == $linenum) {
        $myline = $line;
        last;
    }
}

At least this way you don't have to read PAST the line you're
interested in.

--
Lawrence Statton - lawrenabae@abaluon.abaom s/aba/c/g
Computer software consists of only two components: ones and
zeros, in roughly equal proportions. All that is required is to
place them into the correct order.

Re: Question on grep and reading from file

on 09.08.2007 01:40:44 by xhoster

googler wrote:
> Inside my Perl script, I had to check if a particular pattern appears
> in a certain file or not (only a yes/no answer). I did it as below:
> @matching_lines = grep { /$srchpat/ } <FH>;

This reads the entire file, even if the match is in the first line.
(Potentially worse, it reads the entire file into memory at once, as perl
is currently implemented.)

> print "Pattern found\n" if ($#matching_lines != -1);
>
> I was wondering if there is a more efficient way to do this. Is it
> possible to use the Unix "grep" command to do this inside my script?

Sure. There are many ways. The simplest, if $srchpat and $filename don't
require protecting from shell interpretation, and $srchpat either doesn't
have special characters or only has ones that mean the same thing between
Perl and grep, would be something like this:

my $result=`grep -l $srchpat $filename`;
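
When the pattern or file name might need protecting from the shell, the
list form of system() avoids the shell altogether. A minimal sketch,
assuming a POSIX grep is on the PATH; the helper name grep_matches is
made up for illustration, and the pattern is still interpreted by grep,
not by Perl:

```perl
use strict;
use warnings;

# Hypothetical helper: run the external grep without going through a shell.
# Returns 1 if a line matched, 0 if none did; dies on any grep failure.
sub grep_matches {
    my ($pattern, $filename) = @_;
    # -q suppresses output, -e marks the next argument as the pattern,
    # and -- ends the options so odd file names are not taken as flags.
    my $status = system('grep', '-q', '-e', $pattern, '--', $filename);
    die "Could not run grep: $!" if $status == -1;
    die "grep was killed by a signal" if $status & 127;
    my $exit = $status >> 8;
    return 1 if $exit == 0;    # at least one matching line
    return 0 if $exit == 1;    # no matching line
    die "grep failed with exit status $exit";
}
```

With -q, grep can exit at the first match, so only the yes/no answer
comes back to the script.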

> If so, how? Will that be more efficient (faster)?

It would have more overhead, but will probably run faster once it gets
running (provided $srchpat is fairly simple).

> I have another question. Is there a way to read a particular line in a
> file when I know the line number (without using a loop and reading
> each line at a time)? I guess the below code would work.
> @lines = <FH>;
> $myline = $lines[$linenum-1];
> But this will read the entire file into the array @lines and can take
> up a lot of memory if the file is huge. Is there a more efficient
> solution?

Unless you know how long each line is, or have otherwise pre-computed some
kind of index into the file, you need to read the entire file at least up
to the desired line and count newlines, either implicitly or explicitly.
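
A pre-computed index could be as simple as a list of the byte offsets at
which each line starts, built in one pass with tell() and reused for
later lookups with seek(). A minimal sketch; the helper names
build_line_index and line_by_index are made up for illustration, and the
index is only valid while the file stays unchanged:

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: return a reference to an array holding the byte
# offset at which each line of $filename starts (element 0 = line 1).
sub build_line_index {
    my ($filename) = @_;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    my @offsets = (0);
    while (defined(my $line = <$fh>)) {
        push @offsets, tell($fh);    # position just after the line read
    }
    pop @offsets;                    # last entry is end of file, not a line
    close $fh;
    return \@offsets;
}

# Hypothetical helper: fetch line $linenum (counted from 1) via the index.
sub line_by_index {
    my ($filename, $offsets, $linenum) = @_;
    return if $linenum < 1 || $linenum > @$offsets;
    open my $fh, '<', $filename or die "Cannot open $filename: $!";
    seek $fh, $offsets->[$linenum - 1], SEEK_SET or die "Cannot seek: $!";
    my $line = <$fh>;
    close $fh;
    return $line;
}
```

Building the index still reads the whole file once; the gain only shows
up when many lines are looked up afterwards.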

Xho

--
-------------------- http://NewsReader.Com/ --------------------
Usenet Newsgroup Service $9.95/Month 30GB

Re: Question on grep and reading from file

on 11.08.2007 10:07:24 by googler

> > I have another question. Is there a way to read a particular line in a
> > file when I know the line number (without using a loop and reading
> > each line at a time)? I guess the below code would work.
> > @lines = <FH>;
> > $myline = $lines[$linenum-1];
> > But this will read the entire file into the array @lines and can take
> > up a lot of memory if the file is huge. Is there a more efficient
> > solution?
>
> Unless you know how long each line is, or have otherwise pre-computed some
> kind of index into the file, you need to read the entire file at least up
> to the desired line and count newlines, either implicitly or explicitly.

OK, say I know how long each line is. How can it help in reading the n-
th line from the file directly? Can you please explain. Thanks.

Re: Question on grep and reading from file

on 11.08.2007 10:27:24 by Peter Makholm

googler writes:

>> Unless you know how long each line is, or have otherwise pre-computed some
>> kind of index into the file, you need to read the entire file at least up
>> to the desired line and count newlines, either implicitly or explicitly.
>
> OK, say I know how long each line is. How can it help in reading the n-
> th line from the file directly? Can you please explain. Thanks.

If you know that each line, including newline, is $x bytes long, you
can read line $n by doing something like:

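use Fcntl qw(:seek);

# Open for reading ('<'); '>' would truncate the file.
open FH, '<', $filename or die "Cannot open $filename: $!";
# $n is counted from 0 here; use ($n-1)*$x for 1-based line numbers.
seek FH, $n*$x, SEEK_SET;
$_ = <FH>;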

Note that the length is in bytes, not characters. So doing this on a
UTF-8 encoded file (or any other variable-length encoding) will not
work as expected, because lines with the same number of characters can
still differ in their number of bytes.
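
The same idea wrapped up as a function with error checks might look like
the sketch below. It assumes every line, newline included, is exactly
$reclen bytes and counts lines from 1; the helper name fixed_length_line
is made up for illustration:

```perl
use strict;
use warnings;
use Fcntl qw(:seek);

# Hypothetical helper: return line $n (counted from 1) of a file in which
# every line, newline included, occupies exactly $reclen bytes.
# Returns undef if the file does not contain a complete line $n.
sub fixed_length_line {
    my ($filename, $n, $reclen) = @_;
    open my $fh, '<:raw', $filename or die "Cannot open $filename: $!";
    seek $fh, ($n - 1) * $reclen, SEEK_SET or die "Cannot seek: $!";
    my $got = read $fh, my $line, $reclen;    # bytes read, 0 at end of file
    die "Read error on $filename: $!" unless defined $got;
    close $fh;
    return $got == $reclen ? $line : undef;
}
```

The ':raw' layer keeps the arithmetic in bytes, which is what seek()
expects.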

//Makholm

Re: Question on grep and reading from file

on 12.08.2007 02:02:27 by xhoster

googler wrote:
> > > I have another question. Is there a way to read a particular line in
> > > a file when I know the line number (without using a loop and reading
> > > each line at a time)? I guess the below code would work.
> > > @lines = <FH>;
> > > $myline = $lines[$linenum-1];
> > > But this will read the entire file into the array @lines and can take
> > > up a lot of memory if the file is huge. Is there a more efficient
> > > solution?
> >
> > Unless you know how long each line is, or have otherwise pre-computed
> > some kind of index into the file, you need to read the entire file at
> > least up to the desired line and count newlines, either implicitly or
> > explicitly.
>
> OK, say I know how long each line is. How can it help in reading the n-
> th line from the file directly? Can you please explain. Thanks.

Compute where in the file the desired line starts, and use "seek" to
jump to it. See perldoc -f seek.

Xho
