rewriting a single column in an open file... more efficient IO

on 09.05.2011 22:29:09 by Demian

Hello everyone,

I would like to learn an efficient way to change a single column in a
file that is read by an external program each time the column is
changed. Open, write, close is what I have been using. I thought that
tying could help speed it up. While I didn't dig in too deeply, my
split-entry, change-value and rejoin approach didn't seem to gain me
much speed. The test file and script are pasted below. In practice
the file will be about 100 lines long and the 3rd column will be
rewritten thousands of times. Is there a more efficient approach?

example file:

'test'
-------------
foo ab 0
fooa b 0
foob cd 0
foo e 0
fooc f 0
foo ab 0
fooa b 0
foob cd 0
foo e 0
fooc f 0
-------------------------

script:
-------------------------------
use IO::All;
use warnings;
use strict;

my $lines = io('test')->new;

print "$_ \n" foreach @$lines;
print "\n\n\n\n\n";

my @tmp;
foreach (0 .. $#{$lines}){
$tmp[$_] = $_;
}

@$lines = map {
my @sh = split /\s+/, $lines->[$_];
join(" ",$sh[0],$sh[1],$tmp[$_]);
} 0 .. $#{$lines};

--------------------------

cat test:

foo ab 0
fooa b 1
foob cd 2
foo e 3
fooc f 4
foo ab 5
fooa b 6
foob cd 7
foo e 8
fooc f 9
foo ab 10

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: rewriting a single column in an open file... more efficient IO

on 10.05.2011 02:52:10 by Uri Guttman

>>>>> "D" == D writes:

D> I would like to learn an efficient way to change a single column in
D> a file that is accessed by an external program after the column is
D> changed each time. open write close is what I have been using. I
D> thought that tieing could help speed it up. While I didn't dig in
D> too deeply, my split entry, change value and rejoin didn't seem to
D> gain me much speed. The test file and script are pasted below. In
D> practice the file will be about 100 lines long and the 3rd column
D> will be rewritten thousands of times. Is there a more efficient
D> approach?

D> -------------------------------
D> use IO::All;

that is an overkill module IMO. it can do all but you don't need all. in
fact iirc it uses File::Slurp inside to read whole files. using that
directly will speed it up.

D> use warnings;
D> use strict;

D> my $lines = io('test')->new;

use File::Slurp ;
my $lines = read_file( 'test', { array_ref => 1 } ) ;

just benchmark that against the io call and see which is faster. use the
Benchmark.pm module that comes with perl.

D> print "$_ \n" foreach @$lines;

that is slow to call print for each line. why are you even printing it here?

D> my @tmp;
D> foreach (0 .. $#{$lines}){
D> $tmp[$_] = $_;
D> }

why do you need to build up the array of indexes IN an array? you
already have the indexes below in the map.

D> @$lines = map {
D> my @sh = split /\s+/, $lines->[$_];
D> join(" ",$sh[0],$sh[1],$tmp[$_]);
D> } 0 .. $#{$lines};

that may seem fast to you because it is one line but it can be made MUCH
faster and with much less code. you are doing work there that doesn't
need to be done at all. i have several questions about the data and
change logic

is the file well defined with white space separation? is the third field
always the last field of non-whitespace? is the value of the third
field always 0 to start? is it always replaced by its line number? if
those are all yes, then you can do this and it will blow away your
example in speed (untested):

use File::Slurp ;

my $text ;
read_file( 'test', { buf_ref => \$text } ) ;

my $ind = 0 ;
$text =~ s/0\s*$/$ind++/emg ;

write_file( 'test', { buf_ref => \$text } ) ;

done.

benchmark that and i expect it to be seriously faster. the key is the
s/// op on the whole file and no looping is done per line (the looping
is inside the regex due to the /g option). also there is no pulling
apart each line and putting it back together.
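A minimal Benchmark sketch of the comparison suggested above, run on an in-memory copy of the data so that only the per-line split/join and the whole-buffer s/// are measured. The sample data and the sub names (by_split, by_regex) are made up for illustration. It uses [ \t]* where the post uses \s*, on the assumption that the final newline should be kept: with \s* the match on the last line can also swallow the trailing newline.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# Build a 100-line sample in memory, shaped like the example file.
my $sample = join '', map { "foo$_ ab 0\n" } 1 .. 100;

# Per-line approach: split each line, replace the third field, rejoin.
sub by_split {
    my $ind = 0;
    return join '', map {
        my @f = split ' ', $_;
        "$f[0] $f[1] " . $ind++ . "\n";
    } split /\n/, $sample;
}

# Whole-buffer approach: one s/// over the entire text.
# [ \t]* (rather than \s*) leaves the final newline in place.
sub by_regex {
    my $text = $sample;
    my $ind  = 0;
    $text =~ s/0[ \t]*$/$ind++/emg;
    return $text;
}

# Run each sub for at least one CPU second and print a comparison table.
cmpthese( -1, { split_join => \&by_split, regex => \&by_regex } );
```

Both subs return the same text, so the table compares like with like. The actual ratio depends on the perl build and the data, so it is worth running against the real file contents.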

and on top of that there is a beta version of File::Slurp which has an
edit_file() call. using it would look like this:

use File::Slurp ;

my $ind = 0 ;
edit_file { s/0\s*$/$ind++/emg } 'test' ;

done. :)

that version should be released pretty soon.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------


Re: rewriting a single column in an open file... more efficient IO

on 10.05.2011 03:34:48 by Demian

> is the file well defined with white space separation? is the third field
> always the last field of non-whitespace? is the value of the third
> field always 0 to start? is it always replaced by its line number? if
> those are all yes, then you can do this and it will blow away your
> example in speed (untested):
>

Thanks for your response. I apologize for not being clearer. I have a
three-column file that describes a bunch of parameters:

TYPE1 NAME1 VALUE1
TYPE 1 NAME2 VALUE2
.....

The first two are identifying strings that never change values. The
values are floating numbers. This file is read in by an external
program that does some parameter dependent crunching. For each
iteration, while I only vary the values I write the whole file. My
thought was that if I could somehow tie the array of values to the
file column, I could both avoid opening and closing the file and
perhaps shave off a bit of time. I'll give File::Slurp a try with
some benchmarking.
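For reference, the idea of tying an array to the file can be sketched with Tie::File, a core module that is not mentioned elsewhere in this thread. It ties lines rather than a single column, so the third field is still rewritten with a substitution per line. The file contents and the @values array are hypothetical, and this is not claimed to be faster than rewriting the whole file.

```perl
use strict;
use warnings;
use Tie::File;
use File::Temp qw(tempfile);

# Create a small stand-in for the parameter file (names are made up).
my ( $fh, $path ) = tempfile( UNLINK => 1 );
print {$fh} "TYPE1 NAME1 0.0\nTYPE1 NAME2 0.0\nTYPE2 NAME3 0.0\n";
close $fh or die "close $path: $!";

# Each element of @lines is one line of the file, without its newline.
tie my @lines, 'Tie::File', $path or die "cannot tie $path: $!";

my @values = ( 1.5, 2.25, -0.75 );    # hypothetical new third-column values

# Defer writes so the file is updated in one pass instead of once per line.
( tied @lines )->defer;
for my $i ( 0 .. $#lines ) {
    # Replace the last whitespace-separated field, keep the rest untouched.
    $lines[$i] =~ s/\S+\s*$/$values[$i]/;
}
( tied @lines )->flush;

untie @lines;
```

The file stays open between updates while the tie is in place, but every store still goes through the tie layer, so benchmarking it against a plain rewrite is the only way to know whether it helps.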

Thanks.

D


Re: rewriting a single column in an open file... more efficient IO

on 10.05.2011 05:39:46 by Uri Guttman

>>>>> "DR" == Demian Riccardi writes:

>> is the file well defined with white space separation? is the third field
>> always the last field of non-whitespace? is the value of the third
>> field always 0 to start? is it always replaced by its line number? if
>> those are all yes, then you can do this and it will blow away your
>> example in speed (untested):
>>

DR> Thanks for your response. I apologize for not being clearer. I have a
DR> three-column file that describes a bunch of parameters:

it is always important to be clear in problem specifications. your
numbers in the example were just row numbers which didn't mean anything.

DR> TYPE1 NAME1 VALUE1
DR> TYPE 1 NAME2 VALUE2
DR> ....

DR> The first two are identifying strings that never change values. The
DR> values are floating numbers. This file is read in by an external
DR> program that does some parameter dependent crunching. For each
DR> iteration, while I only vary the values I write the whole file. My
DR> thought was that if I could some how tie the array of values to the
DR> file column, I could both avoid opening and closing the file and
DR> perhaps shave off a bit of time. I'll give File::Slurp a try with
DR> some benchmarking.

you need to specify where and how those numbers are brought into this
program. if they are an array of values you can do it one way. if they
are derived based on other data in the row you can do another. not
knowing means i can't improve it for you.
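A sketch of the first case, assuming the new values arrive as an array in the same order as the rows. Since the first two columns never change, they can be built once, and each iteration then needs only one open, one print and one close. The @prefix contents, the file name and the write_params sub are hypothetical.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/params";    # hypothetical file name

# The first two columns never change, so build them once.
my @prefix = ( 'TYPE1 NAME1', 'TYPE1 NAME2', 'TYPE2 NAME3' );

# Write every line with a single print, pairing prefix $i with value $i.
sub write_params {
    my ( $file, $values ) = @_;
    open my $out, '>', $file or die "cannot write $file: $!";
    print {$out} map { "$prefix[$_] $values->[$_]\n" } 0 .. $#prefix;
    close $out or die "cannot close $file: $!";
    return;
}

# One iteration: new values in, file rewritten; the external program
# would be run after this call.
write_params( $path, [ 0.5, 1.25, 3 ] );
```

Nothing is read back or split here, which removes most of the per-iteration work in the original script.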

uri


Re: rewriting a single column in an open file... more efficient IO

on 10.05.2011 10:05:42 by Rob Dixon

On 09/05/2011 21:29, D wrote:
> Hello everyone,
>
> I would like to learn an efficient way to change a single column in a
> file that is accessed by an external program after the column is
> changed each time. open write close is what I have been using. I
> thought that tieing could help speed it up. While I didn't dig in too
> deeply, my split entry, change value and rejoin didn't seem to gain me
> much speed. The test file and script are pasted below. In practice
> the file will be about 100 lines long and the 3rd column will be
> rewritten thousands of times. Is there a more efficient approach?

This looks like a job for a database. If you used SQLite there would be
no need to set up a server, and as very little data is being modified at
a time it should be lightning-fast.

Rob


Re: rewriting a single column in an open file... more efficient IO

on 11.05.2011 00:36:12 by derykus

On May 9, 1:29 pm, demianricca...@gmail.com (D) wrote:
> Hello everyone,
>
> I would like to learn an efficient way to change a single column in a
> file that is accessed by an external program after the column is
> changed each time. open write close is what I have been using. I
> thought that tieing could help speed it up. While I didn't dig in too
> deeply, my split entry, change value and rejoin didn't seem to gain me
> much speed. The test file and script are pasted below. In practice
> the file will be about 100 lines long and the 3rd column will be
> rewritten thousands of times. Is there a more efficient approach?

the tie that you mentioned is actually a decent
alternative if speed's not an overriding issue.
For example:

use DB_File;

# $file and @some_keys are assumed to be defined elsewhere
tie my %HASH, "DB_File", $file,
    O_CREAT|O_RDWR, 0666, $DB_HASH
    or die "error opening $file: $! ";

my @valid_keys = ('foo ab', 'fooa b', ... ); # valid keys

foreach my $key (@some_keys) {
    $HASH{ $key } = 'new value'
        if grep( $key eq $_, @valid_keys );
}

--
Charles DeRykus


Re: rewriting a single column in an open file... more efficient IO

on 11.05.2011 06:18:09 by Uri Guttman

>>>>> "CD" == C DeRykus writes:

CD> On May 9, 1:29 pm, demianricca...@gmail.com (D) wrote:
>> Hello everyone,
>>
>> I would like to learn an efficient way to change a single column in a
>> file that is accessed by an external program after the column is

CD> the tie that you mentioned is actually a decent
CD> alternative if speed's not an overriding issue.

but he did say speed is an issue. according to the OP this is done many
times and he needs a faster version. i highly doubt any tied interface
especially with another layer will do any good here.

uri


Re: rewriting a single column in an open file... more efficient IO

on 11.05.2011 07:17:19 by derykus

On May 10, 9:18 pm, u...@StemSystems.com ("Uri Guttman") wrote:
> >>>>> "CD" == C DeRykus writes:
>
>   CD> On May 9, 1:29 pm, demianricca...@gmail.com (D) wrote:
>   >> Hello everyone,
>   >>
>   >> I would like to learn an efficient way to change a single column in a
>   >> file that is accessed by an external program after the column is
>
>   CD> the tie that you mentioned is actually a decent
>   CD> alternative if speed's not an overriding issue.
>
> but he did say speed is an issue. according to the OP this is done many
> times and he needs a faster version. i highly doubt any tied interface
> especially with another layer will do any good here.
>
>

But, he didn't say speed's the "only" issue - and later on mentioned
"efficiency". I suspect DB_File would be simpler and more
maintainable at least if not an improvement over IO::All and
splitting/rejoining/rewriting the whole file each time too. That's
mainly what I wanted to demo.

Sometimes speed may be a seat-of-the pants perception that
really requires some profiling or a benchmark to shed further
light too. Of course, it may really be dog slow but, without
seeing all the code, the real culprit could be somewhere else.
Other suspects could be in the fray - a busy host, a network
component, locking, etc...
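One way to check that perception is to time the rewrite step on its own, using the core Time::HiRes module, and compare the result with the total time per iteration including the external program. This is a sketch under assumed conditions: the file name, the line contents and the run count are placeholders.

```perl
use strict;
use warnings;
use Time::HiRes qw(time);
use File::Temp qw(tempdir);

my $dir  = tempdir( CLEANUP => 1 );
my $path = "$dir/test";
my $runs = 1000;    # arbitrary iteration count for the measurement

# Time only the open/write/close step, 100 lines per rewrite.
my $start = time();
for my $run ( 1 .. $runs ) {
    open my $out, '>', $path or die "cannot write $path: $!";
    print {$out} map { "foo ab $run\n" } 1 .. 100;
    close $out or die "cannot close $path: $!";
}
my $elapsed = time() - $start;

printf "%d rewrites took %.3f s (%.1f us each)\n",
    $runs, $elapsed, 1e6 * $elapsed / $runs;
```

If the per-rewrite cost turns out to be small next to the run time of the external program, the file I/O is not where the time goes.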

--
Charles DeRykus
