putting file columns into arrays

on 21.05.2011 06:32:44 by Eric Mooshagian

Dear All,

I would like a subroutine that will allow me to easily put columns of a tab-delimited file into their own arrays.

I've been calling the following repeatedly for each column:

my @array1 = getcolvals($filehandle, 0);
my @array2 = getcolvals($filehandle, 1); ...etc.

sub getcolvals {
    @_ and not @_ % 2 or die "Incorrect number of arguments to getcolvals!\n";
    my $myfile = shift;
    my $mycol = shift;

    my @column = ();

    while (<$myfile>) {
        my ($field) = (split /\s/, $_)[$mycol];
        push @column, $field;
    }

    return @column;
}

This accomplishes exactly what I want, but it requires going through the whole file for each column extraction, which seems inefficient. Also, I want to know if I can modify the subroutine to return all the (arbitrary number of) columns at once into arrays. Any suggestions?

Many thanks,
Eric

--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: putting file columns into arrays

on 21.05.2011 07:10:25 by Uri Guttman

>>>>> "EM" == Eric Mooshagian writes:

EM> I would like a subroutine that will allow me to easily put columns
EM> of a tab delimited file into their own arrays.

EM> I've been calling the following repeatedly for each column:

EM> my @array1 = getcolvals($filehandle, 0);
EM> my @array2 = getcolvals($filehandle, 1); ...etc.

whenever you think you need to name things with numeric parts, you
usually need an array. since you want arrays, then you really want an
array of arrays.

EM> sub getcolvals {
EM> @_ and not @_ % 2 or die "Incorrect number of arguments to getcolvals!\n";

that is sort of clunky, and it doesn't really check for two arguments: it
accepts any even, non-zero count (2, 4, 6, ...). why not just check @_ == 2?

@_ == 2 or die ...
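To make the difference between the two guards concrete, here is a small sketch that runs both against argument lists of 0 to 4 elements. The sub names old_check and new_check are hypothetical, made up for this comparison only.

```perl
use strict;
use warnings;

# The original guard: true for any non-zero, even argument count.
sub old_check {
    return ( ( @_ and not @_ % 2 ) ? 1 : 0 );
}

# The suggested guard: true only for exactly two arguments.
sub new_check {
    return ( @_ == 2 ? 1 : 0 );
}

for my $n ( 0 .. 4 ) {
    my @args = (1) x $n;    # a list of $n dummy arguments
    printf "%d args: old=%d new=%d\n", $n, old_check(@args), new_check(@args);
}
```

With four arguments the old guard passes (old=1) while the new one rejects the call (new=0).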

EM> my $myfile = shift;
EM> my $mycol = shift;

it is usually better to assign from @_. i posted several reasons why not
too long ago. check the archives for it.

my( $myfile, $mycol ) = @_ ;

and in this case you won't need a $mycol since the code will load all
the columns into arrays.

EM> my @column = ();

you don't need to initialize a my array to () as my already creates it empty.

EM> while (<$myfile>) {

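this will fail on the second and later calls: the first call reads the
handle to end of file, so every later call gets no lines and returns an
empty array. it only works if you reopen the file each time you call the
sub or you seek to the beginning of the file.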
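A minimal sketch of the seek option, keeping the one-column-per-call interface. It still reads the whole file once per column (the inefficiency Eric mentioned), and seek only works on real files, not on pipes or STDIN. The in-memory file and the switch to split /\t/ are assumptions for illustration, on the premise that the file is strictly tab-delimited.

```perl
use strict;
use warnings;

# One column per call, rewinding the handle first so that repeated
# calls on the same handle all see the whole file.
sub getcolvals {
    @_ == 2 or die "getcolvals: expected a filehandle and a column index\n";
    my ( $fh, $col ) = @_;

    # Rewind to the start of the file (whence 0 means "from the beginning").
    seek( $fh, 0, 0 ) or die "getcolvals: cannot seek: $!\n";

    my @column;
    while ( my $line = <$fh> ) {
        chomp $line;
        push @column, ( split /\t/, $line )[$col];
    }
    return @column;
}

# An in-memory file stands in for a real tab-delimited file.
my $data = "a\t1\tx\nb\t2\ty\nc\t3\tz\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my @array1 = getcolvals( $fh, 0 );
my @array2 = getcolvals( $fh, 1 );    # works because of the seek above
print "@array1 | @array2\n";          # a b c | 1 2 3
```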

EM> my ($field) = (split /\s/, $_)[$mycol];

since you are slicing the split and getting one value, you don't need
the () around $field.

EM> push @column, $field;

and you can combine both of those lines into one:

push @column, (split /\s/, $_)[$mycol] ;
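A quick check of the two points above, on a made-up sample line: a one-element list slice assigned to a plain scalar yields that element, and the same slice can go straight into push.

```perl
use strict;
use warnings;

my $line = "alpha beta gamma\n";    # hypothetical sample line
my $col  = 1;

# A one-element list slice in scalar assignment yields that element,
# so no parentheses are needed around $field.
my $field = ( split /\s/, $line )[$col];

# The same slice can feed push directly, without a temporary variable.
my @column;
push @column, ( split /\s/, $line )[$col];

print "$field $column[0]\n";    # beta beta
```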
EM> }

EM> return @column;
EM> }

this is untested:

use strict ;
use warnings ;

# this is a faster and easier way to get lines from a file
use File::Slurp ;

sub load_columns {

    my( $file_name ) = @_ ;

    $file_name or die 'load_columns: missing file name' ;

    my @lines = read_file $file_name ;

    # a reference to an array of arrays: $matrix->[$column][$row]
    my $matrix = [] ;

    foreach my $line ( @lines ) {

        my @fields = split ' ', $line ;

        for my $i ( 0 .. $#fields ) {

            # build up the array of arrays here. each array gets the next field value

            push( @{ $matrix->[$i] }, $fields[$i] ) ;
        }
    }

    return $matrix ;
}
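Eric also asked for all columns from a single pass over the file. Below is a sketch of the same array-of-arrays idea without File::Slurp, reading a filehandle line by line. The name load_columns_fh is hypothetical, and splitting on /\t/ with a limit of -1 assumes the file is strictly tab-delimited (split ' ' would also break fields that contain spaces, and a plain /\t/ split would drop trailing empty fields).

```perl
use strict;
use warnings;

# Read a tab-delimited file once and return a reference to an array
# of arrays, indexed as $columns->[$col][$row].
sub load_columns_fh {
    my ($fh) = @_;
    $fh or die "load_columns_fh: missing filehandle\n";

    my $columns = [];
    while ( my $line = <$fh> ) {
        chomp $line;
        # Limit -1 keeps trailing empty fields instead of dropping them.
        my @fields = split /\t/, $line, -1;
        for my $i ( 0 .. $#fields ) {
            push @{ $columns->[$i] }, $fields[$i];
        }
    }
    return $columns;
}

# An in-memory file stands in for a real tab-delimited file.
my $data = "a\t1\tx\nb\t2\ty\nc\t3\tz\n";
open my $fh, '<', \$data or die "cannot open in-memory file: $!";

my $columns = load_columns_fh($fh);

# Unpack however many columns you need; each one is an array reference.
my ( $names, $numbers, $letters ) = @$columns;

print scalar(@$columns), " columns\n";    # 3 columns
print "@$numbers\n";                      # 1 2 3
```

If you want plain arrays as in the original calls, copy them out: my @array1 = @{ $columns->[0] };. Note that if some rows have fewer fields than others, the later columns end up shorter and row positions no longer line up across columns.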

for more on references and perl data structures read:

perlreftut
perllol
perldsc

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------
