Files and Arrays - Search for values and write to the right

on 15.09.2011 21:48:30 by Rob

I have a file of test results. It is formatted as follows:

School |fname| lname | sub| testnum|score| grade|level
MLK School | John | Smith | RE | Test 1| 95| A | Prof
MLK School | John | Smith | RE | Test 2| 97| A | Prof
MLK School | John | Smith | RE | Test 3| 93| A | Prof
MLK School | John | Smith | RE | Test 4| 89| B | NP

What I would like to come out with is as follows:

SCHOOL |fname| lname | sub| testnum|score| grade|level
MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof| Test 3|93|A|Prof|Test4|89|B|NP

I have started but can not figure out how to get this resulting file.
Here is what I have so far:


$file_to_read ="E:/My Documents/KNOWS/Second Run at Knows/KNOWS_All_Student_Benchmark_Results_Improved_2011091402.csv" ;
$file_to_write ="E:/My Documents/KNOWS/Second Run at Knows/KNOWS_All_Student_Benchmark_Results_One_File_2011091402.csv" ;

open( file1, $file_to_read) || die ("could not open file1");

open( file2, '>>',$file_to_write);

while($line= <file1>) {

chomp $line;

(
$schoolname,
$studentkey,
$sfirstname,
$slastname,
$subject_code,
$testkey,
$test_grade,
$test_score,
$test_level
)=split /\|/, $line;

if length($studentkey gt 0) {
while ($line2 = <file2>) {

chomp $line2;
(
$studentkey_file2,
$testkey_file2,
$rest_file2
) = split/\|/, $line2;

if ($studentkey_file2 gt 0 && $studentkey eq
$studentkey_file2) {

}



}
#second while statement end
}
}
#first while statement end

-Studentkey is the information I want to match on but can not figure
out what direction to go.

Can someone please help steer me in the right direction?

Thanks in advance.


--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 17:32:16 by Brandon McCaig

On Thu, Sep 15, 2011 at 3:48 PM, Rob wrote:
> I have a file of test results it is formatted as follows:
>
> School |fname| lname | sub| testnum|score| grade|level
> MLK School | John | Smith | RE | Test 1| 95| A | Prof
> MLK School | John | Smith | RE | Test 2| 97| A | Prof
> MLK School | John | Smith | RE | Test 3| 93| A | Prof
> MLK School | John | Smith | RE | Test 4| 89| B | NP
>
> What I would like to come out with is as follows:
>
> SCHOOL |fname| lname | sub| testnum|score| grade|level
> MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof| Test 3|93|A|Prof|Test4|89|B|NP

There are a number of CSV modules available on CPAN. I've only
ever used Text::CSV::Slurp and the file was the standard (?)
quoted-columns-comma-separated format. Regardless, if necessary,
you should be able to configure your module of choice to parse
whichever delimiters that you need.

To critique your code:

You should begin all programs with:

use strict;
use warnings;

> $file_to_read ="E:/My Documents/KNOWS/Second Run at Knows/KNOWS_All_Student_Benchmark_Results_Improved_2011091402.csv" ;
> $file_to_write ="E:/My Documents/KNOWS/Second Run at Knows/KNOWS_All_Student_Benchmark_Results_One_File_2011091402.csv" ;

With the above pragmas you would need to declare these variables
with 'my' (which you should be doing anyway).

> open( file1, $file_to_read) || die ("could not open file1");

You should usually use the 3-argument open. In this case
$file_to_read is known to be safe, but the program could be
changed later to allow the user to input that. There's no good
reason here to not use the 3-argument open so you might as well.
:)

You should use lexical file handles with open.

It would be useful to output $! in your die so that the user can
get a hint as to /why/ file1 could not be opened (you might
choose a more descriptive name than file1 also).

open my $in_fh, '<', $file_to_read or
die "Couldn't open '$file_to_read': $!";

> open( file2, '>>',$file_to_write);

You should test /every/ call of open for success (e.g., with `or
die "open: $!"'). Again, a lexical file handle is preferred.
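
Putting the points above together, both opens might look like the following minimal sketch. The file names are placeholders rather than the original paths, `open_files` is a hypothetical helper name, and the append mode is simply carried over from the original code.

```perl
use strict;
use warnings;

# Placeholder names for illustration; substitute the real paths.
my $file_to_read  = 'results_in.csv';
my $file_to_write = 'results_out.csv';

# Returns a pair of lexical handles, dying with the OS error on failure.
sub open_files {
    my ($in_name, $out_name) = @_;

    open my $in_fh, '<', $in_name
        or die "Couldn't open '$in_name' for reading: $!";
    open my $out_fh, '>>', $out_name
        or die "Couldn't open '$out_name' for appending: $!";

    return ($in_fh, $out_fh);
}
```

A caller would then write `my ($in_fh, $out_fh) = open_files($file_to_read, $file_to_write);` and read with `<$in_fh>`.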

> while($line= <file1>) {

Again, with 'strict' you would need to declare $line with 'my':

while(my $line = <$in_fh>) {

> chomp $line;
>
> (
> $schoolname,
> $studentkey,
> $sfirstname,
> $slastname,
> $subject_code,
> $testkey,
> $test_grade,
> $test_score,
> $test_level
> )=split /\|/, $line;

You appear to have lost some indentation here (or just aren't
indenting this code). Indentation is important for the readability
of your code.

> if length($studentkey gt 0) {

I'm not sure exactly what you meant by that. What you're doing
though is checking whether or not $studentkey is alphanumerically
after the string '0', then passing that boolean result into
length... Assuming you actually wanted to test whether or not
$studentkey is a non-zero length string, you should do
`length($studentkey) > 0'.

I'm not certain about this, but I thought that a compound if
statement required the expression to be surrounded by
parentheses:

if(length($studentkey) > 0) {

perldoc perlsyn appears to agree:
> if (EXPR) BLOCK

AFAICT, $studentkey should be the first name of the student.

>    while ($line2 = <file2>) {

Again, $line2 should be declared with 'my'.

>        chomp $line2;
>        (
>            $studentkey_file2,
>            $testkey_file2,
>            $rest_file2
>        ) = split/\|/, $line2;

These names are misleading and confusing. :) You should probably
be storing these in a data structure anyway, perhaps as an array
of hash references.
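
As a rough illustration of the "array of hash references" idea, here is a small sketch. The field names come from the header line of the sample data; `parse_record` is a hypothetical helper, not something from the original program.

```perl
use strict;
use warnings;

# Field names taken from the header line of the sample data.
my @fields = qw(School fname lname sub testnum score grade level);

# Turn one pipe-delimited line into a hash reference keyed by field name.
sub parse_record {
    my ($line) = @_;
    chomp $line;

    my @values = split /\|/, $line;
    s/\A\s+|\s+\z//g for @values;    # trim surrounding whitespace

    my %record;
    @record{@fields} = @values;      # hash slice: pair names with values
    return \%record;
}

# Each element of @records is a reference to one row's hash.
my @records = map { parse_record($_) } (
    "MLK School | John | Smith | RE | Test 1| 95| A | Prof\n",
    "MLK School | John | Smith | RE | Test 2| 97| A | Prof\n",
);

print "$_->{fname} $_->{lname}: $_->{testnum} = $_->{score}\n" for @records;
```

With named fields, later code can say `$record->{score}` instead of tracking column positions.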

>        if ($studentkey_file2 gt 0 && $studentkey eq
> $studentkey_file2) {

Now you're apparently comparing a file name with 0 in an
alphanumerical sense. :-/ Doesn't really make sense.

> -Studentkey is the information I want to match on but can not figure
> out what direction to go.

Since I'm not familiar with any of the CSV modules I won't bother
trying to offer an example that uses them. You should look into
it though, especially if you encounter data of this nature often.

#!/usr/bin/perl

use strict;
use warnings;

use Data::Dumper;

main() unless caller;

sub copy_hash_elements
{
    my ($src, $dest, @elements) = @_;

    $dest->{$_} = $src->{$_} for @elements;

    return $dest;
}

sub main
{
    my $header_line = <>;

    my @column_headers = map { $_ = trim($_); $_; }
            split /\|/, $header_line;

    my %data;

    while(my $line = <>)
    {
        chomp $line;

        my @column_values = split /\|/, $line;

        my %record;

        for my $i (0 .. $#column_headers)
        {
            $record{$column_headers[$i]} =
                    trim($column_values[$i]);
        }

        my $name = "$record{fname} $record{lname}";

        unless(defined $data{$name})
        {
            $data{$name} = copy_hash_elements(
                    \%record,
                    {},
                    qw(School fname lname sub));
        }

        $data{$name}->{data} ||= [];

        push @{$data{$name}->{data}}, copy_hash_elements(
                \%record,
                {},
                qw(testnum grade score level));
    }

    print STDERR Data::Dumper->Dump([\%data], ['data']);
}

sub trim
{
    my $string = shift || '';

    $string =~ s/\A\s+//g;
    $string =~ s/\s+\z//g;

    return $string;
}

__DATA__
School |fname| lname | sub| testnum|score| grade|level
MLK School | John | Smith | RE | Test 1| 95| A | Prof
MLK School | John | Smith | RE | Test 2| 97| A | Prof
MLK School | John | Smith | RE | Test 3| 93| A | Prof
MLK School | John | Smith | RE | Test 4| 89| B | NP


Example run:

C:\Users\bamccaig>perl -e "do 'test.pl'; print <DATA>;" | perl test.pl
$data = {
    'John Smith' => {
        'School' => 'MLK School',
        'sub' => 'RE',
        'lname' => 'Smith',
        'fname' => 'John',
        'data' => [
            {
                'level' => 'Prof',
                'grade' => 'A',
                'score' => '95',
                'testnum' => 'Test 1'
            },
            {
                'level' => 'Prof',
                'grade' => 'A',
                'score' => '97',
                'testnum' => 'Test 2'
            },
            {
                'level' => 'Prof',
                'grade' => 'A',
                'score' => '93',
                'testnum' => 'Test 3'
            },
            {
                'level' => 'NP',
                'grade' => 'B',
                'score' => '89',
                'testnum' => 'Test 4'
            }
        ]
    }
};

That should get you on your way. You just need to loop over each
student (with the built-in keys sub), print the basic data, and
then loop over the array referenced by the 'data' hash element to
get the extra data to append.
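
That final step could be sketched roughly as follows. The `%data` hash here is a hand-written copy of the structure dumped above, shortened to two tests, and `flatten_student` is a hypothetical helper name. Sorting the keys is an assumption made only to get a stable output order.

```perl
use strict;
use warnings;

# Same shape as the dumped structure, shortened to two tests.
my %data = (
    'John Smith' => {
        'School' => 'MLK School',
        'fname'  => 'John',
        'lname'  => 'Smith',
        'sub'    => 'RE',
        'data'   => [
            { 'testnum' => 'Test 1', 'score' => '95', 'grade' => 'A', 'level' => 'Prof' },
            { 'testnum' => 'Test 2', 'score' => '97', 'grade' => 'A', 'level' => 'Prof' },
        ],
    },
);

# Build one output line per student: base columns, then every test in order.
sub flatten_student {
    my ($student) = @_;

    my @columns = @{$student}{qw(School fname lname sub)};
    for my $test (@{ $student->{data} }) {
        push @columns, @{$test}{qw(testnum score grade level)};
    }
    return join '|', @columns;
}

print flatten_student($data{$_}), "\n" for sort keys %data;
```

For the sample above this prints `MLK School|John|Smith|RE|Test 1|95|A|Prof|Test 2|97|A|Prof`. To write to the output file instead, print to the output file handle rather than STDOUT.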

Disclaimer: In parsing this CSV myself I have made certain
assumptions about the data. I don't work with CSV or know of any
'standards' or whatever for it so I don't know the rules for
properly parsing it without losing information. For example, I'm
trimming excess whitespace off because I think the result is
prettier, but perhaps that whitespace is important? Keep that in
mind. Using a CSV module would probably be easier and more
correct (once you learned how to use it).
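
For comparison, a CSV module can be pointed at the pipe delimiter. The sketch below assumes the CPAN module Text::CSV is installed (it is not part of core Perl) and uses its `sep_char` and `allow_whitespace` options; `parse_line` is a hypothetical wrapper. Whether stripping the whitespace is appropriate depends on the data, as noted above.

```perl
use strict;
use warnings;
use Text::CSV;    # CPAN module, assumed to be installed

# sep_char sets the delimiter; allow_whitespace strips blanks around fields.
my $csv = Text::CSV->new({ sep_char => '|', allow_whitespace => 1, binary => 1 })
    or die "Cannot use Text::CSV: " . Text::CSV->error_diag;

# Parse a single line into a list of fields.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    $csv->parse($line) or die "Parse failed: " . $csv->error_diag;
    return $csv->fields;
}

my @fields = parse_line("MLK School | John | Smith | RE | Test 1| 95| A | Prof\n");
print join(',', @fields), "\n";
```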


--
Brandon McCaig
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 19:40:24 by Shlomi Fish

Hi Brandon,

On Fri, 16 Sep 2011 11:32:16 -0400
Brandon McCaig wrote:

> On Thu, Sep 15, 2011 at 3:48 PM, Rob wrote:
> > I have a file of test results it is formatted as follows:
> >
> > School |fname| lname | sub| testnum|score| grade|level
> > MLK School | John | Smith | RE | Test 1| 95| A | Prof
> > MLK School | John | Smith | RE | Test 2| 97| A | Prof
> > MLK School | John | Smith | RE | Test 3| 93| A | Prof
> > MLK School | John | Smith | RE | Test 4| 89| B | NP
> >
> > What I would like to come out with is as follows:
> >
> > SCHOOL |fname| lname | sub| testnum|score| grade|level
> > MLK School |John|Smith|RE|Test 1| 95| A | Prof| Test 2| 97|A|Prof| Test 3|93|A|Prof|Test4|89|B|NP
>
> There are a number of CSV modules available on CPAN. I've only
> ever used Text::CSV::Slurp and the file was the standard (?)
> quoted-columns-comma-separated format. Regardless, if necessary,
> you should be able to configure your module of choice to parse
> whichever delimiters that you need.
>
> To critique your code:
>
> You should begin all programs with:
>
> use strict;
> use warnings;

I'd like to thank you for your detailed response. I saw the original code and
found it an overwhelming task at that point to reply to the original poster,
and so I didn't. Thanks for saving me a lot of work.

Regards,

Shlomi Fish

--
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
My Favourite FOSS - http://www.shlomifish.org/open-source/favourite/

Larry Wall is lazy, impatient and full of hubris.

Please reply to list if it's a mailing list post - http://shlom.in/reply .

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 20:27:08 by Brandon McCaig

On Fri, Sep 16, 2011 at 11:32 AM, Brandon McCaig wrote:
>    my @column_headers = map { $_ = trim($_); $_; }
>            split /\|/, $header_line;

Sorry, that should be:

my @column_headers = map trim($_), split /\|/, $header_line;

I'm still relatively new to using BLOCKs within statements like map
and grep and can never remember what is significant when the block
exits. :) In my original post you can see the result of my uncertainty
(especially when the custom sub that I'm calling has an error in it,
not the map call). Once I fixed my 'trim' sub I forgot to revert map.
Using an EXPR with map is even more confusing to me, but it seems to
work here so I'll leave it at that. ;D


--
Brandon McCaig
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 20:54:02 by Jim Gibson

On 9/16/11 Fri Sep 16, 2011 11:27 AM, "Brandon McCaig"
scribbled:

> On Fri, Sep 16, 2011 at 11:32 AM, Brandon McCaig wrote:
>>    my @column_headers = map { $_ = trim($_); $_; }
>>            split /\|/, $header_line;
>
> Sorry, that should be:
>
> my @column_headers = map trim($_), split /\|/, $header_line;

You can also exclude the whitespace surrounding the pipe symbols by
modifying the delimiter pattern for split:

my @column_headers = split( /\s*\|\s*/, $header_line);

But that will not trim the whitespace at the beginning of the first column
or at the end of the last column.
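
One way to cover the first and last columns as well is to trim the ends of the line before splitting. This is only a sketch; `split_fields` is a hypothetical helper, and the `-1` limit is an addition that keeps trailing empty columns, which `split` would otherwise drop.

```perl
use strict;
use warnings;

# Split on pipes and the whitespace around them, after trimming the line ends.
sub split_fields {
    my ($line) = @_;
    chomp $line;
    $line =~ s/\A\s+//;    # leading whitespace before the first column
    $line =~ s/\s+\z//;    # trailing whitespace after the last column

    # The -1 limit keeps trailing empty fields instead of discarding them.
    return split /\s*\|\s*/, $line, -1;
}

my @column_headers =
    split_fields("  School |fname| lname | sub| testnum|score| grade|level \n");
print join(',', @column_headers), "\n";
```

For the header line above this prints `School,fname,lname,sub,testnum,score,grade,level`.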



Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 20:55:50 by Paul Johnson

On Fri, Sep 16, 2011 at 02:27:08PM -0400, Brandon McCaig wrote:

> my @column_headers = map trim($_), split /\|/, $header_line;

> Using an EXPR with map is even more confusing to me, but it seems to
> work here so I'll leave it at that. ;D

map EXPR, @a is just the same as map { EXPR } @a.

Some people will tell you that you should never use it. Some people
will tell you that it is faster since perl doesn't have to create a
scope. You can safely ignore all those people and use it when
appropriate.

For me, it is appropriate when, as in this case, it reduces clutter and
lets you focus on the intent of the code rather than the syntax.
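
A small side-by-side sketch of the two forms, using made-up sample values. The block form becomes necessary once more than one statement is involved.

```perl
use strict;
use warnings;

my @raw = (' School ', 'fname', ' lname ');

# Expression form: the expression is followed by a comma.
my @upper_expr = map uc($_), @raw;

# Block form: no comma after the block; it can hold several statements.
# Working on a copy ($s) leaves the elements of @raw untouched.
my @trimmed = map { my $s = $_; $s =~ s/\A\s+|\s+\z//g; $s } @raw;

print join(',', @upper_expr), "\n";
print join(',', @trimmed), "\n";
```

The second line printed is `School,fname,lname`.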

--
Paul Johnson - paul@pjcj.net
http://www.pjcj.net

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 21:24:10 by Brandon McCaig

On Fri, Sep 16, 2011 at 2:55 PM, Paul Johnson wrote:
> map EXPR, @a is just the same as map { EXPR } @a.

The part that I find confusing is that EXPR is evaluated against each
$_, as opposed to once when map is called, passing the resulting value
into map. :) Obviously it's just one of the many magical features of
Perl at work here. :) I don't think that there's any way for a
user-defined sub to do this; or is there?


--
Brandon McCaig
V zrna gur orfg jvgu jung V fnl. Vg qbrfa'g nyjnlf fbhaq gung jnl.
Castopulence Software

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 22:16:06 by Uri Guttman

>>>>> "PJ" == Paul Johnson writes:

PJ> On Fri, Sep 16, 2011 at 02:27:08PM -0400, Brandon McCaig wrote:
>> my @column_headers = map trim($_), split /\|/, $header_line;

>> Using an EXPR with map is even more confusing to me, but it seems to
>> work here so I'll leave it at that. ;D

PJ> map EXPR, @a is just the same as map { EXPR } @a.

PJ> Some people will tell you that you should never use it. Some people
PJ> will tell you that it is faster since perl doesn't have to create a
PJ> scope. You can safely ignore all those people and use it when
PJ> appropriate.

i have never heard anyone say to never use the block form with
map/grep. the expression form is usually fine but there are times when
you need a proper block - i.e. when you have multiple statements.

PJ> For me, it is appropriate when, as in this case, it reduces clutter and
PJ> lets you focus on the intent of the code rather than the syntax.

pretty simple syntax difference IMO. code blocks can only be the first
arg and don't have a comma following them. you can even create your own
subs with that syntax with prototypes (one of the few good uses for
them). see File::Slurp::edit_file for an example. you can't create a sub
which takes a bare expression like map/grep so you can only use the code
block syntax there.
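
A minimal sketch of such a prototyped sub follows. `apply_to_each` is a hypothetical name invented for illustration and is not taken from File::Slurp; the `(&@)` prototype is what lets the caller pass a bare block first with no comma after it.

```perl
use strict;
use warnings;

# Hypothetical helper: the (&@) prototype lets callers pass a bare block
# as the first argument, with no 'sub' keyword and no comma after it.
sub apply_to_each (&@) {
    my ($code, @items) = @_;
    my @results;
    for my $item (@items) {
        local $_ = $item;          # expose the current element as $_
        push @results, $code->();
    }
    return @results;
}

my @lengths = apply_to_each { length $_ } qw(School fname lname);
print "@lengths\n";
```

This prints `6 5 5`. The sub has to be declared before the call so that perl has seen the prototype by the time it parses the block.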

uri

--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------

Re: Files and Arrays - Search for values and write to the right

on 16.09.2011 22:26:31 by Uri Guttman

>>>>> "BM" == Brandon McCaig writes:

BM> On Fri, Sep 16, 2011 at 2:55 PM, Paul Johnson wrote:
>> map EXPR, @a is just the same as map { EXPR } @a.

BM> The part that I find confusing is that EXPR is evaluated against each
BM> $_, as opposed to once when map is called, passing the resulting value
BM> into map. :) Obviously it's just one of the many magical features of
BM> Perl at work here. :) I don't think that there's any way for a
BM> user-defined sub to do this; or is there?

the whole point of map/grep is to evaluate the expression/block for each
element of the input list. so there has to be a way to access the
current element in the expression and that is $_. and i just posted in
this thread you can pass a code block to a sub as the first argument by
using prototypes. File::Slurp::edit_file_lines does both of those tricks.

uri

--
Uri Guttman -- uri AT perlhunter DOT com --- http://www.perlhunter.com --
------------ Perl Developer Recruiting and Placement Services -------------
----- Perl Code Review, Architecture, Development, Training, Support -------
