Adding file contents into hashes

on 08.06.2011 03:17:13 by Aravind Venkatesan


Hi,

This is a snippet of the data

ENTRY K00001 KO
NAME E1.1.1.1, adh
DEFINITION alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY ko00010 Glycolysis / Gluconeogenesis
ko00071 Fatty acid metabolism
ko00350 Tyrosine metabolism
ko00625 Chloroalkane and chloroalkene degradation
ko00626 Naphthalene degradation
ko00830 Retinol metabolism
ko00980 Metabolism of xenobiotics by cytochrome P450
ko00982 Drug metabolism - cytochrome P450
///
ENTRY K14865 KO
NAME U14snoRNA, snR128
DEFINITION U14 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome
Biogenesis [BR:ko03009]
///

I am trying to store this in the following data structure by splitting
the file along the "///" and have each record in a hash with primary key
as the ENTRY number and storing all the other info under that key :

$VAR1 = {
    K00001 => {
        'NAME' => [
            'E1.1.1.1',
            'adh'
        ],
        'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]',
        'PATHWAY' => {
            'ko00010' => 'Glycolysis / Gluconeogenesis',
            'ko00071' => 'Fatty acid metabolism'
        }
    }
};

I have started off with the following code:

sub parse{
    my $kegg_file_path = shift;
    my %keggData;
    open my $fh, '<', $kegg_file_path || croak ("Cannot open file '$kegg_file_path': $!");
    my $contents = do{local $/, <$fh>};
    my @dataArray = split ('///', $contents);
    foreach my $currentLine (@dataArray){
        if ($currentLine =~ /^ENTRY\s{7}(.+?)\s+/){
            my $value = $1;
            $keggData{'ENTRY'} = $value;
        }
    }
    print Dumper(%keggData);
    close $fh;
}

But I am not sure how to proceed further and arrive at the data structure
mentioned above. I am new to Perl and trying to learn ways of parsing
files, so any help would be much appreciated.

thanks,

Aravind


Re: Adding file contents into hashes

on 08.06.2011 06:15:57 by jwkrahn

venkates wrote:
> Hi,

Hello,

> This is a snippet of the data
>
> ENTRY K00001 KO
> NAME E1.1.1.1, adh
> DEFINITION alcohol dehydrogenase [EC:1.1.1.1]
> PATHWAY ko00010 Glycolysis / Gluconeogenesis
> ko00071 Fatty acid metabolism
> ko00350 Tyrosine metabolism
> ko00625 Chloroalkane and chloroalkene degradation
> ko00626 Naphthalene degradation
> ko00830 Retinol metabolism
> ko00980 Metabolism of xenobiotics by cytochrome P450
> ko00982 Drug metabolism - cytochrome P450
> ///
> ENTRY K14865 KO
> NAME U14snoRNA, snR128
> DEFINITION U14 small nucleolar RNA
> CLASS Genetic Information Processing; Translation; Ribosome Biogenesis
> [BR:ko03009]
> ///
>
> I am trying to store this in the following data structure by splitting
> the file along the "///" and have each record in a hash with primary key
> as the ENTRY number and storing all the other info under that key :
>
> $VAR1 = {
> K00001 => {
> 'NAME' => [
> 'E1.1.1.1',
> 'adh'
> ],
> 'DEFINITION' => 'alcohol dehydrogenase [EC:1.1.1.1]',
> 'PATHWAY' => {
> 'ko00010' => 'Glycolysis / Gluconeogenesis',
> 'ko00071' => 'Fatty acid metabolism'
> }
>
> I have started off with the following code:
>
> sub parse{
> my $kegg_file_path = shift;
> my %keggData;
> open my $fh, '<', $kegg_file_path || croak ("Cannot open file '$kegg_file_path': $!");

Because of the high precedence of the || operator, that will only croak()
if the value of $kegg_file_path is false, not if the file cannot be
opened. You need to either use parentheses with open:

open( my $fh, '<', $kegg_file_path )
    || croak( "Cannot open file '$kegg_file_path': $!" );

Or use the low-precedence 'or' operator:

open my $fh, '<', $kegg_file_path
    or croak( "Cannot open file '$kegg_file_path': $!" );
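
To see the difference in isolation, here is a small sketch. The path is a made-up one that is assumed not to exist, and die stands in for croak so that no extra module is needed:

```perl
use strict;
use warnings;

my $path = '/no/such/dir/kegg.txt';   # hypothetical path, assumed not to exist

# '||' binds to $path, so this parses as open( my $fh1, '<', ( $path || die ... ) ).
# $path is a true value, so die is never reached and the failed open goes unnoticed.
my $ok = open my $fh1, '<', $path || die "never reached\n";
print "with ||: open returned ", ( $ok ? "true" : "false" ), " and nothing died\n";

# 'or' binds more loosely than the open list operator, so it tests open's return value.
my $err = '';
eval { open my $fh2, '<', $path or die "Cannot open file '$path': $!\n"; 1 } or $err = $@;
print "with or: $err";
```

The first open fails silently; the second one raises the error you wanted.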

> my $contents = do{local $/, <$fh>};
> my @dataArray = split ('///', $contents);
> foreach my $currentLine (@dataArray){

That would probably be better as:

local $/ = "///\n";
while ( <$fh> ) {

Why read the whole file in when you are only processing one record at a
time?
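
A minimal sketch of record-at-a-time reading follows. To keep it self-contained it reads from a string through an in-memory filehandle instead of your file, and the sample data is cut down to two short records:

```perl
use strict;
use warnings;

# Cut-down sample data; an in-memory filehandle stands in for the real file.
my $data = "ENTRY K00001 KO\nNAME E1.1.1.1, adh\n///\n"
         . "ENTRY K14865 KO\nNAME U14snoRNA, snR128\n///\n";
open my $fh, '<', \$data or die "Cannot open in-memory file: $!";

my @records;
{
    local $/ = "///\n";              # a "line" now ends at the record terminator
    while ( my $record = <$fh> ) {
        chomp $record;               # chomp removes the current $/, here "///\n"
        push @records, $record;      # a real parser would process $record here
    }
}
close $fh;

printf "%d records; first begins with: %s\n",
    scalar @records, ( split /\n/, $records[0] )[0];
```

Only one record is held in memory per iteration, and chomp takes care of the "///" terminator.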

> if ($currentLine =~ /^ENTRY\s{7}(.+?)\s+/){

Because you are splitting on '///', every record after the first will
start with "\nENTRY", and /^ENTRY/ will only match if 'ENTRY' is at the
beginning of the string, not after a leading "\n".
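
A short sketch of that effect, using cut-down data. It uses \s+ rather than \s{7}, on the assumption that the exact column spacing may not have survived in this mail, and shows the /m modifier as one possible way to let ^ match after a newline:

```perl
use strict;
use warnings;

my $contents = "ENTRY K00001 KO\n///\nENTRY K14865 KO\n///\n";

# The chunks are "ENTRY K00001 KO\n", "\nENTRY K14865 KO\n" and "\n".
my @chunks = split m{///}, $contents;

my ( @plain, @multiline );
for my $chunk (@chunks) {
    push @plain,     $1 if $chunk =~ /^ENTRY\s+(\S+)/;    # ^ matches at start of string only
    push @multiline, $1 if $chunk =~ /^ENTRY\s+(\S+)/m;   # with /m, ^ also matches after "\n"
}

print "without /m: @plain\n";
print "with /m:    @multiline\n";
```

Without /m only the first entry is found; with /m both are.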

> my $value = $1;
> $keggData{'ENTRY'} = $value;

You don't show a key of 'ENTRY' in your desired data structure.

> }
> }
> print Dumper(%keggData);

That is usually written as:

print Dumper( \%keggData );
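
The difference is easy to see with a tiny hash (the contents here are just an excerpt from your sample data):

```perl
use strict;
use warnings;
use Data::Dumper;

$Data::Dumper::Sortkeys = 1;   # stable key order in the output

my %keggData = ( K00001 => { DEFINITION => 'alcohol dehydrogenase [EC:1.1.1.1]' } );

# Passing the hash itself flattens it into a list of keys and values,
# so Dumper prints a separate $VAR1, $VAR2, ... for each element.
my $flat = Dumper(%keggData);

# Passing a reference gives a single $VAR1 that shows the whole structure.
my $nested = Dumper( \%keggData );

print $flat, "\n", $nested;
```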

> close $fh;
> }
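
Putting the points above together, one possible way to reach the structure you describe is sketched below. It is only a sketch under some assumptions: parse_kegg is a made-up name, the sample is held in a string so the example runs on its own, a field keyword is assumed to start in column 1 with continuation lines indented (as in the flat file, even if the indentation was lost in this mail), and any field other than ENTRY, NAME and PATHWAY is simply stored as one joined string.

```perl
use strict;
use warnings;
use Carp qw(croak);
use Data::Dumper;

# parse_kegg is a hypothetical name. Assumption: a field keyword starts in
# column 1 and continuation lines start with whitespace.
sub parse_kegg {
    my ($fh) = @_;
    my %keggData;
    local $/ = "///\n";                          # read one record at a time
    while ( my $record = <$fh> ) {
        chomp $record;                           # strip the trailing "///\n"
        my ( $entry, $field, %rec );
        for my $line ( split /\n/, $record ) {
            my $value;
            if    ( $line =~ /^([A-Z_]+)\s+(.*)$/ )            { ( $field, $value ) = ( $1, $2 ) }
            elsif ( defined $field && $line =~ /^\s+(\S.*)$/ ) { $value = $1 }
            else                                               { next }
            $value =~ s/\s+$//;                  # drop trailing whitespace

            if    ( $field eq 'ENTRY' ) { ($entry) = split ' ', $value }
            elsif ( $field eq 'NAME' )  { push @{ $rec{NAME} }, split /,\s*/, $value }
            elsif ( $field eq 'PATHWAY' ) {
                my ( $id, $desc ) = split ' ', $value, 2;
                $rec{PATHWAY}{$id} = $desc;
            }
            else {                               # DEFINITION, CLASS, ...: join wrapped lines
                $rec{$field} = defined $rec{$field} ? "$rec{$field} $value" : $value;
            }
        }
        $keggData{$entry} = \%rec if defined $entry;
    }
    return \%keggData;
}

my $sample = <<'END';
ENTRY       K00001                      KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010  Glycolysis / Gluconeogenesis
            ko00071  Fatty acid metabolism
///
ENTRY       K14865                      KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome
            Biogenesis [BR:ko03009]
///
END

open my $fh, '<', \$sample or croak "Cannot open in-memory sample: $!";
my $kegg = parse_kegg($fh);
close $fh;

$Data::Dumper::Sortkeys = 1;
print Dumper($kegg);
```

For a real file, open it by path in place of the in-memory string and pass that handle to the sub.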

John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
