Parsing file

on 02.06.2011 11:49:40 by Aravind Venkatesan

Hi,

I want to parse a file whose contents look as follows:

ENTRY K00001 KO
NAME E1.1.1.1, adh
DEFINITION alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY ko00010 Glycolysis / Gluconeogenesis
ko00071 Fatty acid metabolism
///
ENTRY K14865 KO
NAME U14snoRNA, snR128
DEFINITION U14 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis
[BR:ko03009]
///
ENTRY K14866 KO
NAME U18snoRNA, snR18
DEFINITION U18 small nucleolar RNA
CLASS Genetic Information Processing; Translation; Ribosome Biogenesis
[BR:ko03009]
///

Each record ends with "///". The ultimate aim is to store the information from
each record (for instance ENTRY and NAME) in a data structure (a hash) such as
(ENTRY => K14865, NAME => [U14snoRNA, snR128], and so on).
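
To make the target concrete, here is a hand-written sketch of the structure I have in mind for the second record above. It is not parser output. The variable names %record and %kegg are only illustrative, and keying the outer hash by the ENTRY value is an assumption about one possible layout, not a requirement.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical target structure for record K14865 (written by hand).
my %record = (
    ENTRY      => 'K14865',
    NAME       => [ 'U14snoRNA', 'snR128' ],    # several names kept as an array ref
    DEFINITION => 'U14 small nucleolar RNA',
);

# Assumed layout: one such hash per record, keyed by its ENTRY value.
my %kegg = ( $record{ENTRY} => \%record );

print Dumper( \%kegg );
```

With this layout a single field could be looked up as $kegg{K14865}{NAME}[0].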

To start off, I have written the following snippet:

use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = &parse("D:/workspace/KEGG_Parser/data/ko");

sub parse {
    my $keggFile = shift;
    my $keggHash;
    open my $fh, '<', $keggFile
        or croak("Cannot open file '$keggFile': $!");
    my $contents = do { local $/; <$fh> };
    my @rec = split('///', $contents);

    foreach my $line (@{rec}) {
        next if ($line =~ /^\s*$/);
        if ($line =~ /^ENTRY\s{7}(.+?)\s+/) {
            $keggHash->{'ENTRY'} = $1;
        }
        elsif ($line =~ /^NAME\s{8}(.+?)$/) {
            push @{$keggHash->{'NAME'}}, $1;
        }
        else {}
    }
    print Dumper($keggHash);
    close $fh;
}

The output I get is

$VAR1 = {
          'ENTRY' => 'K00001'
        };

Not all the lines in each element of @rec are being read. I would appreciate
it if somebody could guide me through this.

Thanks to all,

--
Aravind Venkatesan
Research Fellow,
Systems Biology Group,
Dept. of Biology,
NTNU
