Parsing file
on 02.06.2011 11:49:40 by Aravind Venkatesan
Hi,
I want to parse a file whose contents look as follows:
ENTRY       K00001 KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010 Glycolysis / Gluconeogenesis
            ko00071 Fatty acid metabolism
///
ENTRY       K14865 KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis
            [BR:ko03009]
///
ENTRY       K14866 KO
NAME        U18snoRNA, snR18
DEFINITION  U18 small nucleolar RNA
CLASS       Genetic Information Processing; Translation; Ribosome Biogenesis
            [BR:ko03009]
///
Each record ends with "///". The ultimate aim is to store the information
from each record (for instance ENTRY, NAME) in a data structure (hash) such as
(ENTRY => K14865; NAME => [U14snoRNA, snR128] ... and so on).
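As a rough sketch of that target structure: one record could be held as a hash reference with a scalar for ENTRY and an array reference for NAME. Keying the outer hash by the ENTRY id is an assumption made here so that several records can sit side by side; it is not stated in the post.

```perl
use strict;
use warnings;
use Data::Dumper;

# Hypothetical layout: outer hash keyed by ENTRY id (an assumption),
# inner hash holding the fields of one record from the sample data.
my %kegg = (
    K14865 => {
        ENTRY => 'K14865',
        NAME  => [ 'U14snoRNA', 'snR128' ],   # NAME split on commas
    },
);

# Individual values are then reachable by id, field and position.
print $kegg{K14865}{NAME}[1], "\n";
print Dumper(\%kegg);
```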
So, to start off, I have produced the following snippet:
use strict;
use warnings;
use Carp;
use Data::Dumper;

my $set = parse("D:/workspace/KEGG_Parser/data/ko");

sub parse {
    my $keggFile = shift;
    my $keggHash;
    # 'or' rather than '||', so that croak fires when open itself fails
    open my $fh, '<', $keggFile
        or croak("Cannot open file '$keggFile': $!");
    my $contents = do { local $/; <$fh> };    # slurp the whole file
    my @rec = split('///', $contents);        # one element per record
    foreach my $line (@rec) {
        next if ($line =~ /^\s*$/);
        if ($line =~ /^ENTRY\s{7}(.+?)\s+/) {
            $keggHash->{'ENTRY'} = $1;
        }
        elsif ($line =~ /^NAME\s{8}(.+?)$/) {
            push @{$keggHash->{'NAME'}}, $1;
        }
        else {}
    }
    print Dumper($keggHash);
    close $fh;
    return $keggHash;
}
The output I get is:
$VAR1 = {
          'ENTRY' => 'K00001'
        };
Not all the lines in each element of @rec are getting read. I would appreciate
it if somebody could guide me through this.
Thanks to all,
--
Aravind Venkatesan
Research Fellow,
Systems Biology Group,
Dept. of Biology,
NTNU
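
One likely cause of the output above: the loop runs once per record, not once per line, and without the /m modifier "^" anchors only at the very start of the string. After splitting on "///", every record except the first begins with a newline, so /^ENTRY/ matches only in the first record, and the elsif branch for NAME is never reached for a record that already matched ENTRY. A minimal sketch of a line-by-line alternative follows. It assumes keywords start in column 1 and continuation lines are indented, reads from an in-memory sample instead of the real file, and keys the result by ENTRY id. The name parse_kegg and that layout are hypothetical choices, not part of the original snippet.

```perl
use strict;
use warnings;
use Data::Dumper;

# Parse KEGG-style flat-file text from an open filehandle.
# Returns a hashref keyed by ENTRY id (hypothetical layout).
sub parse_kegg {
    my ($fh) = @_;
    my (%records, %rec, $field);

    while (my $line = <$fh>) {
        chomp $line;
        $line =~ s/\r\z//;                          # tolerate CRLF line endings

        if ($line =~ m{^///}) {                     # end-of-record marker
            $records{ $rec{ENTRY} } = {%rec} if defined $rec{ENTRY};
            %rec = ();
            undef $field;
        }
        elsif ($line =~ /^([A-Z_]+)\s+(.*\S)/) {    # keyword line
            $field = $1;
            my $value = $2;
            if ($field eq 'ENTRY') {
                ($rec{ENTRY}) = split ' ', $value;  # keep only the id
            }
            elsif ($field eq 'NAME') {
                push @{ $rec{NAME} }, split /\s*,\s*/, $value;
            }
            else {
                push @{ $rec{$field} }, $value;
            }
        }
        elsif (defined $field && $line =~ /^\s+(\S.*?)\s*$/) {
            push @{ $rec{$field} }, $1;             # continuation of last field
        }
    }
    return \%records;
}

# Two records taken from the sample in the post, with assumed indentation.
my $sample = <<'END';
ENTRY       K00001 KO
NAME        E1.1.1.1, adh
DEFINITION  alcohol dehydrogenase [EC:1.1.1.1]
PATHWAY     ko00010 Glycolysis / Gluconeogenesis
            ko00071 Fatty acid metabolism
///
ENTRY       K14865 KO
NAME        U14snoRNA, snR128
DEFINITION  U14 small nucleolar RNA
///
END

open my $fh, '<', \$sample or die "Cannot open in-memory sample: $!";
my $kegg = parse_kegg($fh);
close $fh;

print Dumper($kegg);
```

For the real file, the in-memory handle would be replaced by one opened on the file path. Reading line by line also avoids slurping the whole file, which may matter if the KEGG file is large.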