#1: looking for efficient way to parse a file

Posted on 2008-01-12 22:40:10 by Eric Martin

Hello,

I have a file with the following data structure:
#category
item name
data1
data2
item name
data1
data2
#category
item name
data1
data2
.... etc.

Any line that starts with #, indicates a new category. Between
categories, there can be any number of items, with associated data.
Each item has exactly two data properties.

My plan was to just get an array that contained the index of each of
the categories and then parse each item from there, since they are in
a set format...but I was wondering if there were any suggestions for a
more efficient way...


#2: Re: looking for efficient way to parse a file

Posted on 2008-01-12 23:59:13 by Gunnar Hjalmarsson

Eric Martin wrote:
> I have a file with the following data structure:
> #category
> item name
> data1
> data2
> item name
> data1
> data2
> #category
> item name
> data1
> data2
> ... etc.
>
> Any line that starts with #, indicates a new category. Between
> categories, there can be any number of items, with associated data.
> Each item has exactly two data properties.
>
> My plan was to just get an array that contained the index of each of
> the categories and then parse each item from there, since they are in
> a set format...

Not sure what you mean by that. Could you please expand?

> but I was wondering if there were any suggestions for a
> more efficient way...

Efficient - in what sense?

To me, the described data structure would suggest a HoHoA (hash of
hashes of arrays):

use Data::Dumper;

my (%HoHoA, $cat);
while ( <DATA> ) {
    chomp;
    if ( substr($_, 0, 1) eq '#' ) {
        $cat = substr $_, 1;
        next;
    }
    for my $item ( 0, 1 ) {
        chomp( $HoHoA{$cat}{$_}[$item] = <DATA> );
    }
}
print Dumper \%HoHoA;

__DATA__
#category1
item1
data1
data2
item2
data1
data2
#category2
item1
data1
data2
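
Once the data is in a hash of hashes of arrays, getting it back out is a pair of nested loops. The following is only a sketch: the sample hash is hand-built to match the shape the loop above produces, and report_lines() is a made-up helper name used purely for illustration.

```perl
use strict;
use warnings;

# Hypothetical sample data in the shape produced by the parsing loop:
# category => { item name => [ data1, data2 ] }
my %HoHoA = (
    category1 => {
        item1 => [ 'data1', 'data2' ],
        item2 => [ 'data1', 'data2' ],
    },
    category2 => {
        item1 => [ 'data1', 'data2' ],
    },
);

# Flatten the structure into "category/item: data1, data2" lines.
# report_lines() is an illustrative helper, not part of the original post.
sub report_lines {
    my ($data) = @_;
    my @lines;
    for my $cat ( sort keys %$data ) {
        for my $item ( sort keys %{ $data->{$cat} } ) {
            my ( $d1, $d2 ) = @{ $data->{$cat}{$item} };
            push @lines, "$cat/$item: $d1, $d2";
        }
    }
    return @lines;
}

print "$_\n" for report_lines( \%HoHoA );
```

The keys are sorted only to make the output order predictable; a plain hash does not remember the order in which categories or items appeared in the file.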

--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl


#3: Re: looking for efficient way to parse a file

Posted on 2008-01-13 00:09:20 by xhoster

Eric Martin <emartin24@gmail.com> wrote:
> Hello,
>
> I have a file with the following data structure:
> #category
> item name
> data1
> data2
> item name
> data1
> data2
> #category
> item name
> data1
> data2
> ... etc.
>
> Any line that starts with #, indicates a new category. Between
> categories, there can be any number of items, with associated data.
> Each item has exactly two data properties.
>
> My plan was to just get an array that contained the index of each of
> the categories

That suggests the categories are already in an array, or else what would the
index be an index into? I'd probably not bother to load them into an array
in the first place, just parse it on the fly. Maybe not, depending on
where it was coming from and how big I expected it to plausibly get.

> and then parse each item from there, since they are in
> a set format...but I was wondering if there were any suggestions for a
> more efficient way...

Efficient in what sense? Memory? CPU time? Programmer maintenance time?

Xho

--
-------------------- http://NewsReader.Com/ --------------------
The costs of publication of this article were defrayed in part by the
payment of page charges. This article must therefore be hereby marked
advertisement in accordance with 18 U.S.C. Section 1734 solely to indicate
this fact.


#4: Re: looking for efficient way to parse a file

Posted on 2008-01-13 03:38:14 by jurgenex

Eric Martin <emartin24@gmail.com> wrote:
>I have a file with the following data structure:
>#category
>item name
>data1
>data2
>item name
>data1
>data2
>#category
>item name
>data1
>data2
>... etc.
>
>Any line that starts with #, indicates a new category. Between
>categories, there can be any number of items, with associated data.
>Each item has exactly two data properties.

That suggests to me a Hash(category) of Hash(item name) of Array (two data
elements)

>My plan was to just get an array that contained the index of each of
>the categories and then parse each item from there, since they are in

What's an index of a category?

>a set format...but I was wondering if there were any suggestions for a
>more efficient way...

Reading the file line by line in a linear manner is about as efficient as
you can possibly get because you need to read each item at least once and
you don't read it more than once, either. The suggested data structure would
support a linear reading, too.

jue
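
A single linear pass over a filehandle, as described above, could be sketched roughly like this. The helper name parse_categories(), the blank-line handling, the error messages, and the sample data are all assumptions made for the example; none of them come from the thread.

```perl
use strict;
use warnings;

# parse_categories() is a hypothetical helper name. It reads the handle
# once, front to back, and returns a hash of hashes of arrays.
sub parse_categories {
    my ($fh) = @_;
    my ( %parsed, $cat );
    while ( my $line = <$fh> ) {
        chomp $line;
        next if $line eq '';              # assumption: skip blank lines
        if ( $line =~ /^#(.*)/ ) {        # "#name" starts a new category
            $cat = $1;
            next;
        }
        die "item '$line' appears before any category\n"
            unless defined $cat;
        for my $i ( 0, 1 ) {              # each item has two data lines
            my $data = <$fh>;
            die "file ends inside item '$line'\n" unless defined $data;
            chomp $data;
            $parsed{$cat}{$line}[$i] = $data;
        }
    }
    return \%parsed;
}

# Demo on an in-memory filehandle; a real file would be opened with
# open my $fh, '<', $path instead.
my $text = "#fruit\napple\nred\nsweet\n#veg\nleek\ngreen\nmild\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";
my $parsed = parse_categories($fh);
close $fh;

print join( ', ', @{ $parsed->{fruit}{apple} } ), "\n";    # red, sweet
```

Every line is read exactly once, so the work grows linearly with the size of the file, and nothing beyond the resulting hash has to be kept in memory.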


#5: Re: looking for efficient way to parse a file

Posted on 2008-01-13 16:46:26 by Eric Martin

On Jan 12, 2:59 pm, Gunnar Hjalmarsson <nore...@gunnar.cc> wrote:
> Eric Martin wrote:
> > I have a file with the following data structure:
> > #category
> > item name
> > data1
> > data2
> > item name
> > data1
> > data2
> > #category
> > item name
> > data1
> > data2
> > ... etc.
>
> > Any line that starts with #, indicates a new category. Between
> > categories, there can be any number of items, with associated data.
> > Each item has exactly two data properties.
>
> > My plan was to just get an array that contained the index of each of
> > the categories and then parse each item from there, since they are in
> > a set format...
>
> Not sure what you mean by that. Could you please expand?

I was thinking of loading the file into an array, iterating over it to
find the index values for each category, then parsing the data between
each category, using the array of indexes I previously created.
However, your suggestion to use a HoHoA, along with the code sample,
proved to be exactly what I needed.
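
For comparison, the two-pass index plan described in this paragraph might look roughly like the sketch below. The sample lines and the variable names (@lines, @cat_idx, %parsed) are assumptions for illustration, and the code assumes the whole file is already held in an array.

```perl
use strict;
use warnings;

# Assumption: the whole file has already been read into @lines.
my @lines = ( '#fruit', 'apple', 'red', 'sweet', 'pear', 'green', 'crisp',
              '#veg', 'leek', 'green', 'mild' );

# Pass 1: record the array index of every category line.
my @cat_idx = grep { $lines[$_] =~ /^#/ } 0 .. $#lines;

# Pass 2: walk the lines between consecutive category indexes,
# three at a time (item name, data1, data2).
my %parsed;
for my $n ( 0 .. $#cat_idx ) {
    my $start = $cat_idx[$n];
    my $end   = $n < $#cat_idx ? $cat_idx[ $n + 1 ] - 1 : $#lines;
    my $cat   = substr $lines[$start], 1;
    for ( my $i = $start + 1 ; $i + 2 <= $end ; $i += 3 ) {
        $parsed{$cat}{ $lines[$i] } = [ @lines[ $i + 1, $i + 2 ] ];
    }
}

print join( ' ', @cat_idx ), "\n";    # 0 7
```

This gives the same structure as the single-pass loop, but it touches every line twice and needs the whole file in memory, which is why the single-pass version is the simpler choice here.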

>
> > but I was wondering if there were any suggestions for a
> > more efficient way...
>
> Efficient - in what sense?

I probably should have said effective ;)

>
> To me, the described data structure would suggest a HoHoA (hash of
> hashes of arrays):
>
> use Data::Dumper;
>
> my (%HoHoA, $cat);
> while ( <DATA> ) {
>     chomp;
>     if ( substr($_, 0, 1) eq '#' ) {
>         $cat = substr $_, 1;
>         next;
>     }
>     for my $item ( 0, 1 ) {
>         chomp( $HoHoA{$cat}{$_}[$item] = <DATA> );
>     }
> }
> print Dumper \%HoHoA;
>
> __DATA__
> #category1
> item1
> data1
> data2
> item2
> data1
> data2
> #category2
> item1
> data1
> data2
>
> --
> Gunnar Hjalmarsson
> Email:http://www.gunnar.cc/cgi-bin/contact.pl

Thanks for the code sample, it worked great! I didn't realize that
reading from <DATA> inside the while block would also advance to the
next record of the data file.

-Eric
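
The behaviour mentioned above, where each read inside the loop body moves the filehandle forward, can be seen in isolation with a small sketch. It uses an in-memory filehandle in place of DATA, and the four sample lines are invented for the demonstration.

```perl
use strict;
use warnings;

# In-memory filehandle standing in for DATA (sample contents are made up).
my $text = "one\ntwo\nthree\nfour\n";
open my $fh, '<', \$text or die "cannot open in-memory handle: $!";

my @seen;
while ( my $line = <$fh> ) {    # the loop condition reads "one", then "three"
    chomp $line;
    my $next = <$fh>;           # this read consumes the following line
    chomp $next if defined $next;
    push @seen, "$line+$next";
}
close $fh;

print "@seen\n";                # one+two three+four
```

A filehandle keeps a single read position, so it does not matter whether a line is read by the while condition or by a statement in the body: each read in scalar context returns the next unread line.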
