Best way to parse this type of data.

on 10.07.2011 11:36:34 by shadow52

Hello Everyone,

I have finally hit my limit banging my head over the best way to
parse data like the following:

name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"

The last line is giving me some trouble: it has three newline separators,
which stops me from being able to use a split function like the
following:

my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
$stanzas);

When I print them out I get the following:

name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall


So my last to authors are left out.
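
To see where the two authors go, it may help to count what split actually returns. This is only a minimal sketch: it assumes $stanzas holds exactly the one record shown above, and the heredoc is a stand-in for however the data was really loaded.

```perl
use strict;
use warnings;

# Assumption: $stanzas holds one record exactly as shown in the post.
my $stanzas = <<'END';
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
END

# split on newline knows nothing about the quotes, so the multi-line
# Authors value is cut into three separate pieces.
my @pieces = split /\n/, $stanzas;
print scalar(@pieces), " pieces\n";

# Only the first five pieces are assigned; the rest are silently dropped.
my ( $name, $distributor, $pages, $edition, $authors ) = @pieces;
print "$authors\n";
```

The record splits into seven pieces rather than five, so the fifth variable only receives the first line of the Authors value.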

I have even tried a split like the following:

split( "(\"\$\n|\n", $stanzas);

This still did not work as I thought it would; it seems I am just
missing one little thing.


What I am hoping for is a pointer to where on the net or in a book I
would need to read to get this to work. I assume I will need to use a
regex in the split command, so a little guidance in the right direction
would be welcome.

Thanks


--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/

Re: Best way to parse this type of data.

on 11.07.2011 11:39:24 by Shlomi Fish

Hi shadow52,

On Sun, 10 Jul 2011 02:36:34 -0700 (PDT)
shadow52 wrote:

> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);
>

Try looking at the techniques in:

http://perl-begin.org/uses/text-parsing/

Look especially at the /g and /c modifiers and the \G assertion.

You can try doing something like (untested):


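# Read the whole file into one string ($filename is assumed to be set).
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $string = do { local $/; <$fh> };
close $fh;

pos($string) = 0;
my @results;
while (pos($string) < length($string))
{
    # Matches are done in scalar context with /gc so that pos() advances
    # on success and is left untouched on failure.
    if ($string =~ m{\G(\w+)\s*=\s*}gc)
    {
        my $field_name = $1;
        my $value;
        if ($string =~ m{\G"}gc)
        {
            # Quoted value: may span several lines.
            if ($string =~ m{\G([^"]+)"\n}gc)
            {
                $value = $1;
            }
            else
            {
                die "Cannot match quoted value.";
            }
        }
        else
        {
            # Unquoted value: a single run of non-whitespace.
            if ($string =~ m{\G(\S+)\n}gc)
            {
                $value = $1;
            }
            else
            {
                die "Cannot match single-line/non-whitespace value.";
            }
        }
        push @results, { name => $field_name, value => $value };
    }
    else
    {
        die "Cannot match field name!";
    }
}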
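
The role of /c with \G can be seen in isolation. What follows is a small sketch on a made-up one-line string, not part of the parser itself: a failed match with /gc leaves pos() alone so another alternative can be tried at the same place, while a failed match with plain /g resets pos().

```perl
use strict;
use warnings;

my $s = 'pages = 1077';
pos($s) = 0;

# Scalar-context /gc match: advances pos() past the field name.
$s =~ /\G(\w+)/gc or die "no field name";
my $field = $1;

# This attempt fails (the value is not quoted), but because of /c
# pos() stays where it was, just after the field name.
my $quoted = $s =~ /\G\s*=\s*"/gc;
my $pos_after_failure = pos($s);

# So a second alternative can be tried from the same position.
$s =~ /\G\s*=\s*(\d+)/gc or die "no value";
my $value = $1;

# Without /c, a failed /g match resets pos() to undef.
my $t = 'pages';
pos($t) = 2;
my $reset = ( $t =~ /\Gxyz/g ) ? 0 : !defined pos($t);

print "$field => $value (pos after failed match: $pos_after_failure)\n";
```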


Regards,

Shlomi Fish

-- 
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Optimising Code for Speed - http://shlom.in/optimise

JATFM == “Just answer the fabulous man”

Please reply to list if it's a mailing list post - http://shlom.in/reply .


Re: Best way to parse this type of data.

on 11.07.2011 12:34:34 by Octavian Rasnita

From: "shadow52"

> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:



You can do something like:

use strict;

# Slurp everything after __DATA__ into one string.
my $content = do { local $/; <DATA> };

my %elements = $content =~ /^\s*([^=]+)\s*=\s*"([^"]+)/gsm;

use Data::Dump 'pp';
print pp \%elements;

__DATA__
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"

This will print:

{
"Authors " => "Larry Wall\nTom Christiansen\nJon Orwant",
"distributor " => "O'Reilly",
"edition " => "2nd",
"name " => "Programming Perl",
}

The important line is:

my %elements = $content =~ /^\s*([^=]+)\s*=\s*"([^"]+)/gsm;

It captures everything from the beginning of the line (skipping any
leading whitespace) up to the first "=" sign as the hash key, and
everything between the opening " char and the next " char as the value
for that key.

Where a value spans several lines, the embedded line endings are kept, so
you can split the value on them later if you need to.
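
As the output above shows, the keys keep the space before the "=" and the unquoted pages field is skipped. One possible variation is sketched below; the tightened pattern and the %elements loop are my own adjustments rather than part of the suggestion above, and the heredoc simply stands in for the __DATA__ section.

```perl
use strict;
use warnings;

# Stand-in for the __DATA__ section used above.
my $content = <<'END';
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
END

my %elements;
# (\w+) keeps trailing spaces out of the key; the alternation accepts
# either a quoted value (possibly multi-line) or a bare word.
while ( $content =~ /^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\S+))/mg ) {
    # $2 is set for quoted values, $3 for bare ones such as pages = 1077.
    $elements{$1} = defined $2 ? $2 : $3;
}

# Split the multi-line value on its line endings.
my @authors = split /\n/, $elements{Authors};

print "$_\n" for @authors;
print "pages: $elements{pages}\n";
```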

Octavian



Re: Best way to parse this type of data.

on 11.07.2011 19:43:13 by Rob Dixon

On 10/07/2011 10:36, shadow52 wrote:
> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);
>
> When I print them out I get the following:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
>
>
> So my last to authors are left out.
>
> I have even tried a split like the following:
>
> split( "(\"\$\n|\n", $stanzas); This still did not work as I thought it
> would; it seems I am just missing one little thing.
>
>
> What I was hoping for is where on the net or in a book that I would
> need to read to get this to work. I assume I will need to use a regex
> in the split command I am hoping for a little guidance in the right
> direction of where I need to go.

To do what you describe, I suggest that you use a regex to match either
form of record and look for all occurrences in the file. The program
below shows my point.

However, I suspect that there is more processing to be done after the
data has been separated, and this may well be better done while
accumulating each subrecord in a different way.

HTH,

Rob


use strict;
use warnings;

my $data = do {
    local $/;
    <DATA>;
};

my @data = $data =~ /(\w+ \s*=\s* (?: "[^"]*" | .*? ) ) \s*$/mgx;

use Data::Dumper;
print Data::Dumper->Dump([\@data], ['*data']);

__DATA__
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"


**OUTPUT**


@data = (
'name = "Programming Perl"',
'distributor = "O\'Reilly"',
'pages = 1077',
'edition = "2nd"',
'Authors = "Larry Wall
Tom Christiansen
Jon Orwant"'
);
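
If the records do need further processing, one way to carry on from @data is sketched here. The %book hash and the decision to turn Authors into an array reference are illustrative assumptions of mine, not part of the program above; @data is written out literally so the sketch runs on its own.

```perl
use strict;
use warnings;

# Assumption: @data holds the strings produced by the program above.
my @data = (
    'name = "Programming Perl"',
    'distributor = "O\'Reilly"',
    'pages = 1077',
    'edition = "2nd"',
    qq{Authors = "Larry Wall\nTom Christiansen\nJon Orwant"},
);

my %book;
for my $record (@data) {
    # Split into key and value at the first "=" only.
    my ( $key, $value ) = split /\s*=\s*/, $record, 2;

    # Strip one pair of surrounding double quotes, if present.
    $value =~ s/\A"(.*)"\z/$1/s;

    $book{$key} = $value;
}

# Store the multi-line value as one array element per line.
$book{Authors} = [ split /\n/, $book{Authors} ];

print "$book{name} has ", scalar @{ $book{Authors} }, " authors\n";
```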


Re: Best way to parse this type of data.

on 11.07.2011 23:10:30 by jwkrahn

shadow52 wrote:
> Hello Everyone,

Hello,

> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble

The last line there is:

Jon Orwant"


> it has three newline separators

That is not how "lines" are normally defined in Perl.


> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);

Why is all that data in $stanzas in the first place?


> When I print them out I get the following:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
>
>
> So my last to authors are left out.

last two authors


> I have even tried a split like the following:
>
> split( "(\"\$\n|\n", $stanzas); This still did not work as I thought it
> would; it seems I am just missing one little thing.
>
>
> What I was hoping for is where on the net or in a book that I would
> need to read to get this to work. I assume I will need to use a regex
> in the split command I am hoping for a little guidance in the right
> direction of where I need to go.

Show us how you got the data into $stanzas.



John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein


Re: Best way to parse this type of data.

on 12.07.2011 08:30:23 by shadow52

Hello Everyone,

Thanks for all the help. The three examples were what I needed, and they took care of the problem in three different ways. Thanks also for the reference to the Perl parsing site; it has been a great read so far.

