Best way to parse this type of data.
on 10.07.2011 11:36:34 by shadow52
Hello Everyone,
I have finally hit my max times of banging my head on the best way to
parse some data I have like the following below:
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
The last line is giving me some trouble: it has three newline separators,
which stops me from being able to use a split function like the
following:
my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
$stanzas);
When I print them out I get the following:
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
So my last to authors are left out.
I have even tried a split like the following:
split( "(\"\$\n|\n", $stanzas);
This still did not work as I thought it would; it seems I am just
missing one little thing.
What I was hoping for is a pointer to where on the net or in a book I
would need to read to get this to work. I assume I will need to use a
regex in the split command, and I am hoping for a little guidance in the
right direction.
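One way to keep using split here is to split only at newlines that begin a new field. The following is a minimal sketch, not something taken from the thread: it assumes every field starts with a word followed by "=", and that no continuation line inside a quoted value has that shape. The sample text is copied from the post; the variable names are illustrative.

```perl
use strict;
use warnings;

# Sample stanza copied from the post above.
my $stanzas = <<'END';
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
END
chomp $stanzas;

# Split only at newlines that are followed by "word =", so the
# newlines inside the quoted Authors value are left alone.
my ( $name, $distributor, $pages, $edition, $authors )
    = split /\n(?=\w+\s*=)/, $stanzas;

print "$authors\n";
```

The lookahead (?=...) checks what follows the newline without consuming it, so each field keeps its own name.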
Thanks
--
To unsubscribe, e-mail: beginners-unsubscribe@perl.org
For additional commands, e-mail: beginners-help@perl.org
http://learn.perl.org/
Re: Best way to parse this type of data.
on 11.07.2011 11:39:24 by Shlomi Fish
Hi shadow52,
On Sun, 10 Jul 2011 02:36:34 -0700 (PDT)
shadow52 wrote:
> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);
>
Try looking at the techniques in:
http://perl-begin.org/uses/text-parsing/
Especially look at /g /c and \G :
You can try doing something like (untested):
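use strict;
use warnings;

# Read the whole input file into one string.
my $filename = shift @ARGV or die "Usage: $0 FILE\n";
open my $fh, '<', $filename or die "Cannot open $filename: $!";
my $string = do { local $/; <$fh> };
close $fh;

pos($string) = 0;
my @results;
while (pos($string) < length($string))
{
    # Scalar-context matches with /gc leave pos() untouched on failure,
    # so every \G below continues where the previous match ended.
    if ($string =~ m{\G(\w+)\s*=\s*}gc)
    {
        my $field_name = $1;
        my $value;
        if ($string =~ m{\G"}gc)
        {
            # Quoted value: may span lines. Skip trailing whitespace too.
            if ($string =~ m{\G([^"]+)"\s*}gc)
            {
                $value = $1;
            }
            else
            {
                die "Cannot match quoted value.";
            }
        }
        else
        {
            # Unquoted value: a single run of non-whitespace characters.
            if ($string =~ m{\G(\S+)\s*}gc)
            {
                $value = $1;
            }
            else
            {
                die "Cannot match single-line/non-whitespace value.";
            }
        }
        push @results, { name => $field_name, value => $value };
    }
    else
    {
        die "Cannot match field name!";
    }
}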
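To see how /g, /c and \G cooperate on their own, here is a small tokenizer sketch. It is not part of the reply above; the token labels and the sample text are made up for illustration. Each match is anchored with \G at the current pos(), and /c keeps pos() in place when an alternative fails so that the next one can be tried.

```perl
use strict;
use warnings;

my $text = qq{pages = 1077\nedition = "2nd"\n};
my @tokens;

pos($text) = 0;
while ( pos($text) < length($text) ) {
    if    ( $text =~ /\G(\w+)/gc )     { push @tokens, "WORD:$1" }
    elsif ( $text =~ /\G=/gc )         { push @tokens, "EQUALS" }
    elsif ( $text =~ /\G"([^"]*)"/gc ) { push @tokens, "STRING:$1" }
    elsif ( $text =~ /\G\s+/gc )       { }    # skip whitespace
    else  { die "Unexpected input at offset " . pos($text) }
}

print join( ", ", @tokens ), "\n";
```

Without /c, the first failed match would reset pos() and the remaining alternatives would start again from the beginning of the string.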
Regards,
Shlomi Fish
-- 
-----------------------------------------------------------------
Shlomi Fish http://www.shlomifish.org/
Optimising Code for Speed - http://shlom.in/optimise
JATFM == “Just answer the fabulous man”
Please reply to list if it's a mailing list post - http://shlom.in/reply .
Re: Best way to parse this type of data.
on 11.07.2011 12:34:34 by Octavian Rasnita
From: "shadow52"
> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:
You can do something like:
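use strict;
# Slurp everything after __DATA__ into a single string.
my $content = do { local $/; <DATA> };
# Capture key/value pairs; quoted values may span several lines.
my %elements = $content =~ /^\s*([^=]+)\s*=\s*"([^"]+)/gsm;
use Data::Dump 'pp';
print pp(\%elements);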
__DATA__
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
This will print:
{
"Authors " => "Larry Wall\nTom Christiansen\nJon Orwant",
"distributor " => "O'Reilly",
"edition " => "2nd",
"name " => "Programming Perl",
}
The important line is:
my %elements = $content =~ /^\s*([^=]+)\s*=\s*"([^"]+)/gsm;
It captures everything from the beginning of the line (skipping any
leading whitespace) up to the first "=" sign as the hash key, and
everything between the first " character and the next " character as the
value for that key. As the output above shows, the keys keep the space
before the "=" sign, and an unquoted value such as pages = 1077 is skipped.
Where a value spans several lines, it is kept as it is, and you can then
split it on those line endings if you need to.
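If the unquoted pages line and the trailing spaces in the keys matter, the pattern can be adjusted. The following is only a sketch of one possible variation, not the code from this reply: it assumes keys are single words and that an unquoted value runs to the end of its line. It also shows the split on line endings mentioned above.

```perl
use strict;
use warnings;

# Sample data copied from the original post.
my $content = <<'END';
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
END

my %elements;
# Key: a word at the start of a line. Value: either a quoted string
# (which may span lines) or the rest of an unquoted line.
while ( $content =~ /^\s*(\w+)\s*=\s*(?:"([^"]*)"|(\S.*?))\s*$/mg ) {
    $elements{$1} = defined $2 ? $2 : $3;
}

# A multi-line value can then be split on its embedded newlines.
my @authors = split /\n/, $elements{Authors};

print "$_ => $elements{$_}\n" for sort keys %elements;
print scalar(@authors), " authors\n";
```

Because the key is captured with \w+ instead of [^=]+, it no longer includes the space before the "=" sign.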
Octavian
Re: Best way to parse this type of data.
on 11.07.2011 19:43:13 by Rob Dixon
On 10/07/2011 10:36, shadow52 wrote:
> Hello Everyone,
>
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble it has three newline separators
> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);
>
> When I print them out I get the following:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
>
>
> So my last to authors are left out.
>
> I have even tried a split like the following:
>
> split( "(\"\$\n|\n", $stanzas); This still did not work as I thought it
> would; it seems I am just missing one little thing.
>
>
> What I was hoping for is where on the net or in a book that I would
> need to read to get this to work. I assume I will need to use a regex
> in the split command I am hoping for a little guidance in the right
> direction of where I need to go.
To do what you describe, I suggest that you use a regex to match either
form of record and look for all occurrences in the file. The program
below shows my point.
However, I suspect that there is more processing to be done after the
data has been separated, and this may well be better done while
accumulating each subrecord in a different way.
HTH,
Rob
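use strict;
use warnings;

# Slurp everything after __DATA__ into one string.
my $data = do {
    local $/;
    <DATA>;
};

# Match either a quoted (possibly multi-line) value or the rest of the line.
my @data = $data =~ /(\w+ \s*=\s* (?: "[^"]*" | .*? ) ) \s*$/mgx;

use Data::Dumper;
print Data::Dumper->Dump([\@data], ['*data']);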
__DATA__
name = "Programming Perl"
distributor = "O'Reilly"
pages = 1077
edition = "2nd"
Authors = "Larry Wall
Tom Christiansen
Jon Orwant"
**OUTPUT**
@data = (
'name = "Programming Perl"',
'distributor = "O\'Reilly"',
'pages = 1077',
'edition = "2nd"',
'Authors = "Larry Wall
Tom Christiansen
Jon Orwant"'
);
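As a hedged illustration of the further processing mentioned above, the separated records could be turned into a hash of field names and values. This sketch is not from the reply; it assumes records in the shape shown in the output, and the %record name is made up for the example.

```perl
use strict;
use warnings;

# Records in the shape shown in the output above.
my @data = (
    'name = "Programming Perl"',
    'pages = 1077',
    qq{Authors = "Larry Wall\nTom Christiansen\nJon Orwant"},
);

my %record;
for my $item (@data) {
    # Split each record at the first "=" only.
    my ( $key, $value ) = split /\s*=\s*/, $item, 2;

    # Strip one pair of surrounding double quotes, if present.
    $value =~ s/\A"(.*)"\z/$1/s;

    $record{$key} = $value;
}

print "$record{name} has $record{pages} pages\n";
```

The limit of 2 passed to split keeps any later "=" characters inside the value intact.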
Re: Best way to parse this type of data.
on 11.07.2011 23:10:30 by jwkrahn
shadow52 wrote:
> Hello Everyone,
Hello,
> I have finally hit my max times of banging my head on the best way to
> parse some data I have like the following below:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
> Tom Christiansen
> Jon Orwant"
>
> The last line is giving me some trouble
The last line there is:
Jon Orwant"
> it has three newline separators
That is not how "lines" are normally defined in Perl.
> which stops me from being able to use a split function like the
> following:
>
> my ( $name, $distributor, $pages, $edition, $Authors ) = split( "\n",
> $stanzas);
Why is all that data in $stanzas in the first place?
> When I print them out I get the following:
>
> name = "Programming Perl"
> distributor = "O'Reilly"
> pages = 1077
> edition = "2nd"
> Authors = "Larry Wall
>
>
> So my last to authors are left out.
last two authors
> I have even tried a split like the following:
>
> split( "(\"\$\n|\n", $stanzas); This still did not work as I thought it
> would; it seems I am just missing one little thing.
>
>
> What I was hoping for is where on the net or in a book that I would
> need to read to get this to work. I assume I will need to use a regex
> in the split command I am hoping for a little guidance in the right
> direction of where I need to go.
Show us how you got the data into $stanzas.
John
--
Any intelligent fool can make things bigger and
more complex... It takes a touch of genius -
and a lot of courage to move in the opposite
direction. -- Albert Einstein
Re: Best way to parse this type of data.
on 12.07.2011 08:30:23 by shadow52
Hello Everyone,
Thanks for all the help. The 3 examples were what I needed, and they took care of the problem in 3 different ways. Thanks also for the reference to the Perl parsing site; it has been a great read so far.