Extract javascript strings using regex

on 22.04.2008 06:39:54 by kingskippus

Hey all, I've been trying to hammer away at this, and I just can't
figure it out. I'm hoping a regular expressions guru can help me out.

I'm trying to parse a retrieved javascript file to extract the
parameters out of a function call. Here's a contrived line that
represents what will be fetched:

foo('parameter 1', 'param with \'single\' quotes', 'param with\"double\" quotes', 'this param, it has a comma', 'five');

The goal is to get an array with these elements:
parameter 1
param with 'single' quotes
param with "double" quotes
this param, it has a comma
five

There will always be five parameters, and the function name will
always be foo. Normally, I'm handy with regexes, but damn, those
escaped quotes and commas are killing me, and the data does have lots
of them in there.

I'm not lazy, I've been plugging away at this trying to work with look-
behind reference, greedy matching, and so on, but I'm just at an
impasse and can't extract what I want out of it. I've googled various
regex cookbooks (even have access to O'Reilly's Safari), but I've come
up with bupkiss.

Any ideas? I'd surely appreciate any help!
--TonyV

Re: Extract javascript strings using regex

on 22.04.2008 07:08:46 by Uri Guttman

>>>>> "T" == TonyV writes:

T> foo('parameter 1', 'param with \'single\' quotes', 'param with\"double
T> \" quotes', 'this param, it has a comma', 'five');

T> The goal is to get an array with these elements:
T> parameter 1
T> param with 'single' quotes
T> param with "double" quotes
T> this param, it has a comma
T> five

T> Any ideas? I'd surely appreciate any help!

Text::Balanced should be able to do that easily. it can parse matched
parens, quotes and other top-level tokenizing syntax.

uri

--
Uri Guttman ------ uri@stemsystems.com -------- http://www.sysarch.com --
----- Perl Code Review , Architecture, Development, Training, Support ------
--------- Free Perl Training --- http://perlhunter.com/college.html ---------
--------- Gourmet Hot Cocoa Mix ---- http://bestfriendscocoa.com ---------

Re: Extract javascript strings using regex

on 22.04.2008 08:35:04 by benkasminbullock

On Apr 22, 1:39 pm, TonyV wrote:
> I'm trying to parse a retrieved javascript file to extract the
> parameters out of a function call. Here's a contrived line that
> represents what will be fetched:
>
> foo('parameter 1', 'param with \'single\' quotes', 'param with\"double
> \" quotes', 'this param, it has a comma', 'five');
>
> The goal is to get an array with these elements:
> parameter 1
> param with 'single' quotes
> param with "double" quotes
> this param, it has a comma
> five

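#! perl
use warnings;
use strict;
# One single-quoted string; \' is allowed inside it.
my $parameter = qr/'(?:[^']|\\')+'/;
my $test = q/foo('parameter 1', 'param with \'single\' quotes', 'param with\"double\" quotes', 'this param, it has a comma', 'five')/;
# Five comma-separated parameters, each captured with its quotes.
# The x flag lets the pattern span two lines without matching the newline.
if ($test =~ /foo\s*\(\s*($parameter)\s*,\s*($parameter)\s*,
    \s*($parameter)\s*,\s*($parameter)\s*,\s*($parameter)\s*\)/sx ) {
    print "Matched.\n";
    print "$1\n$2\n$3\n$4\n$5\n";
}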

You could also use

/foo\s*\(\s*(?:$parameter\s*,\s*){4}($parameter)\s*\)/

if you don't need the parameter values right away (e.g. match for them
using another regex later on). That would make the code tidier.

> There will always be five parameters, and the function name will
> always be foo.

Are the parameters necessarily single quoted?

Re: Extract javascript strings using regex

on 22.04.2008 13:57:12 by jurgenex

TonyV wrote:
>I'm trying to parse a retrieved javascript file to extract the
>parameters out of a function call. Here's a contrived line that
>represents what will be fetched:
>
>foo('parameter 1', 'param with \'single\' quotes', 'param with\"double
>\" quotes', 'this param, it has a comma', 'five');
>
>The goal is to get an array with these elements:
>parameter 1
>param with 'single' quotes
>param with "double" quotes
>this param, it has a comma
>five

I think Text::CSV::parse() should do the job just fine.

jue

Re: Extract javascript strings using regex

on 22.04.2008 15:48:54 by bugbear

TonyV wrote:
>
> I'm not lazy, I've been plugging away at this trying to work with look-
> behind reference, greedy matching, and so on, but I'm just at an
> impasse and can't extract what I want out of it. I've googled various
> regex cookbooks (even have access to O'Reilly's Safari), but I've come
> up with bupkiss.

IIRC it is *impossible* to fully implement nested matching quotes
with a regexp.

Ah! (google). This sounds helpful:

http://evolt.org/RegEx_Basics#comment-60762

BugBear

Re: Extract javascript strings using regex

on 22.04.2008 16:32:20 by Abigail

_
bugbear (bugbear@trim_papermule.co.uk_trim) wrote on VCCCXLVIII September
MCMXCIII in :
`' TonyV wrote:
`' >
`' > I'm not lazy, I've been plugging away at this trying to work with look-
`' > behind reference, greedy matching, and so on, but I'm just at an
`' > impasse and can't extract what I want out of it. I've googled various
`' > regex cookbooks (even have access to O'Reilly's Safari), but I've come
`' > up with bupkiss.
`'
`' IIRC it is *impossible* to fully implement nested matching quotes
`' with a regexp.

This was already possible in 5.6, and it's even simpler in 5.10.

For instance,

qr [((?:\((?1)*\))*)]

is a regexp to match balanced nested parens.
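
A quick way to try that pattern out is sketched below. It needs Perl 5.10
or later for the (?1) recursion. The is_balanced helper and the \A ... \z
anchors are my own additions for illustration, not part of the pattern
itself.

```perl
use strict;
use warnings;

# The pattern from above; (?1) recurses into capture group 1 (Perl 5.10+).
my $balanced = qr [((?:\((?1)*\))*)];

# Hypothetical helper: true if $text is nothing but balanced parens.
# The anchors force the whole string to match, not just a prefix.
sub is_balanced {
    my ($text) = @_;
    return $text =~ /\A$balanced\z/ ? 1 : 0;
}

print is_balanced('(()())'), "\n";   # prints 1
print is_balanced('(()'),    "\n";   # prints 0
```

Note that the empty string also counts as balanced here, since the outer
group may repeat zero times.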


But the OP isn't asking about nested matching quotes. All he wants is
delimited strings, with escapes.

I'd use Regexp::Common, but it's not hard to come up with a regexp
for a single quote delimited string (untested):

/'[^\\']*(?:\\.[^\\']*)*'/s
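
For what it's worth, here is one way that pattern could be applied to the
OP's line. This is only a sketch: the capture group around the string
body, the extract_params helper, and the naive unescaping step (which
just drops the backslash, so \n would become n) are all assumptions
added for illustration.

```perl
use strict;
use warnings;

# The single-quoted-string pattern from above, with a capture group
# added around the body (the capture is an addition).
my $string = qr/'([^\\']*(?:\\.[^\\']*)*)'/s;

# Hypothetical helper: return the unescaped string arguments of foo(...).
sub extract_params {
    my ($js) = @_;
    # Scalar-context //g leaves pos($js) just after "foo(".
    return unless $js =~ /\bfoo\s*\(\s*/g;
    my @params;
    # \G continues from the previous match; c keeps pos() on failure.
    while ($js =~ /\G$string\s*,?\s*/gc) {
        my $body = $1;
        $body =~ s/\\(.)/$1/gs;    # naive unescape: drop the backslash
        push @params, $body;
    }
    return @params;
}

my $line = q{foo('parameter 1', 'param with \'single\' quotes', 'param with \"double\" quotes', 'this param, it has a comma', 'five');};
print "$_\n" for extract_params($line);
```

This accepts any number of string arguments rather than insisting on
exactly five, so a count check on the returned list may be worth adding
if the five-parameter rule matters.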


Abigail
--
A perl rose: perl -e '@}-`-,-`-%-'