Quotes around words

on 25.01.2008 09:52:17 by patrick finger

Hi,

I have a big input file full of words, whitespace, newlines, punctuation,
and various other symbols. I want to surround every word with quotes,
UNLESS it already has quotes around it.

After some trial and error, I was seeing some unexpected results. The
closest I came to getting it right was this:

my $str = ' "these" "have" "quotes" these do not. ';
$str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/gs;

And the result is this:
"these" "have" "quotes" "these" do "not".

The only problem is that "do" is skipped. Is this expected? So how do I
get around this?

Thanks.

Re: Quotes around words

on 25.01.2008 10:37:42 by Gunnar Hjalmarsson

Pat wrote:
> I have a big input file full of words, whitespace, newlines, punctuation,
> and various other symbols. I want to surround every word with quotes,
> UNLESS it already has quotes around it.
>
> After some trial and error, I was seeing some unexpected results. The
> closest I came to getting it right was this:
>
> my $str = ' "these" "have" "quotes" these do not. ';
> $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/g s;
>
> And the result is this:
> "these" "have" "quotes" "these" do "not".
>
> The only problem is that "do" is skipped. Is this expected?

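Yes. The problem is that your pattern includes the non-word characters
before and after each word in the match. The space after "these" is
consumed as $3, so it is no longer available to match as $1 in front
of "do".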
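
To see which separators get consumed, here is a small sketch that only
lists the matches of the original pattern instead of substituting. The
@matches array is a name made up for this illustration:

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

# Collect every match, including the separator on each side of the word.
my @matches;
while ($str =~ /([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/g) {
    push @matches, "[$1$2$3]";
}

# "do" never shows up: the space in front of it already belongs
# to the match for "these".
print "@matches\n";
```

Only two matches are found, one around "these" and one around "not".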

> So how do I get around this?

Please read the section "Extended Patterns" in "perldoc perlre". Example:

$str =~ s/(?
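
The example above is cut off in this archive. As a rough sketch of what
a lookaround-based substitution can look like (an assumption, not
necessarily the code that was posted), using the zero-width assertions
described in "Extended Patterns":

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

# Quote a word only if it is neither directly preceded nor directly
# followed by a double quote. The lookbehind (?<!") and lookahead (?!")
# check the neighbouring characters without consuming them, so the
# separator between two words stays available for the next match.
$str =~ s/(?<!")\b(\w+)\b(?!")/"$1"/g;

print $str, "\n";
```

Because nothing around the word is consumed, "do" is quoted as well.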
--
Gunnar Hjalmarsson
Email: http://www.gunnar.cc/cgi-bin/contact.pl

Re: Quotes around words

on 25.01.2008 11:24:32 by Abigail

Pat (none@none.none) wrote on VCCLX September MCMXCIII:
}} Hi,
}}
}} I have a big input file full of words, whitespace, newlines, punctuation,
}} and various other symbols. I want to surround every word with quotes,
}} UNLESS it already has quotes around it.
}}
}} After some trial and error, I was seeing some unexpected results. The
}} closest I came to getting it right was this:
}}
}} my $str = ' "these" "have" "quotes" these do not. ';
}} $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/g s;
}}
}} And the result is this:
}} "these" "have" "quotes" "these" do "not".
}}
}} The only problem is that "do" is skipped. Is this expected? So how do I
}} get around this?


I'd first skip things I want to leave alone, then match a word I want
to quote, and repeat this.

What do I want to skip? Two things: quoted substrings, and substrings
consisting of non-word, non-quote characters. I can match those with
a standard unrolling technique:


$str =~ s {[^"\w]*          # Non-word, non-quote sequence
           (?:
              "[^"]*"       # Quoted
              [^"\w]*       # Non-word, non-quote sequence
           )*               # Repeat
           \K               # Cut
           (\w+)            # Capture an unquoted word.
          }
          {"$1"}xg;         # Replace.


Abigail
--
print v74.117.115.116.32, v97.110.111.116.104.101.114.32,
v80.101.114.108.32, v72.97.99.107.101.114.10;

Re: Quotes around words

on 25.01.2008 12:07:42 by someone

Pat wrote:
>
> I have a big input file full of words, whitespace, newlines, punctuation,
> and various other symbols. I want to surround every word with quotes,
> UNLESS it already has quotes around it.
>
> After some trial and error, I was seeing some unexpected results. The
> closest I came to getting it right was this:
>
> my $str = ' "these" "have" "quotes" these do not. ';
> $str =~ s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/g s;
>
> And the result is this:
> "these" "have" "quotes" "these" do "not".
>
> The only problem is that "do" is skipped. Is this expected?

Yes.

> So how do I get around this?

$ perl -le'
my $str = q[ "these" "have" "quotes" these do not. ];
print $str;
$str =~ s/(?
print $str;
'
"these" "have" "quotes" these do not.
"these" "have" "quotes" "these" "do" "not".



John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: Quotes around words

on 25.01.2008 14:17:50 by Petr Vileta

Abigail wrote:
> _
> Pat (none@none.none) wrote on VCCLX September MCMXCIII in
> :
> }} my $str = ' "these" "have" "quotes" these do not. ';
> }} $str =~
> s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/g s; }}
> }} And the result is this:
> }} "these" "have" "quotes" "these" do "not".
> }}
> }} The only problem is that "do" is skipped. Is this expected? So
> how do I }} get around this?
>
>
> $str =~ s {[^"\w]* # Non-word, non-quote sequence
> (?:
> "[^"]*" # Quoted
> [^"\w]* # Non-word, non-quote sequence
> )* # Repeat
> \K # Cut
> (\w+) # Capture an unquoted word.
> }
> {"$1"}xg; # Replace.
>
>
> Abigail

How can I do the same in Perl 5.6.1?
I tested the script below and got the warning "Unrecognized escape \K
passed through at L:\temp\test.pl line 4."

use strict;
use warnings;
my $str = ' "these" "have" "quotes" these do not. ';
$str =~ s {[^"\w]*(?:"[^"]*"[^"\w]*)*\K(\w+)}{"$1"}xg;
print $str;

--
Petr Vileta, Czech Republic
(My server rejects all messages from Yahoo and Hotmail. Send me your
mail from another non-spammer site please.)


Re: Quotes around words

on 25.01.2008 15:04:58 by Abigail

Petr Vileta (stoupa@practisoft.cz) wrote on VCCLX September MCMXCIII:
,, Abigail wrote:
,, > _
,, > Pat (none@none.none) wrote on VCCLX September MCMXCIII in
,, > :
,, > }} my $str = ' "these" "have" "quotes" these do not. ';
,, > }} $str =~
,, > s/([^"a-zA-Z0-9_])([a-zA-Z0-9_]+)([^"a-zA-Z0-9_])/$1"$2"$3/g s; }}
,, > }} And the result is this:
,, > }} "these" "have" "quotes" "these" do "not".
,, > }}
,, > }} The only problem is that "do" is skipped. Is this expected? So
,, > how do I }} get around this?
,, >
,, >
,, > $str =~ s {[^"\w]* # Non-word, non-quote sequence
,, > (?:
,, > "[^"]*" # Quoted
,, > [^"\w]* # Non-word, non-quote sequence
,, > )* # Repeat
,, > \K # Cut
,, > (\w+) # Capture an unquoted word.
,, > }
,, > {"$1"}xg; # Replace.
,, >
,, >
,, > Abigail
,,
,, Please how to do the same in Perl 5.6.1?

Capture the part before \K, and use it in the replacement. And remove the \K.

,, I tested script bellow and I got warning "Unrecognized escape \K passed
,, through at L:\temp\test.pl line 4."


\K was introduced in Perl 5.10.
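
A minimal sketch of that rewrite, assuming the same test string as in
the original post. The skipped part is captured as $1 and put back in
the replacement, and the word becomes $2. Nothing here needs a regex
feature newer than plain capturing groups:

```perl
use strict;
use warnings;

my $str = ' "these" "have" "quotes" these do not. ';

$str =~ s{
    (                           # $1: everything to leave alone
        [^"\w]*                 #   non-word, non-quote sequence
        (?: "[^"]*" [^"\w]* )*  #   quoted parts, each followed by more of the same
    )
    (\w+)                       # $2: an unquoted word
}{$1"$2"}xg;                    # put $1 back unchanged, quote $2

print $str, "\n";
```

The trade-off compared to \K is that the skipped text is copied back
into the string on every substitution instead of being left in place.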



Abigail
--
perl -Mstrict='}); print "Just another Perl Hacker"; ({' -le1