Some questions about q{} and qr{}.

on 14.04.2008 03:07:59 by Robbie Hatley

Today I was editing a URL-linkifying program I wrote several
weeks ago, and I ran across some issues with q{} and qr{}
which are puzzling me.

Here's an edited-for-brevity version of the program:

my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}
my $Regex2 = qr{(s?https?://$Legal+)};
while (<>)
{
s{$Regex1}{http://$1}g;
s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
print ($_);
}

(As an afterthought, I also tacked the entire program on the
end of this post, for anyone who's interested.)

I have two questions:

1. I had a "\" before the "$" to prevent "$_" from being
interpolated. But when I took the "\" out, the regexes
still worked fine! Seems to me they should break, because
$_ is now a variable rather than just "dollar sign followed
by underscore". But $_ seems not to be interpolated.
So, is variable interpolation always strictly "one pass"?

2. I've read that qr{} "compiles" the regex; I'm hoping that
means that the s/// operators in the while loop will not
recompile $Regex1 and $Regex2 each iteration, even though
I didn't use a /o flag? (No sense wasting CPU time
recompiling, because the patterns are fixed.)

Thanks in advance for your input!

===============================================================
IF YOU'RE PRESSED FOR TIME, FEEL FREE TO STOP READING HERE.
THE REMAINDER OF THIS POST IS THE WHOLE PROGRAM, FOR REFERENCE.
===============================================================

#!/usr/bin/perl

# linkify.perl

# Converts any text document into an HTML document with all of the contents of
# the original, but with any HTTP URLs converted to clickable hyperlinks.

# First print the standard opening lines of an HTML file.
# The title will be "Linkifyed HTML Document",
# the body text is in a "div" element,
# and the paragraphs will have 5-pixel margins on all 4 sides:

use strict;
use warnings;

# Print standard opening boilerplate crap for an HTML file:
print ("\n");
print ("\n");
print ("Linkifyed HTML Document\n");
print ("\n");
print ("\n");
print ("\n");
print ("

\n");

# A valid URL must consist solely of the following 82 characters
#
# alphanumeric: [:alnum:] 62
# reserved: ;/?:@=& 7
# anchor-id: # 1
# encoding: % 1
# special: $_.+!*'(),- 11
# Total: 82
#

# Make a non-interpolated string version of a character class
# consisting of the above 82 URL-legal characters:
my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};

# This regex says "find a string which is probably a URL minus the 'http://'
# part; save any such found string as a backreference":
my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}

# This regex says "http or shttp or https or shttps, followed by '://',
# followed by a cluster of URL-legal characters; save any such found string
# as a backreference":
my $Regex2 = qr{(s?https?://$Legal+)};

# Now loop through all lines of text in the original file, wrapping all URLS
# found in "a" and "p" elements, with the URL used as both the text and the
# "href" attribute of the "a" element:

while (<>)
{
# Linkify all http URLs, including the less-common "shttp" and "https" ones.

# This substitution says "tack 'http://' onto the beginning of any strings
# which are probably URLs sans 'http://'":
s{$Regex1}{http://$1}g;

# This substitution says "replace each found URL with an html anchor element
# with the found URL used both as the "href" attribute and as the text,
# insert the anchor element into a paragraph element,
# and bracket the paragraph element with newlines":
s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;

# Print the edited line. If the line did not contain a URL, it will be
# printed unexpurgated. To redirect output to a file, use ">" on the
# command line.
print ($_);
}

# Print element-closure tags for div, body, html:
print ("</div>\n");
print ("</body>\n");
print ("</html>\n");

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant

Re: Some questions about q{} and qr{}.

on 14.04.2008 03:31:14 by kenslaterpa

On Apr 13, 9:07 pm, "Robbie Hatley" wrote:
> Today I was editing a URL-linkifying program I wrote several
> weeks ago, and I ran across some issues with q{} and qr{}
> which are puzzling me.
>
> Here's an edited-for-brevity version of the program:
>
> my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
> my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}
> my $Regex2 = qr{(s?https?://$Legal+)};
> while (<>)
> {
> s{$Regex1}{http://$1}g;
> s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
> print ($_);
>
> }
>
> (As an afterthought, I also tacked the entire program on the
> end of this post, for anyone who's interested.)
>
> I have two questions:
>
> 1. I had a "\" before the "$" to prevent "$_" from being
> interpolated. But when I took the "\" out, the regexes
> still worked fine! Seems to me they should break, because
> $_ is now a variable rather than just "dollar sign followed
> by underscore". But $_ seems not to be interpolated.
> So, is variable interpolation always strictly "one pass"?

q{} is equivalent to the single-quote operator. Strings inside single
quotes do not get interpolated (as opposed to double quotes, "" or
qq{}).
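
A minimal sketch of the difference between q{} and qq{}; the variable names ($name, $single, $double) are made up for illustration and do not come from the program above:

```perl
use strict;
use warnings;

my $name   = 'Fifi';
my $single = q{cat: $name};    # q{} behaves like '...': no interpolation
my $double = qq{cat: $name};   # qq{} behaves like "...": $name is expanded

print "$single\n";             # prints: cat: $name
print "$double\n";             # prints: cat: Fifi
```

Printing "$single" expands $single itself, but the "$name" stored inside it is left as plain text.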

>
> 2. I've read that qr{} "compiles" the regex; I'm hoping that
> means that the s/// operators in the while loop will not
> recompile $Regex1 and $Regex2 each iteration, even though
> I didn't use a /o flag? (No sense wasting CPU time
> recompiling, because the patterns are fixed.)
>
Based on the documentation (perldoc perlop), qr may invoke a
precompilation of the pattern. To me that implies that it is
implementation specific, but there are others with more expertise in
this area than me.

HTH, Ken

> lines deleted

>
> --
> Cheers,
> Robbie Hatley
> lonewolf aatt well dott com
> www dott well dott com slant user slant lonewolf slant

Re: Some questions about q{} and qr{}.

on 14.04.2008 05:01:17 by someone

Robbie Hatley wrote:
> Today I was editing a URL-linkifying program I wrote several
> weeks ago, and I ran across some issues with q{} and qr{}
> which are puzzling me.
>
> Here's an edited-for-brevity version of the program:
>
> my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
> my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}
> my $Regex2 = qr{(s?https?://$Legal+)};
> while (<>)
> {
> s{$Regex1}{http://$1}g;
> s{$Regex2}{\n<p><a href="$1">$1</a></p>\n}g;
> print ($_);
> }
>
> (As an afterthought, I also tacked the entire program on the
> end of this post, for anyone who's interested.)
>
> I have two questions:
>
> 1. I had a "\" before the "$" to prevent "$_" from being
> interpolated.

That just adds a '\' character to your character class:

$ perl -le'$x = q{[$_]}; print qr{$x}'
(?-xism:[$_])
$ perl -le'$x = q{[\$_]}; print qr{$x}'
(?-xism:[\$_])

Which it doesn't look like you intended to include.

> But when I took the "\" out, the regexes
> still worked fine! Seems to me they should break, because
> $_ is now a variable rather than just "dollar sign followed
> by underscore". But $_ seems not to be interpolated.
> So, is variable interpolation always strictly "one pass"?

Read the "Gory details of parsing quoted constructs" section of:

perldoc perlop

> 2. I've read that qr{} "compiles" the regex; I'm hoping that
> means that the s/// operators in the while loop will not
> recompile $Regex1 and $Regex2 each iteration,

That is correct.

> even though
> I didn't use a /o flag? (No sense wasting CPU time
> recompiling, because the patterns are fixed.)

perldoc -q /o
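
A small sketch of what "compiled" means here, assuming only core Perl. The sample lines and the $count variable are invented for the example; $Legal is the character class from the original post:

```perl
use strict;
use warnings;

my $Legal = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
my $Regex = qr{(s?https?://$Legal+)};   # $Legal is interpolated and the pattern is built here

print ref($Regex), "\n";                # prints: Regexp (a pattern object, not a plain string)

my @lines = ("see http://example.com/a\n", "no link here\n");
my $count = 0;
for my $line (@lines) {
    # The qr{} object is used as the whole pattern on every iteration.
    $count++ if $line =~ $Regex;
}
print "$count\n";                       # prints: 1
```

Because the pattern variable is assigned once before the loop and never changes, there is nothing for /o to protect against in this situation.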

John
--
Perl isn't a toolbox, but a small machine shop where you
can special-order certain sorts of tools at low cost and
in short order. -- Larry Wall

Re: Some questions about q{} and qr{}.

on 14.04.2008 06:29:15 by benkasminbullock

On Apr 14, 10:07 am, "Robbie Hatley" wrote:

> my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}

;

> # This regex says "find a string which is probably a URL minus the 'http://'
> # part; save any such found string as a backreference":
> my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}

;

Also, here [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will
get things which aren't valid URLs. The following catches anything
valid:

my $validdns = '[0-9a-z-]{2,63}';
m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.

> s{$Regex1}{http://$1}g;

> print ($_);

You can just say

print;

here if you like.

Re: Some questions about q{} and qr{}.

on 14.04.2008 10:35:11 by Robbie Hatley

"John W. Krahn" wrote:

> Robbie Hatley wrote:
>
> > is variable interpolation always strictly "one pass"?
>
> Read the "Gory details of parsing quoted constructs" section of:
> perldoc perlop

Thanks for the tip, but that section doesn't actually say
whether Perl variable interpolation is single-pass or
multi-pass (recursive).

However, when I scrolled up from that section, I noticed
that one of the sections above that:
http://perldoc.perl.org/perlop.html#Quote-and-Quote-like-Operators
*does* specify what I was looking for. It says:

"Perl does not expand multiple levels of interpolation."

Bingo. That's what I was wondering. That explains why "$_"
wasn't being interpolated in my program.

perl -le 'my $Cat=q/Fifi/; my $Dog=q/$Cat/; print qq/$Dog/;'

Prints "$Cat", not "Fifi" as I had expected. Now that I
understand why, I can avoid being surprised by that.
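
The same single-level behaviour can be sketched in the regex setting of the original question. This is only an illustration: $Class, $hit_literal and $hit_value are hypothetical names, and the class is cut down to the two characters at issue:

```perl
use strict;
use warnings;

my $Class = q{[$_]};             # literal four characters: [ $ _ ]
my $Regex = qr{\A$Class+\z};     # $Class is expanded once; the "$_" inside it is not

$_ = 'xyz';                      # would matter only if "$_" were interpolated a second time

my $hit_literal = ('$_$' =~ $Regex) ? 1 : 0;   # string made only of '$' and '_'
my $hit_value   = ('xyz' =~ $Regex) ? 1 : 0;   # the current value of $_

print "$hit_literal $hit_value\n";             # prints: 1 0
```

The class matches the characters "$" and "_", not the contents of $_, which is why the backslash made no visible difference.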

--
Cheers,
Robbie Hatley
perl -le 'print "\154o\156e\167o\154f\100w\145ll\56c\157m"'
perl -le 'print "\150ttp\72//\167ww.\167ell.\143om/~\154onewolf/"'

Re: Some questions about q{} and qr{}.

on 14.04.2008 19:40:57 by Robbie Hatley

"Ben Bullock" wrote:

> On Apr 14, 10:07 am, "Robbie Hatley" wrote:
>
> > # This regex says "find a string which is probably a URL minus the 'http://'
> > # part; save any such found string as a backreference":
> > my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)}
>
> ... [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will
> get things which aren't valid URLs. The following catches anything
> valid:
>
> my $validdns = '[0-9a-z-]{2,63}';
> m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.

I can see that your pattern looks for just the dns part
of the url, which has fewer valid characters; but since it
doesn't look for "/", it will convert this string:

references in Sec 35.74 paragraph B

to

references in Sec http://35.74 paragraph B

I believe you're right in that it will find most valid dns
strings; but it also catches things that aren't part of URLs
at all (such as numbers with decimal points), and it rejects
certain well-formed domain strings (such as "j.qbc.net.ca",
which fails the "{2,63}" assertion).
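
A quick check of that pattern against the two strings mentioned, as a sketch. The %matched hash is a name invented for the example. Since the pattern is not anchored, the second string is not rejected outright; the pattern skips the one-letter label and matches the rest:

```perl
use strict;
use warnings;

my $validdns = '[0-9a-z-]{2,63}';
my $re       = qr/\b(($validdns\.){1,62}$validdns)\b/i;

my %matched;
for my $text ('references in Sec 35.74 paragraph B', 'j.qbc.net.ca') {
    $matched{$text} = ($text =~ $re) ? $1 : undef;
    my $shown = defined $matched{$text} ? $matched{$text} : '(no match)';
    print "$text => $shown\n";
}
# prints:
# references in Sec 35.74 paragraph B => 35.74
# j.qbc.net.ca => qbc.net.ca
```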

My pattern at least insists on "stuff.stuff/stuff", so it
rejects "35.74". It rejects domain-level URLs and only
linkifys document-level URLs. That may be a blessing or
a curse, depending on your expectations.

Also, both your pattern and mine are broken in that they match
http://www.asdf.com/qwer.html, and indeed convert it to
http://http://www.asdf.com/qwer.html .

Oops! What was really intended was to find "bare" URLs
(without "http://") and tack "http://" on the beginning.
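
The double-prefix problem can be reproduced with the first substitution from the original program. A minimal sketch; $line and $out are names made up for the demonstration:

```perl
use strict;
use warnings;

my $Legal  = q{[[:alnum:];/?:@=&#%$_.+!*'(),-]};
my $Regex1 = qr{($Legal+\.$Legal+/$Legal+)};

my $line = 'http://www.asdf.com/qwer.html';

# ':' and '/' are both in $Legal, so the existing 'http://' is swallowed
# by the first $Legal+ and the whole URL becomes $1.
(my $out = $line) =~ s{$Regex1}{http://$1}g;

print "$out\n";   # prints: http://http://www.asdf.com/qwer.html
```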

Ok, this should do the trick; it blends features from your
approach and mine, and solves the bugs I just mentioned,
as well as some other bugs I've noticed:

#!/usr/bin/perl

# linkify.perl

# Converts any text document into an HTML document with all of the contents of
# the original, but with any HTTP URLs converted to clickable hyperlinks.

# First print the standard opening lines of an HTML file.
# The title will be "Linkifyed HTML Document",
# the body text is in a "div" element,
# and the paragraphs will have 5-pixel margins on all 4 sides:

use strict;
use warnings;

# Print initial tags for HTML file:
print ("\n");
print ("\n");
print ("Linkifyed HTML Document\n");
print ("\n");
print ("\n");
print ("\n");
print ("

\n");
print ("

\n");



# A valid URL must consist solely of the following 82 characters
#
#    alphanumeric:       [:alnum:]       62
#    reserved:           ;/?:@=&          7
#    anchor-id:          #                1
#    encoding:           %                1
#    special:            $_.+!*'(),-     11
#                                 Total: 82
#

# Make a non-interpolated string version of a character class
# consisting of the above 82 URL-legal characters:
my $Legal = q<[[:alnum:];/?:@=&#%$_.+!*'(),-]>;

# Make a non-interpolated string version of a regex specifying
# a cluster of 1-63 DNS-valid characters:
my $Dns = q<[0-9A-Za-z-]{1,63}>;

# Make a non-interpolated string version of a regex specifying
# a URL header:
my $Header = q<s?https?://>;

# Make an interpolated (qq) string version of a regex specifying
# a URL suffix:
my $Suffix = qq<(?:$Dns\\.){1,62}$Dns/$Legal+>;

# This regex says "find a string which is probably a URL suffix,
# at start of line, and save any such found suffix as a backreference":
my $Regex1 = qr{^($Suffix)};

# This regex says "find a string which is probably a URL suffix,
# preceded by some space, and save any such found suffix as a backreference":
my $Regex2 = qr{(\s+)($Suffix)};

# This regex says "find a string which is probably a URL with header,
# and save any such found URL as a backreference":
my $Regex3 = qr{($Header$Suffix)};

# Now loop through all lines of text in the original file.  First add http:// to
# any URLs that need it; then wrap all URLs in "a" elements, with the
# URL used as both the text and the "href" attribute of the "a" element:

#print $Regex1,"\n";
#print $Regex2,"\n";
#print $Regex3,"\n";

while (<>)
{
   # Tack 'http://' onto the beginning of any strings which are
   # probably URLs but lack 'http://':

   $_ =~ s{$Regex1}{http://$1};    # No sense using g here (beginning of line only).
   #print ("Regex1 matched ", $&, "\n");

   $_ =~ s{$Regex2}{$1http://$2}g; # This one could be anywhere on the line.
   #print ("Regex2 matched ", $&, "\n");

   # Wrap each found URL in an html anchor element with the found URL used both
   # as the "href" attribute and as the text:
   $_ =~ s{$Regex3}{<a href="$1">$1</a>}g;
   #print ("Regex3 matched ", $&, "\n");

   # Print the edited line.  If the line did not contain a URL, it will be
   # printed unexpurgated.  To redirect output to a file, use ">" on the
   # command line.
   print;
}

# Print element-closure tags for pre, div, body, html:
print ("</pre>\n");
print ("</div>\n");
print ("</body>\n");
print ("</html>\n");

Re: Some questions about q{} and qr{}.

on 15.04.2008 00:33:30 by rvtol+news

John W. Krahn wrote:
> Robbie Hatley wrote:

>> 1. I had a "\" before the "$" to prevent "$_" from being
>> interpolated.
>
> That just adds a '\' character to your character class:
>
> $ perl -le'$x = q{[$_]}; print qr{$x}'
> (?-xism:[$_])
> $ perl -le'$x = q{[\$_]}; print qr{$x}'
> (?-xism:[\$_])

But it won't match a '\':

$ perl -wle'$x = q{[\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\\$_]}; print $x; print length($x); print q{\\} =~
/$x/ ? 1 : 0'
[\\$_]
6
1

$ perl -wle'$x = q{[\$_]}; print $x; print length($x); print chr(92) =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\$_]}; print $x; print length($x); print chr(92) =~
/$x/ ? 1 : 0'
[\$_]
5
0

$ perl -wle'$x = q{[\\\$_]}; print $x; print length($x); print chr(92)
=~ /$x/ ? 1 : 0'
[\\$_]
6
1

(was run with a perl 5.8.5)

--
Affijn, Ruud

"Gewoon is een tijger."

Re: Some questions about q{} and qr{}.

on 15.04.2008 01:34:30 by benkasminbullock

On Mon, 14 Apr 2008 10:40:57 -0700, Robbie Hatley wrote:

> "Ben Bullock" wrote:

>> ... [a-z0-9-]{3,63} (ignoring case) is enough. Your regex will get
>> things which aren't valid URLs. The following catches anything valid:
>>
>> my $validdns = '[0-9a-z-]{2,63}';
>> m/\b(($validdns\.){1,62}$validdns)\b/i # Catches any valid thing.
>
> I can see that your pattern looks for just the dns part of the url,
> which has fewer valid characters; but since it doesn't look for "/", it
> will convert this string:
>
> references in Sec 35.74 paragraph B
>
> to
>
> references in Sec http://35.74 paragraph B
>
> I believe you're right in that it will find most valid dns strings; but
> it also catches things that aren't part of URLs at all (such as numbers
> with decimal points), and it rejects certain well-formed domain strings
> (such as "j.qbc.net.ca", which fails the "{2,63}" assertion).

Well OK but if I was going to do this for real, I would use something like

/\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i

or similar (I haven't checked this regex with the machine yet but
hopefully you get the picture).

> My pattern at least insists on "stuff.stuff/stuff", so it rejects
> "35.74". It rejects domain-level URLs and only linkifys document-level
> URLs. That may be a blessing or a curse, depending on your
> expectations.

I hadn't really thought this through carefully, I just wanted to make the
point that the &$% stuff is not valid as part of the web address.

> Also, both your pattern and mine are broken in that they match
> http://www.asdf.com/qwer.html, and indeed convert it to
> http://http://www.asdf.com/qwer.html .

Mine doesn't do anything at all, I'm not sure it even compiles!

Matching URLs with REs (was "Some questions about q{} and qr{}").

on 15.04.2008 22:35:27 by Robbie Hatley

"Ben Bullock" wrote:

> Well OK but if I was going to do this for real, I would use something like
> /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i
> or similar (I haven't checked this regex with the machine yet but
> hopefully you get the picture).

The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there are hundreds
or thousands of such valid domain suffixes. You're forgetting "es" (Spain),
"ru" (Russia), "uk" (Ukraine), "us" (USA), not to mention "mil", "gov", "edu", "biz",
"info", etc, etc, etc. That's part of why my URL-matching regex was so vague.

> I just wanted to make the point that the &$% stuff is not valid as part of the
> web address.

Those characters all appear in web addresses. For instance, "&" is used as
a field separator for server-side script (php, Perl, etc) commands embedded in
URLs. Similarly, "?" announces that the next cluster of alphanumeric characters
is a parameter for the previous command. If you reject such characters, you reject
many valid URLs. Just look at any YouTube URL. This one, for example:
http://uk.youtube.com/watch?v=I9ciR9qR1dU&feature=bz303

Maybe what you meant is that such characters are invalid in domain names;
but I was trying to capture and linkify document URLs, not domain names or
domain-level URLs such as "http://www.acme.com/". Trying to concoct a
foolproof RE that captures every valid URL and rejects every invalid one
is a real piece of work. And any such "perfect" URL-matching RE would
quickly become obsolete anyway as the Internet changes over time.
Hence I tend to go for a vague RE that I believe captures every valid
document URL, at the cost of occasionally capturing a few invalid ones.
Unless someone knows a better approach.

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant

Re: Some questions about q{} and qr{}.

on 15.04.2008 22:53:29 by rvtol+news

Robbie Hatley wrote:

> Today I was editing a URL-linkifying program I wrote several
> weeks ago, and I ran across some issues with q{} and qr{}
> which are puzzling me.

Consider Regexp::Common.

--
Affijn, Ruud

"Gewoon is een tijger."

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

on 15.04.2008 22:54:30 by Abigail

_
Robbie Hatley (see.my.signature@for.my.email.address) wrote on VCCCXLI
September MCMXCIII in :
`'
`' Maybe what you meant is that such characters are invalid in domain names;
`' but I was trying to capture and linkify document URLs, not domain names or
`' domain-level URLs such as "http://www.acme.com/". Trying to concoct a
`' foolproof RE that captures every valid URL and rejects every invalid one
`' is a real piece of work. And any such "perfect" URL-matching RE would
`' quickly become obsolete anyway as the Internet changes over time.
`' Hence I tend to go for a vague RE that I believe captures every valid
`' document URL, at the cost of occasionally capturing a few invalid ones.
`' Unless someone knows a better approach.

You mean, something like:

(?:(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*) ?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA- Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0 -9]*)))?(?:/(?:(?:(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+| (?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:;(?:(?:[a-zA-Z0-9\-_.!~*'() :@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))*)(?:/(?:(?:(?:[a-z A-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:; (?:(?:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0- 9]))*))*))*))(?:[?](?:(?:(?:[;/?:@&=+$,a-zA-Z0-9\-_.!~*'()]+ |(?:%[a-fA-F0-9][a-fA-F0-9]))*)))?))?)|(?:(?:nntp)://(?:(?:( ?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])* (?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][ 0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?)/(?:(?:[a-zA- Z][-A-Za-z0-9.+_]*))(?:/(?:[0-9]+))?))|(?:(?:file)://(?:(?:( ?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])* (?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][ 0-9]+[.][0-9]+[.][0-9]+))|localhost)?)(?:/(?:(?:(?:(?:[-a-zA -Z0-9$_.+!*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:( ?:[-a-zA-Z0-9$_.+!*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*) )*)))))|(?:(?:ftp)://(?:(?:(?:(?:[a-zA-Z0-9\-_.!~*'();:&=+$, ]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?:)@)?(?:(?:(?:(?:(?:(?:[ a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA- Z0-9]*[a-zA-Z0-9]|[a-zA-Z])[.]?)|(?:[0-9]+[.][0-9]+[.][0-9]+ [.][0-9]+)))(?::(?:(?:[0-9]*)))?(?:/(?:(?:(?:(?:(?:[a-zA-Z0- 9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(? 
:[a-zA-Z0-9\-_.!~*'():@&=+$,]+|(?:%[a-fA-F0-9][a-fA-F0-9]))* ))*))(?:;type=(?:[AIai]))?))?)|(?:(?:tel):(?:(?:(?:[+](?:[0- 9\-.()]+)(?:;isub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+ )?(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\ -.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124 -7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa- f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A- Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;( ?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z
0-9])?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za- z0-9])?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDE abde]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace ]))*)(?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde ]|3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))* )(?:[?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9] |4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?: %22(?:(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0 -9])))|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013- 9A-Fa-f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*)|(?:[0-9\-.()*# ABCDwp]+(?:;isub=[0-9\-.()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+) ?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.() *#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CF cf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7 [1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f ]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))(?:(?:;(?: phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp ]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC -Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A- Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-F ac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9]) ?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9] )?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde] |3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*) (?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0 -9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[ ?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1- 9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(? :(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])) )|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa -f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:fax):(?:(? 
:(?:[+](?:[0-9\-.()]+)(?:;isub=[0-9\-.()]+)?(?:;tsub=[0-9\-. ()]+)?(?:;postd=[0-9\-.()*#ABCDwp]+)?(?:(?:;(?:phone-context )=(?:(?
:(?:[+][0-9\-.()]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-V X-Z_e-oq-vx-z~]|(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f ]|5[1-689A-Fa-f]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-. 0-9A-Z_a-z~]+|(?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa- f]|7[0-9A-Ea-e])))*)))|(?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9]) ?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9] )?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde] |3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*) (?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0 -9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[ ?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1- 9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(? :(?:%5C(?:[a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])) )|[a-zA-Z0-9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa -f]|[3-9A-Fa-f][a-fA-F0-9]))*%22)))?))*)|(?:[0-9\-.()*#ABCDw p]+(?:;isub=[0-9\-.()]+)?(?:;tsub=[0-9\-.()]+)?(?:;postd=[0- 9\-.()*#ABCDwp]+)?(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.( )]+)|(?:[0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~] |(?:%(?:2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f ]|6[05-9A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|( ?:%(?:2[1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e] )))*)))(?:(?:;(?:phone-context)=(?:(?:(?:[+][0-9\-.()]+)|(?: [0-9\-.()*#ABCDwp]+))|(?:(?:[!'E-OQ-VX-Z_e-oq-vx-z~]|(?:%(?: 2[124-7CFcf]|3[AC-Fac-f]|4[05-9A-Fa-f]|5[1-689A-Fa-f]|6[05-9 A-Fa-f]|7[1-689A-Ea-e])))(?:[!'()*\-.0-9A-Z_a-z~]+|(?:%(?:2[ 1-9A-Fa-f]|3[AC-Fac-f]|[4-6][0-9A-Fa-f]|7[0-9A-Ea-e])))*)))| (?:;(?:tsp)=(?: |(?:(?:(?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9]) ?)(?:[.](?:[A-Za-z](?:(?:(?:[-A-Za-z0-9]+)){0,61}[A-Za-z0-9] )?))*))))|(?:;(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde] |3[0-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*) (?:=(?:(?:(?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0 
-9]|4[1-9A-Fa-f]|5[AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*)(?:[ ?](?:(?:[!'*\-.0-9A-Z_a-z~]+|%(?:2[13-7ABDEabde]|3[0-9]|4[1- 9A-Fa-f]|5[
AEFaef]|6[0-9A-Fa-f]|7[0-9ACEace]))*))?)|(?:%22(?:(?:%5C(?:[ a-zA-Z0-9\-_.!~*'()]|(?:%[a-fA-F0-9][a-fA-F0-9])))|[a-zA-Z0- 9\-_.!~*'()]+|(?:%(?:[01][a-fA-F0-9])|2[013-9A-Fa-f]|[3-9A-F a-f][a-fA-F0-9]))*%22)))?))*))))|(?:(?:prospero)://(?:(?:(?: (?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA -Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.] [0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?/(?:(?:(?:(?:[-a-zA-Z0 -9$_.+!*'(),?:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)(?:/(?:(?: [-a-zA-Z0-9$_.+!*'(),?:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*)) *))(?:(?:;(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&]+|(?:%[a-fA-F0-9][ a-fA-F0-9]))*)=(?:(?:[-a-zA-Z0-9$_.+!*'(),?:@&]+|(?:%[a-fA-F 0-9][a-fA-F0-9]))*))*))|(?:(?:tv):(?:(?:(?:(?:(?:[a-zA-Z0-9] [-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-z A-Z0-9]|[a-zA-Z])[.]?))?)|(?:(?:telnet)://(?:(?:(?:(?:(?:[-a -zA-Z0-9$_.+!*'(),;?&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?:: (?:(?:(?:[-a-zA-Z0-9$_.+!*'(),;?&=]+|(?:%[a-fA-F0-9][a-fA-F0 -9]))*)))?)@)?(?:(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*) ?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA- Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+ )))?)(?:/)?)|(?:(?:news):(?:(?:[*]|(?:(?:[-a-zA-Z0-9$_.+!*'( ),;/?:&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+@(?:(?:(?:(?:(?:[a-z A-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0- 9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9 ]+)))|(?:[a-zA-Z][-A-Za-z0-9.+_]*))))|(?:(?:wais)://(?:(?:(? :(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-z A-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[. 
][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?/(?:(?:(?:(?:[-a-zA-Z 0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))(?:[?](?:(?:( ?:[-a-zA-Z0-9$_.+!*'(),;:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))* ))|/(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a-fA-F0 -9]))*))/(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),]+|(?:%[a-fA-F0-9][a- fA-F0-9]))*)))?))|(?:(?:gopher)://(?:(?:(?:(?:(?:(?:[a-zA-Z0 -9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])*(?:[a-zA-Z][-a-zA-Z0-9]*[ a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][0-9]+[.][0-9]+[.][0-9]+)) )(?::(
?:(?:[0-9]+)))?/(?:(?:(?:[0-9+IgT]))(?:(?:(?:[-a-zA-Z0-9$_.+ !*'(),:@&=]+|(?:%[a-fA-F0-9][a-fA-F0-9]))*))))|(?:(?:pop):// (?:(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA- F0-9]))+))(?:;AUTH=(?:[*]|(?:(?:(?:[-a-zA-Z0-9$_.+!*'(),&=~] +|(?:%[a-fA-F0-9][a-fA-F0-9]))+)|(?:[+](?:APOP|(?:(?:[-a-zA- Z0-9$_.+!*'(),&=~]+|(?:%[a-fA-F0-9][a-fA-F0-9]))+))))))?@)?( ?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*)?[a-zA-Z0-9])[.])* (?:[a-zA-Z][-a-zA-Z0-9]*[a-zA-Z0-9]|[a-zA-Z]))|(?:[0-9]+[.][ 0-9]+[.][0-9]+[.][0-9]+)))(?::(?:(?:[0-9]+)))?))
I don't believe in capturing a few invalid ones - nor in rejecting valid ones.

Abigail
--
$_ = "\x3C\x3C\x45\x4F\x54"; s/< Just another Perl Hacker
EOT

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

on 16.04.2008 00:46:16 by benkasminbullock

On Tue, 15 Apr 2008 13:35:27 -0700, Robbie Hatley wrote:

> "Ben Bullock" wrote:
>
>> Well OK but if I was going to do this for real, I would use something
>> like /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i or similar
>> (I haven't checked this regex with the machine yet but hopefully you
>> get the picture).
>
> The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there
> are hundreds or thousands of such valid domain suffixes.

I think there are only about 200 or so, most of which are rare.

> You're
> forgetting "es" (Spain), "ru" (Russia), "uk" (Ukraine), "us" (USA), not
> to mention "mil", "gov", "edu", "biz", "info", etc, etc, etc.

Um, I have both "us" and "uk" there. I didn't know that uk was Ukraine
though.

> That's
> part of why my URL-matching regex was so vague.

>> I just wanted to make the point that the &$% stuff is not valid as part
>> of the web address.
>
> Those characters all appear in web addresses.

Did you really not understand my point?

> Hence I tend to go for a vague RE that I believe
> captures every valid document URL, at the cost of occasionally
> capturing a few invalid ones. Unless someone knows a better approach.

Well, even if they do know a better approach, they might not have the
energy to discuss it with you.

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

on 16.04.2008 01:44:53 by 1usa

Abigail wrote in
news:slrng0a5g5.uuv.abigail@alexandra.abigail.be:

> _
> Robbie Hatley (see.my.signature@for.my.email.address) wrote on
> VCCCXLI September MCMXCIII in
> : `'
....
> `' Unless someone knows a better approach.
>
>
> You mean, something like:
>
> (?:(?:(?:http)://(?:(?:(?:(?:(?:(?:[a-zA-Z0-9][-a-zA-Z0-9]*) ?[a

OK, now you are just showing off ;-)

Joking aside, that giant block shows the utility of building regular
expressions from small building blocks.

In any case, I would like to take this opportunity to thank you
for Regexp::Common. It has saved me a lot of work over time.

The OP would benefit from using

http://search.cpan.org/~abigail/Regexp-Common-2.120/lib/Regexp/Common/URI.pm

as opposed to resigning himself to second or third or nth rate
'solutions'.
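
For reference, a minimal sketch of how the linkifying substitution might look with that module. It assumes Regexp::Common is installed from CPAN (it is not a core module); the sample text and the $line, $url_re and $html names are invented for the example. As far as I know the default HTTP pattern only matches the plain "http" scheme:

```perl
use strict;
use warnings;
use Regexp::Common qw(URI);    # CPAN module, assumed to be installed

my $line = 'See http://www.asdf.com/qwer.html?x=1 for details';

# $RE{URI}{HTTP} is the module's ready-made pattern for http URIs;
# the parentheses are added here to capture the whole URI as $1.
my $url_re = qr/($RE{URI}{HTTP})/;

(my $html = $line) =~ s{$url_re}{<a href="$1">$1</a>}g;
print "$html\n";
```

Bare URLs without a scheme would still need the separate "tack on http://" step from the program above, since the module's pattern requires the scheme.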

Thank you.

Sinan

--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

on 16.04.2008 02:28:58 by Robbie Hatley

"Abigail" put forth into the annals of Usenet:

> (a rather large URL-capturing regex)

Hmmm... I'm curious, did you write that manually, or generate it
programmatically? If generated, using what software?

And how many decaseconds does it take a regex compiler to process
that?

> I don't believe in capturing a few invalid ones - nor in
> rejecting valid ones.

I believe in simplicity over perfection. Given these choices:
A. Make a 100% perfect program taking 284 man-hours
B. Make a 97% perfect program taking 5 man-hours
I usually take B.

--
perl -le 'print "\122\157b\142\151e\40\110\141t\154\145y";'
perl -le 'print "\124\165s\164\151n\54\40\103A\54\40\125\123A";'
perl -le 'print "\154one\167olf\100\167ell\56\143om\n";'
perl -le 'print scalar reverse "/flowenol~/moc.llew.www//\72ptth";'

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 02:47:03 von szr

Ben Bullock wrote:
> On Tue, 15 Apr 2008 13:35:27 -0700, Robbie Hatley wrote:
>
>> "Ben Bullock" wrote:
>>
>>> Well OK but if I was going to do this for real, I would use
>>> something like
>>> /\b(($validdns\.){1,62}(com|net|org|us|uk|ca|jp))\b/i or similar (I
>>> haven't checked this regex with the machine yet but hopefully you
>>> get the picture).
>>
>> The problem with "(com|net|org|us|uk|ca|jp)" or similar is that there
>> are hundreds or thousands of such valid domain suffixes.
>
> I think there are only about 200 or so, most of which are rare.
>
>> You're
>> forgetting "es" (Spain), "ru" (Russia), "uk" (Ukraine), "us" (USA),
>> not to mention "mil", "gov", "edu", "biz", "info", etc, etc, etc.
>
> Um, I have both "us" and "uk" there. I didn't know that uk was Ukraine
> though.

According to http://www.iana.org/domains/root/db/, ".uk" is United
Kingdom, and ".ua" is Ukraine (".gb" is also reserved and labeled for
the United Kingdom, though ".uk" was used instead.)
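
To make the suffix-list approach concrete, here is a small sketch that builds the alternation from a list instead of typing it by hand. The list is a short illustrative sample (with "ua" for Ukraine, per the correction above), and the definition of $validdns is hypothetical, since the thread never shows one.

```perl
use strict;
use warnings;

# Assumption: a short illustrative sample; the real IANA list is much longer.
my @tlds = qw(com net org edu gov mil biz info us uk ua es ru ca jp);

# Build the alternation from the list, escaping each entry to be safe.
my $tld_alt = join '|', map { quotemeta } @tlds;

# Hypothetical definition of one DNS label (the thread's $validdns is never shown).
my $validdns = qr/[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i;

# One or more labels followed by a listed suffix, as whole words.
my $host_re = qr/\b((?:$validdns\.)+(?:$tld_alt))\b/i;

# Return every host name found in the text (call in list context).
sub find_hosts {
    my ($text) = @_;
    return $text =~ /$host_re/g;
}

print "$_\n" for find_hosts('mail www.example.co.uk and kyiv.ua but not foo.bar');
```

Keeping the suffixes in a plain list means the regex can be regenerated whenever the list changes, which partly answers the maintenance objection.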

--
szr

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 02:50:55 von 1usa

"Robbie Hatley" wrote in
news:ntednUYddefT1ZjVnZ2dnUVZ_uCinZ2d@giganews.com:

>
> "Abigail" put forth into the annals of Usenet:
>
>> (a rather large URL-capturing regex)
>
> Hmmm... I'm curious, did you write that manually, or generate it
> programmatically? If generated, using what software?

You can read how it is done by looking at the sources of
Regexp::Common modules. It is very elegant.

> And how many decaseconds does it take a regex compiler to process
> that?

Not so much that it matters. You might want to measure performance if
you care so much.
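
A minimal way to take that measurement with core modules only. The pattern below is a synthetic, deliberately large stand-in (an assumption for this sketch), not the regex posted earlier, and time_compile is a made-up helper name.

```perl
use strict;
use warnings;
use Time::HiRes qw(gettimeofday tv_interval);

# Assumption: a synthetic large pattern stands in for the one in the thread.
my $label   = '[a-zA-Z0-9](?:[-a-zA-Z0-9]*[a-zA-Z0-9])?';
my $pattern = join '|',
    map { "(?:http://(?:$label\\.){$_}$label/\\S*)" } 1 .. 50;

# Compile a pattern string once and report how long the compile took.
sub time_compile {
    my ($pat) = @_;
    my $t0 = [gettimeofday];
    my $re = qr/$pat/;                  # the pattern string is compiled here
    return ($re, tv_interval($t0));     # elapsed seconds since $t0
}

my ($re, $seconds) = time_compile($pattern);
printf "pattern length: %d chars, compile time: %.6f s\n",
    length($pattern), $seconds;
```

A single timing like this is noisy; for anything serious, repeat it with different patterns or use the core Benchmark module.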

>> I don't believe in capturing a few invalid ones - nor in
>> rejecting valid ones.
>
> I believe in simplicity over perfection. Given these choices:
> A. Make a 100% perfect program taking 284 man-hours
> B. Make a 97% perfect program taking 5 man-hours
> I usually take B.

But, of course, using Regexp::Common would have cut that down to 2
minutes for a perfect program.

Sinan
--
A. Sinan Unur <1usa@llenroc.ude.invalid>
(remove .invalid and reverse each component for email address)

comp.lang.perl.misc guidelines on the WWW:
http://www.rehabitation.com/clpmisc/

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 11:24:34 von Abigail

_
Robbie Hatley (see.my.signature@for.my.email.address) wrote on VCCCXLII
September MCMXCIII in :
{}
{} "Abigail" put forth into the annals of Usenet:
{}
{} > (a rather large URL-capturing regex)
{}
{} Hmmm... I'm curious, did you write that manually, or generate it
{} programmatically? If generated, using what software?
{}
{} And how many decaseconds does it take a regex compiler to process
{} that?
{}
{} > I don't believe in capturing a few invalid ones - nor in
{} > rejecting valid ones.
{}
{} I believe in simplicity over perfection. Given these choices:
{} A. Make a 100% perfect program taking 284 man-hours
{} B. Make a 97% perfect program taking 5 man-hours
{} I usually take B.

Well, that all depends, doesn't it? If you're getting paid only if it
passes the customer's testing phase, you are more likely to choose A.

Furthermore, I do hope the programmer that programmed the computer in
my car didn't go for B.

Besides, where do you stop? You might as well use /./, having almost no
developer time, and it would still not reject any valid URI.

BTW, it didn't take me more than 5 hours to generate the regexp I gave you.
Writing the test suite OTOH did.

Abigail
--
perl -MLWP::UserAgent -MHTML::TreeBuilder -MHTML::FormatText -wle'print +(
HTML::FormatText -> new -> format (HTML::TreeBuilder -> new -> parse (
LWP::UserAgent -> new -> request (HTTP::Request -> new ("GET",
"http://work.ucsd.edu:5141/cgi-bin/http_webster?isindex=perl")) -> content))
=~ /(.*\))[-\s]+Addition/s) [0]'

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 21:23:08 von Robbie Hatley

Ben Bullock wrote:

> I think there are only about 200 or so, most of which are rare.

Well, I'm not going to spend the time to make a soon-to-be-obsolete-anyway regex
out of 200 disparate chunks. For my purposes, the regexes in my program seem to be
working fine. If I need something more perfect, I'll take the advice of the folks who pointed
out that CPAN module to me. (A bit more on this in a separate reply.)

> Did you really not understand my point?

I'd like to say, "You have a point there!", but I'm afraid you need a sharpener for that;
it seems a bit dull at the tip.

> Well, even if they do know a better approach, they might not have the
> energy to discuss it with you.

Some do. For those who don't, a 24oz coffee is $1.59 at 7-11.

--
perl -le 'print "\122\157b\142\151e\40\110\141t\154\145y";'
perl -le 'print "\124\165s\164\151n\54\40\103A\54\40\125\123A";'
perl -le 'print "\154one\167olf\100\167ell\56\143om\n";'
perl -le 'print scalar reverse "/flowenol~/moc.llew.www//\72ptth";'

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 21:49:32 von Robbie Hatley

"A. Sinan Unur" wrote:

> "Robbie Hatley" wrote:
>
> > "Abigail" put forth into the annals of Usenet:
> >
> >> (a rather large URL-capturing regex)
> >
> > Hmmm... I'm curious, did you write that manually, or generate it
> > programmatically? If generated, using what software?
>
> You can read how it is done by looking at the sources of
> Regexp::Common modules. It is very elegant.

Ok, will do.

> ... using Regexp::Common would have cut that down to 2
> minutes for a perfect program ...

I'll try that.

Ok, I just downloaded Regexp-Common-2.120. Now I have a folder
with a bunch of stuff in it. This may sound like an incredibly
stupid question, but what do I do with it? I've never actually
used a CPAN module before. Any hints a CPAN newbie should be
aware of?

--
Cheers,
Robbie Hatley
lonewolf aatt well dott com
www dott well dott com slant user slant lonewolf slant

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 16.04.2008 22:47:53 von Abigail

_
Robbie Hatley (see.my.signature@for.my.email.address) wrote on VCCCXLII
September MCMXCIII in :
,,
,, "A. Sinan Unur" wrote:
,,
,, > "Robbie Hatley" wrote:
,, >
,, > > "Abigail" put forth into the annals of Usenet:
,, > >
,, > >> (a rather large URL-capturing regex)
,, > >
,, > > Hmmm... I'm curious, did you write that manually, or generate it
,, > > programmatically? If generated, using what software?
,, >
,, > You can read how it is done by looking at the sources of
,, > Regexp::Common modules. It is very elegant.
,,
,, Ok, will do.
,,
,, > ... using Regexp::Common would have cut that down to 2
,, > minutes for a perfect program ...
,,
,, I'll try that.
,,
,, Ok, I just downloaded Regexp-Common-2.120. Now I have a folder
,, with a bunch of stuff in it. This may sound like an incredibly
,, stupid question, but what do I do with it? I've never actually
,, used a CPAN module before. Any hints a CPAN newbie should be
,, aware of?

One could start with reading a file called README.

Abigail
--
print v74.117.115.116.32.97.110.111.116.104.101.114.
v32.80.101.114.108.32.72.97.99.107.101.114.10;

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 17.04.2008 02:11:56 von benkasminbullock

On Wed, 16 Apr 2008 12:49:32 -0700, Robbie Hatley wrote:

> Ok, I just downloaded Regexp-Common-2.120. Now I have a folder with a
> bunch of stuff in it. This may sound like an incredibly stupid
> question, but what do I do with it? I've never actually used a CPAN
> module before. Any hints a CPAN newbie should be aware of?

If I want to install a cpan module, I usually don't directly download
the .tar.gz file. Instead I log in as root and type

cpan Regexp::Common

You might need to prefix that with "sudo" if you are using Ubuntu/Debian
Linux.

If you are using ActiveState Perl on Windows, you are better off using
"ppm", the Perl Package Manager, which has precompiled versions of the
modules.
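
Whichever installer you use, a quick check afterwards that Perl can actually find the module does no harm. A minimal sketch follows; module_version is a made-up helper name, and File::Spec is included only as a core module that should always be present for comparison.

```perl
use strict;
use warnings;

# Return the module's version (or 'unknown') if it loads, nothing otherwise.
sub module_version {
    my ($module) = @_;
    (my $file = "$module.pm") =~ s{::}{/}g;    # Regexp::Common -> Regexp/Common.pm
    return unless eval { require $file; 1 };   # load failed: not installed
    my $version = $module->VERSION;
    return defined $version ? "$version" : 'unknown';
}

for my $module (qw(Regexp::Common File::Spec)) {
    my $v = module_version($module);
    print defined $v
        ? "$module $v is installed\n"
        : "$module is not installed\n";
}
```

The one-liner equivalent is perl -MRegexp::Common -e 1, which prints nothing on success and an error if the module cannot be located.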

Re: Matching URLs with REs (was "Some questions about q{} and qr{}").

am 17.04.2008 12:56:53 von Tad J McClellan

Robbie Hatley wrote:

> Ok, I just downloaded Regexp-Common-2.120. Now I have a folder
> with a bunch of stuff in it. This may sound like an incredibly
> stupid question, but what do I do with it? I've never actually
> used a CPAN module before. Any hints a CPAN newbie should be
> aware of?

perldoc -q CPAN
or
perldoc -q module

How do I install a module from CPAN?

--
Tad McClellan
email: perl -le "print scalar reverse qq/moc.noitatibaher\100cmdat/"