URL Trimming s{} regex
on 12.02.2008 21:04:38 by Deane.Rothenmaier
Hi, all.
I'm having a relatively simple problem, trying to trim stuff off the end
of a URL.
INPUT:
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net
DESIRED OUTPUT:
http://www.wilbert.net
www.worful.com
http://www.winston.com
www.windoze.net
Presently I'm trying the code:
if ($site =~ m{[^:/]/.*$}) {
$site =~ s{/.+}{};
}
which gives me the output:
http://www.wilbert.net
www.worful.com
http:
www.windoze.net
Obviously I'm doing something wrong for the third string, but I've pulled
out two or three hairs by now, trying to fix this. Anybody care to take a
shot?
TIA
Deane Rothenmaier
Programmer/Analyst
Walgreens Corp.
847-914-5150
Put three grains of sand in a vast cathedral, and the cathedral will be
more closely packed with sand than space is with stars. -- Sir James Jeans
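
The test and the substitution disagree: m{[^:/]/.*$} finds a slash preceded by something other than ':' or '/', but s{/.+}{} then deletes from the first slash in the string, which in http://www.winston.com/argle.htm is the one right after 'http:'. A minimal sketch of one fix, assuming the desired output above is exactly what is wanted, is to fold the test into the substitution and write the matched preceding character back. The helper name trim_site is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: remove everything from the first slash that is
# preceded by a character other than ':' or '/', i.e. the first slash
# that is not part of a "scheme://" prefix.
sub trim_site {
    my ($site) = @_;
    # Same condition as the original m{[^:/]/.*$} test, but applied inside
    # the substitution; the captured preceding character is put back via $1.
    $site =~ s{([^:/])/.*}{$1}s;
    return $site;
}

for my $site (qw(
    http://www.wilbert.net
    www.worful.com/after-text.htm
    http://www.winston.com/argle.htm
    www.windoze.net
)) {
    my $trimmed = trim_site($site);
    print "$trimmed\n";
}
```

Run as-is, it prints the four lines of the desired output.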
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
Re: URL Trimming s{} regex
on 12.02.2008 21:31:16 by David Moreno
It's a nice one, if a bit complex. I'd try it as:
$url =~ m!\A(http://)?(.+?)(/.*)?\z!;
print $1 if $1;
print $2;
TIMTOWTDI.
D.
--
David Moreno - http://www.damog.net/
Yes, you can.
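
David's pattern splits each line into an optional 'http://' prefix ($1), a lazily matched host ($2) and an optional path starting at the next slash ($3). Below is a minimal sketch, using a hypothetical host_part helper, that checks the match before using the captures and joins $1 and $2 so the result matches the desired output. The chomp is there on the assumption that lines come from a file: neither .+? nor .* crosses a newline, so a trailing "\n" would make \z fail and the whole match would not succeed.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping David's pattern: returns the optional
# "http://" prefix plus the host, or undef if the pattern does not match.
sub host_part {
    my ($url) = @_;
    chomp $url;    # a trailing "\n" would keep \z from matching
    return undef unless $url =~ m!\A(http://)?(.+?)(/.*)?\z!;
    my $prefix = defined $1 ? $1 : '';    # $1 is undef without "http://"
    return $prefix . $2;
}

for my $line ("http://www.wilbert.net\n", "www.worful.com/after-text.htm\n",
              "http://www.winston.com/argle.htm\n", "www.windoze.net\n") {
    my $host = host_part($line);
    print defined $host ? "$host\n" : "no match\n";
}
```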
Re: URL Trimming s{} regex
on 12.02.2008 22:01:35 by Williamawalters
hi deane --
In a message dated 2/12/2008 3:05:23 P.M. Eastern Standard Time,
Deane.Rothenmaier@walgreens.com writes:
> Hi, all.
>
> I'm having a relatively simple problem, trying to trim stuff
> off the end of a URL.
>
> INPUT:
> http://www.wilbert.net
> www.worful.com/after-text.htm
> http://www.winston.com/argle.htm
> www.windoze.net
>
> DESIRED OUTPUT:
> http://www.wilbert.net
> www.worful.com
> http://www.winston.com
> www.windoze.net
how about

perl -wMstrict -e
"my $trim = qr{ (?<! : | /) / .* $ }xms;
 print qq(\n);
 for (@ARGV) { s{ $trim }{}xms; print qq($_ \n); }
"
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net

http://www.wilbert.net
www.worful.com
http://www.winston.com
www.windoze.net
hth -- bill walters
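
Bill's one-liner depends on cmd.exe quoting. Here is the same negative lookbehind as a standalone script (the URL list is hard-coded for illustration, and trim_url is a hypothetical name). (?<! : | /) rejects any slash that follows ':' or '/', so the two slashes of 'http://' are skipped and the first path slash starts the deleted tail. Unlike the [^:/] character class in the original test, a lookbehind needs no preceding character, so a slash at the very start of the string is matched too.

```perl
use strict;
use warnings;

# Bill's pattern: a slash not preceded by ':' or '/', then the rest of the
# string. Under /x the spaces are ignored, so this is (?<!:|/)/.*$
my $trim = qr{ (?<! : | /) / .* $ }xms;

# Hypothetical helper applying the pattern to one URL.
sub trim_url {
    my ($url) = @_;
    $url =~ s{$trim}{};
    return $url;
}

for my $url (qw(
    http://www.wilbert.net
    www.worful.com/after-text.htm
    http://www.winston.com/argle.htm
    www.windoze.net
)) {
    my $trimmed = trim_url($url);
    print "$trimmed\n";
}
```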
Re: URL Trimming s{} regex
on 12.02.2008 23:02:02 by Andy_Bach
David Moreno wrote:
> It's a nice one, if a bit complex. I'd try it as:
>
> $url =~ m!\A(http://)?(.+?)(/.*)?\z!;
> print $1 if $1;
> print $2;
>
> TIMTOWTDI.
>
Be a little more flexible: use inner non-capturing parentheses ("(?: ...)")
to add https and, if needed, "ftp", "ldap" and so on, plus "/i" for
case-insensitive matching. Always test whether the match succeeded rather
than assume it did. And if you know your separator (the slash), match on
that rather than on 'dot':
if ( $url =~ m!\A((?:http|https)://)?([^/]+)!i ) {
    print $1 if $1;
    print $2;
} # else 'no URL'
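
A sketch of the pattern above applied line by line, with the 'no URL' branch filled in. host_of is a hypothetical helper and the sample lines are made up for illustration. The chomp matters here because [^/]+ also matches a newline, so without it a trailing "\n" on a slash-free line would be captured into $2.

```perl
use strict;
use warnings;

# Hypothetical helper around Andy's pattern: optional http:// or https://
# (case-insensitive), then everything up to the first slash.
sub host_of {
    my ($url) = @_;
    chomp $url;    # otherwise [^/]+ would also capture a trailing "\n"
    if ( $url =~ m!\A((?:http|https)://)?([^/]+)!i ) {
        my $prefix = defined $1 ? $1 : '';
        return $prefix . $2;
    }
    return undef;  # 'no URL'
}

for my $line ("http://www.wilbert.net\n", "HTTPS://www.winston.com/argle.htm\n",
              "/no-host\n") {
    my $host = host_of($line);
    print defined $host ? "$host\n" : "no URL\n";
}
```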
You don't need to match (or not match) the stuff after the 'not slash'
part, as you don't care about it, though you may need to chomp $url
first (or deal with the "\n" if it's there; that depends on your loop). If
you're serious, though, there are a number of modules for URL parsing
that'll do it right for nearly everything legitimate; it's harder than
you'd think. J. Friedl's URL-matching masterpiece ("Mastering Regular
Expressions", O'Reilly:
http://www.oreilly.com/catalog/regex3/index.html
http://regex.info/
) runs to 9 embedded REs, yikes, but here's a simpler one:
if ($url =~ m{^https?://([^/:]+)(:(\d+))?(/.*)?$}i)
{
    my $host = $1;
    my $port = $3 || 80;   # Use $3 if it exists; otherwise default to 80.
    my $path = $4 || "/";  # Use $4 if it exists; otherwise default to "/".
    print "Host: $host\n";
    print "Port: $port\n";
    print "Path: $path\n";
} else {
    print "Not an HTTP URL\n";
}
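
The regex above can be wrapped in a hypothetical parse_http_url sub that returns (host, port, path), or an empty list when the input is not an HTTP(S) URL, which makes it easy to test. The code above defaults the port to 80 even for https; this sketch assumes 443 is wanted in that case, so the scheme is captured as $1 and the port group is made non-capturing. As with the original, scheme-less inputs such as www.worful.com/after-text.htm are rejected.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the host/port/path regex. Returns
# (host, port, path), or an empty list if the string is not an HTTP(S) URL.
# Assumption: the default port follows the scheme (80 for http, 443 for https).
sub parse_http_url {
    my ($url) = @_;
    return unless $url =~ m{^(https?)://([^/:]+)(?::(\d+))?(/.*)?$}i;
    my ($scheme, $host, $port, $path) = ($1, $2, $3, $4);
    $port = lc($scheme) eq 'https' ? 443 : 80 unless defined $port;
    $path = '/' unless defined $path;
    return ($host, $port, $path);
}

for my $url ('http://www.winston.com/argle.htm', 'https://www.wilbert.net',
             'www.worful.com/after-text.htm') {
    if (my ($host, $port, $path) = parse_http_url($url)) {
        print "Host: $host  Port: $port  Path: $path\n";
    } else {
        print "Not an HTTP URL: $url\n";
    }
}
```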
--
Andy Bach, Sys. Mangler
Internet: andy_bach@wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932
The only function of economic forecasting is
to make astrology look respectable.
- John Kenneth Galbraith
RE: URL Trimming s{} regex
on 13.02.2008 11:49:32 by Brian Raven
It helps people answer your question if you also say what you are trying
to achieve. Somebody might be able to point you to a module that could
help with your problem. For example, the following code doesn't do
exactly what you ask, but it could be closer to what you need.
----------------------------------------------
use strict;
use warnings;
use URI::URL;

while (my $d = <DATA>) {
    chomp $d;
    my $url = URI::URL->new($d);
    # no scheme (e.g. www.worful.com/...): reparse with http:// prepended
    $url = URI::URL->new("http://$d") unless $url->scheme;
    print $url->scheme, "://", $url->host, "\n";
}
__DATA__
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net
----------------------------------------------
... but then again, maybe not. Just a thought.
HTH
--
Brian Raven
Re: URL Trimming s{} regex
on 11.03.2008 11:04:55 by cowboy
Hi,

I think the following code would be a better one:

while (my $url = <DATA>) {
    chomp($url);
    # capture the optional "http://" prefix together with everything up to the first slash
    print "$1\n" if $url =~ m{^((?:http://)?[^/]+)};
}
__END__
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net
Thanks and Regards,
Indra