URL Trimming s{} regex

on 12.02.2008 21:04:38 by Deane.Rothenmaier

Hi, all.

I'm having a relatively simple problem, trying to trim stuff off the end
of a URL.

INPUT:
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net

DESIRED OUTPUT:
http://www.wilbert.net
www.worful.com
http://www.winston.com
www.windoze.net

Presently I'm trying the code:

if ($site =~ m{[^:/]/.*$}) {
    $site =~ s{/.+}{};
}

which gives me the output:
http://www.wilbert.net
www.worful.com
http:
www.windoze.net

Obviously I'm doing something wrong for the third string, but I've pulled
out two or three hairs by now, trying to fix this. Anybody care to take a
shot?

TIA

Deane Rothenmaier
Programmer/Analyst
Walgreens Corp.
847-914-5150

Put three grains of sand in a vast cathedral, and the cathedral will be
more closely packed with sand than space is with stars. -- Sir James Jeans
_______________________________________________
ActivePerl mailing list
ActivePerl@listserv.ActiveState.com
To unsubscribe: http://listserv.ActiveState.com/mailman/mysubs
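
A note on why the third string collapses to "http:": the guard m{[^:/]/.*$} correctly finds a slash that is not part of "://", but the substitution s{/.+}{} starts over and removes everything from the leftmost slash, which for a URL with a scheme is the first slash of "//". A minimal sketch of one possible fix, assuming the goal is only to drop the path, carries the guard's condition into the substitution by capturing the preceding character. The sub name trim_path is made up for illustration and is not from the thread.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the original post): drop everything from the
# first slash that is not part of "://".
sub trim_path {
    my ($site) = @_;
    # ([^:/]) keeps the character just before the path slash; /.*$ removes the rest.
    $site =~ s{([^:/])/.*$}{$1};
    return $site;
}

for my $site (qw(
    http://www.wilbert.net
    www.worful.com/after-text.htm
    http://www.winston.com/argle.htm
    www.windoze.net
)) {
    print trim_path($site), "\n";
}
```

With the four sample inputs this should print the four lines listed under DESIRED OUTPUT. No separate if test is needed, since a substitution that does not match leaves the string unchanged.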

Re: URL Trimming s{} regex

on 12.02.2008 21:31:16 by David Moreno

It's a nice one, if a bit complex. I'd try it as:

$url =~ m!\A(http://)?(.+?)(/.*)?\z!;
print $1 if $1;
print $2;

TIMTOWTDI.

D.

--
David Moreno - http://www.damog.net/
Yes, you can.
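
For reference, a small sketch of how the pattern above behaves when wrapped in a sub and checked for a match before the captures are used. The lazy (.+?) grows one character at a time until the optional (/.*)? and \z can succeed, so $2 ends at the first slash after the scheme. The name site_root is an assumption for illustration only.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the pattern from the message above.
# Returns undef (in scalar context) when the pattern does not match.
sub site_root {
    my ($url) = @_;
    return unless $url =~ m!\A(http://)?(.+?)(/.*)?\z!;
    # $1 is undef when the URL has no scheme, so substitute an empty string.
    return (defined $1 ? $1 : '') . $2;
}

for my $url ('http://www.wilbert.net', 'www.worful.com/after-text.htm',
             'http://www.winston.com/argle.htm', 'www.windoze.net') {
    my $root = site_root($url);
    print defined $root ? $root : '(no match)', "\n";
}
```

Note that \z does not allow a trailing newline, so lines read from a file would need a chomp first.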


Re: URL Trimming s{} regex

on 12.02.2008 22:01:35 by Williamawalters

hi deane --

In a message dated 2/12/2008 3:05:23 P.M. Eastern Standard Time,
Deane.Rothenmaier@walgreens.com writes:

> Hi, all.
>
> I'm having a relatively simple problem, trying to trim stuff
> off the end of a URL.
>
> INPUT:
> http://www.wilbert.net
> www.worful.com/after-text.htm
> http://www.winston.com/argle.htm
> www.windoze.net
>
> DESIRED OUTPUT:
> http://www.wilbert.net
> www.worful.com
> http://www.winston.com
> www.windoze.net


how about

perl -wMstrict -e
"my $trim = qr{ (?<! : | /) / .* $ }xms;
 print qq(\n);
 for (@ARGV) { s{ $trim }{}xms; print qq($_ \n); }
"
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net

http://www.wilbert.net
www.worful.com
http://www.winston.com
www.windoze.net

hth -- bill walters
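
The same negative lookbehind can be put in a script file, which avoids the shell quoting of the one-liner. This is only a sketch: the sub name trim_url is hypothetical, and the pattern is the one from the message above, matching a slash that is not preceded by ":" or "/" and everything after it.

```perl
use strict;
use warnings;

# Negative lookbehind: a "/" not preceded by ":" or "/", then the rest of the string.
my $trim = qr{ (?<! : | / ) / .* $ }xms;

# Hypothetical helper: return a trimmed copy and leave the argument untouched.
sub trim_url {
    my ($url) = @_;
    (my $copy = $url) =~ s{$trim}{};
    return $copy;
}

print trim_url($_), "\n" for @ARGV;
```

Because the lookbehind consumes no characters, nothing has to be captured and put back in the replacement.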






Re: URL Trimming s{} regex

on 12.02.2008 23:02:02 by Andy_Bach

David Moreno wrote:
> It's a nice, a bit complex one. I'd try it as:
>
> $url =~ m!\A(http://)?(.+?)(/.*)?\z!;
> print $1 if $1;
> print $2;
>
> TIMTOWTDI.
>

Be a little more flexible: use inner non-capturing parens ("(?: ...)") to
add https (and, if needed, "ftp" or "ldap" or ...), add "/i" for case
insensitivity, and always test for a match rather than assume one. And if
you know your separator/marker (the slash), use that rather than 'dot':
if ( $url =~ m!\A((?:http|https)://)?([^/]+)!i ) {
    print $1 if $1;
    print $2;
} # else 'no URL'

You don't need to match (or not) the stuff after the end of the 'not slash'
part, as you don't care about it ... though you may need to 'chomp' $url
first (or deal w/ the "\n" if it's there; it depends upon your loop). If
you're serious, though, there are a number of modules for this URL
finding that'll do it right for nearly everything legit. It's harder
than you'd think. J. Friedl's ("Mastering Regular Expressions", O'Reilly
http://www.oreilly.com/catalog/regex3/index.html
http://regex.info/
) URL matching masterpiece runs to 9 embedded REs (yikes), but here's a
simpler one:

if ($url =~ m{^https?://([^/:]+)(:(\d+))?(/.*)?$}i)
{
    my $host = $1;
    my $port = $3 || 80;  # Use $3 if it exists; otherwise default to 80.
    my $path = $4 || "/"; # Use $4 if it exists; otherwise default to "/".
    print "Host: $host\n";
    print "Port: $port\n";
    print "Path: $path\n";
} else {
    print "Not an HTTP URL\n";
}


--
Andy Bach, Sys. Mangler
Internet: andy_bach@wiwb.uscourts.gov
VOICE: (608) 261-5738 FAX 264-5932

The only function of economic forecasting is
to make astrology look respectable.
- John Kenneth Galbraith
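
A sketch of the host/port/path split above packaged as a sub, in case the parts are needed as data rather than printed. The name parse_http_url and the returned hash layout are assumptions for illustration. One deliberate difference from the snippet above: the default port is chosen by scheme (80 for http, 443 for https) instead of always 80.

```perl
use strict;
use warnings;

# Hypothetical helper: split an http(s) URL into scheme, host, port and path.
# Returns a hash reference, or undef (in scalar context) for anything else.
sub parse_http_url {
    my ($url) = @_;
    return unless $url =~ m{^(https?)://([^/:]+)(?::(\d+))?(/.*)?$}i;
    my ($scheme, $host, $port, $path) = (lc $1, $2, $3, $4);
    # The default port depends on the scheme: 80 for http, 443 for https.
    $port = $scheme eq 'https' ? 443 : 80 unless defined $port;
    $path = '/' unless defined $path;
    return { scheme => $scheme, host => $host, port => $port, path => $path };
}

for my $url ('http://www.winston.com/argle.htm', 'www.windoze.net') {
    my $parts = parse_http_url($url);
    if ($parts) {
        print "Host: $parts->{host}\nPort: $parts->{port}\nPath: $parts->{path}\n";
    } else {
        print "Not an HTTP URL: $url\n";
    }
}
```

Like the pattern it is based on, this rejects bare host names such as www.windoze.net, so it answers a slightly different question than the original post.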



RE: URL Trimming s{} regex

on 13.02.2008 11:49:32 by Brian Raven

It can help to answer your question if you also say what you are trying
to achieve. Somebody might be able to direct you at a module that could
help with your problem. For example, the following code doesn't do
exactly what you ask, but it could be closer to what you need.

----------------------------------------------
use strict;
use warnings;

use URI::URL;

# Read one URL per line from the __DATA__ section below.
while (my $d = <DATA>) {
    chomp $d;
    my $url = URI::URL->new($d);
    # No scheme means a bare host name was given; assume http.
    $url = URI::URL->new("http://$d") unless $url->scheme;
    print $url->scheme, "://", $url->host, "\n";
}

__DATA__
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net
----------------------------------------------

... but then again, maybe not. Just a thought.

HTH

--
Brian Raven


Re: URL Trimming s{} regex

on 11.03.2008 11:04:55 by cowboy

Hi,
I think the following code would be a better one.

while (my $url = <DATA>) {
    chomp($url);
    # $1 holds the host only; the optional "http://" prefix is matched
    # but not captured, so it is not printed.
    print $1."\n" if $url =~ /^(?:http:\/\/)?([^\/]+)/;
}

__END__
http://www.wilbert.net
www.worful.com/after-text.htm
http://www.winston.com/argle.htm
www.windoze.net


Thanks and Regards,
Indra