Regex

Regex

on 05.10.2006 10:31:04 by nugget1960

I want some code to go through HTML pages, identify external links and
insert a small image

Any suggestions?

Re: Regex

on 05.10.2006 11:26:02 by paduille.4058.mumia.w

On 10/05/2006 03:31 AM, Mark & Ingrid Nugent wrote:
> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?
>
>

Have you tried to write this yourself? It could be interesting and fun.


--
paduille.4058.mumia.w@earthlink.net
Posting Guidelines for comp.lang.perl.misc:
http://www.augustmail.com/~tadmc/clpmisc/clpmisc_guidelines.html

Re: Regex

on 05.10.2006 14:18:08 by Paul Lalli

Mark & Ingrid Nugent wrote:
> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?

There are multiple modules available on CPAN to assist with parsing
HTML. I recommend you head to http://search.cpan.org and search for
HTML::TokeParser
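A minimal sketch of what that could look like with HTML::TokeParser, not a finished solution. The helper name mark_external_links, the host example.com, and the icon path are made up for illustration, and "external" is assumed to mean an absolute http(s) URL on a different host. The parser hands back the original text of every token, so the page is copied through unchanged except for an <img> added after each external link.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Hypothetical helper: returns a copy of $html with an <img> tag appended
# after the closing </a> of every link that points to another host.
sub mark_external_links {
    my ($html, $own_host, $icon) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my $out      = '';
    my $external = 0;    # true while inside an external <a>...</a>

    while (my $t = $p->get_token) {
        my $type = $t->[0];
        if ($type eq 'S') {                       # start tag
            my ($tag, $attr, $text) = @{$t}[1, 2, 4];
            if (   $tag eq 'a'
                && defined $attr->{href}
                && $attr->{href} =~ m{^https?://([^/:?#]+)}i
                && lc($1) ne lc($own_host))
            {
                $external = 1;
            }
            $out .= $text;
        }
        elsif ($type eq 'E') {                    # end tag
            my ($tag, $text) = @{$t}[1, 2];
            $out .= $text;
            if ($tag eq 'a' && $external) {
                $out .= qq{<img src="$icon" alt="external link">};
                $external = 0;
            }
        }
        elsif ($type eq 'PI') { $out .= $t->[2] } # processing instruction
        else                  { $out .= $t->[1] } # text, comment, declaration
    }
    return $out;
}

my $page = '<p><a href="http://example.org/x">ext</a> <a href="/local">int</a></p>';
print mark_external_links($page, 'example.com', '/img/ext.gif'), "\n";
```

Relative links such as /local are left alone because they carry no host to compare against.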

Paul Lalli

Re: REGEX

on 05.10.2006 16:05:30 by paduille.4058.mumia.w

On 10/05/2006 05:39 AM, Mark & Ingrid Nugent wrote:
> Thanks for responding to my post. I have spent some time today trying to get my head around some Perl code. It doesn't seem to conform to the information on the web. For example, in the first section, what does the s{ mean? I was expecting this kind of format: $v =~ s///.
>
> Also, what is the gsi at the end?
>
> # document links: require target="_blank", append file size and icon
> $v =~ s{( ]*?href="([^""]+\.(avi|bmp|csv|dat|doc|dot|eps|exe|gif|jpg|mov|mp3|mpg|pdf|pps|ppt|pub|rtf|tif|txt|xls|zip))")([^>]*>.*?)}{$1 target="_blank"$4 new window }gsi;
>
>
> # TRIM document links: require target="_blank", append file size and icon
> $v =~ s{(]*?href="([^""]+\.(trf|tr5))")([^>]*>.*?)}{$1 target="_blank"$4 new TRIM window }gsi;
>
>
> # if it already had a target, that was all that is required
> $v =~ s{\s*target="_blank"(\s*target=".*?")}{$1}gsi;
>
>
> # remove file size for external documents
> $v =~ s|\s*||gs;
>
>
> Mark Nugent
>
>

(Re-directed to alt.perl)

Keep conversations in the group.

s{}{} is the same as s/// except that the delimiter characters are
different.
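A small illustration of why one might pick braces: when the pattern itself contains slashes, as URLs do, s/// forces you to escape each one. The host names below are placeholders, not anything from the quoted code.

```perl
use strict;
use warnings;

my $slashes = 'see http://old.example/docs';
my $braces  = $slashes;

# With / as the delimiter, every / inside the pattern and the replacement
# has to be escaped.
$slashes =~ s/http:\/\/old\.example\//http:\/\/new.example\//;

# With braces as the delimiters, the slashes can be written as they are.
$braces =~ s{http://old\.example/}{http://new.example/};

print "$slashes\n$braces\n";
```

Both substitutions do the same thing; only the readability differs.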

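The /gsi options are three separate modifiers: /g (global) replaces every
match instead of only the first, /s (single-line) lets . match newlines
as well, and /i makes the match insensitive to case.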
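To see what the three modifiers change, here is a short sketch that runs the same substitution with and without them. The sample string and the bracket replacement are invented for the demonstration.

```perl
use strict;
use warnings;

my $text = "<B>one\ntwo</B> and <b>three</b>";

# No modifiers: the match is case-sensitive, only the first match is
# replaced, and . does not match a newline.
(my $plain = $text) =~ s{<b>(.*?)</b>}{[$1]};

# /g replaces all matches, /s lets . match a newline, /i ignores case.
(my $gsi = $text) =~ s{<b>(.*?)</b>}{[$1]}gsi;

print "without modifiers:\n$plain\n\nwith /gsi:\n$gsi\n";
```

Without the modifiers only the lowercase <b>three</b> is rewritten; with /gsi the uppercase, multi-line element is caught too.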

Read the perl documentation:
perldoc perlrequick
perldoc perlre



Re: Regex

on 05.10.2006 16:53:02 by Sherm Pendley

"Mark & Ingrid Nugent" writes:

> I want some code to go through HTML pages, identify external links and
> insert a small image
>
> Any suggestions?

HTML::Parser is a good start.

sherm--

--
Web Hosting by West Virginians, for West Virginians: http://wv-www.net
Cocoa programming in Perl: http://camelbones.sourceforge.net