#1: WANTED: HTML parser with inline replace

Posted on 2008-01-07 02:41:30 by ML

Hi,

Not sure what group this really belongs on, sorry.

I'm looking to take ANY HTML document from ANY site. I want to
then be able to resolve *!ALL!* links (be they CSS, background, form,
text, image, etc.; I know weird things like Javascript or possibly
applets might be pushing it), and then be able to go through the list,
get both the listed and absolute URL for each link, and globally change
it.

Is there one good module to do that? I was going down the path
of WWW::Mechanize, and had it doing text/images, but when I got to
forms I started my testing and realized it wasn't handling CSS, background
and some other things.

Any pointers are appreciated. I've been looking at module after
module and just not seeming to find it.

Thanks, Tuc

#2: Re: WANTED: HTML parser with inline replace

Posted on 2008-01-07 02:59:35 by miyagawa

I'm not sure what you mean by "resolve ALL links" but my
HTML::ResolveLink would do the job quite nicely.
http://search.cpan.org/~miyagawa/HTML-ResolveLink-0.05/lib/HTML/ResolveLink.pm

It uses the HTML::TagSet implementation to figure out which links are to
be resolved. Even if it doesn't work for you, the module's code might
be worth looking at.
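
For readers who want to see the table-driven idea in isolation, here is a minimal sketch in Python using only the standard library. It is not HTML::ResolveLink or HTML::TagSet and does not reproduce their APIs; the LINK_ATTRS table, the LinkResolver class and the resolve_links function are hypothetical names, and the table is deliberately abbreviated. The output is re-serialized, so quoting and whitespace inside tags may differ from the input.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, abbreviated table: tag name -> attributes that hold a URL.
LINK_ATTRS = {
    "a": ["href"],
    "img": ["src"],
    "link": ["href"],
    "script": ["src"],
    "form": ["action"],
    "body": ["background"],
}


class LinkResolver(HTMLParser):
    """Re-emit the document, making every listed link attribute absolute."""

    def __init__(self, base):
        # Keep entity and character references as written in the text.
        super().__init__(convert_charrefs=False)
        self.base = base
        self.out = []

    def _emit_tag(self, tag, attrs, closer):
        parts = [tag]
        for name, value in attrs:
            if value is None:  # bare attribute such as "disabled"
                parts.append(name)
                continue
            if name in LINK_ATTRS.get(tag, ()):
                value = urljoin(self.base, value)
            parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closer)

    def handle_starttag(self, tag, attrs):
        self._emit_tag(tag, attrs, ">")

    def handle_startendtag(self, tag, attrs):
        self._emit_tag(tag, attrs, " />")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

    def handle_comment(self, data):
        self.out.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.out.append("<!%s>" % decl)


def resolve_links(html, base):
    parser = LinkResolver(base)
    parser.feed(html)
    parser.close()
    return "".join(parser.out)
```

Adding a tag or attribute to the table is the only change needed to cover another kind of link, which is the appeal of driving the rewrite from a table rather than from per-tag code.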

--
Tatsuhiko Miyagawa

#3: Re: WANTED: HTML parser with inline replace

Posted on 2008-01-07 03:33:54 by ML

Hi,

Thank you for the reply.

I did look at your module. If I find something else that does what
I want, but lacks the functionality you implemented, it is already on my
list. :)

As an example of what I'm looking at, in WWW::Mechanize it's
the ability to call "url" for what the page actually calls it, and
"url_abs" for what the resolved link path is.

I was hoping for something that abstracted the tags up a bit,
like the "find_all_links/find_all_images" in WWW::Mechanize. If not,
I might have to dig deeper into your code and HTML::TagSet to see if
it can do it.
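
To make the "listed URL plus absolute URL" idea concrete, here is a rough sketch in Python using only the standard library. It is not WWW::Mechanize and only mimics the url/url_abs pairing; LinkCollector, URL_ATTRS and CSS_URL are hypothetical names, the attribute set is abbreviated, and the CSS handling is a simple regular expression over url(...) that will miss things such as @import with a bare string.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical, abbreviated set of attributes that hold a URL.
URL_ATTRS = {"href", "src", "action", "background"}
# Matches url(...) in CSS, with or without quotes around the target.
CSS_URL = re.compile(r"url\(\s*['\"]?([^'\")]+)['\"]?\s*\)")


class LinkCollector(HTMLParser):
    """Collect (tag, listed URL, absolute URL) for every link found."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []
        self._in_style = False

    def _add(self, tag, listed):
        self.links.append((tag, listed, urljoin(self.base, listed)))

    def handle_starttag(self, tag, attrs):
        if tag == "style":
            self._in_style = True
        for name, value in attrs:
            if value is None:
                continue
            if name in URL_ATTRS:
                self._add(tag, value)
            elif name == "style":  # inline CSS on any element
                for listed in CSS_URL.findall(value):
                    self._add(tag, listed)

    def handle_endtag(self, tag):
        if tag == "style":
            self._in_style = False

    def handle_data(self, data):
        # Text inside a <style> element is CSS; scan it for url(...).
        if self._in_style:
            for listed in CSS_URL.findall(data):
                self._add("style", listed)


collector = LinkCollector("http://example.com/dir/page.html")
collector.feed(
    '<body background="bg.gif"><form action="/post">'
    '<a href="a.html">x</a></form>'
    "<style>p { background: url(img/p.png) }</style></body>"
)
for tag, listed, absolute in collector.links:
    print(tag, listed, absolute)
```

Each entry keeps the URL as the page wrote it next to the resolved form, so a later pass can decide what to change it to.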

Thanks, Tuc

#4: Re: WANTED: HTML parser with inline replace

Posted on 2008-01-07 03:50:20 by Andy

On Jan 6, 2008, at 8:33 PM, Tuc at T-B-O-H.NET wrote:

> I was hoping for something that abstracted the tags up a bit,
> like the "find_all_links/find_all_images" in WWW::Mechanize. If not,
> I might have to dig deeper into your code and HTML::TagSet to see if
> it can do it.


Modules are not static. Changes get made to modules all the time
based on the suggestions and needs of their users and potential users.
Perhaps you should try talking to module authors to explain what's
missing and how you can help update the module with new functionality.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance
