WANTED: HTML parser with inline replace

on 07.01.2008 02:41:30 by ML

Hi,

Not sure what group this really belongs in, sorry.

I'm looking to take ANY HTML document from ANY site. I then want to
be able to resolve *!ALL!* links (be they CSS, background, form,
text, image, etc.; I know weird things like JavaScript or possibly
applets might be pushing it), go through the list, get both the
listed and the absolute URL for each link, and globally change it.

Is there one good module to do that? I was going down the path
of WWW::Mechanize and had it doing text/images, but when I got to
forms and started testing, I realized it wasn't handling CSS,
backgrounds, and some other things.

Any pointers are appreciated. I've been looking at module after
module and just can't seem to find it.

Thanks, Tuc

Re: WANTED: HTML parser with inline replace

on 07.01.2008 02:59:35 by miyagawa

I'm not sure what you mean by "resolve ALL links", but my
HTML::ResolveLink would do the job quite nicely.
http://search.cpan.org/~miyagawa/HTML-ResolveLink-0.05/lib/HTML/ResolveLink.pm

It uses the HTML::TagSet implementation to figure out which links are
to be resolved. Even if it doesn't work for you, the module's code
might be worth looking at.
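
To make the table-driven idea concrete, here is a minimal sketch of it. It is written in Python with only the standard library so that it is self-contained; it is not the code of HTML::ResolveLink or HTML::TagSet. The `LINK_ATTRS` table, the `LinkCollector` class and the `collect_links` helper are all hypothetical names, and the table is deliberately partial.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, partial table: which attributes of which tags carry a URL.
# A real table (such as the one HTML::TagSet ships) covers far more tags.
LINK_ATTRS = {
    "a": ("href",),
    "link": ("href",),          # e.g. stylesheets
    "img": ("src",),
    "script": ("src",),
    "form": ("action",),
    "body": ("background",),
}


class LinkCollector(HTMLParser):
    """Collect (tag, attribute, listed URL, absolute URL) tuples."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        wanted = LINK_ATTRS.get(tag, ())
        for name, value in attrs:
            # Skip attributes that are not in the table or have no value.
            if name in wanted and value is not None:
                absolute = urljoin(self.base_url, value)
                self.links.append((tag, name, value, absolute))


def collect_links(html, base_url):
    """Parse html and return every link found via LINK_ATTRS."""
    parser = LinkCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.links
```

Covering another kind of link in this sketch means adding a row to the table rather than writing new parsing code. URLs inside CSS text or JavaScript are not attributes, so a table like this would not see them.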


--
Tatsuhiko Miyagawa

Re: WANTED: HTML parser with inline replace

on 07.01.2008 03:33:54 by ML

Hi,

Thank you for the reply.

I did look at your module. If I find something else that does what
I want, but lacks the functionality you implemented, it is already on my
list. :)

As an example of what I'm looking for: in WWW::Mechanize, it's
the ability to call "url" for what the page actually calls the link,
and "url_abs" for the resolved link path.
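
For anyone following along, the listed/absolute pairing can be sketched roughly as below. This is Python using only the standard library, not WWW::Mechanize itself; the field names `url` and `url_abs` simply mirror the ones mentioned above, while `Link`, `make_link` and `change_host` are hypothetical names made up for the illustration.

```python
from dataclasses import dataclass
from urllib.parse import urljoin, urlsplit, urlunsplit


@dataclass(frozen=True)
class Link:
    url: str      # the URL exactly as written in the page
    url_abs: str  # the same URL resolved against the page's base URL


def make_link(listed, base_url):
    """Pair a listed URL with its absolute form."""
    return Link(url=listed, url_abs=urljoin(base_url, listed))


def change_host(link, new_host):
    """Return the absolute URL with its host swapped (one kind of global change)."""
    parts = urlsplit(link.url_abs)
    return urlunsplit(parts._replace(netloc=new_host))


link = make_link("../img/logo.png", "http://example.com/a/b/index.html")
print(link.url)      # ../img/logo.png
print(link.url_abs)  # http://example.com/a/img/logo.png
print(change_host(link, "mirror.example.org"))
```

Keeping both forms side by side is what makes a global change practical: the absolute form decides what the link points to, and the listed form tells you what text to replace in the page.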

I was hoping for something that abstracted the tags up a bit,
like the "find_all_links/find_all_images" in WWW::Mechanize. If not,
I might have to dig deeper into your code and HTML::TagSet to see if
it can do it.

Thanks, Tuc

Re: WANTED: HTML parser with inline replace

on 07.01.2008 03:50:20 by Andy

On Jan 6, 2008, at 8:33 PM, Tuc at T-B-O-H.NET wrote:

> I was hoping for something that abstracted the tags up a bit,
> like the "find_all_links/find_all_images" in WWW::Mechanize. If not,
> I might have to dig deeper into your code and HTML::TagSet to see if
> it can do it.


Modules are not static. Changes get made to modules all the time
based on the suggestions and needs of their users and potential users.
Perhaps you should try talking to module authors, explaining what's
missing and how you can help update the module with new functionality.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance