Subclassing HTML::Parser to support $p->include()

on 24.02.2006 18:40:37 by Andy

I'm using HTML::Parser as part of a templating system that parses
HTML formatted templates and interprets certain special tags. I'd
like to be able to implement a tag like



To do that I'd like to subclass HTML::Parser and add an include()
method that can be called in a tag handler and has the effect of
including a chunk of text in the parser's input. I need the included
text to appear in the HTML stream that HTML::Parser sees right after
the tag (so that the included text is in the right place).

My first thought is to provide a callback to $p->parse() that returns
the input text in chunks, breaking the text after each '>' so that
the text immediately after each tag starts a new chunk. The $p->include()
method will tell the chunk-reading callback to read from the
included text up to EOF and then return to where it left off in the
original text (there'll be an include stack, of course, so that nested
includes work).
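A minimal sketch of the chunk-reading callback described above, in plain Perl and independent of HTML::Parser. The names make_chunk_reader, $next and $include are hypothetical, as is the <include> tag in the usage lines. Each source is kept on a stack as a reference to its unread remainder: including text pushes a new source, and an exhausted source is popped so that reading resumes where the enclosing text left off.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Parser). Returns two closures:
#   $next->()          yields the input in chunks, each ending just after a '>'
#                      (or at the end of a source that has no further '>').
#   $include->($text)  makes $text the next thing $next->() reads; once it is
#                      used up, reading resumes where the enclosing text left off.
sub make_chunk_reader {
    my ($text) = @_;
    my @stack = ( \$text );    # refs to the unread remainder of each source

    my $next = sub {
        while (@stack) {
            my $src = $stack[-1];
            if ( length $$src ) {
                # Consume up to and including the next '>' ...
                return $1 if $$src =~ s/\A([^>]*>)//;
                # ... or, if there is none, the rest of this source.
                my $rest = $$src;
                $$src = '';
                return $rest;
            }
            pop @stack;        # source exhausted: resume the one below it
        }
        return;                # every source exhausted: end of input
    };

    my $include = sub {
        my ($more) = @_;
        push @stack, \$more;   # nested includes simply push further sources
    };

    return ( $next, $include );
}

# Usage: splice "<b>hi</b>" in right after the hypothetical <include> tag.
my ( $next, $include ) = make_chunk_reader('<p>a<include>b</p>');
while ( defined( my $chunk = $next->() ) ) {
    print "[$chunk]\n";
    $include->('<b>hi</b>') if $chunk =~ /<include>\z/;
}
```

Splitting after every '>' is deliberately naive: a '>' inside a quoted attribute value or a comment also ends a chunk. That should be harmless, since HTML::Parser holds an incomplete construct until a later chunk completes it.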

For that to work I have to rely on HTML::Parser issuing a tag
callback as soon as it sees the closing character of a tag. If it
reads ahead, it will already have digested text beyond the
tag by the time it issues the callback for that tag.

I can easily check what the current behaviour is - and I shall - but
there's no contract that I can see about the relationship between the
text that HTML::Parser has read via a callback and when the handlers
trigger. So even if it works now it could potentially change in the
future - I don't want to rely on undocumented behaviour.

So, is what I'm proposing sensible? If not, is there a better way?
Assuming HTML::Parser currently behaves the way I need it to, is it
likely always to do so?

Thanks :)

--
Andy Armstrong, hexten.net

Re: Subclassing HTML::Parser to support $p->include()

on 24.02.2006 18:47:20 by metaperl

oops, I forgot to mention - HTML::Template by Sam Tregar has done what it
looks like you want to do down to the letter. Only he has been working on
that module for numerous years and it has very good optimization and there
are compiler versions of it out too.


On 2/24/06, Terrence Brannon wrote:
>
>
>
> On 2/24/06, Andy Armstrong wrote:
> >
> > I'm using HTML::Parser as part of a templating system that parses
> > HTML formatted templates and interprets certain special tags. I'd
> > like to be able to implement a tag like
> >
> >
>
>
> If you want to stay pure HTML, then do this:
>
>
>
> No need to use non-HTML and expect an HTML parser to parse it. If you want
> non-HTML, then maybe XML parsing is more appropriate.
>
> Also, what you want appears to be doable using the former CPAN module
> HTML_Tree by Paul J Lucas:
> http://homepage.mac.com/pauljlucas/software/html_tree/
>
> And you might like the CPAN module PeTaL or my own HTML::Seamstress
>
> and XML::LibXML and XML::LibXSLT if XML floats your boat :)
>
>
> > To do that I'd like to subclass HTML::Parser and add an include()
>
>
> why not use HTML::Tree on CPAN (HTML::Tree is on CPAN, HTML_Tree is at the
> URL I gave you previously).
>
>
>
>
> --
> http://slowchess.com/profile.php?username=tbrannon
> http://www.moneycoop.org http://www.osogd.org http://www.metaperl.com
> http://www.livingcosmos.org
>
>


Re: Subclassing HTML::Parser to support $p->include()

on 24.02.2006 19:01:51 by Andy

On 24 Feb 2006, at 17:47, Terrence Brannon wrote:
> oops, I forgot to mention - HTML::Template by Sam Tregar has done what it
> looks like you want to do down to the letter. Only he has been working on
> that module for numerous years and it has very good optimization and there
> are compiler versions of it out too.

Thanks. I'm aware of, and have used, quite a few of the other
templating systems, including HTML::Template, but I have a bunch of
ideas for a template engine that have been brewing for a few years, so
I'm keen to build this myself.

Re: Subclassing HTML::Parser to support $p->include()

on 24.02.2006 19:34:03 by gisle

Andy Armstrong writes:

> On 24 Feb 2006, at 17:46, Terrence Brannon wrote:
> > If you want to stay pure HTML, then do this:
> >
> >
>
> The syntax was just chosen for conciseness but the actual syntax
> looks like
>
>
>
> I /assumed/ (perhaps mistakenly) that HTML::Parser would be OK with
> arbitrary tag names provided everything was syntactically correct. Is
> that a stupid assumption?

No, that's a completely valid assumption. HTML::Parser does not care
what the names of the tags are. This is needed to have any hope of
parsing real-world HTML, and it will certainly stay that way. No need
to worry.

HTML::Parser does treat a few tags (title, script, style, textarea)
specially, in that their content is parsed according to different rules
than the content of normal and unknown tags.
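A quick way to see both points, assuming a hypothetical tag name include: the handlers below only record each event, so the output should show the first <include> reported as an ordinary start tag while the one inside <title> arrives as text.

```perl
use strict;
use warnings;
use HTML::Parser;

# Record each event as "kind:value" so the tokenization can be inspected.
my @events;
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { push @events, "start:$_[0]" }, 'tagname' ],
    end_h       => [ sub { push @events, "end:$_[0]" },   'tagname' ],
    text_h      => [ sub { push @events, "text:$_[0]" },  'text' ],
);

# An unknown tag name is reported like any other start tag, but inside
# <title> the same markup is literal content and is delivered as text.
$p->parse('<include name="a"><title><include name="b"></title>');
$p->eof;

print "$_\n" for @events;
```

By default HTML::Parser may deliver a run of text in several text events (see its unbroken_text option), so code that inspects literal content should join consecutive text events first.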

--Gisle

Re: Subclassing HTML::Parser to support $p->include()

on 24.02.2006 19:50:19 by gisle

Andy Armstrong writes:

> For that to work I have to rely on HTML::Parser issuing a tag
> callback as soon as it sees the closing character of a tag. If it
> reads ahead, it will already have digested text beyond the
> tag by the time it issues the callback for that tag.
>
> I can easily check what the current behaviour is - and I shall - but
> there's no contract that I can see about the relationship between the
> text that HTML::Parser has read via a callback and when the handlers
> trigger. So even if it works now it could potentially change in the
> future - I don't want to rely on undocumented behaviour.
>
> So, is what I'm proposing sensible?

Seems so to me. You can depend on any complete tag to be reported
before $p->parse(), passing in the corresponding text, returns. This
is implicitly documented by the description of $p->eof that says that
its effect is to flush any remaining _text_.
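Given that guarantee, a chunk reader can be handed straight to $p->parse(), which also accepts a code reference: HTML::Parser keeps calling it for chunks until it returns an empty or undefined result, and then signals eof itself. The following is a minimal sketch under a few assumptions: IncludingParser, expand() and include() are hypothetical names, the <include name="..."> tag is a stand-in for the real template tag, and snippets come from an in-memory hash instead of files. Per-parser state is kept in the object's hash, which assumes HTML::Parser's hash-based objects and key names that do not clash with its own.

```perl
use strict;
use warnings;

package IncludingParser;    # hypothetical subclass name
use parent 'HTML::Parser';

# %snippets maps a hypothetical <include name="..."> to its replacement text.
sub new {
    my ( $class, %snippets ) = @_;
    my $self = $class->SUPER::new(
        api_version => 3,
        start_h     => [ \&_start, 'self, tagname, attr, text' ],
        # Everything else (text, end tags, comments, ...) is copied through
        # verbatim; events without source text are skipped.
        default_h => [ sub { $_[0]{out} .= $_[1] if defined $_[1] }, 'self, text' ],
    );
    $self->{snippets} = \%snippets;
    $self->{out}      = '';
    return $self;
}

# Make $text the next input; the interrupted source resumes once it runs out.
sub include {
    my ( $self, $text ) = @_;
    push @{ $self->{inc_stack} }, \$text;
    return;
}

sub _start {
    my ( $self, $tagname, $attr, $text ) = @_;
    if ( $tagname eq 'include' ) {
        my $name    = $attr->{name};
        my $snippet = defined $name ? $self->{snippets}{$name} : undef;
        $self->include( defined $snippet ? $snippet : '' );
    }
    else {
        $self->{out} .= $text;    # ordinary start tags pass through verbatim
    }
}

# Parse $text and return it with every include expanded in place.
sub expand {
    my ( $self, $text ) = @_;
    my @stack = ( \$text );
    $self->{inc_stack} = \@stack;
    $self->{out}       = '';

    # Chunk reader: up to and including the next '>' from the top source.
    $self->parse(
        sub {
            while (@stack) {
                my $src = $stack[-1];
                if ( length $$src ) {
                    return $1 if $$src =~ s/\A([^>]*>)//;
                    my $rest = $$src;
                    $$src = '';
                    return $rest;
                }
                pop @stack;    # exhausted: fall back to the enclosing source
            }
            return;            # undef ends parsing; parse() then calls eof
        }
    );
    return $self->{out};
}

package main;

my $p = IncludingParser->new(
    footer => '<i>inc</i>',
    nested => 'A<include name="footer">B',
);
print $p->expand('<p>x<include name="nested">y</p>'), "\n";
```

Because handlers run synchronously while a chunk is being parsed, the include() issued from the start handler is already on the stack when the reader is asked for the next chunk, so the included text lands directly after the tag.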

One caveat is that your tag will not be recognized inside tags that
force literal text content. For instance:



Here will be parsed as text whatever you do.

Re: Subclassing HTML::Parser to support $p->include()

on 24.02.2006 20:15:15 by Andy

On 24 Feb 2006, at 18:50, Gisle Aas wrote:
> Seems so to me. You can depend on any complete tag to be reported
> before $p->parse(), passing in the corresponding text, returns. This
> is implicitly documented by the description of $p->eof that says that
> its effect is to flush any remaining _text_.

Ah, that's great, thanks.

> One caveat is that your tag will not be recognized inside tags that
> force literal text content. For instance:
>
>
>
> Here will be parsed as text whatever you do.

I hadn't thought of that - but I think I can live with it, thanks.
