HTML::Parser bug

on 20.03.2005 16:02:26 by goblin

Hello libwww,

I am using it to parse HTML forms etc.
I noticed that it treats a strange comment
as the start of a comment,
not as a whole empty comment, the way IE does.

--
Best regards,
goblin mailto:goblin@nnt.ru

Re: HTML::Parser bug

on 20.03.2005 22:51:25 by moseley

On Sun, Mar 20, 2005 at 06:02:26PM +0300, goblin@nnt.ru wrote:
> Hello libwww,
>
> I am using it to parse HTML forms etc.
> I noticed that it treats a strange comment
> as the start of a comment,
> not as a whole empty comment, the way IE does.

Doesn't seem like that's a valid comment.

http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4
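
For anyone following along, here is a rough sketch of what a strict reading of the comment syntax might look like in code. It is plain Python using only the standard `re` module, not HTML::Parser itself. The function name `is_strict_comment` and the sample strings are made up for illustration, and the rules encoded are my understanding of the HTML 4 comment syntax: no whitespace between `<!` and `--`, optional whitespace between the closing `--` and `>`, and no `--` inside the comment text. Requiring at least one `--...--` pair is an assumption of this sketch.

```python
import re

# One comment: "--", then any text that does not contain "--", then "--",
# optionally followed by whitespace before the next comment or the final ">".
_COMMENT = r"--(?:(?!--).)*--\s*"

# Assumption of this sketch: a declaration counts as a comment only if it
# holds at least one "--...--" pair between "<!" and ">".
_DECLARATION = re.compile(r"<!(?:" + _COMMENT + r")+>", re.DOTALL)


def is_strict_comment(markup: str) -> bool:
    """Return True if `markup` is exactly one well-formed comment declaration."""
    return _DECLARATION.fullmatch(markup) is not None
```

With this, `<!-- hello -->` and `<!---->` pass, while `<! -- hello -->` (whitespace after `<!`) and `<!-- a --- b -->` (stray hyphens inside the comment) are rejected. A lenient, browser-like parser would typically accept more than this.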




>

--
Bill Moseley
moseley@hank.org

Re: HTML::Parser bug

on 21.03.2005 01:25:28 by Andy

>
> as the start of a comment,
> not as a whole empty comment, the way IE does.

Lots of browsers allow crap that modules don't.

--
Andy Lester => andy@petdance.com => www.petdance.com => AIM:petdance

Re: HTML::Parser bug

on 21.03.2005 18:51:42 by RP

On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote:
> On Sun, Mar 20, 2005 at 06:02:26PM +0300, goblin@nnt.ru wrote:
> > Hello libwww,
> >
> > I am using it to parse HTML forms etc.
> > I noticed that it treats a strange comment
> > as the start of a comment,
> > not as a whole empty comment, the way IE does.
>
> Doesn't seem like that's a valid comment.
>
> http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4

Well, the HTML::Parser perldoc says:

HTML::Parser is not a generic SGML parser. We have tried to make it
able to deal with the HTML that is actually "out there", and it normally
parses as closely as possible to the way the popular web browsers do it
instead of strictly following one of the many HTML specifications from
W3C. Where there is disagreement, there is often an option that you can
enable to get the official behaviour.

But do all versions of IE parse this the same way?
What do other popular user agents do?

--
Reinier

Re: HTML::Parser bug

on 21.03.2005 19:05:21 by moseley

On Mon, Mar 21, 2005 at 06:51:42PM +0100, Reinier Post wrote:
> On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote:
> > On Sun, Mar 20, 2005 at 06:02:26PM +0300, goblin@nnt.ru wrote:
> > > Hello libwww,
> > >
> > > I am using it to parse HTML forms etc.
> > > I noticed that it treats a strange comment
> > > as the start of a comment,
> > > not as a whole empty comment, the way IE does.
> >
> > Doesn't seem like that's a valid comment.
> >
> > http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4
>
> Well, the HTML::Parser perldoc says:
>
> HTML::Parser is not a generic SGML parser. We have tried to make it
> able to deal with the HTML that is actually "out there", and it normally
> parses as closely as possible to the way the popular web browsers do it
> instead of strictly following one of the many HTML specifications from
> W3C. Where there is disagreement, there is often an option that you can
> enable to get the official behaviour.

Hard to imagine handling every possibility as an option.

I would have thought an empty comment would be at a minimum:



or maybe



although I'm still trying to grasp the concept of an empty comment.


--
Bill Moseley
moseley@hank.org

RE: HTML::Parser bug

on 21.03.2005 23:43:26 by Forrest.Cahoon

Although not identical to your short "comment", Microsoft intentionally
uses similar comments like



See http://office.microsoft.com/en-us/assistance/HA010549981033.aspx for
more info.

Forrest Cahoon
not speaking for merrill corporation

> -----Original Message-----
> From: Reinier Post [mailto:rp@win.tue.nl]
> Sent: Monday, March 21, 2005 11:52 AM
> To: libwww@perl.org
> Subject: Re: HTML::Parser bug
>
> On Sun, Mar 20, 2005 at 01:51:25PM -0800, Bill Moseley wrote:
> > On Sun, Mar 20, 2005 at 06:02:26PM +0300, goblin@nnt.ru wrote:
> > > Hello libwww,
> > >
> > > I am using it to parse HTML forms etc.
> > > I noticed that it treats a strange comment
> > > as the start of a comment,
> > > not as a whole empty comment, the way IE does.
> >
> > Doesn't seem like that's a valid comment.
> >
> > http://www.w3.org/TR/WD-html40-970917/intro/sgmltut.html#h-3.1.4
>
> Well, the HTML::Parser perldoc says:
>
> HTML::Parser is not a generic SGML parser. We have tried to make it
> able to deal with the HTML that is actually "out there", and it normally
> parses as closely as possible to the way the popular web browsers do it
> instead of strictly following one of the many HTML specifications from
> W3C. Where there is disagreement, there is often an option that you can
> enable to get the official behaviour.
>
> But do all versions of IE parse this the same way?
> What do other popular user agents do?
>
> --
> Reinier
>