HTML::Parser: how can I reset report_tags to report all tags?

on 14.06.2005 09:59:53 by nkiesel

Hello,

I'm trying to use HTML::Parser to parse some web pages. During that
process I'm normally interested only in some specific parts, so I use
report_tags. However, at some point I need to get the whole text,
including all embedded HTML tags. What I currently have is a start
handler, an end handler, and a text handler, which together reconstruct
the text. This works reasonably well, but only for the tags that I
explicitly list using report_tags.

So I'm looking for either a completely different approach (e.g., is the
raw HTML available somehow, so that I don't have to reconstruct it?) or
a way to reset report_tags/ignore_tags to report all tags (without
listing all possible HTML tags myself, that is).

I tried ->ignore_tags(()) and ->ignore_tags(qw(none)), but it seems
that after ->report_tags() has been called once, a positive tag filter
is always in effect.
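Regarding the raw-HTML question: HTML::Parser's `text` argspec passes the original source text of each event (tags, text, comments, and so on), so a single default handler that concatenates it returns the document without rebuilding tags by hand. A minimal sketch, assuming api_version 3 handlers and a hypothetical sample string as input:

```perl
use strict;
use warnings;
use HTML::Parser;

# Route every event through the default handler and append the
# original source text of each one.
my $raw = '';
my $p = HTML::Parser->new(
    api_version => 3,
    default_h   => [ sub { my $t = shift; $raw .= $t if defined $t }, 'text' ],
);

$p->parse('<p>Hello <b>world</b>!</p>');    # hypothetical sample input
$p->eof;                                    # flush anything still buffered

print "$raw\n";
```

Start and end tags suppressed by report_tags or ignore_tags presumably never reach this handler either, so the filters would still need to be reset before capturing the full text this way.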

Any ideas/comments?

Best,
Norbert


Re: HTML::Parser: how can I reset report_tags to report all tags?

on 14.06.2005 13:44:32 by gisle

Norbert Kiesel writes:

> I tried to use ->ignore_tags(()) and ->ignore_tags(qw(none)), but it
> seems that after calling ->report_tags() once it always uses a positive
> tag filter.

Calling ->report_tags() without any arguments should reset the filter.
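A small sketch of that reset, assuming api_version 3 handlers; the sample markup and the helper `start_tags` are hypothetical and only show which start tags are reported before and after calling `report_tags()` with no arguments:

```perl
use strict;
use warnings;
use HTML::Parser;

my @seen;    # start-tag names reported during the current parse
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { push @seen, shift }, 'tagname' ],
);

# Hypothetical helper: parse one document, return the reported start tags.
sub start_tags {
    my ($html) = @_;
    @seen = ();
    $p->parse($html);
    $p->eof;    # end the document so the parser can be reused
    return @seen;
}

$p->report_tags(qw(b));                    # positive filter: only <b>
my @filtered = start_tags('<a><b><i>');    # expected: b

$p->report_tags();                         # no arguments: filter removed
my @all = start_tags('<a><b><i>');         # expected: a b i

print "filtered: @filtered\n";
print "all: @all\n";
```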

Regards,
Gisle

Re: HTML::Parser: how can I reset report_tags to report all tags?

on 14.06.2005 18:26:46 by nkiesel

On Tue, 2005-06-14 at 04:44 -0700, Gisle Aas wrote:
> Calling ->report_tags() without any arguments should reset the filter.

Thanks, I will try and report back.

Best,
Norbert

Re: HTML::Parser: how can I reset report_tags to report all tags?

on 14.06.2005 20:12:08 by nkiesel

On Tue, 2005-06-14 at 04:44 -0700, Gisle Aas wrote:
> Norbert Kiesel writes:
>
> > I tried to use ->ignore_tags(()) and ->ignore_tags(qw(none)), but it
> > seems that after calling ->report_tags() once it always uses a positive
> > tag filter.
>
> Calling ->report_tags() without any arguments should reset the filter.

Thanks, I just tested it and it works beautifully. Perhaps this
information could be added to the manual or FAQ? It seems I was confused
about ->ignore_tags() and ->report_tags(): having read that ->ignore_tags()
can be used to suppress tags, I thought that suppressing no tags would
result in all tags being reported.
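That confusion makes sense if the two methods maintain separate lists: calling ->ignore_tags() with no arguments then only empties the ignore list and leaves an earlier report_tags filter in place. A minimal sketch under that assumption, with hypothetical sample markup:

```perl
use strict;
use warnings;
use HTML::Parser;

my @seen;    # start-tag names that reach the start handler
my $p = HTML::Parser->new(
    api_version => 3,
    start_h     => [ sub { push @seen, shift }, 'tagname' ],
);

$p->report_tags(qw(b));    # positive filter: only <b>
$p->ignore_tags();         # empties the ignore list only

$p->parse('<a><b><i>');
$p->eof;

# The report_tags filter is still active, so only "b" should be seen.
print "@seen\n";
```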

Also, what are the semantics of combining ->ignore_tags() and
->report_tags()? My understanding is that the last call wins, i.e. it
replaces (instead of modifies) the internal filter. Is this correct?

Best,
Norbert


Re: HTML::Parser: how can I reset report_tags to report all tags?

on 14.06.2005 20:48:20 by gisle

Norbert Kiesel writes:

> On Tue, 2005-06-14 at 04:44 -0700, Gisle Aas wrote:
> > Norbert Kiesel writes:
> >
> > > I tried to use ->ignore_tags(()) and ->ignore_tags(qw(none)), but it
> > > seems that after calling ->report_tags() once it always uses a positive
> > > tag filter.
> >
> > Calling ->report_tags() without any arguments should reset the filter.
>
> Thanks, just tested it and it works beautifully. Perhaps this
> information could be added to the manual or FAQ?

Documenting this better would be a good idea. I'll try to get it
done for the next release. Patches welcome.

> Also, what are the semantics of combining ->ignore_tags() and
> ->report_tags()? My understanding is that the last call wins, i.e. it
> replaces (instead of modifies) the internal filter. Is this correct?

If both ignore_tags and report_tags are set, then they both filter out
events and ignore_tags effectively takes precedence. For example:

$p->ignore_tags("a");
$p->report_tags("a", "b");

Only "b" will be reported. This should probably also be documented
better.
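Gisle's example can be turned into a runnable check. The sketch below (hypothetical markup, api_version 3 handlers) sets the two filters in both orders; if they are independent as described, `<a>` is dropped by ignore_tags, `<c>` by report_tags, and only `b` should remain either way:

```perl
use strict;
use warnings;
use HTML::Parser;

my %result;    # call order => reported start tags
for my $order ('ignore first', 'report first') {
    my @seen;
    my $p = HTML::Parser->new(
        api_version => 3,
        start_h     => [ sub { push @seen, shift }, 'tagname' ],
    );

    # Set both filters, in the order under test.
    if ($order eq 'ignore first') {
        $p->ignore_tags('a');
        $p->report_tags('a', 'b');
    }
    else {
        $p->report_tags('a', 'b');
        $p->ignore_tags('a');
    }

    $p->parse('<a><b><c>');
    $p->eof;
    $result{$order} = "@seen";
}

print "$_: $result{$_}\n" for sort keys %result;
```

Because both filters stay in effect, the last call does not simply replace the other; each method only replaces its own list.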

Regards,
Gisle

Re: HTML::Parser: how can I reset report_tags to report all tags?

on 20.06.2005 01:36:38 by nkiesel

Hello,

here is a patch that adds some more documentation for report_tags and
ignore_tags. I also added some test cases.

I changed t/filter-methods.t to use Test::More, as it gives nice error
messages when tests fail (which they did while I was trying to figure
this stuff out :-). The only drawback is that this makes the test depend
on Test::More, but I think many other modules use it as well. If you
think that's unacceptable, I'd be willing to rewrite it without
Test::More.

Best,
Norbert

On Tue, 2005-06-14 at 11:48 -0700, Gisle Aas wrote:
> Norbert Kiesel writes:
>
> > On Tue, 2005-06-14 at 04:44 -0700, Gisle Aas wrote:
> > > Norbert Kiesel writes:
> > >
> > > > I tried to use ->ignore_tags(()) and ->ignore_tags(qw(none)), but it
> > > > seems that after calling ->report_tags() once it always uses a positive
> > > > tag filter.
> > >
> > > Calling ->report_tags() without any arguments should reset the filter.
> >
> > Thanks, just tested it and it works beautifully. Perhaps this
> > information could be added to the manual or FAQ?
>
> Documenting this better would be a good idea. I'll try to get it
> done for the next release. Patches welcome.
>
> > Also, what are the semantics of combining ->ignore_tags() and
> > ->report_tags()? My understanding is that the last call wins, i.e. it
> > replaces (instead of modifies) the internal filter. Is this correct?
>
> If both ignore_tags and report_tags are set, then they both filter out
> events and ignore_tags effectively takes precedence. For example:
>
> $p->ignore_tags("a");
> $p->report_tags("a", "b");
>
> Only "b" will be reported. This should probably also be documented
> better.
>
> Regards,
> Gisle
