HTML::Parser is not thread-safe

on 11.01.2007 08:26:32 by Sprout

The README file for HTML::Parser says to report bugs to this mailing
list.


I have tried using HTML::Parser in a threaded application. If an
HTML::Parser object exists when there is more than one thread, it has
problems destroying the object afterwards (and the program dies).

Here are a few one-liners (and their output) that demonstrate the
problem:

$ perl -MHTML::Parser -Mthreads -e'(async{new HTML::Parser})->join'
Bad signature in parser state object at 3767c0.
Unbalanced string table refcount: (1) for "_hparser_xs_state" during
global destruction.
Scalars leaked: 4

$ perl -MHTML::Parser -Mthreads -e'$p=new HTML::Parser; (async{})->join'
Scalars leaked: -13
Bad signature in parser state object at 62ccd0 during global
destruction.

But if I destroy my HTML::Parser object before creating a thread,
there is no problem:
$ perl -MHTML::Parser -Mthreads -le'$p=new HTML::Parser; undef $p; (async{})->join; print "ok"'
ok

I hope this is helpful. I'm afraid I know almost nothing about C and
XS, so I can't be of any more help.


Father Chrysostomos.


P.S.: I am using threads.pm version 1.57 and HTML::Parser version 3.55.
Here is the output from perl -V:

Summary of my perl5 (revision 5 version 8 subversion 8) configuration:
Platform:
osname=darwin, osvers=8.8.0, archname=darwin-thread-multi-2level
uname='darwin treebeard.local 8.8.0 darwin kernel version 8.8.0: fri sep 8 17:18:57 pdt 2006; root:xnu-792.12.6.obj~1release_ppc power macintosh powerpc '
config_args=''
hint=recommended, useposix=true, d_sigaction=define
usethreads=define use5005threads=undef useithreads=define
usemultiplicity=define
useperlio=define d_sfio=undef uselargefiles=define usesocks=undef
use64bitint=undef use64bitall=undef uselongdouble=undef
usemymalloc=n, bincompat5005=undef
Compiler:
cc='cc', ccflags ='-g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include',
optimize='-O3',
cppflags='-no-cpp-precomp -g -pipe -fno-common -DPERL_DARWIN -no-cpp-precomp -fno-strict-aliasing -I/usr/local/include'
ccversion='', gccversion='4.0.0 20041026 (Apple Computer, Inc. build 4061)', gccosandvers='darwin8'
intsize=4, longsize=4, ptrsize=4, doublesize=8, byteorder=4321
d_longlong=define, longlongsize=8, d_longdbl=define, longdblsize=16
ivtype='long', ivsize=4, nvtype='double', nvsize=8,
Off_t='off_t', lseeksize=8
alignbytes=8, prototype=define
Linker and Libraries:
ld='env MACOSX_DEPLOYMENT_TARGET=10.3 cc', ldflags =' -L/usr/local/lib'
libpth=/usr/local/lib /usr/lib
libs=-ldbm -ldl -lm -lc
perllibs=-ldl -lm -lc
libc=, so=dylib, useshrplib=false, libperl=libperl.a
gnulibc_version=''
Dynamic Linking:
dlsrc=dl_dlopen.xs, dlext=bundle, d_dlsymun=undef, ccdlflags=' '
cccdlflags=' ', lddlflags=' -bundle -undefined dynamic_lookup -L/usr/local/lib'


Characteristics of this binary (from libperl):
Compile-time options: MULTIPLICITY PERL_IMPLICIT_CONTEXT
PERL_MALLOC_WRAP USE_ITHREADS USE_LARGE_FILES
USE_PERLIO
Built under darwin
Compiled at Jan 9 2007 19:29:53
@INC:
/usr/local/lib/perl5/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/5.8.8
/usr/local/lib/perl5/site_perl/5.8.8/darwin-thread-multi-2level
/usr/local/lib/perl5/site_perl/5.8.8
/usr/local/lib/perl5/site_perl
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/5.8.6
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6/darwin-thread-multi-2level
/Library/Perl/5.8.6
/Library/Perl
/Network/Library/Perl/5.8.6/darwin-thread-multi-2level
/Network/Library/Perl/5.8.6
/Network/Library/Perl
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6/darwin-thread-multi-2level
/System/Library/Perl/Extras/5.8.6
/Library/Perl/5.8.1
.

Re: HTML::Parser is not thread-safe

on 11.01.2007 22:55:07 by gisle

On 1/11/07, Father Chrysostomos wrote:
> I have tried using HTML::Parser in a threaded application. If an
> HTML::Parser object exists when there is more than one thread, it has
> problems destroying the object afterwards (and the program dies).
>
> Here are a few one-liners (and their output) that demonstrate the
> problem:
>
> $ perl -MHTML::Parser -Mthreads -e'(async{new HTML::Parser})->join'
> Bad signature in parser state object at 3767c0.
> Unbalanced string table refcount: (1) for "_hparser_xs_state" during
> global destruction.
> Scalars leaked: 4

I see the same noise here on Linux, so there is definitely a problem
here. I have no idea what the problem is though. I have tried to
ignore knowing much about 'threads'. There should be no inherent
reason for HTML::Parser not to be thread safe. Anybody with threads
know-how that can help?

--
Gisle Aas

Re: HTML::Parser is not thread-safe

on 12.01.2007 12:12:12 by gisle

On 1/11/07, Gisle Aas wrote:
> Anybody with threads know-how that can help?

Bo Lindbergh provided a patch that fixes this problem and
HTML-Parser-3.56 has now been uploaded to CPAN with this fix. Thanks
Bo!

--Gisle

Re: HTML::Parser is not thread-safe

on 12.01.2007 22:22:01 by Sprout

On Jan 12, 2007, at 3:12 AM, Gisle Aas wrote:

> On 1/11/07, Gisle Aas wrote:
>> Anybody with threads know-how that can help?
>
> Bo Lindbergh provided a patch that fixes this problem and
> HTML-Parser-3.56 has now been uploaded to CPAN with this fix. Thanks
> Bo!
>
> --Gisle

That was quick. Thank you!

It seems to work fine now, except that, if a Parser object is created
within a thread *other than* the main thread, perl complains about
leaking scalars when the program exits.

$ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join; END { print "end" }'
end
Scalars leaked: 1

But if the object is created in the main thread, there are no error
messages at all.


Father Chrysostomos


P.S.: I am not trying to put pressure on anyone--this module works
well enough for me as it is. I am simply trying to help by pointing
out bugs.

Re: HTML::Parser is not thread-safe

on 13.01.2007 10:36:40 by blgl

In article ,
sprout@cpan.org (Father Chrysostomos) wrote:
> It seems to work fine now, except that, if a Parser object is created
> within a thread *other than* the main thread, perl complains about
> leaking scalars when the program exits.
>
> $ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join;
> END { print "end" }'
> end
> Scalars leaked: 1

However, perldoc threads says:
> Returning objects from threads does not work.

So don't do what you did in that example. :-)


/Bo Lindbergh

Re: HTML::Parser is not thread-safe

on 13.01.2007 21:34:00 by Sprout

> In article ,
> sprout[at]cpan.org (Father Chrysostomos) wrote:
> > It seems to work fine now, except that, if a Parser object is created
> > within a thread *other than* the main thread, perl complains about
> > leaking scalars when the program exits.
> >
> > $ perl -MHTML::Parser -Mthreads -le'(async{new HTML::Parser})->join;
> > END { print "end" }'
> > end
> > Scalars leaked: 1
>
> However, perldoc threads says:
> > Returning objects from threads does not work.
>
> So don't do what you did in that example. :-)
>
>
> /Bo Lindbergh
>
I'm sorry. You're right. My example was badly written. If I put "; return"
before the closing brace, so that the thread no longer returns the parser
object, it works.

Father Chrysostomos
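
For reference, here is a minimal sketch of the pattern Bo points to. It assumes HTML-Parser 3.56 or later (the release carrying his fix) and a perl built with ithreads. Each worker thread creates and uses its own parser and hands back only a plain string, never the parser object. The `extract_text` helper and the sample HTML are hypothetical and are included only for illustration.

```perl
use strict;
use warnings;
use threads;        # dies at load time unless perl was built with ithreads
use HTML::Parser;   # assumes HTML-Parser 3.56+ (earlier versions misbehave under threads)

# Hypothetical helper: runs inside the worker thread, builds a private
# parser, collects the decoded text, and returns only a plain string.
sub extract_text {
    my ($html) = @_;
    my $text = '';
    my $p = HTML::Parser->new(
        api_version => 3,
        text_h      => [ sub { $text .= shift }, 'dtext' ],
    );
    $p->parse($html);
    $p->eof;        # flush any buffered text
    return $text;   # a plain scalar, not the parser object
}

# Created in scalar context, so join() hands back one scalar.
my $thr    = threads->create(\&extract_text, '<p>Hello, <b>world</b></p>');
my $result = $thr->join;
print "$result\n";  # prints "Hello, world"
```

Because the thread returns a plain scalar, the parser is created and destroyed entirely inside the worker. This has the same effect as the "; return" fix described above.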