calling decoded_content on gzipped content destroys raw content
am 05.12.2004 15:06:56 von anbe
Hi,
I found the new decoded_content method destroying the raw content if Content-Encoding was gzip.
This happens because:
Compress::Zlib::memGunzip
...
The contents of the buffer parameter are
destroyed after calling this function.
I fixed this the following way:
HTTP/Message.pm:
- $content_ref = \Compress::Zlib::memGunzip($$content_ref);
+ $content_ref = \Compress::Zlib::memGunzip(my $buf = $$content_ref);
I didn't check the other decoding functions, so this could happen at other places, too.
Thanks for the decoded_content funktion - this makes using compression a lot easier :-)
Perhaps an option to replace the current raw content could be added, this would also have to change the Content-Encoding and Content-Type/Charset headers.
Please CC: me in your replies as I'm not on this list.
Andreas
Re: calling decoded_content on gzipped content destroys raw content
am 06.12.2004 14:39:53 von gisle
Andreas Beckmann writes:
> I found the new decoded_content method destroying the raw content if
> Content-Encoding was gzip.
>
> This happens because:
>
> Compress::Zlib::memGunzip
> ...
> The contents of the buffer parameter are
> destroyed after calling this function.
>
> I fixed this the following way:
>
> HTTP/Message.pm:
> - $content_ref = \Compress::Zlib::memGunzip($$content_ref);
> + $content_ref = \Compress::Zlib::memGunzip(my $buf = $$content_ref);
>
> I didn't check the other decoding functions, so this could happen at
> other places, too.
Encode::decode() also destroy its argument. I've now applied the patch below.
> Thanks for the decoded_content funktion - this makes using
> compression a lot easier :-)
>
> Perhaps an option to replace the current raw content could be added,
> this would also have to change the Content-Encoding and
> Content-Type/Charset headers.
I can see that might be useful. The 'content' is supposed to be bytes
so the result would have to be encoded UTF-8, while 'decoded_content'
returns decoded UTF-8.
I think it is better to have a 'decode_content' method (a verb) then
for 'decoded_content' to suddenly have a side effect on the message
when given an option.
Regards,
Gisle
Index: lib/HTTP/Message.pm
============================================================ =======
RCS file: /cvsroot/libwww-perl/lwp5/lib/HTTP/Message.pm,v
retrieving revision 1.54
retrieving revision 1.55
diff -u -p -r1.54 -r1.55
--- lib/HTTP/Message.pm 3 Dec 2004 08:35:41 -0000 1.54
+++ lib/HTTP/Message.pm 6 Dec 2004 13:27:20 -0000 1.55
@@ -1,10 +1,10 @@
package HTTP::Message;
-# $Id: Message.pm,v 1.54 2004/12/03 08:35:41 gisle Exp $
+# $Id: Message.pm,v 1.55 2004/12/06 13:27:20 gisle Exp $
use strict;
use vars qw($VERSION $AUTOLOAD);
-$VERSION = sprintf("%d.%02d", q$Revision: 1.54 $ =~ /(\d+)\.(\d+)/);
+$VERSION = sprintf("%d.%02d", q$Revision: 1.55 $ =~ /(\d+)\.(\d+)/);
require HTTP::Headers;
require Carp;
@@ -161,6 +161,7 @@ sub decoded_content
{
my($self, %opt) = @_;
my $content_ref;
+ my $content_ref_iscopy;
eval {
@@ -183,6 +184,12 @@ sub decoded_content
next unless $ce || $ce eq "identity";
if ($ce eq "gzip" || $ce eq "x-gzip") {
require Compress::Zlib;
+ unless ($content_ref_iscopy) {
+ # memGunzip is documented to destroy its buffer argument
+ my $copy = $$content_ref;
+ $content_ref = \$copy;
+ $content_ref_iscopy++;
+ }
$content_ref = \Compress::Zlib::memGunzip($$content_ref);
die "Can't gunzip content" unless defined $$content_ref;
}
@@ -190,11 +197,13 @@ sub decoded_content
require Compress::Bzip2;
$content_ref = Compress::Bzip2::decompress($$content_ref);
die "Can't bunzip content" unless defined $$content_ref;
+ $content_ref_iscopy++;
}
elsif ($ce eq "deflate") {
require Compress::Zlib;
$content_ref = \Compress::Zlib::uncompress($$content_ref);
die "Can't inflate content" unless defined $$content_ref;
+ $content_ref_iscopy++;
}
elsif ($ce eq "compress" || $ce eq "x-compress") {
die "Can't uncompress content";
@@ -202,10 +211,12 @@ sub decoded_content
elsif ($ce eq "base64") { # not really C-T-E, but should be harmless
require MIME::Base64;
$content_ref = \MIME::Base64::decode($$content_ref);
+ $content_ref_iscopy++;
}
elsif ($ce eq "quoted-printable") { # not really C-T-E, but should be harmless
require MIME::QuotedPrint;
$content_ref = \MIME::QuotedPrint::decode($$content_ref);
+ $content_ref_iscopy++;
}
else {
die "Don't know how to decode Content-Encoding '$ce'";
@@ -218,7 +229,16 @@ sub decoded_content
$charset = lc($charset);
if ($charset ne "none") {
require Encode;
- $content_ref = \Encode::decode($charset, $$content_ref, Encode::FB_CROAK());
+ if (do{my $v = $Encode::VERSION; $v =~ s/_//g; $v} < 2.0901 &&
+ !$content_ref_iscopy)
+ {
+ # LEAVE_SRC did not work before Encode-2.0901
+ my $copy = $$content_ref;
+ $content_ref = \$copy;
+ $content_ref_iscopy++;
+ }
+ $content_ref = \Encode::decode($charset, $$content_ref,
+ Encode::FB_CROAK() | Encode::LEAVE_SRC());
}
}
};