zipping large file using Archive::Zip

on 09.11.2007 20:55:10 by mark

I have a 16 GB file that I zipped using Perl's Archive::Zip module.
When I try to unzip this file using WinZip, it shows the "uncompressed
size" = 4294967295.

When I extract this file with WinZip, the program inflates the file
correctly to 16 GB, but before it completes and returns control to the
user, it compares the actual decompressed size with the uncompressed
size stored in the zip. Since those don't match, it automatically
considers the decompressed file invalid and removes it from the
disk.

The root cause of this problem is Perl's ZLIB module writing
incorrect header information to the zip file.

Can someone please tell me why Perl's Archive::Zip module writes
incorrect information in the zip file header? Are there any settings
that can override this behaviour?

Thanks

Re: zipping large file using Archive::Zip

on 10.11.2007 12:59:57 by Christian Winter

Mark wrote:
> I have a 16 GB file that I zipped using Perl's Archive::Zip module.
> When I try to unzip this file using WinZip, it shows the "uncompressed
> size" = 4294967295.
>
> When I extract this file with WinZip, the program inflates the file
> correctly to 16 GB, but before it completes and returns control to the
> user, it compares the actual decompressed size with the uncompressed
> size stored in the zip. Since those don't match, it automatically
> considers the decompressed file invalid and removes it from the
> disk.
>
> The root cause of this problem is Perl's ZLIB module writing
> incorrect header information to the zip file.
>
> Can someone please tell me why Perl's Archive::Zip module writes
> incorrect information in the zip file header? Are there any settings
> that can override this behaviour?

The only way around I know of is to use a different decompression
tool. Archive::Zip makes use of Compress::Zlib, which in turn
calls the native zlib library. zlib, however, hasn't adopted the
somewhat proprietary deflate64 extension which PKWare introduced
in the pkzip sdk to make size values > 2^32 possible. In the standard
zip format, "compressed size" and "uncompressed size" are only
4 bytes, therefore the value 4294967295.
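
The 4-byte limit can be checked with a little arithmetic. The following is only a sketch: the variable names and the helper fits_in_32_bits are made up for illustration, and it assumes a perl that can represent values above 2^32 exactly (integers or floating point both work here).

```perl
use strict;
use warnings;

# Largest value an unsigned 4-byte size field can hold: 2^32 - 1.
my $max32 = 0xFFFF_FFFF;

# 16 GiB expressed in bytes.
my $size = 16 * 1024**3;

# Hypothetical helper: true if a byte count fits in a 4-byte field.
sub fits_in_32_bits {
    my ($n) = @_;
    return $n <= $max32 ? 1 : 0;
}

printf "field maximum: %u (0x%X)\n", $max32, $max32;
printf "16 GiB fits in the field: %s\n",
    fits_in_32_bits($size) ? 'yes' : 'no';
```

The field maximum is the same number WinZip reports as the uncompressed size, which is consistent with the real size simply not fitting into the classic header.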

Unless deflate64 gets incorporated into zlib, I don't see much
chance that Archive::Zip will be able to produce "correct" zips
of this size. So the short solutions would be to either split
up things into separate files, each smaller than 4 GB uncompressed,
or use a different compression (e.g. tar/bzip2) - at least if
dropping WinZip is a no-go.

-Chris

Re: zipping large file using Archive::Zip

on 12.11.2007 14:15:32 by Paul Marquess

From: Christian Winter [mailto:thepoet_nospam@arcor.de]

> Mark wrote:
> > I have a 16 GB file that I zipped using Perl's Archive::Zip module.
> > When I try to unzip this file using WinZip, it shows the "uncompressed
> > size" = 4294967295.
> >
> > When I extract this file with WinZip, the program inflates the file
> > correctly to 16 GB, but before it completes and returns control to the
> > user, it compares the actual decompressed size with the uncompressed
> > size stored in the zip. Since those don't match, it automatically
> > considers the decompressed file invalid and removes it from the
> > disk.
> >
> > The root cause of this problem is Perl's ZLIB module writing
> > incorrect header information to the zip file.
> >
> > Can someone please tell me why Perl's Archive::Zip module writes
> > incorrect information in the zip file header? Are there any settings
> > that can override this behaviour?
>
> The only way around I know of is to use a different decompression
> tool. Archive::Zip makes use of Compress::Zlib, which in turn
> calls the native zlib library. zlib, however, hasn't adopted the
> somewhat proprietary deflate64 extension which PKWare introduced
> in the pkzip sdk to make size values > 2^32 possible. In the standard
> zip format, "compressed size" and "uncompressed size" are only
> 4 bytes, therefore the value 4294967295.
>
> Unless deflate64 gets incorporated into zlib, I don't see much
> chance that Archive::Zip will be able to produce "correct" zips
> of this size. So the short solutions would be to either split
> up things into separate files, each smaller than 4 GB uncompressed,
> or use a different compression (e.g. tar/bzip2) - at least if
> dropping WinZip is a no-go.

That's not correct - you are confusing Deflate64 with Zip64. Here is the
definition of Deflate64 from the Zip specification
(http://www.pkware.com/documents/casestudies/APPNOTE.TXT):


Enhanced Deflating - Method 9
-----------------------------

The Enhanced Deflating algorithm is similar to Deflate but
uses a sliding dictionary of up to 64K. Deflate64(tm) is supported
by the Deflate extractor.

To support files larger than 2^32 bytes, your Zip implementation needs to
support Zip64. That has nothing to do with the underlying zlib implementation.

I don't think that Archive::Zip supports Zip64, but IO::Compress::Zip does.

Paul
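
A minimal sketch of the IO::Compress::Zip route suggested above, assuming an installed version of the module that accepts the Zip64 option. The helper name zip_large_file and the file names in the usage comment are placeholders, and the sketch has not been run against a 16 GB input.

```perl
use strict;
use warnings;
use IO::Compress::Zip qw(zip $ZipError);

# Hypothetical helper: compress one file into a zip archive with
# Zip64 records enabled, so sizes are not limited to 4-byte fields.
sub zip_large_file {
    my ($input, $output) = @_;

    zip($input => $output,
        Name  => $input,   # member name stored inside the archive
        Zip64 => 1,        # request Zip64 size records
    ) or die "zip failed: $ZipError\n";

    return $output;
}

# Example call (placeholder paths):
# zip_large_file('big.dat', 'big.zip');
```

The extraction tool also has to understand Zip64 for the stored sizes to be read correctly, so it is worth checking the result with the unzip program you intend to use.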