Emergency! Performance downloading big files
on 01.12.2009 23:48:45 by Brian Dunning
This is a holiday-crunch emergency.
I'm dealing with a client from whom we need to download many large PDF docs 24x7, several thousand per hour, all between a few hundred K and about 50 MB. Their security process requires the files to be downloaded via https using a big long URL with lots of credential parameters.
Here's how I'm doing it. This is on Windows, a quad Xeon with 16GB RAM:
$ctx = stream_context_create(array('http' => array('timeout' => 1200)));
$contents = file_get_contents($full_url, 0, $ctx);
$fp =3D fopen('D:\\DocShare\\'.$filename, "w");
$bytes_written =3D fwrite($fp, $contents);
fclose($fp);
It's WAY TOO SLOW. I can paste the URL into a browser and download even the largest files quite quickly, but the PHP method bottlenecks and cannot keep up.
Is there a SUBSTANTIALLY faster way to download and save these files? Keep in mind the client's requirements cannot be changed. Thanks for any suggestions.
Re: Emergency! Performance downloading big files
on 01.12.2009 23:51:41 by Brian Dunning
Oops, it's several hundred per hour, several thousand per day. Sorry for the accidental superlative.
> I'm dealing with a client from whom we need to download many large PDF docs 24x7, several thousand per hour, all between a few hundred K and about 50 MB.
Re: Emergency! Performance downloading big files
on 01.12.2009 23:55:35 by Ashley Sheridan
On Tue, 2009-12-01 at 14:51 -0800, Brian Dunning wrote:
> Oops, it's several hundred per hour, several thousand per day. Sorry for the accidental superlative.
>
> > I'm dealing with a client from whom we need to download many large PDF docs 24x7, several thousand per hour, all between a few hundred K and about 50 MB.
Why not put the files behind a secured directory? Apache will handle
that, so PHP need not be involved. Once logged in, they can download
loads of files without being asked for credentials again.
Thanks,
Ash
http://www.ashleysheridan.co.uk
Re: Emergency! Performance downloading big files
on 02.12.2009 00:04:00 by Mari Masuda
On Dec 1, 2009, at 2:48 PM, Brian Dunning wrote:
> This is a holiday-crunch emergency.
[snip]
> Is there a SUBSTANTIALLY faster way to download and save these files? Keep in mind the client's requirements cannot be changed. Thanks for any suggestions.
Could you just put the URLs of the files into something that you can iterate over and use curl to get the PDFs? I don't know if this would be any faster than your method but it was just an idea.
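Something roughly like this, maybe (untested sketch; the URL list, filenames and target folder are just placeholders):

// Rough sketch: let curl stream each response straight to a file on disk
// instead of buffering the whole document in a PHP string.
// $urls maps local filenames to the full credentialed URLs (placeholders).
$urls = array(
    'example1.pdf' => 'https://server.com?filename=example1.pdf&user=xxx&pass=xxx',
    'example2.pdf' => 'https://server.com?filename=example2.pdf&user=xxx&pass=xxx',
);

foreach ($urls as $filename => $url) {
    $fp = fopen('D:\\DocShare\\' . $filename, 'wb');
    if ($fp === false) {
        continue; // could not open the target file
    }

    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);      // write the body directly to the file handle
    curl_setopt($ch, CURLOPT_TIMEOUT, 1200);  // same generous timeout as the stream context
    curl_exec($ch);

    if (curl_errno($ch)) {
        error_log('Download failed for ' . $filename . ': ' . curl_error($ch));
    }

    curl_close($ch);
    fclose($fp);
}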
Mari
Re: Emergency! Performance downloading big files
on 02.12.2009 00:21:04 by James McLean
On Wed, Dec 2, 2009 at 9:18 AM, Brian Dunning wrote:
> This is a holiday-crunch emergency.
Aren't they all! :)
> It's WAY TOO SLOW. I can paste the URL into a browser and download even the largest files quite quickly, but the PHP method bottlenecks and cannot keep up.
Are you certain you have the bandwidth to support it as well?
The suggestion from other users of off-loading the PDF downloading to
Apache (or another webserver) is a good idea also.
Cheers
Re: Emergency! Performance downloading big files
on 02.12.2009 00:51:15 by mike503
On Tue, Dec 1, 2009 at 3:21 PM, James McLean wrote:
> The suggestion from other users of off-loading the PDF downloading to
> Apache (or another webserver) is a good idea also.
^
I never allow PHP to be [ab]used and kept open to spoonfeed clients
with fopen/readfile/etc.
In nginx:

header("X-Accel-Redirect: /uri/path/to/file")

In lighttpd (if I recall):

header("X-Sendfile: /full/path/to/file")
or now:
header("X-Sendfile2: /full/path/to/file byterange")

In Apache there is a "mod_sendfile" module, I think; never used it.

Be sure to set the appropriate other headers too. nginx will fill in the Content-Length for you, but I believe you still need to set the Content-Type yourself, things like that.
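e.g. something along these lines on the PHP side (rough sketch; the paths and filename are placeholders):

// Sketch: tell the webserver to serve the file itself; PHP only sets headers.
// Paths and filename here are placeholders.
header('Content-Type: application/pdf');
header('Content-Disposition: attachment; filename="doc.pdf"');

// nginx (the URI must map to an internal location):
header('X-Accel-Redirect: /protected/doc.pdf');

// lighttpd / Apache with a sendfile module would use the filesystem path instead:
// header('X-Sendfile: /full/path/to/doc.pdf');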
Re: Emergency! Performance downloading big files
on 02.12.2009 00:56:38 by LinuxManMikeC
On Tue, Dec 1, 2009 at 3:48 PM, Brian Dunning wrote:
>
> This is a holiday-crunch emergency.
>
> [snip]
>
> $ctx = stream_context_create(array('http' => array('timeout' => 1200)));
> $contents = file_get_contents($full_url, 0, $ctx);
> $fp = fopen('D:\\DocShare\\'.$filename, "w");
> $bytes_written = fwrite($fp, $contents);
> fclose($fp);
>
> It's WAY TOO SLOW. I can paste the URL into a browser and download even the largest files quite quickly, but the PHP method bottlenecks and cannot keep up.
>
> [snip]
Well, one problem with your code is file_get_contents. It's downloading
the entire file, putting it in a variable, then returning that variable.
Then you write this huge variable (as much as 50MB from what you said)
to a file. If you think about what might be going on underneath that
seemingly simple function, there could be millions of memory
reallocations occurring to accommodate the growing variable. I would
instead use fopen and read a set number of bytes into a buffer variable
(taking available bandwidth into consideration) and write that to the
file. That said, I would never write this kind of program in PHP. As
others have suggested, use curl or wget; you can interface with it
through PHP to initiate and control the process if you need to.
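If you do stay with PHP streams, something along these lines, for example (untested sketch; the 1MB chunk size is arbitrary and error handling is omitted):

// Sketch: read the https stream a fixed-size chunk at a time and append
// each chunk to the local file, so the whole 50MB never sits in memory.
$ctx = stream_context_create(array('http' => array('timeout' => 1200)));

$in  = fopen($full_url, 'rb', false, $ctx);
$out = fopen('D:\\DocShare\\' . $filename, 'wb');

if ($in && $out) {
    while (!feof($in)) {
        $chunk = fread($in, 1048576); // 1MB per read
        if ($chunk === false) {
            break;
        }
        fwrite($out, $chunk);
    }
}

if ($in) {
    fclose($in);
}
if ($out) {
    fclose($out);
}

(stream_copy_to_stream($in, $out) would do essentially the same loop internally.)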
Re: Emergency! Performance downloading big files
on 02.12.2009 06:03:03 by Brian Dunning
Can someone explain how this would work? It's a Windows web server running IIS and the files are saved to a drive that is outside the web root. PHP is grabbing each filename from a MySQL database, along with the URL and credentials for it, and ends up with a URL something like this:

https://server.com?filename=filename.pdf&user=xxx&pass=xxx&something=xxx
On Dec 1, 2009, at 2:55 PM, Ashley Sheridan wrote:
> Why not put the files behind a secured directory? Apache will handle
> that, so PHP need not be involved. Once logged in, they can download
> loads of files without being asked for credentials again.
Re: Emergency! Performance downloading big files
on 02.12.2009 06:32:31 by Nathan Nobbe
On Tue, Dec 1, 2009 at 4:56 PM, LinuxManMikeC wrote:
> On Tue, Dec 1, 2009 at 3:48 PM, Brian Dunning wrote:
> > [snip]
>
> Well, one problem with your code is file_get_contents. It's downloading
> the entire file, putting it in a variable, then returning that variable.
> [snip]
> I would instead use fopen and read a set number of bytes into a buffer
> variable (taking available bandwidth into consideration) and write that
> to the file. That said, I would never write this kind of program in PHP.
> As others have suggested, use curl or wget; you can interface with it
> through PHP to initiate and control the process if you need to.
Agreed. Ideally a memory buffer size would be defined and, as it filled, it
would periodically be flushed to disk... thinks back to C programming in
college.

In this day and age, I'd just give the curl option CURLOPT_FILE a shot, as
it most likely implements said logic already.

Depending on the upstream bandwidth your client has and your download
bandwidth, you may also see greater throughput by downloading multiple files
in parallel, aka curl_multi_init() ;)
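e.g. something like this (very rough, untested sketch; $jobs and the paths are made-up placeholders, and error handling is omitted):

// Sketch: run a handful of transfers in parallel with curl_multi,
// each one streaming straight to disk through CURLOPT_FILE.
$jobs = array(
    'https://server.com?filename=a.pdf&user=xxx&pass=xxx' => 'D:\\DocShare\\a.pdf',
    'https://server.com?filename=b.pdf&user=xxx&pass=xxx' => 'D:\\DocShare\\b.pdf',
);

$mh = curl_multi_init();
$handles = array();
$files = array();

foreach ($jobs as $url => $path) {
    $fp = fopen($path, 'wb');
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FILE, $fp);
    curl_setopt($ch, CURLOPT_TIMEOUT, 1200);
    curl_multi_add_handle($mh, $ch);
    $handles[] = $ch;
    $files[] = $fp;
}

// pump the transfers until everything has finished
$running = null;
do {
    curl_multi_exec($mh, $running);
    curl_multi_select($mh); // wait for activity instead of busy-looping
} while ($running > 0);

foreach ($handles as $ch) {
    curl_multi_remove_handle($mh, $ch);
    curl_close($ch);
}
foreach ($files as $fp) {
    fclose($fp);
}
curl_multi_close($mh);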
-nathan
Re: Emergency! Performance downloading big files
on 02.12.2009 10:19:03 by Kim Madsen
Brian Dunning wrote on 2009-12-01 23:48:
> This is a holiday-crunch emergency.
> [snip]
> Is there a SUBSTANTIALLY faster way to download and save these files? Keep in mind the client's requirements cannot be changed. Thanks for any suggestions.
try readfile()
--
Kind regards
Kim Emax - masterminds.dk
Re: Emergency! Performance downloading big files
on 02.12.2009 10:42:04 by Colin Guthrie
'Twas brillig, and Michael Shadle at 01/12/09 23:51 did gyre and gimble:
> On Tue, Dec 1, 2009 at 3:21 PM, James McLean wrote:
>
>> The suggestion from other users of off-loading the PDF downloading to
>> Apache (or another webserver) is a good idea also.
>
> ^
>
> I never allow PHP to be [ab]used and kept open to spoonfeed clients
> with fopen/readfile/etc.
I think there has been some confusion... The OP wanted a way to
*download* the files *from* somewhere, not dish them up to his clients.
I think some of the replies were assuming he wanted to have a PHP script
as a guardian to protect content from unauthorised users, but that is not
what he actually said!
> In Apache there is a "mod_sendfile" module, I think; never used it.
The above said, I didn't know about this module and it looks rather
useful, so thanks for pointing it out :D
Here is the first Google result I found on this issue which explains it
a bit.
http://codeutopia.net/blog/2009/03/06/sending-files-better-apache-mod_xsendfile-and-php/
Col
--
Colin Guthrie
gmane(at)colin.guthr.ie
http://colin.guthr.ie/
Day Job:
Tribalogic Limited [http://www.tribalogic.net/]
Open Source:
Mandriva Linux Contributor [http://www.mandriva.com/]
PulseAudio Hacker [http://www.pulseaudio.org/]
Trac Hacker [http://trac.edgewall.org/]
Re: Re: Emergency! Performance downloading big files
on 02.12.2009 10:59:54 by mike503
Ah, I didn't pay attention to the first part. Just gave my typical
"don't spoonfeed bytes from PHP" rant :)
Sent from my iPhone
On Dec 2, 2009, at 1:42 AM, Colin Guthrie wrote:
> 'Twas brillig, and Michael Shadle at 01/12/09 23:51 did gyre and gimble:
>> I never allow PHP to be [ab]used and kept open to spoonfeed clients
>> with fopen/readfile/etc.
>
> I think there has been some confusion... The OP wanted a way to
> *download* the files *from* somewhere, not dish them up to his clients.
>
> I think some of the replies were assuming he wanted to have a PHP script
> as a guardian to protect content from unauthorised users, but that is not
> what he actually said!
>
> [snip]
Re: Emergency! Performance downloading big files
on 02.12.2009 18:18:42 by daniel danon
Try using CURL - with that you can download many links simultaneously!
On Wed, Dec 2, 2009 at 12:48 AM, Brian Dunning wrote:
> This is a holiday-crunch emergency.
> [snip]
> Is there a SUBSTANTIALLY faster way to download and save these files? Keep
> in mind the client's requirements cannot be changed. Thanks for any
> suggestions.
--
Use ROT26 for best security