gz compression rates with custom buffer callback

on 08.01.2008 03:02:40 by thyb0

Hi,

First, thanks to those who'll read me 'til the end. I know my code can
seem a bit messed up, that there may be pretty obvious solutions, and
that my English may be bad; but optimization madness brought me here.
I'm sure you'll understand :).

Basically, what I want is a classic output-buffer callback function
that gzencode()s the buffer. There's a native function for that, I
know, but I want a little more: compression stats (and maybe even more
later).

Here's the idea: (don't scream, I'll explain)

# gz stats (buffer callback)
function gz_cmp($buffer) {
    $buffer = preg_replace_callback(
        '/\$gz\-stats=(\d+)\$/',
        create_function(
            '$status',
            'if( $status[1] ) {
                # ' . ($gz_activated = true) . '
                # ' . ($s_size = strlen($buffer)) . '
                # ' . ($c_size = strlen(gzencode($buffer, 9))) . '

                return ' . round((100 - $c_size / $s_size * 100), 1) .
                ' . \'%\';
            }'
        ),
        $buffer
    );

    if( $gz_activated ) {
        header('Content-Encoding: gzip');
        $buffer = gzencode($buffer);
    }

    return $buffer;
}

gz_cmp() = GZ Compression callback function
$status = array('$gz-stats=1$', '1')
-> $status[1] = GZ Compression availability (1 or 0)
$gz_activated = $status[1], as a flag for the function's ending
$s_size = Plain-text buffer (document) size
$c_size = GZ compressed buffer (document) size
[weird formula] = Compression rate (in %)


Explanations:

Somewhere in the page, $gz-stats=1$ or $gz-stats=0$ is output,
depending on whether GZ compression is used or not (decided through a
constant and a few checks such as GZ module availability and the
browser's Accept-Encoding HTTP header; well, whatever). Of course,
1=Enabled and 0=Disabled.
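For readers who want something concrete, here is a rough sketch of how
such a marker could be produced. The post does not show this part, so
gz_marker_sketch() and both of its parameters are hypothetical names,
and the Accept-Encoding test is deliberately naive (it ignores q-values
such as "gzip;q=0").

```php
<?php
// Hypothetical helper, not from the original post.
// $want_gz stands in for the author's constant; $accept_encoding is the
// raw Accept-Encoding request header (empty string if absent).
function gz_marker_sketch($want_gz, $accept_encoding)
{
    $enabled = $want_gz
        && function_exists('gzencode')                    // zlib extension loaded?
        && stripos($accept_encoding, 'gzip') !== false;   // naive header check

    // Emit the marker that the output-buffer callback will look for.
    return '$gz-stats=' . ($enabled ? 1 : 0) . '$';
}
```

In a page this might be echoed as
gz_marker_sketch(true, isset($_SERVER['HTTP_ACCEPT_ENCODING']) ? $_SERVER['HTTP_ACCEPT_ENCODING'] : '').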

Now the output buffer ends, making way for the callback function:
gz_cmp(). The first thing that comes to mind might be to search for
$gz-stats=x$ and THEN replace it with the stats, actually encode the
doc, send the Content-Encoding header and so on; OR simply return the
buffer unchanged if $gz-stats=0$.

But, this means two regexp searches in the whole doc in case GZ is
activated: one for the check of $gz-stats=x$, one for the replacement.
-> I want one.

So I thought about it. I'm no genius, and that's probably why it isn't
really working as expected, but here's the idea: make the replacement
directly with preg_replace_callback() which, as its name implies,
calls some function back too. The only argument passed to that
callback is an array of matches, with the first value holding the
whole matched pattern and the rest holding each parenthesized group. I
have only one group, which should only be 1 or 0 (GZ activated or
not), at the second position of the array.
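To make the shape of that array concrete, here is a small sketch. It
uses a closure, which assumes PHP 5.3 or later (newer than this post);
the sample string and the variable names are made up for illustration.

```php
<?php
// Hypothetical sample buffer; only the marker format comes from the post.
$sample = 'Page body... compression: $gz-stats=1$';

$seen = array();
$result = preg_replace_callback(
    '/\$gz-stats=(\d+)\$/',
    function ($matches) use (&$seen) {
        // $matches[0] is the whole match, $matches[1] the first group.
        $seen = $matches;
        return '[' . $matches[1] . ']';
    },
    $sample
);

// $seen is now array('$gz-stats=1$', '1')
// $result is now 'Page body... compression: [1]'
```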

I decided to make a lambda callback function for the replacement (or
'anonymous function') with create_function(). This lets me get the
buffer sizes (original and compressed) from outside the replace
callback (as I can NOT pass them as parameters). The other advantage I
thought this system would give me was that I could set an external
flag ($gz_activated) from within the lambda callback in order to
ACTUALLY encode the doc AFTER having replaced $gz-stats=1$ with the
compression stats (which is a simple rate in %, by the way).

Why so many complications? I don't want to use globals. I know you
thought of it ;). For portability only.

This stuff seems to work great. As you can see, I put a few commented
lines in the lambda's code so that the flag is set and the lengths are
computed outside the string. Then I return the stats (which, by the
way, hover around 80%, GZ rocks!) to preg_replace_callback(), which
replaces $gz-stats=1$ with them. Once done, the result is stored back
in $buffer. The flag is set, so I can now send the HTTP header to tell
the browser we're gonna send some encoded stuff, and then actually
encode it.

The $buffer, now modified and encoded, is eventually returned, the
output buffer is flushed and here we go.


\o/. Or not..
Now I try to set $gz-stats=0$. The stats aren't displayed, as expected,
but after sniffing the headers, I found the content was still encoded
in GZ. For whatever reason the main script has set $gz-stats$ to 0, so
we don't want compression here.

The reason?
Apparently, PHP parses all the lambda function code twice. First to
'decode' it (don't forget it's just a string), and then to execute it
properly. Well, it's my guess anyway.
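A small experiment may help narrow this down. It suggests a simpler
explanation than double parsing: the parenthesized assignments are
ordinary PHP expressions that run while the code string is being
concatenated, before any lambda exists or is called. The variable
names below are illustrative only.

```php
<?php
$flag = false;

// Build a code string the same way the post does. Nothing is compiled
// or executed as "lambda code" here; this is plain string concatenation.
$code = 'if ($status[1]) {
    # ' . ($flag = true) . '
    return "on";
}';

// The assignment has already happened, whatever $status[1] will be later.
$flag_before_any_call = $flag;

// The generated code only contains the string form of true: "# 1".
$has_literal = strpos($code, '# 1') !== false;
```

If that is what happens in gz_cmp(), $gz_activated is set to true on
every call, which would match the behaviour seen with $gz-stats=0$.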

As you can imagine, it's impossible (at least I think) to pass custom
parameters to the OB callback, thus I found myself screwed. I'm now
asking you: can you think of any solution to
- get the flag out of the lambda func ONLY when expected, OR
- get the GZ Availability value into the OB callback some other way..

...knowing that I can't stand globals, and that, if I were you, I'd
forget right away about storing the value in some persistent storage
and reloading the page ^^'.
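One possible direction, sketched under assumptions: on PHP 5.3 or
later (newer than this post), a closure can capture the flag by
reference with use (&$var), so it is only set when the marker really
is 1, with no globals and no code-in-a-string. create_function() was
also deprecated in PHP 7.2 and removed in PHP 8.0. gz_cmp_sketch() is
a hypothetical name, the zlib extension is assumed to be loaded, and
the headers_sent() guard is an addition that is not in the original
code.

```php
<?php
// Sketch only, not the author's function.
function gz_cmp_sketch($buffer)
{
    $gz_activated = false;
    $s_size = strlen($buffer);   // plain-text size, measured before replacement

    $buffer = preg_replace_callback(
        '/\$gz-stats=(\d+)\$/',
        function ($status) use (&$gz_activated, $buffer, $s_size) {
            if (!$status[1]) {
                return '';       // marker is 0: show no stats, leave the flag alone
            }
            $gz_activated = true;                      // runs only for marker 1
            $c_size = strlen(gzencode($buffer, 9));    // compressed size
            return round(100 - $c_size / $s_size * 100, 1) . '%';
        },
        $buffer
    );

    // Only compress if the header announcing it can still be sent.
    if ($gz_activated && !headers_sent()) {
        header('Content-Encoding: gzip');
        $buffer = gzencode($buffer, 9);
    }

    return $buffer;
}
```

It would be registered the usual way, with ob_start('gz_cmp_sketch').
The regex still runs once; as in the original, the rate is computed on
the buffer that still contains the marker.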

Is that some kind of challenge, or am I just blind? I think I got into
something that may be beyond my level -.-'

Thanks for everything!

-thib´

PS If you have a totally different solution, I wouldn't mind throwing
all of this away; it always hurts, but I think I've gotten used to it. =P.