Fixing A Strange PHP Gzip Issue

Jason Knight
Published in CodeX
4 min read · Aug 29, 2022


A friend asked me for help figuring out why his rewrite of some very, very old software was giving him such headaches. I don’t normally do stuff for friends, but this seemed simple enough. The problem turned out to be so screwy, I had to share.

The company he works for got hacked, and the reason was painfully obvious: they were still running software that only worked on PHP 4.0. Yes, we’re talking about software well over 20 years old.

Generally he’s able to handle modern PHP coding, so I was really curious what had him hung up. It turned out to be threefold.

  1. An older tool they used to analyze the links on the site’s pages was crashing on his rewrite before it even got as far as parsing the markup. That was odd, since the tool is more modern, uses XMLDocument, and his output markup was valid!
  2. Errors were showing up in his error log but not in the output despite “all” error reporting being enabled.
  3. A number of pages were delivering blank white nothing, zero content in the response.

All three were caused by a string of coding disasters, every one of them related to how gzipping / packing of the HTML output was being handled.

PHP.INI Zlib Is Convenient, Not Reliable

One of the first things he did in the recode was to pull out the ob_start() call with its old-school manual compression method, which dated from before ob_gzhandler was even an option. In its place he just enabled compression in the PHP.INI with:

zlib.output_compression = "1"

This is actually what was making the third-party site analysis software fail. It doesn’t support gzipped markup, and apparently, at least on his hosting, the output compression routine doesn’t check the request headers to see if compression is even supported!

Hurr-Durrz on PHP / zlib’s part.
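For reference, the kind of check I expected to happen can be sketched in a few lines. This is only an illustration: acceptsGzip is a hypothetical helper name, not a PHP built-in, and it makes two simplifying assumptions. A missing header is treated as "no compression", and the "*" wildcard is ignored.

```php
<?php
// Hypothetical helper, not part of PHP or of the original software.
// Returns true when the Accept-Encoding value lists gzip (or x-gzip)
// with a non-zero quality value.
function acceptsGzip(?string $acceptEncoding): bool
{
    if ($acceptEncoding === null || trim($acceptEncoding) === '') {
        return false; // assumption: no header means send plain output
    }
    foreach (explode(',', $acceptEncoding) as $entry) {
        $parts = array_map('trim', explode(';', $entry));
        $coding = strtolower($parts[0]);
        if ($coding !== 'gzip' && $coding !== 'x-gzip') {
            continue;
        }
        $q = 1.0; // quality defaults to 1 when the client omits it
        foreach (array_slice($parts, 1) as $param) {
            if (preg_match('/^q\s*=\s*([0-9.]+)$/i', $param, $m)) {
                $q = (float) $m[1];
            }
        }
        return $q > 0; // "gzip;q=0" is an explicit refusal
    }
    return false;
}
```

In a real request you would feed it $_SERVER['HTTP_ACCEPT_ENCODING'] ?? null. A plain strpos() check is shorter, but it would treat "gzip;q=0" as support.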

Output Buffering

A number of the pages that were crashing were trying to set cookies or headers after output started. If you know anything about PHP — or web dev in general — you know you can’t do that… so why did it work in the old program?

Because it was calling ob_start(), which holds the output in a buffer so headers can still be sent. Technically the PHP.ini zlib should have been handling premature exits and compressing properly, but no. It was acting as if the output was empty on error. Thus the errors in the logs and the blank pages.

A partial fix would have been to put ob_start back, but a script that dies on an error might still terminate without a flush.
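If the buffering part is unfamiliar, here is a minimal sketch of what it does. captureOutput is a hypothetical helper written just for this illustration. It only shows that echoed content sits in the buffer instead of going to the client, which is what keeps header() and setcookie() usable.

```php
<?php
// Hypothetical helper for illustration: runs $page inside an output buffer
// and returns what it echoed instead of sending it to the client.
function captureOutput(callable $page): string
{
    ob_start();
    try {
        $page();
        return (string) ob_get_contents();
    } finally {
        ob_end_clean(); // discard the buffer so nothing is sent twice
    }
}

$html = captureOutput(function (): void {
    echo '<p>early output</p>';
    // The echo above is still held in the buffer, so under a web server a
    // header() or setcookie() call here would not hit "headers already sent".
});
```

Nothing reaches the client until the buffer is flushed, which is why the old program got away with setting headers late.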

The Solution?

Go back to doing it manually, just with a more modern approach. As it’s a single-entry program, it’s easy to hook the compression right at the start. Turn off the php.ini compression, go back to ob_start but with ob_gzhandler as the callback, and use register_shutdown_function to handle setting the header() and the flush.

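// The Accept-Encoding header may be missing entirely, so default to ''.
$accepted = $_SERVER['HTTP_ACCEPT_ENCODING'] ?? '';
// 'x-compress' is left out: ob_gzhandler only produces gzip or deflate output.
foreach (['gzip', 'x-gzip'] as $type) {
    if (strpos($accepted, $type) !== false) {
        define('CONTENT_ENCODING', $type);
        break;
    }
}
// Always buffer, so header() and setcookie() keep working after output starts.
ob_start(defined('CONTENT_ENCODING') ? 'ob_gzhandler' : null);
ob_implicit_flush(false);
// Shutdown functions also run after exit() and after fatal errors.
register_shutdown_function(function() {
    if (defined('CONTENT_ENCODING')) header(
        'Content-Encoding: ' . CONTENT_ENCODING
    );
    ob_end_flush();
});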

Detect if the browser even supports gzip compression. If it does, start the buffer with ob_gzhandler; if not, still buffer so that you can call header() or setcookie() until you’re blue in the face. Turn off implicit flushing so that we buffer all the output, not just one chunk at a time. Register the shutdown function to set the header if appropriate, then call ob_end_flush to send the content output.

No more having to remember to flush at the end, or blindly hoping PHP does it for you.
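One caveat on the "turn off the php.ini compression" step: the PHP manual notes that ob_gzhandler and zlib.output_compression cannot both be used. If you can’t be sure the ini setting really is off on every host, a runtime guard along these lines might help. Treat it as a sketch; iniCompressionIsOn is a hypothetical helper name, and the list of "off" spellings is my assumption about what ini_get may hand back.

```php
<?php
// Hypothetical helper, not a PHP built-in: interprets the raw value that
// ini_get('zlib.output_compression') returns.
function iniCompressionIsOn($raw): bool
{
    if ($raw === false) {
        return false; // directive unknown, e.g. the zlib extension is not loaded
    }
    $value = strtolower(trim((string) $raw));
    // Anything other than an "off" spelling counts as enabled; a number such
    // as 4096 enables compression with that buffer size.
    return !in_array($value, ['', '0', 'off', 'false', 'no', 'none'], true);
}

// Only pick ob_gzhandler when the ini-level compression is not active.
$handler = iniCompressionIsOn(ini_get('zlib.output_compression'))
    ? null
    : 'ob_gzhandler';
```

The resulting $handler could then be combined with the Accept-Encoding check before being passed to ob_start().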

Conclusion

This was a simple fix, and it has let my friend move forward with debugging his rewrite. I’ve used code similar to this almost from the day I started using PHP decades ago, just because being able to call header() and set cookies wherever needed greatly simplifies using the language.

It’s not necessarily pretty, but it gets the job done better than most alternatives.

I’m a little dismayed that for some reason the php.ini setting just blindly sends it zipped without checking the HTTP_ACCEPT_ENCODING value. I don’t know if that’s normal or not, but it’s not good if so.

Anyhow, it’s easy to start going “rawrz I can modernize this!” and to start ripping things out without giving consideration to why the decision was made in the first place. Especially when the documentation says things like “You should use the PHP.INI setting instead.”

I see that far too often: people blindly applying advice that says “when appropriate” to everything. See the crazy hoops people jump through to avoid using tables “because”, or the misinterpretation of STRONG and EM over B and I (all four tags still serve unique and different purposes), and so forth. It really seems like these days everyone is turning “when appropriate” or “preferred” into “ALWAYS”, when that’s not the message.

Hope someone finds this, if not useful, at least interesting.
