Nobody Understands HTTP Caching

On the site I’m supporting, I ran into an issue where page updates weren’t being seen. You’d have to force a cache reload to see the changes. Knowing this is a cache control issue, I decided to dig into it a bit and boy, is there some bad information out there. It seems most people don’t understand how caching works and what the different pieces are for. Here are the top offenders.

No-cache does not mean do not cache at all

I’ve seen a lot of places claim that the Cache-Control directive no-cache means what it says and the response won’t be cached. That’s not what it means. no-cache means a cache may store the response, but it must revalidate it with the origin server before reusing it. Keeping a response out of shared intermediate caches while still letting the browser store it is the job of a different directive, private. That is the one meant to protect sensitive data from being cached on public caches, since your browser’s cache is considered private. If you don’t want the response stored at all, that is what no-store is for.
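To make the distinction concrete, here is a small Python sketch of how a cache might read these directives. The function names parse_cache_control and storage_policy are made up for illustration, and the parsing is deliberately simplified: it ignores quoted values that contain commas and the rarer field-name form of no-cache.

```python
def parse_cache_control(header_value):
    """Split a Cache-Control header into a dict of lowercase directives."""
    directives = {}
    for part in header_value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, value = part.partition("=")
        # Directives without a value (no-cache, no-store, private) map to None.
        directives[name.strip().lower()] = value.strip().strip('"') or None
    return directives


def storage_policy(header_value, shared_cache=False):
    """Return (may_store, must_revalidate_before_reuse) for one cache.

    shared_cache=True models a proxy or CDN; False models the browser cache.
    """
    d = parse_cache_control(header_value)
    if "no-store" in d:
        return (False, False)  # nobody may keep a copy
    if shared_cache and "private" in d:
        return (False, False)  # only the browser may keep a copy
    # no-cache allows storing, but every reuse needs a check with the server.
    return (True, "no-cache" in d)
```

With this sketch, storage_policy("no-cache") gives (True, True), storage_policy("no-store") gives (False, False), and storage_policy("private, max-age=600", shared_cache=True) gives (False, False).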

Everything should have long expire dates and no ETags or Last-Modified headers

This is the thing that caused my trouble. Current SEO practice is to allow long cache times, and as part of that, resources which occasionally need updating, like scripts and CSS files, are given unique names whenever they need to be updated. Here’s the kicker though. If the file referencing that script or CSS file has a long expiry and no ETag or Last-Modified header, the browser won’t update it and thus won’t see the new reference.
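One common way to do the renaming is to put a short hash of the file’s contents into its name, so the name changes exactly when the contents do. This is only a sketch of that idea in Python; busted_name is a hypothetical helper, and the choice of SHA-256 truncated to ten hex characters is an assumption, not something my site’s setup dictates.

```python
import hashlib
from pathlib import PurePosixPath


def busted_name(path, content):
    """Return path with a short content hash inserted before the extension."""
    digest = hashlib.sha256(content).hexdigest()[:10]
    p = PurePosixPath(path)
    # css/site.css becomes css/site.<digest>.css
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

The page then references whatever busted_name returns for the current file. That is exactly why the page itself must not be stuck in the cache: it is the only place the new name appears.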

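Some people seem overzealous about being sure their cache is used as long as possible. That is why ETags are out of favor. People don’t want an updated time stamp or something like that to cause an unneeded cache refresh. That is understandable. The problem comes in when the needed cache refresh doesn’t happen. That’s why the Last-Modified or ETag header is important. These validators let the browser ask the server whether its copy is still current and get a small 304 Not Modified reply when it is. Keep in mind that the browser only asks once its copy has expired or when revalidation is forced (by no-cache, or by a reload), so the pages that reference your renamed files need a short lifetime or no-cache in addition to a validator. With that in place, you can make the updates you need without giving up long expiry dates on everything else.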
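As far as I understand the caching rules, the browser’s decision boils down to something like the following Python sketch. cache_decision is a made-up function, and it simplifies real behavior: a response with no explicit max-age is treated as already stale, whereas real browsers may apply a heuristic lifetime.

```python
def cache_decision(age_seconds, max_age, has_validator, no_cache=False):
    """Decide what a browser cache does with a stored response.

    Returns "use-cached", "revalidate" (conditional request), or "refetch".
    """
    fresh = max_age is not None and age_seconds < max_age
    if fresh and not no_cache:
        # Still within its lifetime: the server is never asked.
        return "use-cached"
    # Stale, or revalidation forced by no-cache: a validator allows a cheap
    # conditional request; without one the whole response is fetched again.
    return "revalidate" if has_validator else "refetch"
```

A page cached for a year with cache_decision(60, 31536000, True) comes back as "use-cached", which is the behavior that bit me. The same page with no_cache=True comes back as "revalidate".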

If you are using the renaming technique for cache-busting, then you only need these headers on your pages themselves. In my case, the pages were all PHP files served via FastCGI. Apache will not set ETag or Last-Modified headers on scripts served this way. The headers needed to be set from within PHP.
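The logic a script has to implement once the web server no longer does it for you looks roughly like this. The sketch is in Python rather than PHP, purely to show the steps; make_validators and respond are hypothetical names, and hashing the body for the ETag is one possible choice among several.

```python
import hashlib
from email.utils import formatdate, parsedate_to_datetime


def make_validators(body, mtime):
    """Build ETag and Last-Modified header values for a response body."""
    etag = '"' + hashlib.sha256(body).hexdigest()[:16] + '"'
    last_modified = formatdate(mtime, usegmt=True)  # HTTP date, always GMT
    return etag, last_modified


def respond(request_headers, body, mtime):
    """Return (status, headers, body); 304 when the client's copy is current."""
    etag, last_modified = make_validators(body, mtime)
    headers = {"ETag": etag, "Last-Modified": last_modified}
    if_none_match = request_headers.get("If-None-Match")
    if_modified_since = request_headers.get("If-Modified-Since")
    if if_none_match is not None:
        # If-None-Match takes precedence over If-Modified-Since.
        tags = [t.strip() for t in if_none_match.split(",")]
        tags = [t[2:] if t.startswith("W/") else t for t in tags]  # weak compare
        if "*" in tags or etag in tags:
            return 304, headers, b""
    elif if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since).timestamp()
        except (TypeError, ValueError):
            since = None  # unparseable date: ignore the header
        if since is not None and int(mtime) <= since:
            return 304, headers, b""
    return 200, headers, body
```

The first request gets a 200 with both validators. When the browser sends one of them back and nothing has changed, the answer is a 304 with an empty body.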

In my case, this is harder than it might be for others. PHP has a setting, auto_prepend_file, that prepends a given PHP file to all other PHP files. That sounds perfect for a case like this. The problem I have is that I’m in a shared environment and that facility is already in use. My site needs its own file prepended, which in turn needs to include the global file already in use. So, how can this be accomplished?

The facility I found is the .user.ini file. This file lets you set most PHP parameters for a given directory. It will only do this for the directory it is in. The documentation suggests that PHP will check for this file in parent directories up to the document root, but that was not the case when I tested. This means the .user.ini file needs to exist in every directory.

Maintenance of this will be a headache, but a quick Python script was able to get it set up for me initially. I suppose a more robust script could be used long term. At least for now, I have a working solution. It doesn’t seem ideal, but it will do the job.
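A setup script along these lines could look like the sketch below. It is not the exact script I ran; write_user_ini and the prepend path you pass in are placeholders, and note that it overwrites any .user.ini that already exists.

```python
import os


def write_user_ini(root, prepend_path):
    """Write a .user.ini setting auto_prepend_file in root and every subdirectory."""
    line = f'auto_prepend_file = "{prepend_path}"\n'
    written = []
    for dirpath, dirnames, _filenames in os.walk(root):
        # Prune hidden directories such as .git so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if not d.startswith(".")]
        target = os.path.join(dirpath, ".user.ini")
        with open(target, "w", encoding="utf-8") as fh:
            fh.write(line)  # replaces an existing .user.ini, if any
        written.append(target)
    return written
```

Calling write_user_ini with the document root and the full path of your own prepend file returns the list of files it wrote, which is handy for checking that no directory was missed.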

So, when it comes to cache, no-cache still caches. no-store will not cache. Long expire dates are good, but best used with Last-Modified or ETags for resources whose names cannot change.

Published by

Dan

Been a programmer, primarily in the internet space, since late 1997.