Nobody Understands HTTP Caching

On the site I’m supporting, I ran into an issue where page updates weren’t being seen; you’d have to force a cache reload to see the changes. Knowing this was a cache-control issue, I decided to dig into it a bit, and boy, is there some bad information out there. It seems most people don’t understand how caching works or what the different pieces are for. Here are the top offenders.

No-cache does not mean do not cache at all

I’ve seen a lot of places make the claim that the cache control header no-cache means what it says and the response won’t be cached. Well, that’s not what it means in reality. no-cache means a cache may store the response, but must revalidate it with the origin server before reusing it. It is the private directive that forbids intermediate, shared caches from storing a response; that is what protects sensitive data from being cached on public caches, while your browser’s own cache, being private, is still allowed to store it. If indeed you don’t want the response stored at all, that is what no-store is for.
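As a quick cheat sheet, here is how the relevant directives break down per the HTTP caching spec (RFC 7234); the comments are my own summaries:

```http
Cache-Control: no-cache   # may be stored anywhere, but must be
                          # revalidated with the origin before reuse
Cache-Control: private    # only the browser's private cache may store it
Cache-Control: no-store   # must not be written to any cache at all
```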

Everything should have long expire dates and no ETags or Last-Modified headers

This is the thing that caused my trouble. Current SEO practice is to allow long cache times, and as part of that, resources which occasionally need updating, like scripts and CSS files, are given unique names whenever they change. Here’s the kicker, though: if the page referencing that script or CSS file has a long expire date and no ETag or Last-Modified header, the browser won’t update it and thus won’t see the new reference.

Some people seem overzealous about making sure their cache is used as long as possible. That is why ETags have fallen out of favor: people don’t want an updated timestamp or the like to cause an unneeded cache refresh. That is understandable. The problem comes when a needed cache refresh doesn’t happen. That’s why the Last-Modified or ETag header is important. These validators are there to invalidate a currently unexpired cache entry; they let you make the updates you need between the long expire dates on your cache control.
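To make the revalidation step concrete, here is a minimal sketch (not the site’s actual code) of the decision a server makes when a browser sends a conditional request carrying an If-None-Match header:

```python
# Decide whether a conditional request can be answered with 304.
# If the client's If-None-Match matches the resource's current ETag,
# the cached copy is still valid and no body needs to be resent.
def conditional_status(request_headers, current_etag):
    if request_headers.get("If-None-Match") == current_etag:
        return 304  # Not Modified: keep using the cached copy
    return 200      # resource changed: send the new body
```

The same logic applies to Last-Modified with the If-Modified-Since header; either validator lets a long-lived cache entry be refreshed the moment the resource actually changes.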

If you are using the renaming technique for cache-busting, then you only need these headers on the pages themselves. In my case, the pages were all PHP files served via FastCGI, and Apache will not set ETag or Last-Modified headers on scripts served this way. The headers had to be set from within PHP.
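The values themselves are easy to derive from file metadata. Here is the idea sketched in Python rather than PHP; the weak-ETag scheme built from mtime and size is just an illustrative choice, any stable fingerprint of the content works:

```python
import os
from email.utils import formatdate

# Build Last-Modified and ETag header values from a file's metadata.
def validators_for(path):
    st = os.stat(path)
    return {
        # RFC 7231 HTTP-date, e.g. "Tue, 15 Nov 1994 08:12:31 GMT"
        "Last-Modified": formatdate(st.st_mtime, usegmt=True),
        # weak ETag derived from mtime and size (illustrative scheme)
        "ETag": 'W/"%x-%x"' % (int(st.st_mtime), st.st_size),
    }
```

In PHP the equivalent values would be emitted with header() calls before any output is sent.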

In my case, this was trickier than it might be for others. PHP has a mechanism, auto_prepend_file, to prepend a given PHP file to all other PHP files. That sounds perfect for a case like this. The problem is that I’m in a shared environment and that facility is already in use. My site needs its own file prepended, which in turn needs to include the global file already in use. So, how can this be accomplished?

The facility I found is the .user.ini file. This file lets you set most PHP parameters, but only for the directory it is in. The documentation suggests that PHP will check for this file in parent directories up to the document root, but that was not the case when I tested it. This means the .user.ini file needs to exist in every directory.
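The file itself is a one-liner. A hypothetical example (the path is illustrative; auto_prepend_file is a per-directory setting, so .user.ini can carry it):

```ini
; .user.ini — placed in each directory that serves PHP
auto_prepend_file = /home/example/public_html/site_prepend.php
```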

Maintenance of this will be a headache, but a quick Python script was able to get it set up for me initially. I suppose a more robust script could be used long term. At least for now, I have a working solution. It doesn’t seem ideal, but it will do the job.
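The script I mean is nothing fancy. A sketch of the idea (the paths are hypothetical): walk the document root and drop a copy of the master .user.ini into every subdirectory.

```python
import os
import shutil

# Copy the master .user.ini into every subdirectory of the document
# root, so each directory picks up the auto_prepend_file setting.
# DOCROOT is a hypothetical path; adjust for your own layout.
DOCROOT = "/home/example/public_html"

def propagate_user_ini(docroot=DOCROOT, source=None):
    if source is None:
        source = os.path.join(docroot, ".user.ini")
    for dirpath, dirnames, filenames in os.walk(docroot):
        target = os.path.join(dirpath, ".user.ini")
        if target != source:
            shutil.copyfile(source, target)
```

A longer-term version would also handle directories added later, perhaps run from cron.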

So, when it comes to caching: no-cache still caches (it just forces revalidation), while no-store does not cache at all. Long expire dates are good, but they are best paired with Last-Modified or ETag headers for resources whose names cannot change.

Selenium Rocks!

I’ve recently been learning Python, and it has been a pleasure. Not since I first made a computer do what I told it has it been this much fun to make things. I’ve been playing with lots of different things in Python, mostly centered around automation, and that is what led me to Selenium.

Selenium is a technology for programmatically controlling a web browser. It is useful for testing web sites, and that is exactly how I intend to use it. Right now I’m trying to understand how to structure the tests themselves into a cohesive suite. My Python learning so far has been rather loose, so now that I want to build a real tool, I want better structure.

Once I have the test suite structure understood, the way to go is the page object model. I’ve seen some developers disparage this pattern as bloat, but I see things differently. There is real benefit in putting page-specific knowledge in a page-specific class: the page doesn’t need to know what a valid test is, it just needs to know what it can do. That is exactly what page objects provide.
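A minimal page-object sketch, with a hypothetical login page and made-up element ids. The class owns everything the tests need to know about the page; driver is any object exposing Selenium’s find_element(by, value), so a real WebDriver drops straight in:

```python
# Page object: owns the selectors and actions for one page, so the
# tests themselves never touch raw element lookups.
class LoginPage:
    def __init__(self, driver):
        self.driver = driver  # a Selenium WebDriver (or compatible stub)

    def log_in(self, user, password):
        self.driver.find_element("id", "username").send_keys(user)
        self.driver.find_element("id", "password").send_keys(password)
        self.driver.find_element("id", "submit").click()
```

A test then reads as intent, e.g. LoginPage(driver).log_in("alice", "secret"), with no selectors in sight.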

My goal is to create a test that will verify a web site after a large processing operation. The site I am supporting uses a decoupled CMS which occasionally requires a full re-publish of all the pages on the site. This process is to be automated, and thus I want an automated test to verify the process has not damaged the site, emailing me the results. Selenium and Python are well suited to this task, especially because the supplied automation code for the publish process is already in Python.

To date, I’ve written many individual tests, including verifying that form submissions function and crawling a site to gather links. I used Selenium for the crawler instead of the requests library because the site uses some Angular components whose content only exists after JavaScript runs, and I need to be able to crawl everything. Selenium is quite the toolkit for this job.

Learning AngularJS

I’ve been spending some time lately looking into Angular, a JavaScript application framework. It is designed to take the code you would use to make an interactive web page and structure it more formally. This lets you build reusable components, and it takes care of some of the update plumbing for you.

When I first started looking at it, I didn’t understand the purpose. It seemed much too complicated for what it did for you. Now I’ve come to a better understanding of its purpose. It allows you to move portions of your application off the server and onto the client. It does so in a way that forces your code to be more thoughtfully structured. Now that I understand it better, I intend to use it.

Angular 2 was recently released into the world. This has complicated my learning a bit, but I now feel I have a better idea of how to go about it. I would suggest learning Angular 1 first, at least for the time being: it is simpler to get going with and will help you absorb Angular’s concepts. Once you have a handle on Angular 1, then look at Angular 2. For Angular 1 developers, I think Angular 2 will not be too hard a transition; for people new to Angular altogether, I think its learning curve is currently too steep.

Learning the Facebook PHP API, What they don’t tell you

I finally had a use for a Facebook app, so I decided to learn how to code one. My education started with two related video lessons, one on the PHP API and one on the JavaScript API. I did the PHP one first, then the JavaScript one; they should be done in the other order, as the JavaScript one contains useful background information. In the end, the PHP course is quite out of date, since the current SDK version has made much of its information obsolete. If you have the access and the time, I’d still go through it, because the material on the Graph API structure is still quite relevant; just don’t expect to be able to follow along with the code samples.

The new API has quite a different structure from the old one. It is written in an object-oriented fashion and has been greatly expanded and refined. Security has changed quite a bit as well. All this is fine and good, and there are plenty of getting-started tutorials around, but unfortunately not all of the documentation is easily accessible, and many things are undocumented. In this article I will document the issues I ran into and how I got around them.

If you aren’t using Composer, read the Github page

Facebook recommends using Composer to download and install the SDK and to generate an autoloader script that handles inclusion of the necessary files in your PHP scripts. I do not use Composer, so I had to follow the manual steps. These steps are shown on the Github site, not on Facebook’s getting-started page. So, the first thing you need to know: if you aren’t using Composer, go to this link on Github to see how to include the files.

The short version: once you download the SDK, put the whole directory on your server, not just the src/Facebook directory, because there is an autoloader script you will want. Then add these two lines to your PHP script:

define('FACEBOOK_SDK_V4_SRC_DIR', 'path/to/sdk');
require __DIR__ . '/path/to/sdk/autoload.php';

Now, nowhere did I find how these paths are to be defined. The second one, by virtue of the use of __DIR__, is a fully qualified path to the autoload file in the SDK; that can be reasoned out easily enough. The first one, however, is not as well defined. I found that for it you can use either a fully qualified path, or a relative path if you leave off the leading /. For example, if your files are in /home/myproject/public_html/facebook_app, and the SDK you downloaded was the same one I did and you left the directory named the same (facebook-php-sdk-v4-4.0-dev), then the lines can look like this:

define('FACEBOOK_SDK_V4_SRC_DIR', '/home/myproject/public_html/facebook_app/facebook-php-sdk-v4-4.0-dev/src/Facebook');
require __DIR__ . '/facebook-php-sdk-v4-4.0-dev/autoload.php';

or, using a relative path:


define('FACEBOOK_SDK_V4_SRC_DIR', 'facebook-php-sdk-v4-4.0-dev/src/Facebook');
require __DIR__ . '/facebook-php-sdk-v4-4.0-dev/autoload.php';

Facebook needs more than 128M

Once the path to the autoloader was sorted out, I got memory errors from PHP. Nowhere could I find a guideline on how much memory the API requires. I realize memory usage will be specific to whatever your application is doing, but a minimum needed for the example exercises would be nice to know. In my case, setting memory_limit to 256M worked.

You will need to set a timezone

While you are in php.ini altering the memory_limit, set date.timezone to something useful to you; I got an error when trying to get a session until this was done. The list of supported time zones in the PHP manual may come in handy.
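Together, the two php.ini changes that got the examples running for me (the timezone value here is just an example; pick your own):

```ini
memory_limit = 256M
date.timezone = America/New_York
```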

Sessions come from different places when your app is in a canvas vs direct from your site

Something to note as you develop is that Facebook treats sessions differently when your app is viewed within a Facebook canvas. In many places it is noted that a Facebook app is just a page on your server shown through the Facebook wrapper. This is mostly still true, but if you need a session to make actual requests, the process is different in the canvas context than when the page is served directly from your site.

The easy way to see if your app is being viewed within a canvas is to check the request for a signed_request parameter; Facebook sends that parameter when you are within a Facebook canvas.

FacebookSession documentation does exist

Speaking of the Facebook session object: official documentation for it does exist, even though it isn’t linked anywhere on the Facebook developer site. I don’t know why; in most places the various objects are linked to their documentation pages as you’d expect, but the FacebookSession object is not linked, just colored like a link. In any case, the documentation exists, and here it is.

Ok, so that’s all I have for now. If I find any other useful nuggets I’ll post them here on this site. Happy coding!

WP-United comes back to life!

It’s been a bit over a month that I’ve been reading about the resurrection of WP-United, and the day I’d been waiting for has come: a release date has been set for the newest version. It will be released on the same day as WordPress 3.5, two weeks from now. Jhong is asking people to install the beta and give feedback, so if you have a site where you can oblige, please do so and help us all out!

Light Table: The Future of IDEs Please

Light Table: A Glimpse at Programming’s Future

It appears I wasn’t the only fan of Bret Victor’s talk and the possibilities of the tools he described. Chris Granger took those notions and ran with them to create an IDE called Light Table. It doesn’t look like the IDE is built specifically for Clojure, but that was his language of choice. Light Table is an interactive drafting table for the programmer; that’s the best way to describe it. It gets out of the way and makes documentation and code easier to see, read, and find. It has the dynamic execution Bret described, encouraging exploration and helping find bugs before they cause problems. This tool looks amazing! It’s a first step toward what should be a new breed of programming editor.

For me, this would mean moving from the simple code editor I currently use to an IDE. When I worked in Java more, I used Eclipse, but when I started working mostly in PHP and other web languages, I migrated back to a text editor with some nice features. My editor of choice is currently jEdit. It does what I need: it has an FTP plugin so I can edit directly on my servers when I need to, nice syntax highlighting, and a pretty good diff tool. Also, it runs the same on both my work Mac and my personal Windows 7 notebook. Works for me. Still, I’d definitely switch to Light Table if it supported PHP, Perl, and JavaScript.

wp_remote_post and Cookies

Learning about wp_remote_post

In the beginning I wasn’t using wp_remote_get or wp_remote_post. In my Piwigo Embed plugin, I was doing HTTP requests to get the gallery output to place inside a WordPress page. The intent was to be able to skin Piwigo independently from WordPress: I wanted the Piwigo content within the WordPress theme without having to recreate the look specifically for Piwigo. I was doing the requests using file_get_contents for GET requests and fopen for POST requests, and it worked well for my purpose. I took the output, manipulated the URLs in links and images, and then sent the contents to a page using a shortcode.

This solution worked fine for me. I had control of the execution environment, so I didn’t have to worry about a configuration that disallowed fopen on URLs or anything like that. Then I saw an article about the WordPress wp_remote_get, wp_remote_post, and wp_remote_request functions. This seemed a much better solution: I prefer built-in tools when I can use them, and these functions take care of execution environment differences and let me treat GET and POST requests more similarly. I started modifying my code.

The Problem

I ran into trouble very quickly. Piwigo needed a cookie set in order to access the admin functions. I tried setting the cookies as an associative array of cookie_name => cookie_value and sending that with my request. This failed. I then tried a simple array of name then value, and this failed too. I kept getting the error “Fatal error: Call to a member function getHeaderValue() on a non-object” and could not find a solution anywhere.

The Solution

The documentation on this subject in the WordPress Codex is poor. It mentions that cookies are sent in the arguments array of the request, but never specifies the form those cookies must take, only that they need to be in an array. Then I stumbled across this link and the answer was clear: the cookies need to be set as objects of type WP_Http_Cookie. There is no mention of WP_Http_Cookie in the descriptions of wp_remote_get, wp_remote_post, or wp_remote_request, nor did I find it referenced in the articles I found online about these functions. It is because of the obscurity of this information that I write this post; I hope others will be spared the issues I ran into.

So, the solution to the problem: if you need to send cookies with your wp_remote_get, wp_remote_post, or wp_remote_request call, create the cookies as WP_Http_Cookie objects, add them to an array, and pass that array in the arguments under the ‘cookies’ key. Once I changed my code to use WP_Http_Cookie objects, it all worked fine. Hope this helps, and happy coding!

Mobile Safari Centering Content Problem

My Problem with Mobile Safari Centering Content

I ran into an issue with one of my sites and had difficulty finding the solution spelled out anywhere, so I figured I’d write about it here. I could not get Mobile Safari to center content correctly. The page worked fine in Safari, IE, and Firefox, in anything I tried on Mac or PC, but in Mobile Safari the centering failed.

I decided to experiment to see if I could replicate the problem, and I very easily did. I found that when some content was wider than 960px, I could not center other content that was narrower. In my case, I had a wide top image and a 960px content column below it. In Mobile Safari, the content below that wide top image would align left. It did so whether I centered using auto left and right margins, or using the left: 50% plus a negative left margin of half the width technique. Both of these worked everywhere else.

A ‘duh’ moment

Earlier in my development of this page, I had set the meta viewport to 960px; I think at the time the top portion of the page was going to be that width. It appears that may have been partially to blame. The solution is to set the meta viewport to the width of the widest content. Once that was done, centering worked exactly as expected. From my reading, I believe the tipping point is actually 980px, but I have not verified that. I should mention that my simple recreation of the problem did not have the viewport set at all, so a viewport set too low was not strictly the issue; it simply needs to be set correctly.

So, the lesson: if your site is wider than 980px, set the meta viewport to that width. It’s probably best to always set the viewport to the proper width, but if your site is wide, it’s essential.
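For a page whose widest element is, say, 1200px (the number is purely illustrative; use your own widest content), the fix is a single tag in the page head:

```html
<meta name="viewport" content="width=1200">
```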

State of the Compost Pile March 2012

Things have certainly been keeping me on my toes lately. My sites have had two different hacks aimed at them in the past couple of weeks, and the sites I’m working on at my job are finally nearing launch. I’ve been doing lots of reading lately and should really document more of what I’ve found here. Blogging is pretty low on my priority list at the moment, which is unfortunate, because I do think I could help some people if time permitted.

I have done no additional investigation into WP-United in months; I will look at it again once things calm down a bit. I believe I have found the answer to my remaining problem with the Piwigo embed: I had to solve a similar problem in a gallery package called EmAlbum and found the answer there. I just need to port it back to Piwigo, which should happen within a few weeks.

I’ve also been doing more investigation into SEO, both in general and into some specific implementation details. This is another area where I need to share what I find, and I hope to after these launches are complete.

As for the hacks I mentioned, one was against Web Host Manager and cPanel. That one was easy to undo, and my hosting provider was looking into closing the vulnerability. The other hack was more insidious: every PHP file on one site was modified to add some code that attempted to insert what appeared to be malicious JavaScript. The hack happened to break some plugins I use, which is how I found it. I don’t know how the editing was accomplished; files outside of WordPress were edited too. I’ll try to show the details at a later point. It ended up being fairly easy to undo with some command-line Perl: a search and replace to the rescue!

That’s a quick overview of my recent developments. If you watched the video I posted by Bret Victor, you might also be interested in visiting his site. He has lots of interesting things to see there, plus lots of interesting transitions and such for his content. Worth a look.