I’d been using CloudFlare on this site for quite a few months. On the whole I was quite happy with it — it certainly filtered out a large chunk of visits from spambots and other nasties. There are some quirks, however. Here are two conditional requests I made to my home page, along with the response headers (I’ve removed some headers that are not relevant):

Request 1:

GET / HTTP/1.1
Host: rayofsolaris.net
User-Agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
If-Modified-Since: Thu, 02 Aug 2012 05:50:55 GMT
Cache-Control: max-age=0

Response 1:

HTTP/1.1 304 Not Modified
Server: cloudflare-nginx
Date: Fri, 17 Aug 2012 07:18:50 GMT
Cache-Control: max-age=604800
Expires: Thu, 09 Aug 2012 05:50:55 GMT
Last-Modified: Thu, 02 Aug 2012 05:50:55 GMT

* * *

Request 2:

GET / HTTP/1.1
Host: rayofsolaris.net
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1
If-Modified-Since: Thu, 02 Aug 2012 05:50:55 GMT
Cache-Control: max-age=0

Response 2:

HTTP/1.1 304 Not Modified
Server: cloudflare-nginx
Date: Fri, 17 Aug 2012 07:20:13 GMT
Cache-Control: max-age=604800
Expires: Thu, 09 Aug 2012 05:50:55 GMT

The only difference between the two requests was the User-Agent header. In the first request I spoofed Googlebot’s user agent.

Nothing out of the ordinary here. Both requests received identical 304 Not Modified responses from the CloudFlare proxy. But the weird bit is that my web server (the one that the CloudFlare proxy connects to on behalf of the client) saw two very different requests. Here are the log entries for the two requests:

Request 1:

[17/Aug/2012:08:18:50 +0100] "GET / HTTP/1.1" 200 1638 "-" 
	"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Request 2:

[17/Aug/2012:08:20:13 +0100] "GET / HTTP/1.1" 304 0 "-" 
	"Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1"

Notice that with the first request, my web server returned a 200 response, i.e. it delivered the entire content of the web page. With the second request, it delivered a 304 response (and no content). In both cases, the client received a 304 response. So the CloudFlare proxy is doing something funky in between.

After some investigation, I found that when it sees a Googlebot (or other crawler) user agent, CloudFlare strips the If-Modified-Since header from the request before passing it on to the web server. So my server always receives a non-conditional request. This means that every crawler visit triggers a full 200 response from my server (even though the crawler itself may only receive a 304 from the CloudFlare proxy), and that my server is sending out a whole lot more data than it needs to. This is particularly wasteful for items that are fetched frequently, such as feeds being retrieved by Feedfetcher.
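
If you want to check whether your own setup is affected, the client side is easy to script. Here’s a minimal sketch using Python and the requests library (the URL and timestamp are the ones from the requests above; substitute your own). Both responses should come back as 304s from the proxy; the interesting difference shows up in your origin server’s access log:

import requests

URL = "http://rayofsolaris.net/"
IMS = "Thu, 02 Aug 2012 05:50:55 GMT"

user_agents = {
	"browser": "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:14.0) Gecko/20100101 Firefox/14.0.1",
	"crawler": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
}

for name, ua in user_agents.items():
	r = requests.get(URL, headers={"If-Modified-Since": IMS, "User-Agent": ua})
	# Both should print 304; whether the origin saw a conditional
	# request is only visible in its access log
	print name, r.status_code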

I’m not sure why CloudFlare would be intercepting and modifying requests from crawlers in this way. I’ve asked, and will update this post with their response.

In any case, the point to note is that CloudFlare isn’t necessarily cutting the load on your server — it might actually be increasing it! In my own case, this behaviour wiped out any benefit gained from static content caching (in terms of data transfer and processor usage) over the last month, and I’ve stopped using CloudFlare because of that.

Edit (5th September 2012): No response from CloudFlare — the support ticket that I raised has been closed, twice, without anyone addressing it.

Edit (12th September 2012): After reopening the ticket, I’ve been told that the issue has been passed on to the engineering team to investigate, but I have not been given an explanation of why this is happening, or whether it is supposed to happen (I’m pretty sure it’s a bug).

I’ve recently been using s3cmd to back up a lot of data to Amazon S3. Version 1.1.0 (currently in beta) supports multi-part uploads. It has borked a few times halfway through large uploads, without properly aborting the operation server-side. This meant that the parts uploaded so far were not removed from the server, and that’s bad because Amazon charges for this storage.

s3cmd doesn’t currently have any way to list or abort interrupted multi-part uploads, which meant I had to figure out some other way to do it. It turned out to be quite simple using Python and the boto library:

from boto.s3.connection import S3Connection

connection = S3Connection("your_access_key", "your_secret_key")
bucket = connection.get_bucket("your_s3_bucket_name")

# List all multi-part uploads that were initiated but never completed
uploads = bucket.get_all_multipart_uploads()
print len(uploads), "incomplete multi-part uploads found."

# Abort each one, telling S3 to discard the parts uploaded so far
for u in uploads:
	u.cancel_upload()

# An upload that another client is actively working on won't have been
# cancelled, so check whether anything is left
uploads = bucket.get_all_multipart_uploads()
if len(uploads) > 0:
	print "Warning: incomplete uploads still exist."

The way WordPress handles commenting on attachments (media) is a bit flaky. The comment status of attachment pages is set to the global default comment status when the media item is attached to a post. After that, there’s no way to change it! Surprising? Yes. There are a couple of truly ancient bug reports related to this, but they’ve received very little attention.

I ran into this issue when working on some of my comment-related plugins. So here’s how to fix it.

If you’re lazy

If you want to just disable comments on all media items, use the Disable Comments plugin.

If you want media items to inherit their comment status from their parent post, use the Comment Control plugin.

If you’re bored :)

To disable comments on all media items, put something like this in your theme’s functions.php file:

function filter_media_comment_status( $open, $post_id ) {
	$post = get_post( $post_id );
	if( $post->post_type == 'attachment' ) {
		return false;
	}
	return $open;
}

add_filter( 'comments_open', 'filter_media_comment_status', 10, 2 );

If you want media items to inherit their comment status from their parent post:

function filter_media_comment_status( $open, $post_id ) {
	$post = get_post( $post_id );
	if( $post->post_type == 'attachment' && $post->post_parent ) {
		return comments_open( $post->post_parent );
	}
	return $open;
}

add_filter( 'comments_open', 'filter_media_comment_status', 10, 2 );

Google is celebrating Jamhuri Day in Kenya with a nice doodle today. I particularly like the detail on the lettering.

Google Doodle for Kenya, 12 December 2011

Jamhuri Day is the anniversary of Kenya’s independence from the United Kingdom (1963), and also of its formally becoming a republic (1964).

“Harambee” loosely translates to “let’s all pull together”.

For anyone who has suddenly found that LibreOffice crashes during startup (with no error message) and, like me, couldn’t figure it out for ages: LibreOffice >3.3 isn’t able to detect Java Runtime Environment version 1.7.x, so if you upgrade Java, don’t uninstall the 1.6.x version.

There’s a bug report. It’s going to annoy a lot of people when Java 1.7.x goes mainstream.

Edit (4th January 2012): This issue has apparently been fixed in LibreOffice 3.4.5 (which is currently in beta, and due to be released before the end of the month).