RSS Cloud Fail

Way back when I first started working on Snarfer, I was intrigued by the idea of the RSS cloud element which, in theory at least, should enable near real-time updates. Unfortunately, at the time nobody seemed to be using it, and I got the impression it wasn’t really intended for desktop clients anyway.

However, with the recent announcement from WordPress.com that they would be supporting RSS cloud in all 7.5 million of their feeds, Dave Winer’s reboot of the protocol, and an implementation of RSS cloud in his own desktop client, I figured the idea might be worth revisiting.

First impressions

Getting a basic implementation working with Dave’s example feed was easy. I figured the real test, though, would be to see how well my code worked with some real-world feeds. For that, I would rely on my trusty OPML of the Bloglines Top 1000 feed list.

My initial results were extremely disappointing. Of those 1000 feeds, a mere 32 claimed to support RSS cloud (that is, they included a cloud element in the feed). Of those 32, seven were duplicates (a common problem in the Bloglines Top 1000 – more than 10% are duplicate entries).
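For anyone wondering how I arrived at those numbers: the survey amounts to pulling every feed URL out of the OPML and checking each feed for a cloud element on its channel. Something along these lines (the file name and the de-duplication details are mine, purely for illustration):

```python
# Rough sketch: survey an OPML subscription list for feeds that declare
# an RSS cloud. File name and details are illustrative, not canonical.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

def feed_urls_from_opml(path):
    """Collect xmlUrl attributes from every outline element in the OPML file."""
    tree = ET.parse(path)
    return {o.get("xmlUrl") for o in tree.iter("outline") if o.get("xmlUrl")}

def cloud_element(feed_url):
    """Return the channel's <cloud> attributes, or None if absent or unreachable."""
    try:
        with urlopen(feed_url, timeout=30) as resp:
            root = ET.fromstring(resp.read())
    except Exception:
        return None
    cloud = root.find("channel/cloud")   # RSS 2.0 places <cloud> on <channel>
    return dict(cloud.attrib) if cloud is not None else None

if __name__ == "__main__":
    urls = feed_urls_from_opml("bloglines_top1000.opml")   # hypothetical file name
    supporters = {u: c for u in urls if (c := cloud_element(u))}
    print(f"{len(supporters)} of {len(urls)} feeds declare a cloud element")
```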

Worst of all, the number of feeds that successfully accepted a registration request from my client was precisely one. That’s right: out of 1000 feeds, only one actually worked. And that was Dave Winer’s Scripting News feed.

What went wrong

The first problem I discovered was with feeds running on UserLand Frontier. Annoyingly, Frontier includes cloud elements in all of its feeds, but when you attempt to subscribe to any of them, they return an error to the effect that “notification is disabled on the server”. If notification is disabled, why on earth include a cloud element in the first place? I was not impressed.
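For context, registering with a cloud is just a small HTTP request assembled from the attributes of the feed's cloud element. Here's a rough sketch of the http-post flavour; the parameter names follow my reading of the rssCloud walkthrough, so treat them as assumptions rather than gospel:

```python
# Hedged sketch of a cloud registration ("pleaseNotify") request using the
# REST/http-post flavour of the protocol. Parameter names are my reading of
# the rssCloud walkthrough; verify against the spec before relying on them.
from urllib.parse import urlencode
from urllib.request import Request, urlopen

def register(cloud, feed_url, my_port, my_path):
    """cloud is the dict of <cloud> attributes: domain, port, path, protocol, ..."""
    endpoint = f"http://{cloud['domain']}:{cloud['port']}{cloud['path']}"
    body = urlencode({
        "notifyProcedure": "",   # unused for the http-post protocol
        "port": my_port,         # where the cloud should call us back
        "path": my_path,
        "protocol": "http-post",
        "url1": feed_url,        # the feed we want notifications for
    }).encode()
    req = Request(endpoint, data=body,
                  headers={"Content-Type": "application/x-www-form-urlencoded"})
    with urlopen(req, timeout=30) as resp:
        return resp.status, resp.read().decode(errors="replace")
```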

The second problem was with WordPress. When you attempt to register with a WordPress cloud, it calls back to your notification URL as a test. My client was replying to these callbacks, reasonably enough I thought, with an HTTP 204 response. However, WordPress, for reasons I cannot fathom, had decided that the only acceptable response was 200 – anything else would be considered a failure.
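The fix itself is one line: reply to the challenge with a 200 instead of a 204. A minimal sketch of the notification callback handler (the port number and the print statement are placeholders of mine, not anything the spec requires):

```python
# Minimal sketch of the notification endpoint, adjusted to return 200 rather
# than 204 so that WordPress's challenge succeeds. The handler path is whatever
# was passed as "path" when registering; the port here is arbitrary.
from http.server import BaseHTTPRequestHandler, HTTPServer

class NotifyHandler(BaseHTTPRequestHandler):
    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        body = self.rfile.read(length).decode(errors="replace")
        # The cloud posts the URL of the feed that changed; a real client
        # would queue that feed for an immediate re-fetch here.
        print("pinged for:", body)
        self.send_response(200)   # 204 is arguably fine, but WordPress wants 200
        self.end_headers()

if __name__ == "__main__":
    HTTPServer(("", 5337), NotifyHandler).serve_forever()
```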

That was an easy fix, but it’s a problem that should never have arisen. Either the spec should have made it clear that a 200 response was required, or the WordPress implementation should have been more lenient in what it accepted (any 2xx success code, say). You can’t expect widespread adoption of the protocol if people have to guess the requirements of each individual implementation by trial and error.

Still, we were getting somewhere. There were only two remaining unexplained failures. These were WordPress feeds that were somehow failing to connect to my notification URL. Eventually, after digging through various releases of the WordPress source code, I discovered that these servers were actually running a buggy, older version of the WordPress plugin. Regardless of what the client specified, this version of the plugin would always attempt connections on port 80.

I could work around that bug by always listening on port 80, but that’s hardly a feasible solution in the long run. Ultimately these servers are going to need to upgrade, if for no other reason than that the version of the plugin they’re running is also full of security holes.

The big lie

At this stage, though, you’re probably thinking that all of my problems had been solved, or at least reasonably explained. Other than the Frontier servers, which don’t actually support RSS cloud, I had now gotten all of the remaining test feeds to return a successful response to my registration requests.

Unfortunately, for many of them, that response was a lie. Despite their claims of successful registration, most of those feeds would never actually call back with a notification when they were updated.

The problem is with the way WordPress handles registration requests. When you register a subscription with a cloud server, you need to pass the URL of the feed that you want to monitor. And while it is not uncommon for a feed to be known by a number of URLs, there is only one URL that WordPress will consider valid – and there’s a fair chance that you’re not using the right one.

The worst part about it, though, is that WordPress won’t actually tell you that your registration has been unsuccessful. It’ll silently ignore your request while claiming that everything is fine. You’ll just be left wondering why you never receive any notifications.

Now there are ways to work around this issue, but it’s ultimately a waste of time. Most of these feeds are never going to work, even if you could get them to accept your registration.
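For completeness, one of the workarounds I’m alluding to is to register with the URL the feed itself claims to live at (its atom:link rel="self"), rather than whatever URL happens to be in your subscription list. A sketch, with no guarantee that WordPress agrees with the feed about which URL is canonical:

```python
# Hedged sketch of one workaround: prefer the URL the feed itself claims to
# live at (atom:link rel="self") when registering with the cloud. There is no
# guarantee this matches the URL WordPress considers canonical.
import xml.etree.ElementTree as ET
from urllib.request import urlopen

ATOM = "{http://www.w3.org/2005/Atom}"

def canonical_feed_url(feed_url):
    with urlopen(feed_url, timeout=30) as resp:
        root = ET.fromstring(resp.read())
    channel = root.find("channel")
    if channel is None:
        return feed_url
    for link in channel.iter(f"{ATOM}link"):
        if link.get("rel") == "self" and link.get("href"):
            return link.get("href")
    return feed_url   # fall back to the URL we already had
```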

The final straw

The real problem is with FeedBurner, and possibly caching in general. Even with all of my hacks and tricks and workarounds in place, and most feeds now notifying me of new updates, the feeds themselves still never seemed to have any new entries when I retrieved the latest version from the server.

The thing is, when you download a FeedBurner feed, you’re actually receiving a cached (and slightly modified) version of the source feed. My understanding is that FeedBurner only updates their copy from the source every 30 minutes or so. This means that the version you see is on average around 15 minutes out of date.

There’s not much point in knowing the exact second that someone has posted a new article to their feed, if you’re only going to be able to view it 15 to 30 minutes later.

And I don’t think FeedBurner is the only instance of this problem either. While a good percentage of my test feeds were obviously using FeedBurner, at least some of them appeared to be returning the default WordPress feed, and they still had problems showing the latest updates.

The weird thing is they weren’t even returning a 304 (Not Modified) response to my conditional requests. Most of the time I would just get an exact copy of the content I already had, complete with identical ETag and Last-Modified headers.

I tried all sorts of tricks, including a variety of no-cache headers, and delaying 60 seconds before downloading the feed. Some of these things may have helped a little, but the results could at best be described as flaky. This is not the sort of thing I would be willing to ship as a finished product.
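For the record, the fetch I ended up doing after each notification looked roughly like this: a plain conditional GET with cache-busting headers, none of it specific to RSS cloud:

```python
# Sketch of the kind of fetch made after a notification: a conditional GET
# with cache-busting headers. Nothing here is RSS-cloud-specific; it just
# documents what was (and mostly wasn't) helping.
from urllib.error import HTTPError
from urllib.request import Request, urlopen

def fetch_feed(url, etag=None, last_modified=None):
    headers = {
        "Cache-Control": "no-cache",   # ask intermediaries for a fresh copy
        "Pragma": "no-cache",          # same request, HTTP/1.0 style
    }
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    try:
        with urlopen(Request(url, headers=headers), timeout=30) as resp:
            return resp.status, resp.read(), dict(resp.headers)
    except HTTPError as err:
        if err.code == 304:            # nothing new, by the server's reckoning
            return 304, b"", dict(err.headers)
        raise
```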

What about PubSubHubbub?

For those of you thinking that PubSubHubbub (or PuSH, as many are now calling it) is the solution to all these problems, I can assure you it is not. In my not-so-humble opinion, it’s a terrible, terrible protocol, and the sooner someone pulls the plug on it, the better. But that’s a story for another day.

For now, I’ve not yet given up on real-time feeds completely, but I fear it may still be some time before we see a workable solution on the desktop.