PubSubHubbub Security Concerns

For those of you who don’t know, PubSubHubbub is a protocol for facilitating the real-time distribution of Atom and RSS feeds. In some ways it could be considered a competitor to the RSS Cloud protocol, which I wrote about last month, although in my opinion they really serve quite different roles.

As of right now, the PubSubHubbub protocol has no mention of security anywhere in the specification [1]. As a point of comparison, if it were an IETF document, a section on security considerations would be a requirement for publication.

Whether any of the attacks outlined in this post are genuine threats or not isn’t really of much consequence. I’d just like to see some sign that people are at least considering the potential for such attacks.

Fake publishing

Let’s start with publishing. When you want to let a hub know that your feed has been updated, you send it a New Content Notification. The hub should then fetch your feed to see what has changed, so it can pass on those changes to any subscribers. How does it know where to find your feed? You tell it the URL to use – any URL you want – and it will blindly trust you.

So what happens when an attacker lies, and gives the hub a URL for someone else’s feed? And then tells the hub that said feed is updating constantly, say a thousand times a second? It requires very little bandwidth to launch such an attack (especially if the hub supports persistent connections), but can result in a substantial load on the victim’s server.
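To give a sense of how little data is involved, here is a rough sketch of the body of a new content notification. The `hub.mode` and `hub.url` parameter names are the ones the specification uses for publishing, as far as I can tell; the function name and the example URL are made up for illustration, and nothing is actually sent.

```python
from urllib.parse import urlencode


def build_publish_ping(topic_url: str) -> bytes:
    """Return the form-encoded body of a new content notification.

    The hub is simply told which URL has "changed"; nothing in the
    request proves that the sender has any connection to that URL.
    """
    return urlencode({"hub.mode": "publish", "hub.url": topic_url}).encode("ascii")


# Hypothetical victim URL, for illustration only.
body = build_publish_ping("http://victim.example/feed.atom")
print(len(body), "bytes:", body.decode("ascii"))
```

The whole request body is a few dozen bytes, which is why the cost to the sender is so small compared with the fetch it triggers.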

Now in an ideal world, both the hub and the feed publisher would support ETag and Last-Modified headers, which would essentially nullify any real damage from this sort of attack. However, there’s an easy way for an attacker to bypass that defence – just use a URL that isn’t actually a feed.
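For reference, the ETag and Last-Modified defence amounts to a conditional GET. The sketch below shows roughly what a hub would need to do, assuming it has stored the validators from its previous fetch; the function names are hypothetical and not taken from any hub implementation.

```python
from typing import Dict, Optional


def conditional_headers(etag: Optional[str], last_modified: Optional[str]) -> Dict[str, str]:
    """Build request headers that let the publisher answer 304 Not Modified."""
    headers: Dict[str, str] = {}
    if etag:
        headers["If-None-Match"] = etag
    if last_modified:
        headers["If-Modified-Since"] = last_modified
    return headers


def needs_processing(status_code: int) -> bool:
    """A 304 response has no body, so a bogus ping costs almost nothing."""
    return status_code != 304
```

This only helps if the hub has validators to send. For a URL it has never successfully parsed as a feed, it probably has none.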

How do you recognise a feed?

Typically a hub won’t be sure of the kind of file it is downloading until it has actually finished downloading it. As a result, there’s really nothing stopping an attacker from “publishing” say an image, or a movie, or a multi-gigabyte ISO if they were lucky enough to find such a thing on the target server.

Not only does this mean that the victim will have to deal with a much higher load, but because the file isn’t a feed, the hub most likely won’t cache it, and won’t store ETags and Last-Modified headers. That means the next time the attacker “publishes” that URL (say a couple of milliseconds later), the hub will be forced to download the entire file all over again.
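One partial defence is for the hub to give up on a download once it exceeds some sensible size for a feed. The following is a minimal sketch of that idea, assuming the response body is available as a file-like object; the `MAX_FEED_BYTES` value and the function name are arbitrary choices of mine.

```python
import io
from typing import BinaryIO, Optional

MAX_FEED_BYTES = 1024 * 1024  # hypothetical cap; a real hub would tune this


def read_capped(stream: BinaryIO, limit: int = MAX_FEED_BYTES,
                chunk_size: int = 8192) -> Optional[bytes]:
    """Read at most `limit` bytes; return None if the body is larger."""
    buf = bytearray()
    while True:
        # Never ask for more than one byte past the limit.
        chunk = stream.read(min(chunk_size, limit + 1 - len(buf)))
        if not chunk:
            return bytes(buf)   # end of body, within the limit
        buf.extend(chunk)
        if len(buf) > limit:
            return None         # too big to be a plausible feed


print(read_capped(io.BytesIO(b"x" * 11), limit=10))
```

The hub would also have to close the connection at that point, and the victim still pays for whatever was sent before the cut-off, so this reduces the damage rather than eliminating it.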

There are ways of constraining this kind of attack, say by limiting the rate at which a hub accepts new content notifications. But the attacker could then just use multiple hubs to achieve the same effect – it would require very little extra effort on their part.
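A per-URL rate limit of the kind mentioned above might look something like this. It is only a sketch: the class name, the 60-second default and the injectable clock are my own assumptions, and it does nothing about an attacker spreading pings across several hubs.

```python
import time
from typing import Callable, Dict


class PingThrottle:
    """Allow at most one fetch per topic URL every `min_interval` seconds."""

    def __init__(self, min_interval: float = 60.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.min_interval = min_interval
        self.clock = clock
        self._last_fetch: Dict[str, float] = {}

    def allow(self, topic_url: str) -> bool:
        now = self.clock()
        last = self._last_fetch.get(topic_url)
        if last is not None and now - last < self.min_interval:
            return False            # pinged too recently; ignore this one
        self._last_fetch[topic_url] = now
        return True
```

A hub using this would call `allow(url)` for each incoming ping and only fetch the feed when it returns True.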

Fake subscriptions

Another potential avenue of attack involves tricking a server into subscribing to a bunch of high-traffic feeds. This then forces the hub (or hubs) to push a load of unwanted content at the target server. The server doesn’t even need to be a feed subscriber of any sort – as long as it’s listening for HTTP connections it’s a potential target.

Now the PubSubHubbub subscription mechanism does have a form of authentication to verify that the subscriber did genuinely request a subscription, but that’s fairly easy for an attacker to work around.

All they have to do is set up a temporary domain name pointing to their own server, and subscribe to a bunch of feeds as if they were a regular subscriber. Once the verification process is complete, and the hub is satisfied that the subscriptions are genuine, the attacker can then reconfigure their DNS to redirect the flood of updates to their victim’s IP address.

Now because the DNS entry is faked, the Host header in each update request will obviously not match the target server. Thus, depending on how the server is configured, and the kind of web framework it is running, it’s possible this attack won’t have much effect. However, I’ve done enough testing to know that at least some server configurations would be vulnerable.
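The kind of configuration that escapes is one that rejects requests for unknown host names before doing any real work. As a rough illustration (the function name and host names are hypothetical, and IPv6 literals are not handled), the check is no more than this:

```python
from typing import FrozenSet, Optional


def host_is_expected(host_header: Optional[str], allowed: FrozenSet[str]) -> bool:
    """Return True only if the request's Host header names this server."""
    if not host_header:
        return False
    host = host_header.strip().lower()
    # Drop an optional ":port" suffix (IPv6 literals are not handled here).
    host = host.split(":", 1)[0]
    return host in allowed


ALLOWED = frozenset({"www.example.org"})  # names this server really answers to
print(host_is_expected("attacker-temp.example", ALLOWED))
```

Even then the server still has to accept the connection and read the request headers, so the bandwidth cost of the incoming POSTs is not avoided entirely.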

Controlling the feed source

At the end of the day, though, fake subscriptions are a fairly limited form of attack. Since data is only pushed to the victim’s server when a feed actually updates, the attacker would likely have to subscribe to hundreds of thousands of feeds to have any noticeable impact; and even then it wouldn’t be a particularly reliable attack vector.

So let’s consider a different approach. What if the attacker were providing the feed themselves? Since they have complete control over the source, they could make sure that the feed was different every time it was refreshed. And of course they could send new content notifications to the hub as frequently as they liked.

You might think that the attacker would then suffer the same bandwidth hit as the victim’s computer, but that’s not necessarily true. If the hub supports XML entity expansion (and some implementations clearly do), the attacker could easily create a feed that is minuscule on their end, but would expand considerably when pushed to the victim’s server.
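To show how the amplification works, here is a deliberately tiny example using Python’s standard XML parser. The entity names and the helper function are invented for this sketch, and the numbers are kept small (three levels of ten) so it is harmless to run.

```python
import xml.etree.ElementTree as ET


def nested_entity_doc(levels: int, fanout: int, seed: str = "ha") -> str:
    """Build a document whose entities each reference the previous one `fanout` times."""
    decls = [f'<!ENTITY e0 "{seed}">']
    for i in range(1, levels + 1):
        refs = f"&e{i - 1};" * fanout
        decls.append(f'<!ENTITY e{i} "{refs}">')
    return f'<!DOCTYPE feed [{"".join(decls)}]><feed>&e{levels};</feed>'


doc = nested_entity_doc(levels=3, fanout=10)
expanded = ET.fromstring(doc).text or ""

# The expanded text is len(seed) * fanout ** levels characters long.
print(len(doc), "characters in,", len(expanded), "characters out")
```

Each extra level multiplies the output by the fan-out again while adding only a few dozen characters to the source. A hub that refuses documents containing a DTD, or that caps the expanded size, would not be affected.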

Similar results can be achieved with HTTP compression. If the hub supports compressed downloads (which is highly probable), the attacker can serve up a tiny gzipped feed that would expand into something much more massive.
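The compression ratio is easy to see for yourself, and so is one way for a hub to guard against it: decompress with a hard limit on the output size. This is a sketch under my own assumptions (the function name and the limits are arbitrary), using the standard `zlib` module.

```python
import gzip
import zlib
from typing import Optional


def gunzip_capped(data: bytes, limit: int) -> Optional[bytes]:
    """Decompress gzip data, returning None if it expands beyond `limit` bytes."""
    # Adding 16 to the window size tells zlib to expect a gzip header.
    d = zlib.decompressobj(zlib.MAX_WBITS | 16)
    out = d.decompress(data, limit + 1)   # never produce more than limit + 1 bytes
    if len(out) > limit or d.unconsumed_tail:
        return None
    return out


payload = gzip.compress(b"a" * 1_000_000)   # highly repetitive input
print(len(payload), "bytes compressed")
print(gunzip_capped(payload, limit=100_000))
```

Without a cap like this, the hub does the expanding on the attacker’s behalf and then passes the full-sized result on to every subscriber.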

Duplicate subscriptions

While the hub and target server may have mechanisms in place to limit the amount of data that can be POSTed, there is another way an attacker can magnify the attack. The trick is to subscribe multiple times – perhaps thousands of times – using a different callback URL each time; say with a random path or query string.

From the point of view of the hub, each subscription is unique, so while the attacker’s source feed is only downloaded once, the victim’s computer would be hit with thousands of notification POSTs each time there was an update.
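A hub could blunt this by limiting how many distinct callbacks it will accept for the same topic on the same host. The sketch below is one hypothetical way of doing that; the class name and the default limit are mine, and an attacker with wildcard DNS could vary the host name too, so a real implementation might need to key on the resolved address instead.

```python
from typing import Dict, Set, Tuple
from urllib.parse import urlsplit


class SubscriptionLimiter:
    """Cap the number of distinct callback URLs per (topic, callback host)."""

    def __init__(self, max_per_host: int = 5) -> None:
        self.max_per_host = max_per_host
        self._callbacks: Dict[Tuple[str, str], Set[str]] = {}

    def try_add(self, topic_url: str, callback_url: str) -> bool:
        host = (urlsplit(callback_url).hostname or "").lower()
        callbacks = self._callbacks.setdefault((topic_url, host), set())
        if callback_url in callbacks:
            return True             # re-subscribing to an existing callback
        if len(callbacks) >= self.max_per_host:
            return False            # too many callbacks on this host already
        callbacks.add(callback_url)
        return True
```

With a limit of five, the thousands of POSTs per update described above would shrink to five, which is still wasteful but no longer much of a multiplier.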

What’s worse, since these POSTs would likely result in an error response, the hub is encouraged by the specification to retry the notification attempt, repeatedly. It’s almost as if the protocol has been designed to cause as much damage as possible.
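If a hub is going to retry at all, it could at least back off and eventually give up. Here is a minimal sketch of a capped exponential backoff schedule; the specific numbers are arbitrary assumptions on my part and do not come from the specification.

```python
from typing import List


def retry_delays(base: float = 60.0, factor: float = 2.0,
                 max_attempts: int = 5, cap: float = 3600.0) -> List[float]:
    """Seconds to wait before each retry, growing geometrically up to `cap`."""
    return [min(cap, base * factor ** attempt) for attempt in range(max_attempts)]


print(retry_delays())
```

Once the schedule is exhausted, the hub could reasonably treat the subscription as dead rather than continuing to hammer a server that keeps returning errors.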

Unintentional attacks

So far we’ve been considering deliberate attacks against specific targets, but there is also potential for unintentional (or at least undirected) attacks.

Consider a user on a home computer running some form of PubSubHubbub client software. If their ISP assigns them a dynamic IP address, as is often the case, and their Internet connection drops, what happens to all of their subscriptions?

Sooner or later that ISP is going to recycle the IP address and assign it to another user, and that lucky user could find themselves inundated with incoming notifications that were never requested. If they also happen to be running a PubSubHubbub client, listening on the same port (not unlikely given the limited port support [2]), this could cause them considerable grief.

While this scenario may not be likely, it has the potential to be quite damaging if it does occur. Home users often have limited bandwidth, and in some cases may be charged considerable fees for exceeding their download quota. They may not even realise anything is wrong until they find themselves with a massive bill at the end of the month.


It seems to me that the PubSubHubbub protocol is potentially quite dangerous. However, it’s equally possible that I’m wrong, the above-mentioned attacks are no threat at all, and my concerns are unfounded.

Either way, though, I think the specification is flawed in not having a section on security considerations. If nothing else, I hope that at least is something that the authors will address.


  1. Yes, I realise the word “security” does actually appear in the specification, but that’s just an attempt to justify the shortcomings of the GAE servers on which the PubSubHubbub reference implementation is run. It has little to do with the security of the protocol itself.
  2. Google’s reference implementation only supports around 16 ports.