Sniffing out RSS

I’m sure everyone knows by now that upcoming versions of Internet Explorer and Firefox will be sniffing for RSS (and Atom) whenever you access a page that might plausibly be a feed. If this spares users from having to face an incomprehensible stream of XML, I’m all for it. Unfortunately, from what I’ve seen in recent builds [1], it’s not going to work very well.

Microsoft’s implementation deliberately chooses to ignore feeds served as application/rdf+xml (the recommended media type for RSS 1.0 feeds). Their reasoning? There are many RDF files that are not related to feeds. Yeah. That’s exactly why you are supposed to be sniffing them: the media type can’t tell you whether an RDF document is a feed, but the content can.
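To make that concrete, here is a minimal sketch of what sniffing an RDF document could look like. It is my own illustration, not code from either browser, and the function name is_rss1_feed is made up. The assumption is that a document counts as an RSS 1.0 feed when its root element is RDF in the RDF namespace and one of the root's direct children is channel in the RSS 1.0 namespace. Because the check compares namespace URIs rather than prefixes, it doesn't care whether the author wrote "rdf:" or something else.

```python
import io
import xml.etree.ElementTree as ET

# Namespace-qualified names in ElementTree's "{uri}local" notation.
RDF_ROOT = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}RDF"
RSS1_CHANNEL = "{http://purl.org/rss/1.0/}channel"


def is_rss1_feed(data: bytes) -> bool:
    """Hypothetical check: is this RDF/XML document an RSS 1.0 feed?"""
    depth = 0
    try:
        events = ET.iterparse(io.BytesIO(data), events=("start", "end"))
        for event, elem in events:
            if event == "end":
                depth -= 1
                continue
            depth += 1
            # The root element must be RDF in the RDF namespace.
            if depth == 1 and elem.tag != RDF_ROOT:
                return False
            # A direct child of the root must be an RSS 1.0 channel.
            if depth == 2 and elem.tag == RSS1_CHANNEL:
                return True
    except ET.ParseError:
        return False
    return False


FEED = b"""<?xml version="1.0"?>
<r:RDF xmlns:r="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
       xmlns="http://purl.org/rss/1.0/">
  <channel r:about="http://example.org/feed">
    <title>Example</title>
  </channel>
</r:RDF>"""

NOT_A_FEED = b"""<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/">
  <foaf:Person><foaf:name>Someone</foaf:name></foaf:Person>
</rdf:RDF>"""

print(is_rss1_feed(FEED), is_rss1_feed(NOT_A_FEED))
```

A real implementation would presumably only look at the first few kilobytes and would also need to decide what to do about RSS 0.9, which uses a different namespace. This sketch ignores both.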

Their second problem is that they can’t handle UTF-16, one of only two encodings that XML processors are required to support. UTF-16 may not be commonly used by English speakers, but it’s often the most sensible choice for other languages (many scripts in the higher Unicode blocks, CJK for example, need three bytes per character in UTF-8 but only two in UTF-16). But I guess if you’re a foreigner you don’t matter to them.
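For the curious, here is a small sketch of both points: the size difference, and how a sniffer could cope with UTF-16 before looking for anything. The helper names detect_xml_encoding and decode_head are hypothetical, and the detection is a simplified version of the byte order mark and "<?" checks described in the encoding autodetection appendix of the XML spec. It assumes the input is UTF-8 or UTF-16 and makes no attempt to handle UTF-32 or to read the encoding declaration.

```python
import codecs

# Japanese text: 8 characters, all in the range where UTF-8 needs 3 bytes.
text = "日本語のフィード"
print(len(text.encode("utf-8")), len(text.encode("utf-16-le")))


def detect_xml_encoding(data: bytes) -> str:
    """Guess UTF-8 vs UTF-16 from the first bytes (simplified sketch)."""
    # A byte order mark settles it.
    if data.startswith(codecs.BOM_UTF8):
        return "utf-8"
    if data.startswith(codecs.BOM_UTF16_LE):
        return "utf-16-le"
    if data.startswith(codecs.BOM_UTF16_BE):
        return "utf-16-be"
    # No BOM: see how the leading "<?" is laid out in memory.
    if data.startswith(b"<\x00?\x00"):
        return "utf-16-le"
    if data.startswith(b"\x00<\x00?"):
        return "utf-16-be"
    return "utf-8"


def decode_head(data: bytes, limit: int = 512) -> str:
    """Decode the start of the stream so it can be inspected as text."""
    encoding = detect_xml_encoding(data)
    head = data[:limit].decode(encoding, errors="replace")
    return head.lstrip("\ufeff")  # drop the decoded BOM, if any


feed = ('<?xml version="1.0" encoding="UTF-16"?>'
        '<rss version="2.0"><channel><title>' + text +
        '</title></channel></rss>')
data = feed.encode("utf-16")  # Python prepends a BOM here

# A raw byte search misses the root element; the decoded text has it.
print(b"<rss" in data, "<rss" in decode_head(data))
```

The last line is the whole problem in miniature: every other byte of the UTF-16 stream is a zero, so a byte-level search for "<rss" never matches.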

Even when Internet Explorer encounters a stream that it is willing to sniff, it still doesn’t do a very good job. Obviously nobody told them that simple string searching on an XML stream is not going to cut it. Anything out of the ordinary will easily confuse the browser. Inevitably there are a lot of misrecognised feeds.
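Here is a rough sketch of why string searching falls over. The naive_sniff function below is a hypothetical stand-in for a string-search heuristic; I am not claiming it is what either browser actually does. It is compared with parsed_sniff, which asks a real XML parser (Python's xml.etree.ElementTree) for the root element and checks its namespace-qualified name. For brevity it only covers RSS 2.0 and Atom 1.0.

```python
import io
import xml.etree.ElementTree as ET

ATOM_FEED = "{http://www.w3.org/2005/Atom}feed"


def naive_sniff(data: bytes) -> bool:
    """Hypothetical string-search heuristic (for illustration only)."""
    head = data[:512]
    return b"<rss" in head or b"<feed" in head


def root_tag(data: bytes):
    """Return the root element's qualified name, or None if unparseable."""
    try:
        for _event, elem in ET.iterparse(io.BytesIO(data), events=("start",)):
            return elem.tag  # the first start event is the root element
    except ET.ParseError:
        return None
    return None


def parsed_sniff(data: bytes) -> bool:
    """Sniff by looking at the actual root element."""
    return root_tag(data) in ("rss", ATOM_FEED)


# An XHTML page that merely mentions a feed inside a comment.
COMMENT_TRAP = b"""<?xml version="1.0"?>
<!-- not a feed: <rss version="2.0"> -->
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>x</title></head><body/>
</html>"""

# A valid Atom feed that uses a namespace prefix on every element.
PREFIXED_ATOM = (b'<a:feed xmlns:a="http://www.w3.org/2005/Atom">'
                 b'<a:title>t</a:title></a:feed>')

for name, doc in (("comment", COMMENT_TRAP), ("prefixed", PREFIXED_ATOM)):
    print(name, naive_sniff(doc), parsed_sniff(doc))
```

The string search calls the XHTML page a feed and misses the Atom feed entirely, while the parser gets both right. Parsing costs a little more, but the parser stops caring about the document as soon as it has seen the root element.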

Is Firefox any better? At the moment, it seems not. Their code is supposedly based on the same heuristic as Internet Explorer, so obviously they share many of the same flaws. However, Firefox is still in alpha, so with any luck the developers may be persuaded to improve on their current implementation before the final release. I’m not going to be holding my breath though.


  1. Internet Explorer build 7.0.5346.5 and Firefox 2.0a2.
  2. This feed was successfully detected by Firefox but not by Internet Explorer.
  3. The RSS 1.0 spec advises document creators to consider “rdf:” normative, but any valid namespace prefix may be used.
  4. Yes, this is a contrived example, but it’s still technically valid.