Obfuscated Atom

Early one morning, while staggering home after a night of heavy drinking, I came up with the wild idea of a feed constructed almost entirely from parameter entity (PE) references within an XML DTD. The result would be a valid feed, yet would look nothing like a feed. It was some time before I got a chance to put my little plan into action, but that was the moment when the idea was first conceived.

I’ll spare you all the boring details of the construction, and just say that the final result ended up looking something like this:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE feed [
<!ENTITY a 'http://www.w3.org/2005/Atom'>
<!ENTITY b '&e;&f;&g;&h;&i;'>
<!ENTITY c 'tag:xn--8ws00zhy3a.com,2006-05-04:/tests/atom/obfuscated/'>
<!ENTITY d 'http://www.xn--8ws00zhy3a.com/tests/atom/obfuscated/'>
<!ENTITY e '<title>Obfuscated Atom</title>
  <updated>2007-10-19T00:00:00+00:00</updated>
  <id>&c;</id><link rel="alternate" href="&d;"/>
  <link rel="self" href="&d;1_4.atom"/>
  <author><name>James Holderness</name></author>'>
<!ENTITY % a '<entry><title>This is title #'>
<!ENTITY % b '</title><updated>2007-10-18T23:'>
<!ENTITY % c ':00+00:00</updated><id>&c;'>
<!ENTITY % d '</id><link href="&d;'>
<!ENTITY % e '.html"/><summary type="html">
  This is the &lt;code&gt;summary&lt;/code&gt; for entry number '>
<!ENTITY % f '.</summary></entry>'>
<!ENTITY % g '<!ENTITY f "&#37;a;1&#37;b;59&#37;c;1&#37;d;1&#37;e;1&#37;f;">'>
<!ENTITY % h '<!ENTITY g "&#37;a;2&#37;b;58&#37;c;2&#37;d;2&#37;e;2&#37;f;">'>
<!ENTITY % i '<!ENTITY h "&#37;a;3&#37;b;57&#37;c;3&#37;d;3&#37;e;3&#37;f;">'>
<!ENTITY % j '<!ENTITY i "&#37;a;4&#37;b;56&#37;c;4&#37;d;4&#37;e;4&#37;f;">'>
%g;%h;%i;%j;
]>
<feed xmlns="&a;">&b;</feed>

Of course this is a contrived example, with a lot of repeated content, but the techniques employed could just as easily be applied to a real feed (although the compression wouldn’t be nearly as effective).
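
For anyone who wants to poke at the core trick in isolation, here is a minimal sketch using Python's standard xml.parsers.expat module. The two-entry document in DOC is a hypothetical cut-down version of the feed above, not the original, and entry_titles is just an illustrative helper name. It assumes expat has to be told to process parameter entities (via SetParamEntityParsing), since it may otherwise leave them unexpanded. How other parsers treat the same input is a separate matter.

```python
import xml.parsers.expat as expat

# Hypothetical cut-down feed (not the original): two entries built the same
# way, with "&#37;" standing in for "%" so that the PE references inside the
# nested declarations are only resolved once %g; and %h; are expanded.
DOC = """<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE feed [
<!ENTITY b '&f;&g;'>
<!ENTITY % a '<entry><title>This is title #'>
<!ENTITY % f '</title></entry>'>
<!ENTITY % g '<!ENTITY f "&#37;a;1&#37;f;">'>
<!ENTITY % h '<!ENTITY g "&#37;a;2&#37;f;">'>
%g;%h;
]>
<feed xmlns="http://www.w3.org/2005/Atom">&b;</feed>
"""


def entry_titles(document):
    """Parse the document and return the text of every <title> element."""
    parser = expat.ParserCreate()
    # Assumption: parameter entity processing must be requested explicitly.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)

    titles = []
    in_title = False

    def start(name, attrs):
        nonlocal in_title
        if name == "title":
            in_title = True
            titles.append("")

    def end(name):
        nonlocal in_title
        if name == "title":
            in_title = False

    def text(data):
        # Character data can arrive in several chunks, so accumulate it.
        if in_title:
            titles[-1] += data

    parser.StartElementHandler = start
    parser.EndElementHandler = end
    parser.CharacterDataHandler = text
    parser.Parse(document, True)
    return titles


if __name__ == "__main__":
    print(entry_titles(DOC))
```

If the parser accepts the construct, the helper returns one title per generated entry. A parser that rejects it should raise an error instead (expat.ExpatError in this case).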

If you’re curious whether your feed reader can handle something that ridiculous, try subscribing to this URL. There’s also a text/xml version of the feed, which should produce a neat view of the expanded XML (at least from within Firefox).

Surprisingly, a number of feed readers had no problems with the feed: Bloglines, the Firefox feed preview and Snarfer, to name a few. The XML Syntax Checker at XML.com considered it well-formed, and the Feed Validator considered it valid. However, the parser in Internet Explorer was not at all happy with the XML, returning the error: “Parameter entities cannot be used inside markup declarations in an internal subset”.

Now that isn’t a big deal as far as the IE feed reader is concerned, since it wouldn’t support feeds containing a DTD anyway. However, it is a problem for other Windows-based feed readers which use the same XML parser (RSS Bandit being one example, if I’m not mistaken).

The question is, who is correct? Should IE’s XML parser be capable of processing that DTD, or should the XML.com Syntax Checker and the Feed Validator have been returning an error?

Any XML experts out there with an opinion on this?