HTML5 and Atom Gone Wrong

June 2nd, 2010

This week the W3C HTML WG finally made the sensible decision to remove the Atom conversion algorithm from HTML5. It should never have been included in the specification in the first place, so I genuinely am grateful to those that fought to have it removed.

That doesn’t mean I can’t still have a good laugh at how truly awful the algorithm was now that it’s gone. So with that in mind, I’m proud to present:

The HTML5 Atom Spot-the-Difference Competition

Of the two feeds below, one is valid Atom, and the other a bastardisation of Atom as might be produced by the HTML5 algorithm. For those of you that have never played this sort of game before, the idea is to identify the nine differences between the two feeds.

A feed produce by the HTML5 conversion algorithm

I’ve presented the feeds as images to make things a little more challenging and to discourage the use of the feed validator or file comparison tools. However, if you really need it, the source XML can be obtained by following the longdesc links on each image.

At least be grateful that I’m not asking you to read through the pages and pages of incomprehensible HTML5 pseudocode that I had to endure.

The Prize

The first correct set of answers drawn wins a pay-your-own-way trip to the sunny city of London, England, and a chance to meet yours truly in some dodgy London nightclub.

Prize doesn’t include cost of travel, accommodation, entrance to dodgy nightclub, or anything at all. Chances of actually meeting me are virtually nil. Entries must be received by midnight of April 1st, 2010. Entrants must be over 18 years of age and legal residents of the planet of Thundera.

Answers

The URLs for atom:link elements should be stored in the href attribute and not in the link content.
The values used for atom:id should be both stable and unique. Using a copy of the permalink meets neither requirement.
Stripping the markup from atom:title elements has resulted in one title changing its meaning entirely.
The type attribute on atom:content elements should be xhtml for XHTML content, and not xml.
The XHTML div element that is an immediate child of atom:content is not correctly namespaced.
The dates in the atom:published elements are incorrectly formatted and in the wrong timezone.
The atom:updated elements are merely duplicates of the atom:published elements, failing to detect the correct update times.
Without an xml:base attribute, relative URLs inside the atom:content elements will not be correctly resolved.
The atom:author elements are missing altogether since the algorithm is only capable of recognising feed-level authors, at best.