I’ve been thinking about presentation info in RSS feeds recently, thanks to some of the ugly abuses of CSS on Advogato, and validation issues on planet gnome. So it’s good to see some discussion out there about the same problems.
The problem is that the description element contains encoded gumpf. You can’t even say it contains encoded HTML, because there’s no way to validate the gumpf without decoding and parsing it. Just gumpf. And even if it were a block of straight, unencoded XHTML (which would be better for just about everybody, except perhaps the few users who write their own markup in blog entries), you’d still have CSS problems through style attributes.
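To make the "decode, then parse" step concrete, here is a rough Python sketch. The function names and the sample item are made up for illustration, and wrapping the fragment in a div to test well-formedness is my own shortcut, not something any aggregator necessarily does.

```python
import xml.etree.ElementTree as ET


def decode_description(item_xml):
    """Return the decoded contents of an item's <description> element."""
    item = ET.fromstring(item_xml)
    # The XML parser undoes the entity encoding (&lt; becomes <), so what
    # comes back is whatever markup the author's blog tool emitted.
    return item.findtext("description") or ""


def is_well_formed(fragment):
    """Check whether a decoded fragment parses as XML inside a wrapper element."""
    try:
        ET.fromstring("<div>" + fragment + "</div>")
        return True
    except ET.ParseError:
        return False


# Hypothetical feed item: the <b> is never closed.
ITEM = "<item><description>&lt;p&gt;Hello &lt;b&gt;world&lt;/p&gt;</description></item>"

gumpf = decode_description(ITEM)
print(gumpf)
print(is_well_formed(gumpf))
```

The feed itself parses fine as XML, and the broken markup only shows up after the second parse. Note that this check is stricter than a browser: HTML-only entities such as `&nbsp;` would also be rejected.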
A fascist solution, stripping unacceptable tags (which would have to include img tags) and some attributes (even style on a p tag is going to be a PITA), would make planet gnome suck a fair bit. For instance, the screenshots in Nat’s Dashboard blog are absolutely necessary. You couldn’t strip those without harming the content. They are content. 88MPH includes little images on entries as highlights and jokes, but occasionally they’re also important to the content.
Luke Stroven, the quiet but incredible contributor behind gnomedesktop.org, recently added category icons to the backend2.php feed, which show up very nicely when aggregated. He prodded me to add hacker head icons to the other feeds, but due to an inflexibility in the aggregation software, I have to fudge spacer images for the feeds that don’t have associated icons. This makes the FootNotes feed look a bit odd (the image is stuck out a bit by the spacer).
Do I strip images with align attributes? Or maybe the ones with text-align styles? Should I just strip all of them out? Mail me if you have any bright ideas.
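One middle ground might be to keep the images but throw away the presentational attributes. Here is a minimal sketch using Python’s standard html.parser; the class name, the function name, and the choice of which attributes to drop are all my own assumptions, and it is not a security sanitiser.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical policy: attributes that only carry presentation.
DROPPED_ATTRS = {"style", "align"}


class PresentationStripper(HTMLParser):
    """Re-emit markup unchanged except for the dropped attributes."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []

    def _emit_start(self, tag, attrs, closing=""):
        parts = [tag]
        for name, value in attrs:
            if name in DROPPED_ATTRS:
                continue
            if value is None:
                parts.append(name)  # bare attribute with no value
            else:
                parts.append('%s="%s"' % (name, escape(value, quote=True)))
        self.out.append("<" + " ".join(parts) + closing + ">")

    def handle_starttag(self, tag, attrs):
        self._emit_start(tag, attrs)

    def handle_startendtag(self, tag, attrs):
        self._emit_start(tag, attrs, " /")

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Text arrives decoded, so encode it again on the way out.
        self.out.append(escape(data, quote=False))


def strip_presentation(fragment):
    parser = PresentationStripper()
    parser.feed(fragment)
    parser.close()
    return "".join(parser.out)


print(strip_presentation('<p style="color:red">Hi <img src="a.png" align="right"></p>'))
# prints: <p>Hi <img src="a.png"></p>
```

The screenshots survive, but so does everything else that isn’t style or align (comments are silently dropped), so this only addresses the layout abuse, not the wider question of what belongs in a feed.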
Maybe we shouldn’t be syndicating encoded snippets of HTML at all.
