Just a Theory

By David E. Wheeler

Posts about Atom

Atom Sources

I’m working on a project where I aggregate entries from a slew of feeds into a single feed. The output feed will be a valid Atom feed, and of course I want to make sure that I maintain all the appropriate metadata for each entry I collect. The <source> element seems to be exactly what I need:

If an entry is copied from one feed into another feed, then the source feed’s metadata (all child elements of feed other than the entry elements) should be preserved if the source feed contains any of the child elements author, contributor, rights, or category and those child elements are not present in the source entry.

<source>
  <id>http://example.org/</id>
  <title>Fourty-Two</title>
  <updated>2003-12-13T18:30:02Z</updated>
  <rights>© 2005 Example, Inc.</rights>
</source>

That’s perfect: It allows me to keep the title, link, rights, and icon of the originating blog associated with each entry.
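For what it's worth, here is a rough sketch of how that copying step might look in Python, using only the standard library's `xml.etree.ElementTree`. The helper names (`q`, `entries_with_source`), the sample entry, and its ID are all made up for illustration, and the rule of leaving alone any entry that already has a `<source>` is my own assumption, not something lifted from an existing aggregator.

```python
import copy
import xml.etree.ElementTree as ET

ATOM_NS = "http://www.w3.org/2005/Atom"
ET.register_namespace("", ATOM_NS)  # serialize Atom as the default namespace


def q(tag):
    """Qualify a tag name with the Atom namespace (hypothetical helper)."""
    return f"{{{ATOM_NS}}}{tag}"


def entries_with_source(feed):
    """Return copies of the feed's entries, each carrying a <source>
    element built from the feed-level metadata."""
    # Feed metadata is every child of <feed> other than the entries.
    metadata = [child for child in feed if child.tag != q("entry")]
    result = []
    for entry in feed.findall(q("entry")):
        entry = copy.deepcopy(entry)  # leave the parsed feed untouched
        # Assumption: an entry that already has a <source> was copied
        # from somewhere else, so its original source is kept.
        if entry.find(q("source")) is None:
            source = ET.SubElement(entry, q("source"))
            for child in metadata:
                source.append(copy.deepcopy(child))
        result.append(entry)
    return result


# Sample input: the feed metadata from the example above plus one
# invented entry.
SAMPLE_FEED = """<feed xmlns="http://www.w3.org/2005/Atom">
  <id>http://example.org/</id>
  <title>Fourty-Two</title>
  <updated>2003-12-13T18:30:02Z</updated>
  <rights>© 2005 Example, Inc.</rights>
  <entry>
    <id>http://example.org/entry/1</id>
    <title>Sample entry</title>
    <updated>2003-12-13T18:30:02Z</updated>
  </entry>
</feed>"""

feed = ET.fromstring(SAMPLE_FEED)
copied = entries_with_source(feed)
print(ET.tostring(copied[0], encoding="unicode"))
```

Each copied entry comes out with its own full `<source>` block, which is exactly the per-entry duplication I grumble about next.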

Except, maybe it’s the database expert in me, but I’d like it to be more normalized. My feed might have 1000 entries in it from 100 sources. Why would I want to dupe that information for every single entry from a given source? Is there no better way to do this, say to include the source data once and have each entry reference only the source ID? That would make for a much smaller feed, I expect, and a lot less duplication.

Is there any way to do this in an Atom feed?

Looking for the comments? Try the old layout.

Teasers Only Atom Feed

I’ve just added a new feed: teasers only. It makes things a lot shorter for those who just want a teaser for each blog entry rather than the complete entry, such as Planet Perl and Planet PostgreSQL.

Any questions or problems? Leave a comment. Thanks!

Has Google Forgotten its Own Tagline?

My friend Chad Dickerson, the exiting CTO of InfoWorld, has blogged about a recent move by Google to patent advertising in RSS!

Incorporating targeted ads into information in a syndicated, e.g., RSS, presentation format in an automated manner is described. Syndicated material e.g., corresponding to a news feed, search results or web logs, are combined with the output of an automated ad server. An automated ad server is used to provide keyword or content based targeted ads. The ads are incorporated directly into a syndicated feed, e.g., with individual ads becoming items within a particular channel of the feed.

This despite the fact that InfoWorld was itself sending targeted ads out in its RSS feeds before Google filed for its patent! Is this another one-click debacle in the making? Does it really make any sense to patent delivering targeted ads over HTTP just because they’re in XML instead of HTML?

What do you think?
