06.18.05

Priming the Microformat Aggregation Pump

Posted in Microformats at 12:28 pm by Todd

The wonderful, liberating thing about microformats is that anyone, anywhere can author an instance. Whether you create by hand or use one of the ever-expanding set of tools, the fact that microformats are XHTML means that you can slip them — typically unnoticed by their host — into any system that accepts HTML. Planning an event? Simply drop an hCalendar instance into your blog, and voila!

Voila? Normally, that expression is accompanied by a rabbit being pulled out of a hat. It suggests that something magic and satisfying is going to happen. Having gone to the slight extra trouble of describing my event using hCalendar, what further benefit do I derive?

Well, there is at least one “voila!” effect that you can enjoy from your hCalendar instance: someone reading the blog post where you have encoded an event using hCalendar can use Brian Suda’s cool X2V to transform that hCalendar item into an iCalendar item, and thus get it into their desktop calendar. Eric Meyer shows how.
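To make that transformation concrete, here is a minimal sketch of the idea in Python. It is not how X2V itself works; it only illustrates pulling a few hCalendar properties out of a post and writing them as an iCalendar event. The sample markup, the function names, and the PRODID value are all made up for illustration, and the output omits properties (such as UID and DTSTAMP) that a strict calendar client may require.

```python
from html.parser import HTMLParser

# Hypothetical blog-post fragment containing one hCalendar event.
POST = """
<p>Come along!</p>
<div class="vevent">
  <span class="summary">Microformats Meetup</span>:
  <abbr class="dtstart" title="20050620T190000">June 20, 7pm</abbr> to
  <abbr class="dtend" title="20050620T210000">9pm</abbr> at
  <span class="location">Some Cafe, San Francisco</span>
</div>
"""

FIELDS = ("summary", "dtstart", "dtend", "location")
TEXT_FIELDS = ("summary", "location")


class HCalendarParser(HTMLParser):
    """Collects properties of the first vevent (void tags inside it are not handled)."""

    def __init__(self):
        super().__init__()
        self.event = {}
        self.depth = 0       # nesting depth inside the vevent element; 0 = outside
        self.done = False    # True once the first vevent has been closed
        self.current = None  # property whose text is being collected

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if self.depth == 0:
            if "vevent" in classes and not self.done:
                self.depth = 1
            return
        self.depth += 1
        for field in FIELDS:
            if field in classes:
                if tag == "abbr" and attrs.get("title"):
                    # The machine-readable value lives in the title attribute.
                    self.event[field] = attrs["title"]
                else:
                    self.current = field
                    self.event[field] = ""

    def handle_data(self, data):
        if self.current:
            self.event[self.current] += data

    def handle_endtag(self, tag):
        self.current = None
        if self.depth > 0:
            self.depth -= 1
            if self.depth == 0:
                self.done = True


def parse_first_event(html):
    parser = HCalendarParser()
    parser.feed(html)
    return parser.event


def escape_text(value):
    # Escape backslash, semicolon, and comma, as iCalendar TEXT values require.
    for ch in ("\\", ";", ","):
        value = value.replace(ch, "\\" + ch)
    return value


def to_icalendar(event):
    lines = ["BEGIN:VCALENDAR", "VERSION:2.0",
             "PRODID:-//example//hcalendar-sketch//EN", "BEGIN:VEVENT"]
    for field in FIELDS:
        if field in event:
            value = event[field].strip()
            if field in TEXT_FIELDS:
                value = escape_text(value)
            lines.append(f"{field.upper()}:{value}")
    lines += ["END:VEVENT", "END:VCALENDAR"]
    return "\r\n".join(lines) + "\r\n"


print(to_icalendar(parse_first_event(POST)))
```

The point is how little markup the author has to add: a handful of class names on elements that were going to be in the post anyway.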

That’s a cool private benefit, but I want more. And I believe there need to be more demonstrable, immediate, and compelling benefits before the virtuous cycle driving adoption of microformats becomes self-sustaining. We have yet to see something compelling for microformats like hCalendar, hCard, and hReview.

On the other hand, we have seen a compelling application for another microformat: relTags. By adding a little bit of markup to my blog post, my post appears on Technorati’s search results pages and tag pages in such a way that it’s much more likely to be seen by my targeted audience. I get more people reading my post, I get more comments, and that, dear reader, is worthy of a “voila!”
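For readers who have not seen it, the "little bit of markup" is just a link carrying rel="tag", where the last segment of the link's path names the tag. Below is a rough Python sketch of how an aggregator might read those tags out of a post. The sample post and the class name are hypothetical, stripping a trailing slash is my own tolerance choice, and I am not claiming this is how Technorati does it.

```python
from html.parser import HTMLParser
from urllib.parse import unquote, urlparse

# Hypothetical post body with two tag links.
POST = """
<p>Notes on event markup.</p>
<a href="http://technorati.com/tag/microformats" rel="tag">microformats</a>
<a href="http://technorati.com/tag/hcalendar" rel="tag">hcalendar</a>
<a href="http://example.org/about">about</a>
"""


class RelTagParser(HTMLParser):
    """Collects the tag names named by <a rel="tag"> links."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        rels = (attrs.get("rel") or "").split()
        href = attrs.get("href")
        if "tag" in rels and href:
            # The tag is the last path segment of the link target.
            path = urlparse(href).path.rstrip("/")
            self.tags.append(unquote(path.rsplit("/", 1)[-1]))


def extract_tags(html):
    parser = RelTagParser()
    parser.feed(html)
    return parser.tags


print(extract_tags(POST))
```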

The key difference between these two examples is aggregation. Microformats allow the stealthy, distributed deployment of semantically rich data, but there’s not much value to using them until someone is aggregating them, collecting them from the far reaches of the web into a large pile that can be searched, categorized, etc. As a consumer, I don’t want to have to go looking in a thousand blogs to find an hCalendar event that I can add to my desktop calendar.

But therein lies the rub. If you want to develop an application that aggregates a certain type of microformat from blog postings, you’re going to have to scan every blog posting. The vast majority of posts won’t contain what you’re looking for. And that would be fine if there weren’t so many posts to scan, with their number increasing exponentially. What kind of hardware and bandwidth do you need today to scan all new blog posts? I don’t know the answer to that question, but I think it’s safe to assume that if it’s reasonable now, it won’t be in 6 months. And the real kicker is that if you were to go looking for hCalendar events, today you’d likely find only dozens — if that many. What’s your cost per item aggregated? Much too high, I’d wager.

Andy Baio, the creator of upcoming.org, has already set up his service to produce hCalendar marked-up events based on information that users have manually entered into his database. Even though there’s no compelling reason for him to have done so (other than the private benefit described above), it was easy to do, so he did it. On the other hand, aggregating hCalendar events, which would be much more valuable, is something he’s waiting to do until hCalendar becomes more widely used. Why? Because it’s too hard and too expensive. But imagine if it were otherwise, and Andy added aggregation as an additional means for getting content into upcoming.org: I drop an hCalendar event into my blog and it shows up, minutes later, as an event in upcoming.org. Now that would be worthy of a “voila!” It is exactly the kind of benefit that’s needed to drive adoption.

So the question I pose is this: in the face of a rising number of blog posts, how do we reduce the cost of aggregating microformats so that more services can aggregate them? One thought is this: deploy a service tied into ping-o-matic that scans all new blog postings looking for microformats of a variety of types. When it finds a post containing one of those microformats, it turns around and pings a list of clients who are interested in that microformat type. So, for example, I drop an hCalendar event in my blog, my blog pings ping-o-matic, which in turn pings the “Microformat Router,” which scans the content, sees that it contains an hCalendar, and then pings upcoming.org, which retrieves the content and creates a new entry in upcoming.org’s database of event listings. Voila!
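Here is a rough Python sketch of what the core of such a router might look like, just to show how small the idea is. Everything in it is hypothetical: the class and function names are mine, the post's HTML is passed in directly rather than fetched after a ping, and "pinging" a client is stood in for by calling a Python function. Detection relies only on the root class names of each format (vevent, vcard, hreview) and on rel="tag" links.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Root class name of each microformat the router looks for.
ROOT_CLASSES = {"vevent": "hCalendar", "vcard": "hCard", "hreview": "hReview"}


class MicroformatDetector(HTMLParser):
    """Records which microformat types appear anywhere in a document."""

    def __init__(self):
        super().__init__()
        self.found = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        for cls in (attrs.get("class") or "").split():
            if cls in ROOT_CLASSES:
                self.found.add(ROOT_CLASSES[cls])
        if tag == "a" and "tag" in (attrs.get("rel") or "").split():
            self.found.add("rel-tag")


def detect_microformats(html):
    detector = MicroformatDetector()
    detector.feed(html)
    return detector.found


class MicroformatRouter:
    """Scans each new post once and notifies every client interested in what it found."""

    def __init__(self):
        self.subscribers = defaultdict(list)  # format name -> list of callbacks

    def subscribe(self, fmt, notify):
        self.subscribers[fmt].append(notify)

    def handle_ping(self, url, html):
        # In a real service the HTML would be fetched from `url` after the ping.
        found = detect_microformats(html)
        for fmt in sorted(found):
            for notify in self.subscribers.get(fmt, []):
                notify(url, fmt)  # stands in for pinging the client
        return found


# Usage: an events service subscribes to hCalendar and hears only about matching posts.
router = MicroformatRouter()
received = []
router.subscribe("hCalendar", lambda url, fmt: received.append((url, fmt)))
router.handle_ping("http://example.org/post/1", '<div class="vevent">...</div>')
router.handle_ping("http://example.org/post/2", "<p>No events here.</p>")
print(received)
```

Each post is parsed once no matter how many clients are subscribed, and a client only hears about posts that contain the format it asked for.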

Now, you’re probably wondering how this reduces the cost of aggregating content. True, someone still has to scan through all the content looking for microformats. But that only has to happen once; each new client application adds only marginal cost. Further, this is something that could easily be deployed by a company that’s already scanning through all new blog postings. Don’t make me name names.

Alternatively, this could be set up as an independent service, much like ping-o-matic, serving the common good. Licensing terms could be established so that client companies that successfully make use of the aggregation service subsidize its cost of operation. Either way, the operating cost should eventually be recoupable from the resulting efficiency gains, and the availability of the service would really allow microformats to deliver on their ultimate promise of making content more useful and discoverable.

And that, too, would be worthy of a “Voila!” Any takers?

2 Comments

  1. brian said,

    June 19, 2005 at 4:31 pm

    i completely agree, imagine if you could go to google and type something like;

    postal-code:123456 dtstart:20050505 tech

    this could return results or an RSS feed of all events in zip code 123456 starting the 5th of may 2005 with the keyword “tech”

    you could have a social calendar aggregated from 8 billion pages!

  2. Todd said,

    June 19, 2005 at 4:43 pm

    Brian,

    Yeah, I think the possibilities really start to open up once you start aggressive aggregation (learn to love the alliteration).

    Upcoming.org already provides most of what you describe (events by area, live search results, RSS). But the utility of the service is limited, IMHO, because it requires someone to manually key in the details of the event. That’s a deal-killer, in my book.

    Set it up so I can just mark up an event on my blog and you’re in a whole different ballgame.
