06.01.05
MicroFormats: What’s their problem?
[Update 6/2/2005: Added tags]
In response to my previous post, Brian Suda provided some valuable commentary on the limitations of MicroFormats, especially as compared with RDF. Some of the discussion between us happened offline via email, but with Brian’s permission I am paraphrasing and summarizing here to get the discussion back into the public domain.
The purpose of this analysis was to gain an understanding of contexts in which using a MicroFormat might be successful as an easy-to-author, good-enough representation of structured data, and, in the same vein, understand situations in which using a MicroFormat would be an invitation to semantic disaster. The description of MicroFormats provides limited guidance here. Thus we begin probing the soft underbelly of MicroFormats:
You can’t validate a MicroFormat
In comparison with RDF schemas, XMDP is weak. There’s no standard way to specify datatypes for property values, nor can you specify anything about the relationships among properties. These two factors combine to make machine validation of MicroFormat content against XMDP an intractable problem. In comparison, validating RDF content against an RDF Schema is relatively straightforward.
Not being able to validate content is a problem, although my sense is that the need to validate increases in proportion to complexity. And by their nature, MicroFormats aren’t terribly complex. Still, with no validation, an author gets no feedback on whether they’ve done the right thing. Faced with an improperly constructed MicroFormat, some aggregators will choke, while others will use heuristics or other “out of spec” algorithms to arrive at the (hopefully) intended meaning. Without care here, we potentially wind up with divergence of MicroFormats based on what popular aggregators can digest, similar to the way that the practical implementation of HTML diverged for a while depending on how IE or Netscape handled certain tags. It seems an unlikely scenario, but without an authoritative facility for validation, it is possible.
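To make the validation gap concrete, here is a minimal sketch in Python of the kind of hand-rolled checking an aggregator would have to do on its own. Everything in it is an assumption for illustration: the rule tables (REQUIRED, CHECKS), the function names, and the choice of which properties to check are hypothetical and come from no MicroFormat spec or XMDP profile. It also assumes the properties have already been extracted into a dictionary.

```python
from datetime import datetime
from urllib.parse import urlparse


def is_iso_datetime(value):
    """True if the value parses as an ISO 8601 date or date-time."""
    try:
        datetime.fromisoformat(value)
        return True
    except ValueError:
        return False


def is_absolute_url(value):
    """True if the value has both a scheme and a host."""
    parts = urlparse(value)
    return bool(parts.scheme and parts.netloc)


# Hypothetical, hand-maintained rules. XMDP has no standard way to say any of this.
REQUIRED = {"summary", "dtstart"}
CHECKS = {"dtstart": is_iso_datetime, "url": is_absolute_url}


def validate_event(props):
    """Return a list of problems found in an already-extracted event."""
    problems = [f"missing {name}" for name in sorted(REQUIRED - set(props))]
    for name, check in CHECKS.items():
        if name in props and not check(props[name]):
            problems.append(f"bad value for {name}: {props[name]!r}")
    return problems


if __name__ == "__main__":
    print(validate_event({"summary": "Talk", "dtstart": "2005-06-01T20:55:00"}))
    print(validate_event({"dtstart": "next Tuesday", "url": "example"}))
```

Because rules like these live in each tool’s code rather than in a shared schema, two aggregators can easily disagree about what counts as valid, which is exactly the divergence risk described above.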
No namespaces
In contrast to RDF, MicroFormats live in a single, flat namespace. This implies limitations when you attempt to embed one MicroFormat inside another or otherwise mix ‘em up. Examples of this kind of compositionality are not too far-fetched; you could easily imagine someone writing a review of an event (hCalendar inside of hReview), or providing the contact information for the person responsible for an event (hCard inside of hCalendar). With its multiple namespaces, RDF handles this kind of mixture with aplomb, while MicroFormats get into some (potential) trouble. Additionally, since order is significant when multiple XMDP profiles are specified (the first one has precedence over subsequent ones), there’s another opportunity for things to go astray. (I believe, though, that this concern assumes machines will use XMDP to do something semantically “tight” with a document. By the reasoning above, we’ve already ruled out doing much useful machine-processing of XMDP other than noting its presence as an indication that there might be a MicroFormat of a certain type lurking within a document.)
No Inferences
RDF is well-suited to doing inferences, for what that’s worth. MicroFormats: non.
And finally…
Here’s a question I’ve posed before that I’ll ask again because I still don’t know the answer: where are the horizontal applications of RDF that implement something that would be foolhardy to implement as a MicroFormat?
Tags: microformats rdf
ryan king said,
June 2, 2005 at 8:55 pm
You’re right in saying that microformats can’t be completely validated. Validation can be useful as feedback for authors/creators, but is not, of course, necessary. Personally, I hope we can work towards building some good microformat tools that will help people author valid, useful microformat content (I’ve already done a couple).
Second, regarding namespaces… I don’t think they’re necessary. You claim that nesting of microformats could lead to a dangerous situation of potential conflicts. I disagree. I think the nestability of microformats is a great way to reuse simple building blocks. For example, in hReview, if you’re reviewing a business, the identifying data for that business should be presented as an hCard. I think nestability is a feature, not a bug.
brian said,
June 3, 2005 at 9:31 am
I’m certainly not advocating adding namespaces to microformats, but when nesting multiple different formats there could be collisions in property names. For example, URL is in both hCard and hCalendar. Take the following code for example:
1:<html>
2:<head profile="http://path.to/hCard http://path.to/hCal">
...
10:<div class="vevent">
...
20:<div class="vcard">
21: <a href="http://example.com" class="url">link 1</a>
22:</div>
...
30:</div> <!-- end vevent -->
...
40: <a href="http://example.org" class="url">link 2</a>
...
50:</html>
The head profile attribute is defined in the W3C spec:
http://www.w3.org/TR/REC-html40/struct/global.html#h-7.4.1
(user agents should consider the value to be a list even though this specification only considers the first URI to be significant).
So in the example above, the hCard profile would take precedence over hCal. Without some notion of scope, a problem arises on Line 21 because the class ‘url’ appears in both profiles. If you take local scope into consideration, the class=‘url’ is a child of vcard, so it is the url described by the hCard profile. But class=‘url’ is also a descendant of vevent, so should it also be considered the url defined by the hCalendar profile? Or do you go by the order of profiles in the head, where hCard would have more weight? And then on Line 40, what does the instance of class=‘url’ mean (if anything)?
If I extract a vCard from this example, the URL on Line 21 would be included. When I extract the iCal from this example, would/should the URL on Line 21 be extracted as well? (URL might be a bad example, but you could easily create a situation where a term in one profile does not have the same “meaning” as the same term in a different profile. A ‘title’ on a house is not the same as a ‘title’ on a page.)
Namespaces would certainly solve this, although I’m not advocating doing so. Another way to solve this would be to explicitly define what happens in this sort of situation in the microformat specs. A third, more complex alternative is to add another layer to XMDP and attempt to use ontologies to map that hCard:url is semantically equivalent to hCal:url.
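One way to see the ambiguity is to record, for each class=‘url’ element, which microformat root elements enclose it. The following is a rough sketch in Python using the standard library’s html.parser. The class and function names are made up for illustration, the sample markup is a trimmed, well-formed version of the example above, and the “nearest enclosing root wins” rule in nearest_root is only one possible convention, not something any microformat spec defines.

```python
from html.parser import HTMLParser

# Root class names from the example above.
ROOTS = {"vcard", "vevent"}
# Elements that never get a closing tag, so they must not be pushed on the stack.
VOID = {"br", "hr", "img", "input", "link", "meta"}

SAMPLE = """
<div class="vevent">
  <div class="vcard">
    <a href="http://example.com" class="url">link 1</a>
  </div>
</div>
<a href="http://example.org" class="url">link 2</a>
"""


class UrlScopeParser(HTMLParser):
    """Records each class="url" element with its enclosing microformat roots."""

    def __init__(self):
        super().__init__()
        self.stack = []  # one entry per open element: its root class, or None
        self.found = []  # (href, [enclosing roots, outermost first])

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if "url" in classes:
            scope = [root for root in self.stack if root is not None]
            self.found.append((attrs.get("href"), scope))
        if tag not in VOID:
            self.stack.append(next((c for c in classes if c in ROOTS), None))

    def handle_endtag(self, tag):
        # Assumes well-formed markup: every non-void start tag has an end tag.
        if tag not in VOID and self.stack:
            self.stack.pop()


def url_scopes(html):
    parser = UrlScopeParser()
    parser.feed(html)
    return parser.found


def nearest_root(scope):
    """Hypothetical tie-break: the innermost enclosing root owns the property."""
    return scope[-1] if scope else None


if __name__ == "__main__":
    for href, scope in url_scopes(SAMPLE):
        print(href, scope, nearest_root(scope))
```

For link 1 the sketch reports both vevent and vcard as enclosing roots, with vcard the nearest. For link 2 it reports no enclosing root at all, which matches the Line 40 question.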
I hope the example above illustrates how it COULD be dangerous, but also a benefit.
-brian
Information Overload » MicroFormats Continued said,
June 3, 2005 at 3:38 pm
[…] Posted by Todd on June 03rd 2005 to Microformats
In response to my previous inflammatory post, I got a pair of good comments from Ryan King […]