06.13.05

Tags: Can they be more?

Posted in Tags at 6:39 pm by Todd

I spent some time today looking at tags. In comparison with formal ontologies, tags are really appealing because there’s no initial hump to get over before you can start using them. (Some have pointed out that even the low barrier to entry posed by tags is too high for many: a large fraction of photos uploaded on Flickr aren’t tagged at all.)

Of course, this simplification comes at a cost. Because there’s no ontology, semantics may emerge but they’re not explicit. So, in general, one user’s precise meaning in attaching a tag may be at odds with the meaning attached by another user. This lack of crispness simply comes with the territory, and introducing ontologies into this arena would solve the problem only by destroying the attraction of tags: their simplicity.

There’s been a lot of discussion about tag clouds, but frankly, other than the amusing visual representations, I don’t see much utility. I wonder, though, whether there’s a more interesting statistical analysis of tag co-occurrence that might reveal useful semantics. It seems like it’d be possible to use Delicious’ API to pull out a bunch of tags to do some analysis.
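To make the co-occurrence idea a bit more concrete, here’s a minimal Python sketch. It assumes the tags have already been fetched somehow; the `bookmarks` sample and the `cooccurrence_counts` helper are made up for illustration, and nothing here calls the Delicious API.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sample data: each set holds the tags one user put on one bookmark.
bookmarks = [
    {"semanticweb", "rdf", "ontology"},
    {"rdf", "xml"},
    {"semanticweb", "rdf"},
    {"microformats", "xhtml", "semanticweb"},
]

def cooccurrence_counts(tag_sets):
    """Count how often each unordered pair of tags appears on the same item."""
    counts = Counter()
    for tags in tag_sets:
        # Sorting makes ("rdf", "xml") and ("xml", "rdf") the same key.
        for pair in combinations(sorted(tags), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(bookmarks)
print(counts.most_common(1))  # [(('rdf', 'semanticweb'), 2)]
```

Pairs that show up together far more often than chance would predict are the candidates for “emergent” semantics; raw counts like these are only the first step.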

This post raises an interesting question regarding users’ mental models for tags: is a tag a folder? Or is a tag an annotation? The author posits that Delicious encourages users to think of tags as being the former, while Flickr encourages the latter. He cites a single piece of data from the way his own blog had been tagged by users to suggest that Flickr’s “annotative” model is winning the battle for mindshare.

This is significant because Flickr’s annotative model supports a potential extension for tags that isn’t so well supported by Delicious’ “categorization” model. Namely, I wonder what might happen if we extended the definition of tags so that they are actually user-defined attributes which can have user-defined values. Suppose, for example, that you could tag something as “age: 30.” Would people actually do that, and what kinds of additional expressive power would be unleashed? What kinds of search and aggregation would be enabled?
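As a rough sketch of the kind of search and aggregation that attribute-style tags might enable, consider the Python below. The “name: value” syntax, the sample `items`, and the helpers `parse_tag` and `index_attributes` are all assumptions for illustration; no existing tagging service is known to work this way.

```python
def parse_tag(raw):
    """Split "name: value" into (name, value); plain tags get value None."""
    name, separator, value = raw.partition(":")
    if not separator:
        return raw.strip(), None
    return name.strip(), value.strip()

# Hypothetical tagged items mixing plain tags and attribute-style tags.
items = {
    "photo-1": ["portrait", "age: 30"],
    "photo-2": ["age: 45", "beach"],
    "photo-3": ["portrait"],
}

def index_attributes(tagged_items):
    """Build {attribute name: {item id: value}} from attribute-style tags only."""
    index = {}
    for item_id, tags in tagged_items.items():
        for raw in tags:
            name, value = parse_tag(raw)
            if value is not None:
                index.setdefault(name, {})[item_id] = value
    return index

index = index_attributes(items)

# Search: which items carry an age under 40?
under_40 = sorted(i for i, v in index["age"].items() if int(v) < 40)

# Aggregation: the average of all age values.
average_age = sum(int(v) for v in index["age"].values()) / len(index["age"])

print(under_40)     # ['photo-1']
print(average_age)  # 37.5
```

Note that the sketch quietly assumes every “age” value is an integer. With user-defined values there’s no such guarantee, which is the same lack of crispness discussed above turning up in a new place.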

06.07.05

Writing Microformat Parsers

Posted in Microformats at 2:16 pm by Todd

The embedded microformat example from my previous post got me thinking about different approaches to writing a parser to consume microformats.

A month ago, there were some comments about different parsing approaches here. In thinking about the embedded example above, it’s clear that different approaches can lead to different results (beyond, of course, differences in performance).

You might, for instance, take the approach of using a regular expression to identify a node that indicates the start of a target microformat as Tantek suggested. Once you’ve identified an XML node that contains a microformat, you still have a couple of options.

In one approach, you simply query against the subtree rooted by the identified start node looking for values that are interesting. This is simple code to write but potentially gets you the wrong result — in the embedded example above, for instance, your query might return the wrong “url.”
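Here’s a minimal sketch of that failure mode in Python, using the standard library’s `xml.etree.ElementTree`. The markup is a hypothetical event whose location is an embedded hCard, the example.org address is a placeholder, and `has_class` and `naive_url` are illustrative names rather than anybody’s real parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the event has no url of its own, but the hCard does.
MARKUP = """
<div class="vevent">
  <span class="summary">Web 2.0 Conference</span>
  <span class="location">
    <span class="vcard">
      <a class="url fn org" href="http://example.org/hotel">Argent Hotel</a>
    </span>
  </span>
</div>
"""

def has_class(element, name):
    """True if `name` is one of the element's space-separated class tokens."""
    return name in element.get("class", "").split()

def naive_url(event_root):
    """Return the href of the first descendant carrying class "url"."""
    for element in event_root.iter():
        if has_class(element, "url"):
            return element.get("href")
    return None

event = ET.fromstring(MARKUP)
print(naive_url(event))  # http://example.org/hotel
```

The query has no notion of which microformat the matching element belongs to, so the hotel’s address is reported as though it were the event’s.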

The other approach, which is a bit more complex because it requires implementing a state machine, actually traverses the subtree looking for interesting bits. If it gets to something it doesn’t understand (or, more likely, something it isn’t interested in), it keeps going. If it does find something that it’s interested in, it dives in to extract the relevant value.

This is where there’s a subtle interaction with the embedding example cited above. If the hCard embedded in the hCalendar microformat is bound to a known property of hCalendar (in the example, hCard is bound to hCalendar:location), then my parser will probably not get confused about the hCard:url property because it has enough state to know that it’s processing a known hCalendar property. Thus my hCalendar parser doesn’t really have to know much of anything about hCard, which is a bit of a relief.
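A sketch of the traversal approach for the bound case might look like the Python below. It is not a full state machine, just a recursive walk that stops descending once it has recognized an hCalendar property. The markup and the example.org addresses are hypothetical, and `HCALENDAR_PROPERTIES` lists only the few properties this sketch cares about.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup; the hCard is bound to the event's "location" property.
MARKUP = """
<div class="vevent">
  <a class="url summary" href="http://example.org/conference">Web 2.0 Conference</a>
  <span class="location">
    <span class="vcard">
      <a class="url fn org" href="http://example.org/hotel">Argent Hotel</a>
    </span>
  </span>
</div>
"""

HCALENDAR_PROPERTIES = {"summary", "location", "url"}

def text_of(element):
    """Collapse all text inside an element into one normalized string."""
    return " ".join("".join(element.itertext()).split())

def parse_event(event_root):
    """Walk the event subtree, extracting known properties as they are met."""
    found = {}

    def walk(element):
        tokens = element.get("class", "").split()
        matched = [t for t in tokens if t in HCALENDAR_PROPERTIES]
        if matched:
            for prop in matched:
                value = element.get("href") if prop == "url" else text_of(element)
                found.setdefault(prop, value)
            return  # a known property owns its subtree, so don't look inside
        for child in element:
            walk(child)

    for child in event_root:
        walk(child)
    return found

event = parse_event(ET.fromstring(MARKUP))
print(event["url"])       # http://example.org/conference
print(event["location"])  # Argent Hotel
```

Because the walk returns as soon as it recognizes “location,” it never reaches the hCard’s “url,” and it never needed to know that “vcard” means anything at all.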

If, however, the hCard is not bound to any of hCalendar’s properties — it’s merely inside it but not explicitly embedded in some known property of hCalendar — then I’ve got a potential problem. Either I have to know about hCard’s definition or I’m going to misinterpret hCard’s url as an hCalendar url.
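The unbound case can be sketched the same way. In the hypothetical markup below the hCard sits directly inside the event, and the parser only gets the right answer if it is told which class names mark the root of a foreign microformat. The `foreign_roots` parameter is an assumption of this sketch, standing in for “knowing about hCard’s definition.”

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the hCard is inside the event but bound to no property.
MARKUP = """
<div class="vevent">
  <span class="summary">Web 2.0 Conference</span>
  <span class="vcard">
    <a class="url fn org" href="http://example.org/hotel">Argent Hotel</a>
  </span>
</div>
"""

def find_event_url(event_root, foreign_roots=frozenset()):
    """Return the first "url" found, skipping subtrees rooted at foreign_roots."""
    def walk(element):
        tokens = element.get("class", "").split()
        if foreign_roots.intersection(tokens):
            return None  # another microformat starts here; stay out of it
        if "url" in tokens:
            return element.get("href")
        for child in element:
            result = walk(child)
            if result is not None:
                return result
        return None

    for child in event_root:
        result = walk(child)
        if result is not None:
            return result
    return None

event = ET.fromstring(MARKUP)
print(find_event_url(event))             # http://example.org/hotel
print(find_event_url(event, {"vcard"}))  # None
```

Without the hint, the hotel’s url is mistaken for the event’s; with it, the parser correctly reports that the event has no url. The catch is that the hint only covers microformats the parser’s author already knew about.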

But I wonder: why would someone embed an hCard inside an hCalendar without binding it to (i.e., embedding it inside) one of hCalendar’s properties? What would such embedding mean? If there’s no real reason to do this (because it doesn’t really mean anything) then the problem fairly evaporates, I think.

Is this right?


06.06.05

Head-Spinnin’ on MicroFormats

Posted in Microformats at 11:50 am by Todd

[Update: 6/6/2005] Corrected some syntactic mistakes in the embedded microformat example. Apologies to any who got caught in the crossfire.

Okay, discussion on my most recent post about MicroFormats has left me dazed and confused. Do I need to adjust my medication? Probably. But I list my points of bewilderment below. Somebody, please help!

  • Tantek seems to be saying that this discussion is the misguided theoretical contemplation of microformat-bashing naysayers. I find this perplexing as everybody (your author humbly excluded pending the final adjustment of his meds; see above) involved in the discussion seems to be an intelligent, fairly ardent supporter of microformats trying to understand how to build systems around them today. As near as I can tell, we’re all in this discussion to see how to clear the path for more rapid adoption of microformats. Part of that involves looking at areas where there might be stumbling blocks, not to highlight them as reasons not to proceed but to understand them as areas that require caution, and perhaps invent solutions. None of this strikes me as antithetical to progress, but perhaps I’m missing something (is it 1 red pill and 2 blue ones, or 2 red ones and 1 blue one? Aaargh.).
  • Brian further explicates some of the issues regarding “url” appearing as a property in both hCalendar and hCard. But I am again confused when he writes “I agree that the URL in vCard IS the same URL in iCal.” I just realized that part of my problem is that I’m not sure I understand the theoretical example we’ve been discussing, where an hCard is embedded inside an hCalendar instance. Are we assuming that the hCard encapsulates information about the location of the event (the famous Argent Hotel in San Francisco in the canonical example)? Or is it the subject of the event (the Web 2.0 Conference)? I’ve been assuming that it’s the former, but I suppose it could be either. At any rate, there are (at least) two potential urls (one for the hotel and one for the conference), so it’s not clear to me what Brian means when he says they’re the same. I’m biting the bullet and writing an example to illustrate.
    
    
     
      Web 2.0 Conference: October 5-7, at the Argent Hotel, San Francisco, CA
    

    I have a sneaking suspicion that the example above is not structured the way that others have been thinking about it. In constructing it, I started to feel that I was on shaky ground in using an hCard to represent the hotel. Is that inappropriate? The Technorati Wiki refers to hCard as a representation for people and companies; the vCard spec says it’s for representing a “white-pages person object.” I’m assuming that, despite vCard’s apparent limitation of scope, people will (and do) use vCards for contact information for companies as well as people.

    Anyway, I’m hoping that in light of the example above, Brian will help me to understand what he means when he says that url in hCalendar and url in hCard are the same.
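To make the two candidate urls concrete, here’s a rough Python sketch that lists every “url” in an event together with the microformat root that most closely encloses it. The markup is a hypothetical rendering of the conference example, the example.org addresses are placeholders, and `urls_by_owner` is an illustrative helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: an hCalendar event whose location is an hCard.
MARKUP = """
<div class="vevent">
  <a class="url summary" href="http://example.org/conference">Web 2.0 Conference</a>:
  October 5-7, at the
  <span class="location">
    <span class="vcard">
      <a class="url fn org" href="http://example.org/hotel">Argent Hotel, San Francisco, CA</a>
    </span>
  </span>
</div>
"""

MICROFORMAT_ROOTS = {"vevent", "vcard"}

def urls_by_owner(element, owner=None, found=None):
    """Collect (nearest enclosing root, href) for every element with class "url"."""
    if found is None:
        found = []
    tokens = element.get("class", "").split()
    for token in tokens:
        if token in MICROFORMAT_ROOTS:
            owner = token  # entering a new microformat instance
    if "url" in tokens:
        found.append((owner, element.get("href")))
    for child in element:
        urls_by_owner(child, owner, found)
    return found

for owner, href in urls_by_owner(ET.fromstring(MARKUP)):
    print(owner, href)
# vevent http://example.org/conference
# vcard http://example.org/hotel
```

Read this way, the property name is shared but the two values plainly describe different things, which is the distinction I’m trying to pin down.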

Finally, I wanted to thank Ryan for pointing me to Douglas Clifton’s DRX. Although I haven’t been able to see it yet (site appears to be down), it’s always helpful to see how others are using and interpreting these ideas. And props to Brian for pointing me at the various brainstorming pages; somehow I had missed those.


06.03.05

MicroFormats Continued

Posted in Microformats at 3:36 pm by Todd

In response to my previous inflammatory post, I got a pair of good comments from Ryan King and Brian Suda. To recap, we were focusing on two areas where microformats might run into difficulty: inability to perform validation against a machine-readable profile, and namespace collisions.

Ryan suggests that microformat authoring applets are a good way to mitigate problems that might otherwise crop up due to lack of validation. If microformats aren’t being coded by hand, they’re more likely to be valid. Anybody following this debate has probably already seen these.

Brian points out that, even within the current handful of microformats, collisions in the property namespace are already a real problem. He shows that because hCard and hCalendar both use “url” as a property name (albeit in similar ways), someone parsing an hCalendar that happens to contain an embedded hCard is liable to misinterpret hCard’s “url” as belonging to hCalendar. Is there a way around this? Can I write a parser for a particular kind of microformat such that it can handle other embedded microformats that I’ve never seen before without choking if there’s a namespace collision?


06.02.05

Peer Production and Structure

Posted in Uncategorized at 1:11 pm by Todd

Spurred by Chao’s recent post, I rolled up my sleeves and tucked into Coase’s Penguin, a discourse on the emergence of Commons-Based Peer Production. CP was authored several years ago, but I’d never seen it before, and I have to confess that it was really rather mind-blowing. I had a dozen “aha!” moments, as CP really captures some core shifts in paradigm, changes that I’d seen and, perhaps, understood only intuitively. CP really helped to coalesce some of my thinking.
Read the rest of this entry »

06.01.05

MicroFormats: What’s their problem?

Posted in Microformats, Semantics at 4:06 pm by Todd

[Update 6/2/2005: Added tags]

In response to my previous post, Brian Suda provided some valuable commentary on the limitations of MicroFormats, especially as compared with RDF. Some of the discussion between us happened offline via email, but with Brian’s permission I am paraphrasing and summarizing here to get the discussion back into the public domain.

The purpose of this analysis was to gain an understanding of contexts in which using a MicroFormat might be successful as an easy-to-author, good-enough representation of structured data, and, in the same vein, understand situations in which using a MicroFormat would be an invitation to semantic disaster. The description of MicroFormats provides limited guidance here. Thus we begin probing the soft underbelly of MicroFormats:
Read the rest of this entry »

05.31.05

Semantic Web: Heated Debate

Posted in Uncategorized at 3:47 pm by Todd

Another provocative post on the value of the Semantic Web compared to the semantic web has spurred some more debate at Danny Ayers’ blog.

What’s clear from the discussion is that there’s still fairly serious disagreement about what the thrust of the Semantic Web is, as well as the practical limitations of its implementation via RDF/XML.

What I’d like to understand: what are the practical limitations of using MicroFormats? What do you lose by taking an approach that is based on MicroFormats instead of RDF? And if GRDDL provides a data migration path from MF to RDF, do those limitations really carry any weight?

05.27.05

Semantic Web: A Critique

Posted in General at 8:54 am by Todd

I just ran across this gem, which somewhat mercilessly shreds the idea of the Semantic Web as hopelessly at odds with the complexities of the world it is trying to represent. It concludes with this bit:

Much of the proposed value of the Semantic Web is coming, but it is not coming because of the Semantic Web. The amount of meta-data we generate is increasing dramatically, and it is being exposed for consumption by machines as well as, or instead of, people. But it is being designed a bit at a time, out of self-interest and without regard for global ontology. It is also being adopted piecemeal, and it will bring with it all the incompatibilities and complexities that implies. There are significant disadvantages to this process relative to the shining vision of the Semantic Web, but the big advantage of this bottom-up design and adoption is that it is actually working now.

Are MicroFormats the kind of “piecemeal” effort described above? I think so.

Stanford Search on TAP

Posted in General at 8:26 am by Todd

Slashdot this morning has an article on a project at Stanford’s Knowledge Systems Lab called Search on TAP, which gives an idea of what search could be like on the Semantic Web.

The site has been dropped to its knees under the Slashdot effect, but I hope to be able to take it for a test spin later today.

05.26.05

Structure on the Web: A Survey

Posted in General at 11:49 pm by Todd

Okay, having made a long-winded setup in previous posts, I want to delve into the real substance of the matter. If you accept the idea that adding structure (read: semantics) to content on the web will open up grand new possibilities, making content more accessible and useful, the question is: what’s the best approach? What follows is a brief survey of the various alternatives currently under development and discussion.
Read the rest of this entry »
