Yahoo! Chats with Semantic Web Expert, Ben Adida

  • Posted June 24th, 2008 at 8:30 am by Yahoo! Search
  • Categories: Interviews

Yahoo!’s plans to “open up” really started circulating at the beginning of this year. Not long after, Yahoo! Search announced its plans to support semantic markup, specifically crawler support for formats like RDFa and eRDF, and provided a glimpse into our open approach to search.

As Yahoo! prepares to support standards such as RDFa, we’ve continued to work closely with the best and brightest in the semantic markup community. We were thrilled to have Ben Adida visit the Sunnyvale campus a few weeks ago. Ben is a member of the faculty at Harvard Medical School and at the Children’s Hospital Informatics Program, as well as a research fellow with the Center for Research on Computation and Society at the Harvard School of Engineering and Applied Sciences. He is also the Creative Commons representative to the W3C and chair of the RDF-in-HTML task force, focusing on bridging the semantic and clickable webs.

Ben was kind enough to submit himself to a barrage of questions on RDFa, its development and the opportunities it provides. Take a look and feel free to drop questions you have in the comments. We’ll do our best to cycle them through to Ben.

Lawrence Kim, Yahoo! Search &
Peter Mika, Yahoo! Research

Yahoo! (Y!): RDFa has been long in the making… is it ready now?
Ben Adida (BA): Indeed it has been long in the making, and for good reason. We had to make sure we didn’t step on other specifications’ toes, that we respected existing design and uses of HTML, that we enabled the expression of enough flexible data to be useful in a number of current and future use cases, and that we had a valid processing model with test cases to help implementors.

We have all of that now. So yes, RDFa is ready. It has just been approved by the W3C as a Candidate Recommendation, with the specific text of the specification and a brand new Primer published on June 20th.

Y!: What can I do with RDFa?
BA: You can tell the world what various components on your web page mean by marking up things like:

  • The title of a photo
  • Your name and contact information
  • The license under which you’re distributing your latest MP3
  • The ingredients of a cooking recipe
  • The price of an item
  • A gene on which you recently wrote a paper
  • … Anything that you want to make more machine-readable

With RDFa, you can reuse existing concepts, e.g. the title and price of an item, no matter what that item is. If there’s a field you need that doesn’t exist, you can create it.

This level of granularity encourages you to mark up your content as fully as possible, while letting applications consume only as much of the data as they need.
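
A minimal sketch of a consumer that reads only the fields it cares about. The page fragment, the `ex:` vocabulary, and the `PropertyCollector` helper are hypothetical, and the parsing is deliberately simplified: it only looks at `property` attributes and ignores the rest of the RDFa processing rules.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the ex: vocabulary and all values are illustrative only.
SAMPLE = (
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" '
    'xmlns:ex="http://example.org/shop#">'
    '<span property="dc:title">Sunset over the bay</span> costs '
    '<span property="ex:price">12.50</span>'
    '</div>'
)


class PropertyCollector(HTMLParser):
    """Collect the text of elements that carry a `property` attribute.

    Simplified on purpose: no nesting, no `content` or `about` handling.
    """

    def __init__(self):
        super().__init__()
        self.found = {}
        self._current = None

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("property")
        if prop is not None:
            self._current = prop
            self.found[prop] = ""

    def handle_data(self, data):
        # Only keep text that sits inside a marked-up element.
        if self._current is not None:
            self.found[self._current] += data

    def handle_endtag(self, tag):
        self._current = None


def collect_properties(html):
    """Return a dict mapping each property name to its text value."""
    parser = PropertyCollector()
    parser.feed(html)
    parser.close()
    return parser.found


# An application interested only in titles can skip every other field.
all_fields = collect_properties(SAMPLE)
titles_only = {k: v for k, v in all_fields.items() if k == "dc:title"}
```

The page publishes both a title and a price, but this consumer keeps only the title and is unaffected by whatever else the publisher chooses to mark up.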

Y!: Who is supporting RDFa?
BA: Creative Commons and Digg are two early adopters of RDFa, and there are a number of smaller web publishers who have begun adding RDFa markup to their pages. We’ve also just heard that the UK National Archives are committed to adopting RDFa.

Y!: What advantages does RDFa provide compared to microformats, eRDF and AB Meta?
BA: Microformats, eRDF and RDFa share a common goal: to make it easy for HTML authors to add machine-readable tags to express the meaning of their web data. So before we get into a fight, it’s important to realize that all three share this important common goal.

Microformats work well for well-defined items, such as contact information (hCard) and calendar items (hCal). They tend to become more complicated when the data gets more varied. Fields can’t easily be shared across microformats, and all microformats must go through a centralized approval process to make sure no conflicts arise.

RDFa doesn’t have vocabulary conflicts: data fields, e.g. “title” can be reused by anyone, and there’s never any confusion as to what a given field means, since fields are, in fact, URLs. Entirely different types of data can share fields, which is exactly what applications need for extensibility. Multiple data items can be published on a single web page and, in contrast with microformats, relationships between the data items can be easily expressed.
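
The point that fields are URLs can be sketched in a few lines. This assumes the compact `prefix:name` notation and uses the Dublin Core element namespace; the `jobs` vocabulary and the `expand_curie` helper are hypothetical names introduced only for illustration.

```python
DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
JOBS = "http://example.org/jobs#"        # hypothetical vocabulary, illustration only


def expand_curie(curie, prefixes):
    """Expand 'prefix:name' into a full URL using a prefix-to-namespace map."""
    prefix, sep, name = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"unknown or missing prefix in {curie!r}")
    return prefixes[prefix] + name


prefixes = {"dc": DC, "jobs": JOBS}

# A photo and a song can share the very same field...
photo_title = expand_curie("dc:title", prefixes)
song_title = expand_curie("dc:title", prefixes)

# ...while a different kind of "title" lives at a different URL, so the two never collide.
job_title = expand_curie("jobs:title", prefixes)
```

Both `photo_title` and `song_title` resolve to the same URL, which is how unrelated types of data share a field, while `job_title` resolves to a different one even though the short name is identical.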

eRDF has a similar vocabulary approach to RDFa, but it cannot express nearly as much data as RDFa. In particular, expressing relations between multiple items on a page is more complicated, and describing inline PDFs or images is not always possible. Also, eRDF is not quite as modular: vocabularies can only be imported in the HEAD of a document, so a widget-ized page would have an easier time using RDFa over eRDF.

AB Meta, which is new to me, appears to be a small subset of the intersection between RDFa and eRDF. Because it is a limited subset, it suffers a bit from the limitations of microformats: who gets to extend AB Meta? I would recommend sticking to the collaborative efforts such as RDFa and eRDF.

If you need more complete expressivity and the modularity required in a widget-ized web world, then you need RDFa.

Y!: What would you say to the critics who say that RDFa is too difficult to author?
BA: It’s a matter of taste and finding the right compromise.

In my opinion, RDFa and eRDF have similar levels of complexity as far as authors are concerned. I prefer writing RDFa, and I’m sure Ian Davis prefers writing eRDF. But I don’t think either one of us would seriously argue that one is much easier than the other.

It’s a little bit more complicated to write RDFa than it is to write microformats, but that’s not surprising given that microformats are more limited in scope, and there are notable extensibility costs to using microformats.

In general, we expect that web publishers will write RDFa in HTML templates, rather than every time they have an item to publish. Most microformat deployments work this way, too: few people write them by hand each time. So the increased complexity is negligible in the bigger picture.
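
As a rough sketch of the template approach, assuming a publisher who renders items with Python's standard `string.Template`: the markup is written once and filled in per item. The `ITEM_TEMPLATE` layout and the `render_item` helper are hypothetical, and a real site would use whatever templating system it already has.

```python
from html import escape
from string import Template

# The RDFa attributes are written once, in the template.
ITEM_TEMPLATE = Template(
    '<div xmlns:dc="http://purl.org/dc/elements/1.1/" about="$url">'
    '<span property="dc:title">$title</span> by '
    '<span property="dc:creator">$creator</span>'
    '</div>'
)


def render_item(url, title, creator):
    """Fill the template for one item, escaping values for safe HTML output."""
    return ITEM_TEMPLATE.substitute(
        url=escape(url, quote=True),
        title=escape(title),
        creator=escape(creator),
    )


# Every published item gets the same markup with no extra authoring effort.
snippet = render_item("/photos/1", "Cats & dogs", "Ana")
```

The author of an individual item never touches the RDFa attributes; only the template author does.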

Y!: Unlike microformats, RDFa depends on the availability of shared vocabularies (ontologies). Is that a problem?
BA: A number of vocabularies are already available and particularly stable: Dublin Core for documents, FOAF for people and their networks, Creative Commons for document licensing, hAudio and hVideo for online media. Then there are highly specialized vocabularies, like Uniprot and the Open Biomedical Ontologies (OBO) for the life sciences.

In my opinion, this is a huge win for RDFa. You really want vocabularies developed by experts in the appropriate field. Bio-informaticians develop vocabularies for biomedical research, musicians develop vocabularies for music, and lawyers develop vocabularies for copyright licensing.

Y!: What’s next for RDFa?
BA: For the next few months, we’re going to focus on helping publishers produce RDFa and tool builders parse it correctly. Yahoo! is playing a pivotal role in this space with SearchMonkey. We hope to see Yahoo! properties publish RDFa soon!

Y!: Where can I learn more about RDFa?
BA: Our wiki has all the relevant material:

And you should join our brand new users’ mailing list:

  • Subscribe