May 16, 2005

Tag Soup du Jour From YSDN

Ever since the Yahoo! Search Developer Network (YSDN) launched a few months ago, we've been keeping an eye out for trinkets, toys, and hacks that folks are building with our publicly accessible application programming interfaces (APIs). Open APIs make it possible for developers to create new applications based on Yahoo! Search content (types of data) and services (ways of sharing data).

Developers are busy writing plug-ins, smart utilities, Flash-based widgets, and a host of cool apps in their favorite web programming languages, using Yahoo!'s open web services, which include several flavors of web search as well as local search, image search, news search, and more. You'll find many of these interesting applications listed on the YSDN Wiki.

John Herren's Yahoo! News Tag Soup was definitely this past week's hot app on campus. Tag Soup is inspired by the growing popularity of tags for describing information. Herren's application riffs on rising interest in folksonomy or social tagging: a collaborative, non-hierarchical way to organize and display information by assigning freely chosen keywords to web sites, photos and digital images, blog posts, URLs, you name it. A self-described PHP fan, Herren spent a few hours over a recent weekend coding Tag Soup because he "thought it would be fun to see what happens when you automate the [tagging] process."

One notable by-product of tagging is a new mode of displaying weighted lists of keywords. This visual representation, referred to as a tag map or tag cloud, is turning up in many unexpected places. A tag cloud shows the importance or frequency of words through font size, bolding, or both. Usually tags are displayed in alphabetical order, sprawling fluidly across a squarish chunk of the page. Perhaps you've seen this horizontal list metaphor on Flickr, Technorati, or 43 Things.
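To make the idea concrete, here is a minimal sketch of how a tag cloud might map frequencies to font sizes. It is not taken from any of the sites mentioned: the function name `scale_tags`, the pixel range, the linear mapping, and the sample counts are all assumptions chosen for illustration.

```python
def scale_tags(counts, min_px=12, max_px=36):
    """Return (tag, font_size_px) pairs in alphabetical order.

    counts maps each tag to how often it occurred. Sizes are spread
    linearly between min_px and max_px (both hypothetical defaults).
    """
    if not counts:
        return []
    lo, hi = min(counts.values()), max(counts.values())
    span = hi - lo
    sized = []
    # Tag clouds are usually alphabetical, so sort case-insensitively.
    for tag in sorted(counts, key=str.lower):
        if span == 0:
            # Every tag is equally frequent: use a middle size.
            size = (min_px + max_px) // 2
        else:
            size = round(min_px + (counts[tag] - lo) * (max_px - min_px) / span)
        sized.append((tag, size))
    return sized


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    sample = {"United Nations": 3, "Microsoft": 1, "President Bush": 9}
    for tag, size in scale_tags(sample):
        print(f"{tag}: {size}px")
```

A linear mapping is the simplest choice. When one tag dominates the rest, a logarithmic mapping keeps the smaller tags easier to tell apart.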

For his Tag Soup recipe, Herren uses Yahoo!'s content analysis web service to extract significant words and phrases. (This type of term extraction also powers Y!Q, our contextual "embedded" search. Y!Q analyzes the content of the web page you're on or the text you select to provide results "at the point of inspiration.")

Next, Herren grabs a collection of Yahoo! News RSS feeds, massages them into a database to eliminate duplicate stories, and extracts the key ingredient (his tags) from the article headlines and summaries. Finally, he uses a simple scaling function to display the most popular and frequently occurring terms. Last time we looked, President Bush was far and away the biggest and boldest tag, followed by: United States, In Iraq, United Nations, and Microsoft. Go figure.
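The de-duplication and counting steps could look roughly like the sketch below. This is not Herren's code (his app is written in PHP and stores stories in a database); it is an in-memory approximation. The story fields (`link`, `title`, `summary`), the function name `count_tags`, and the `extract_terms` callable are hypothetical, and the callable stands in for whatever term extraction service is actually used.

```python
from collections import Counter


def count_tags(stories, extract_terms):
    """Count how many distinct stories mention each extracted term.

    stories: iterable of dicts with 'link', 'title', and 'summary' keys
             (an assumed shape for parsed RSS items).
    extract_terms: callable taking text and returning a list of terms;
             a placeholder for a real term extraction call.
    """
    seen_links = set()
    counts = Counter()
    for story in stories:
        # Treat stories with the same link as duplicates and skip repeats.
        if story["link"] in seen_links:
            continue
        seen_links.add(story["link"])
        text = f"{story['title']} {story['summary']}"
        # Count each term at most once per story.
        counts.update(set(extract_terms(text)))
    return counts


if __name__ == "__main__":
    # Toy extractor: looks for a few known phrases in the text.
    known = ("President Bush", "United Nations", "Microsoft")
    toy_extract = lambda text: [p for p in known if p in text]

    feed = [
        {"link": "a", "title": "President Bush visits United Nations", "summary": ""},
        {"link": "a", "title": "President Bush visits United Nations", "summary": ""},
        {"link": "b", "title": "Microsoft ships update", "summary": "President Bush comments"},
    ]
    print(count_tags(feed, toy_extract).most_common())
```

The resulting counts can then be passed to any scaling function that turns frequencies into font sizes.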

When tagging is practiced by a community of users in a social context--not just as a personal system for labeling information--network effects begin to take place. This is part of the enchantment of Flickr, where inventive tags like squaredcircle, longline, and lenstagged acquire a life and momentum of their own. Tags become a medium for ideas: they connect people and start conversations. This is already happening with Herren's Tag Soup.

While writing this post, we spotted Justin Flavin's tag hacks, inspired by Herren's work and the launch of BBC Backstage. (The Backstage motto is: Use our stuff to build your stuff.) Flavin, an Irish developer based in the UK, is fetching business and tech news RSS feeds from multiple news sources, and running them through Yahoo!'s content analysis (term extraction) service to generate his keywords. He's working on a similar hack for entertainment news.

John Herren describes his project as "a proof of concept to show the kind of cool stuff that can happen when folks, in this case Yahoo! opens up content and services for a developer to use." Justin Flavin's pages prove that cool stuff is catchy.

Meantime, we're keeping the Yahoo! News Tag Soup concept on a low simmer and giving it an occasional stir, because we believe its flavors can only improve over time and nourish other promising innovations. What do you think?

Havi Hoffman
Yahoo! Editorial

Comments

Jonas Luster also had an experiment with Yahoo Term Extraction. Of course I didn't think it was a good idea.

I've set up something similar at http://feeds.karlus.net/soup.php

It's based on an experimental site (http://feeds.karlus.net/) that tries to create feeds for some Portuguese news sites. It only gathers the headlines, not the full article.

One thing I noticed using the Yahoo! Search API is that Yahoo!'s 'content analysis web service' doesn't handle Portuguese 'stop words' very well. They show up in the results very often... or am I missing something? Right now I'm filtering them myself in the code.

Yeah, it probably is a language issue. I think we'll be rolling out better support for some non-English languages before too long.

Search Tabs on Yahoo frontpage are no longer displayed in new version of Safari 2.0

Sounds like this tag soup will be very useful once it is up and running.

Will this product tagging be applied to Yahoo shopping ?