May 16, 2005

Tag Soup du Jour From YSDN

Ever since the Yahoo! Search Developer Network (YSDN) launched a few
months ago, we’ve been keeping an eye out for trinkets, toys, and hacks
that folks are building with our publicly accessible application
programming interfaces (APIs). Open APIs make it possible for
developers to create new applications based on Yahoo! Search content
(types of data) and services (ways of sharing data).

Developers are busy writing plug-ins, smart utilities, Flash-based
widgets, and a host of cool apps in their favorite web programming
languages, using Yahoo!’s open web services, which include several
flavors of web search as well as local search, image search, news
search, and more. You’ll find many of these interesting
applications listed on the YSDN Wiki
(http://developer.yahoo.net/wiki/index.cgi?ApplicationList).

John Herren’s Yahoo! News Tag Soup was definitely this past week’s hot
app on campus. Tag Soup is inspired by the growing popularity of tags
for describing information. Herren’s application riffs on rising
interest in folksonomy (http://en.wikipedia.org/wiki/Folksonomy) or
social tagging: a collaborative, non-hierarchical way to organize and
display information by assigning freely chosen keywords to web sites,
photos and digital images, blog posts, URLs, you name it. A
self-described PHP fan, Herren spent a few hours over a recent weekend
coding Tag Soup because he “thought it would be fun to see what happens
when you automate the [tagging] process.”

One notable by-product of tagging is a new mode of displaying
weighted lists of keywords. This visual representation, referred to as
a tag map or tag cloud, is turning up in many unexpected places
(http://www.amazon.com/gp/product/sitb-next/0385503865/ref=sbx_con/102-9557591-0066530?%5Fencoding=UTF8#concordance).
A tag cloud shows importance or frequency of word occurrences by font
size and/or bolding. Usually tags are displayed in alphabetical order,
sprawling fluidly across a squarish chunk of the page. Perhaps you’ve
seen this horizontal list metaphor on Flickr
(http://www.flickr.com/photos/tags/), Technorati
(http://www.technorati.com/tags), or 43 Things
(http://www.43things.com/).
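
For the curious, here is a minimal Python sketch of how a tag cloud like the ones described above might be rendered. It is not taken from any of the sites mentioned: the function names (`font_sizes`, `render_cloud`), the pixel range, and the linear scaling are all assumptions chosen for illustration.

```python
import html


def font_sizes(counts, min_px=12, max_px=36):
    """Map each tag's count to a font size in pixels (linear scaling).

    `counts` is a dict of tag -> number of occurrences. The pixel range
    is an arbitrary choice for this sketch.
    """
    if not counts:
        return {}
    lo = min(counts.values())
    hi = max(counts.values())
    span = hi - lo
    sizes = {}
    for tag, n in counts.items():
        # If every tag has the same count, use the smallest size for all.
        weight = (n - lo) / span if span else 0.0
        sizes[tag] = round(min_px + weight * (max_px - min_px))
    return sizes


def render_cloud(counts):
    """Return an HTML fragment with tags in alphabetical order."""
    sizes = font_sizes(counts)
    spans = [
        f'<span style="font-size:{sizes[tag]}px">{html.escape(tag)}</span>'
        for tag in sorted(sizes, key=str.lower)  # case-insensitive A-Z
    ]
    return " ".join(spans)
```

Calling `render_cloud` with a dictionary of tag counts yields one `<span>` per tag, sorted alphabetically, with the most frequent tag drawn largest. A logarithmic scale is a common alternative when a few tags dominate the counts.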

For his Tag Soup recipe, Herren uses Yahoo!’s content analysis web service to extract significant words and phrases. (This type of term extraction also powers Y!Q, our contextual “embedded” search. Y!Q analyzes the content of the web page you’re on or the text you select to provide results “at the point of inspiration.”)

Next, Herren grabs a collection of Yahoo! News RSS feeds, massages them into a database to eliminate duplicate stories, and extracts the key ingredient (his tags) from the article headlines and summaries. Finally, he uses a simple scaling function to display the most popular and frequently occurring terms. Last time we looked, President Bush was far and away the biggest and boldest tag, followed by United States, In Iraq, United Nations, and Microsoft. Go figure.
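
The feed-handling steps above can be sketched roughly as follows. This is not Herren’s code (he works in PHP and uses a database); it is a Python illustration that swaps the database for an in-memory set, assumes each story’s `<link>` is a good enough key for spotting duplicates, and takes the term extractor as a parameter. The `naive_terms` function is a hypothetical stand-in so the sketch runs on its own; it is not Yahoo!’s content analysis service and makes no attempt to imitate its results.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter


def parse_items(rss_xml):
    """Yield (link, title, description) for each <item> in an RSS 2.0 string."""
    root = ET.fromstring(rss_xml)
    for item in root.iter("item"):
        yield (
            (item.findtext("link") or "").strip(),
            (item.findtext("title") or "").strip(),
            (item.findtext("description") or "").strip(),
        )


def count_tags(feeds, extract_terms):
    """Count extracted terms across feeds, skipping duplicate stories.

    `feeds` is an iterable of RSS documents as strings; `extract_terms`
    is any callable that takes text and returns a list of terms.
    """
    seen_links = set()  # stands in for the database used for de-duplication
    counts = Counter()
    for rss_xml in feeds:
        for link, title, description in parse_items(rss_xml):
            if link in seen_links:
                continue  # same story already seen in another feed
            seen_links.add(link)
            counts.update(extract_terms(f"{title}. {description}"))
    return counts


def naive_terms(text):
    """Hypothetical placeholder extractor: lowercase words over three letters."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]
```

With real feeds, `count_tags(feeds, naive_terms).most_common(10)` would list the ten most frequent terms, which could then be passed to whatever scaling function sets the display size.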

When tagging is practiced by a community of users in a social
context, not just as a personal system for labeling information,
network effects (http://en.wikipedia.org/wiki/Network_effect)
begin to take place. This is part of the enchantment of Flickr, where
inventive tags like squaredcircle
(http://www.flickr.com/photos/tags/squaredcircle/), longline, and
lenstagged (http://www.flickr.com/photos/tags/lenstagged/) acquire a
life and momentum of their own. Tags become a medium for connecting
people, starting conversations, and transmitting ideas. This is
already happening with Herren’s Tag Soup.

While writing this post, we spotted Justin Flavin’s tag hacks
(http://www.justinflavin.com/item/18/tagsoup/tagsoup/tagsoup.php),
inspired by Herren’s work and the launch of BBC Backstage
(http://backstage.bbc.co.uk/). (The Backstage motto is: Use our stuff
to build your stuff.) Flavin, an Irish developer based in the UK, is
fetching business and tech news RSS feeds from multiple news sources
and running them through Yahoo!’s content analysis (term extraction)
service to generate his keywords. He’s working on a similar hack for
entertainment news.

John Herren describes his
project as “a proof of concept to show the kind of cool stuff
that can happen when folks, in this case Yahoo! opens up content and
services for a developer to use.” Justin Flavin’s pages prove that
cool stuff is catchy.

Meantime, we’re keeping the Yahoo! News Tag Soup concept on a low simmer and
giving it an occasional stir, because we believe its flavors can only
improve over time and nourish other promising innovations. What do you
think?

Havi Hoffman

Yahoo! Editorial

Comments

  1. Jonas Luster also had an experiment with Yahoo Term Extraction. Of course I didn’t think it was a good idea.

  2. I’ve set up something similar at http://feeds.karlus.net/soup.php

    It’s based on an experimental site (http://feeds.karlus.net/) that tries to create feeds for some Portuguese news sites. It only gathers the headlines, not the full article.

    One thing I noticed using the Yahoo! Search API is that the Yahoo! ‘content analysis web service’ doesn’t handle Portuguese ‘stop words’ very well. They show up in the results very often… or am I missing something? Right now I’m filtering them myself in the code.

  3. Yeah, it probably is a language issue. I think we’ll be rolling out better support for some non-English languages before too long.

  4. Search Tabs on Yahoo frontpage are no longer displayed in new version of Safari 2.0

  5. Sounds like this tag soup will be very useful once it is up and running.

  6. Will this product tagging be applied to Yahoo shopping ?