Hadoop Now at the Heart of Every Yahoo! Search

  • Posted February 19th, 2008 at 3:45 pm by Yahoo! Search
  • Categories: Search

Those of you who listened to Yahoo!’s fourth-quarter earnings call may remember Sue Decker mentioning our embrace of open source infrastructure. On a related note, we’re announcing today that we have implemented what we believe is the world’s largest commercial application of Apache Hadoop. We are now using Hadoop to process the Webmap, the application that produces the index from the billions of pages crawled by Yahoo! Search. Matt McAlister posted today about the Hadoop implementation, including some numbers that will give you a feel for its scale.

Our Hadoop-based Webmap is part of Yahoo!’s larger strategy of moving toward openness, both in our infrastructure and throughout the network (our recent OpenID announcement is another good example). Using open source software is a win-win for Yahoo! and the wider community. We gain cost savings, faster processing, reduced maintenance, and increased scale, and the community benefits from the many improvements it took to make Hadoop viable for such a large-scale commercial implementation.

I’d like to thank the Hadoop and Apache communities, and reinforce our commitment to the open source world. We’re definitely standing on the shoulders of giants here! For more info on this announcement, check out Matt’s post and let us know what you think below.

Sean Suchter
VP, Yahoo! Search Engineering
