March 28, 2006

Weather Report: Yahoo! Search Index Update

We completed an index update over the weekend. As a result, you may see some changes in ranking as well as some shuffling of the pages that are included in the index. You might also see a temporary spike in crawler activity. As is usual after an update, all this should stabilize soon. In the meantime, if you have any comments about the new index, please let us know.

Many thanks!

Priyank Garg
Product Manager
Yahoo! Search

March 26, 2006

New Tools in the Toolbar

We've been working hard on the Yahoo! Toolbar for IE and Firefox, so here's a quick update for you all.

On the IE front, we've added support for tabbed browsing (yay!). The latest release (v6.3) brings tabbed browsing to Internet Explorer 5 and 6, so you can tab to your heart's content ahead of the Internet Explorer 7 release.

On the Firefox side, we just released version 1.1.1 for all our supported countries. It improves several things you've told us about, including bookmarks, mail alerts, and Anti Spy.

Speaking of changes, you must have heard about del.icio.us by now! People use del.icio.us to save and share web favorites. We've just created a del.icio.us button for our US users. Add it to your IE or Firefox toolbar for easy access to your del.icio.us account from anywhere on the web. If you already have the toolbar installed, you can add it with a click; otherwise, find it on the Add/Edit Buttons page in the "Personal Tools" section.

Grab the latest toolbar from http://toolbar.yahoo.com and let us know if you have any suggestions.

Alwin Chan
Jon Granrose
Toolbar Product Managers

March 16, 2006

B-Ball Madness, Web Video Style…

These guys are hungry, that’s why we watch them. No contract disputes, no ego tirades – at least not yet – just pure college basketball and lots of it.

Starting this week, we’re pulling together as much NCAA tournament coverage as we can and featuring it on the Yahoo! Video home page so you can get your fix. All the upsets, buzzer-beaters and highlights from across the Web, as well as from our own Yahoo! Sports, right at your fingertips.

It's our way of adding to the distractions of office brackets and fantasy pools. So go ahead and play the clips over and over, memorize them, then run out to your driveway hoop and try your best to live the moment. It's OK, really. We'll be doing the same with our hoop.


(Photo: Cackhanded; flickr)

Ethan Fassett
Yahoo! Video

Know Any Good Engineers or Operations Managers?

One of the benefits of del.icio.us now being part of Yahoo! is that we can afford to hire more people to give the service the attention it needs to grow bigger, faster, and better. In fact, we're looking to beef up all of our "social search" efforts. That includes Yahoo! Answers, MyWeb, del.icio.us, and more.

Following that theme, we're hoping to use people (you) to find other people (more Yahoos). Specifically, we're looking for people in the following three roles:

  • Web Development (aka, front-end engineering): PHP, JavaScript, HTML, CSS, AJAX
  • Operations Management: monitoring, outages, reporting, hardware upgrades
  • Engineering Management: specifically with experience in Web Services, APIs, Databases, and Open Source

Are you interested, or do you know anyone you can recommend? Send me a resume at jzawodn@yahoo-inc.com.

Jeremy Zawodny

Update: If you've seen David Utter's commentary on this post, you may wish to also read my response.

March 15, 2006

A Chat with Andrei Broder (Part III)

A while back, Andrei Broder, a Yahoo! Research Fellow and Vice President of Emerging Search Technology, spent an afternoon telling us a bit about his decades-long history in the search industry and about his future projects. To wrap up the interview, Andrei shares his observations on several questions from Yahoo! Search blog readers.

So, several people asked how you feel about what happened to AltaVista…

AltaVista had almost perfectly bad timing: it started with a huge technology advantage but, at that time, an unsustainable business model, and it squandered its early lead in core search competency.

One of the questions from a reader was about your taxonomy paper. Can you talk a bit about that?

In it, I describe the three stages of search that I mentioned before. Web search started in the early-to-mid nineties, really as a scale-out of the classic information retrieval model. At that time, people were still trying to find the best way to adapt classic information retrieval to the scale of the Web: Boolean models, probabilistic models, etc. The second phase, in the late 90s, was about metadata: hyperlinks, labels, click-through data, all sorts of metadata, and the structure of the Web. But it was still very syntactic in nature, basically matching words against text, with no understanding of meaning. The third generation, still in progress, is about text semantics and analysis, where you start to understand what the queries are about. That's roughly where the paper stops. Now there are things like Yahoo! Shortcuts, and a lot of information that is derived from the meaning of the query. Semantic search, shortcuts, and local search all seem to be taking off. So it seems the paper was right, at the time, to predict semantic search as the next phase. Of course, if I were to expand the paper now, I would write about the fourth phase: information supply.

Have you looked at blog search? Why does it “suck”?

Blog search is difficult. If you look at web search in general, the biggest help comes from metadata: anchor text, links, web graph analysis, etc. For blogs, we have very little useful metadata. And even when you do have metadata for blogs, it is often wrong, or you can't trust it, so you use it very little.

Furthermore, context is not always there. A lot of blog posts are not self-contained; the context surrounds them. Even a human doesn't understand what's going on in a blog if dropped into the middle of it. I'm not sure how much progress we will see (but then again, I'm not focusing on that!).

Finally, there were some questions about spam.

Whatever information signal a search engine might use, spammers will try to pollute it. We need to be careful not only about link spam, fake-site spam, and so on, but also about pollution in the query log and other, more subtle sources of information. On the other hand, spamming is an economic game. People think spammers are kids up to no good, but that is not the case. Spam is about economics, and we want to raise the difficulty of spamming to the point where it's no longer economical to spam. As we move to a more personalized search experience, social aspects of search will play an increasing role. It remains to be seen to what extent this can be spammed. It is hard to make robots that behave like humans, so it might well be that the social aspects of search are fairly spam-resistant.

Thank you so much for your time, Andrei, and thanks to all our readers. Please leave us a comment below and let us know what you thought!

March 10, 2006

Achtung Maybe: Report from the ETech Attention Zone

O'Reilly's fifth Emerging Technology Conference wrapped up in San Diego on Thursday. The weather was unsettled and unseasonably cool, but it never put a chill on the flow of big ideas or diminished the quality of conversation in the conference rooms, carpeted corridors, or out on the breezy balcony. There were laptops and devices plugged into every available outlet, cameras flashed, people schmoozed in casual circles around pastel-colored inflatable armchairs, and grumbled (per usual) about the wifi.

In quiet corners, podcasters, bloggers, and reporters were documenting and commenting on the events of the day. Coverage was everywhere. There were plenty of meta-moments -- like when I looked over at the laptop screen of the guy sitting next to me and discovered I was in the middle of browsing his Flickr stream, a pictorial travelogue called 10,000 Miles to ETech.

O'Reilly's coterie of alpha geeks included entrepreneurs, hackers, futurists, designers, developers, journalists, professors, visionaries, storytellers, party people, and business leaders. For four days our collective mind focused on the challenges of managing limitless digital shelf-space, booming bandwidth, endless storage, and the cornucopia of content pouring forth from Web 2.0. We were even treated to glimpses and precursors of Web 3.0. If we could have harnessed all that intelligent attention, we could probably have powered a roomful of Roombas to filter more than dust.

This year the theme of ETech was attention and the emergent attention economy. Hint: Imagine designing a Noah's ark to ride out the information deluge. What tools, equipment, and supplies do we need? What can we buy and sell? What creatures do we bring on board to ride out the flood? People brought cool survival gear and new essentials to show and tell.

According to Clay Shirky, we need a pattern language for online community and conversation, page-level strategies that can embrace and accurately reflect opposing points of view. We need social spaces that amplify signal and dampen noise: A space to debate opposing views where we can participate or simply read along without fear.

Peter "Ambient Findability" Morville described why "a wealth of information creates a poverty of attention." And Linda Stone, who coined the term continuous partial attention in the late nineties, asked salient true/false questions, such as whether technology improves or harms our quality of life when we think of ourselves as live nodes on the network.

At ETech, Yahoo! was everywhere, going with the flow, giving and receiving attention, serving up Yahootinis at a memorable "mashup or shutup" party.

On Monday, Flickr's Cal Henderson led a sold-out, all-day session called Scaling Fast and Cheap - How We Built Flickr, and London-based Yahoo Simon Willison offered a tutorial called A (Re-)Introduction to JavaScript.

On Tuesday, Chief Product Officer Ash Patel introduced the Yahoo! Developer Network. Jeffrey McManus spoke about web services and Yahoo!'s participation platform. He introduced a collection of current and upcoming APIs, including the new Y! Shopping APIs recently mentioned here, the soon-to-be-released Y! Calendar APIs, and all the del.icio.us, Flickr, Maps, Search, and RSS goodness you're already hacking, mashing up, and building on.

On Wednesday, Bill Scott, Ajax evangelist and pattern shepherd, presented The Language of Attention, about the evolving interaction pattern library that we recently released. Later that day, Tom Coates tickled us and made us giggle (really!) with a talk titled "Native to a Web of Data: Designing a Part of the Aggregate Web." Danah Boyd spoke about the mysteries of MySpace, articulated what happens "When Global Information and Local Interaction Collide," and explained her iconic hat.

On Thursday, Bradley Horowitz elated the audience with a talk about innovation and social media at Yahoo!.

But don't worry if you couldn't be there. The best ideas, memes, applications, and ventures launched at ETech 2006 are likely to be defining new currents, carving new channels, and raising all boats by this time next year. So keep an eye on the rising tide.

Thank you for your kind attention.

Havi Hoffman

March 09, 2006

A chat with Andrei Broder (Part II)

Last week, we published the first of a three-part interview with Andrei Broder, Yahoo! Research Fellow and VP of emerging search technology for Yahoo! In today’s segment, we spend some time chatting with Andrei about what he means by “search without a box” and moving from information retrieval to information supply.

Where do you see web search being right now?

Some things still haven't been solved. If you look back at papers from the WWW conferences of the mid-90s on duplication, crawling strategies, web graph analysis, and so on, they are still relevant today. All the problems are still with us, and plenty of improvements are possible. In the same vein, if you look at cars, there are still technology improvements in steel, engines, and structural framing, but the focus of research is on hybrid cars and so on. For web search, I believe that the next stage for research is on the side of information supply and the integration of multiple sources.

Would you say we (as an industry) have made good steps since the beginning days of search?

Yes, absolutely. When AltaVista first came out, we needed three months to build a corpus of 30 million documents, and it had lots of duplicates and other problems. But in the early 90s, a corpus of 50,000 documents was big. Then "big" meant millions, and now "big" is tens of billions. The improvements weren't just quantitative but qualitative, and they made web search much better.

So, Andrei, where do we go from here?

My paper on the taxonomy of Web search talks about three generations of web search. I believe that we are now entering an entirely new phase. I call this next phase "search without a box". Search today is confined to putting something in and getting something back: a pull model. The next stage is for information to arrive in context without active searching: a push model. My favorite example is GPS. Instead of finding your way on a paper map, you are in your car, and your GPS navigator gives you directions, shows gas stations near you, and so on. A year or two from now, perhaps it will show you where those gas stations are, but only when you are low on gas. So you get information on an "as needed, when needed" basis without explicitly asking for it. In the same vein, we will move from information retrieval to information supply.

Is RSS like that?

Alerts are a form of information supply that answers recurring needs. What I'm talking about is more contextual. For example, advertising is a form of contextual information supply. The key is for the supply to be appropriate to the context. For instance, in a skiing magazine, ads for skis are a perfectly desirable form of content. Information supply as a science will continue to grow because of advertising.

And those are some of the things you are working on?

Yes, I am trying to understand how information supply will take shape; there is a fine line between annoying and useful. We also want users to help define their role in this. You have to understand the context, the user, and the social effects. If we understand what other people like you are doing, we can sometimes move from information retrieval to information supply by understanding the class of equivalent users. But we still do not have a theory of information supply, or a definitive model. It's a completely open area. It is not necessarily something we'll see next year, but it's the next stage.

In fact, we're already pretty good in some contexts, commerce sites for example. You go to a travel site, you do a search, and you find that the weather is nice or stormy or whatever; here are some hotels where you might like to stay, here are some things you might want to do, etc. That's already a case of information supply. But we have to figure out how to do it in less constrained contexts.

Essentially, we're going from 2.7 words per query to 0! How do we do that? There's a funny Dilbert cartoon about buying things online: instead of 1-click shopping, you have 0-click shopping. If you don't say no fast enough, Dogbert ships you something! (He laughs.) It's tricky; you need a lot of magic behind the curtain and a good UI to hide it. It's a good research direction.

Know where this cartoon is? Drop a comment below. Next week, in our third and final segment, Andrei fields several reader questions. Stay tuned!

March 07, 2006

Making Money with Shopping APIs, and More

Yahoo! is now accepting applications for a commercial version of our Shopping APIs featuring product search, price comparison, ratings & reviews, and shopping browse. The program, which is now in limited beta, is similar to an affiliate model in that it enables websites to share in income generated from providing shopping search and other services powered by Yahoo! Shopping on their sites. Whether you are building a local shopping mash-up, a next-generation social commerce destination, or you simply want to integrate shopping comparison features into your content site, this program is a major first step toward giving you tools to help you generate revenue.

Here’s an example from one of our first paid syndication partners, Family.com:


During this initial beta phase, we’re accepting a small number of applicants with special consideration given to sites with high traffic volume and fresh, unique content. But in the future, we plan to open this program to a larger audience. In the meantime, anyone can use our shopping APIs on a non-commercial basis.

Speaking of that, we’re also unveiling some useful enhancements to the Product Search API based on feedback from the developer community. The latest version includes:

  • Search narrowing options. For example, if you query the API for "digital cameras" in electronics, you'll now find not only the product results that match that search term, but also sub-category and attribute narrowing options such as brand, product line, and number of megapixels.

  • The ability to create a browse experience for your site based on Yahoo! Shopping’s comprehensive product hierarchy.

  • Various product image sizes that allow developers to choose between thumbnails and large photos of the merchandise.
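For readers curious how narrowing options like these can be computed, here is a minimal Python sketch of the general idea (often called faceted search): run the query, then count how many matching products carry each value of each attribute. The catalog, its field names, and the `search`, `narrowing_options`, and `narrow` functions are hypothetical illustrations, not the Product Search API's actual schema or calls.

```python
from collections import Counter

# Hypothetical in-memory catalog; field names are illustrative only and are
# NOT the schema returned by the Yahoo! Shopping Product Search API.
CATALOG = [
    {"title": "Camera One", "category": "digital cameras", "brand": "Brand A", "megapixels": 8},
    {"title": "Camera Two", "category": "digital cameras", "brand": "Brand A", "megapixels": 6},
    {"title": "Camera Three", "category": "digital cameras", "brand": "Brand B", "megapixels": 8},
    {"title": "Camera Four", "category": "digital cameras", "brand": "Brand C", "megapixels": 5},
    {"title": "Pocket Player", "category": "mp3 players", "brand": "Brand B"},
]


def search(catalog, query):
    """Return products whose title or category contains every query term."""
    terms = query.lower().split()
    return [
        p for p in catalog
        if all(t in f"{p['title']} {p['category']}".lower() for t in terms)
    ]


def narrowing_options(results, attributes):
    """For each attribute, count how many matching products have each value."""
    return {
        attr: Counter(p[attr] for p in results if attr in p)
        for attr in attributes
    }


def narrow(results, attr, value):
    """Keep only the results whose attribute equals the chosen value."""
    return [p for p in results if p.get(attr) == value]


if __name__ == "__main__":
    results = search(CATALOG, "digital cameras")
    options = narrowing_options(results, ["brand", "megapixels"])
    for attr, counts in options.items():
        print(attr, dict(counts))
    # Picking one facet value shrinks the result set, e.g. to Brand A only.
    print([p["title"] for p in narrow(results, "brand", "Brand A")])
```

Counting over the current result set, rather than the whole catalog, is what keeps each option relevant to the query; a large-scale service would typically precompute or index these counts instead of scanning every product.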

Head on over to the Yahoo! Developer Network for details and full documentation on these enhanced features.

Thanks, and please let us know what you think!

Jen Faenza
Senior Product Manager

Pinank Gogri
Technical Yahoo!

March 04, 2006

Searching out the video buzz on Hollywood’s biggest night

Here at Yahoo!, we work on bringing you lots of ways to tap the web for whatever interests you (movies, music, games, etc.), both on Yahoo! and from across the web.

This weekend, we’re featuring news, images, video and other related Oscar goodness for the 78th Annual Academy Awards, with a special Oscar site and videos you can find from across the web using Yahoo! Video Search. You can search for trailers for Brokeback Mountain, Munich, and Walk the Line as well as tons of other content, like movie trailer mashups.

And if you miss it, we’ll also feature highlights from the evening the day after. It's another way to take in Hollywood however, and whenever, you want – the best and worst dresses, weepy acceptance speeches, lifetime achievement awards, you name it.

Have a great weekend!

Yahoo! Video team

March 03, 2006

"Search without a box" - A chat with Andrei Broder (Part I)

A while back, we spent an hour interviewing a new colleague of ours, Andrei Broder. Andrei joins our talented team here at Yahoo! in the role of Yahoo! Research Fellow and Vice President of Emerging Search Technology. Andrei's decades-long career in search includes his time at AltaVista as vice president for research and chief scientist, and, as we noted before, Broder is co-winner of the Best Paper award at WWW6 for his work on duplicate elimination of web pages and at WWW9 for his work on mapping the web.

In this first segment of a three-part interview, we asked Andrei about his decision to come to Yahoo!, and generally got out of the way as we listened in on Andrei’s extraordinary relationship with search. We have combined the normal Q&A format with some audio for your listening pleasure.

Happy reading!
Tim, Jeremy and Tara


When it was announced that you were joining Yahoo!, you mentioned in an interview that you knew you'd be disappointing two-thirds of your friends. Why did you say that?

Well, the industry is pretty small, and I had offers from Yahoo! and the other big guys in search. I have many friends at all three, and no matter which one I chose, two-thirds of my friends would be unhappy that I didn't choose them!

So… why did you choose Yahoo!?

My background is research. People often ask what the difference is between research and advanced development. It's a very interesting question these days, because it used to be that research looked five years ahead and advanced development was much shorter term. That's no longer true, because the cycle has become so short. Research and advanced development are beginning to sync up.

But there is a fundamental difference: the goal of research is to advance the state of the art in the world. The entire research community together advances the state of the art. Companies such as IBM and Microsoft support research because the pie gets larger and everyone benefits. Yahoo! intends to pursue a similarly open approach to development, research, and publishing, and the research environment and goals at Yahoo! are more compelling to me right now.

Where were you before?

I was in New York, but I am very glad to be back in California. I was working in Hawthorne, just outside of Manhattan, and living in Riverdale; it was nice. There's no place like New York, culturally. And by the way, we have offices in New York; Yahoo! Research has an outfit there in the old HotJobs office.

What do you do outside of work?

I ski. I broke my shoulder skiing four years ago, and now that I’ve moved back to California, I’m ready to go skiing again!

Have you ever had an epiphany about your research or work while you were skiing?

Ha! Not while skiing, but while I was at AltaVista, I traveled a lot. On a trip from Rome to Zurich, I was writing email and doing the other things you normally do on a business trip, and seated next to me was a Korean-American girl, 9 years old, very talkative. She was asking me lots of questions: What do you do? What kind of computer is that? I told her I worked at AltaVista, and she said, "Oh, I know that: it's a search engine! But we are not allowed to use it." So a precocious 9-year-old knew what I was working on. That was pretty amazing. If I had said Digital or Compaq, she would have had no idea what I was talking about. That's the magic of the web.

At what point did you decide to get into search?

In this audio segment, Andrei talks about his graduate student roots, his advisor Don Knuth's impact on his future, and one of his earliest and best-known papers, on near-duplicates.


That’s it for today. Next week, Andrei talks about moving from information retrieval to information supply, and “search without a box.”

Hosting by Yahoo!