January 27, 2006

Questions for Andrei Broder re emerging search technology?

You may have heard some buzz about Andrei Broder joining Yahoo! as a research fellow and vice president of emerging search technology. Longtime search industry folks will know that Broder is a noted expert on the design, analysis, and implementation of algorithms for Web-scale information retrieval and applications. We’re pleased to host him for an exclusive Q&A for Yahoo! Search blog readers.

Broder is co-winner of the Best Paper award at WWW6 for his work on duplicate elimination of web pages and at WWW9 for his work on mapping the web. And here’s a big list of Andrei’s papers. Andrei also serves as chair of the IEEE Technical Committee on Mathematical Foundations of Computing and has recently been named a 2006 IEEE Fellow ‘for contributions to the theory and application of randomized algorithms’.

So, got a question for Andrei? We’ll be conducting the interview next week and posting it shortly thereafter.

Tara Kirchner
Yahoo! Search

PS. If you are annoyed by all these sites that ask you to copy some letters or numbers to prove you are not a bot, you can blame Andrei as well: he co-invented this challenge, back in 1998.

Comments

  1. During the late 1990s, AltaVista WAS the choice among professional searchers because of its relatively huge database (for that time)…

    Although there was a lot of spam, you would eventually find almost everything you needed by page 3 using Boolean operators…

    They admitted that the age of the site, title tags, meta tags, and keywords in body text (2 times max) were factored into their SERP algos…

    One interesting factor in its algos was the AGE of a site: the older the better, which helped a site tremendously in the SERPs…

    Around early 2000, there was a sudden, sharp, overnight decrease in relevancy. Although there were fewer spam sites to search through, the database was noticeably smaller…

    What is of grave concern is HOW the “Google” of its day was allowed to atrophy.

    Even now, Yahoo is not developing it any further!!

    AltaVista, had it been nurtured properly, could have been more than what Google is today, given its three-year head start!!!

    This concern must be answered and addressed honestly, with no PR B***T. This is a major mystery among search engine historians, so it is time to face the facts.

  2. Blog Search. It sucks. How is Yahoo going to make it better? Conversation based search? Comment search?

  3. 1. Currently the results for non-English (specifically, European local) sites are far below average; even MSN performs better. Are language-specific searches a problem?

    2. I am sure that, before you joined Yahoo, you thought about their search algorithms. What did you think of them and what did you want to change?

  4. Dr. Broder – How long will it take for you and your team at Yahoo to match Google in organic search relevance with Yahoo Search?

  5. Hi Dr. Broder.

    Thanks for the chance to ask a question or two. Yahoo was granted a patent earlier this week on categorization, training pages, and similarity. It sounds pretty exciting, and I’m sure that more interesting things will be coming from Yahoo! in the months to come. Is there anything new you can share with us in terms of web search that might shake things up?

  6. A two-part question:

    I’ve enjoyed reading your work very much and thanks for taking the time to speak with me on the phone every now and again.

    Your taxonomy of search paper certainly helped to classify the nature of search at that time. Reflecting on your thoughts then, compared to search query streams in 2006, what changes, if any, could you describe?

    And possibly connected…

    This is a general observation. Early search engines based their ranking methods around the vector space model which was very open to manipulation (text spam). Kleinberg and Page/Brin incorporated link analysis into the ranking mechanism, which made for more relevant results, but still left ranking open to types of manipulation (link spam). Now we hear a lot more about end user behavior data also being folded in.

    Can you give an indication of the type of end user data which can be used and may have an impact on the ranking mechanism?

  7. I know Nguyen may be against this, but I think blogs are the next big thing for search engines. The blogosphere is probably growing faster than any other place on the internet, so it makes sense for Yahoo to beef up its search there.

  8. Dr. Broder,
    I would like to ask a simple question:

    Could you explain in simple terms how you would identify so-called “splogs”, and especially sites recycling content lifted from elsewhere?

    The comment from “Gratis Kontaktanzeige” above is one example.

    I am not really interested in what you would prefer to do with these sites once identified, I am more concerned about the “how to” part.

  9. Tara, or to whom it may concern:

    ——————————
    PS. If you are annoyed by all these sites that ask you to copy some letters or numbers
    ——————————

    I am not annoyed by that, but if you push a submit button (any button) you should always get a confirmation. Yes, I am talking about this exact blog.

    Right now I don’t know whether my question for Dr. Broder went through or disappeared.

    I could try to re-enter it, but if it is stored somewhere already that would only make the entry a duplicate.

  10. Andrei, we missed you at IPAM’s Document Space workshop.

    Can you shed a bit more light on Yahoo!’s research in the area of embedding documents in the diffusion space? Specifically, what would the drawbacks be when it is applied to a structural directory, as opposed to a search engine or a sphere (blogosphere or newsphere)?

    Edel Garcia

  11. Thanks for all your questions folks! Stay tuned for the interview.

    Tara

  12. Hi Dr. Broder,
    What do you think about Google’s censorship in China?

  13. With all the focus on server-side search, I was wondering what, if anything, Yahoo is doing about using the context of a user’s activity, work, documents, and applications to help them find what they need.