Our Blog is Growing Up - And So Has Our Index
It’s hard to believe it’s been nearly a year since we launched the Yahoo! Search Blog. On this anniversary I thought this would be a good time to update you on some of what we’ve been up to at Yahoo! Search. If you are a regular reader then you’ve seen lots of posts over the past year highlighting what we’ve been working on:
- Vertical search: Local, Video, Audio, Creative Commons, Subscriptions as well as ongoing updates to Image Search
- Personal & social search: My Web and My Web 2.0
- Flickr
- Y!Q, our contextual search technology
- Our award-winning Yahoo! Desktop Search
- Open search APIs across all our verticals via the Yahoo! Developer Network
But you will notice we haven’t talked much about plain old Web search. You know, the one that gets used billions of times a month by people all over the world. We don’t blog about it as often as our other products, but since it is the foundation for everything we do it’s always top of mind. Since our first post on the search blog was about Yahoo! Search, I thought I would give you an update on what’s been happening.
As those of you who follow this blog know, I recently posted a weather report alerting you to material changes to our index. Since that post, we’ve seen some discussion from webmasters who have noticed more of their documents in our index. As it turns out we have grown our index and just reached a significant milestone at Yahoo! Search: our index now provides access to over 20 billion items. While we typically don’t disclose size (since we’ve always said that size is only one dimension of the quality of a search engine), for those who are curious this update includes just over 19.2 billion web documents, 1.6 billion images, and over 50 million audio and video files. Note that as with all index updates we are still tuning things so you’ll continue to see some fluctuation in ranking over the next few weeks.
Ensuring you find what you’re looking for is the true measure of search engine quality and something we strive for every day. We measure quality in terms of RCFP (Relevance, Comprehensiveness, Freshness, and Presentation) and continue to work on improving those metrics. While we’re never satisfied, it is nice to see some of our efforts over the past year have been recognized, including winning the 5th annual Outstanding Search Service award from Search Engine Watch and our top position in the Search Engine Relevancy Challenge.
Going forward, I am most excited about the talent that we have on the team, including some notable new additions. Dr. Prabhakar Raghavan recently joined and is heading up Yahoo! Research Labs. We also just opened Yahoo! Research Labs - Berkeley and will be tapping into the world-class talent pool at U.C. Berkeley. Across labs and the Yahoo! Search team we will continue to explore new technologies in areas like information retrieval, machine learning, social search, and mobile search.
So what are you missing?
Chris Sherman over at Search Engine Watch wrote an article last week about the variance in search engine results among different providers. It was interesting to note how little overlap there is: nearly 85% of first page results are unique to one engine. Sort of makes you wonder what you are missing if you are stuck in a search engine rut…
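One way to make an overlap figure like this concrete is sketched below in Python. It assumes that "unique to one engine" means a URL appears on exactly one engine's first page; the article's actual methodology is not described here, and the function name `unique_share`, the engine labels, and the example URLs are hypothetical placeholders, not real results.

```python
from collections import Counter


def unique_share(results_by_engine):
    """Fraction of distinct first-page URLs that appear on only one engine.

    results_by_engine maps an engine label to its list of first-page URLs.
    (Hypothetical helper; the definition of "unique" is an assumption.)
    """
    appearances = Counter()
    for urls in results_by_engine.values():
        # Count each URL at most once per engine.
        appearances.update(set(urls))
    if not appearances:
        return 0.0
    unique = sum(1 for n in appearances.values() if n == 1)
    return unique / len(appearances)


# Placeholder data for illustration only.
example = {
    "engine_a": ["a.example", "b.example", "c.example"],
    "engine_b": ["a.example", "d.example", "e.example"],
}
print(unique_share(example))  # 0.8
```

In the placeholder data, five distinct URLs appear overall and four of them show up on only one engine, so the unique share is 0.8.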
Give us a spin at www.yahoo.com or search.yahoo.com and leave us a comment here to let us know what you think.
Tim Mayer
Yahoo! Search

The improvements in the SERPs Relevancy are quite obvious to professional Searchers….
The updates are fairly frequent - and new sites are being added faster than ever in Yahoo’s history.
The Organic ALGOs - however - although improved dramatically, still fall shy of Google in most queries - but in some (e.g. commercial services, webmaster tools) are clearly superior.
The less extreme emphasis on “Link Popularity” -and more balance with relevant Body Text - seems to be working in Yahoo’s favor.
The only concern is PLEASE - PLEASE - PLEASE - don’t let Altavista and AlltheWeb atrophy!!!
Tim,
Nice work … although as you correctly point out, quality matters more than quantity … and as a person who uses the search engines a LOT to find information, I appreciate your effort to provide relevant results - especially hard these days given all the crap that is out there.
The “Big G” has a rep in the industry of not being very communicative, so kudos to you for letting us know how it is going and keep up the good work - my impression as a hobbyist/observer of this industry is that Yahoo! has got the Mojo back! ;-)
alek
P.S. On a perhaps related note, you guys really oughta fix the “drop the trailing / for pathname URLs in the search engine results” issue. You guys are sending through your own redirector (MSN and Google don’t - why do you - maybe a future blog entry?) but you still drop the trailing “/” if there is a pathname - yea, this all works for the end-user, but generates a 301 redirect in the Apache logs since “www.domain.com/pathname” gets sent to “www.domain.com/pathname/” - a minor thing, but a nice polish that you oughta fix - MSN used to do this, but they fixed it a while back in their HREFs … although the screen display still drops it.
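The trailing-slash issue described in the P.S. above can be checked mechanically. The Python sketch below is only an illustration: the function name `dropped_trailing_slash` is hypothetical, and it assumes you already know the canonical directory URL and the URL a results page links to. It says nothing about how any search engine actually builds its links.

```python
from urllib.parse import urlsplit


def dropped_trailing_slash(canonical_url: str, linked_url: str) -> bool:
    """True if linked_url is canonical_url with its trailing "/" removed.

    Only non-root paths count: "http://host/" and "http://host" are
    treated as equivalent here. (Hypothetical helper for illustration.)
    """
    canon = urlsplit(canonical_url)
    linked = urlsplit(linked_url)
    same_site = (canon.scheme, canon.netloc, canon.query) == (
        linked.scheme,
        linked.netloc,
        linked.query,
    )
    return (
        same_site
        and canon.path != "/"
        and canon.path.endswith("/")
        and linked.path == canon.path[:-1]
    )


print(dropped_trailing_slash(
    "http://www.domain.com/pathname/",
    "http://www.domain.com/pathname",
))  # True
```

A link flagged this way is one a server configured for directory paths would typically answer with a 301 to the slash-terminated form, which is the extra log entry the comment complains about.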
From the above post:
“The less extreme emphasis on “Link Popularity” -and more balance with relevant Body Text - seems to be working in Yahoo’s favor.”
In complete agreement!!!
Google officially enters “evil” territory.
Check these memegraph results!
http://www.realmeme.com/Main/savinggoogle/index.jsp
Much like realmeme’s site, though Trendwatcher provides real-time graphs of search results on the top 4 engines, you can see a jump in just about all of them, and add your own too
internet Trendwatcher
I am missing News Search for Switzerland. We are NOT a part of Germany, I think some Yahoo managers haven’t got that. But for everything else, Thumbs up for your great work!
please guys, i’ve been watching this over the years.. yahoo and most other pages by your company (excluding flickr) have not set a background color. now what am i telling you? most people have white as default background color in their operating systems (and as such in their browser), so most people won’t see this. being a webdesigner myself (and for other reasons), i chose a light grey background. when i come to yahoo, the page is getting displayed in grey, which would be no problem.. if you hadn’t some boxes with rounded edges, that reveal that the page is meant to be on white background. i know many other websites with the same problem. this has been going on for years. in some cases i find it embarrassing for a company’s webmaster. please change!
Is it just me, or does search.yahoo.com look so much like google.com?
I mean, see how the links (Images, News, etc) are positioned on top of the search form, and how “Advanced Search” is on the right side of the search form. Even the Copyright notice looks so google-y.
I don’t dislike Yahoo!, but hey, maybe you guys can come up with a more original and distinct look.
The weakness in the results seems to be explained by the lack of the word “authority” among the metrics. Relevant, fresh… all that is less than icing on the cake, mostly unimportant. Focus on pages that “know what they are talking about”.
I agree with the above, all the links given:
My Web 2.0
Yahoo! Desktop Search
Image Search
Look like google clones, down to the typeface and layout. It’s just like MSN, under the impression that blindly copying the leader will make their search more attractive to people.
Google are top for a reason - they innovate and leave the waft of corporate bull**** to other companies whose marketing departments still believe this 1980/90s style of selling things, e.g. IT buzzwords, works.
I switched from Google to Yahoo Search full time a few months back and absolutely love it. 20 billion and relevant!!! You guys are on fire! Keep up the great work.
Hello Tim. Whilst noting that you point back to your last ‘weather report’, and whilst obviously being aware of PR issues, if you have read all the comments in said thread it would surely take world-beating spin abilities to interpret those comments - and, indeed, comments on other ‘biz’ forums - in anything other than a pretty negative light.
Great that you’re different, great that you’re seeming to keep cards further from your corporate chest than others, less great that names that we all know and recognise over the years are, without being hysterical, decrying these newer results as being desperately lacking in Quality Signals and, therefore, authority sites.
I know that this is just so much p*ssing against the wind, but, dog with a bone, I still can’t quite stop myself. I applaud a step back from links-’n-nothing-but-links, but pure on page just isn’t cutting it, and neither is a bigger index helping this. As I said elsewhere, keyword in domain, keyword in path, keyword in title are usually indicators of something, but not, bless your faith, the something you seem to have so emphatically decided upon.
Boy do I hope this shakes out. Good luck!
Wow! That is an impressive index. Size might not be everything but I’m sure it will give GOOG something to think about ;)
I find web search to be a very dangerous industry to be in. We all remember the days when AltaVista was the king of search, but those days are clearly numbered. Yahoo has obviously made great strides in the search technology, and if - at some point - you are able to overtake your somewhat more prominent rival, then we will move to Yahoo without hesitation. That day has not yet come though. The frequency with which Google crawls the web appears to be several orders of magnitude over its rivals. While Google crawls the Princeton CS departmental pages at least once a day (and even crawls all personal pictures once a month!), Yahoo and MSN crawl the personal websites about once a week. And Yahoo has yet to crawl any of the pictures. While Yahoo search has made much progress, it is equally clear that much catch-up work remains to be done. Good luck!
I love Yahoo very much. And always use Yahoo search. It is really fantastic. And Myweb 2.0 is really cool. I’ve found many great sites from here.
So, does this new index page count include the 20,000 or so that were de-indexed across my three sites about 1.5 months ago? They still aren’t showing…but creeping back up slowly.
I was very excited to see the new and improved yahoo search. As a webmaster I saw an immediate increase in search hits to my sites, which was good to see since it was devoid of yahoo before.
I am seeing a very large swing in some keywords from like position #5 to not top 200 on an almost daily basis. So hopefully this will calm down as you say.
On a related note - I know 1000s of webmasters around the world would love if yahoo provided some form of “page value” when linking to people’s sites/URLs. I don’t know if yahoo ever has a plan for this???
As I said before and will forever say, Yahoo rules!
nice job Tim… pretty good year for you folks.
keep up the great work :)
Great new milestone, Tim - congrats!
Roger
Wish you could fix the defective way you handle att member pages.
As a webmaster, Yahoo search user, the latest Yahoo search results have been the most irrelevant I have seen since MSN came out with their beta search version.
For the past 3 weeks, when searching for a variety of search terms, non-relevant sites appear more than I have ever seen in Yahoo. I am assuming that you are shaking things up, but I hope when it’s all settled in, we do see the same quality results as you have suggested above from Yahoo.
It’s good to know the index has grown so much, but what about the many Yahoo store owners such as myself whose indexed content has virtually dropped from the face of the planet during this last Yahoo index update?
My site dropped from 245 pages indexed pre July 20th to about 27 currently, with no obvious recovery in sight.
I have not seen any comments from Yahoo nor received any response to emails about when this will be rectified. An explanation is long overdue.
As a web dev my experiences with Yahoo! have been positive. Our sites are ranking with relevancy and our prefixed pages are getting in there too.
Congrats Yahoo!
Ron P.
In my own analysis (today) I found Google’s index to be significantly larger than Yahoo’s. I did a quick test by performing 20 queries on each engine. I then compared the unique result count and created a few ratios. On average for my queries, it seems like Google returns 50% more unique results - thus indicating a 50% larger index. You can read my full results on my blog.
Thoughts?
Looks like the previous comment got its links stripped. For those interested, the analysis is here:
http://blog.akashjain.org/2005/08/12/is-yahoos-index-really-bigger-methinks-not-really-googles-index-seems-50-larger/
meeps :)
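The ratio test described in the comments above can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the function name `average_count_ratio` is hypothetical, the numbers are made-up placeholders rather than measurements, and it assumes the result counts were collected by hand for each query. Averaging per-query ratios is one possible choice; the commenter's exact calculation is not given.

```python
from statistics import mean


def average_count_ratio(counts):
    """Mean of per-query ratios (engine A results / engine B results).

    counts maps a query to a pair (results_a, results_b). Queries where
    engine B returned nothing are skipped to avoid dividing by zero.
    (Hypothetical helper; not the commenter's actual method.)
    """
    ratios = [a / b for a, b in counts.values() if b > 0]
    if not ratios:
        raise ValueError("no query has a non-zero count for engine B")
    return mean(ratios)


# Placeholder counts for illustration only, not real search results.
sample = {
    "query one": (1500, 1000),
    "query two": (300, 200),
    "query three": (90, 60),
}
print(average_count_ratio(sample))  # 1.5
```

A mean ratio of 1.5 corresponds to "50% more results" on these placeholder numbers. Whether a ratio of returned results says anything about total index size is a separate question that depends on how the queries were chosen.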
I’ve always used Yahoo a lot more than Google. Terms always have seemed very relevant to my searches and I didn’t have to filter out the bad content that I received in Google.
Another reason I now like Yahoo so much is they rank my site so well. I created http://www.yourchildsnanny.com and within a couple weeks I was ranked in the first spot in many key words. My pages are filled with great content, but not a lot of links. Thank you for ranking us on good content and not our ability to do linking campaigns.
Focus efforts on these serps because this is just awful.
http://search.yahoo.com/search?p=ma+classifieds&fr=FP-tab-web-t&toggle=1&cop=&ei=UTF-8
Several researchers (myself included) at the National Center for Supercomputing Applications at the University of Illinois at Urbana-Champaign ran an extensive study (about 10,000 queries) about Yahoo!’s new web index and found (surprisingly) that Google returns more results than Yahoo in almost every single case. We found that Google returned well over 150% more results and gave more results in about 97% of our queries.
Our full study and test code is available online at: http://vburton.ncsa.uiuc.edu/indexsize.html
Yahoo’s index bigger than Googles? I don’t think so!
Newsgroups: google.public.support.general
From: “mig31m6”
Date: 13 Aug 2005 17:57:34 -0700
Local: Sat, Aug 13 2005 8:57 pm
Subject: Yahoo’s index bigger than Googles? I don’t think so!
Yahoo’s index of 20B! Yeah, right! No scientific proof has been
presented to actually demonstrate their index is any bigger than what
it used to be; whatever that was. We don’t know because Yahoo doesn’t
tell you. Their algorithm still sucks, even if it turns out that Yahoo
has a bigger index than Google. When the original version of MSN
search first came out, which was almost a year ago, they claimed at the
time they had 5B pages in their index which trumped Google’s 4.2B pages
at the time. In a period of a few weeks, Google went from 4.2B to just
over 8B and that was just web pages and didn’t include the rest of
their indexed content. Microsoft certainly shut up about that one
after it happened. Google then came up with GMail including 1 gig of
email storage. A few months ago Yahoo announced it was going to offer
1 gig of storage to all their users. Fast forward 1 week or so after
Yahoo’s announcement when the GMail team upped all of their users by
another gig which now gave all Google’s users 2 gig of free space and
as an extra surprise, the space would dynamically increase every day.
It will be interesting to see what Google does with its index in
response to Yahoo. I’ve recently noticed Google add a lot more blog
pages as part of their index. As the old saying goes ‘don’t throw out
the baby with the bath water’. In the end, Yahoo should never quit
its day job as a portal. It’s a very nice portal but that’s it. It
really shouldn’t call itself a search engine because it is useless.
Finally, there are reported rumors that Apple and Google may be getting
together in a deal that would place hundreds of millions of ITunes
songs at the disposal of Google users probably for a very reasonable
price. No, they can’t do everything for free. Everyone calm down!
And let’s not forget about the Google Payment Service to compete with
EBay’s Paypal service. God help EBay. This concludes the random rants
and thoughts of this user. As they say, long live Google.
A Comparison of the Size of the Yahoo! and Google Indices as conducted by the NCSA
http://vburton.ncsa.uiuc.edu/indexsize.html
I think Yahoo! might have jumped the gun a little bit. Claims are different from evidence and proof. While Yahoo has been making good progress in improving their search lately, I don’t think its quite where they claim it is yet. Especially according to some independently conducted research by the NCSA.
Read it here.
http://vburton.ncsa.uiuc.edu/indexsize.html
The NCSA study starts with a list of words from the ispell list. That same list appears on the web several times. Not surprisingly EVERY SINGLE SEARCH in the NCSA “study” is hitting these useless English word-list pages, so _each_ of these pages is counted 10012 times in favor of Google… Yahoo seems to prune those long “word lists” as spam since they are essentially useless.
What the study tells me is that google is much less aggressive in pruning spam than Yahoo.
Whose index is bigger, I don’t really know, but this study is clearly, fundamentally flawed.
[I wrote this before reading luke's comments above, I came to identical conclusions]
Flaws in NCSA Yahoo/Google study
I’ve dug into some of the study’s data, and written an initial
quick blog post to point out two bad flaws. The methodology used does
indeed have a selective bias, towards both:
1) search-engine spam pages, and 2) large word lists.
Briefly, by using searches for random words from a large
wordlist, that created a tendency to select *large* *wordlists*, and
also gibberish spam pages which happened to have those words (probably
derived from the same large wordlists). Moreover, this effect applies
(to some extent) to *every* *search* *sample*. In fact, many of the
searches could be repeatedly selecting the *same wordlist file*,
or similar. Since either Google had more large wordlists indexed, or
Yahoo eliminated many of them as useless data, this results in an
extremely misleading conclusion about the relative size of their databases.
In effect, the outcome is that a relatively small number of
dubious documents are being repeatedly sampled, rather than any sort
of comprehensive examination.
Hmm, the URL didn’t seem to go through. It’s:
http://sethf.com/infothought/blog/archives/000899.html
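The selection bias described in the comments above can be illustrated with a toy simulation in Python. Everything in it is a hypothetical stand-in: the tiny vocabulary, the three-document "index", and the function name `sample_hits` are invented for illustration and do not reproduce the NCSA test code or any real engine's data.

```python
import random
from collections import Counter


def sample_hits(index, vocabulary, num_queries, seed=0):
    """Count how often each document matches a randomly chosen query word.

    index maps a document id to the set of words it contains.
    (Hypothetical toy model, not the study's methodology.)
    """
    rng = random.Random(seed)  # fixed seed keeps the run repeatable
    hits = Counter()
    for _ in range(num_queries):
        word = rng.choice(vocabulary)
        for doc_id, words in index.items():
            if word in words:
                hits[doc_id] += 1
    return hits


vocabulary = ["apple", "banana", "cherry", "quince", "zebra"]
index = {
    "wordlist.txt": set(vocabulary),    # a page that is just the word list
    "fruit-page": {"apple", "banana"},  # an ordinary page
    "zoo-page": {"zebra"},              # another ordinary page
}
hits = sample_hits(index, vocabulary, num_queries=100)
print(hits["wordlist.txt"])  # 100
```

Because the word-list page contains every candidate query word, it matches all 100 queries, while each ordinary page matches only a fraction of them. An engine that indexes such pages gets credited for them on every sample, and one that prunes them does not, regardless of how large either index really is.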
i am not able to open yahoo web site. I am not able to see my mail.
Tim,
Even though Yahoo does well on the “Search Engine Relevancy Challenge” by Rusty Brick, I’m curious about what you would think of many search marketing people on many different forums being given the challenge (broken down by category/topic) of trying to come up with ways of measuring relevancy. Danny Sullivan’s recent article: http://searchenginewatch.com/searchday/article.php/3527636 has a great challenge for search engines, but why not start out challenging many people who work with keywords and relevancy every day? After that, then the search engines could “Establish a research center, a consortium or something and a methodology that all will agree upon.” as Danny suggests. Getting all forum participants to agree on results would not need to happen, and it would surely get media attention on the more lofty goals of search engines.
Sincerely,
Bill Kelm
I think Yahoo is cheating. I have numerous cases similar to the following:
search [site:sampledomain] yields 17000+ results
search [site:sampledomain keyword] yields 53 results where keyword is a word common to all the pages in the domain.
search [uniquekeyword] yields 53 results where uniquekeyword is common to all pages and unique only to the site.
This tells me that somehow Yahoo has a file of all 17000+ pages that responds to the site: command but not to the general query command.
This smells like Google’s supplemental index where a lot of crap pages are kept just for the sake of being counted in the “total” index!
Do you have mobile wap access to your blog? I don’t see it on your wap menu…
Chuck Darling http://tinyurl.com/wxot
I LOV YAHOO
I work for an online guide to New York City spas. We have almost 150 pages dedicated to NYC spa profiles and reviews. Yet if you search for New York City Spas the #3 result is about a New York City Skateboard park. The word “Spa” never appears on any page in that site! Yahoo should be more relevant with the overall content. A site with 100 pages on topic should come before a site with 1 page.
I am having a hard time adding my blog address to the Yahoo Search. Can someone tell me how to do so. Whenever I look for my blog in yahoo it doesn’t appear. Thanks
what do i do to place a yahoo search engine and other yahoo features in my blog?
My blog id:
http://techfeast.blogspot.com
Hey bros, you guys dont need to care about who talks bad of yahoo search, just listen to me: you all rock, keep on this work im quite assure you will get better and better ;)