Yahoo! Slurp 3.0
Over the past few weeks, we’ve been preparing for the latest version of the Yahoo! Search crawler with some infrastructure updates, which recently caused a variance in our crawl behavior.
With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for ‘Yahoo! Slurp,’ though it’ll identify itself as Slurp 3.0 in your web logs.
As the new software undergoes a phased rollout to our production crawlers over the next several weeks, you’ll see the following changes:
a) The crawlers will start crawling from a different and much smaller set of IP addresses, but still from the crawl.yahoo.net domain, so any reverse DNS checks you use to identify our crawler will continue to work. Please note that if you’re using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! To avoid this problem, we strongly recommend that you move to reverse DNS-based identification of Yahoo! Slurp if you’re using any other method. The current set of IPs will disappear from your web logs over the next several weeks.
b) The crawlers will also publish a new user-agent, ‘Yahoo! Slurp/3.0.’ Existing robots.txt directives for ‘Slurp’ or ‘Yahoo! Slurp’ will continue to work, but if you have directives specific to ‘Slurp/2.0,’ they won’t be recognized by the new crawler (though usage of the ‘Slurp/2.0’ user-agent is very rare on the web, so you’re unlikely to be affected). We recommend specifying the shorter form, User-agent: Slurp. Check out “How do I prevent my site or certain subdirectories from being crawled?” on our Help page for more details.
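For readers moving from IP lists to the reverse DNS identification recommended in item a), the check might look roughly like the following Python sketch. It is an illustration, not Yahoo!’s own tooling: the function name `is_yahoo_crawler` and its injectable resolver parameters are hypothetical, and the only detail taken from the post is the crawl.yahoo.net domain. The forward lookup that confirms the hostname maps back to the same IP is a common precaution against spoofed reverse records, and it is an assumption here rather than something the post requires.

```python
import socket

def is_yahoo_crawler(ip,
                     resolve_ptr=socket.gethostbyaddr,
                     resolve_fwd=socket.gethostbyname_ex,
                     suffix=".crawl.yahoo.net"):
    """Return True if `ip` reverse-resolves to a host under `suffix`
    and that host resolves forward to the same IP (IPv4 only)."""
    try:
        # Reverse lookup: IP address -> primary hostname.
        host = resolve_ptr(ip)[0]
    except OSError:
        return False
    # Require a real subdomain match, not merely a substring.
    if not host.lower().rstrip(".").endswith(suffix):
        return False
    try:
        # Forward lookup: hostname -> list of IPv4 addresses.
        addresses = resolve_fwd(host)[2]
    except OSError:
        return False
    return ip in addresses
```

Calling `is_yahoo_crawler(remote_addr)` with the address from a request would perform live DNS lookups, so results are worth caching in practice. The resolver parameters exist only so the logic can be exercised without network access.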
These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
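To make the user-agent behavior in item b) concrete, here is a small Python sketch of which robots.txt User-agent values would apply to the new crawler. It assumes a case-insensitive substring match of the robots.txt token against the crawler’s published name; that rule is a simplification chosen because it reproduces the outcomes described above, not a documented description of how Slurp parses robots.txt. The helper name `directive_applies` is hypothetical.

```python
def directive_applies(robots_agent, crawler_agent="Yahoo! Slurp/3.0"):
    """Return True if a robots.txt 'User-agent:' value would apply to
    the crawler, under the assumed substring-matching rule."""
    token = robots_agent.strip().lower()
    # '*' applies to every crawler; otherwise the token must appear
    # somewhere in the crawler's published user-agent name.
    return token == "*" or token in crawler_agent.lower()

# Outcomes under the assumed rule, mirroring the post's description.
for agent in ("Slurp", "Yahoo! Slurp", "Slurp/2.0", "*"):
    print(f"User-agent: {agent!r} applies: {directive_applies(agent)}")
```

Under this assumption the short forms ‘Slurp’ and ‘Yahoo! Slurp’ keep working, while a version-pinned ‘Slurp/2.0’ rule no longer matches, which is why the shorter form is the safer choice.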
Let us know if you have any questions or observe anything unusual.
Sharad Verma & Yoram Arnon
Yahoo! Search

I don’t know if it’s part of the migration, but I’m getting this:
User Agent: Wget/1.10.2 (Red Hat modified)<——-
Get String: http://www.yummyfood.net/index.php
Forwarded For: 67.195.4.142
Client IP: none
Remote Address: 67.195.50.114
Remote Port: 40084
Request Method: GET
I do not allow wget on my sites, so the bot is being blocked all the time.
What’s up?
I’m getting the same logs as handyszene. I’ve verified them: they come from crawl.yahoo.net addresses and use wget as the user agent.
HTTP_VIA:1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE1)
HTTP_ACCEPT:*/*
HTTP_USER_AGENT:Wget/1.10.2 (Red Hat modified)
HTTP_X_FORWARDED_FOR:67.195.4.140
&ALL_RAW=Cache-Control: max-age=31536000
Via: 1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE)
The crawler is indexing a site whose robots.txt disallows crawling!
I’m now blocking requests by IP range. Please fix this bug!
I’m facing the same problem as Tarry. The same crawler indexes a robots.txt file and everything gets stuck.
Has Yahoo had an update over the past several days?
It appears that social sites are even more dominant on the SERPs than they were during the previous month.
Unfortunately, this bias towards social sites means excessive amounts of spammy sites getting high rankings – even pages that are redirects or no longer valid.
Look at this example while just shopping for Mothers Day and Birthday gifts:
http://search.yahoo.com/search?p=replica+handbags
The concept of focusing on the social web has tremendous potential, but it must be tweaked and optimized so as to protect search quality.
Thank You for your time :-)
What about the Latin America market?
Should we be expecting some “radical” changes on the SERPs?
I appreciate the update. I’ll change the directives that are specific to ‘Slurp/2.0,’ now that they won’t be recognized by the new crawler.
Nice work, after some tweaking all seems to work fine, thanks!
I hope these changes will have a great impact on my SERP rankings.
I’ve seen it in my logs already. Slurp is working!
Slurp is really great and it has really good functionality. Go on, Yahoo!
I have a site that has a depth of about 6 million pages to be indexed. I have uploaded sitemaps and I see the Slurp bot on the server every day, but it crawls like a snail. What could possibly be the reason? That bot has plenty of food on this site, so it should be running like an SOB. I have another site that has a depth of about 800k links, but Yahoo has never indexed more than 4k in the last 4 years. Any help is appreciated. Thanks, Paulie Walnuts.
Really great info. That explains why your crawlers looked a little bit “lazy” recently.
Using Yahoo slurp is amazingly easy and convenient. A real fun tool….
good work, after some tweaking all seems to work fine, thanks!
I hope this clears up the spammy results somewhat. Yahoo seems to fill its SERPs with less than great sites for some searches.
Thanks for the update.
According to my web logs, it appears that Yahoo Slurp is running as intended and error-free on my sites. Great job!
How does one distinguish between Yahoo and Yahoo Explorer? I was checking the indexing of my site http://www.firsttimebuyernetwork.co.uk today and I found that seven of my pages were indexed on Yahoo Explorer but not on Yahoo. I don’t understand – is this the same platform?
Perplexed http://www.firsttimebuyernetwork.co.uk
Yahoo Site Explorer doesn’t work at all with any .travel domains. This is probably very frustrating for owners of the over 6 million .travel pages out there.
Please fix this simple bug, and allow Site Explorer to accept 6-letter top-level domains.
I have used yahoo over the past 6 years and the results are really starting to get more relevant now than in the past.
Yes, I’m starting to see Yahoo Slurp working too. I’m glad about this, because without Site Explorer I would be as blind as a bat when it comes to my site.
Using Yahoo Slurp is amazingly easy and convenient.
Great job. You are better and better.
thanks
This is a nice feature we have been waiting for!