Yahoo! Slurp 3.0
Over the past few weeks, we've been preparing for the latest version of the Yahoo! Search crawler with some infrastructure updates, which recently caused a variance in our crawl behavior.
With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for 'Yahoo! Slurp,' though it'll identify itself as Slurp 3.0 in your web logs.
As the new software undergoes a phased rollout to our production crawlers over the next several weeks, you'll see the following changes:
a) The crawlers will start crawling from a different and much smaller set of IP addresses, though still from the crawl.yahoo.net domain. Any reverse DNS checks to identify our crawler will continue to work. Please note that if you're using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! To avoid this problem, we strongly recommend that you move to reverse DNS-based identification of Yahoo! Slurp if you're using any other method. The current set of IPs will disappear from your web logs in the next several weeks.
b) The crawlers will also publish a new user-agent, 'Yahoo! Slurp/3.0.' Existing robots.txt directives for 'Slurp' or 'Yahoo! Slurp' will continue to work, but if you have directives specific to 'Slurp/2.0,' they won't be recognized by the new crawler (though usage of the 'Slurp/2.0' user-agent is very rare on the web, so you're unlikely to be affected). We recommend specifying the shorter form, User-agent: Slurp. Check out "How do I prevent my site or certain subdirectories from being crawled?" on our Help page for more details.
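For point (a), here is a minimal sketch of what reverse DNS-based identification could look like, assuming the usual two-step check: resolve the IP to a hostname, confirm the hostname falls under crawl.yahoo.net, then resolve that hostname forward and confirm it maps back to the same IP. The function name is_yahoo_crawler and the injectable lookup parameters are hypothetical and exist only for illustration; they are not part of any Yahoo! tool.

```python
import socket


def is_yahoo_crawler(ip, reverse_lookup=None, forward_lookup=None,
                     domain="crawl.yahoo.net"):
    """Return True if `ip` passes a forward-confirmed reverse DNS check.

    `reverse_lookup` and `forward_lookup` are hypothetical hooks so the
    logic can be exercised without network access; by default they use
    the standard library resolver.
    """
    if reverse_lookup is None:
        # gethostbyaddr returns (hostname, aliases, addresses)
        reverse_lookup = lambda addr: socket.gethostbyaddr(addr)[0]
    if forward_lookup is None:
        # gethostbyname_ex returns (hostname, aliases, addresses)
        forward_lookup = lambda host: socket.gethostbyname_ex(host)[2]

    try:
        host = reverse_lookup(ip)
    except OSError:
        return False  # no PTR record or resolver failure

    host = host.rstrip(".").lower()
    # Require a real subdomain match so "notcrawl.yahoo.net" or
    # "crawl.yahoo.net.example.com" are rejected.
    if not host.endswith("." + domain):
        return False

    try:
        # Forward confirmation: the claimed hostname must map back to the IP.
        return ip in forward_lookup(host)
    except OSError:
        return False


if __name__ == "__main__":
    # Performs live DNS lookups; the result depends on current DNS records.
    print(is_yahoo_crawler("67.195.50.114"))
```

The forward confirmation step matters because anyone who controls the reverse zone for their own IP range can publish a hostname that merely looks like a crawl.yahoo.net name.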
These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
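For point (b), one way to sanity-check a rule written with the short form is to run it through a local robots.txt parser. The sketch below uses Python's standard urllib.robotparser, whose matching rules are its own and only an approximation of how Slurp interprets directives. The /private/ path, the sample rules, and the helper name slurp_may_fetch are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt using the recommended short form.
ROBOTS_TXT = """\
User-agent: Slurp
Disallow: /private/
"""


def slurp_may_fetch(path, robots_txt=ROBOTS_TXT, agent="Yahoo! Slurp/3.0"):
    """Return True if `agent` may fetch `path` under `robots_txt`.

    Uses Python's own matcher, which compares the part of the agent
    string before the first "/" against each User-agent line.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    for path in ("/private/report.html", "/index.html"):
        print(path, slurp_may_fetch(path))
```

With these sample rules, the short Slurp directive applies to the 'Yahoo! Slurp/3.0' token, so the /private/ path is refused and other paths are allowed.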
Let us know if you have any questions or observe anything unusual.
Sharad Verma & Yoram Arnon
Yahoo! Search


Comments
I don't know if it's part of the migration, but I'm getting this:
User Agent: Wget/1.10.2 (Red Hat modified)<-------
Get String: www.yummyfood.net/index.php
Forwarded For: 67.195.4.142
Client IP: none
Remote Address: 67.195.50.114
Remote Port: 40084
Request Method: GET
I do not allow wget on my sites, so the bot is being blocked all the time.
What's up?
Posted by: Guido | April 14, 2008 08:44 PM
I'm getting the same in handyszene's logs. I've verified them, and they come from crawl.yahoo.net addresses, using wget as the user agent.
HTTP_VIA:1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE1)
HTTP_ACCEPT:*/*
HTTP_USER_AGENT:Wget/1.10.2 (Red Hat modified)
HTTP_X_FORWARDED_FOR:67.195.4.140
&ALL_RAW=Cache-Control: max-age=31536000
Via: 1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE)
The crawler is indexing a site with robots.txt that disallows crawling!!!!
Now I'm blocking requests using the IP range. Please fix this bug!
Posted by: Damiano | April 15, 2008 02:56 AM
I'm facing the same problem as Tarry. The same crawler indexes a robots.txt file and everything gets stuck.
Posted by: Paul | April 15, 2008 04:55 AM
Has Yahoo had an update over the past several days?
It appears that social sites are even more dominant on the SERPs than they were during the previous month.
Unfortunately, this bias towards social sites means excessive amounts of spammy sites getting high rankings - even pages that are redirects or no longer valid.
Look at this example while just shopping for Mothers Day and Birthday gifts:
http://search.yahoo.com/search?p=replica+handbags
The concept of focusing on the social web has tremendous potential, but it must be tweaked and optimized so as to protect search quality
Thank You for your time :-)
Posted by: Most Beautiful Woman in Universe | April 15, 2008 09:23 AM
What about the Latin America market?
Should we be expecting some "radical" changes on the SERPs?
Posted by: Charlie | April 15, 2008 11:16 AM
I appreciate the update. I'll change the directives that are specific to 'Slurp/2.0,' now that they won't be recognized by the new crawler.
Posted by: Office Furniture | April 15, 2008 06:15 PM
Nice work, after some tweaking all seems to work fine, thanks!
Posted by: interaction design | April 15, 2008 11:02 PM
I hope these changes will have a great impact on my SERP.
Posted by: Free Stuff | April 16, 2008 03:02 AM
Great job. Thanx a lot.
Posted by: dassad | April 16, 2008 05:22 AM
It's really great to read your blog. I've learned lots of tips. Thanks.
Posted by: online pharmacy | April 16, 2008 02:18 PM
I've seen it in my logs already. Slurp is working!
Posted by: Green Grant | April 17, 2008 02:07 PM
thanks for that
Posted by: bloggers mosaic | April 18, 2008 04:13 AM
Slurp is really great and it has really good functionality. Go on Yahoo!
Posted by: Martin | April 19, 2008 01:20 AM
Thanks for the update.
Posted by: Andy | April 19, 2008 12:16 PM
This is my first visit to this blog, and it was great.
Posted by: Logicb0x | April 20, 2008 02:15 AM
I have a site that has a depth of about 6 million pages to be indexed. I have uploaded sitemaps and I see the Slurp bot on the server every day, but it crawls like a snail. What could possibly be the reason? That bot has plenty of food on this site, so it should be running like an SOB. I have another site that has a depth of about 800k links, but Yahoo has never indexed more than 4k in the last 4 years. Any help is appreciated. Thanks, Paulie Walnuts.
Posted by: Paulie Walnuts | April 20, 2008 05:42 PM
Really great info. That explains why your crawlers looked a little bit "lazy" recently.
Posted by: pat609 | April 26, 2008 12:23 AM
That is a real update!
Posted by: Mark | April 26, 2008 03:17 AM
Using Yahoo slurp is amazingly easy and convenient. A real fun tool....
Posted by: Jordans | April 29, 2008 11:40 PM
Really easy and useful.
Posted by: oyun | May 3, 2008 10:03 AM
Thanks for the update.
Posted by: firmalar | May 3, 2008 10:05 AM
good work, after some tweaking all seems to work fine, thanks!
Posted by: george tomas | May 11, 2008 09:04 AM