Yahoo! Slurp 3.0

  • Posted April 14th, 2008 at 12:00 pm by Yahoo! Search
  • Categories: Site Explorer

Over the past few weeks, we’ve been preparing for the latest version of the Yahoo! Search crawler with some infrastructure updates, which recently caused a variance in our crawl behavior.

With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 recognizes the same user-agent and all robots.txt directives for ‘Yahoo! Slurp,’ though it’ll identify itself as Slurp 3.0 in your web logs.

As the new software undergoes a phased rollout to our production crawlers over the next several weeks, you’ll see the following changes:

    a) The crawlers will start crawling from a different and much smaller set of IP addresses, still within the crawl.yahoo.net domain, so any reverse DNS checks that identify our crawler will continue to work. If you’re using IP-based recognition of our crawlers, however, you might see a drop in crawl/coverage from Yahoo! To avoid this problem, we strongly recommend moving to reverse DNS-based identification of Yahoo! Slurp if you currently use any other method. The current set of IPs will disappear from your web logs over the next several weeks.

    b) The crawlers will also publish a new user-agent, ‘Yahoo! Slurp/3.0.’ Existing robots.txt directives for ‘Slurp’ or ‘Yahoo! Slurp’ will continue to work, but directives specific to ‘Slurp/2.0’ won’t be recognized by the new crawler (usage of the ‘Slurp/2.0’ user-agent is very rare on the web, so you’re unlikely to be affected). We recommend specifying the shorter, unversioned form: User-agent: Slurp. Check out “How do I prevent my site or certain subdirectories from being crawled?” on our Help page for more details.
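The reverse DNS identification recommended in item a) could be sketched roughly as follows. This is a minimal illustration, not Yahoo!'s own tooling: the function name `is_yahoo_crawler` and the injectable lookup parameters are assumptions made for the example. It also adds a forward lookup to confirm the hostname, which is a common precaution rather than something the post itself requires.

```python
import socket

# Domain named in the post; everything else in this sketch is hypothetical.
CRAWL_DOMAIN = ".crawl.yahoo.net"


def is_yahoo_crawler(ip, reverse_lookup=None, forward_lookup=None):
    """Return True if `ip` reverse-resolves to a crawl.yahoo.net host
    and that host resolves back to the same IP."""
    # Default to real DNS lookups; callers may inject fakes for testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip)
    except OSError:
        # No PTR record or lookup failure: cannot confirm the crawler.
        return False

    # Require a real subdomain match, not just a matching suffix string.
    if not host.lower().rstrip(".").endswith(CRAWL_DOMAIN):
        return False

    try:
        # Forward-confirm: the hostname must map back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

A server-side check might call `is_yahoo_crawler(remote_address)` for requests whose user-agent claims to be Slurp, and cache the result so that each visiting IP triggers only one pair of DNS lookups.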

These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
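To see why an unversioned `User-agent: Slurp` directive is the safer choice, here is a small sketch using Python's standard-library `urllib.robotparser`. The example URL and rules are hypothetical, and the matching shown is that of Python's parser, which is only assumed here to resemble how Slurp treats versioned tokens.

```python
from urllib.robotparser import RobotFileParser

# User-agent string from the post; the rules and URL below are hypothetical.
SLURP_3_UA = "Yahoo! Slurp/3.0"

GENERIC_RULES = """\
User-agent: Slurp
Disallow: /private/
"""

VERSIONED_RULES = """\
User-agent: Slurp/2.0
Disallow: /private/
"""


def allowed(robots_txt, url, user_agent=SLURP_3_UA):
    """Report whether `user_agent` may fetch `url` under the given robots.txt text,
    according to Python's built-in parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/private/report.html"
    # Unversioned directive: still applies to the 3.0 user-agent.
    print("generic rules allow fetch:", allowed(GENERIC_RULES, page))
    # Directive pinned to 2.0: no longer matches the 3.0 user-agent.
    print("versioned rules allow fetch:", allowed(VERSIONED_RULES, page))
```

With this parser, the generic rule keeps blocking the page for the new user-agent, while the rule pinned to ‘Slurp/2.0’ no longer matches and so stops protecting it.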

Let us know if you have any questions or observe anything unusual.

Sharad Verma & Yoram Arnon
Yahoo! Search

26 Comments

Comment by Guido
2008-04-14 20:44:27

I don’t know if it’s part of the migration, but I’m getting this:
User Agent: Wget/1.10.2 (Red Hat modified)<——-
Get String: http://www.yummyfood.net/index.php
Forwarded For: 67.195.4.142
Client IP: none
Remote Address: 67.195.50.114
Remote Port: 40084
Request Method: GET

I do not allow wget on my sites, so the bot is being blocked all the time.
What’s up?

 
Comment by Damiano
2008-04-15 02:56:27

I’m getting the same handyszene’s logs, I’ve verified them and they come from crawl.yahoo.net addresses, using wget as user agent.

HTTP_VIA:1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE1)
HTTP_ACCEPT:*/*
HTTP_USER_AGENT:Wget/1.10.2 (Red Hat modified)
HTTP_X_FORWARDED_FOR:67.195.4.140
&ALL_RAW=Cache-Control: max-age=31536000
Via: 1.0 llf330007.crawl.yahoo.net:4080 (squid/2.6.STABLE)

The crawler is indexing a site with robots.txt that disallows crawling!!!!

Now I’m blocking requests by IP range. Please fix this bug!

 
Comment by Paul
2008-04-15 04:55:57

I’m facing the same problem as Tarry. The same crawler indexes a robots.txt file and everything gets stuck.

 
2008-04-15 09:23:52

Has Yahoo had an update over the past several days?

It appears that social sites are even more dominant on the SERPs than they were during the previous month.

Unfortunately, this bias towards social sites means excessive amounts of spammy sites getting high rankings – even pages that are redirects or no longer valid.

Look at this example from just shopping for Mother’s Day and birthday gifts:
http://search.yahoo.com/search?p=replica+handbags

The concept of focusing on the social web has tremendous potential, but it must be tweaked and optimized so as to protect search quality.

Thank You for your time :-)

 
Comment by Charlie
2008-04-15 11:16:54

What about the Latin America market?
Should we be expecting some “radical” changes on the SERPs?

 
Comment by Office Furniture
2008-04-15 18:15:35

I appreciate the update. I’ll change the directives that are specific to ‘Slurp/2.0,’ now that they won’t be recognized by the new crawler.

 
Comment by interaction design
2008-04-15 23:02:34

Nice work, after some tweaking all seems to work fine, thanks!

 
Comment by Free Stuff
2008-04-16 03:02:05

I hope these changes will have a great impact on my SERPs.

 
Comment by Green Grant
2008-04-17 14:07:26

I’ve seen it in my logs already. Slurp is working!

 
Comment by Martin
2008-04-19 01:20:46

Slurp is really great and it has really good functionality. Go on, Yahoo!

 
Comment by Paulie Walnuts
2008-04-20 17:42:13

I have a site with a depth of about 6 million pages to be indexed. I have uploaded sitemaps, and I see the Slurp bot on the server every day, but it crawls like a snail. What could possibly be the reason? That bot has plenty of food on this site, so it should be running like an SOB. I have another site with a depth of about 800k links, but Yahoo has never indexed more than 4k in the last 4 years. Any help is appreciated. Thanks, Paulie Walnuts.

 
Comment by pat609
2008-04-26 00:23:15

Really great info. That explains why your crawlers looked a little bit “lazy” recently.

 
Comment by Jordans
2008-04-29 23:40:44

Using Yahoo slurp is amazingly easy and convenient. A real fun tool….

 
Comment by george tomas
2008-05-11 09:04:39

Good work, after some tweaking all seems to work fine, thanks!

 
Comment by Colorado
2008-06-02 04:56:17

I hope this clears up the spammy results somewhat. Yahoo seems to fill its SERPs with less than great sites for some searches.
Thanks for the update.

 
Comment by Self Defense Taser
2008-08-01 11:07:42

According to my web logs, it appears that Yahoo Slurp is running as intended and error-free on my sites. Great job!

 
Comment by Eileen
2008-08-03 17:45:29

How does one distinguish between Yahoo and Yahoo Explorer? I was checking the indexing of my site http://www.firsttimebuyernetwork.co.uk today and I found that seven of my pages were indexed on Yahoo Explorer but not on Yahoo. I don’t understand – is this the same platform?

Perplexed http://www.firsttimebuyernetwork.co.uk

 
Comment by iGuide
2008-08-06 10:44:09

Yahoo Site Explorer doesn’t work at all with any .travel domains. This is probably very frustrating for owners of the over 6 million .travel pages out there.

Please fix this simple bug, and allow Site Explorer to accept 6-letter top-level domains.

 
Comment by Peter Szabo
2008-08-08 03:39:49

Nice work, after some tweaking all seems to work fine, thanks!

 
2008-08-08 18:03:39

I have used yahoo over the past 6 years and the results are really starting to get more relevant now than in the past.

 
Comment by credit cards
2008-08-08 18:59:51

Yes, I’m starting to see the Yahoo Slurp working as well. I’m glad about this, because without the Site Explorer I would be as blind as a bat for my site.

 
Comment by Alex Griffin
2008-08-24 21:31:06

Nice work, after some tweaking all seems to work fine, thanks!

 
Comment by key ödemeleri
2008-09-10 10:37:24

Nice work, after some tweaking all seems to work fine

 
Comment by Mio
2008-09-16 12:00:35

Using Yahoo Slurp is amazingly easy and convenient.
Great job. You are better and better.
thanks

 
2008-09-18 13:29:43

This is a nice feature we have been waiting for!

 
Comment by 物理治療
2008-09-19 01:08:04

Nice work, after some tweaking all seems to work fine, thanks!

 

Sorry, the comment form is closed at this time.
