Yahoo! Search Crawler, Slurp, has a new Address and Signature Card
- Posted June 5th, 2007 at 7:49 am by Yahoo! Search
- Categories: Search
As we mentioned a few weeks ago, we’ve been migrating our crawler, Yahoo! Slurp, over to the new domain at crawl.yahoo.net. As of today, the transition is complete and all machines crawling as Slurp are now in crawl.yahoo.net. You can see this change in your web server logs, where the page accesses from inktomisearch.com are being fully replaced by accesses from crawl.yahoo.net. Note that this does not cover other Yahoo! crawlers, such as Yahoo! China, or other verticals, like Yahoo! Shopping, Yahoo! Travel, etc., which have their own user-agents.
Don’t fret though; there is no need to change your robots.txt file because the crawler user-agent is still Yahoo! Slurp. If you use IP-based filtering, there is no need to change that either, since the IP addresses from which we crawl remain the same. However, please ensure that your network or firewall setup does not block crawl.yahoo.net; if it does, we won’t be able to include your content in our results.
With this transition complete, we also encourage you to set up reverse DNS-based authentication of our crawler to ensure that no rogue bots masquerading as ‘Slurp’ visit your site. Here is how it works:
- 1. For each page view request, check the user-agent and IP address. All requests from Yahoo! Search utilize a user-agent starting with ‘Yahoo! Slurp.’
- 2. For each request from the ‘Yahoo! Slurp’ user-agent, you can start with the IP address (e.g. 74.6.67.218) and use a reverse DNS lookup to find out the registered name of the machine.
- 3. Once you have the host name (in this case, lj612134.crawl.yahoo.net), you can then check if it really is coming from Yahoo! Search. The name of all Yahoo! Search crawlers will end with ‘crawl.yahoo.net,’ so if the name doesn’t end with this, you know it’s not really our crawler.
- 4. Finally, you need to verify the name is accurate. In order to do this, you can use Forward DNS to see the IP address associated with the host name. This should match the IP address you used in Step 2. If it doesn’t, it means the name was fake.
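The four steps above can be sketched in Python roughly as follows. This is only an illustrative sketch, not Yahoo!'s own code: the function names (`claims_to_be_slurp`, `is_genuine_slurp`) are made up for this example, the user-agent test is a substring match (slightly looser than the prefix check described in Step 1), the host name test requires a leading dot before crawl.yahoo.net (slightly stricter than Step 3), and only IPv4 addresses are handled.

```python
import socket

CRAWLER_DOMAIN = "crawl.yahoo.net"


def claims_to_be_slurp(user_agent):
    """Step 1: does the request identify itself as Yahoo! Slurp?"""
    return "Yahoo! Slurp" in user_agent


def reverse_lookup(ip):
    """Step 2: reverse DNS. Returns the host name, or None on failure."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def forward_lookup(hostname):
    """Step 4: forward DNS. Returns the set of IPv4 addresses for the name."""
    try:
        return set(socket.gethostbyname_ex(hostname)[2])
    except OSError:
        return set()


def is_genuine_slurp(ip, reverse=reverse_lookup, forward=forward_lookup):
    """Run steps 2-4 for one client IP address.

    The lookup functions are parameters so they can be replaced in tests
    or wrapped with a cache; that is a design choice of this sketch.
    """
    hostname = reverse(ip)
    if not hostname:
        return False
    hostname = hostname.rstrip(".").lower()
    # Step 3: the name must be inside crawl.yahoo.net.
    if not hostname.endswith("." + CRAWLER_DOMAIN):
        return False
    # Step 4: the name must resolve back to the IP we started from.
    return ip in forward(hostname)
```

In practice you would call `is_genuine_slurp(ip)` only for requests where `claims_to_be_slurp(user_agent)` is true, and probably cache the result per IP address, since each check costs two DNS lookups.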
If the DNS check fails, you know it is not really ‘Yahoo! Slurp’ calling, and you can manage access to your content appropriately. For example, by simply returning an HTTP error, you can block the impostor from seeing your content.
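As a rough illustration of returning an HTTP error, here is a minimal WSGI wrapper in Python. It is a sketch under assumptions, not a recommended configuration: the name `block_fake_slurp` is invented for this example, `verify` stands for any function you supply that takes the client IP address and returns True when the DNS check described above succeeds, and the choice of status 403 is ours (the post only says "an HTTP Error").

```python
def block_fake_slurp(app, verify):
    """Wrap a WSGI app so that fake 'Yahoo! Slurp' requests get an error.

    `verify` is a caller-supplied function: verify(ip) -> bool.
    """
    def wrapped(environ, start_response):
        user_agent = environ.get("HTTP_USER_AGENT", "")
        ip = environ.get("REMOTE_ADDR", "")
        # Only requests that claim to be Slurp are checked; everything
        # else passes through to the wrapped application untouched.
        if "Yahoo! Slurp" in user_agent and not verify(ip):
            body = b"Forbidden\n"
            start_response("403 Forbidden", [
                ("Content-Type", "text/plain"),
                ("Content-Length", str(len(body))),
            ])
            return [body]
        return app(environ, start_response)
    return wrapped
```

Ordinary visitors are unaffected, because the check only runs when the user-agent claims to be Slurp. If your server sits behind a proxy, `REMOTE_ADDR` may hold the proxy's address rather than the client's, so the IP source would need adjusting.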
We highly recommend you use this mechanism to manage access for our crawler, instead of using IP address-based access control. This keeps your setup robust to network and data center changes.
If you have any other feedback, please let us know.
Yoram Arnon
Yahoo! Slurp
Priyank Garg
Yahoo! Search
UPDATE: While we’ve confirmed that all our production crawlers that are crawling under the ‘Slurp’ user-agent have now been migrated over to crawl.yahoo.net, we wanted to clarify that some test crawl machines (that are used to test various ongoing improvements to our algorithms) may continue to crawl from inktomisearch.com. The few stragglers may still leave ‘inktomisearch.com’ in your web server logs, but rest assured, we intend to move everyone over to the new domain.
- 12 Comments
Hopefully the new crawler will be better. Currently when you search for my domain name without the .com it isn’t listed at all… not even the first 1000 listings. The domain name isn’t even a dictionary word so I would expect it to be #1. Strange.
very handy post, very much dugg!
I will add this information to the various scripts i write that like to know what the crawlers are up to.
What about the other crawlers that you have running, such as:
Yahoo-MMCrawler/3.x (mms dash mmcrawler dash support at yahoo dash inc dot com)
Nokia6682/2.0 (3.01.1) SymbianOS/8.0 Series60/2.6 Profile/MIDP-2.0 configuration/CLDC-1.1 UP.Link/6.3.0.0.0 (compatible;YahooSeeker/M1A1-R2D2; http://help.yahoo.com/help/us/ysearch/crawling/crawling-01.html)
Hopefully the results will now be better for Intensa Spanish language schools with the new crawler.
I prefer the evolution of Yahoo Slurp in order to change its identity!
Why can’t I save more than 5000 web sites on my Yahoo?? YAHOO is getting all this free info on me and you, like my music and news. Give me LAUNCHcast for free!!! I am an owner of a site named http://www.movieguruclub.com; how can I increase its rank in Yahoo search?
How long will it take for the new crawler to update its indexes? I search for things on Yahoo and it still shows links in results to websites that are no longer there… yet they still have high ranking in search results.
———-
n0mie
http://www.n0mie.com
I have checked in my cPanel stats for one of my sites that Yahoo Slurp used 600 MB of bandwidth, but only 50 pages have been indexed by Yahoo, while Google on the other end has indexed 18K pages of the same site. What could be the possible reason?
Hopefully the new crawler will be better. But when I am applying online for a checking account from US Bank, it tells me to mail in a check made out to me that will be used as an opening deposit. It says to include a signature card. What is that? Do I just sign the back of the check and send it, or do I need something else?
Excellent stuff. That’s great news that you can now use a reverse DNS lookup with Yahoo to verify the bot.
Is there any way we can test this?
I have not seen any updates by the crawler and no changes to the position of Intensa Spanish language schools. I expected a little more, but there are no changes no matter what I try
600 Yahoo crawler visits from 600 different IPs over a few hours? This is near DoS for a small server, and also great for IP-based counters and such…
So, no thanks. Blacklisted Yahoo; I feel better without it.
Dan