<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Yahoo! Search Tips for Webmasters: Saving Bandwidth</title>
	<atom:link href="http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/</link>
	<description></description>
	<lastBuildDate>Fri, 20 Nov 2009 18:49:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: kris</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7284</link>
		<dc:creator>kris</dc:creator>
		<pubDate>Thu, 14 Apr 2005 13:36:19 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7284</guid>
		<description>good article!!!

Is there anything in robots.txt that can speed up the indexing process, as opposed to slowing it down (i.e. Crawl-delay)?
</description>
		<content:encoded><![CDATA[<p>good article!!!</p>
<p>Is there anything in robots.txt that can speed up the indexing process, as opposed to slowing it down (i.e. Crawl-delay)?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ulrich Babiak</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7283</link>
		<dc:creator>Ulrich Babiak</dc:creator>
		<pubDate>Mon, 11 Apr 2005 08:34:18 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7283</guid>
		<description>Hi,

I wonder about the order of robots.txt directives for slurp and other robots.

For about a week I have had a crawl-delay directive of 60 seconds in place, with Slurp coming in very regularly at 2 hits per minute. This is a lot better than the 30 requests per minute I had to deal with before, so obviously Slurp obeys the directive at least in part.

Now I wonder about my disallow-directives:
If I have a special Slurp section in my robots.txt like this:
-----------------snip----------
User-agent: Slurp
Crawl-delay: 60
---------------snap-------------

then do I have to repeat all allow/disallow commands in this section or will slurp take them from the global &quot;User-agent: *&quot; section?

Also I wonder whether the order of
User-agent: * and User-agent: Slurp sections matters. Given that many spiders have size limits when parsing robots.txt, I think that the global section must be the first, with specific User-agent sections following.

Regards,
Ulrich
</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I wonder about the order of robots.txt directives for slurp and other robots.</p>
<p>For about a week I have had a crawl-delay directive of 60 seconds in place, with Slurp coming in very regularly at 2 hits per minute. This is a lot better than the 30 requests per minute I had to deal with before, so obviously Slurp obeys the directive at least in part.</p>
<p>Now I wonder about my disallow-directives:<br />
If I have a special Slurp section in my robots.txt like this:<br />
-----------------snip----------<br />
User-agent: Slurp<br />
Crawl-delay: 60<br />
---------------snap-------------</p>
<p>then do I have to repeat all allow/disallow commands in this section or will slurp take them from the global &#8220;User-agent: *&#8221; section?</p>
<p>Also I wonder whether the order of<br />
User-agent: * and User-agent: Slurp sections matters. Given that many spiders have size limits when parsing robots.txt, I think that the global section must be the first, with specific User-agent sections following.</p>
<p>Regards,<br />
Ulrich</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: F. Andy Seidl</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7282</link>
		<dc:creator>F. Andy Seidl</dc:creator>
		<pubDate>Sat, 02 Apr 2005 21:56:17 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7282</guid>
		<description>My sites are getting *lots* of malformed URL requests from Slurp--or an agent claiming to be Slurp; perhaps it is a spoof.

I set up some mod_rewrite rules to detect several classes of bad requests that would result from failure to recognize a base tag.  For the past several weeks, I&#039;ve been logging such requests.  Over that time, the vast majority of such requests (over 95%) claim to be from Slurp.  I have yet to see a request in this log from Google or MSN.

So, I suspect that either Slurp is sometimes missing base tags or my sites are being visited by an agent spoofing as Slurp.

I&#039;d be happy to share details with a Yahoo! engineer to help track down the root cause.  If there&#039;s a Slurp bug, it would be good to identify that.  If it&#039;s a spoofing agent, at least we could start identifying IP addresses to ignore.
</description>
		<content:encoded><![CDATA[<p>My sites are getting *lots* of malformed URL requests from Slurp&#8211;or an agent claiming to be Slurp; perhaps it is a spoof.</p>
<p>I set up some mod_rewrite rules to detect several classes of bad requests that would result from failure to recognize a base tag.  For the past several weeks, I&#8217;ve been logging such requests.  Over that time, the vast majority of such requests (over 95%) claim to be from Slurp.  I have yet to see a request in this log from Google or MSN.</p>
<p>So, I suspect that either Slurp is sometimes missing base tags or my sites are being visited by an agent spoofing as Slurp.</p>
<p>I&#8217;d be happy to share details with a Yahoo! engineer to help track down the root cause.  If there&#8217;s a Slurp bug, it would be good to identify that.  If it&#8217;s a spoofing agent, at least we could start identifying IP addresses to ignore.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Paul Loberg</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7281</link>
		<dc:creator>Paul Loberg</dc:creator>
		<pubDate>Tue, 08 Mar 2005 18:53:28 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7281</guid>
		<description>The Yahoo-Newscrawler does not yet support the crawl-delay option in robots.txt. It is a different crawler than the Slurp web crawler and is only used to crawl selected news sites.

Kind Regards,

Paul Loberg
Yahoo! News Search engineer
</description>
		<content:encoded><![CDATA[<p>The Yahoo-Newscrawler does not yet support the crawl-delay option in robots.txt. It is a different crawler than the Slurp web crawler and is only used to crawl selected news sites.</p>
<p>Kind Regards,</p>
<p>Paul Loberg<br />
Yahoo! News Search engineer</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Ed</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7280</link>
		<dc:creator>Ed</dc:creator>
		<pubDate>Mon, 07 Mar 2005 21:50:41 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7280</guid>
		<description>Hello,

My site seems to be getting flooded with requests from &quot;Yahoo-Newscrawler/3.8-RSS&quot; and &quot;Yahoo-Newscrawler/3.9 RSS&quot;. I was wondering if it responds to &#039;Crawl-delay&#039; in the robots.txt file? I&#039;ve tried it and it doesn&#039;t seem to.

It also seems to ignore my disallow. This is what I have in my robots.txt file:

User-agent: Yahoo-NewsCrawler*
Crawl-delay: 5
Disallow: /servlets/

Thanks,
Ed
</description>
		<content:encoded><![CDATA[<p>Hello,</p>
<p>My site seems to be getting flooded with requests from &#8220;Yahoo-Newscrawler/3.8-RSS&#8221; and &#8220;Yahoo-Newscrawler/3.9 RSS&#8221;. I was wondering if it responds to &#8216;Crawl-delay&#8217; in the robots.txt file? I&#8217;ve tried it and it doesn&#8217;t seem to.</p>
<p>It also seems to ignore my disallow. This is what I have in my robots.txt file:</p>
<p>User-agent: Yahoo-NewsCrawler*<br />
Crawl-delay: 5<br />
Disallow: /servlets/</p>
<p>Thanks,<br />
Ed</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Walter</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7279</link>
		<dc:creator>Walter</dc:creator>
		<pubDate>Wed, 02 Mar 2005 16:56:38 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7279</guid>
		<description>Yahoo is generating various 404s on my site.

There are a lot of 404s like this:

www.waltercruz.com/ler/graca

but the correct URL is www.waltercruz.com/morningstar/ler/graca

For some reason, Yahoo is not reading the morningstar part :(
</description>
		<content:encoded><![CDATA[<p>Yahoo is generating various 404s on my site.</p>
<p>There are a lot of 404s like this:</p>
<p><a href="http://www.waltercruz.com/ler/graca" rel="nofollow">http://www.waltercruz.com/ler/graca</a></p>
<p>but the correct URL is <a href="http://www.waltercruz.com/morningstar/ler/graca" rel="nofollow">http://www.waltercruz.com/morningstar/ler/graca</a></p>
<p>For some reason, Yahoo is not reading the morningstar part :(</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard Clarke</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7278</link>
		<dc:creator>Richard Clarke</dc:creator>
		<pubDate>Wed, 02 Mar 2005 01:08:32 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7278</guid>
		<description>Hi Dave,

Sorry, I&#039;m sounding like a right pain, but as yet we still don&#039;t see any action.

Bot action today: Google 23,000 hits, MSN 8,748, Ask 5,490, Inktomi Slurp 91.

Even Alexa is hitting us harder than your Slurp bot.

If you can advise what we should try to do to get your Slurp bot to hit us harder, it would be much appreciated. We have everything on the pages (links, site map, etc.), yet currently only a small number of pages have been cached by your Slurp bot.

Thanks in advance

Richard Clarke
richc@redgoldfish.co.uk
www.redgoldfish.co.uk
</description>
		<content:encoded><![CDATA[<p>Hi Dave,</p>
<p>Sorry, I&#8217;m sounding like a right pain, but as yet we still don&#8217;t see any action.</p>
<p>Bot action today: Google 23,000 hits, MSN 8,748, Ask 5,490, Inktomi Slurp 91.</p>
<p>Even Alexa is hitting us harder than your Slurp bot.</p>
<p>If you can advise what we should try to do to get your Slurp bot to hit us harder, it would be much appreciated. We have everything on the pages (links, site map, etc.), yet currently only a small number of pages have been cached by your Slurp bot.</p>
<p>Thanks in advance</p>
<p>Richard Clarke<br />
<a href="mailto:richc@redgoldfish.co.uk">richc@redgoldfish.co.uk</a><br />
<a href="http://www.redgoldfish.co.uk" rel="nofollow">http://www.redgoldfish.co.uk</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Richard Clarke</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7277</link>
		<dc:creator>Richard Clarke</dc:creator>
		<pubDate>Sat, 26 Feb 2005 17:59:42 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7277</guid>
		<description>Hi Dave,

Not heard a thing yet from anyone in your sales support. Your bot is still producing a mere 150-odd hits a day at our site, with just 723 pages out of 100,000 taken in over a year!

So far this month: c. 3,000 Slurp bot hits vs. 140,000 Googlebot hits &amp; 40,000 MSN bot hits. Even Ask Jeeves hits us harder than your bot, with 26,000 hits!

I just don&#039;t get it. We feature high in MSN and Ask and have various high positions in Google, yet we feature nowhere in Yahoo, while poor sites full of spam and doorway pages dominate the top of the sections where we should feature.

It certainly would be a start if the volume could somehow be turned up a bit when your bot visits us.

Any advice you could give us would be appreciated.

Kind Regards

Richard Clarke
richc@redgoldfish.co.uk
www.redgoldfish.co.uk
</description>
		<content:encoded><![CDATA[<p>Hi Dave,</p>
<p>Not heard a thing yet from anyone in your sales support. Your bot is still producing a mere 150-odd hits a day at our site, with just 723 pages out of 100,000 taken in over a year!</p>
<p>So far this month: c. 3,000 Slurp bot hits vs. 140,000 Googlebot hits &#038; 40,000 MSN bot hits. Even Ask Jeeves hits us harder than your bot, with 26,000 hits!</p>
<p>I just don&#8217;t get it. We feature high in MSN and Ask and have various high positions in Google, yet we feature nowhere in Yahoo, while poor sites full of spam and doorway pages dominate the top of the sections where we should feature.</p>
<p>It certainly would be a start if the volume could somehow be turned up a bit when your bot visits us.</p>
<p>Any advice you could give us would be appreciated.</p>
<p>Kind Regards</p>
<p>Richard Clarke<br />
<a href="mailto:richc@redgoldfish.co.uk">richc@redgoldfish.co.uk</a><br />
<a href="http://www.redgoldfish.co.uk" rel="nofollow">http://www.redgoldfish.co.uk</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Dave</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7276</link>
		<dc:creator>Dave</dc:creator>
		<pubDate>Fri, 25 Feb 2005 19:50:48 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7276</guid>
		<description>Thomas:

No, our crawler doesn&#039;t use etags at this time.  It&#039;s something we might consider adding, but most server implementations use etags in addition to the last-modified header.

&quot;bot slave&quot;:

You should send feedback to our support staff via the link in the blog posting (&quot;feedback, questions or new ideas&quot;) if a robots.txt solution isn&#039;t working for you.

However, robots.txt is the way to opt out... you can disallow our bot altogether, though if the issue is rate of requests, use the crawl-delay tag instead.  You may want to re-verify that the bot that is causing you problems is actually Yahoo Slurp as there are other Yahoo crawlers that have different agent names.  Otherwise make sure you validate your robots.txt file (try a search for &quot;robots.txt validator&quot;).
</description>
		<content:encoded><![CDATA[<p>Thomas:</p>
<p>No, our crawler doesn&#8217;t use etags at this time.  It&#8217;s something we might consider adding, but most server implementations use etags in addition to the last-modified header.</p>
<p>&#8220;bot slave&#8221;:</p>
<p>You should send feedback to our support staff via the link in the blog posting (&#8220;feedback, questions or new ideas&#8221;) if a robots.txt solution isn&#8217;t working for you.</p>
<p>However, robots.txt is the way to opt out&#8230; you can disallow our bot altogether, though if the issue is rate of requests, use the crawl-delay tag instead.  You may want to re-verify that the bot that is causing you problems is actually Yahoo Slurp as there are other Yahoo crawlers that have different agent names.  Otherwise make sure you validate your robots.txt file (try a search for &#8220;robots.txt validator&#8221;).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: bot  slave</title>
		<link>http://www.ysearchblog.com/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/comment-page-1/#comment-7275</link>
		<dc:creator>bot  slave</dc:creator>
		<pubDate>Thu, 24 Feb 2005 02:32:09 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2005/02/10/yahoo-search-tips-for-webmasters-saving-bandwidth/#comment-7275</guid>
		<description>OK, 1 simple question: how do I tell Slurp to stay off my site? They are nailing me to no end. I just want it to go away, and robots.txt has not helped.  I do not need Slurp; I want out. How do I opt out of this? Thanks
</description>
		<content:encoded><![CDATA[<p>OK, 1 simple question: how do I tell Slurp to stay off my site? They are nailing me to no end. I just want it to go away, and robots.txt has not helped.  I do not need Slurp; I want out. How do I opt out of this? Thanks</p>
]]></content:encoded>
	</item>
</channel>
</rss>
