<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
		>
<channel>
	<title>Comments on: Yahoo! Search Crawler (Yahoo! Slurp) &#8211; Supporting wildcards in robots.txt</title>
	<atom:link href="http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/</link>
	<description></description>
	<lastBuildDate>Fri, 20 Nov 2009 18:49:51 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.4</generator>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
		<item>
		<title>By: AussieWebmaster</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3589</link>
		<dc:creator>AussieWebmaster</dc:creator>
		<pubDate>Mon, 13 Nov 2006 15:47:05 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3589</guid>
		<description>What about a link from another site that contains tracking code? Even though the robots.txt file tells the spider not to index the page, the spider is not arriving from within the site. If that URL is the arrival page, are the robots.txt rules still applied, and do they disallow what has already been recorded?
</description>
		<content:encoded><![CDATA[<p>What about a link from another site that contains tracking code? Even though the robots.txt file tells the spider not to index the page, the spider is not arriving from within the site. If that URL is the arrival page, are the robots.txt rules still applied, and do they disallow what has already been recorded?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Kevin</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3588</link>
		<dc:creator>Kevin</dc:creator>
		<pubDate>Mon, 13 Nov 2006 12:52:47 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3588</guid>
		<description>Hello

So if we use /*? will this exclude all URLs with query strings?

Pete
</description>
		<content:encoded><![CDATA[<p>Hello</p>
<p>So if we use /*? will this exclude all URLs with query strings?</p>
<p>Pete</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jean-Luc</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3587</link>
		<dc:creator>Jean-Luc</dc:creator>
		<pubDate>Sat, 04 Nov 2006 12:28:33 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3587</guid>
		<description>You wrote: &quot;Oh, by the way, if you thought we didn&#039;t support the &#039;Allow&#039; tag, as you can see from these examples, we do.&quot;

We need to know the priority rules that apply when &quot;Allow:&quot; and &quot;Disallow:&quot; directives cover the same URLs (as far as I know, this is not standardized).

Example:
User-agent: Slurp
Allow: /*.html$
Disallow: /backup/

Will /backup/yes_or_not.html be spidered?

Regarding these priority rules, there is an opinion here ( &lt;a href=&quot;http://www.conman.org/people/spc/robots2.html#format.directives.allow&quot; rel=&quot;nofollow&quot;&gt;http://www.conman.org/people/spc/robots2.html#format.directives.allow&lt;/a&gt; ) and another conflicting opinion here ( &lt;a href=&quot;http://books.google.com/webmasters/bot.html#robots&quot; rel=&quot;nofollow&quot;&gt;http://books.google.com/webmasters/bot.html#robots&lt;/a&gt; ). What about Yahoo&#039;s implementation?

Jean-Luc
</description>
		<content:encoded><![CDATA[<p>You wrote: &#8220;Oh, by the way, if you thought we didn&#8217;t support the &#8216;Allow&#8217; tag, as you can see from these examples, we do.&#8221;</p>
<p>We need to know the priority rules that apply when &#8220;Allow:&#8221; and &#8220;Disallow:&#8221; directives cover the same URLs (as far as I know, this is not standardized).</p>
<p>Example:<br />
User-agent: Slurp<br />
Allow: /*.html$<br />
Disallow: /backup/</p>
<p>Will /backup/yes_or_not.html be spidered?</p>
<p>Regarding these priority rules, there is an opinion here ( <a href="http://www.conman.org/people/spc/robots2.html#format.directives.allow" rel="nofollow">http://www.conman.org/people/spc/robots2.html#format.directives.allow</a> ) and another conflicting opinion here ( <a href="http://books.google.com/webmasters/bot.html#robots" rel="nofollow">http://books.google.com/webmasters/bot.html#robots</a> ). What about Yahoo&#8217;s implementation?</p>
<p>Jean-Luc</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: yaph</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3586</link>
		<dc:creator>yaph</dc:creator>
		<pubDate>Fri, 03 Nov 2006 09:12:04 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3586</guid>
		<description>Google also supports the wildcard characters * and $ (a ? in a pattern is matched literally).
</description>
		<content:encoded><![CDATA[<p>Google also supports the wildcard characters * and $ (a ? in a pattern is matched literally).</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Surya</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3585</link>
		<dc:creator>Surya</dc:creator>
		<pubDate>Fri, 03 Nov 2006 07:56:05 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3585</guid>
		<description>Hi
Yes Brian, it would really be helpful if we could get the robots.txt approved by some Yahoo! authority.
What I understand from the update is:

User-Agent: Yahoo! Slurp
Disallow: /*.gif$

Using this will restrict this URL:
/public/images/xyz.gif

and allow
/public/images/xyz.gif?sessid=1234asd..

Please let me know if this is the intended behavior.
And what if I would like to disallow all files in a folder except one?

User-Agent: Yahoo! Slurp
Disallow: /images
Allow: /images/xyz.gif
Just out of curiosity :)
</description>
		<content:encoded><![CDATA[<p>Hi<br />
Yes Brian, it would really be helpful if we could get the robots.txt approved by some Yahoo! authority.<br />
What I understand from the update is:</p>
<p>User-Agent: Yahoo! Slurp<br />
Disallow: /*.gif$</p>
<p>Using this will restrict this URL:<br />
/public/images/xyz.gif</p>
<p>and allow<br />
/public/images/xyz.gif?sessid=1234asd..</p>
<p>Please let me know if this is the intended behavior.<br />
And what if I would like to disallow all files in a folder except one?</p>
<p>User-Agent: Yahoo! Slurp<br />
Disallow: /images<br />
Allow: /images/xyz.gif<br />
Just out of curiosity :)</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Harry G</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3584</link>
		<dc:creator>Harry G</dc:creator>
		<pubDate>Fri, 03 Nov 2006 07:52:36 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3584</guid>
		<description>This will simplify our work of writing the robots.txt file.
</description>
		<content:encoded><![CDATA[<p>This will simplify our work of writing the robots.txt file.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Brian M</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3583</link>
		<dc:creator>Brian M</dc:creator>
		<pubDate>Thu, 02 Nov 2006 22:36:33 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3583</guid>
		<description>I agree with Mike and Jean-Luc above. It would be much more helpful if ALL the search engines followed these directives.

Otherwise, you need at least to add the capability to test a robots.txt file against Yahoo&#039;s directives in Site Explorer (as Google does for its interpretation of the robots.txt file in its &quot;Webmaster Tools&quot;).

There is already confusion about what works and what doesn&#039;t (as Jean-Luc points out), so how else is a webmaster to be certain that changes will work?
</description>
		<content:encoded><![CDATA[<p>I agree with Mike and Jean-Luc above. It would be much more helpful if ALL the search engines followed these directives.</p>
<p>Otherwise, you need at least to add the capability to test a robots.txt file against Yahoo&#8217;s directives in Site Explorer (as Google does for its interpretation of the robots.txt file in its &#8220;Webmaster Tools&#8221;).</p>
<p>There is already confusion about what works and what doesn&#8217;t (as Jean-Luc points out), so how else is a webmaster to be certain that changes will work?</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Priyank Garg</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3582</link>
		<dc:creator>Priyank Garg</dc:creator>
		<pubDate>Thu, 02 Nov 2006 22:22:21 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3582</guid>
		<description>Yes, &#039;Slurp&#039; also works as the user-agent. Essentially, we look for the presence of &#039;Slurp&#039; in the user-agent string. Please refer to:

&lt;a href=&quot;http://help.yahoo.com/help/us/ysearch/slurp/slurp-02.html&quot; rel=&quot;nofollow&quot;&gt;http://help.yahoo.com/help/us/ysearch/slurp/slurp-02.html&lt;/a&gt;
</description>
		<content:encoded><![CDATA[<p>Yes, &#8216;Slurp&#8217; also works as the user-agent. Essentially, we look for the presence of &#8216;Slurp&#8217; in the user-agent string. Please refer to:</p>
<p><a href="http://help.yahoo.com/help/us/ysearch/slurp/slurp-02.html" rel="nofollow">http://help.yahoo.com/help/us/ysearch/slurp/slurp-02.html</a></p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Mike &#124; MetaSearch</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3581</link>
		<dc:creator>Mike &#124; MetaSearch</dc:creator>
		<pubDate>Thu, 02 Nov 2006 22:00:05 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3581</guid>
		<description>I think it will be very helpful for webmasters to be able to use wildcards, but other search engines should follow suit if webmasters are really going to use them.
</description>
		<content:encoded><![CDATA[<p>I think it will be very helpful for webmasters to be able to use wildcards, but other search engines should follow suit if webmasters are really going to use them.</p>
]]></content:encoded>
	</item>
	<item>
		<title>By: Jean-Luc</title>
		<link>http://www.ysearchblog.com/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/comment-page-1/#comment-3580</link>
		<dc:creator>Jean-Luc</dc:creator>
		<pubDate>Thu, 02 Nov 2006 20:26:50 +0000</pubDate>
		<guid isPermaLink="false">http://ysearchblog.com/blog/2006/11/02/yahoo-search-crawler-yahoo-slurp-supporting-wildcards-in-robotstxt/#comment-3580</guid>
		<description>Hi,

I am surprised to see:
User-Agent: Yahoo! Slurp

It used to be:
User-Agent: Slurp

Are both versions of the user-agent understood by all Yahoo! bots?

Jean-Luc
</description>
		<content:encoded><![CDATA[<p>Hi,</p>
<p>I am surprised to see:<br />
User-Agent: Yahoo! Slurp</p>
<p>It used to be:<br />
User-Agent: Slurp</p>
<p>Are both versions of the user-agent understood by all Yahoo! bots?</p>
<p>Jean-Luc</p>
]]></content:encoded>
	</item>
</channel>
</rss>
