Keeping Ad Tracking and Dead URLs out of Yahoo! Search
We’re often asked how Yahoo! Search determines which pages get indexed and which pages are left un-crawled. First and foremost, we honor the industry-standard robots.txt file format, which gives Webmasters several layers of control over which sites, pages and specific URLs should be indexed. Lately we’ve heard from a number of Webmasters asking how best to prevent ad tracking URLs and dead URLs from getting indexed, so we thought we’d respond via this post.
Ad tracking URLs
Ad tracking URLs are used by Webmasters to help determine what traffic is coming in from advertisements (e.g., Yahoo! Sponsored Search and Yahoo! Publisher Network), but they don’t need to be included in the Yahoo! Search index. Sometimes you might notice that these URLs still appear in the index. That’s because they’ve appeared on pages that are “crawlable” or may have been copied over to crawlable pages by users. If you don’t want Yahoo! Slurp, our Web crawler, to index these URLs, you can use wildcards in robots.txt. For example, if you are using the parameter ‘ref’ to track ad sources, you can use a rule like the one below to keep your tracking URLs from being Slurped:
User-Agent: Yahoo! Slurp
Disallow: /*ref=YahooPublisherNetwork
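To check which of your own URLs a wildcard rule like this would cover before you publish it, a rough local test can help. The Python sketch below is an illustration only: it assumes that ‘*’ matches any run of characters and that the rule is compared against the path plus query string from its first character. It is not Slurp’s actual matching code, and the helper names and sample URLs are hypothetical.

```python
import re

def rule_to_regex(rule: str) -> re.Pattern:
    """Turn a robots.txt path rule containing '*' wildcards into a regex.

    Assumption: '*' stands for any run of characters and everything else
    is matched literally, starting at the beginning of the path.
    """
    # Escape each literal piece, then join the pieces with ".*" for each '*'.
    body = ".*".join(re.escape(part) for part in rule.split("*"))
    return re.compile(body)

def is_disallowed(path_and_query: str, rule: str) -> bool:
    """Return True if the rule would cover the given path and query string."""
    # re.match anchors at the start of the string, like a robots.txt prefix rule.
    return rule_to_regex(rule).match(path_and_query) is not None

RULE = "/*ref=YahooPublisherNetwork"

# Hypothetical sample URLs (path plus query string only).
samples = [
    "/products/widget.html?ref=YahooPublisherNetwork",
    "/products/widget.html",
    "/products/widget.html?ref=newsletter",
]

for url in samples:
    print(url, "->", "blocked" if is_disallowed(url, RULE) else "allowed")
```

Under these assumptions, only the first sample URL is treated as blocked; the plain page and the page tagged with a different ‘ref’ value remain crawlable.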
Dead URLs
The best way to remove dead URLs from the Yahoo! Search index is to return an HTTP Error 404 when our crawler requests the page. If you want to act before the 404 discovery and URL removal process completes, you can use Site Explorer to quickly delete the URLs from the index. One advantage to using Site Explorer is that you can delete multiple URLs, including an entire subpath, so long as the URL prefix is the same. As Danny Sullivan points out in his deep-dive post on the delete function, if you delete http://domain.com/subarea1/, then all the pages that begin with “domain.com/subarea1” will get removed. E.g.:
http://domain.com/subarea1/page1.html
http://domain.com/subarea1/page45.html
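The prefix behaviour described above can be pictured with a short Python sketch. It assumes a simple “starts with” comparison on the full URL; the function name and the subarea2 URL are hypothetical and only there for contrast, and this is not how Site Explorer itself is implemented.

```python
def urls_removed_by_delete(deleted_prefix, indexed_urls):
    """Return the indexed URLs that share the deleted prefix.

    Assumption: a delete applies to every URL that begins with the prefix.
    """
    return [url for url in indexed_urls if url.startswith(deleted_prefix)]

indexed = [
    "http://domain.com/subarea1/page1.html",
    "http://domain.com/subarea1/page45.html",
    "http://domain.com/subarea2/page1.html",  # hypothetical URL outside the prefix
]

removed = urls_removed_by_delete("http://domain.com/subarea1/", indexed)
for url in removed:
    print(url)
```

Listing the affected URLs this way before deleting a subpath is a quick check that nothing you still want indexed sits under the same prefix.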
We’ll continue to visit the Yahoo! Search blog to give Webmasters like you pointers on how to better manage your sites in the Yahoo! Search index. Be sure to visit us at the Site Explorer Suggestion Board if there are specific areas that you’d like us to address in more detail.
Thanks,
Priyank Garg
Yahoo! Search

There is a long-standing bug in Yahoo SERPs that needs to be analyzed ASAP!
Getting listed on the Yahoo Directory will nix your organic listing
Getting a local Yahoo listing will also nix the organic listings
We are talking about ALGOs, not the controversy of the Edited Title and Description.
While the sites will be in the Yahoo database and come up under the SITE or LINK operator – there is a huge programming flaw that conflicts with the organic SERPs once sites have been listed on either of the above.
This REALLY does exist – it is vital that this concern be fully explored.
Thank you, I am reworking my entire site and ditching some URLs.
Why does it allow only 5 deletes per site?
Ad tracking URLs
Is a 301 redirection also OK, or not?
My site and others’ sites
The majority of the result of the retrieval of a new site disappeared.
Back in Oct we had server issues; basically we were giving out a lot of incorrect server headers for about a week due to server load issues. In Nov we saw what we expected: a lot of pages being incorrectly indexed, and our SERPs dropped. We thought it would take time to get these incorrect URIs to disappear, but as of this moment we are still showing thousands of pages that are either 404s, redirects, PPC URIs or pages blocked by a robots.txt file. From my stats I’m seeing 20K visits a day from Slurp. I’ve not checked every IP address that the bot is coming from, but I’m assuming they are correct IPs. The bot visits these pages, gets given the correct server response, whether that’s a 404 or 301 or 200, then visits the robots.txt file and gets given the disallowed folders. And yet we are 3 and a bit months on since we had issues with the server, and Yahoo is still indexing rubbish and my SERPs haven’t returned. Thanks for your advice on this subject, but precautions have been in place for almost 4 months and Slurp isn’t listening.
Please fix 301 redirects! You should be treating them as a 404 for the old page and including the new page.
Something is still wrong with the indexing system. Some of my sites are now nowhere to be found on the search results.
Thanks for the advice on dead URLs.
There should not be a limit on how many dead URLs we can remove…..
Slurp should automatically stop crawling dead URLs after it has crawled them several times and found them dead.
Thank god for this service. It’s very annoying when you come across dead URLs. Most likely it was something that you needed and of course it’s not there. Nice step in the right direction.
Are you serious, really serious?
The HTML spec clearly says return code 410 is “gone”, and you are treating 404 like gone?
http://www.w3.org/Protocols/rfc2616/rfc2616-sec10.html
The explanation of 404 says it is temporary or unknown; it also indicates “The 410 (Gone) status code SHOULD be used if the server knows, through some internally configurable mechanism, that an old resource is permanently unavailable and has no forwarding address.”
Yahoo slurp is free to treat a 404 page as if removed, but I can’t understand why you teach people to send a 404 if a 410 return code is appropriate.
Sorry, I misspoke. Of course I mean the HTTP spec, not the HTML spec.
I think that Yahoo! Slurp should not crawl dead URLs. Once it finds that a link is dead, it should remove it from the index. This will keep people from searching for something and not being able to access it.