August 21, 2008
Site Explorer Gets a Makeover
A few years ago, we launched Site Explorer with the goal of giving site owners better visibility into how we index their websites and what data we use in our search service. Over the years, we’ve moved beyond simply providing information to webmasters to letting them tell us what to do with their sites, through functions such as submitting feeds, deleting URLs and reporting spam. Our most successful function of all has been Dynamic URL Rewriting. Thousands of site owners have entered rules for their websites, webmasters auto-rewrite an average of 25,000 URLs per rule, and some sites have rewritten millions of URLs in one shot.
Today, we launched a new look and feel for Site Explorer (http://siteexplorer.search.yahoo.com/new) that provides a more dynamic interface to accommodate future feature roll-outs. The new interface also includes a new Site Summary page to provide statistics for authenticated sites. On top of this, we’re also increasing the number of rules for Dynamic URL Rewriting that you can enter from 3 to 10.
The new site is located at a special URL to give you some time to play around with it and update your tools that use our interface. We will make this the default experience soon, so please use this time to update your tools. And, as always, please give us feedback on your new experience. We want to hear from you!
Priyank Garg
Yahoo! Search
April 14, 2008
Yahoo! Slurp 3.0
Over the past few weeks, we’ve been preparing for the latest version of the Yahoo! Search crawler with some infrastructure updates, which recently caused some variation in our crawl behavior.
With everything now in place, the rollout has officially begun. The new Yahoo! Slurp 3.0 honors the same user-agent and all robots.txt directives addressed to ‘Yahoo! Slurp,’ though it’ll identify itself as Slurp 3.0 in your web logs.
As the new software undergoes a phased rollout to our production crawlers over the next several weeks, you’ll see the following changes:
a) The crawlers will start crawling from a different and much smaller set of IP addresses, but those addresses will still be in the crawl.yahoo.net domain, so any reverse DNS checks to identify our crawler will continue to work. Please note that if you’re using IP-based recognition of our crawlers, you might see a drop in crawl/coverage from Yahoo! If you identify Yahoo! Slurp by any method other than reverse DNS, we strongly recommend that you switch to reverse DNS-based identification to avoid this problem. The current set of IPs will disappear from your web logs over the next several weeks.
b) The crawlers will also publish a new user-agent, ‘Yahoo! Slurp/3.0.’ Existing robots.txt directives for ‘Slurp’ or ‘Yahoo! Slurp’ will continue to work, but directives specific to ‘Slurp/2.0’ won’t be recognized by the new crawler (usage of the ‘Slurp/2.0’ user-agent is very rare on the web, though, so you’re unlikely to be affected). We recommend the shorter form: User-agent: Slurp. Check out “How do I prevent my site or certain subdirectories from being crawled?” on our Help page for more details.
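The reverse DNS check recommended in a) can be sketched in Python as a reverse lookup followed by a forward lookup that must return the original IP, so a spoofed reverse record alone isn’t enough to pass. This is a minimal illustration, not Yahoo!’s own tooling: the function names `is_yahoo_crawler` and `_host_in_domain` are hypothetical, the only domain trusted is crawl.yahoo.net as stated above, and `socket.gethostbyname_ex` covers IPv4 addresses only. The lookup functions are parameters so the logic can be exercised without network access.

```python
import socket

# Domain the post says Slurp's addresses will continue to resolve to.
CRAWL_DOMAIN = "crawl.yahoo.net"

def _host_in_domain(host, domain):
    """True if host is the domain itself or a subdomain of it."""
    host = host.rstrip(".").lower()
    return host == domain or host.endswith("." + domain)

def is_yahoo_crawler(ip,
                     reverse_lookup=socket.gethostbyaddr,
                     forward_lookup=socket.gethostbyname_ex):
    """Hypothetical check: ip must reverse-resolve into crawl.yahoo.net,
    and that hostname must forward-resolve back to the same ip."""
    try:
        host, _aliases, _addrs = reverse_lookup(ip)
    except OSError:  # socket.herror / socket.gaierror subclass OSError
        return False
    if not _host_in_domain(host, CRAWL_DOMAIN):
        return False
    try:
        _name, _aliases, forward_ips = forward_lookup(host)
    except OSError:
        return False
    # Forward confirmation defeats a faked PTR record.
    return ip in forward_ips
```

In a log analyzer you would call `is_yahoo_crawler(ip)` with the default resolvers and cache the result per IP, since each call performs two DNS queries.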
These changes will affect the main Yahoo! Web Search crawlers. Crawlers that similarly respect the Yahoo! Slurp directive but identify themselves more specifically, such as Yahoo! Slurp China and others, will not be impacted.
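To see why a ‘Slurp’ group still applies to the new crawler while a ‘Slurp/2.0’ group does not, here is a sketch using Python’s standard `urllib.robotparser`. It is a generic parser, not Yahoo!’s: it reduces the crawler name to the part before the first ‘/’ and then does a case-insensitive substring check against each group’s User-agent value. The robots.txt content, the paths and the helper name `allowed_for_slurp3` are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Product token named in the post for the new crawler.
SLURP_3 = "Yahoo! Slurp/3.0"

def allowed_for_slurp3(robots_txt, url):
    """Return True if a generic robots.txt parser lets Slurp 3.0 fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(SLURP_3, url)

# Group addressed to the short name: applies to Slurp 3.0.
SHORT_FORM = "User-agent: Slurp\nDisallow: /private/\n"
# Group addressed only to the old version: ignored for Slurp 3.0.
VERSIONED = "User-agent: Slurp/2.0\nDisallow: /private/\n"

print(allowed_for_slurp3(SHORT_FORM, "http://www.example.com/private/a.html"))  # False
print(allowed_for_slurp3(VERSIONED, "http://www.example.com/private/a.html"))   # True
```

With the versioned group, no rule applies to Slurp 3.0 at all, so the page is treated as crawlable; that is the situation the short `User-agent: Slurp` form avoids.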
Let us know if you have any questions or observe anything unusual.
Sharad Verma & Yoram Arnon
Yahoo! Search
December 05, 2007
Yahoo! Search Support for X-Robots-Tag Directive to Simplify Webmasters’ Control, and Weather Update
Today we’re announcing support for tags that give webmasters even more flexibility over which pages and documents are crawled and indexed by Yahoo! Search. Specifically, we’re extending our support of the page level exclusion tags (NOINDEX, NOARCHIVE, NOSNIPPET, NOFOLLOW) to provide additional control over archiving and summarization of ANY file type. Previously, these page level tags could only be expressed within HTML pages through the META directive (e.g. <META NAME="Slurp" CONTENT="NOARCHIVE">). Based on feedback from our webmasters, Yahoo! now also accepts these tags through the X-Robots-Tag directive in the HTTP header. This gives webmasters the flexibility to apply exclusions to PDF, Word documents, PowerPoint, video and other file types, including HTML files, extending these exclusions to more of their content through a simpler process. Additionally, webmasters no longer need access to HTML templates in order to express exclusions for HTML files. To take advantage of this feature, simply add the page level tags to the X-Robots-Tag directive in the HTTP header.
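As a minimal sketch of emitting the header from a server you control, the Python handler below adds an X-Robots-Tag header to responses for selected file types. The extension-to-tag mapping and the names `EXCLUSION_POLICY`, `x_robots_tag_for` and `RobotsHeaderHandler` are hypothetical, and the comma-separated header value mirrors the META CONTENT syntax; treat that value format as an assumption and confirm it against the Yahoo! Search Help pages.

```python
import posixpath
from http.server import SimpleHTTPRequestHandler

# Hypothetical policy: which file types get which page level tags.
# The comma-separated value format is an assumption (mirrors META CONTENT).
EXCLUSION_POLICY = {
    ".pdf": "NOARCHIVE, NOSNIPPET",
    ".doc": "NOINDEX",
    ".ppt": "NOINDEX, NOFOLLOW",
}

def x_robots_tag_for(path):
    """Return the X-Robots-Tag value for a request path, or None."""
    clean_path = path.split("?", 1)[0]          # ignore any query string
    ext = posixpath.splitext(clean_path)[1].lower()
    return EXCLUSION_POLICY.get(ext)

class RobotsHeaderHandler(SimpleHTTPRequestHandler):
    """Static file server that adds X-Robots-Tag to matching responses."""

    def end_headers(self):
        value = x_robots_tag_for(self.path)
        if value:
            self.send_header("X-Robots-Tag", value)
        super().end_headers()
```

To try it locally, serve a directory with `HTTPServer(("", 8000), RobotsHeaderHandler).serve_forever()` (importing `HTTPServer` from `http.server`) and inspect the response headers for a PDF. Overriding `end_headers` keeps the policy in one place and applies it to every response the handler sends.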
Along with this change, we’ll be rolling out additional changes to our crawling, indexing and ranking algorithms over the next few days. We expect the update will be completed early next week, but you may see some changes in ranking as well as some shuffling of the pages in the index during this process.
We’re at SES in Chicago and WebmasterWorld’s PubCon in Las Vegas, participating in a few different panels this week. Please find us if you have any questions or suggestions or drop us your feedback here.
Sharad Verma
Yahoo! Search
November 06, 2007
Site Explorer Counts Resolved
Last week we announced that we were working on a fix to correct discrepancies in the page and inlink data counts. The good news is that the fix is now complete. You should see consistent page and inlink counts in Site Explorer, whether you’re logged in or logged out.
Try a query in Site Explorer to confirm. As always, let us know your feedback.
Priyank Garg
Yahoo! Search
October 26, 2007
Update on Site Explorer Results and Counts Data
Recently, some of you noticed changes in counts for Site Explorer results, where the counts were different for logged-in users versus logged-out users.
While the counts have been incorrect in some cases, the actual returned results have been correct. However, we did roll out a product fix yesterday and will be rolling out a couple more over the next few days to resolve this difference in counts some of you have observed.
Please disregard any counts for inlinks reported by Site Explorer from October 11 through next week. Thank you for raising this issue.
Priyank Garg
Yahoo! Search
September 10, 2007
Come and Explore your Site…
Yahoo! Small Business just made it easier for customers to submit and authenticate their sites to Yahoo! Site Explorer. Now, all you have to do is make sure that ‘sitemap.xml’ is enabled and your site will be submitted to Yahoo! Site Explorer automatically.
With this feature, new stores as well as existing stores with ‘sitemap.xml’ enabled will have access to the toolkit inside Site Explorer. Within a few hours of enabling, you’ll be able to locate your indexed pages and the links to your sites, as well as delete pages in the index or rewrite dynamic URLs. To double-check whether your site was auto-authenticated, look at the ‘Source’ column on the ‘My Sites’ page in Site Explorer.
If you’re looking for more information on the Sitemap feature, take a look at Sitemap help from Yahoo! Small Business. You can also read more about this feature on the Yahoo! Stores blog.
Welcome Yahoo! Small Business customers! Let us know how things are working for you in the comments below.
Priyank Garg
Yahoo! Search
August 21, 2007
Be Dynamic, Be Confident — Yahoo! Search Supports You
Please excuse the dramatic start to this post. Between the anticipation of rolling this out and my incessant Harry Potter reading, I couldn’t resist.
Once upon a time, on the World Wide Web, all URLs were fixed strings — static in form. The idea of URL parameters then came along, allowing for database driven sites and session ids in URLs to create personalized experiences for users. At that time, the Web was alive with rich data and experiences. Then came the crawlers, which made it easier for users to navigate through the Web; however, they inevitably battled with dynamic URL parameters and every webmaster had to choose between a dynamic site and search traffic.
Today comes a new wave for search engines with the first-ever Beta launch of ‘Dynamic URL Rewriting’ in Site Explorer. The new feature lets site owners tell Yahoo! which dynamic parameters in their URLs Yahoo! should ignore; we’ll then automatically rewrite those URLs accordingly. Try it for any case where you use parameters in your URLs that don’t affect the content of your page but serve other important purposes.
How to get there?
- Log in to Site Explorer from Yahoo! Search.
- Add to My Sites and then authenticate any sites that you own or manage.
- For any sites that you have authenticated, you’ll see a ‘Dynamic URLs’ tab.
- On this tab you can enter parameters you want us to either remove from URLs or always crawl with a specific value.
- Once you enter the parameter, we’ll show you the number of URLs we estimate will be affected.
- After you confirm the action, we’ll modify our crawler such that every time we see a URL from your site with that parameter, we’ll automatically rewrite it within our system as per your instruction.
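The two kinds of rules described above (remove a parameter, or always crawl with a specific value) can be illustrated with Python’s `urllib.parse`. This is only a sketch of the effect of a rule, not how Yahoo! implements rewriting; the function `rewrite_url` and the parameter names `sid` and `lang` are hypothetical, and re-encoding the query may normalize its escaping.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

def rewrite_url(url, remove=frozenset(), fixed=None):
    """Drop parameters listed in `remove`; force parameters in `fixed`
    to the given value when they appear. Other parameters keep their order."""
    fixed = fixed or {}
    parts = urlsplit(url)
    kept = []
    for key, value in parse_qsl(parts.query, keep_blank_values=True):
        if key in remove:
            continue  # content-neutral parameter: strip it
        kept.append((key, fixed.get(key, value)))
    # urlunsplit omits the '?' when the rebuilt query is empty.
    return urlunsplit(parts._replace(query=urlencode(kept)))

# A session-ID rule collapses three crawled URLs into one.
crawled = [
    "http://www.example.com/shop?item=42&sid=A1",
    "http://www.example.com/shop?item=42&sid=B2",
    "http://www.example.com/shop?sid=C3&item=42",
]
print({rewrite_url(u, remove={"sid"}) for u in crawled})
# {'http://www.example.com/shop?item=42'}

print(rewrite_url("http://www.example.com/list?sort=price&lang=fr",
                  fixed={"lang": "en"}))
# http://www.example.com/list?sort=price&lang=en
```

Counting the distinct rewritten URLs this way gives a rough sense of how many duplicates a rule would fold together.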
So you might wonder what the feature really gives you. Utilizing the ‘Dynamic URL Rewriting’ feature enables:
- A more efficient crawl of your site, with fewer duplicate URLs being crawled.
- Better and deeper site coverage, as we’ll be able to use our crawler capacity to find and index more new content on your site.
- More unique content discovered, as we’ll handle more dynamic parameters in your URLs (if you remove the content-neutral dynamic parameters).
- Fewer chances of crawler traps: sets of web pages that can generate an infinite number of requests or cause a poorly constructed crawler to crash.
- Cleaner and easier-to-read URLs displayed in the search results.
- Better site ranking due to reduced fragmentation of links and anchor text to your site’s pages.
Looking for more details on when to use URL parameters? Visit the Site Explorer Help page for additional background on the Beta feature and help deciding which dynamic parameters Yahoo! should ignore.
We’re here to address any questions or needs you have, so let us know how it works for you.
Priyank Garg
for Lakis, Amit B., Amit S., Jay, Judy, Srikanth, Zheng
Yahoo! Search
April 11, 2007
Webmasters Can Now Auto-Discover With Sitemaps
Since working with Google and Microsoft to support a single format for submission with Sitemaps, we have continued to discuss further enhancements to make it easy for webmasters to get their content to all search engines quickly.
All search crawlers recognize robots.txt, so it seemed like a good idea to use that mechanism to allow webmasters to share their Sitemaps. You agreed and encouraged us to allow robots.txt discovery of Sitemaps on our suggestion board. We took the idea to Google and Microsoft and are happy to announce today that your Sitemaps can now be discovered in a uniform way across all participating engines. To do this, simply add the following line to your robots.txt file:
Sitemap: http://www.example.com/sitemap.xml
Please provide the complete URL for your Sitemap on this line. We will pick it up wherever you put it in your robots.txt file. This directive is not specific to user-agent. If you have multiple Sitemaps, you can point to your Sitemap index file on this line. Details about the Sitemaps protocol including this addition are available on the protocol website — http://www.sitemaps.org.
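A crawler-side view of this discovery step can be sketched with Python’s standard `urllib.robotparser`, whose `site_maps()` method (Python 3.8+) collects every Sitemap line regardless of which user-agent group it appears in, matching the "not specific to user-agent" behavior described above. The robots.txt content and the helper name `discover_sitemaps` are illustrative, and this is a generic parser rather than Yahoo!’s crawler.

```python
from urllib.robotparser import RobotFileParser

# Illustrative robots.txt: one Sitemap line at the top, one inside a group.
ROBOTS_TXT = """\
Sitemap: http://www.example.com/sitemap.xml

User-agent: Slurp
Disallow: /cgi-bin/
Sitemap: http://www.example.com/sitemap_index.xml
"""

def discover_sitemaps(robots_txt):
    """Return the Sitemap URLs declared anywhere in a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    # site_maps() returns None when the file declares no Sitemap lines.
    return parser.site_maps() or []

print(discover_sitemaps(ROBOTS_TXT))
# ['http://www.example.com/sitemap.xml', 'http://www.example.com/sitemap_index.xml']
```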
If you prefer, you can continue to submit Sitemaps to Yahoo! Search by simply entering the URL of your Sitemap and submitting it, or by adding feeds to a site you are already managing under ‘My Sites’ in Site Explorer. The latter also allows us to give you more feedback about what we are doing with your Sitemap.
We’re also happy to have some east coasters, Ask and IBM, announce their support for Sitemaps. The more the merrier!
We’ll also be sharing more this week at SES NY.
If you have other thoughts about how we can collaborate with other search engines on standards such as robots.txt, we’d love to hear from you — visit our suggestion board.
Priyank Garg
Product Manager, Yahoo! Search
April 10, 2007
Site Explorer Matures a Bit More and Accepts Mobile Feeds
It’s been nearly two years since we first made Site Explorer available. How time flies! Since its inception, we’ve added a number of new features to Site Explorer, including Feed Submission, Site Authentication and more data for webmasters. And today, we’ve got a few more additions to share with our users.
Site Explorer offers Mobile Submit
We’ve enhanced our Mobile Site Submit feature: publishers can now submit mobile sites and feeds to Site Explorer, which enables them to get their mobile sites into Yahoo! oneSearch and gain access to Yahoo!’s mobile user base. Our mobile crawler will consume these feeds to help it find new pages. The feeds can be:
Site Explorer is out of Beta
A while back we added the Delete URL feature to give webmasters more direct control. That was a critical milestone for Site Explorer, and having successfully crossed it, today we’re taking Site Explorer out of beta. Over the last few months, webmasters have tried out the various features and provided feedback, which we’re addressing in this release:
Report Spam
We’ve heard from a number of webmasters who are looking for ways to address spam, so we’re trying out a new feature. Now when exploring your authenticated site, if you find a suspicious inlink, such as an off-topic link or a suspected link farm, just click on the ‘Report Spam’ button and submit a spam report.
We hope you find these updates useful. And as always, keep the feedback coming!
Yahoo! Site Explorer and Mobile Search teams
February 26, 2007
Keeping Ad Tracking and Dead URLs out of Yahoo! Search
We’re often asked how Yahoo! Search determines which pages get indexed and which pages are left un-crawled. First and foremost, we honor the industry-standard robots.txt file format, which gives Webmasters several layers of control over which sites, pages and specific URLs should be indexed. Lately we’ve heard from a number of Webmasters asking how best to prevent ad tracking URLs and dead URLs from getting indexed, so we thought we’d respond via this post.
Ad tracking URLs
Ad tracking URLs are used by Webmasters to help determine what traffic is coming in from advertisements (e.g., Yahoo! Sponsored Search and Yahoo! Publisher Network) but don’t need to be included in the Yahoo! Search index. Sometimes you might notice that these URLs still appear in the index. That’s because they’ve appeared on pages that are “crawlable” or may have been copied over to crawlable pages by users. If you don’t want Yahoo! Slurp, our Web crawler, to index these URLs, you can use wildcards in robots.txt. For example, if you are using the parameter ‘ref’ to track ad sources, you can use a rule like the one below to keep your tracking URLs from being Slurped:
User-Agent: Yahoo! Slurp
Disallow: /*ref=YahooPublisherNetwork
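The effect of the ‘*’ wildcard in this rule can be sketched by translating the Disallow path into a regular expression anchored at the start of the URL path, where ‘*’ matches any run of characters and everything else is literal. This is an illustrative matcher, not Slurp’s; the function names `rule_to_regex` and `is_disallowed` and the sample paths are hypothetical. Python’s `urllib.robotparser` only does plain prefix matching, which is why the matcher is written by hand here.

```python
import re

def rule_to_regex(rule):
    """Translate a robots.txt path rule with '*' wildcards into a regex.
    Each '*' matches any sequence of characters; everything else is literal."""
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    return re.compile(pattern)

def is_disallowed(path, rule):
    # re.match anchors at the start, so rules still behave as prefixes.
    return rule_to_regex(rule).match(path) is not None

RULE = "/*ref=YahooPublisherNetwork"
for path in ("/shoes.html?ref=YahooPublisherNetwork",
             "/a/b?color=red&ref=YahooPublisherNetwork",
             "/shoes.html?ref=newsletter",
             "/shoes.html"):
    print(path, is_disallowed(path, RULE))
```

Only the first two paths carry the tracking value, so only they are blocked; the same product pages without it stay crawlable.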
Dead URLs
The best way to remove dead URLs from the Yahoo! Search index is to return an HTTP Error 404 when our crawler requests the page. If you want to act before the 404 discovery and URL removal process completes, you can use Site Explorer to quickly delete the URLs from the index. One advantage of using Site Explorer is that you can delete multiple URLs, including an entire subpath, so long as the URL prefix is the same. As Danny Sullivan points out in his deep-dive post on the delete function, if you delete http://domain.com/subarea1/, then all the pages that begin with “domain.com/subarea1” will be removed. For example:
http://domain.com/subarea1/page1.html
http://domain.com/subarea1/page45.html
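Assuming plain string-prefix matching, as described above, the set of pages removed by a subpath delete can be previewed with a few lines of Python. The helper `removed_by_prefix` and the URL list are hypothetical. Under that assumption, a prefix without the trailing ‘/’ would also match sibling paths such as /subarea10/, so including the slash keeps the delete to the intended directory.

```python
def removed_by_prefix(indexed_urls, prefix):
    """Return the indexed URLs that a prefix delete would remove,
    assuming simple string-prefix matching."""
    return [url for url in indexed_urls if url.startswith(prefix)]

indexed = [
    "http://domain.com/subarea1/page1.html",
    "http://domain.com/subarea1/page45.html",
    "http://domain.com/subarea10/index.html",
    "http://domain.com/about.html",
]

print(removed_by_prefix(indexed, "http://domain.com/subarea1/"))
# ['http://domain.com/subarea1/page1.html', 'http://domain.com/subarea1/page45.html']
print(len(removed_by_prefix(indexed, "http://domain.com/subarea1")))
# 3
```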
We’ll continue to use the Yahoo! Search blog to give Webmasters like you pointers on how to better manage your sites in the Yahoo! Search index. Be sure to visit us at the Site Explorer Suggestion Board if there are specific areas you’d like us to address in more detail.
Thanks,
Priyank Garg
Yahoo! Search