Introducing Robots-Nocontent for Page Sections

We recently returned from our annual rendezvous at SES New York and, as always, learned a lot from our webmasters. The ‘Robots.txt Summit’ generated some healthy discussion and support for a way to tag parts of a page that do not relate to its main content, such as navigation, menus repeated across the entire site, boilerplate text, or even advertising. We heard what people were asking for, did a little homework, and are now happy to introduce the ‘robots-nocontent’ tag.

This tag is really about helping our crawler focus on the main content of your page and target the right pages on your site for specific search queries. Since the number of times a particular source can appear in the top ten is limited, it’s important that the proper matching and targeting occur in order to increase both the traffic and the conversion on your site. It also improves the abstracts for your pages in results by omitting unrelated text from search result summaries.

To do this, webmasters can now mark parts of a page with a ‘robots-nocontent’ tag, which tells our crawler which parts of a page are unrelated to the main content and are only useful for visitors. We won’t use the terms contained in these specially tagged sections as information for finding the page or for the abstract in the search results. Note: Using a “nocontent” tag to mark explicit sections of content is not considered “cloaking” because all of the content on the page remains available to us to protect the relevance of the results (unlike “cloaking”, where we may be served content that is different from what visitors see).

So for example, the header and boilerplate on Yahoo! Answers might be useful to visitors, but it’s probably not helpful when searching for this particular page. The ‘robots-nocontent’ tag allows you to identify that for our crawler in order to improve the targeting and the abstract for the page.
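
To make the effect concrete, here is a minimal Python sketch of text extraction that leaves out anything inside a ‘robots-nocontent’ element. It is illustrative only and is not the crawler's actual implementation; the names `NoContentTextExtractor` and `indexable_text` and the sample markup are hypothetical, and the sketch assumes well-formed markup in which every non-void element is explicitly closed.

```python
from html.parser import HTMLParser

# Elements that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class NoContentTextExtractor(HTMLParser):
    """Collects page text, skipping text inside a robots-nocontent element."""

    def __init__(self):
        super().__init__()
        self.skip_depth = 0   # > 0 while inside a robots-nocontent element
        self.indexable = []   # text kept for matching and abstracts

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        classes = (dict(attrs).get("class") or "").split()
        # Once inside a marked element, every nested element deepens the skip.
        if self.skip_depth or "robots-nocontent" in classes:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and not self.skip_depth:
            self.indexable.append(text)


def indexable_text(html):
    """Return the text that remains after dropping robots-nocontent sections."""
    parser = NoContentTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.indexable)


# Hypothetical page: a shared menu followed by the page's real content.
page = (
    '<div class="robots-nocontent">Home <a href="/about">About</a></div>'
    '<p>How do I repot an orchid?</p>'
)
print(indexable_text(page))  # prints: How do I repot an orchid?
```

Only the question text survives, which is the part a searcher would actually be looking for; the menu words are still in the page for visitors.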


Applying the “class=robots-nocontent” Attribute:
Here are a few examples of how to apply this attribute to different elements and for various uses:

    <div class="robots-nocontent"> This is the navigational menu of the site and is common on all pages. It contains many terms and keywords not related to this page</div>

    <span class="robots-nocontent"> This is the site header that is present on all pages of the site and is not related to any particular page</span>

    <p class="robots-nocontent"> This is a boilerplate legal disclaimer required on each page of the site</p>

    <div class="robots-nocontent"> This is a section where ads are displayed on the page. Words that show up in ads may be entirely unrelated to the page contents</div>

We’re rolling out an index update tonight for this change. As usual, you’ll see some changes in ranking along with shuffling of the pages that are included in the index. Let us know what you think and share your thoughts on other forms of support you’d like to see down the road on our suggestion board.

Update: To address some comments and questions about links: the ‘robots-nocontent’ attribute does not in any way affect how links are treated. All links will continue to be used to find targets and will carry attribution to the target if they do not have the ‘rel=nofollow’ attribute on them, whether or not they are inside a ‘robots-nocontent’ section.
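
The point about links can be sketched the same way. The Python below is a hypothetical illustration, not the crawler's code: `LinkCollector` and `collect_links` are made-up names, and the sample markup is invented. It records every link on a page together with a flag for whether it lacks ‘rel=nofollow’, and it never looks at the class of the enclosing element.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Records every link and whether it is free of rel=nofollow."""

    def __init__(self):
        super().__init__()
        self.links = []  # (href, carries_attribution) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        rel_tokens = (attrs.get("rel") or "").lower().split()
        # The enclosing element's class is deliberately never consulted:
        # a robots-nocontent section does not change how a link is treated.
        self.links.append((href, "nofollow" not in rel_tokens))


def collect_links(html):
    """Return (href, carries_attribution) for each link in the markup."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


# Hypothetical page: two links inside a marked section, one outside it.
page = (
    '<div class="robots-nocontent">'
    '<a href="/archive">Archive</a>'
    '<a rel="nofollow" href="/ads">Ad</a>'
    '</div>'
    '<p><a href="/post">Post</a></p>'
)
print(collect_links(page))
```

In this sketch the link to `/archive` is flagged exactly like the link to `/post`, even though it sits inside a marked section; only the ‘rel=nofollow’ link is flagged differently.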

We deploy various algorithms and mechanisms to understand your website and pages, including headers, navigation, footers, etc. However, by using this and other markup, such as ‘rel=nofollow’, you can give us more information to help us understand your site correctly.

On standards, we would be happy to make this into a microformat and are already reaching out to that community. We chose this mechanism because we saw that it was compatible with existing standards and microformats, which makes it easier to gather broader support, including from the other search engines.

Priyank Garg
Yahoo! Search
