Fighting Duplication: Adding more arrows to your quiver

  • Posted February 12th, 2009 at 12:29 pm by Yahoo! Search
  • Categories: Search, Search Tips

Avoiding duplicates in the search engine index has consistently been a key concern we’ve heard from webmasters and site owners. Over the last few years, we have made significant strides in finding duplicates in our crawler and index algorithmically and have provided webmasters with better tools for controlling them. Today we are announcing our support for a new use of the HTML <link> tag, rel="canonical", which helps reduce duplicates by documenting the preferred URL form to access each page.

When you use the <link> tag, you can indicate the canonical URL form for crawlers to use for each page of content, no matter how it was retrieved. This puts the preferred URL form with the content so that it is always available to the crawler, no matter which session id, link parameter, sort parameter, parameter order, or other source of variance is present in the URL form used to access the page.

To do this, specify a <link> tag in the <head> section of your page content:

<link rel="canonical" href="http://www.example.com/products" />

The above tag indicates to the crawler that the URL it is present on should be represented canonically as http://www.example.com/products. This would eliminate the following duplicates:

http://www.example.com/products?trackingid=feed

http://www.example.com/products?sessionid=hgjkeor2

http://www.example.com/products?printable=yes&trackingid=footer
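On the site side, one way to produce the tag is to derive the canonical form from whatever URL was requested. The following is a minimal sketch, not anything Yahoo! prescribes: the function names are hypothetical, and the set of parameters treated as non-canonical is an assumption taken from the example URLs above, so it would need to be adjusted for a real site.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Assumption: these parameters never change the page content
# (names taken from the example URLs above).
NON_CANONICAL_PARAMS = {"trackingid", "sessionid", "printable"}


def canonical_url(url: str) -> str:
    """Drop non-canonical query parameters and any fragment from url."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_link_tag(url: str) -> str:
    """Render the <link> tag for the <head> of the page served at url."""
    return '<link rel="canonical" href="%s" />' % escape(canonical_url(url), quote=True)


if __name__ == "__main__":
    print(canonical_link_tag("http://www.example.com/products?printable=yes&trackingid=footer"))
```

Parameters that do change the content, such as a hypothetical page number, are kept, so each distinct page still gets its own canonical form.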

A few technical details:

• The URL paths in the <link> tag can be absolute or relative, though we recommend using absolute paths to avoid any chance of errors.

• A <link> tag can only point to a canonical URL form within the same domain and not across domains. For example, a tag on http://test.example.com can point to a URL on http://www.example.com but not on http://yahoo.com or any other domain.

• The <link> tag will be treated similarly to a 301 redirect, in terms of transferring link references and other effects to the canonical form of the page.

• We will use the tag information as provided, but we’ll also use algorithmic mechanisms to avoid situations where we think the tag was not used as intended. For example, if the canonical form is non-existent or returns an error such as a 404, or if the content on the source and target is substantially distinct and unique, the canonical link may be considered erroneous and deferred.

• The tag is transitive. That is, if URL A marks B as canonical, and B marks C as canonical, we’ll treat C as canonical for both A and B, though we will break infinite chains and other issues.
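The same-domain and transitivity rules above can be illustrated with a short sketch. This is not the crawler's actual algorithm: the function names are hypothetical, the domain comparison is a crude stand-in that only looks at the last two host labels, and falling back to the starting URL when a loop is found is an assumption, since the post only says that infinite chains will be broken.

```python
from typing import Dict
from urllib.parse import urljoin, urlsplit


def registered_domain(url: str) -> str:
    """Crude stand-in: last two host labels (wrong for suffixes like co.uk)."""
    host = urlsplit(url).hostname or ""
    return ".".join(host.split(".")[-2:])


def resolve_canonical(start: str, declared: Dict[str, str]) -> str:
    """Follow canonical declarations from start.

    declared maps a page URL to the href of its rel="canonical" link.
    """
    seen = {start}
    current = start
    while current in declared:
        # Relative hrefs are resolved against the page that declares them.
        target = urljoin(current, declared[current])
        if registered_domain(target) != registered_domain(current):
            break  # cross-domain hint: not followed
        if target in seen:
            return start  # assumption: a loop invalidates the whole chain
        seen.add(target)
        current = target
    return current


if __name__ == "__main__":
    hints = {
        "http://test.example.com/a": "http://www.example.com/b",
        "http://www.example.com/b": "/c",
    }
    print(resolve_canonical("http://test.example.com/a", hints))
```

In the usage example, A on test.example.com points to B on www.example.com, and B points to C with a relative path, so C is reported as canonical for A.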

For several years, we have had a clear policy on handling redirects that allows you to take control of how crawlers and browsers relate between pages on your site. Another useful tool for eliminating spurious dynamic URLs and avoiding content duplication is the Rewrite Dynamic URLs feature of Site Explorer. All you need to do is authenticate your site in Site Explorer, which can now be done instantly, and then create a URL Rewriting rule. The benefit of this approach is that Yahoo! does not need to crawl your duplicate pages to discover the canonical relationships. The <link> tag provides you with another resource to use, and is also being supported by our other partners in the Sitemaps effort, Google and Microsoft.

We recommend that you structure your site with normalized URLs and minimum duplication, or use 301s if need be. If those don’t work for you, try Site Explorer and/or the <link> tag. Our support for the <link> tag will be implemented over the coming months. Let us know if you have any questions on our Site Explorer Suggestion Board.

Priyank Garg
Director Product Management
Yahoo! Search

79 Comments

Comment by Joost de Valk
2009-02-12 13:14:08

I’ve got Drupal, Magento and WordPress plugins ready for this feature: http://yoast.com/canonical-url-links/

 
Comment by Dave
2009-02-12 13:58:44

This is HUGE. We have long been in an endless battle to avoid duplicate content via our partner network. Thank you, Yahoo!

 
Comment by Karthik
2009-02-12 15:47:58

This is a very good idea to avoid pages that have unnecessary and redundant parameters passed to it, which form complex urls.

 
Comment by Aleksandar Ratkovic
2009-02-12 15:58:14

This is great news. It’ll solve many problems with dynamic-content websites.

 
Comment by Jared
2009-02-12 16:12:22

This is all well and good, but the Site Explorer API has been down since December.

 
Comment by taylor
2009-02-12 20:31:10

This seems like a major step forward and I don’t see any downside to it.

Just the ticket to make life LOTS easier for those who want to cooperate with search engines (everyone).

Q: is a 301 redirect to the canonical URL redundant if a page is using the link tag?

 
Comment by avatar
2009-02-12 23:22:00

Really that was well idea pertaining that has to abuse some spam link

Keep going.

 
Comment by Nicolas
2009-02-13 01:04:43

Not a good idea: avoiding duplicates in the search engine index should be done by deleting duplicate resources from your index.
You do not have to compensate for webmasters’ bad URL and HTML implementations, particularly if they are unable to know, understand, and apply (URL) standards.

 
Comment by William Alvarez
2009-02-13 12:02:39

I applaud this initiative from the big G, Y! and MSN (aka Live). It’s a big help for e-commerce sites that live with this issue, which is time-consuming and expensive to support and solve.

 
Comment by David
2009-02-13 12:31:16

This is a great idea. It will save me trying to use .htaccess to rewrite all the weird version urls that come into our site and the utm_campaign variables.

 
Comment by dennisb
2009-02-15 07:53:25

Good information, most people don’t know how much duplicate content can harm search results and user experience.

 
Comment by hiddenmanna2008
2009-02-15 17:55:56

This is a bad idea, because it will hinder those of us with “The Same Relevant Message” we’re trying to share with as many people as we can in different social networking communities on the web.

Duplicating that same message, as did, Dr.Martin Luther King, all of our Presidents, Congressmen, and women, and anyone with an agenda, whether it’s political, or business, which is what franchising is all about, “duplication,” and that’s the way the search engine must observe some of the content of those like me with a real redundant message.

 
Comment by saxarock
2009-02-16 12:28:42

i like this tag)

 
Comment by Paul James
2009-02-17 02:19:34

It may have been nicer to have borrowed the form from Atom rather than creating a new rel type.

 
Comment by Daniele
2009-02-18 00:19:20

Hey Priyank, I use WordPress, where the /category/ and /tag/ pages are generally crawled by search engines in addition to the post pages. By using the “canonical” link, can we protect ourselves from content duplication?

 
Comment by Ian M
2009-02-19 05:17:40

I have one absolutely burning question about this tag:

If I include it on a page which has a meta robots tag of “noindex”, and point it to a canonical variant of this page (which can be indexed), does this cause any problems?

Essentially, we use meta robots “noindex, follow” for things like pagination, different sorting order of products, etc etc – this handles the duplicate content issue (and much better than robots.txt, from a site-owner’s perspective).

What I want to make sure is that, if I include this new rel=canonical tag, that search engines that don’t handle this new tag can handle the “noindex” tag to eliminate duplicate content that way and search engines which do use the canonical tag are correctly supported.

This is the single most important thing I need to know about this new tag.

The second most important thing is – is the behaviour of the above standardised with the other search engines which are using it too?

 
Comment by george sill
2009-02-20 11:10:23

This is fantastic

 
Comment by SEOxg
2009-02-21 08:14:36

I’ve modified discuz for canonical URL link: http://www.shyedu.net/it-website/discuz-URL-canonicalization-128.html

 
Comment by Arow Blackdragon
2009-02-24 18:13:22

This sounds great, but has anyone done any testing yet? I mean, are dup URLs actually being *removed* from indexes by this? How is this going to affect manipulative duplicate content algos?

Canonicalization issues should be addressed in planning and development and are very easily avoided when you’ve structured your website appropriately. Keyword research, develop, deploy. I just don’t trust this *one* tag (anyone remember metas?) to resolve the issues, entirely; it’s up to programmers to program accordingly. Dynamic 404s and strict URL structuring is an extremely effective, preemptive technique that people aren’t using as it is. What happens when this tag gets abused or deployed incorrectly?

Will this tag actually have any effect on ‘big’ sites that *don’t* implement this technique?

I need to understand the reward and penalty structure of this tag, in direct reference to white hat, and black hat, policies; and what Search Engines have in mind for this consideration.

This will be interesting to watch unfold over the next several months…

Arow

 
Comment by Proxima
2009-02-25 22:58:34

Arow,
Thanks for the insight into the tag.

I would like to know the implications too. I am a newbie to web development, and in fact I was considering skipping the research on mod rewrites in the PHP application that I am developing.

I have posted a few questions(as below) other forums too :

Does this help in how google sees the dynamic urls ?

With a not so adequate knowledge about how google sees the Dynamic URLs as not so google friendly, I was looking for all the information that is available for changing the dynamic URLs to the Static ones.

I am not sure if this tag saves all the research that I was about to do starting from the .htaccess files to the MOD rewrites for the php applications. OR is this tag really a substitute for that , Anyone – Any comments on that would be much appreciated.

Thanks,
Arun – Web developer ,
Proxima Systems India.

 
Comment by Article Directory
2009-03-03 02:54:46

Finally!! Useful Tag…Just use it now! Thanks

 
Comment by Super Jumbo Loans
2009-03-09 09:16:30

It is good to see this tag out. It will definitely solve a lot of problems for Webmasters everywhere. Good job!

 
Comment by Dvi Gear
2009-03-11 00:43:30

I hope this will help with our Yahoo store!

 
Comment by seo
2009-04-07 21:58:52

This tag will be really helpful for the canonical problem that exists on most sites.

 
Comment by Amit Doda
2009-04-26 10:24:12

Quite helpful. We have implemented it on a few web sites, but the results are still not very encouraging. Let’s see how it behaves going forward.

 
Comment by Arow Blackdragon aka Arows1Faith
2009-06-16 23:54:57

Seeing as my original post (arows1faith; Feb 24th, 2009) hasn’t had any a/b testing replies, yet – and this article is still quite visible – I was wondering if anyone had any “I done gone and proved it” data to share?

I haven’t seen a difference in using this tag, alone. Combined with and on-page linking there is a significant difference, but nothing to show that this tag – by itself – is doing anything….

Arow

 
2009-06-17 00:02:39

I apologize for including {code} in my previous reply….

The second paragraph should read:

“I haven’t seen a difference in using this tag, alone. Combined with the {title} tag and on-page linking there is a significant difference, but nothing to show that this tag – by itself – is doing anything…”

 
Comment by Eligio
2009-06-27 13:39:53

Does the canonical link tag really work on Yahoo? I decided to add this functionality to my site 3 months ago to minimize the coding, but it seems the old URL still exists and is still indexed, for example http://www.dressupdollgames.net/index.php?params=game/332/ , where it should be http://www.dressupdollgames.net/game/332/Roiworld-Girl-Dress-Up-Game-20.html . I have the tag placed correctly on my site.

 
Comment by Clickbank Code
2009-07-23 01:11:46

Duplicate content is a headache. The most problem encountered is with the codes in wordpress blogs, you started off with no intention of duplication but ended up with duplication issues with the categories and tags. Duh!

 
Comment by Ivybot
2009-08-06 18:34:43

Good information, most people don’t know how much duplicate content can harm search results and user experience. This tag will really be helpful for the canonical problem which exist in the most of the sites.

 
Comment by Coop
2009-08-08 02:43:39

The canonical link is a great step forward in fighting duplication. Thanks for the info on how to implement the canonical link tags.

 
Comment by Coop
2009-08-08 18:45:36

Can’t figure out how to use the Rewrite Dynamic URLs feature of Site Explorer. Where do I authenticate my site in Site Explorer…hmmm. Though I must admit that the <link> tag is very easy to use for the canonical solutions.

 
Comment by echealth
2009-10-01 19:52:14

What if your links are currently divided between the www. and non on yahoo site explorer. Will this meta tag fix the problem and help me avoid going backwards and trying to fix all these links 1 by 1?

 
Comment by guru
2009-10-20 19:57:47

Does the canonical issue still apply? I prefer plugins that address this, so much better! :)

 
Comment by Realtime rss feeds
2010-02-03 03:25:44

Is there any other way to have this done? This is a great idea when you have only a couple of URLs with duplicate content, but what if you have several? My site has a background color variation feature that changes the URL parameters while the content stays the same. How can I solve this? The .htaccess approach also can’t apply here: too many URLs. Any help will be great, thanks.

 
Comment by Lavor
2010-03-11 23:46:18

I think this tag won’t help for the canonical problem.

 
2010-06-19 02:50:48

Ever since this canonical issue came out, we have been applying this to all of our sites. It stated that it would eliminate different URL structures with the same destination, thus consolidating link juice. This is great stuff. I’m thinking: what’s next?

Cheers,

 
Comment by Webdesign Company
2010-06-20 02:49:25

Nice information. Most people don’t know how much duplicate content can harm search results and user experience. This tag will really be helpful for the canonical problem which exist in the most of the sites. Thanks for sharing.

 
Comment by Randy Palmer
2010-07-30 12:49:15

As someone who has continued to optimize his own site, I do agree that duplicate content can hurt. The tag is very easy to use, and could be extremely helpful in the future.

 
Comment by Rennie
2010-12-20 15:36:40

This may solve a lot of duplication problems.

 
2011-02-06 00:01:02

This code is very useful, duplicate content is an ongoing issue.

 
