February 12, 2009

Fighting Duplication: Adding more arrows to your quiver

Avoiding duplicates in the search engine index has consistently been a key concern we’ve heard from webmasters and site owners. Over the last few years, we have made significant strides in finding duplicates in our crawler and index algorithmically and provided webmasters with better tools for controlling them. Today we are announcing our support for a new use of the HTML <link> tag, <link rel="canonical">, which helps reduce duplicates by documenting the preferred URL form to access each page.

When you use the <link> tag, you can indicate the canonical URL form for crawlers to use for each page of content, no matter how it was retrieved. This puts the preferred URL form with the content so that it is always available to the crawler, no matter which session id, link parameter, sort parameter, parameter order, or other source of variance is present in the URL form used to access the page.

To do this, specify a <link> tag in the <head> section of your page content:

<link rel="canonical" href="http://www.example.com/products" />

The above tag indicates to the crawler that the URL it is present on should be represented canonically as http://www.example.com/products. This would eliminate the following duplicates:

http://www.example.com/products?trackingid=feed
http://www.example.com/products?sessionid=hgjkeor2
http://www.example.com/products?printable=yes&trackingid=footer
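
As a rough illustration of how a site might generate the tag, here is a minimal Python sketch. It assumes a hypothetical site where no query parameter changes the page content, so the whole query string can be dropped; the function name canonical_link_tag is ours and not part of any Yahoo! tool. A real site would keep whichever parameters actually select different content.

```python
from html import escape
from urllib.parse import urlsplit, urlunsplit


def canonical_link_tag(requested_url: str) -> str:
    """Build a <link rel="canonical"> tag for the URL a page was requested with.

    Assumption (hypothetical site): tracking, session and print parameters
    never change the content, so the query string and fragment are dropped.
    """
    parts = urlsplit(requested_url)
    # Keep scheme, host and path; discard query string and fragment.
    canonical = urlunsplit((parts.scheme, parts.netloc, parts.path, "", ""))
    return f'<link rel="canonical" href="{escape(canonical, quote=True)}" />'


if __name__ == "__main__":
    for url in (
        "http://www.example.com/products?trackingid=feed",
        "http://www.example.com/products?sessionid=hgjkeor2",
        "http://www.example.com/products?printable=yes&trackingid=footer",
    ):
        print(canonical_link_tag(url))
```

Under that assumption, all three duplicate URLs listed above produce the same tag as the example shown earlier.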

A few technical details:

• The URL paths in the <link> tag can be absolute or relative, though we recommend using absolute paths to avoid any chance of errors.

• A <link> tag can only point to a canonical URL form within the same domain and not across domains. For example, a tag on http://test.example.com can point to a URL on http://www.example.com but not on http://yahoo.com or any other domain.

• The <link> tag will be treated similarly to a 301 redirect, in terms of transferring link references and other effects to the canonical form of the page.

• We will use the tag information as provided, but we’ll also use algorithmic mechanisms to avoid situations where we think the tag was not used as intended. For example, if the canonical form does not exist or returns an error such as a 404, or if the content on the source and target is substantially different, the canonical link may be considered erroneous and disregarded.

• The tag is transitive. That is, if URL A marks B as canonical, and B marks C as canonical, we’ll treat C as canonical for both A and B, though we will break infinite chains and other issues.
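
The transitive behavior described in the last point can be pictured with a small Python sketch. This is only an illustration, not a description of how our crawler is implemented: the function resolve_canonical, the declared mapping, and the max_hops limit are hypothetical, and falling back to the starting URL when a loop is found is an assumption, since the exact handling of broken chains is not specified here.

```python
def resolve_canonical(url, declared, max_hops=10):
    """Follow declared canonical links from url to the end of the chain.

    declared maps a URL to the canonical URL its <link> tag names.
    Assumption: if the chain loops or exceeds max_hops, the declarations
    are ignored and the starting URL is returned unchanged.
    """
    seen = {url}
    current = url
    for _ in range(max_hops):
        target = declared.get(current)
        if target is None or target == current:
            # No further declaration (or a self-reference): chain ends here.
            return current
        if target in seen:
            # Loop detected: break the chain and keep the original URL.
            return url
        seen.add(target)
        current = target
    # Chain longer than max_hops: treat it as unusable.
    return url


if __name__ == "__main__":
    declared = {"A": "B", "B": "C"}
    print(resolve_canonical("A", declared))  # C
    print(resolve_canonical("B", declared))  # C
```

With A pointing to B and B pointing to C, both A and B resolve to C, which matches the behavior described above.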

For several years, we have had a clear policy on handling redirects that lets you control how crawlers and browsers move between pages on your site. Another useful tool for eliminating spurious dynamic URLs and avoiding content duplication is the Rewrite Dynamic URLs feature of Site Explorer. All you need to do is authenticate your site in Site Explorer, which can now be done instantly, and then create a URL Rewriting rule. The benefit of this approach is that Yahoo! does not need to crawl your duplicate pages to discover the canonical relationships. The <link> tag provides you with another resource to use, and is also being supported by our other partners in the Sitemaps effort, Google and Microsoft.

We recommend that you structure your site with normalized URLs and minimal duplication, or use 301s if need be. If those don’t work for you, try Site Explorer and/or the <link> tag. Our support for the <link> tag will be implemented over the coming months. If you have any questions, let us know on our Site Explorer Suggestion Board.

Priyank Garg
Director Product Management
Yahoo! Search

December 13, 2007

Boost Your Blog with Yahoo! Shortcuts for WordPress (Beta)


Writing a good blog post is more than just putting words on paper. It’s also about rounding out ideas, opinions and thoughts with content that supports your statements, be it maps, pictures or links. And sometimes, the hassle of digging up that supporting content is the most painful part. So, to help bloggers address these pain points, we built Yahoo! Shortcuts for WordPress, a technology that sits in the background and finds and offers content to help build out your post in real time. Shortcuts lift the burden of finding additional content and integrating it into your posts so that you can focus on the meat: the writing.

So, how does it work? Simply download the Yahoo! Shortcuts plug-in, and as you’re typing it will begin to find terms in your post such as company names and tickers, locations, news and product names and, with no additional effort on your part, integrate a roll-over or preview badge into your post. For example, "Crater Lake" brings up a map of Crater Lake to answer the "where the heck is that" question and "Citigroup" calls up a dynamic finance chart of the company’s stock performance. The product Shortcut (e.g. Nintendo Wii) displays the latest product reviews and price comparisons from retailers across the web via Yahoo! Shopping.

And, because a picture is worth a thousand words, we didn’t stop there. While these days it is popular to release new features every half hour or so, we decided to hold the product and dive deeper to offer images as well. Now, under Creative Commons licensing, we’ll recommend Flickr images based on the key themes of your post, with proper attribution to the original author of the picture included. For a complete list of Shortcuts available, check out this site.

All of these capabilities are built on the premise of giving the publisher control — you can decide whether to keep or reject the recommended content and how it is visually presented. To see these controls in action watch this short tutorial.

You can learn more about it at Yodel Anecdotal. And if you’re one of the first 500 bloggers to install and use the plug-in, we’ll send you one of these cool t-shirts.

We’ll continue to roll out new versions to provide an open environment for developers to create their own modules for content within and outside of the Yahoo! network. So, stay tuned for future updates. If you have some ideas on which platforms to support next, let us know in the comments below.



Ariel Seidman & Luke Wroblewski


Yahoo! Search

March 02, 2005

10 Years That Rocked The World

Yahoo! incorporated in 1995, the year I discovered the World Wide Web. That year, I made a decision that changed my life: I dared myself to use the Web to find a job on the Internet. I was a natural-born information junkie who could read, write, edit, and catalog–and fearlessly follow hyperlinks wherever they might lead.

I bought a fast Pentium running shiny new Windows 95. I got ISDN. I downloaded each new beta browser. In early 1996, I was hired to build a directory of web sites for one of Yahoo!’s now vanished competitors. I stepped into the fast-moving current, riding wave after wave of discovery, gathering a daily catch of tools and trinkets: image maps, javascripts, dancing widgets, canonical lists of nearly everything. I was getting paid to websurf!

In those days, we studied Yahoo! to see how directory was done. I walked the tree, and pondered colon classification and what it meant that Ranganathan was a Yahoo!. Web search scaled and evolved quickly to colonize the new info landscape, but the algorithms were young, and results were erratic and sometimes surprisingly irrelevant.

Yahoo! hired me on my third try, in 1998. The Web seemed vast, but finite. We still believed there was an end of the Internet. Then, as now, the Yahoo! Directory exemplified the value of informed human intervention, aggregating and organizing the best of the Web, creating choice out of chaos. And Yahoo! was fast, free, and fun, with invisible, reliable, leading-edge technology.

Over the past seven years, it’s been a privilege to participate as Yahoo! and the Web grew up together. Through the tumultuous boom and bust years, search technology thrived. Yahoo! enjoyed a succession of relationships with great search providers. Then, more recently, we reinvented ourselves and launched Yahoo! Search Technology.

These days, search engine is a household word. The power of search has captured the public imagination and become essential in the lives of millions. And though we’re continually innovating, we’ve just begun to explore the multi-faceted, multimedia knowledge exchange that becomes possible when search technologies mature and get smarter. Stay tuned.

And now it’s time to celebrate. You’re invited to Yahoo!’s 10th birthday party. There’s even a present waiting for you there. Feeling nostalgic? Don’t miss our amazing, entertaining web installation, Netrospective: 10 years, 100 moments of the Web. We’d love to hear from you.

Havi Hoffman
Yahoo! Editorial