Friday, 20 February 2009

I say tomato/, you say tomato?sort=name-desc

Last week Google, Yahoo and Microsoft agreed to support a new HTML tag hidden in the head section of a webpage. The tag, which is generally called the canonical tag, looks like this:

<link rel="canonical" href="http://www.wahanda.com/"/>

This new tag is designed to help alleviate the issue of duplicate content, a real problem for any SEO-conscious site. Basically, the tag is an instruction to search engine crawlers that says "I don't care what URI you used to reach this page; the URI I want indexed is the one in the canonical tag".

For our site, Wahanda, this solves two problems.

The first is where a single page gets indexed under several different URIs, even though the content is identical. We recently had our Valentine's Day page indexed in Google under three different URIs:

/seasonal/valentines-day-gifts
/seasonal/valentines-day-gifts/
/seasonal/valentines-day-gifts?x=y


This is partially our fault; we should have 301 redirected all the variants to one 'canonical' form of the URI, but it's irritating that the engines don't compare the content of pages with similar URIs to see if they actually represent the same page.
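As a minimal sketch of the redirect approach, assuming that a trailing slash and any query string never change what the page shows, the three variants above can be collapsed to one form. The function names here are hypothetical and are not taken from our codebase:

```python
from urllib.parse import urlsplit, urlunsplit


def canonical_uri(uri: str) -> str:
    """Collapse trailing-slash and query-string variants into one form."""
    parts = urlsplit(uri)
    # Drop any trailing slash, but keep the root path as "/".
    path = parts.path.rstrip("/") or "/"
    # Assumption: query strings and fragments never alter the content,
    # so both are discarded.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


def redirect_target(uri: str):
    """Return the URI to 301 redirect to, or None if already canonical."""
    target = canonical_uri(uri)
    return None if target == uri else target


if __name__ == "__main__":
    for variant in (
        "/seasonal/valentines-day-gifts",
        "/seasonal/valentines-day-gifts/",
        "/seasonal/valentines-day-gifts?x=y",
    ):
        print(variant, "->", redirect_target(variant))
```

A request handler would issue a 301 whenever `redirect_target` returns a value and serve the page otherwise. Pages where a query parameter does matter would need a list of parameters to keep rather than this blanket rule.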

The second problem for us is internal search result and listing pages. These two pages:

/therapists/pro-personal-trainer/in-london-uk
/therapists/pro-personal-trainer/in-london-uk-depth-1/sort-profile-name-desc


are logically the same set of results, just with different sort orders. There is no value in having both of these pages indexed. In fact, for all listing pages we only really want the first page of results in the default sort order to be indexed. The canonical tag lets us do this by giving every view of a given set of results the same canonical URI.
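One way to derive the shared canonical URI is to strip the view-specific parts from the listing path. The sketch below assumes those parts are a "-depth-<n>" suffix on a segment and a "/sort-<order>" segment, which is inferred only from the two example URIs above; the real URL scheme may have other variants, and the helper names are hypothetical:

```python
import html
import re

# Assumed view-specific marker: a "-depth-<n>" suffix on a path segment.
_DEPTH_SUFFIX = re.compile(r"-depth-\d+$")


def listing_canonical(path: str) -> str:
    """Strip the assumed sort and depth markers from a listing path."""
    segments = [s for s in path.split("/") if s]
    # Drop any segment that only selects a sort order.
    segments = [s for s in segments if not s.startswith("sort-")]
    # Remove the depth suffix from the segments that carry one.
    segments = [_DEPTH_SUFFIX.sub("", s) for s in segments]
    return "/" + "/".join(segments)


def canonical_link(base: str, path: str) -> str:
    """Build the canonical tag for a listing page under the given site base."""
    href = html.escape(base + listing_canonical(path), quote=True)
    return f'<link rel="canonical" href="{href}"/>'


if __name__ == "__main__":
    sorted_view = (
        "/therapists/pro-personal-trainer/"
        "in-london-uk-depth-1/sort-profile-name-desc"
    )
    print(canonical_link("http://www.wahanda.com", sorted_view))
```

Both example URIs map to /therapists/pro-personal-trainer/in-london-uk, so every sorted view of the listing carries the same tag.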

This is all well and good, but supporting this tag means changes to a lot of pages for us. Verifying that a page has a canonical URI set and checking that it points to the correct page involves an awful lot of viewing-of-source and cutting-and-pasting. With that in mind, we've written a canonical tag Firefox extension to help. Hopefully it will be of some use to others!
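For checking pages in bulk, the same verification can be scripted. This is a minimal sketch using only Python's standard library, not how the extension itself works; the class and function names are hypothetical, and it expects the page source to have been fetched already:

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # Self-closing tags such as <link .../> also arrive here.
        if tag != "link":
            return
        attr = dict(attrs)
        rel_values = (attr.get("rel") or "").lower().split()
        if "canonical" in rel_values and attr.get("href"):
            self.hrefs.append(attr["href"])


def find_canonical(page_source: str) -> list:
    """Return every canonical href found in the page source."""
    finder = CanonicalFinder()
    finder.feed(page_source)
    return finder.hrefs


def check_canonical(page_source: str, expected: str) -> bool:
    """True only if exactly one canonical tag exists and it matches."""
    hrefs = find_canonical(page_source)
    return len(hrefs) == 1 and hrefs[0] == expected


if __name__ == "__main__":
    page = (
        "<html><head>"
        '<link rel="canonical" href="http://www.wahanda.com/"/>'
        "</head><body></body></html>"
    )
    print(find_canonical(page))
    print(check_canonical(page, "http://www.wahanda.com/"))
```

Requiring exactly one match is a deliberate choice here: a page with two conflicting canonical tags is treated as a failure rather than silently passing on the first.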