Google Webmaster Central Blog - Official news on crawling and indexing sites for the Google index

New markup for multilingual content

Monday, December 05, 2011 at 8:07 AM

Many websites serve users from around the world. There are different approaches to serving content appropriate to your users' language and/or region. Last year, we launched support for explicit annotations for web pages rendering the same content with different language templates.
Today we're going further with our support for multilingual content with improved handling for these two scenarios:
  • Multiregional websites using substantially the same content. Example: English webpages for Australia, Canada and USA, differing only in price
  • Multiregional websites using fully translated content, or substantially different monolingual content targeting different regions. Example: a product webpage in German, English and French

Specifying language and location

We've expanded our support of the rel="alternate" hreflang link element to handle content that is translated or provided for multiple geographic regions. The hreflang attribute specifies the language and, optionally, the country of the equivalent content, and the href attribute gives its URL. By specifying these alternate URLs, our goal is to consolidate signals for these pages and to serve the appropriate URL to users in search. Alternative URLs can be on the same site or on another domain.

Annotating pages as substantially similar content

Optionally, for pages that have substantially the same content in the same language and are targeted at multiple countries, you may use the rel="canonical" link element to specify your preferred version. We’ll use that signal to focus on that version in search, while showing the local URLs to users where appropriate. For example, you could use this if you have the same product page in German, but want to target it separately to users searching on the Google properties for Germany, Austria, and Switzerland.
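
As a rough sketch of the German example above, the Austrian version of a product page might carry markup along these lines. The domains and the path /produkt.html are hypothetical, and the choice of the .de page as the preferred version is an assumption made only for illustration:

```html
<!-- Hypothetical <head> markup for http://www.example.at/produkt.html -->
<!-- Assumption: the .de page is the preferred version of this near-identical German content -->
<link rel="canonical" href="http://www.example.de/produkt.html" />
<!-- The same block of alternates would be repeated on the .de, .at and .ch pages -->
<link rel="alternate" hreflang="de-DE" href="http://www.example.de/produkt.html" />
<link rel="alternate" hreflang="de-AT" href="http://www.example.at/produkt.html" />
<link rel="alternate" hreflang="de-CH" href="http://www.example.ch/produkt.html" />
```

Under these assumptions, the canonical element names the preferred version, while the hreflang entries tell us which local URL to show to users in each country.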

Example usage

To explain how it works, let’s look at some example URLs:
  • http://www.example.com/ - contains the general homepage of a website, in Spanish
  • http://es-es.example.com/ - is the version for users in Spain, in Spanish
  • http://es-mx.example.com/ - is the version for users in Mexico, in Spanish
  • http://en.example.com/ - is the generic English language version
On all of these pages, we could use the following markup to specify language and optionally the region:

<link rel="alternate" hreflang="es" href="http://www.example.com/" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/" />
<link rel="alternate" hreflang="en" href="http://en.example.com/" />

If you specify a regional subtag, we’ll assume that you want to target that region.
Keep in mind that all of these annotations are to be used on a per-URL basis. Take care to use the specific URL of each page, not the homepage, in both the rel="alternate" hreflang and rel="canonical" link elements.
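
To illustrate the per-URL point, here is a hedged sketch of the annotations for a single product page rather than the homepage. The hostnames come from the example above; the paths /producto.html and /product.html are made up for illustration:

```html
<!-- Hypothetical annotations for one product page; each href points at the
     equivalent page on the other site, not at that site's homepage -->
<link rel="alternate" hreflang="es" href="http://www.example.com/producto.html" />
<link rel="alternate" hreflang="es-ES" href="http://es-es.example.com/producto.html" />
<link rel="alternate" hreflang="es-MX" href="http://es-mx.example.com/producto.html" />
<link rel="alternate" hreflang="en" href="http://en.example.com/product.html" />
```

The same block would appear on each of the four product pages, and every other page on the sites would get its own block with its own set of URLs.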

More help

As always, if you need more help correctly implementing multiregional and multilingual websites, please see our Help Center article about this topic, or ask in our Webmaster Help Forum.

49 comments:

Tommy Carlier said...

Is this limited to the <link>-tag, or do you also support rel="alternate" and hreflang on <a>-tags?

Christopher Semturs said...

This is supported for link elements only (html-header or http-header). On anchor elements it would not be immediately obvious if this links to the same content for another language/region or something completely different.

Fabio Schenone said...

Great resource, but not that easy to implement; the combination of rel canonical and rel alternate could be tricky.

Ian said...

The original version of this required rel=canonical, but it's not clear from this post whether this is still required - please can you confirm?

Tommy Carlier said...

Christopher, the specs clearly say that the combination of rel="alternate" and hreflang on an <a>-element indicates that the hyperlink refers to the same content as the current page in a different language, exactly like the <link>-element.

I just ask because one of our websites has hyperlinks to other languages for the current page using the appropriate attributes <a href="/nl/page.html" rel="alternate" hreflang="nl-BE">Dutch</a>

I wouldn't want to have to add a bunch of <link>-elements with exactly the same information.

Ash said...

The sites that really need this are often large multinationals that have expensive, unwieldy CMSs that can't be modified to accommodate new header tags. I wonder how many large companies will adopt this.

Benoit said...

What should we do for the Quebec region?
CA-FR?

Tommy Carlier said...

Benoit: fr-CA

Kangorimo said...

Hi,

We're launching into many European countries early in 2012, having the entire website translated into the appropriate languages.

Our original intention was to use completely different domains, such as www.website.fr and www.website.de. However, would there be more benefit in targeting the language pages, as stated above, from our existing .com site?

With regards to these changes, what is the impact of them on getting the new language pages crawled and ranked by Google?

Thanks in advance

Easca said...

Do any of the other link tag relationships work?

http://www.w3.org/TR/html401/struct/links.html#h-12.3.3

Teddie said...

Christopher, I have to agree with Tommy. Most multilingual websites use country-selection menus, so wouldn't it be simpler to allow the rel attribute to be used and recognised within the anchors on those menus? Then I could deploy one set of menu code across all language variations of a site and not have to mess around with page headers.

I presume that you don't support this because in-body code could be much more easily abused, e.g. by people pasting the tags into comment boxes to steal PR.

Christopher Semturs said...

@Ian Rel-canonical is no longer required. It is suggested for highly similar content in the same language. Not recommended for different language content, or for strong content variations.

Christopher Semturs said...

@Kangorimo annotating the .com page with all regional variations (as well as the other way round) certainly makes sense and helps on discovery.
As for ranking, it is one of many input signals.

Kangorimo said...

@Christopher_Semturs Do you think annotating the pages to the .com site makes more sense than using ccTLDs? We have already purchased the new domain names, but were wondering whether today's announcement changes what the best practice would be for a site such as ours that is just about to launch into foreign markets.

Animatedgiff said...

This is a good development and helps to resolve an issue I have faced with several pan-european and global clients. Maybe the next step will be to provide markup for mobile content. As with the issue of near-duplicate content for two countries using the same language, most mobile sites will under normal circumstances provide similar or identical content to the sites from which they have been adapted. What we need is a solution to avoid duplicate content issues as well as acting as a sign-post for mobile search results.

Christopher Semturs said...

@Kangorimo
There is no contradiction. You could:
- put the regionalized content on the TLD variations (e.g. on example.de and example.fr).
- on all the pages that you have (example.com, example.de, example.fr), annotate all 3 of them with rel-alternate-hreflang.

The annotation tells where the language/region-variation is to be found, not that the .com page covers that region/language as well.

The help center article can guide you through a more detailed example.

sfi said...

@Christopher_Semturs: You say "[Rel=canonical] is suggested for highly similar content in the same language."
Do you mean on different TLDs? If so, wouldn't that send linkjuice just to, say, the .de version of a site leaving the .at and .ch version juiceless? And wouldn't that lead to only the .de version being presented in the SERPs?

Christopher Semturs said...

@sfi think of it in a different way.
1) Google decides to show your URL for a search result (e.g. your canonical example.de).
2) The user searches on google.at in German, and you defined

<link rel="alternate" hreflang="de-at" href="http://example.at" />

on the canonical. So it will show example.at as the URL instead of example.de

(same is true for all subpages).

Beerweasle said...

Hi,

I have a question about these link tags.

We have a multilingual site, for example:

http://www.domain.de/de/
http://www.domain.de/at/

In Germany (de) and in Austria (at) they speak German.
To avoid duplicate content, we blocked /at/ in our
robots.txt, because Germany is our main country.

Is it recommended to remove these pages from robots.txt
if we add these link tags per page, for each country we
support?

We have more issues with "duplicate content" on other
pages. For example, our Greek homepage is currently in English,
which would give us duplicate content with /gb/ etc.

Is this problem solved by these link tags?

Christopher Semturs said...

@Beerweasle

This means you have content on
/de/
/at/
/gb/
/gr/
?

You should definitely let Google crawl all of them (robotting out for controlling indexing is not a good idea).
In addition, decide for each page which language/region it represents, establish your link-rel-alternate-hreflang block and put it on all pages, e.g.:
<link rel="alternate" hreflang="de-at" href="http://domain.de/at/product?id=3" />
<link rel="alternate" hreflang="de-de" href="http://domain.de/de/product?id=3" />
<link rel="alternate" hreflang="de" href="http://domain.de/de/product?id=3" />
<link rel="alternate" hreflang="en-gb" href="http://domain.de/gb/product?id=3" />
...

In addition, to mark highly similar content, you can use rel-canonical. For instance, on the page domain.de/at/product?id=3:
<link rel="canonical" href="http://domain.de/de/product?id=3" />

that should work just fine.

Beerweasle said...

@Christopher Semturs:

Yes, we have content on /de/ /at/ etc.

We were worried about duplicate content on /de/ and /at/, and we don't want German customers to use the /at/ website, or vice versa.

So the best solution for us was to disallow /at/ in robots.txt.

So you think that an hreflang link on every page for each language, plus a canonical link on the "foreign" page, would be fine?

Christopher Semturs said...

@Beerweasle

Yes, that should work fine (hreflang and - in your case - combined with rel-canonical).

Beerweasle said...

Just one last question:

If I have a page /de/search.html?word=blablaba

What's the language equivalent, or should I set one at all?

/at/search.html?word=blablaba

What's the canonical for that?

/de/search.html?word=blablaba
or
/de/search.html

(The "word" parameter changes the content of the page, of course.)

Christopher Semturs said...

@Beerweasle
it would be
/de/search.html?word=blablaba
(there should be a 1:1 mapping between /de/* and /at/* in case of identical content).

seohippus said...

If I have the same content with different country targets appearing correctly in Bing, using canonical link elements will break this.

What happened to Google and Bing working together on CLEs? It seems you are now using them in completely contradictory ways.

Flemming Kaasgaard said...

If you are a fairly large corporation with, say, 50-60 (or more) different TLDs, each containing the same content, should you then put a list in your header with 50-60 extra lines of code, one for each version of the page?

I can see this making perfect sense for fixing .us, .co.uk, .ie, .au and various other instances where different locations speak the same language, but would the correct use of the tag be to list only the true duplicates, intended for other languages, or to make a comprehensive list?

Christopher Semturs said...

A comprehensive list would still be helpful, but you might want to issue regionalized searches on Google to see where you need to take action and where you are happy with the results.
Be aware that you can also send the link tag in the HTTP header instead of the HTML header if that helps (e.g. so that it is not sent to real web clients and the page renders faster).

Walkinraven said...

So this is a 'new' tag? The HTML 4 specification has listed it since 1999:
http://www.w3.org/TR/1999/REC-html401-19991224/struct/links.html#adef-hreflang

Eyewebmaster said...

I'll probably do this for one of my clients, who has a resort and wants its site in English and French.

Thanks for this information

Paul D said...

@Christopher Semturs: Going back to Tommy's point from earlier, there are now two cases where you might use the rel/hreflang attributes:

1. On link elements (as described in your post)

2. On anchor elements that link to content in other languages

Are you saying that we should only use these attributes for case 1 and not case 2? Does Google even support case 2?

Christopher Semturs said...

@PaulD please annotate hreflang on the link-tag (Either html-header or http-header).

Katerina said...

Hello,
I have a problem with URL modifications: my software is too complicated.

Right now my URLs look like
www.examle.com/?language_id=21 for German.

What can I do?

Christopher Semturs said...

As you can see in the help center article, the annotation does not require URL modifications; adding the annotation on all involved pages is sufficient.

Jack DeNeut said...

This might be just what I've been looking for.

We run a travel/local search site, localized into a dozen languages, and what we've been doing so far is using robots.txt to avoid the duplicate content on, for example, .de and .at.

So, if a restaurant was located in Austria, we'd block indexing of the page on our .de site (using robots.txt) so that the same restaurant wouldn't appear twice in German.

Am I to understand that I can now stop doing that, as long as I mark the Austrian site pages as the alternate for de-AT? In our case, the two restaurant pages would often be *identical*, i.e. http://www.example.de/some-restaurant and http://www.example.at/some-restaurant would have the same content.

Or should I still try to sculpt this with robots.txt?

Christopher Semturs said...

Hi,

First, an important message: try never to use robots.txt for controlling indexing. Use the noindex robots meta tag instead.
http://code.google.com/intl/de/web/controlcrawlindex/docs/robots_meta_tag.html

In your scenario, using rel-alternate-hreflang combined with rel-canonical sounds like a perfect match. You must enable Google to fetch all variations (i.e. not block them in robots.txt).

Jack DeNeut said...

Thanks for your fast reply. I'll implement these changes on our sites immediately. This may solve the problem that our .ca, .co.uk, .at, .ch, etc. sites never seem to get any traffic despite the fact that they are indexed by the GoogleBot.

Joni said...

How will this look if you have an international website with approximately 150 subdirectories (each representing a different country)?

Of these 150 subdirectories (countries), 110 use English (no local translation available) and are in essence duplicates of each other.

We can canonicalize all of these URLs to the UK version, but how can we best implement the rel="alternate" hreflang="x" tags?

Does this mean we will have 150 alternate tags on top of the source code (or HTTP header)?

Also, is it recommended to add all of the subdirectories (countries) to GWC individually and set targeting there?

JLSW said...

I have three sites with similar, but not identical, content. Our biz has offices in US, UK and AU.

We use Domain Access to allow each office to update content as needed, but much of the content is identical.

Does the use of Domain Access affect how well the hreflang element works? Does it matter that one of the sites is a ccTLD and the other two are subdomains?

JLSW said...

Also, if you specify a canonical, do you only use the hreflang tag on the canonical versions of the pages?

eradrix said...

How to deal with images? (when duplicated image appears on different localized pages, with localized alt attributes)

Christopher Semturs said...

@Joni 110 English-language variations sounds a little bit too much to me. Are they really all somehow different? Feel free to post the real example, either here or via a G+-message.

Christopher Semturs said...

@JLSW
A combination of ccTLD URLs and sub-paths or sub-domains does not matter. We advise annotating all of them (on the page level), not only the canonical.

Christopher Semturs said...

eradrix:
Giving a concrete example would enable me to answer this question better, but to give a general answer:
Sounds like you consider the images as part of the page. As such, annotate the pages and you are fine.

Rick F said...

I have just implemented this across the 3M Corporation's public facing sites. Since we already did something similar to this on all of our pages it was a snap to implement. Glad to see that Google is finally allowing us to link our country sites together in a smart fashion.

Amjad Khan said...

Does this mean the answer to this question is yes: the same content displayed on different domains, such as ccTLDs, will be considered duplicate?

http://www.youtube.com/watch?v=Ets7nHOV1Yo

Will using hreflang along with rel=canonical solve the duplicate content issue, and will you be able to pick the desired domain for the SERP according to the region?

Would a site like a.co.uk rank in the UK and a.com in the USA for the same query, using mostly the same content, depending on the visitor's location?

Regards
Alicka

Christopher Semturs said...

@Amjad Khan

Yes, that's correct

David said...

@Christopher,
thanks for the explanations!

What I have not understood yet: if the similar content is on different TLDs, should I / can I also specify the rel on a cross-domain basis?

like so:
link rel="alternate" hreflang="de-DE" href="http://www.example.de/"

link rel="alternate" hreflang="de-CH" href="http://www.example.ch/"

What I am trying to achieve is that people searching with google.de find the domain.de and people using google.ch are directed to domain.ch (without risking Duplicate Content issues)

Christopher Semturs said...

@David cross-TLD will work

Alan Perkins said...

Hi Christopher

Thanks for all your help on this.

Can I check the interplay between rel=canonical and rel=hreflang?

Suppose I'm on a page that needs a canonical URL, e.g. ...

domain.de/at/product?id=3&color=blue

In this example, color=blue causes a blue image to be shown by default, and the blue value to be preselected in the color choice dropdown - no significant content changes, so the canonical URL should not contain the color query parameter. Following on from your post on December 8 at 1:43AM then, the canonical URL would be

domain.de/de/product?id=3

i.e. it would be in the /de/ rather than /at/ subdirectory, and it would have the color query parameter removed.

Question: should the hreflang URLs ALSO have the color query parameter removed, e.g.

link rel=alternate hreflang=en-gb href=domain.de/gb/product?id=3

or should they keep it?

link rel=alternate hreflang=en-gb href=domain.de/gb/product?id=3&color=blue

If we keep it, we'll be in a situation where the canonical matches none of the hreflang hrefs. Also, through those hrefs, we'll be providing links to a lot of URLs that we don't really want crawled and indexed.

If we remove it, given a URL whose content contains a set of hreflang hrefs, that URL will not appear in that set. The hreflang href would be acting a little like the canonical href. Would that be OK?