Sean Karson's Blog

Read about Sean's travels and his take on various hot topics in tech.

If All Else Fails, Send Hreflang Tags in an HTTP Header

A few months ago I started taking on more search engine optimization (SEO) responsibilities at work. I'm a software engineer at SpanishDict, an online English-Spanish dictionary, and a large share of our traffic comes from users searching for translations on Google. Recognizing that, we've recently invested a lot of time in optimizing the page titles and descriptions that appear on Google, and in ensuring that the SpanishDict links Spanish speakers see on Google are just as effective as the ones English speakers see.

SpanishDict has many different types of pages, but for this post, I'll focus on our translate pages, as these are the SpanishDict pages that Google searchers see most often. A translate page contains a dictionary entry for a particular word.

An English speaker querying Google for caminar should see a link to http://www.spanishdict.com/translate/caminar with the title Caminar | Spanish to English Translation - SpanishDict.

A Spanish speaker interested in translating caminar should instead see a link to http://www.spanishdict.com/traductor/caminar with the title Caminar en inglés | Traductor de español a inglés - SpanishDict.

Why We Use Hreflang Tags on SpanishDict

Despite these being two different pages on our site, the traductor page is identical to the translate page except for some pieces (like the page's meta title) which have been translated into Spanish. The same dictionary entry for caminar is shown on both pages.

Because the traductor and translate pages have duplicate content served in different languages, years ago we added hreflang tags to the <head> of these pages as Google recommends. This allows Google to know which of our links are for Spanish speakers and which are for English speakers:

Google Hreflang Instructions

This worked well until sometime last year, when we started seeing a steady decline in the number of hreflang tags reported by Google in the International Targeting section of Search Console. Although we had millions of pages with hreflang tags, by the end of 2016 Google reported a count of only 5,500.

Troubled and Confused

This decline was both troubling and confusing. It was troubling because a Spanish speaker who searched Google for caminar en inglés saw a link to http://www.spanishdict.com/translate/caminar with an English title, which hurt the click-through rate of our links. It was confusing because we hadn't made any change to the way we serve hreflang tags on our pages, and we knew our implementation was correct.

In the <head> of both http://www.spanishdict.com/translate/caminar and http://www.spanishdict.com/traductor/caminar were the tags:

<link rel="alternate" hreflang="en" href="http://www.spanishdict.com/translate/caminar"/>

and

<link rel="alternate" hreflang="es" href="http://www.spanishdict.com/traductor/caminar"/>

In an effort to get our tags recognized, we read just about every available post on hreflang tags. We scoured the Google Webmaster Forums. We used page validation tools that indicated our tags were correct. We viewed our page source as Google sees it and confirmed that the tags were present. We even moved our tags to the top of the <head> in case that somehow made them more visible to the Google crawler. None of this remedied the situation. We were at a loss, so we turned to alternate methods of conveying hreflang information to Google.

Other Ways to Serve Hreflang Tags

In addition to adding tags directly on the page, Google allows you to include this information in your sitemap or to send it in the page's HTTP response header.

Putting hreflang information in our sitemap would not have been a good approach for a few reasons. Our sitemap has millions of entries and we are constantly adding new pages to SpanishDict. We didn't want to have to frequently regenerate our sitemap to make sure that Google would have hreflang information for our latest pages. Our backend also runs through a non-trivial amount of logic to determine whether to provide hreflang tags for a particular URL.
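To give a sense of what the sitemap approach involves, here's a rough Python sketch that builds sitemap entries with xhtml:link alternates, the format Google documents for hreflang in sitemaps. The build_sitemap function and its input shape are assumptions for illustration, not anything SpanishDict runs. Note that every URL in a group gets its own entry, and each entry repeats all of the alternates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
XHTML_NS = "http://www.w3.org/1999/xhtml"

# Serialize with the conventional prefixes instead of ns0/ns1.
ET.register_namespace("", SITEMAP_NS)
ET.register_namespace("xhtml", XHTML_NS)


def build_sitemap(groups):
    """groups is a list of dicts mapping language code -> URL (hypothetical shape)."""
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for alternates in groups:
        for url in alternates.values():
            url_el = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
            ET.SubElement(url_el, f"{{{SITEMAP_NS}}}loc").text = url
            # Each entry lists every alternate in the group, including itself.
            for lang, href in alternates.items():
                ET.SubElement(
                    url_el,
                    f"{{{XHTML_NS}}}link",
                    {"rel": "alternate", "hreflang": lang, "href": href},
                )
    return ET.tostring(urlset, encoding="unicode")


sitemap_xml = build_sitemap([
    {
        "en": "http://www.spanishdict.com/translate/caminar",
        "es": "http://www.spanishdict.com/traductor/caminar",
    }
])
print(sitemap_xml)
```

One word with two language versions already produces two url entries and four link elements, so a sitemap covering millions of pages grows quickly and has to be regenerated whenever pages are added.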

Because of our reluctance to use the sitemap approach, we turned to HTTP headers. We initially had reservations about doing this because everything we had read indicated that sending tags in the response header is more of a specialized approach for pages that serve non-HTML files like PDFs.

Nevertheless, we gave it a shot, removing the HTML tags from our pages and adding a couple of lines of code to send the hreflang tags in the HTTP header. Now when you visit http://www.spanishdict.com/translate/caminar, the server returns hreflang information in the Link header:

Link: <http://www.spanishdict.com/translate/caminar>; rel="alternate"; hreflang="en", <http://www.spanishdict.com/traductor/caminar>; rel="alternate"; hreflang="es"
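The header value is simple to build. Here's a minimal sketch in Python; this post doesn't show SpanishDict's actual server code, so the build_hreflang_link_header function is a hypothetical stand-in for those "couple of lines."

```python
def build_hreflang_link_header(alternates):
    """Build a Link header value from a dict of language code -> URL."""
    parts = [
        f'<{url}>; rel="alternate"; hreflang="{lang}"'
        for lang, url in alternates.items()
    ]
    return ", ".join(parts)


header_value = build_hreflang_link_header({
    "en": "http://www.spanishdict.com/translate/caminar",
    "es": "http://www.spanishdict.com/traductor/caminar",
})
print("Link: " + header_value)
```

Run as written, this prints the same Link header shown above. In a real application the value would be set on the response object of whatever web framework is in use.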

So Did It Work?

When we deployed this change on February 6, 2017, Google counted about 5,000 SpanishDict pages with hreflang tags. Four days later Google recognized 60,000 tags, and that number has been increasing ever since. Today the number is 2.1 million!

Hreflang tag graph

How Did This Affect Spanish Speaker Pageviews?

Making this change had an immediate, positive effect on the total number of SpanishDict pages viewed by Spanish speakers:

Total Spanish Speaker Pageviews

The red line details the total daily Spanish speaker pageviews in 2016, while the blue line is the 2017 data. The 2017 pageviews track the 2016 pageviews extremely closely until February 6th, the day we started sending hreflang tags in the HTTP header. On February 5th, Spanish speaker pageviews were up 5% YoY. By February 15th, they were up 30% YoY. Hreflang tags really matter!

Looking only at Spanish speaker pageviews that came directly from Google tells the same story. They jumped from up 100% YoY to up more than 200% YoY:

Spanish Speaker Google Referred Pageviews

As expected, there was no noticeable change to our English speaker traffic.

So Why Couldn't Google See Our Tags?

Unfortunately, we don't know. Because Google's crawler is such a black box, we have no idea whether it was a change on their side or on our side that caused the tags to no longer be recognized. Even if we knew the problem were due to some change we had made, performing a git bisect would be impractical for many reasons.

My hunch is that a script, potentially some advertisement code, on our site was preventing Google from "seeing" the tags. If you've ever noticed anything similar or have any ideas about what might have been going on, please let me know in the comments!