Sean Karson's Blog

Read about Sean's travels and his take on various hot topics in tech.

Uncovering a "Wild" Korean Link Scam

Each week at SpanishDict, I compile a report which includes, among other things, a list of our top Google keywords. The top keywords that drive traffic to our site are usually all variations on searches like Spanish dictionary or English-Spanish translator, but a few weeks ago I noticed something bizarre on the leaderboard: 야동 was our 11th most popular keyword.

At the time we didn't know why Google searches for 야동 might be driving thousands of clicks to a website catering to English and Spanish learners. I did a little bit of investigation but did not come up with anything. I searched for 야동 myself on both US and Korean Google, finding zero SpanishDict links in the top results, and I also translated 야동 using Google Translate, which told me that it is pronounced yadong, meaning wild. I chalked it up to bad data in Google Search Console and promptly forgot about it, returning to other projects.

While working on one of those projects, I noticed that we had recently started receiving a significant number of user queries that contain almost exclusively Korean characters. Some user or bot was frequently going to SpanishDict and searching for queries like this: 흥분제구하는법 【 KT8.ANA.KR 】 ― 카톡:kk368 | 텔레:kk369 ― k23bhgm8,바오메이상용방법,논현흥분제판매,흥분제사기없는곳,흥분제먹이고,흥분제복용방법.

When a user searches for something on SpanishDict, we first look in our dictionary to see if we have an entry that matches the query. For instance, when a user looks up apple, they get the following page:

Most user queries result in a dictionary entry, but there are quite a few that do not. When that happens, we use Microsoft Translator to convert the user's search into either English or Spanish:

Users can translate just about anything, so even when we receive a query entirely in Korean, we will send it off to Microsoft to be translated into Spanish, with an input language of English. Naturally, the output looks a lot like the input. For instance, 흥분제구하는법 【 KT8.ANA.KR 】 ― 카톡:kk368 | 텔레:kk369 ― k23bhgm8,바오메이상용방법,논현흥분제판매,흥분제사기없는곳,흥분제먹이고,흥분제복용방법 translates to 흥분제구하는법 【next KT8. Ana. KR 【 ― 카톡: kk368 | 텔레: kk369 ― k23bhgm8, 바오메이상용방법, 논현흥분제판매, 흥분제사기없는곳, 흥분제먹이고, 흥분제복용방법 when using our site, so I could not see how these translations might be proving useful to anyone.

That said, most of these queries had a similar structure. They tended to have lots of Korean characters with some URL embedded inside, normally toward the beginning. Here are a couple more examples: 발기부전치료제판매 viayes,cㅇm 발기부전치료제약국판매 정품발기부전치료제판매사이트 발기부전치료제처방전 정품발기부전치료제온라인판매 발기부전치료제정품판매하는곳 정품발기부전치료제구입사이트 발기부전치료제구매처 and 바카라사이트 ♤KAS38,CㅇM◈ 실시간카지노주소추천 생중계온라인식보사이트 라이브실시간카지노주소 라이브카지노주소 인터넷온라인블랙잭게임. It was only when I got curious and visited some of these websites that I began to think there was something malicious afoot.

kt8.ana.kr, the URL from the first query, appears to be a fine online establishment offering Viagra and Cialis. How about viayes.com? The scantily clad lady on the landing page might lead you to think that this is a reputable dealer in ladies' undergarments, but scrolling further will lead you to a different conclusion. And as for kas38.com, well, that has the look of a casino which doubles as a peddler of male anatomy enhancement products:

I briefly sat and appreciated the fact that in order to do my job, I needed to visit some NSFW sites. Then I pressed on with the investigation. I wanted to understand who was actually sending us a bunch of Korean spam to be translated, so I checked our server logs and was surprised to find that almost all the Korean translation traffic was coming from Googlebot, Google's search engine spider which crawls the web in order to build its search index. This likely meant that there are SpanishDict links on other websites which lead to pages translating bunches of Korean characters.

Because of the way the SpanishDict site works, it's possible to construct a link which translates any query. You can find the dictionary entry for apple at http://www.spanishdict.com/translate/apple, and you can see the translation for the apple is red at http://www.spanishdict.com/translate/the apple is red. Similarly, you can view the translation for a bunch of Korean characters by navigating to http://www.spanishdict.com/translate/부전치료제판매사이.
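To make the link pattern concrete, here is a minimal sketch of how such a URL can be built for any query. The helper names `translate_url` and `query_from_url` are made up for illustration; the only thing taken from the site is the `/translate/` path shown above. Browsers display the raw characters, but on the wire the query is percent-encoded UTF-8.

```python
from urllib.parse import quote, unquote

BASE_URL = "http://www.spanishdict.com/translate/"


def translate_url(query: str) -> str:
    """Build a translation link for an arbitrary query (hypothetical helper)."""
    # Percent-encode the query as UTF-8 so spaces and Korean characters
    # form a valid URL path segment.
    return BASE_URL + quote(query, safe="")


def query_from_url(url: str) -> str:
    """Recover the original query from a link built by translate_url."""
    return unquote(url[len(BASE_URL):])


print(translate_url("apple"))
print(translate_url("the apple is red"))
print(translate_url("부전치료제판매사이"))
```

Anyone can mint a link like this without ever touching the site, which is exactly what makes the pattern attractive to a spammer.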

Google Search Console has a great tool which can indicate to a webmaster the 1000 websites with the most links back to her site. Both the number and the quality of links to a site from other sites are factors that improve its Google ranking. Normally, the list for SpanishDict includes schools and Spanish language learning blogs which link to our content, but when I pulled up the list, it looked like this:

I recognized almost none of the websites, many of which were Korean .kr domains. The top link generator for SpanishDict is now golrazo.com, which links to us 149,000 times!!! Here are some of the SpanishDict links that Google found on golrazo.com:

These links all follow a pattern similar to the one above, but instead of a website, they include a weirdly formatted phone number. Combing through our backlink report, I found over 400 domains with spammy Korean links to SpanishDict! They take all shapes and forms. Included in the list are calminghill.com, israelchurch.org, theresortwedding.com, howcoach.com, dbglamping.com, and doctor-korea.com. Because Google only provides us with the top 1000 domains with links to our site, I believe there are actually far more than 400 of these domains.

At first I thought that all of these Korean websites with thousands of SpanishDict links were created for the sole purpose of containing these spammy links to our site, but I doubt that now. A few of them do seem fishy when exploring them, but I think the websites are far too diverse for someone to have created all of them. In addition, all the domains I checked are hosted at different IPs, and they were purchased over a span of several years according to the WHOIS record. Those that do not have obscured owner information all report different owners, and they use all sorts of different nameservers.

Here are the homepages for dbglamping.com, calminghill.com, and earlybirdsoccer.ca:

The element that these websites all have in common is the use of a particular PHP Bulletin Board System. It seems to be a plugin that works as a sort of forum for the sites' users. All of the spammy backlinks to SpanishDict are contained within posts on these forums at URLs like http://www.israelchurch.org/g/bbs/board.php?bo_table=z3_1&wr_id=1693 and http://earlybirdsoccer.ca/bbs/board.php?bo_table=b05&wr_id=3580, posts that sometimes contain pictures or videos and other times contain just text:

These pages seem pretty innocuous until you look at their source code, which is littered with SpanishDict links like the following:

This forum plugin shared by all the domains likely has an HTML injection vulnerability which allows a malicious user to inject a bunch of links into the site's source, positioning them 9999 pixels to the left and 9999 pixels above what you see when viewing the forum, so that they would never be noticed by a human user of the site, but would still be very much visible to Googlebot when crawling the domain.
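For anyone who wants to check their own forum pages for this trick, here is a rough sketch of a scanner. It assumes the links are hidden with an inline `style` attribute that uses a large negative `left` or `top` offset, either on the link itself or on an enclosing element; the class name `HiddenLinkFinder` is made up, and a stylesheet-based variant of the trick would slip past it.

```python
import re
from html.parser import HTMLParser

# Assumption: the hiding is done inline, e.g. style="left:-9999px; top:-9999px".
OFFSCREEN = re.compile(r"(?:left|top)\s*:\s*-\d{3,}px", re.IGNORECASE)

# Elements that never get a closing tag, so they must not go on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class HiddenLinkFinder(HTMLParser):
    """Collect hrefs of links that sit inside an off-screen element."""

    def __init__(self):
        super().__init__()
        self.stack = []         # one bool per open element: styled off-screen?
        self.hidden_links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        offscreen = bool(OFFSCREEN.search(attrs.get("style") or ""))
        # A link is hidden if it, or any element still open around it,
        # is pushed off-screen.
        if tag == "a" and attrs.get("href") and (offscreen or any(self.stack)):
            self.hidden_links.append(attrs["href"])
        if tag not in VOID_TAGS:
            self.stack.append(offscreen)

    def handle_endtag(self, tag):
        if tag not in VOID_TAGS and self.stack:
            self.stack.pop()


page = (
    '<div style="position:absolute; left:-9999px; top:-9999px">'
    '<a href="http://www.spanishdict.com/translate/spam">spam</a></div>'
    '<p><a href="/bbs/board.php">a normal link</a></p>'
)
finder = HiddenLinkFinder()
finder.feed(page)
print(finder.hidden_links)
```

The stack tracking is deliberately naive and will drift on badly nested markup, so treat a hit as a reason to look at the page source rather than as proof.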

OK, so now I knew why Google was consistently crawling SpanishDict, attempting to translate queries consisting of naughty URLs embedded within Korean characters. But why in the world would someone go through all this trouble just to get Google to index these translation pages? Well, to see why, let's query Google for 분당출장안마 콜걸후불, which apparently means business trip massage:

Uh oh! Because we dynamically generate the titles and descriptions for our translation pages based on a user's query, when Google indexes one of these Korean translation links, it acts as a convenient advertising avenue for finding an erotic massage on your business trip. Just call 010-6445-9663!
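The mechanism is easy to see in a toy version of dynamic title generation. The templates below are invented for illustration and are not SpanishDict's real wording; the point is only that whatever arrives in the query ends up verbatim in the tags Google shows in its results.

```python
from html import escape

# Hypothetical templates; the real title and description text is not shown here.
TITLE_TEMPLATE = "{query} in Spanish | English to Spanish Translation"
DESCRIPTION_TEMPLATE = "Translate {query}. See Spanish-English translations."


def render_head(query: str) -> str:
    """Render the <title> and description tags for a translation page."""
    # Escaping prevents markup injection, but it does nothing about
    # the *content* of the query, which is what the spammers exploit.
    safe = escape(query)
    return (
        f"<title>{TITLE_TEMPLATE.format(query=safe)}</title>\n"
        f'<meta name="description" '
        f'content="{DESCRIPTION_TEMPLATE.format(query=safe)}">'
    )


print(render_head("apple"))
print(render_head("분당출장안마 010-6445-9663"))
```

The second call produces a perfectly valid page head that doubles as a free advertisement, phone number included.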

Unfortunately, all sorts of adult content-related Korean Google searches now had SpanishDict links as top results. For reference, here is an example of how our dynamic title generation is supposed to work:

We never expected our search engine titles to be hijacked like this, but here we were, serving two distinct roles. On the one hand, we were SpanishDict.com, the Spanish-English reference site our users know and love. On the other, we were KoreanDick.com, unknowingly peddling penis pills and erotic massages.

All sorts of popular websites dynamically generate their title and description tags in order to show well in search engine results, and the people behind this scam are well aware of this. As you can see above and from the following shot, we aren't the only site affected:

From what I can tell, these spammers systematically identified link patterns on popular websites which could be used to generate their desired search engine titles and then found a way to get thousands of these backlinks onto hundreds of different websites through the use of a vulnerable forum plugin. Google then crawled these domains, finding links to well-known websites like SpanishDict, PlayStation Store, LinkedIn, and Meetup, and because of these sites' search reputations, links to these trusted sites now show up as top Google results when querying for naughty things in South Korea.

Not only did they do this, but they found a way to avoid some websites' spam detection systems. Many websites don't allow searches that contain phone numbers or URLs, but a simple implementation will not catch text that contains KAS38,CㅇM or 0l0 ②163 884⑦ despite the fact that a human can easily identify them as kas38.com and 010-2163-8847.
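As a sketch of what a less simple implementation might look like, the snippet below normalizes a query before checking it for domains and phone numbers. Everything in it is an assumption made for illustration: the look-alike table only covers the substitutions seen in the examples above, the domain check knows four endings, and the phone pattern is a guess at Korean mobile numbers starting with 01.

```python
import re
import unicodedata

# Hypothetical look-alike table: the Hangul letter ieung standing in for "o".
# Both the compatibility form (U+3147) and the jamo form (U+110B) are mapped.
LOOKALIKES = str.maketrans({"\u3147": "o", "\u110b": "o"})

DOMAIN = re.compile(r"[a-z0-9-]+\.(?:com|net|org|kr)\b")
PHONE = re.compile(r"\b01\d[\s-]?\d{3,4}[\s-]?\d{4}\b")


def normalize(text: str) -> str:
    """Undo the obfuscations seen in the spam queries (not a general solution)."""
    text = text.translate(LOOKALIKES)
    # NFKC folds circled digits such as ② and ⑦ into plain 2 and 7.
    text = unicodedata.normalize("NFKC", text).lower()
    # A letter l, i, or a pipe squeezed between two digits is read as a 1.
    text = re.sub(r"(?<=\d)[li|](?=\d)", "1", text)
    # A comma directly before a short ending like "com" is read as a dot.
    text = re.sub(r"(?<=[a-z0-9]),(?=[a-z]{2,3}\b)", ".", text)
    return text


def looks_like_spam(query: str) -> bool:
    cleaned = normalize(query)
    return bool(DOMAIN.search(cleaned) or PHONE.search(cleaned))


print(normalize("KAS38,CㅇM"))
print(normalize("0l0 ②163 884⑦"))
print(looks_like_spam("the apple is red"))
```

A filter like this is an arms race by nature: each new substitution the spammers invent needs another rule, which is part of why a blunter fix can be the more practical first step.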

I had heard of negative SEO attacks, or Google Bowling, perpetrated by companies trying to demote a competitor in the Google search rankings, but this is something else entirely, and I actually have a lot of respect for whoever put this plan together and executed it.

That said, the excess of these bad backlinks to SpanishDict.com is a problem for our business because it could cause Google to demote us in search rankings. These links are also taking up space in Google's allotment for SpanishDict links that could be better used for useful pages instead! Of course, we also definitely don't want to be advertising for these kinds of services, so we immediately started returning a Not Found page anytime a query contains a Korean character. Now every time Google visits one of these links, it encounters a 404 error. Since implementing this change, Google has tried to access over 200,000 of these links:
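The Korean-character check itself is only a few lines. This is a sketch rather than our production code: the function names are made up, and the three Unicode ranges are my assumption about what counts as "a Korean character" (precomposed syllables plus the two main jamo blocks).

```python
# Unicode blocks treated as Korean for this check (an assumption, not exhaustive).
HANGUL_RANGES = (
    (0xAC00, 0xD7A3),  # Hangul Syllables
    (0x1100, 0x11FF),  # Hangul Jamo
    (0x3130, 0x318F),  # Hangul Compatibility Jamo
)


def contains_korean(query: str) -> bool:
    """True if any character of the query falls in a Hangul block."""
    return any(
        low <= ord(ch) <= high
        for ch in query
        for low, high in HANGUL_RANGES
    )


def status_for_query(query: str) -> int:
    """HTTP status a translation page would return under this patch."""
    return 404 if contains_korean(query) else 200


print(status_for_query("apple"))
print(status_for_query("야동"))
print(status_for_query("viayes,cㅇm"))
```

Including the compatibility jamo block matters here, because the lone ㅇ used to disguise `.com` lives there rather than among the full syllables.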

We also used Google Search Console to disavow the links from the 400 domains we know of that have Korean links to SpanishDict. Going forward, we are actively thinking about other ways to validate queries and prevent similar attacks in the future, but this approach worked as a quick patch.
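Google's disavow tool takes a plain text file with one entry per line, where a `domain:` prefix disavows every link from that domain and lines starting with `#` are comments. With hundreds of domains, it is worth generating that file rather than typing it. The helper below is a hypothetical sketch, fed with a few of the domains named earlier.

```python
def build_disavow_file(domains) -> str:
    """Return the text of a disavow file covering the given domains."""
    # Lowercase, trim, and de-duplicate so each domain appears once.
    cleaned = sorted({d.strip().lower() for d in domains if d.strip()})
    lines = ["# Domains carrying hidden Korean spam links to our site"]
    lines.extend(f"domain:{d}" for d in cleaned)
    return "\n".join(lines) + "\n"


spam_domains = ["golrazo.com", "calminghill.com", "israelchurch.org", "golrazo.com"]
print(build_disavow_file(spam_domains))
```

Many of these sites look like victims rather than culprits, so the list is worth revisiting if they clean up their forums.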

Remember that word 야동 which was our 11th most popular Google keyword? Well, it does mean wild, but this was definitely not a fluke with Search Console's data. According to Yahoo Answers, 야동 is also slang for erotic videos. Tens of thousands of clicks through to an online dictionary coming from Korean Google searchers looking for porn... pretty wild, huh? The internet continues to amaze.