Content Languages Forum

How are the languages detected?

Nooben on 15 August 2016, 2 years ago

Which methods of language detection are used? 

It looks like it sometimes works incorrectly, but I'm not sure whether the problem is on your side.

Example: http://www.koreanrandom.com/

Your tool detects the language as "English", but the source code of this page contains only one language indication: html lang="ru", which means "Russian".

1. Why was it detected as English?

2. How is the "lang" attribute of the "html" tag used in your detection tool?

Thank you.

Sam Soltano (site administrator) on 19 August 2016, 2 years ago

Hi Nooben,

Thank you for sharing that observation.

Languages are detected in several ways. We look at the HTML code and the HTTP headers, but we also try to analyze samples of text from the web pages, and we include data from partners such as Alexa in the analysis. Unfortunately, the various data sources quite often give contradictory results. In these cases we apply additional heuristics, e.g. taking into account frequently seen mistakes in language codes, to come to a conclusion. We will have a closer look to see what went wrong in your example, and how we can further improve our algorithms.
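To illustrate how several signals can disagree, here is a minimal Python sketch. It is not the actual detection algorithm, which is not published in this thread. It assumes just three inputs (the html lang attribute, a Content-Language header, and a very crude script check on a text sample) and resolves them by simple majority vote. All names (LangAttrParser, primary_subtag, guess_from_text, detect_language) are hypothetical, and the header value in the example is made up for illustration.

```python
from collections import Counter
from html.parser import HTMLParser


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def primary_subtag(code):
    """Reduce 'ru-RU', 'en_US' or 'en, ru' to its first primary subtag."""
    if not code:
        return None
    first = code.split(",")[0].strip().lower().replace("_", "-")
    return first.split("-")[0] or None


def guess_from_text(text):
    """Crude stand-in for text analysis: mostly Cyrillic letters -> 'ru',
    otherwise 'en'. A real detector would use a statistical model."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return None
    cyrillic = sum(1 for ch in letters if "\u0400" <= ch <= "\u04ff")
    return "ru" if cyrillic / len(letters) > 0.5 else "en"


def detect_language(html, headers, text_sample):
    """Collect the three signals and return (majority winner, all signals).

    `headers` is assumed to be a plain dict with the exact key
    'Content-Language'; missing signals are skipped in the vote.
    """
    parser = LangAttrParser()
    parser.feed(html)
    signals = {
        "html_lang": primary_subtag(parser.lang),
        "http_header": primary_subtag(headers.get("Content-Language")),
        "text": guess_from_text(text_sample),
    }
    votes = Counter(value for value in signals.values() if value)
    winner = votes.most_common(1)[0][0] if votes else None
    return winner, signals


# Hypothetical page: markup says Russian, header and visible text say English.
winner, signals = detect_language(
    '<html lang="ru"><body>Welcome</body></html>',
    {"Content-Language": "en-US"},
    "Welcome to the forum",
)
print(winner)  # en
```

In this made-up case the lang="ru" attribute is outvoted by the other two signals, which shows how a page can be reported as English even though its markup declares Russian. Whether that is what happened for the site in question is not known from this thread.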


W3Techs
Copyright © 2009-2018 Q-Success