Content Languages Forum
Nooben on 15 August 2016, 1 year ago
Which methods of language detection are used?
It looks like detection sometimes works incorrectly, but I'm not sure if it's a problem on your side.
Your tool detects "English" as the language, but the source code of this page contains only one indication of language: html lang="ru", which means "Russian".
1. Why was it detected as English?
2. How is the "lang" attribute of the "html" tag used in your detection tool?
Sam Soltano (site administrator) on 19 August 2016, 1 year ago
Thank you for sharing that observation.
Languages are detected in several ways. We look at the HTML code and the HTTP headers, but we also try to analyze samples of text from web pages, and we include data from partners such as Alexa in the analysis. Unfortunately, the various data sources quite often give contradicting results. In these cases we apply additional heuristics, e.g. accounting for frequently seen mistakes in language codes, to reach a conclusion. We will have a closer look to see what went wrong in your example, and how we can further improve our algorithms.
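To illustrate how several signals could be combined, here is a minimal sketch assuming a simple weighted vote. It is not the tool's actual algorithm: the weights, the table of mistaken codes, and all function names are hypothetical, and the text-analysis result is passed in as a plain argument rather than computed.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical table of frequently seen mistakes: a country code
# written where a language code was meant.
COMMON_MISTAKES = {"jp": "ja", "ua": "uk", "cz": "cs", "dk": "da", "gr": "el"}


def normalize(code):
    """Reduce a tag such as 'en-US' to its primary subtag and fix known mistakes."""
    if not code:
        return None
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    if not primary:
        return None
    return COMMON_MISTAKES.get(primary, primary)


class LangAttrParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if any."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def html_lang(markup):
    parser = LangAttrParser()
    parser.feed(markup)
    return normalize(parser.lang)


def detect(markup, headers, text_guess=None, weights=None):
    """Weighted vote over three signals; returns None if no signal is present."""
    # Assumed weights: text analysis counts twice as much as declared metadata.
    weights = weights or {"html": 1.0, "header": 1.0, "text": 2.0}
    # Content-Language may list several languages; only the first is used here.
    header_value = headers.get("Content-Language", "").split(",")[0]
    signals = {
        "html": html_lang(markup),
        "header": normalize(header_value),
        "text": normalize(text_guess),
    }
    votes = Counter()
    for source, lang in signals.items():
        if lang:
            votes[lang] += weights[source]
    return votes.most_common(1)[0][0] if votes else None
```

With these assumed weights, a page that declares lang="ru" but is served with a Content-Language of "en" and whose text sample looks English would come out as "en", which is one way a result like the one reported above could arise.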