Content Languages Forum

How are the languages detected?

Nooben on 15 August 2016, 9 years ago

Which methods of language detection are used? It looks like detection sometimes works incorrectly, but I am not sure whether the problem is on your side.

Example: http://www.koreanrandom.com/

Your tool detects the language as "english", but the source code of this page contains only one indication of language: html lang="ru", which means "Russian".

1. Why was it detected as English?
2. How is the "lang" attribute of the "html" tag used in your detection tool?

Thank you.

Sam Soltano (site administrator) on 19 August 2016, 9 years ago

Hi Nooben,

Thank you for sharing that observation. Languages are detected in several ways. We look at the HTML code and the HTTP headers, but we also try to analyze samples of text from web pages, and we include data from partners such as Alexa in the analysis.

Unfortunately, quite often the various data sources give contradicting results. In these cases we apply additional heuristics, e.g. correcting frequently seen mistakes in language codes, to come to conclusions.

We will have a closer look to see what went wrong in your example, and how we can further improve our algorithms.
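
To make the reply above more concrete, here is a minimal Python sketch of how several language signals could be combined when they disagree. It is not W3Techs' actual algorithm, which is not described in this thread. The function names, the weights, the tiny stopword lists, and the table of mistaken language codes are all hypothetical and chosen only for illustration, and partner data is left out entirely.

```python
from collections import Counter
from html.parser import HTMLParser

# Illustrative assumption: country codes sometimes written where a language code belongs.
COMMON_CODE_MISTAKES = {"jp": "ja", "cz": "cs", "dk": "da", "gr": "el"}

# Illustrative assumption: a handful of stopwords stands in for real text analysis.
STOPWORDS = {
    "en": {"the", "and", "of", "to", "is", "in"},
    "ru": {"и", "в", "не", "на", "что", "это"},
}

# Hypothetical weights: the text sample is trusted more than declared metadata.
WEIGHTS = {"html_lang": 1.0, "http_header": 1.0, "text": 2.0}


class HtmlLangParser(HTMLParser):
    """Records the lang attribute of the first <html> tag, if present."""

    def __init__(self):
        super().__init__()
        self.lang = None

    def handle_starttag(self, tag, attrs):
        if tag == "html" and self.lang is None:
            self.lang = dict(attrs).get("lang")


def normalize(code):
    """Reduce a tag like 'ru-RU' to its primary subtag and fix known mistakes."""
    if not code:
        return None
    primary = code.strip().lower().replace("_", "-").split("-")[0]
    return COMMON_CODE_MISTAKES.get(primary, primary) or None


def guess_from_text(sample):
    """Pick the language whose stopwords occur most often, or None if none occur."""
    words = sample.lower().split()
    hits = Counter()
    for lang, stops in STOPWORDS.items():
        hits[lang] = sum(word in stops for word in words)
    lang, count = hits.most_common(1)[0]
    return lang if count > 0 else None


def detect_language(html, content_language=None, text_sample=""):
    """Collect one vote per signal and return the winner plus the raw signals."""
    parser = HtmlLangParser()
    parser.feed(html)
    signals = {
        "html_lang": normalize(parser.lang),
        "http_header": normalize(content_language),
        "text": guess_from_text(text_sample),
    }
    votes = Counter()
    for source, lang in signals.items():
        if lang:
            votes[lang] += WEIGHTS[source]
    best = votes.most_common(1)[0][0] if votes else None
    return best, signals


if __name__ == "__main__":
    # Made-up page: declared as Russian, but the sampled text looks English.
    page = '<html lang="ru"><body>...</body></html>'
    print(detect_language(page, None, "the news and the updates of the week"))
```

With these made-up weights, a page that declares lang="ru" but whose sampled text looks English would be classified as English, because the text signal outvotes the single metadata signal. That is only one hypothetical way such a disagreement could be resolved; whether anything similar happened for the site in the question cannot be determined from this thread.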