Character Encodings Forum

Source of character encoding statistics?

John Cowan on 20 August 2014

Do you look at the *declared* encoding (HTTP header or HTML meta element), or at the actual encoding as determined statistically from the text? The fact that ASCII accounts for such a low percentage makes me think you are looking at the declared encoding, which, as we all know, is often wrong. Plenty of pages labeled UTF-8 or Windows-1252/ISO-Latin-1/etc. are in fact pure ASCII.

Sam Soltano (site administrator) on 21 August 2014

We look at both. We verify the declared encoding by checking the text, and if we see a contradiction, we generate an error such as "Incorrect character encoding defined" and use the detected encoding rather than the declared one.

A web page that is declared UTF-8 and contains only pure ASCII characters is, as you know, correct, and we count it as UTF-8. We treat the declaration as a statement of intent by the webmaster: it says how characters outside the ASCII range would be encoded if they were needed, even if that particular instance of the page happens not to need them.
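The reply doesn't describe how the detection itself works, so the following is only a minimal Python sketch of the "verify the declaration, fall back to detection on contradiction" rule, not W3Techs' actual method. It assumes a deliberately crude detector (pure ASCII is "us-ascii", valid UTF-8 with non-ASCII bytes is "utf-8", anything else is guessed as "windows-1252"); the function names `detect_encoding` and `effective_encoding` are hypothetical.

```python
import codecs
from typing import Optional, Tuple


def detect_encoding(raw: bytes) -> str:
    """Hypothetical, very rough stand-in for statistical detection."""
    if all(b < 0x80 for b in raw):
        return "us-ascii"
    try:
        raw.decode("utf-8")
        return "utf-8"
    except UnicodeDecodeError:
        # Assumption: non-UTF-8 bytes are guessed as Windows-1252.
        return "windows-1252"


def effective_encoding(raw: bytes, declared: Optional[str]) -> Tuple[str, bool]:
    """Return (encoding to count, whether the declaration was contradicted)."""
    detected = detect_encoding(raw)
    if declared is None:
        return detected, False
    try:
        name = codecs.lookup(declared).name  # normalize labels like "UTF-8"
    except LookupError:
        return detected, True  # unknown label: treat as contradicted
    try:
        raw.decode(name)
    except UnicodeDecodeError:
        return detected, True  # bytes are invalid in the declared encoding
    if name != "utf-8" and detected == "utf-8":
        # Valid multi-byte UTF-8 labeled as something else (e.g. Latin-1).
        return detected, True
    # Declaration is consistent with the bytes, so count it as declared,
    # even when the page happens to be pure ASCII.
    return name, False
```

With this rule, `effective_encoding(b"hello", "UTF-8")` yields `("utf-8", False)`: a pure-ASCII page declared as UTF-8 is counted as UTF-8, as the reply describes, while Latin-1 bytes labeled UTF-8 are flagged and counted under the detected encoding.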