Character Encodings Forum
John Cowan on 20 August 2014, 4 years ago
Do you look at the *declared* encoding (HTTP header or HTML meta), or do you look at the actual encoding as statistically determined from the text? The fact that ASCII is so low a percentage makes me think you are looking at the declared encoding, which as we all know is often wrong. Plenty of pages labeled UTF-8 or Windows-1252/ISO-Latin-1/etc. are in fact pure ASCII.
Sam Soltano (site administrator) on 21 August 2014, 4 years ago
We look at both. We verify the declared encoding by checking the text, and if we see a contradiction we generate an error such as "Incorrect character encoding defined" and use the detected encoding rather than the declared one.
A web page that is declared UTF-8 and contains only pure ASCII characters is, as you know, correct, and we count it as UTF-8. We see this as a declaration of intent by the webmaster about how characters outside the ASCII range would be encoded if they were needed, even if that particular instance of the page happens not to need any.
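For readers who want to see the idea in code, here is a rough Python sketch of "trust the declaration unless the bytes contradict it". It is not the site's actual implementation: the function name `effective_encoding` and the `FALLBACKS` order are assumptions made for illustration, and a strict decode stands in for real statistical detection.

```python
import codecs
from typing import Tuple

# Hypothetical fallback order, tried only when the declaration does not fit the bytes.
FALLBACKS = ("utf-8", "windows-1252")


def effective_encoding(raw: bytes, declared: str) -> Tuple[str, bool]:
    """Return (encoding to count, True if the declaration was contradicted)."""
    try:
        # Normalise aliases such as "UTF8" to Python's canonical codec name.
        name = codecs.lookup(declared).name
        raw.decode(name)  # strict decode: raises if the bytes contradict the label
        return name, False
    except (LookupError, UnicodeDecodeError):
        pass  # unknown label or contradiction: fall through to the fallbacks

    for candidate in FALLBACKS:
        try:
            raw.decode(candidate)
            return codecs.lookup(candidate).name, True
        except UnicodeDecodeError:
            continue
    return "unknown", True


if __name__ == "__main__":
    # Pure ASCII bytes declared as UTF-8: the declaration stands.
    print(effective_encoding(b"hello", "UTF-8"))
    # Windows-1252 bytes declared as UTF-8: contradiction, fallback is used.
    print(effective_encoding("café".encode("cp1252"), "utf-8"))
```

Note that this only catches declarations that are impossible for the given bytes. UTF-8 bytes mislabeled as Windows-1252 still decode without error (as mojibake), so catching that case needs the kind of statistical check John mentions.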