
Character Encodings Forum

Source of character encoding statistics?

John Cowan on 20 August 2014, 5 years ago

Do you look at the *declared* encoding (HTTP header or HTML meta), or at the actual encoding as statistically determined from the text? The fact that ASCII is so low a percentage makes me think you are looking at the declared encoding, which, as we all know, is often wrong. Plenty of pages labeled UTF-8 or Windows-1252/ISO-Latin-1/etc. are in fact pure ASCII.

Sam Soltano (site administrator) on 21 August 2014, 5 years ago

We look at both. We verify the declared encoding by checking the text. If we see a contradiction, we generate an error such as "Incorrect character encoding defined", and we use the detected encoding rather than the declared one.

A web page that is declared UTF-8 and contains only pure ASCII characters is, as you know, correct, and we count it as UTF-8. We see this as a declaration of intent by the webmaster about how characters outside the ASCII range would be encoded if they were needed, even if that particular instance of the page happens not to need any.
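
To make the rule concrete, here is a rough sketch in Python. It is only an illustration: the thread does not say how the detection is actually implemented, so strict decoding is used here as an assumed stand-in for it, and the function name `effective_encoding` and the fallback candidate list are hypothetical.

```python
from typing import Tuple


def effective_encoding(body: bytes, declared: str) -> Tuple[str, bool]:
    """Return (encoding to count, True if the declaration was contradicted).

    Assumption: "checking the text" is approximated by strict decoding.
    Real detection is likely statistical and more thorough than this.
    """
    try:
        # If the bytes are valid in the declared encoding, keep the
        # declaration. Pure-ASCII bytes are valid UTF-8, so an ASCII-only
        # page declared as UTF-8 is counted as UTF-8.
        body.decode(declared)
        return declared.lower(), False
    except (UnicodeDecodeError, LookupError):
        # Invalid byte sequence, or an encoding name Python does not know.
        pass

    # Contradiction: fall back to a detected encoding.
    # The candidate list is an illustrative assumption.
    for candidate in ("utf-8", "windows-1252"):
        try:
            body.decode(candidate)
            return candidate, True
        except UnicodeDecodeError:
            continue
    return "unknown", True


if __name__ == "__main__":
    print(effective_encoding(b"plain ASCII text", "UTF-8"))
    print(effective_encoding("café".encode("windows-1252"), "UTF-8"))
```

Note that a check this crude can never contradict a single-byte declaration such as ISO-8859-1, because every byte sequence decodes under it; that is one reason to treat this as a sketch rather than a real detector.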


Copyright © 2009-2019 Q-Success