The 3 most common technical website quality problems

Posted by Matthias Gelbmann on 18 February 2010 in News

When we analyze websites to identify technologies, we also look for potential problems. Since we do technology surveys, we restrict this to purely technical problems; for instance, we don't analyze the usability or accessibility of a site. We don't take into account browser compatibility issues and performance analysis, and we don't get involved in religious wars, such as "uses tables for layout". We also try to report only problems that are likely to have a real-life impact. Furthermore, we want to keep the overlap with other quality reports, such as standard compliance checkers, to a minimum.

You can see the list of all quality alerts that we detect. And this is the list of the most frequently encountered problems:

#1 on 5.3% of all sites: No web page found at non-www url

This report means that a web server is configured at www.example.com, but not at example.com. When the web was young, using the www subdomain was quite common, but I guess the marketing people soon decided that CNN.com just sounds better without the www (which is the only thing whose shortened form takes three times longer to say than what it's short for, according to Douglas Adams). Since then, most sites resolve this problem by allowing both forms and serving the same pages.

Most browsers also hide that problem from the users by silently adding the www whenever it seems appropriate and necessary. That browser behavior is probably the reason why this problem so often remains undetected by webmasters.

For search engines and other spiders, the www and non-www sites are different pages. Links to these pages are not automatically mapped to the "correct" version. Search engines often do a reasonably good job of merging these pages into one "canonical" form, but not all spiders and browsers (think of mobile phones) are that error tolerant.
It takes only a little configuration effort, and a webmaster can forget about that issue.
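As a rough illustration of how little is needed, here is a minimal sketch of a WSGI wrapper that answers requests for the bare host with a permanent redirect to the www host. This is only one possible approach (the usual fix is a web server or DNS setting, and serving the same pages on both hosts works too); the function name `redirect_bare_host` and the host names are assumptions made up for this example.

```python
def redirect_bare_host(app, bare_host="example.com",
                       canonical_host="www.example.com"):
    """Wrap a WSGI app so that requests for bare_host get a 301
    redirect to the same path on canonical_host (hypothetical names)."""

    def middleware(environ, start_response):
        # Strip an optional port and compare case-insensitively.
        host = environ.get("HTTP_HOST", "").split(":")[0].lower()
        if host == bare_host:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = environ.get("PATH_INFO", "") or "/"
            query = environ.get("QUERY_STRING", "")
            location = f"{scheme}://{canonical_host}{path}"
            if query:
                location += "?" + query
            start_response("301 Moved Permanently",
                           [("Location", location),
                            ("Content-Length", "0")])
            return [b""]
        # Any other host is passed through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

A permanent redirect also tells spiders which of the two forms is the canonical one, so they do not have to guess.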
It is common practice to declare the character encoding of a web page (that is, the character set used for the page, say UTF-8 or Shift JIS) in an HTML meta tag within the page itself. This may be a problem, because in principle, it is not possible to read the page (and thus the defining meta tag) without knowing the encoding. It works in practice, because the meta tag can be placed near the beginning of the page, where only ASCII characters are needed. This is what is recommended by the W3C. A browser can read the page in pure ASCII mode, and then switch to whatever the declaration says. That works as long as there are no non-ASCII characters before the browser reaches the encoding declaration. The title element of the page, however, frequently comes before the declaration and contains non-ASCII characters, whose encoding is then not yet defined. If we see that on a page, we raise this alert. Most modern browsers and search engines are smart enough to work around such problems, but as I said before, one has to keep in mind that websites are not only processed by Firefox and Google.
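The check described above can be sketched in a few lines. This is a simplified illustration, not the detection logic actually used for the alert: it works on the raw bytes of a page, finds the first meta tag that contains a charset declaration with a deliberately loose regular expression, and reports whether any byte above 0x7F occurs before it. The names `META_CHARSET_BYTES` and `non_ascii_before_charset` are made up for this example.

```python
import re

# Loose pattern for both <meta charset="..."> and the older
# <meta http-equiv="Content-Type" content="...; charset=..."> form.
META_CHARSET_BYTES = re.compile(
    rb"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)",
    re.IGNORECASE,
)


def non_ascii_before_charset(raw: bytes) -> bool:
    """Return True if non-ASCII bytes appear before the meta charset tag."""
    match = META_CHARSET_BYTES.search(raw)
    if match is None:
        # No meta declaration at all: nothing to compare against here.
        return False
    prefix = raw[:match.start()]
    return any(byte > 0x7F for byte in prefix)


good = '<head><meta charset="utf-8"><title>Café</title></head>'.encode("utf-8")
bad = '<head><title>Café</title><meta charset="utf-8"></head>'.encode("utf-8")
print(non_ascii_before_charset(good))  # False
print(non_ascii_before_charset(bad))   # True
```

The fix on the page side is simply to move the meta tag ahead of the title element.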
We report the server time as being incorrect when it is off by more than 10 minutes. Considering how simple it is to synchronize the time over the net, it is quite surprising that 4% of the servers have an incorrect setting, sometimes by more than a day. Several features of HTTP rely on the exchange of timestamps, for example caching of pages and page elements, and expiration of certain information such as cookies. Using a timestamp that is more or less random does not make much sense.
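A comparison of this kind might look like the following sketch. It assumes the server time is taken from the HTTP `Date` response header and compared against a trusted reference clock; the 10-minute threshold comes from the text above, while the function name `server_time_is_off` and the choice of header are assumptions for illustration.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

# Threshold from the text: more than 10 minutes off counts as incorrect.
MAX_SKEW = timedelta(minutes=10)


def server_time_is_off(date_header: str, reference: datetime) -> bool:
    """Compare an HTTP Date header value with a timezone-aware reference."""
    server_time = parsedate_to_datetime(date_header)
    return abs(server_time - reference) > MAX_SKEW


reference = datetime(2010, 2, 18, 12, 0, 0, tzinfo=timezone.utc)
print(server_time_is_off("Thu, 18 Feb 2010 12:05:00 GMT", reference))  # False
print(server_time_is_off("Wed, 17 Feb 2010 11:00:00 GMT", reference))  # True
```

In real use, the reference would be the current time of a machine that is itself synchronized, for example `datetime.now(timezone.utc)`.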
#4 on 3.4% of all sites: Contradictory character encoding specifications

The character encoding of a page can be defined in several ways: in the HTTP header, the XML header and in an (X)HTML meta tag. The standard says that the HTTP header definition has precedence in case of conflicts, which has its logic, but is counter-intuitive from a practical point of view. If someone makes the effort to define the encoding on the page itself, I would assume that this is more likely to be correct than a generic server setting. But whatever the standard says about this situation, it is certainly not a good idea to have contradictory definitions and rely on browser and spider heuristics to resolve them. This is certain to fail from time to time.
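To make the three places concrete, here is a small sketch that collects the encoding named in each of them and reports whether they disagree. It is an illustration under simplifying assumptions, not the actual detection logic: the regular expressions are loose, only the first match of each kind is considered, and the names `declared_encodings` and `has_contradiction` are invented for this example. Encoding names are normalized with Python's codec registry so that aliases such as `utf8` and `UTF-8` count as the same encoding.

```python
import codecs
import re

HTTP_CHARSET = re.compile(
    r"charset\s*=\s*[\"']?([A-Za-z0-9_.:-]+)", re.IGNORECASE)
XML_ENCODING = re.compile(
    r"<\?xml[^>]*encoding\s*=\s*[\"']([A-Za-z0-9_.:-]+)", re.IGNORECASE)
META_CHARSET_TEXT = re.compile(
    r"<meta[^>]*charset\s*=\s*[\"']?\s*([A-Za-z0-9_.:-]+)", re.IGNORECASE)


def _normalize(name: str) -> str:
    """Map aliases to one canonical name; keep unknown names as-is."""
    try:
        return codecs.lookup(name).name
    except LookupError:
        return name.lower()


def declared_encodings(content_type: str, markup: str) -> dict:
    """Collect the encoding declared in each of the three places."""
    found = {}
    for source, pattern, text in (
        ("http", HTTP_CHARSET, content_type),  # Content-Type header value
        ("xml", XML_ENCODING, markup),         # <?xml ... encoding="..."?>
        ("meta", META_CHARSET_TEXT, markup),   # (X)HTML meta tag
    ):
        match = pattern.search(text)
        if match:
            found[source] = _normalize(match.group(1))
    return found


def has_contradiction(content_type: str, markup: str) -> bool:
    """True if at least two declarations name different encodings."""
    return len(set(declared_encodings(content_type, markup).values())) > 1
```

For a page whose meta tag says `utf-8`, calling `has_contradiction("text/html; charset=ISO-8859-1", page)` would return `True`, which is exactly the situation where a generic server setting overrides the more specific declaration on the page.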