Frequently Asked Questions
If you have any questions about our service, this is a good place to look for answers.
How do you know which technologies are used by a site?
Primarily, we use information provided by the site itself when downloading web pages. In other words, we fetch web pages very much like a search engine, and analyze the results. Additionally, we use publicly available information from sources such as Alexa and Google.
How exactly does your website analyzer work?
We search for specific patterns in the web pages that identify the usage of technologies, similarly to the way a virus scanner searches for patterns in a file to identify viruses. We use a combination of regular expressions and DOM traversal for this search. We have identified several thousand indicators for technology usage. These indicators have different priorities, and based on the presence or absence of specific combinations of indicators in a specific context, we come to our conclusions.
These are examples of the information used by the indicators:
A lot of research was necessary to build the analyzer, and we keep improving it all the time. We want it to be the best possible website analyzer.
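To make the idea of indicators concrete, here is a minimal sketch in Python. It is not our actual rule set: the indicator list, the priority values, and the names REGEX_INDICATORS, GeneratorFinder, and detect are all hypothetical, and a simple tag-by-tag walk with the standard library's html.parser stands in for real DOM traversal.

```python
import re
from html.parser import HTMLParser

# Hypothetical indicators: (technology, priority, regex). Illustration only.
REGEX_INDICATORS = [
    ("WordPress", 2, re.compile(r"/wp-content/", re.I)),
    ("jQuery", 1, re.compile(r"jquery[.-][\w.]*js", re.I)),
]


class GeneratorFinder(HTMLParser):
    """Walks the tags and records <meta name="generator" content="...">."""

    def __init__(self):
        super().__init__()
        self.generators = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "generator" and attr.get("content"):
            self.generators.append(attr["content"])


def detect(html):
    """Return {technology: score}, adding up the priority of each indicator found."""
    scores = {}
    # Pattern search over the raw page source.
    for tech, priority, pattern in REGEX_INDICATORS:
        if pattern.search(html):
            scores[tech] = scores.get(tech, 0) + priority
    # Tag walk: an explicit generator tag gets the highest (assumed) priority.
    finder = GeneratorFinder()
    finder.feed(html)
    for content in finder.generators:
        tech = content.split()[0]
        scores[tech] = scores.get(tech, 0) + 3
    return scores


PAGE = """<html><head>
<meta name="generator" content="WordPress 6.0">
<script src="/wp-content/themes/x/jquery.min.js"></script>
</head><body></body></html>"""

result = detect(PAGE)
print(result)
```

In this toy version the scores are simply added up. A real analyzer would also have to weigh the absence of indicators and the context in which they appear, as described above.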
How accurate is your information?
It is impossible for this type of survey to be 100% accurate, since websites can choose to hide most of their technologies if they want to. See also our disclaimer for more information.
Some errors in technology identification are unavoidable. We try to balance the false positives and the false negatives (after eliminating as many as possible), and we try to make sure that the remaining errors do not cluster on one technology rather than another.
How often do you visit a site?
That depends on a number of factors, but it is approximately once a month. Some sites are visited less often.
Do you analyze only the home page or also inner pages?
That depends on what we already know about the site. Often it's only the home page, but in many cases we crawl deeper.
How often do you update the reports?
All our reports are updated daily. Although we don't analyze every site every day (see above), we continuously add new information to our database, and we want new trends to be visible as quickly as possible.
Which websites do you count? Do you crawl all the web?
For the surveys, we count the top 10 million websites according to Alexa; see our technology overview for more explanations. We do crawl more sites, but we use the Alexa top 10 million to select a representative sample of established sites. We found that including more sites in the sample (e.g. all the sites we know) may easily lead to a bias towards technologies typically used for "throw-away" sites, parked sites, or other types of spam domains.
Why do you use Alexa rankings? Alexa data are sometimes said to be inaccurate.
People report that Alexa rankings can be manipulated, but since we don't give any weight to the rank itself, we find that for our purpose of sampling, the Alexa list works very well.
In some of the market share reports, the figures don't add up to 100%. How come?
That is the case when websites use more than one of the technologies; for example, a website may use more than one server-side programming language. We could do the calculations differently, but then a usage of 50% would not necessarily mean that the technology is used by every second site, which we would find quite confusing.
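A small worked example shows the effect. The four sites and their languages below are made up for illustration; each percentage is the share of sites on which the language was found.

```python
from collections import Counter

# Hypothetical sample: the server-side languages detected on each site.
sites = {
    "a.example": {"PHP"},
    "b.example": {"PHP", "Java"},
    "c.example": {"Java"},
    "d.example": {"PHP", "Python"},
}

# Count each language once per site that uses it.
counts = Counter(lang for langs in sites.values() for lang in langs)

# Usage = share of all sites using the language, in percent.
usage = {lang: 100 * n / len(sites) for lang, n in counts.items()}
total = sum(usage.values())

for lang, pct in sorted(usage.items(), key=lambda item: -item[1]):
    print(f"{lang}: {pct:.0f}%")
print(f"Sum: {total:.0f}%")
```

PHP comes out at 75%, Java at 50%, and Python at 25%, which adds up to 150%. Each figure still reads naturally on its own: PHP is used by three out of four sites in this sample.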
Why are your figures sometimes very different to figures published somewhere else?
The biggest source of confusion comes from the fact that we measure technologies used for websites, whereas other surveys measure something else. For example, the well-known TIOBE Index measures the overall popularity of programming languages. C is more popular than PHP in that report, but C is very rarely used to build websites. Another example is DistroWatch, which measures the popularity of Linux distributions, but that includes the popularity of desktop installations. Therefore their ranking differs from ours.
Other figures published on the usage of web technologies are often based on different samples. For example, they may use very small random samples, or samples favoring specific geographical regions, or they may use only a small fraction of the web, say the top 10,000 sites, or they may include subdomains or even individual web pages in their counts, or they may even be based on polls of their website visitors. Even where there are no such differences in the measurement techniques, there are certainly still differences in the website analyzing methods. We know for sure that a lot of research has gone into developing our analyzing methods; we are not so sure about others.
What are these breakdown and segmentation reports in the navigation bar?
Breakdown and segmentation reports are very powerful analysis tools. You will probably have to play around a bit to explore all the possibilities and to find your way through the navigation to the reports you want. Use this as an example: if you want to know which web server technologies are used in Kyrgyzstan, navigate from the Technologies overview to the Top Level Domain report. Then scroll all the way down to .kg for Kyrgyzstan (or use Ctrl-F in your browser to find it quickly) and click on it. Next, click on Web Servers under the Segmentation menu, and you will see the report you wanted.
Please be aware that some technologies have a very low representation in the top 10 million sites. Breakdown and segmentation reports may have a high statistical variance in these cases; in other words, the figures may be unreliable. For instance, we know of only one site in the top 10 million that uses Neapolitan (Wikipedia). Don't expect any useful statistics from such a data set.
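To get a feeling for how unreliable a small segment is, the following sketch compares the uncertainty of the same 25% share measured on 4 sites and on 40,000 sites. It uses the Wilson score interval, a standard textbook method chosen here purely for illustration; it is not a description of how our reports are calculated, and the function name and sample sizes are made up.

```python
import math


def wilson_interval(hits, n, z=1.96):
    """Approximate 95% Wilson score interval for a share of hits out of n sites."""
    p = hits / n
    denom = 1 + z * z / n
    center = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return center - half, center + half


# Same observed share (25%), very different segment sizes (hypothetical numbers).
lo_small, hi_small = wilson_interval(1, 4)
lo_large, hi_large = wilson_interval(10_000, 40_000)

print(f"4 sites:      {100 * lo_small:.1f}% to {100 * hi_small:.1f}%")
print(f"40,000 sites: {100 * lo_large:.1f}% to {100 * hi_large:.1f}%")
```

With only a handful of sites, the interval covers more than half of the possible range, so the reported percentage says very little. With tens of thousands of sites, it narrows to less than one percentage point.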
Any other questions?