The Definitive Guide to Alexa
— May 5, 2017
Founded in 1996 and acquired by Amazon in 1999, Alexa.com was the first free traffic measurement service. It also delivers a variety of other traffic analytics.
This 5-part series corrects widespread misinformation about Alexa. It shows you 6 important ways to use its most important metric, traffic rankings.
It explains why the conclusions from these 6 daily-use strategies are not only safe to act upon (contrary to the opinion of the vast majority of Alexa reviews) but also valuable in your day-to-day work.
In addition to traffic, other Alexa metrics provide insight into content quality. The combination of traffic insights (“quantity”) and content quality provides an excellent snapshot of sites other than your own. You can also derive useful deductions about who is growing or fading in your niche. All in all, that adds up to a mighty useful, free tool. That said…
Our analysis and conclusions run counter to most of the literature about Alexa.com. So we’ll take a great deal of room to explain and prove, in detail, why Alexa is so useful and reliable. We have attempted to create the ultimate resource in order to…
- reverse the collective impression left by false negative reviews
- show you how to use these tools to derive actionable conclusions.
Why does a mountain of misinformation about Alexa exist? Partly because it is the oldest and most widely used tool of its kind. In Alexa’s case, reviewers correctly mentioned problems, but missed the fact that its early problems did not invalidate its usefulness.
Subsequent reviews were like an echo chamber. The result? Alexa is the most widely misunderstood tool of significant importance. Using it properly gives you an edge over those who shun it for all the wrong reasons.
We only realized how widely and deeply held Alexa myths have become when we set out to update our original Alexa article. We first wrote it in 2005, making a few edits over the years. The essence, though, has remained intact (including our gorgeous 2005 page design!).
Given the proliferation of negative articles, we decided that this topic (and Alexa itself) deserved a comprehensive review. Rather than update our original page, these comprehensive articles start from scratch, working from the ground up, including coverage of the best two complementary tools, SimilarWeb (“SW”) and SEMrush.
There is no subjective opinion here. The content is based on working with Alexa (vs. actual site traffic stats) for thousands of sites over more than a decade. You will also find an original study that shows how measured traffic correlates with Alexa (and how solopreneurs do, overall).
We hope this comprehensive, data-supported review helps set the Alexa record straight. If you develop the habit of using it (and its 2 “cousins”) correctly, it’s only a slight exaggeration to say that it’s “indispensable.” That’s only true, though, if you understand how and when to use it, when to combine results from SW and SEMrush… and when not to use it at all.
Alexa’s results are derived from a sampling of tens of millions of Internet users. Every sampling method has its strengths and weaknesses. The biggest example of that…
TV networks spend billions of dollars based upon reports from the Nielsen survey, whose results are based on the TV habits of a mere 40,000 households. They do so even though the survey is riddled with flaws, not least its small sample size, because they understand the issues and can still extract a great deal of valuable information.
Serious TV executives recognize the flaws and understand which data becomes invaluable, actionable information. So it is with Alexa (except that billions of dollars are not at risk).
The myth that “Alexa is useless” or “too inaccurate to be useful” starts with how Alexa originally worked. Its data came from tracking users who implemented its toolbar. This led to biased results among niches whose users were more likely to install the toolbar (e.g., tech-savvy users, Internet marketers, etc.).
Alexa has worked in two ways to correct that bias…
- It improved its algorithm to reduce the gap (several updates over the years)
- It widened and diversified its base of users, going from 1 toolbar to over 25,000 extensions, toolbars and plugins, and then to billions of data points from a variety of sources (more on this in a moment)
Today, its methodology reflects a wide, representative sampling of online traffic.
Despite these improvements, the myths continue to be repeated in new reviews (more in iUrban Myths below).
To use Alexa, go to Alexa.com and enter any domain name.
Alexa returns where that domain name ranks out of the ~30,000,000 sites that it tracks.
For example, as of the date of this report (April 2017), SiteSell.com is the 23,357th most visited site in the world (and 12,021st in the U.S.).
Google’s rank is 1 — the most visited site. 30,000,000 is the worst numeric score possible.
Beyond that, you’ll get a “not enough data” message, which means that there’s no record of traffic to that site from any of the data points in Alexa’s panel.
Important: Alexa delivers your position relative to other sites. It does not deliver an absolute visitor count in its free version. And that’s fine — relative ranking is all that you really need for the common purposes (covered below) that deliver most of its value.
While Alexa is usually useful on its own, you get a more reliable fix on a site’s popularity when used together with the other two free traffic measurement tools (more on that below).
Misinformation You Should Un-Know
Alexa has often been maligned, originally due to its reliance on its browser-based toolbar. Relying on one source biased the results in favor of websites whose visitors were more likely than average to use the toolbar.
For example, websites about Internet Marketing (hereafter simply referred to as Internet Marketing sites) had visitors who were more likely to use the toolbar. This caused those sites to receive better rankings (i.e., lower Alexa numbers; the closer to 1, the higher your traffic), since toolbar-using marketers and their colleagues were the people most likely to visit them.
In 2008, Alexa/Amazon announced a major overhaul, pulling in more data from a wider and diversified base of users of 25,000 different browser extensions, browser toolbars and plugins.
This corrected much of the previously described bias, as evidenced by immediate changes in the ranking of many sites. Quoting Alexa (full post here)…
Alexa’s traffic data is based on a global panel of people which is a sample of all internet users.
The panel consists of millions of people using toolbars and browser add-ons created by over 25,000 different publishers, including Alexa and Amazon.
The data also includes all traffic to sites which have installed the Alexa Certify code, regardless of whether visitors are using a toolbar or extension.
Since then, Alexa’s data panel has continued to shift and improve. Many of the original 25,000 extensions have been retired, resulting in a 96% decrease in reliance on such data sources.
In talking to Alexa, we found that they now “source data from a large number of 3rd party providers, representing a significant sample of the Internet browsing population.” Alexa goes on to say that they “also have thousands of sites with Alexa’s Certified Pixel on them, allowing us to directly measure a broad spectrum of internet sites.”
They compare the data from a variety of sources in order to detect, normalize, and correct for bias that exists in any individual source. Then, they compare that data to actual metrics from the Certified sites, allowing them to more precisely estimate traffic and engagement.
Site engagement is now also a contributing factor to ranking.
Alexa also crawls the Web and maintains archives of previous crawls. Since 1999, it has been the source of data for the Wayback Machine, a nonprofit (with the same founders) that enables searchers to see archived versions of web pages across time. Alexa continues to supply its crawls to the Internet Archive.
Alexa has also decreased sampling bias algorithmically. More on how Alexa works and derives its traffic reports…
- How are Alexa’s traffic rankings determined? (from Alexa)
- Detailed analysis of Alexa — our own original research here.
This report also covers two other useful, free tools that are similar to Alexa, SimilarWeb and SEMrush.com…
Where Do SimilarWeb and SEMrush Get Their Data From?
SEMrush data is based on the first 100 Google search results found for each keyword in its database. It delivers a good idea of a site’s traffic from searches.
SimilarWeb gathers its data from its crawl of the Internet, internet service providers (ISPs), and from click-stream data from tens of millions of diversified users who have installed their apps across many devices. The ISP data is additional diversification, further reducing risk of bias.
There are no reported issues of significant data bias in these two tools. Because all three tools get their data from different sources (some overlap between Alexa and SW is likely), they are especially useful together when you want higher accuracy.
Despite the improvements to Alexa’s functionality in 2008 and ongoing algorithmic developments, social influence perpetuated out-of-date conclusions. The result?
iUrban Myths: the term we’ll use for the online equivalent of offline urban myths.
Most recent Alexa reviews quote no hard, original research. Most aggregate information from earlier reviews (which were based on earlier ones, which were based on… well, you see where this is going).
Naturally, they come to the same conclusions. Alexa has had a hard time shaking the 1990s and 2000s, despite the fact that the original issues were corrected in 2008. Most reviews don’t even mention the diversification of data sources or the use of site engagement.
Even in the pre-2008 years of sampling bias, the first round of Alexa reviews missed some key points. There were valid uses, despite the bias.
Why do we keep finding inaccurate reviews?
Google ranks “authoritative” articles highly. It depends on visitor-generated signals to determine “authority” (e.g., inbound links and hundreds of other off-page/off-site signals). That only works until humans fail to recognize that a report is wrong or out-of-date. So the older, cited articles continue to rank well, together with some well-received new ones.
Google continues the myths because it keeps ranking cited, but faulty, pages highly. Take this 2006 post by Google-famous Matt Cutts. It says little that was not already known, and pre-2008, the bias had yet to be resolved. However…
- this kind of drip-drip-drip repetition later proved hard to reverse
- everyone has become so habituated to the negative that they fail to mention the positive.
Take a look at the graph from Cutts. He took 2 lessons from the graph…
- “TV advertising isn’t jolting Ask’s traffic. The biggest spike was when they dropped Jeeves at the end of February.”
- “There is some serious webmaster skew in the Alexa data. There is no way that I have 1/4th the daily reach of Ask.”
He correctly concludes that his site “gets a little boost because tons of SEOs install the Alexa toolbar.” But that second lesson was already known. He could simply have accepted it and remarked on the value of what he learned about Ask:
- Traffic spiked when Ask dropped Jeeves. But it also came back down to the baseline in the 2 months that followed.
- TV advertising made no difference.
For someone as interested in search engines as Matt Cutts, that’s useful information.
He can also see that his own traffic had an appreciable jump in mid-April that sustained to mid-May (end of the review period). That’s not useful information for him (since he would use Google Analytics, not Alexa, to get exact numbers for his own site — more on that later). But it IS of value for anyone else who has an interest in the reach of Cutts’ site.
Bottom line: This is an excellent example of social influence. He has been pre-conditioned to report the negative. The value of tracking trend lines for both Ask.com and Cutts is ignored.
Alexa no longer provides “Reach.” However, you could use “Traffic Rankings” to arrive at all the same conclusions. The point here is how even the knowledgeable can miss what is important, blinded by social influence from seeing the truth.
Perhaps the most dramatic myth comes from those who notice that Alexa itself says its results are not reliable for Alexa rankings beyond 100,000 (bottom of this page):
“We do not receive enough data from our sources to make rankings beyond 100,000 statistically meaningful.”
100,000?! That is so much traffic (in 2017) that it’s unattainable for most solopreneurs. Many reviews have concluded, therefore, that virtually every Alexa traffic ranking is meaningless.
We knew that this could not be correct, though, based on our own experience of correlating many thousands of SBI!-built sites’ actual traffic vs. Alexa.
We wondered why Alexa would be so ridiculously conservative about its reliability. At first we reasoned that they were just being very “mathematical,” speaking about the accuracy of any single measurement and applying a strict level of statistical testing.
Techies tend to do that! It drives marketers crazy!
Even then, based on our experience of working this data, it was way too conservative.
Then the answer hit us. They had not updated that note — ever. Sure enough, you will see the same notice about 100,000 from…
We do not have an earlier version of the page and therefore can’t comment on when this notice first appeared. So we’ll use 2003. Why is their statement way over-conservative for 2017?
As you can see from this Netcraft Web survey, there were only 18,000,000 sites at the time. That is roughly 1/10th of the 174,000,000 active sites today!
Scale the 100,000 cutoff by that growth and you get roughly 1,000,000, which matches our experience that rankings are reliable out to about 1,000,000 in 2017. Alexa was quoting a figure based on 2003!
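The scaling argument can be checked with a few lines of arithmetic. This is only a rough sketch: it assumes (our assumption, not something Alexa states) that the "statistically meaningful" cutoff grows in proportion to the number of sites on the Web. The site counts are the Netcraft figures quoted above; the function name is ours.

```python
# Rough proportional scaling of Alexa's "statistically meaningful" cutoff.
# Assumption (ours, for illustration only): the cutoff grows in step with
# the total number of sites on the Web.

SITES_2003 = 18_000_000     # Netcraft figure quoted above
SITES_2017 = 174_000_000    # active sites quoted above
CUTOFF_2003 = 100_000       # Alexa's stated reliability cutoff


def scaled_cutoff(old_cutoff: int, old_sites: int, new_sites: int) -> int:
    """Scale a rank cutoff by the growth in the number of sites."""
    return round(old_cutoff * new_sites / old_sites)


growth = SITES_2017 / SITES_2003
cutoff_2017 = scaled_cutoff(CUTOFF_2003, SITES_2003, SITES_2017)

print(f"Web growth factor: {growth:.1f}x")
print(f"Scaled cutoff for 2017: about {cutoff_2017:,}")
```

Under that assumption the cutoff lands just under one million, in the same ballpark as the 1,000,000 figure from our own experience.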
However, there’s a bigger point.
Reviewers seem pre-conditioned to accept the negative. No one dug to find out why Alexa was only comfortable with a level that’s rarely attained. They repeat the reviews that preceded them, which do the same, and so on.
No wonder this iUrban Myth is so resistant to correction. 12 years of mostly negative reviews carved the myths in stone. Alexa’s reputation has suffered from this unusual combination of circumstances.
What’s the most damaging myth that comes out of all this?
Alexa is USELESS because it only pulls data from a single toolbar, resulting in a bias for those niches with a higher-than-normal rate of its use.
Why is it “damaging”? Because of what you miss out on if you believe the articles.
OK, back to the main topic.
“Myth and Miss” Continues…
To recap: the pre-2008 articles and posts that originally focused on the issue of inter-niche bias started the Alexa iUrban Myth. Oft-repeated since then, it persists despite the changes in 2008 and in the years since.
“In a nutshell, this means that Alexa’s data may not be spot on, but it does offer some valuable insights into any website you might need to research without having access to their analytics.”
They hit the key points…
- “may not be spot-on” — good ballpark estimates are all you need
- “valuable insights” — bingo!
- it’s for “any website you might need to research.” — i.e., not your own
Despite the occasional voice of clarity from careful thinkers, the majority of new reviews somehow continue the myth and miss the point.
Even before 2008, Alexa was useful for:
- comparing 2 or more sites from the same niche.
- checking to see if a site’s traffic is rising or falling.
- evaluating a site’s commercial potential (e.g., ads, joint ventures, etc.)
- getting a “quick fix” on any site for any minor reason (e.g., curiosity)
- seeking potential influencers to build relationships
- finding super-affiliates
For the latter two purposes, Alexa is “only” a good starting point. If its ranking is high enough to consider more seriously, supplement with SimilarWeb and SEMrush for greater certainty.
Few, unfortunately, understand how valuable Alexa really is.
Those who reach solopreneurs owe it to them to stop the “myth and miss.” The impact of the myths is most visible in the misinformed comments on posts about Alexa.
For example, read the comments to the Kissmetrics post. You will see that the level of misinformation is high and repeated, even though the commenters have just read the post. A few of them do sound convincing, but are inevitably flawed (e.g., anecdotal, undocumented claims, etc.).
It’s hard to understand how so few experts want to share the valuable utility of this tool with their readers. Here’s one example of a smart blogger…
This review (2017) gave an example of 3 high-traffic websites. It then explained that 1 of the 3 Alexa rankings was off (significantly) compared to the other 2 because it was the only 1 (of the 3) in a niche that was not marketing-savvy (i.e., the other 2 were). It goes on to conclude that:
- SimilarWeb.com “is more accurate” than Alexa based on that 3-site sample, as well as a second sample of 4 of his own sites (unnamed, and not all of the sites he owns).
- Alexa is “wildly inaccurate” because of the toolbar…“The fact that some sites have accurate data, and others have wildly inaccurate data just makes it near impossible to make decisions using Alexa Traffic Rank as a metric.”
- He would not use Alexa or SimilarWeb to make any serious decisions because they are just estimates.
However, the review fails to mention that…
- SimilarWeb gave numbers similar to Alexa’s. They were higher than Alexa’s for all 3 sites, but all were still within the top 50,000. And the sites fell in the same ranking order as in Alexa (including country breakdown).
That confirms the ballpark, which is useful info. And the fact that SimilarWeb, which uses ISPs and other data sources, varied in the same way that Alexa did, raises very different questions that were not addressed. What else could be causing SW and Alexa to act in the same way, given that SW does not use Alexa’s toolbar?
- It does not report how the 2 “Alexa-biased” domain names compared to each other (I expect that we’d see a better correlation if the niches were more similar than with the third site).
- Alexa has diversified its app sources greatly — it has not been dependent on one toolbar since 2008. That bias was not proven in the article — contradictory data was ignored.
But readers will remember one line:
“The fact that some sites have accurate data, and others have wildly inaccurate data just makes it near impossible to make decisions using Alexa Traffic Rank as a metric.”
It’s wrong. He didn’t prove it. Remember, the sample size was only 3. Supplementary documentation for the 3 sites (e.g., Google Analytics) would have been nice.
And the other test of 4 sites must be ignored since they are not named. Also, they were chosen from all the sites that he owns (introducing possible selection bias).
Once again this review, like so many others, failed to address the important issues correctly, and arrived at the wrong conclusion.
He did show that Alexa was useful to get a site into a certain ballpark of data and that SW can confirm and add some precision. But he did not mention that.
And once again, the comments were misinformed. They showed a failure by many to understand points as basic as this… ONLY use Alexa and SimilarWeb (and SEMrush) to check the traffic of OTHER sites. Comparing Google Analytics with Alexa (or SW/SEMrush) is apples to oranges.
The author seems to know his stuff, but fails to draw conclusions about the valuable, useful ways to use Alexa and its “cousins.” Nothing in that post invalidates those uses.
I do, of course, agree with its conclusion not to make any “serious decision” — depending on the definition of “serious.” We can agree that we would not buy a website based on these numbers! But that’s not the point of handy, daily-use tools like Alexa.
It’s posts such as that one, published in 2017, that perpetuate the Alexa “Myth and Miss” phenomenon. Folks seem to start with a mindset that they’ll add a new thought to the widely accepted negative view of Alexa. They ignore the data and conclude what they already anticipated.
In other words, social influence keeps this ball rolling.
Anecdote vs. Data
We’ve studied more “actual site traffic vs. Alexa” than any company (except for the obvious BigCo’s). This article is an example. Our report shows, among other things, that…
- there is a definite correlation of traffic with Alexa’s rankings.
- the higher the traffic, the more reliable are the rankings.
- there is increasing “scatter” (unreliability) as traffic decreases.
Scatter tightens as Alexa rank numbers fall (that is, as traffic rises) from 2,000,000 to 1,000,000 to 500,000 and so forth. Your site could jump from 10,000,000 to 5,000,000 just by adding a few visits per day. But it would need tens of thousands of additional visitors per day to rise from 100,000 to 50,000. The same random deviation therefore creates far greater ranking error at high Alexa numbers.
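A toy model makes the direction of this effect easy to see. The sketch below assumes that daily visits fall off with rank as a simple power law (Zipf-like). That is our illustrative assumption, not Alexa's published method. `SCALE` and `ALPHA` are hypothetical constants, so the absolute numbers it prints are not real Alexa figures. Only the pattern matters: the worse the rank, the fewer visits it takes to move it, and the more a tiny measurement error shifts it.

```python
# Toy model: daily visits fall off with rank as a power law (Zipf-like).
# SCALE and ALPHA are hypothetical, chosen only for illustration.

SCALE = 50_000_000   # hypothetical visits/day for the rank-1 site
ALPHA = 1.0          # hypothetical power-law exponent


def visits_at_rank(rank: float) -> float:
    """Estimated daily visits for a site at the given rank."""
    return SCALE / rank ** ALPHA


def rank_for_visits(visits: float) -> float:
    """Inverse of visits_at_rank: the rank implied by a daily visit count."""
    return (SCALE / visits) ** (1 / ALPHA)


def extra_visits_to_halve_rank(rank: float) -> float:
    """Additional daily visits needed to move from `rank` to `rank / 2`."""
    return visits_at_rank(rank / 2) - visits_at_rank(rank)


def rank_swing(rank: float, noise: float) -> float:
    """Relative change in rank if `noise` visits/day go uncounted."""
    observed = visits_at_rank(rank) - noise
    return rank_for_visits(observed) / rank - 1


for rank in (10_000_000, 1_000_000, 100_000):
    extra = extra_visits_to_halve_rank(rank)
    swing = rank_swing(rank, noise=2)
    print(f"rank {rank:>10,}: +{extra:,.0f} visits/day to halve the rank; "
          f"2 missed visits shift the rank by {swing:.1%}")
```

Changing `ALPHA` or `SCALE` changes the magnitudes but not the conclusion, which is why single measurements at multi-million ranks should be read as ballparks only.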
So what? Think “BALLPARK.” Anyone who understands Alexa will not be saying “WOW” about a site with an Alexa ranking of either 5 million or 10 million. Both are in the “Poor” ballpark — we’ll cover the 7 ballparks in Part 2.
Bottom line on bias? The issue of bias existed with Alexa prior to 2008. Even in those early years, though, Alexa was still valuable for the points outlined earlier. Knowing about the bias in favor of Internet marketers allowed users to discount single measurements accordingly. It was still useful to compare two sites in the same niche (whether the niche was Anguilla or SEO). And if an Internet Marketer bragged about his/her site’s Alexa rank, you would simply discount “the Alexa bias bonus.”
Since 2008, this bias has been largely corrected. Only Alexa can say whether it has been eliminated 100%, but it has been reduced both algorithmically and by diversifying Alexa’s data sources.
Indirect methods use a large sampling of Internet users. This is the only way to get reasonable estimates of sites that you do not control. Sampling methods introduce bias, and the sampled data requires extrapolations that approximate reality.
It’s important to understand that all indirect methods will be approximations and will have some type of bias based on sampling techniques. Once you understand that, you can deal with it.
No indirect method for traffic determination will ever be as exact as the direct tools such as Google Analytics, or log file analyzers. However, the problem with GA is that you can’t get that data for other people’s sites. The data is private to the site’s owner.
Those who reach solopreneurs need to get these points out to the typical marketer. Based on the low quality of comments to Alexa posts, the literature on Alexa does solopreneurs a disservice.
I encourage colleagues who advise solopreneurs to adjust their advice about how to use these tools. Alexa, SimilarWeb and SEMrush doubtless all have some bias, but that does not invalidate them for certain uses. A simple approach to using 1, 2 or all 3 (depending on the purpose) can provide the info solopreneurs need for useful insights (see the 6 uses above).
That’s the whole point that almost everyone seems to miss. Alexa, SimilarWeb and SEMrush are valuable tools that provide “good enough” traffic estimates for any site.
You can’t use GA to get that data. There are paid tools that provide better data (e.g., Hitwise, Quantcast, comScore), but they cost more than solopreneurs can afford. And they wouldn’t need those tools at all if they knew how to use Alexa, SW and SEMrush.
Alexa and SimilarWeb provide other useful metrics. Especially useful are the following, added to Alexa in 2009:
Bounce, Time on Site, and Pages per Visit give you an idea of how well visitors receive a site. The lower the bounce, the longer the time on site, and the greater the number of pages per visit, the better the site (these are all measures of sticking around to consume more of what you offer).
That means that Alexa has another valuable use…
Compare these metrics for 2 sites in the same niche (e.g., your site and a competitor). You now get a good snapshot of relative quality of content, as well as quantity of traffic.
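If you jot these three numbers down for two sites, the comparison can be made mechanical. The following is a minimal sketch, not a feature of Alexa or SimilarWeb: the `Engagement` class, the `compare` function and all the figures are hypothetical, and you would fill in the values by reading them off the tool by hand.

```python
from dataclasses import dataclass


@dataclass
class Engagement:
    """Engagement figures read off a traffic tool (values below are made up)."""
    bounce_rate: float       # percent of single-page visits; lower is better
    time_on_site: float      # seconds per visit; higher is better
    pages_per_visit: float   # higher is better


def compare(a: Engagement, b: Engagement) -> dict:
    """Return, per metric, which site looks better: 'a', 'b', or 'tie'."""
    def pick(x: float, y: float, lower_is_better: bool = False) -> str:
        if x == y:
            return "tie"
        a_wins = x < y if lower_is_better else x > y
        return "a" if a_wins else "b"

    return {
        "bounce_rate": pick(a.bounce_rate, b.bounce_rate, lower_is_better=True),
        "time_on_site": pick(a.time_on_site, b.time_on_site),
        "pages_per_visit": pick(a.pages_per_visit, b.pages_per_visit),
    }


# Hypothetical numbers for two sites in the same niche.
your_site = Engagement(bounce_rate=48.0, time_on_site=185.0, pages_per_visit=2.9)
competitor = Engagement(bounce_rate=61.0, time_on_site=140.0, pages_per_visit=3.1)

print(compare(your_site, competitor))
```

Because these are indirect estimates, treat a narrow win on any one metric as a tie. Only consistent, sizable gaps across the three say much about relative content quality.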
Other metrics are also available (% traffic from search, country breakdown, top keywords, clickflow, pagespeed, demographics), but they are beyond the scope of this article.
Remember, these are all indirect, so are good for ballpark estimates when comparing within a niche. Use Google Analytics to get this data for your own site.
From now on, we’ll focus on the most basic and useful metric of all, traffic.
Speaking of traffic and ballparks, we arrive at the next topic… see you in the next article.