
Why is my website page view count higher on section.io than on Google Analytics?


#1

Non-JS Browsers Don’t Count in GA
GA can only count sessions for browsers that are running JavaScript (JS). Not all of your users’ browsers have JS turned on, and some clients, specifically bots, will not run it at all.

While real users browsing with JS turned off may account for as little as 1% of your traffic, the volume of bots visiting your site may be very large. Crawlers such as Googlebot and Bingbot generally do not execute the GA script, but they do download your website’s pages.

It’s All About the Bots
The proportion of bot traffic on your website is a function of the number of pages your site presents to the bots for crawling versus the volume of normal traffic. For sites with real users in the hundreds of thousands per month and many pages for a bot to crawl (such as ecommerce product pages), bots may account for 50-75% or more of the traffic on your website. This can be a shocking statistic to see for the first time.

For example, here are the top 20 user agents browsing a particular website that serves circa 60,000 pages a day (including bot traffic). The obvious bots have been noted. The top 2 user agents and 6 of the top 10 are bots.
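If you want to produce this kind of ranking from your own logs, here is a minimal Python sketch. It assumes you have already extracted the User-Agent string of each request into a list. The marker substrings, the function names and the sample data are all made up for illustration; they are not taken from section.io or from the site above.

```python
from collections import Counter

# Hypothetical substrings that suggest a crawler; extend this for your own traffic.
BOT_MARKERS = ("bot", "crawler", "spider", "slurp")


def is_probable_bot(user_agent: str) -> bool:
    """Rough guess: treat a user agent as a bot if it contains a known marker."""
    ua = user_agent.lower()
    return any(marker in ua for marker in BOT_MARKERS)


def top_user_agents(user_agents, n=20):
    """Return the n most frequent user agents as (user_agent, hits, is_bot) tuples."""
    counts = Counter(user_agents)
    return [(ua, hits, is_probable_bot(ua)) for ua, hits in counts.most_common(n)]


def bot_share(user_agents) -> float:
    """Fraction of requests whose user agent looks like a bot."""
    agents = list(user_agents)
    if not agents:
        return 0.0
    return sum(is_probable_bot(ua) for ua in agents) / len(agents)


# Made-up sample: one entry per request, as it might be pulled from an access log.
GOOGLEBOT = "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
BINGBOT = "Mozilla/5.0 (compatible; bingbot/2.0; +http://www.bing.com/bingbot.htm)"
CHROME = ("Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
          "(KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36")
sample = [GOOGLEBOT, GOOGLEBOT, GOOGLEBOT, BINGBOT, BINGBOT, CHROME]

for ua, hits, is_bot in top_user_agents(sample):
    print(hits, "bot" if is_bot else "user", ua)
print(f"bot share: {bot_share(sample):.0%}")
```

Substring matching only catches bots that identify themselves, so treat the result as a lower bound on bot traffic.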

Sites with larger volumes of normal traffic (over 2 million pages per month) do not usually see bots as a significant proportion of traffic, except where the catalog of pages presented to the bots is unusually large.

Serve Bots Faster
Serving bots faster, more reliable pages is a good thing. While you may choose to reduce the rate at which the bots crawl your website, you cannot change the frequency (see, for example, Google Support). You could also mark pages on your site so that the bots do not crawl them. However, serving faster responses to search bots (for all content types) benefits the site’s SEO ranking. See Google’s reference to this.
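One common way to mark pages as off-limits to crawlers is a robots.txt file; that this is the mechanism meant above is my assumption. Below is a small sketch that uses Python’s standard urllib.robotparser to check what a given set of rules would block. The rules and the example.com URLs are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules: keep crawlers out of cart and search pages.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cart/
Disallow: /search
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without fetching anything

blocked = parser.can_fetch("Googlebot", "https://www.example.com/cart/checkout")
allowed = parser.can_fetch("Googlebot", "https://www.example.com/products/widget")
print("cart page crawlable:", blocked)
print("product page crawlable:", allowed)
```

Checking rules this way before deploying them helps avoid accidentally hiding pages, such as product pages, that you do want indexed.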

Why Is a Bot Page View the Same as a Normal User’s?
Bots make your site serve pages in the same way as a real browser. They request the HTML and the assets associated with it just as a real user does, so from the perspective of your web server and section.io, a bot requires the same level of service as a real user.


#2

Here is an excellent article discussing the use of headers to manage the bot traffic on your site.

TLDR

  1. Use cache expiry times on static resources
  2. Use conditional GETs to serve HTTP 304 rather than 200 so that:

"The next time the document is requested, Googlebot or Bingbot will add a If-Modified-Since: header to the request that contains the Last-Modified date that it received. (In the examples below, I’m using curl and the -H option to send these HTTP headers.)

If the document hasn’t been modified since the If-Modified-Since date, then the server will return a 304 Page Not Modified response code and no document. The client, whether it is Googlebot, Bingbot, or a browser, will use the version that it requested previously."
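To make the two TLDR points concrete, here is a rough Python sketch of the server-side decision. It is not the code from the linked article. The function name conditional_response and the max_age default are assumptions for illustration, and last_modified is assumed to be a timezone-aware UTC datetime. Note that the standard reason phrase for 304 is "Not Modified".

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_response(last_modified, if_modified_since=None, max_age=3600):
    """Return (status, headers) for a resource last changed at last_modified.

    last_modified must be a timezone-aware UTC datetime (an assumption of this sketch).
    """
    # HTTP dates have one-second resolution, so drop microseconds before comparing.
    last_modified = last_modified.replace(microsecond=0)
    headers = {
        "Last-Modified": format_datetime(last_modified, usegmt=True),
        "Cache-Control": f"max-age={max_age}",  # cache expiry time (TLDR point 1)
    }
    if if_modified_since:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            since = None  # unparseable header: fall through to a full response
        if since is not None and since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        if since is not None and last_modified <= since:
            return 304, headers  # not modified: send headers only, no body
    return 200, headers  # send the full document


changed_at = datetime(2015, 10, 21, 7, 28, 0, tzinfo=timezone.utc)
first_status, first_headers = conditional_response(changed_at)
# The crawler echoes Last-Modified back as If-Modified-Since on its next visit.
second_status, _ = conditional_response(changed_at, first_headers["Last-Modified"])
print(first_status, second_status)
```

In practice most web servers and frameworks already handle this for static files; a sketch like this is mainly relevant for dynamically generated pages.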