So who (or what) is visiting your website?

Update: I have updated this post (see the bottom) with the results of what happened once I shortened the post's URL and posted it to Twitter.

A while back, I submitted one of my posts to Reddit, StumbleUpon and a few other social content websites. I then checked my Google Analytics account the next day and was pretty pleased when it said I'd had 70-odd visits from Reddit in less than a day. However, it also said that the average visit time was 0 seconds. This didn't seem like the behaviour of real people, so I wanted to analyse what had happened, and in general I was interested to see what else might be visiting my website that Google Analytics was not going to tell me about.

I probably could have come up with some kind of logging system myself, but a quick hop onto Google showed me exactly what I was after (with a lot less effort). Check out the visitor logging tutorial I followed if you are interested (beware, that link does not seem to be working at the moment). I made two modifications. The main one was that the example worked off text files, so I changed it to work from a MySQL database. I also made the code create a log entry only for the first page visited by each visitor to the site, as I did not want to fill the table with thousands of entries (yeah, like my website is that popular lol).
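
To give a rough idea of the "first page only" logging, here is a minimal sketch in Python. It is not the tutorial's code or the code running on this site. It makes several assumptions: SQLite stands in for MySQL so the example is self-contained, a visitor is identified by IP address plus user agent, and the table name `visitor_log`, its columns, and the function names are all made up for illustration.

```python
import sqlite3
from datetime import datetime, timezone


def open_log(path=":memory:"):
    """Open (or create) the visitor log. SQLite stands in for MySQL here."""
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS visitor_log (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               visited_at TEXT NOT NULL,
               ip TEXT NOT NULL,
               user_agent TEXT NOT NULL,
               page TEXT NOT NULL
           )"""
    )
    return conn


def log_first_visit(conn, seen_visitors, ip, user_agent, page):
    """Write a row only for the first page a visitor requests.

    `seen_visitors` is a plain set standing in for a session store
    (an assumption; a real site would more likely use a session cookie).
    Returns True if a row was written, False if the visitor was already seen.
    """
    key = (ip, user_agent)
    if key in seen_visitors:
        return False
    seen_visitors.add(key)
    conn.execute(
        "INSERT INTO visitor_log (visited_at, ip, user_agent, page) "
        "VALUES (?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, user_agent, page),
    )
    conn.commit()
    return True
```

If the "already seen" check relies on a session cookie instead, a crawler that does not keep cookies may well be logged on every request, which could be one reason a table like this fills up faster than expected.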

I implemented the logging on the 10th of September, so 20 days ago at the time of posting. According to Google Analytics, in those 20 days I have had 97 visitors who have looked at 212 pages. In my logging table in the database, there are nearly 2200 entries. Bearing in mind that I have tried to limit the entries in the table as mentioned above (although I do not know if that does anything against Googlebot and similar web crawlers), it goes to show that far more visitors, human and otherwise, can be visiting your site than Google Analytics reports.

Visitors

So, what kind of visitors did I see? First off, there were what seemed to be human visitors (browsers ranging from IE to Chrome to Firefox to even a couple of hits from Opera), as one would expect and hope for. I even had a couple of visits from users of the Lynx browser (which is a text-only browser, if you have not heard of it). Then there were some services I would have expected, such as the W3C crawlers from when I used the validators to check my XHTML and CSS were up to scratch, or the "bitlybot" from the bit.ly URL shortening service. On the crawler front, everyone's friend Googlebot made quite a few appearances, along with msnbot, but aside from that there were some surprises. The site seemed to be quite popular with old search favourite Ask Jeeves, and there were a few visits from the Baidu crawler (very interesting considering it's a Chinese-language search engine)! The "artViper thumbnail spider", various Twitter-related ones such as "twitmatic", the linkaider service which checks for broken links, and the MLBot from www.metadatalabs.com are just a few of the other visitors that I would not have had a clue about had I not implemented this logging.
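
Sorting log entries into groups like these can be done with a simple substring match on the user agent string. The sketch below is only an illustration, not what I actually ran: the token list is based on the names mentioned above, the exact strings real crawlers send may differ, and the function names are invented for the example.

```python
from collections import Counter

# Illustrative tokens based on the crawlers named in the post; the exact
# user agent strings these services send are an assumption.
KNOWN_BOT_TOKENS = (
    "googlebot",
    "msnbot",
    "baiduspider",
    "bitlybot",
    "w3c_validator",
    "ask jeeves",
    "mlbot",
)

# Generic hints that catch crawlers not listed above.
GENERIC_BOT_HINTS = ("bot", "crawler", "spider")


def classify(user_agent):
    """Return a rough label for a user agent string."""
    ua = user_agent.lower()
    for token in KNOWN_BOT_TOKENS:
        if token in ua:
            return token
    if any(hint in ua for hint in GENERIC_BOT_HINTS):
        return "other bot"
    return "probably human"


def summarise(user_agents):
    """Count how many log entries fall under each label."""
    return Counter(classify(ua) for ua in user_agents)
```

Anything that ends up as "probably human" is only a guess, since a crawler can send whatever user agent it likes.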

One final thing that was interesting is that before I switched to "SEO friendly URLs", I used the old system of blog.php?id=7, or whatever the post ID number was. Many entries in the log show crawlers or visitors attempting to access posts on the site by this kind of URL, even though, as far as I am aware, there are no links to URLs of this kind left on the site. I do not know if this is because these URLs are cached in Google, or for other reasons. If anyone can help me with that question, please comment or get in touch. Thanks for reading, and it just goes to show you never know who your online visitors are!

Update: Yesterday morning I submitted this post to Twitter through a shortened bit.ly URL. The post got a bit of popularity on Twitter, as it was retweeted 5 times (although bit.ly only counted 4, Twitter showed 5) and apparently got 39 clicks, according to the bit.ly stats. An examination of the log table shows that (unsurprisingly) the bitlybot was the first visitor to the site, followed by crawlers from twiturly, MetaURI API, radian6_linkcheck, R6_Feedfetcher, the msn news bot, which seemed to pick up on the new post very quickly, and even the odd seemingly human visitor (again with a variety of browsers). Bit.ly reckoned I had 39 clicks, whereas Google Analytics thought I had 13 visitors to this site, of whom only 5 read this post. Who knows what the true figures are; probably somewhere in between.

P.S. I have yet to repeat my reddit experiment. When I do I will update this post with the results.

Same issue

My web host always showed many more visits/pageviews than my Google Analytics account. I was starting to think that Analytics doesn't work that well.
Posted by Celebter

Same issue

When I look at my web hosting stats, I always see many more visits/pageviews than in Google Analytics. The web host's stats engine tells me which visitors are bots.
Posted by Celebter

Tracking

The reason for your discrepancy at the end is pretty obvious... Google Analytics requires JavaScript, and most crawlers don't crawl with JavaScript enabled. Bit.ly's count was higher because every click routes through their servers first before the user is sent off to your site. That will capture everyone who goes through the bit.ly URLs, including crawlers. As for your logs, they will show everyone who hits your site, bots and visitors. Hope that helps!
Posted by Adam

A few answers

With regard to the "0 seconds on site": this is calculated from the time between page loads. If someone visits your site, views a single page, and leaves, it counts as zero seconds even if the page was open for 15 minutes. It's one of those metrics that's practically useless in isolation, but can be pretty telling if it changes drastically. For instance, if one month your Average Time On Site is 3 minutes, and the next month it's 10 minutes, it's worth looking into what new content has captivated enough users for that number to go up so much.
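
As a rough illustration of that calculation, here is a small Python sketch. It is a simplified model of the behaviour described above, not Google Analytics' actual implementation, and the function names are made up for the example.

```python
from datetime import datetime, timedelta


def time_on_site(page_load_times):
    """Seconds between the first and last page load of one visit.

    `page_load_times` is a list of datetime objects, one per page load.
    A single-page visit has nothing to measure between, so it counts as 0.
    """
    if len(page_load_times) < 2:
        return 0.0
    ordered = sorted(page_load_times)
    return (ordered[-1] - ordered[0]).total_seconds()


def average_time_on_site(visits):
    """Average of time_on_site over a list of visits."""
    if not visits:
        return 0.0
    return sum(time_on_site(v) for v in visits) / len(visits)


start = datetime(2009, 9, 30, 12, 0, 0)
single_page_visit = [start]
two_page_visit = [start, start + timedelta(minutes=3)]
```

Under this model `single_page_visit` scores zero however long the page stayed open, so a burst of one-page visits drags the average towards 0 seconds.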

With regard to logs showing the old "non-SEO-friendly" URLs: I don't know if this is true for your hosting setup, but on mine, logs always show the non-SEO version of a URL. This is because the "friendly URL" mechanism (isapi_rewrite for IIS7, sort of like mod_rewrite for Windows) grabs the friendly URL and translates it to the non-friendly one for the server to interpret, before the request gets logged. This is only true for server-side logging solutions like livestats, urchin, or looking at raw .log files. Google Analytics is JavaScript-based, so it sees only what the browser sees. It will show friendly URLs whenever they're used.
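
A toy version of that "rewrite first, log second" ordering might look like this in Python. The friendly URL pattern `/blog/<id>/<title>` is hypothetical (only the `blog.php?id=7` form comes from the post), and a real server does this through rewrite rules rather than application code.

```python
import re

# Hypothetical friendly pattern: /blog/7 or /blog/7/some-title
FRIENDLY = re.compile(r"^/blog/(?P<id>\d+)(?:/[\w-]*)?/?$")


def rewrite(path):
    """Translate a friendly URL to the internal form; pass others through."""
    match = FRIENDLY.match(path)
    if match:
        return f"/blog.php?id={match.group('id')}"
    return path


def handle(path, server_log):
    """Rewrite the request, then log it, so the log holds the internal URL."""
    internal = rewrite(path)
    server_log.append(internal)
    return internal
```

With this ordering, a request for the friendly address is recorded in `server_log` in its `blog.php?id=` form, while anything running in the browser still sees the friendly address.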

Hope this clears a couple of things up :)
Posted by Alex

Human Visitor

Just wanted to let you know that I am an actual human visitor and really enjoyed your post. Very interesting.
Posted by Mike L
