So who (or what) is visiting your website?
Update: I have added a section at the bottom of this post with the results of what happened once I shortened the post's URL and posted it to Twitter.
A while back, I submitted one of my posts to Reddit, StumbleUpon and a few of the other social content websites. I then checked my Google Analytics account the next day and was pretty pleased when it said I'd had 70-odd visits from Reddit in less than a day. However, it also said that the average visit time was 0 seconds. This didn't seem like the behaviour of real people, so I wanted to analyse what had happened, and more generally I was interested to see what else might be visiting my website that Google Analytics was not going to tell me about.
I probably could have come up with some kind of logging system myself, but a quick hop onto Google showed me exactly what I was after (with a lot less effort): check out the visitor logging tutorial I followed if you are interested (beware, that link does not seem to be working at the moment). I made two modifications. The main one was that the example worked off text files, so I changed it to work from a MySQL database. I also made the code create a log entry only for the first page visited by each visitor to the site, as I did not want to fill the table with thousands of entries (yeah, like my website is that popular, lol).
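Since the tutorial link is down, here is a rough sketch of the "first page only" idea. It is not the tutorial's code, nor the code running on this site: it is written in Python and uses the standard library's sqlite3 as a stand-in for MySQL. The table name `visitor_log`, the function names and the `session` dictionary are hypothetical, and it assumes a per-visitor session store (for example a cookie-backed one) is available.

```python
import sqlite3
from datetime import datetime, timezone


def create_log_table(conn):
    """Create the (hypothetical) visitor_log table if it does not exist."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS visitor_log (
            id INTEGER PRIMARY KEY AUTOINCREMENT,
            visited_at TEXT NOT NULL,
            ip TEXT,
            user_agent TEXT,
            page TEXT,
            referrer TEXT
        )
        """
    )
    conn.commit()


def log_first_visit(conn, session, ip, user_agent, page, referrer=None):
    """Insert one row per visitor session; return True if a row was written.

    `session` is assumed to be a dict-like per-visitor store. A client that
    does not keep its session between requests arrives with an empty store
    each time, so it would be logged on every request.
    """
    if session.get("logged"):
        return False  # this visitor's landing page is already recorded
    conn.execute(
        "INSERT INTO visitor_log (visited_at, ip, user_agent, page, referrer) "
        "VALUES (?, ?, ?, ?, ?)",
        (datetime.now(timezone.utc).isoformat(), ip, user_agent, page, referrer),
    )
    conn.commit()
    session["logged"] = True
    return True
```

Calling `log_first_visit` on every page load with the same session writes a row the first time and does nothing afterwards, which is what keeps the table small for ordinary browsers.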
I implemented the logging on the 10th of September, so 20 days ago at the time of posting. According to Google Analytics, in those 20 days I have had 97 visitors who have looked at 212 pages. In my logging table in the database, there are nearly 2200 entries. Bearing in mind that I have tried to limit the entries in the table as mentioned above (although I do not know if that will do anything against Googlebot and similar web crawlers), it goes to show that far more visitors, human and otherwise, can be visiting your site than Google Analytics reports.
Visitors
So, what kind of visitors did I see? First off, there were what seemed to be human visitors (browsers ranging from IE to Chrome to Firefox, and even a couple of hits from Opera), as one would expect and hope for. I even had a couple of visits from users of the Lynx browser (which is a text-only browser, if you have not heard of it). Then there were some services I would have expected, such as the W3C crawlers from when I used the validators to check my XHTML and CSS were up to scratch, or the "bitlybot" from the bit.ly URL shortening service. On the crawler front, everyone's friend Googlebot made quite a few appearances, along with msnbot, but aside from that there were some surprises. The site seemed to be quite popular with old search favourite Ask Jeeves, and there were a few visits from the Baidu crawler (very interesting considering it's a Chinese-language search engine)! The "artViper thumbnail spider", various Twitter-related ones such as "twitmatic", the linkaider service which checks for broken links, and the MLBot from www.metadatalabs.com are just a few of the other visitors that I would not have had a clue about had I not implemented this logging.
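Sorting log entries into groups like these can be done by looking for tell-tale substrings in the user-agent string. The following is a minimal Python sketch under stated assumptions: the token list is illustrative rather than an authoritative catalogue, the function names are made up for this example, and a bot that does not identify itself in its user agent (or uses a name not listed here) will be counted as "probably human".

```python
from collections import Counter

# Illustrative substrings only; real user-agent strings vary, and this list
# is an assumption, not an exhaustive catalogue of crawlers.
KNOWN_BOT_TOKENS = {
    "googlebot": "Googlebot",
    "msnbot": "msnbot",
    "baiduspider": "Baidu",
    "bitlybot": "bitlybot",
    "w3c_validator": "W3C validator",
    "mlbot": "MLBot",
}

# Generic hints that a client is automated even if it is not in the list above.
GENERIC_BOT_HINTS = ("bot", "crawler", "spider")


def classify_user_agent(user_agent):
    """Return a coarse label for one user-agent string."""
    ua = (user_agent or "").lower()
    # Check specific names first so "msnbot" is not swallowed by the "bot" hint.
    for token, label in KNOWN_BOT_TOKENS.items():
        if token in ua:
            return label
    if any(hint in ua for hint in GENERIC_BOT_HINTS):
        return "other bot"
    if not ua:
        return "unknown"
    return "probably human"


def summarise(user_agents):
    """Count how many log entries fall under each label."""
    return Counter(classify_user_agent(ua) for ua in user_agents)
```

Feeding the user-agent column of the log table into `summarise` gives a quick tally per label, which is enough to see roughly how much of the traffic is automated.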
One final thing that was interesting is that before I switched to "SEO friendly URLs", I used the old system of blog.php?id=7, or whatever the post ID number was. Many entries in the log show crawlers or visitors attempting to access posts on the site by this kind of URL, even though, as far as I am aware, there are no links to URLs of this kind left on the site. I do not know if this is because these URLs are cached in Google, or for other reasons. If anyone can help me with that question, please comment or get in touch. Thanks for reading, and it just goes to show you never know who your online visitors are!
Update: Yesterday morning I submitted this post to Twitter through a shortened bit.ly URL. The post got a bit of popularity on Twitter, as it was retweeted 5 times (although bit.ly only counted 4, Twitter showed 5) and apparently got 39 clicks, according to the bit.ly stats. An examination of the log table shows that (unsurprisingly) the bitlybot was the first visitor to the site, followed by crawlers from twiturly, MetaURI API, radian6_linkcheck, R6_Feedfetcher, the MSN news bot (which seemed to pick up on the new post very quickly), and even the odd seemingly human visitor (again with a variety of browsers). Bit.ly reckoned I had 39 clicks, whereas Google Analytics thought I had 13 visitors to this site, of whom only 5 read this post. Who knows what the true figure is; probably somewhere in between.
P.S. I have yet to repeat my Reddit experiment. When I do, I will update this post with the results.
A few answers
With regard to logs showing the old "non-SEO-friendly" URLs: I don't know if this is true for your hosting setup, but on mine, logs always show the non-SEO version of a URL. This is because the "friendly URL" mechanism (isapi_rewrite for IIS7, sort of like mod_rewrite for Windows) grabs the friendly URL and translates it to the non-friendly one for the server to interpret, before the request gets logged. This is only true for server-side logging solutions like livestats, urchin, or looking at raw .log files. Google Analytics is JavaScript-based, so it sees only what the browser sees, and it will show friendly URLs whenever they're used.
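To illustrate the ordering, here is a rough Python sketch. It is not isapi_rewrite or mod_rewrite configuration, just a toy model of "rewrite first, log second". The friendly URL pattern /blog/<id>/<slug> and the function names are made up for the example, since I don't know what your friendly URLs look like.

```python
import re

# Hypothetical friendly-URL pattern: /blog/<id> optionally followed by /<slug>
FRIENDLY_URL = re.compile(r"^/blog/(?P<id>\d+)(?:/[\w-]*)?/?$")


def rewrite(path):
    """Translate a friendly URL to the internal script URL, as a rewrite rule would."""
    match = FRIENDLY_URL.match(path)
    if match:
        return f"/blog.php?id={match.group('id')}"
    return path  # anything else passes through unchanged


def handle_request(path, server_log):
    """Rewrite the path, then record what the server actually interprets."""
    internal = rewrite(path)     # the rewriting happens first...
    server_log.append(internal)  # ...so the server-side log holds blog.php?id=N
    return internal
```

In a setup that behaves like this, a visitor who requests the friendly URL still produces a blog.php?id=N entry in the server-side log, while the browser (and anything JavaScript-based) only ever sees the friendly version.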
Hope this clears a couple of things up :)
Human Visitor