DOTE

Chain And Rate

Tuesday, June 18, 2013

Browser User Agents

So by now you are probably wondering: why bother trying to identify browsers? Not only would it be poor design to build browser sniffing into a site, it would also be unreliable, since many browsers let the user set their own agent string. Well, if you are curious about questions like market share and trends in the Internet community, examining agent strings is still the only practical way to get a meaningful and sizeable snapshot of which browsers are being used on the web.

And I should add that people who set their agent string to "None of your business", or to some rude four-letter words, are not being counted. In other words, if you prefer to use Galeon and you would like the rest of the world (including statisticians, economists, legislators, spin-doctors, advocates, etc.) to know that there are a few Galeon users out there, the best way to have a say, and in effect vote for Galeon, is to use the standard agent string that shipped with the Galeon distribution.

The logic behind this simple analysis was based on the small sample of agent strings that I had collected and the few browsers that I was using (various versions of lynx, Links, Netscape, Mozilla, MSIE and Konqueror).


I based the original logic on strings like the following:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)
Mozilla/4.78 [en] (Win98; U)
The logic for this was very simple:
If the first word is 'Mozilla/4.0'
    If the second word is '(compatible;'
        Use the third word as the browser type.
    Else
        Treat it as Mozilla/4.0.
Else
    Carry out remaining checks.
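
The rule above can be sketched in Perl roughly as follows. This is only an illustration of the word-by-word check, not the code that was actually used; the subroutine name first_pass_browser is made up for this example, and it returns undef to stand for "carry out remaining checks".

```perl
use strict;
use warnings;

# Hypothetical helper (name invented for this sketch): apply the simple
# first-pass rule. Returns a browser label, or undef when the remaining
# checks would be needed.
sub first_pass_browser {
    my ($agent_string) = @_;
    my @words = split ' ', $agent_string;    # split on runs of whitespace

    # Anything that does not start with exactly 'Mozilla/4.0' falls through.
    return undef unless @words && $words[0] eq 'Mozilla/4.0';

    # "Mozilla/4.0 (compatible; MSIE ..." -> the third word names the browser.
    if (@words >= 3 && $words[1] eq '(compatible;') {
        return $words[2];
    }
    return 'Mozilla/4.0';
}

# Try it on the two sample strings.
for my $ua (
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)',
    'Mozilla/4.78 [en] (Win98; U)',
) {
    my $browser = first_pass_browser($ua);
    my $label   = defined $browser ? $browser : '(remaining checks)';
    print "$label\t$ua\n";
}
```

The first sample comes out as MSIE; the second does not begin with the exact word 'Mozilla/4.0', so it is left for the remaining checks.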


This is based on the observation that MSIE always claims to be Mozilla/4.0. There was a flaw in this logic, however: it turned out that there were many agent strings like:
Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 6.04  [en]
On the face of it this would seem to be Opera 6.04 pretending to be MSIE 5.0 pretending to be Netscape pretending to be Mozilla/4.0.
This means that an agent string such as the one above is no longer counted as MSIE. The new browser and robot detection logic has been placed in a separate script called agent_id. This contains two Perl subroutines, which_browser() and which_robot(). If you feed an agent string to these routines, they should return the name of the browser or the robot.
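
To give a feel for the kind of ordering that avoids the Opera problem, here is a minimal sketch. It is not the real which_browser() from agent_id, whose contents are not shown here; the name guess_browser, the regular expressions and the 'unknown' fallback are all assumptions made for illustration. The idea is simply to look for an Opera token before trusting the "(compatible; ..." clause.

```perl
use strict;
use warnings;

# Hypothetical sketch, NOT the real which_browser() from agent_id.
sub guess_browser {
    my ($agent_string) = @_;

    # Check for Opera first, since it may also carry an MSIE clause.
    return 'Opera' if $agent_string =~ /\bOpera[ \/]\d/;

    # Otherwise take the word after "(compatible;" as the browser type.
    return $1 if $agent_string =~ /^Mozilla\/4\.0 \(compatible; (\S+)/;

    # A bare Mozilla/4.0 is treated as Mozilla/4.0.
    return 'Mozilla/4.0' if $agent_string =~ /^Mozilla\/4\.0\b/;

    # Stand-in for the remaining checks.
    return 'unknown';
}

for my $ua (
    'Mozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 6.04  [en]',
    'Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)',
) {
    print guess_browser($ua), "\t$ua\n";
}
```

With this ordering the first string is labelled Opera and the second MSIE.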
To test the script, you could save agent_id to /MyPath/agent_id and use a script like the following:
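#!/usr/bin/perl
use strict;
use warnings;

# Load which_browser() and which_robot() from the saved agent_id script.
require "/MyPath/agent_id";

# Read one agent string per line from STDIN or from files named on the command line.
while (my $line = <>) {
    chomp $line;    # drop the trailing newline so it is not printed twice
    my $agent = which_browser($line);
    print "$agent\t$line\n";
}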
You can now feed browser agent strings to this script, one per line, and it should print each result as the browser name followed by the full agent string, separated by a tab.
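
If you are after market share, the tab-separated output is easy to tally. The following is a rough sketch under a few assumptions: the subroutine name tally_browsers is invented for this example, and the labels in the sample lines ("MSIE", "Opera") are only guesses at what which_browser() might return.

```perl
use strict;
use warnings;

# Hypothetical helper: count how often each browser label appears in
# "browser<TAB>agent string" lines.
sub tally_browsers {
    my @lines = @_;    # work on a copy so the caller's data is untouched
    my %count;
    for my $line (@lines) {
        chomp $line;
        my ($browser) = split /\t/, $line, 2;    # the label is the first field
        next unless defined $browser && length $browser;
        $count{$browser}++;
    }
    return %count;
}

# Sample lines with assumed labels.
my %count = tally_browsers(
    "MSIE\tMozilla/4.0 (compatible; MSIE 6.0; Windows NT 4.0)\n",
    "Opera\tMozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0) Opera 6.04  [en]\n",
    "MSIE\tMozilla/4.0 (compatible; MSIE 5.0; Windows NT 4.0)\n",
);

# Most common browser first; ties are broken alphabetically.
for my $browser (sort { $count{$b} <=> $count{$a} || $a cmp $b } keys %count) {
    printf "%-12s %d\n", $browser, $count{$browser};
}
```

In practice you would pass it the lines read from the test script's output rather than a hard-coded list.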