FAQ: Counting Visitors With Cookies

Can Sawmill count visitors using cookies, rather than unique hostnames?

Short Answer

Yes -- it includes a built-in log format to do this for Apache, and other servers can be set up manually.

Long Answer

Yes. The reason you'd want to do this is that counting visitors by unique browsing hostname (or IP) is imprecise: one actual visitor may appear to come from several hostnames, and several actual visitors may appear as one. The same person may dial up and receive a different IP address each time; in some extreme cases, their ISP may be set up so that they have a different IP address for each hit; and several actual visitors may appear as one hostname if they're all using the same proxy.

The solution to this problem is to set your web server to use cookies to keep track of visitors. Apache and IIS can be configured to do this, and in both cases, Sawmill can be configured to use the cookie log field, instead of the hostname, as the basis for its "visitor" field. To do this, edit your profile (in LogAnalysisInfo/profiles) with a text editor, find the "visitors" database field (look for "database = {", then "fields = {", then "visitors = {"), and change the log_field value to your cookie field; for instance, if your cookie field is cs_cookie, change it to log_field = "cs_cookie".

Note that this will only work if your entire cookie field holds the visitor cookie and no other cookies. If you have multiple cookies, you can't use the whole cookie field as your visitor ID; instead, use the approach described below to create a visitor_id field, use a regular expression to extract your visitor cookie into it, and then change log_field to visitor_id.

Installing the cookie tracking JavaScript

If your server or environment already tracks visitors by cookie, you can skip this section. If not, you need to add a bit of JavaScript to each of your web pages, to assign cookies to each visitor. To do this, copy the log_analysis_info.js file, from the Extras folder of your Sawmill installation, into a folder called js, in your web server root directory, and add this to every possible entry page (best to add it to every page):

Using Cookie-based Visitor IDs in Apache

In the case of Apache, it's even easier, because Sawmill includes a log format descriptor for a special "combined format plus visitor cookie" log format. The format is just normal combined format, with the visitor ID stuck at the front of each log entry. You can log in this format by adding the following lines to your httpd.conf file:

  CookieTracking on
  CookieExpires "2 weeks"
  CustomLog /var/log/httpd/cookie.log "%{cookie}n %h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\""

(replace /var/log/httpd/cookie.log above with the pathname of the log you want to create). When you point Sawmill at this log file, it will recognize it as an "Apache Combined With Visitor Cookies" log, and it will set up the log filter described above for you, so you don't have to do any manual profile editing at all.
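
As a rough illustration of the layout this CustomLog line produces, the following Python sketch splits the leading visitor ID off one log entry. The sample line and the visitor ID value in it are made up for illustration, and the helper name split_visitor_id is hypothetical. Sawmill does this parsing itself, so the sketch is only meant to show where the ID sits in each entry.

```python
import re

# Cookie-prefixed combined format: visitor ID first, then the usual combined fields.
# Only the leading fields are captured; the referrer and user agent are ignored.
LINE_RE = re.compile(
    r'^(?P<visitor_id>\S+) (?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
)

def split_visitor_id(line):
    """Return (visitor_id, host) for one log line, or None if it does not match."""
    m = LINE_RE.match(line)
    if m is None:
        return None
    return m.group("visitor_id"), m.group("host")

# Hypothetical sample entry; the visitor ID value is invented.
sample = (
    '10.0.0.5.1700000000 10.0.0.5 - - [10/Oct/2023:13:55:36 +0000] '
    '"GET /index.html HTTP/1.0" 200 2326 "-" "Mozilla/5.0"'
)

print(split_visitor_id(sample))
```

Two hits with the same first field would be counted as the same visitor, even if the hostname in the second field differs.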

Using Cookie-based Visitor IDs in IIS

IIS has built-in support for visitor cookies -- just turn on logging of the Cookie field (extended property), or tell IIS to use "W3C Extended Log File Format" for logging, and you'll get cookies in your log data. Once you've done that, you'll need to create a "visitor_id" log field to hold the cookie information, and use that field as the basis for your visitor database field.

An Example Filter For Extracting Cookies

If your cookie field contains more than just a visitor ID, you'll need to extract the visitor ID part of the field, and put it into a separate Sawmill "visitor id" log field. This can be done using a regular expression filter with variable replacement. First, you'll need to create a visitor ID log field. You can do this by editing the profile .cfg file (in the profiles directory of the LogAnalysisInfo directory in your installation), and finding the log.fields group (search for "log =" and then forward from there for "fields ="). Add the following log field:

  visitor_id = {
    label = "visitor ID"
    type = "flat"
  }

Next, in the same .cfg file, change database.fields.visitors.log_field to visitor_id (i.e. search for "database =", then search for "fields =", then search for "visitors =", and then set the log_field value within visitors to visitor_id), so the visitors field uses the visitor_id to determine whether two events are from the same visitor.

Then, add a log filter (in the Log Filters section of the profile Config, or in the log.filters section of the .cfg file) to extract the visitor ID from the cookie. For example, suppose that the cookie field value looks like this:


The lavc cookie (the visitor id, 123456789 in this case) is buried inside the field, surrounded by other cookie names and values. To extract it you need a filter that grabs the part after lavc= and before &. This can be done most easily with the following filter:

    if (matches_regular_expression(cookie, "&lavc=([^&]*)&")) then visitor_id = $1

This filter finds a section of the field starting with &lavc=, followed by a series of non-& characters, followed by a &, and it sets the visitor id to the sequence of non-& characters it found (123456789, in this case). For IIS, the value in quotes will be ASPSESSIONID[A-Z]*=([^&]*) instead.
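
The same extraction can be tried outside Sawmill before you rebuild the database. The Python sketch below applies the regular expression from the filter above to a made-up cookie value; the cookie names around lavc are hypothetical, and extract_visitor_id is an illustrative helper, not part of Sawmill.

```python
import re

# Same pattern as the filter: "&lavc=", a run of non-& characters, then "&".
LAVC_RE = re.compile(r"&lavc=([^&]*)&")

def extract_visitor_id(cookie_field):
    """Return the lavc value from a cookie field, or None when the pattern does not match."""
    m = LAVC_RE.search(cookie_field)
    return m.group(1) if m else None

# Hypothetical cookie field value; only the lavc name and 123456789 come from the text.
cookie = "theme=dark&lavc=123456789&lang=en"

print(extract_visitor_id(cookie))            # 123456789
print(extract_visitor_id("lavc=123456789"))  # None: no surrounding & characters
```

Because the pattern requires an & on both sides, a cookie field in which lavc comes first or last would not match; whether that matters depends on how your server writes the field.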

Once you've added the visitor id log field, and the filter to set it, and modified the visitors database field to use the visitor id as its log field, rebuild the database. Sawmill is now using the lavc value from your cookie field as your visitor id, which should make your visitor counts more accurate.