FAQ: Default Page Hits
In my reports, I see entries for /somedir/, /somedir, and /somedir/ (default page). What's the difference? Each hit seems to be counted twice because of this, once on /somedir and once on /somedir/. What can I do to show that as one hit?
/somedir/ is the total hits on a directory and all its contents; /somedir is an attempt to hit that directory which was redirected because it lacked the trailing slash; and "/somedir/ (default page)" is the number of hits on the directory itself (i.e., on the default page of the directory).
To understand why hits are shown on both /somedir/ and /somedir, where "somedir" is the name of a directory (folder) in the web site, it is necessary to understand what happens when a browser tries to access http://hostname/somedir . That URL is incorrect (or at best, inefficient), because it lacks the trailing slash, which implies that somedir is a file. Here's what happens in this case:
The web browser asks for a file named /somedir .
The server checks, and finds that there is no file by that name (because it's a directory). It responds with a 302 redirect to /somedir/, which basically means, "no such file, but there is a directory; maybe that's what you meant?"
The browser accepts the redirect, so now it requests a directory named /somedir/ .
The server notes that there is a directory by that name, and that it contains an index or default file. It responds with a 200 status code and the contents of the index file.
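The exchange above can be sketched as a small Python model. This is only an illustration of the request and redirect sequence, not how any particular web server is implemented; the names DIRECTORIES, INDEX_FILE, respond, and fetch are hypothetical.

```python
# Hypothetical model of a server resolving request paths.
# DIRECTORIES and INDEX_FILE are illustrative assumptions, not real server settings.
DIRECTORIES = {"/somedir"}
INDEX_FILE = "index.html"


def respond(path):
    """Return (status, detail) for a requested path."""
    if path in DIRECTORIES:
        # No file by this name, but a directory exists: redirect to the slash form.
        return 302, path + "/"
    if path.endswith("/") and path.rstrip("/") in DIRECTORIES:
        # Directory request: serve the directory's index (default) file.
        return 200, path + INDEX_FILE
    return 404, None


def fetch(path):
    """Follow redirects as a browser would, recording one entry per request."""
    log = []
    while True:
        status, detail = respond(path)
        log.append((path, status))
        if status != 302:
            return log
        path = detail  # the browser re-requests the redirect target
```

In this model, fetch("/somedir") records two requests (a 302 followed by a 200), while fetch("/somedir/") records only one.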
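In the web logs, this appears as two lines: one for the request for /somedir, which was answered with the 302 redirect, and one for the request for /somedir/, which was answered with the 200.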
Sawmill reports this as two hits, because it is two hits (two lines of log data). Sawmill differentiates the aggregate traffic within a directory from traffic which directly hits a directory, by using /somedir/ to represent aggregation of traffic in the directory, and using "/somedir/ (default page)" in graphical reports to represent hits on the directory itself (i.e., hits which resulted in the display of the default page, e.g., index.html or default.asp). So in graphical reports, the second hit above appears as a hit on "/somedir/ (default page)".
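To make the three report entries concrete, here is a rough Python sketch of the counting rule just described. It is an assumption-laden illustration, not Sawmill's actual implementation: the functions report_entries and tally are hypothetical, and only the three entries discussed in this FAQ are modeled (individual files would normally have entries of their own).

```python
from collections import Counter


def report_entries(path):
    """Return the report entries one hit counts toward (illustrative only)."""
    if path == "/somedir":
        # Slashless request: the redirected hit, reported under /somedir.
        return ["/somedir"]
    if path.startswith("/somedir/"):
        entries = ["/somedir/"]  # aggregate entry: everything inside the directory
        if path == "/somedir/":
            # Hit on the directory itself, i.e. on its default page.
            entries.append("/somedir/ (default page)")
        return entries
    return []


def tally(paths):
    """Count hits per report entry for a list of requested paths."""
    counts = Counter()
    for path in paths:
        counts.update(report_entries(path))
    return counts
```

For example, tally(["/somedir", "/somedir/", "/somedir/page.html"]) counts one hit on /somedir, two on /somedir/, and one on "/somedir/ (default page)".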
A good solution to this is to make sure that all links refer to directories with the trailing slash; otherwise the server and browser have to do the elaborate dance above, which slows everything down and doubles the stats.
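As one possible way to apply this when generating or checking links, the following Python sketch appends the missing slash to links that point at known directories. The function name add_trailing_slash and the set of directory paths passed to it are assumptions for illustration; in practice you would need to supply the list of directories for your own site.

```python
from urllib.parse import urlsplit, urlunsplit


def add_trailing_slash(href, directories):
    """Append '/' to the path of href when it names a known directory."""
    parts = urlsplit(href)
    if parts.path in directories:
        # Only the path changes; scheme, host, query, and fragment are kept.
        parts = parts._replace(path=parts.path + "/")
    return urlunsplit(parts)
```

With directories set to {"/somedir"}, the link http://hostname/somedir becomes http://hostname/somedir/, while links that already end in a slash or that point at files are left unchanged.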
Another option is to reject all hits where the server response code starts with 3, using a log filter like this one:
if (starts_with(server_response, '3')) then 'reject'
This discards the first hit of the two, leaving only the "real" (corrected) one.
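The effect of that filter can be imitated in Python to see what survives. This is only a sketch of the same rule applied to already-parsed records; the record layout (a dict with "page" and "server_response" keys) and the function name reject_redirects are assumptions, not Sawmill's internal format.

```python
def reject_redirects(records):
    """Keep only records whose response code does not start with '3'."""
    return [
        record
        for record in records
        if not str(record["server_response"]).startswith("3")
    ]
```

Applied to the two records from the exchange above, only the 200 hit on /somedir/ remains. Note that a prefix test on "3" matches every 3xx code, including 304 (Not Modified), not only the 302 redirects.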
In summary, hits on /somedir/ in reports represent the total number of hits on a directory, including hits on the index page of the directory, any other files in that directory, and any other files in any subdirectory of that directory, etc. Hits on /somedir in reports represent the 302 redirects caused by URLs which lack the final /. Hits on "/somedir/ (default page)" represent hits on the default page of the directory.
For information about selecting the default page using a report filter, see Using Report Filters.