FAQ: Typical Usage Patterns
How does a typical company use Sawmill? What does a typical Sawmill setup look like?
Installations vary from customer to customer--Sawmill provides enough flexibility to let you choose the model that works best for you.
Customers use many different models. For web server analysis, it is common to run Sawmill on the active web server, either stand-alone or in web server mode, reading the growing log files directly; this works well as long as the dataset is not too large and the server is not too heavily loaded. For very large datasets, many customers instead use dedicated Sawmill machines, which pull the logs over the network from the server(s). Databases are generally updated on a regular schedule; it's common to update them in the middle of the night, every night, using the Sawmill Scheduler or an external scheduler like cron.
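As a sketch of the nightly-update model on a dedicated machine, the cron entries below first pull the day's logs from the web server and then run a database update. The paths, profile name, and Sawmill command-line arguments shown here are illustrative assumptions, not documented values; check your Sawmill version's command-line documentation for the exact invocation.

```
# Hypothetical crontab entries; paths, the "webserver" profile name, and
# Sawmill's command-line flags are assumptions -- consult your installation's
# documentation for the real syntax.

# 1:00 AM: pull the day's logs from the web server (rsync over SSH)
0 1 * * * rsync -az webserver:/var/log/apache2/ /data/logs/webserver/

# 2:00 AM: update the Sawmill database for the "webserver" profile
0 2 * * * /usr/local/sawmill/sawmill -p webserver -a ud
```

The same pattern extends to multiple profiles or servers by adding one pull line and one update line per database.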
In terms of the database layout, some common models include:
A single database. Most customers use a single large database that contains all their data. This works well if you have a lot of disk space and a fast computer (or computers) to process your logs, or if your log data is not too large. You can use Sawmill's normal filtering features to zoom in on particular parts of the data, but it is all stored in one database. Sawmill also has features for limiting certain users to certain parts of the database; this is particularly useful for ISPs that want to store all their customers' statistics in a single large database, but let each customer access only their own statistics.
A "recent" database and a long-term database. In cases where log data is fairly large (say, more than 10 Gigabytes), or where disk space and/or processing power is limited, some customers use two databases, one in detail for the recent data (updated and expired regularly to keep a moving 30-day data set, for instance), and the other less detailed for the long-term data (updated regularly but never expired). The two databases combined are much smaller than a single one would be because they use less overall information, so it takes less time to process the logs and to browse the database. This is often acceptable because fine detail is needed only for recent data.
A collection of specialized databases. Some customers use a collection of databases, one for each section of their statistics. This is particularly useful for log data in the multi-terabyte range; a tightly focused database (for instance, one showing only hits from the past seven days on a particular directory of the site) is much smaller and faster than a large all-encompassing database. It is also useful when several log files of different types are being analyzed: an ISP might have one database to track bandwidth usage by its customers, another to track internal network traffic, another to track usage of its FTP site, and another to track hits on its own web site.
There are many options, and there is no single best solution. You can try different approaches and change them if they are not working for you; Sawmill gives you the flexibility to choose whatever works best.