FAQ: System Requirements

How much memory, CPU power, and disk space do I need to run Sawmill?

Short Answer

At least 2 GB RAM, 4 GB preferred; 500 MB disk space for an average database; and as much CPU power as you can get.

Long Answer

Sawmill is a heavy-duty number crunching program, and can use large amounts of memory, CPU, and disk. You have some control over how much it uses of each, but it still requires a reasonably powerful computer to operate properly.

Sawmill uses around 100 MB of memory when it processes a small to medium size log file, and it can use considerably more for very large log files. The main memory usage factors are the "item lists", which are tables containing all the values for a particular field. If a field in your data is very complex and has many unique values (the URL query field for web log data is a common example), its item list can be very large, requiring hundreds of megabytes of memory. This memory is mapped to disk to minimize physical RAM usage, but it still contributes to Sawmill's total virtual memory usage. So for databases with very complex fields, large amounts of RAM will be required.

For large datasets, it is possible for Sawmill to use more than 2 GB of address space, exceeding the capabilities of a 32-bit system; in this situation, it is necessary to use a 64-bit system, or a MySQL database, or both (see "Database Memory Usage" and "Sawmill uses too much memory for builds/updates, and is slow to view"). This typically will not occur with a dataset smaller than 10 GB, and it is often possible to process a much larger dataset on a 32-bit system with 2 GB. A dataset over 20 GB will often run into this issue, however, so a 64-bit system is recommended for very large datasets.

A large dataset is defined as 10 GB or more. For datasets larger than 10 GB of log data, a multi-core 64-bit CPU coupled with a 64-bit operating system and at least 2 GB of RAM per core (e.g. 8 GB for a 4-core system) is highly recommended, if not required. If your system cannot support the RAM usage required by your dataset, you may need to use log filters to simplify the complex database fields.
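As a rough illustration of the memory guidelines above, the following Python sketch turns them into a small calculation. It is not part of Sawmill; the function name recommend_system and its return keys are hypothetical, and the thresholds are simply the rules of thumb quoted in this answer (10 GB as the "large dataset" boundary, 2 GB of RAM per core, 2 GB minimum).

```python
# Hypothetical helper, not a Sawmill API: encodes the rules of thumb from this FAQ.
LARGE_DATASET_GB = 10   # a large dataset is defined here as 10 GB or more
RAM_PER_CORE_GB = 2     # at least 2 GB of RAM per core for large datasets
BASELINE_RAM_GB = 2     # minimum RAM from the short answer


def recommend_system(log_data_gb: float, cores: int) -> dict:
    """Return a rough hardware suggestion for a given amount of log data."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    large = log_data_gb >= LARGE_DATASET_GB
    if large:
        # Large datasets: scale RAM with the number of cores.
        min_ram_gb = max(BASELINE_RAM_GB, RAM_PER_CORE_GB * cores)
    else:
        # Smaller datasets: the baseline minimum is usually enough.
        min_ram_gb = BASELINE_RAM_GB
    return {
        "large_dataset": large,
        "recommend_64_bit": large,
        "min_ram_gb": min_ram_gb,
    }


print(recommend_system(50, 4))
```

For 50 GB of log data on a 4-core machine, this suggests a 64-bit system with at least 8 GB of RAM, matching the example in the text. Actual memory use depends heavily on how complex the database fields are, so treat the result as a starting point only.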

The Sawmill installation itself takes less than 50 MB of disk space, but the database it creates can take much more. A small database may be only a couple of megabytes, but if you process a large amount of log data, or turn on a lot of cross-references and ask for a lot of detail, there's no limit to how large the database can get. In general, the database will be on the order of 200% to 300% of the size of the uncompressed log data in it, perhaps as much as 400% in some cases. So if you're processing 100 GB of log data, you should have 200 GB to 400 GB of disk space free on your reporting system to hold the database. If you use an external (e.g. SQL) database, the database information will take very little space on the reporting system, but will take a comparable amount of space on the database server.
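The disk space estimate above is simple arithmetic, sketched here in Python. The function name estimate_database_size_gb is hypothetical and not part of Sawmill; the multipliers are the 200%, 300%, and 400% figures from the paragraph above.

```python
# Hypothetical helper, not a Sawmill API: applies the size ratios quoted above.
def estimate_database_size_gb(log_data_gb: float) -> tuple:
    """Return (typical_low, typical_high, worst_case) database size in GB.

    log_data_gb is the size of the uncompressed log data.
    """
    if log_data_gb < 0:
        raise ValueError("log_data_gb must not be negative")
    typical_low = 2.0 * log_data_gb    # 200% of the log data
    typical_high = 3.0 * log_data_gb   # 300% of the log data
    worst_case = 4.0 * log_data_gb     # as much as 400% in some cases
    return (typical_low, typical_high, worst_case)


print(estimate_database_size_gb(100))  # prints (200.0, 300.0, 400.0)
```

For 100 GB of log data this gives the same 200 GB to 400 GB range as the text. The real size will vary with the number of cross-references and the level of detail requested.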

Disk speed is also something to consider when designing a system to run Sawmill. During log processing, Sawmill makes frequent use of the disk, and during statistics viewing it uses it even more. Many large memory buffers are mapped to disk, so disk speed can have a very large impact on database performance, both for processing log data and for querying the database. A fast disk will improve Sawmill's log processing speed and the responsiveness of the statistics. SCSI is better than IDE, and SCSI RAID is best of all.

During log processing, especially while building cross-reference tables, the CPU is usually the bottleneck -- Sawmill's number crunching takes more time than any other aspect of log processing, so the rest of the system ends up waiting on the CPU most of the time. This means that any improvement in CPU speed will result in a direct improvement in log processing speed. Sawmill can run on any system, but the more CPU power you can give it, the better. Large CPU caches also significantly boost Sawmill's performance, by a factor of 2x or 3x in some cases.