Optimizing VSTK Performance

Historically, virus scanning and pattern recognition have depended on two things: disk I/O performance and processing power. In other words, how fast you can read the data and how fast you can process it. Because of this, tuning and optimizing VSTK for performance is largely hardware dependent. There is still opportunity to improve performance by optimizing how system resources are used, however, and VSTK gives you the flexibility to adapt the product to many different environments.
 
The key to increasing performance with VSTK (and specifically VSTK-Turbo) is parallelism: removing data flow bottlenecks and fully utilizing the CPU. The first rule of thumb is to avoid, at all costs, using pipes to connect VSTK components. Pipes force data to be processed in a strictly linear fashion, restricting the flow of data into VFind-MT. For example, using the find command with UAD and VFind-MT connected via pipes makes VFind-MT only as fast as UAD can process, which is one file at a time.
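
For illustration, the serial pipeline described above looks something like the following. The binary names, paths, and option-free invocation are assumptions based on the component names used in this document; consult your VSTK documentation for the exact syntax on your system.

    # Anti-pattern: every file must pass through a single UAD process before
    # VFind-MT ever sees it, so data can only be decomposed one file at a time.
    find /data -type f -print | uad | vfind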

This situation is easily avoided with the --uad flag built into VFind. When used in conjunction with the --threads flag, VFind-MT will invoke one UAD process for every VFind process, removing the bottleneck created by a single UAD process. Threading brings us to the next consideration in the optimization process. It is a common misconception that you should keep close to a one-to-one ratio of processes to processor cores. Most of the time this rule does not apply and will not be effective, because following it leaves available system resources unused.
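
As a sketch, the recommended invocation looks like this. The --uad and --threads flags are those described above; the thread count, target path, and exact argument form are assumptions to adapt to your own environment and VSTK version.

    # Preferred: VFind-MT runs 16 worker threads, each with its own UAD
    # process, so decomposition and scanning proceed in parallel.
    vfind --uad --threads=16 /data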

In many cases it is actually far more beneficial to run many times more threads than you have physical cores on your processor. The number of threads you use directly determines the number of files VFind is able to process at any given time. If you are running 16 threads with processing power to spare and the disk is not being thrashed, you have the potential to process more data at once simply by increasing your thread count.
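
Standard system monitors are enough to spot that headroom while a scan is running; on a typical Linux or UNIX system, for example (tool availability is an assumption, and any CPU and disk monitor will serve the same purpose):

    # Watch overall CPU utilization during a scan; idle CPU suggests room for
    # more threads.
    top
    # Watch disk utilization and wait times every 5 seconds (iostat is part of
    # the sysstat package); sustained near-100% utilization indicates thrashing.
    iostat -x 5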

The optimal thread count will differ from system to system. Different processors and different disk setups handle vastly different workloads. A system with many large files and archives will benefit from a lower thread count, since each file demands a lot of processing power. A system with many small files will benefit from a higher thread count, since the files can be read and processed more quickly. Logically, then, systems with a wide variety of data will find that their optimal thread count lies somewhere in the middle.
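
One practical way to find that middle ground is to time the same representative scan at several thread counts and keep the fastest setting. The thread values, path, and flag syntax below are illustrative assumptions.

    # Benchmark a representative data set at increasing thread counts and keep
    # the fastest setting that does not thrash the disk.
    for t in 8 16 32 64; do
        echo "threads=$t"
        time vfind --uad --threads=$t /data > /dev/null
    done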

Eventually you may reach the limits of your current hardware, at which point you may need to optimize the hardware configuration itself. As previously mentioned, processing power and disk I/O are where most bottlenecks occur, so these are the focus. Processors that cope well with many small tasks run in parallel, such as most modern multi-core desktop processors, work best. Generally speaking, the more physical cores a system has, the better VSTK will be able to perform.

Hard drives are the most common source of I/O bottlenecks. Traditional hard disk drives are notoriously slow because their mechanical platters and heads cannot deliver large amounts of data rapidly. This was a problem until recently, when solid state drives, or SSDs, became widely available at an affordable price point. Because SSDs have no mechanical seek penalty, it is possible to read from them at speeds exceeding 500 MB/s, making them ideal candidates for increasing VSTK performance.

We have demonstrated that, with relatively inexpensive off-the-shelf hardware, it is possible to process data at a rate of up to 2.43 gigabytes per minute, with room for improvement. During our testing, we used 8-core desktop processors with SSDs to achieve these speeds. A highly parallel system configuration and simple product optimization made it all possible. With enterprise-level hardware, it would be theoretically possible to process orders of magnitude more information. This is only the beginning.