$Id: README,v 1.1.1.1 2003/05/20 12:49:06 perry Exp $

PIPE2 scripts, May 2003
-----------------------

1. Overview
2. Details
3. Implementation
4. Sample Runs

################################################################################

1. Overview
-----------

These are example scripts for using multiple VFind daemon processes run in
the background using named pipes for input.  VFind-13.6.0 or later is
required due to necessary use of the -i,--ignore-eof option.  These scripts
have been tested on Solaris and Debian/Linux.

One motivation for these scripts is to demonstrate how to reduce the startup
overhead of VFind initialization.  VFind daemons can be started just once,
then files can be processed and scanned as desired thereafter, without
incurring any initialization overhead for each file scanned.

Another motivation is to demonstrate use of multiple VFind daemons running
in parallel, which can be especially effective in using system resources for
machines with multiple CPUs, and also has some benefits and usefulness even
on single CPU machines.

The three example scripts are:

pipes.sh    Example script to start/stop multiple VFind daemon processes
            using named pipes for input.

files.sh    Example script to process files using one VFind daemon.

pfiles.sh   Example script to process files using multiple VFind daemons
            in parallel.

Please check the script CONFIGURATION sections, and modify the scripts to
suit your requirements before trying to use them.

CyberSoft, Inc. is not responsible for any damage, be it physical or mental,
caused or indirectly caused by these example scripts.

Copyright (c) May 2003 by CyberSoft, Incorporated.

################################################################################

2. Details
----------

This is an update of older scripts from May 2002 which only started one
VFind daemon and were unreliable on some systems.  This version can start
multiple VFind daemons, and is more robust.
The input pipes and output files for each daemon will be named in1, out1,
in2, out2, etc.  A file named "NUM" will be created to record the number of
VFind daemons started; a file named "PIDS" will be created to record the
process ids of the VFind daemons; a file named "UAD" will be created to
record usage of the VFind -ssr,--smartscan-read option.

The pipes and files will be created in an existing directory specified by
the PIPEDIR environment variable, or in the current directory if PIPEDIR is
not defined.  Note that the VFind daemons will be started in the PIPEDIR
directory, so file names to be processed must either be relative to PIPEDIR,
or absolute path names.

Usage:

pipes.sh [-num] start|stop [VFind options...]

        num is the number of VFind daemon processes to start and defaults
        to 1.  One of start or stop must be specified, optionally followed
        by additional VFind start options to use, such as -ssr for
        smartscan input.

        VFind is run with the following required options, plus any
        additional options specified as arguments to this script:

                -i -p --quiet=1

files.sh [-num] files...

        num is the VFind daemon pipe number to write to, default is 1.

        files.sh processes each file in sequence and writes it to a single
        VFind daemon pipe.  If the VFind daemons were started with the
        -ssr,--smartscan-read option then uad -ssw will be used to write
        the files to VFind.

pfiles.sh [files...]

        pfiles.sh processes each file in sequence and writes it to the next
        available VFind daemon pipe.  If no files are specified on the
        command line, then file names are read from standard input.  The
        output from multiple VFind daemons used is printed in parallel.
        If the VFind daemons were started with the -ssr,--smartscan-read
        option then uad -ssw will be used to write the files to VFind.

################################################################################

3. Implementation
-----------------

VFind-13.6.0 has a new -i,--ignore-eof option to ignore end-of-file:

        -i, --ignore-eof    Ignore EOF and keep trying to read input file
                            names or SmartScan input.

Use of this option is necessary when reading from a named pipe (FIFO)
because VFind will see EOF on read if no process currently has the FIFO open
for writing.  With the -i option, if VFind sees EOF then it just does

        sleep(1); clearerr(fin);

and continues trying to read input.

The implementation of the -i option also enables reading from multiple
sequential and independent smartscan input streams, so when one smartscan
input stream is finished, and VFind sees EOF, it will continue and keep
trying to read a new smartscan input stream.

Output to a FIFO is difficult to implement reliably.  If there is no process
currently reading the FIFO, there will be an error on writing or flushing
output, plus there will be a SIGPIPE signal.  The signal can be caught, but
that doesn't help much; the output will still be messed up.

Therefore, for the new VFind daemon setup, the input is a named pipe, but
the output is just a plain file.  tail from the shell or fseek() in a C
program can be used to start reading from the end of the output file.

The sample scripts use several measures to prevent deadlock and interference
between parallel processing instances when using VFind daemons.  Deadlock
can occur if one process is waiting for input from another process which is
waiting for output from the first process.  Interference between parallel
processes can occur if two or more processes try to write input at the same
time to the same daemon.

To prevent deadlock, pipes.sh writes an initial newline character to each
VFind daemon output file.  This guarantees that when files.sh or pfiles.sh
starts to read VFind output (which they must do before writing anything,
otherwise some output may be lost), there will at least be a newline
character available to read.
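
The seeding and read-from-the-end steps described above can be sketched in
shell roughly as follows.  This is only an illustration under stated
assumptions, not the code from pipes.sh or files.sh: the scratch directory,
the variable names, and the printf that stands in for VFind output are all
hypothetical, and a recorded byte offset with "tail -c" is used here simply
as one way to start reading at the end of a plain output file.

```shell
#!/bin/sh
# Hypothetical sketch: seed an output file, then read only what is appended
# later.  Nothing here is taken from pipes.sh or files.sh.

dir=$(mktemp -d)            # scratch stand-in for PIPEDIR
out="$dir/out1"

# What pipes.sh is described as doing: make sure the output file holds at
# least one newline before any reader looks at it.
printf '\n' >> "$out"

# Reader side: record the current end of the file *before* writing any
# input, so that output produced afterwards cannot be missed.
start=$(wc -c < "$out")

# Stand-in for VFind appending results after a file name was written to in1.
printf '%s\n' '##==> Checking file: "example"' >> "$out"

# Print only the bytes appended after the recorded offset.
new_output=$(tail -c +$((start + 1)) "$out")
printf '%s\n' "$new_output"

rm -rf "$dir"
```

Because the output is a plain file rather than a FIFO, the reader can take
its time: nothing is lost if it starts late, as long as the offset was
recorded before any input was written.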

When the shell starts a VFind daemon process with input from a named pipe,
it doesn't really start the process until some input is available from the
pipe, because opening a FIFO for reading blocks until another process opens
it for writing.  Thus the initialization overhead of VFind startup doesn't
really occur until the first file is processed.

files.sh and pfiles.sh also check for and skip unreadable, non-regular,
empty, and symlink files, which would cause deadlock due to abnormal or
missing output if used as input for uad or VFind.

To prevent interference between parallel processing instances, pfiles.sh
uses process id files to record the pid of each tail output process.  These
pid files are used as a locking mechanism to prevent reuse of a VFind daemon
process which is already in use; and the absence of a pid file serves to
identify VFind daemon processes which are ready for more input.

################################################################################

4. Sample Runs
--------------

The sample runs shown here were performed in a subdirectory of the directory
containing the pipes.sh, files.sh, and pfiles.sh scripts.

4.1. This sample run starts just one VFind daemon, and scans one file:

    % ../pipes.sh start
    ../pipes.sh: running vfind [21891] < in1 >> out1

    % date; ../files.sh ../README; date
    Thu May 8 20:21:38 EDT 2003
    ../files.sh: writing ../README to in1
    ##==> Checking file: "../README"
    ##==> Number of possible virus infections found in file "../README": 0
    Thu May 8 20:21:47 EDT 2003

Note that it took 9 seconds to scan the first file, due to VFind startup/
initialization overhead.  Scanning the same file again only takes 1 second:

    % date; ../files.sh ../README; date
    Thu May 8 20:21:54 EDT 2003
    ../files.sh: writing ../README to in1
    ##==> Checking file: "../README"
    ##==> Number of possible virus infections found in file "../README": 0
    Thu May 8 20:21:55 EDT 2003

    % ../pipes.sh stop
    ../pipes.sh: stopping vfind 21891

The final command above uses pipes.sh to stop the VFind daemon process.

---

4.2.
This sample run starts two VFind daemon processes, with smartscan input,
and scans two files at the same time:

First, the daemons are started:

    % ../pipes.sh -2 start -ssr
    ../pipes.sh: running vfind [22006] -ssr < in1 >> out1
    ../pipes.sh: running vfind [22009] -ssr < in2 >> out2

then, in one window, a file is processed using daemon #1:

    % ../files.sh tuna.tar
    ../files.sh: writing tuna.tar via uad to in1
    ##==> Checking file: "tuna.tar" -> "tuna.msg"
    ##==> Checking file: "tuna.tar" -> "tuna.newout"
    ##==> Checking file: "tuna.tar" -> "tuna.txt"
    ##==> Checking file: "tuna.tar" -> "tuna.vdl"
    ##==> Number of possible virus infections found in file "tuna.tar": 0

and simultaneously, in a separate window, another file is processed using
daemon #2:

    % ../files.sh -2 mac.zip
    ../files.sh: writing mac.zip via uad to in2
    ##==> Checking file: "mac.zip" -> "mac-bad.vdl"
    ##==> Checking file: "mac.zip" -> "mac1.vdl"
    ##==> Checking file: "mac.zip" -> "mac1a.txt"
    ##==> Checking file: "mac.zip" -> "mac1b.txt"
    ##==> Checking file: "mac.zip" -> "mac2.txt"
    ##==> Checking file: "mac.zip" -> "mac2.vdl"
    ##==> Number of possible virus infections found in file "mac.zip": 0

---

4.3.
This sample run starts three VFind daemon processes, with smartscan input,
and uses pfiles.sh to distribute scanning of seven input files over the
available daemons:

First, the daemons are started:

    % ../pipes.sh -3 start -ssr
    ../pipes.sh: running vfind [22254] -ssr < in1 >> out1
    ../pipes.sh: running vfind [22257] -ssr < in2 >> out2
    ../pipes.sh: running vfind [22260] -ssr < in3 >> out3

then pfiles.sh is used to feed the files:

    % ../pfiles.sh mac.zip ../../../magistr/earliest.exe ../*.sh tuna.tar data.gz
    ../pfiles.sh: writing mac.zip via uad to in1
    ../pfiles.sh: writing ../../../magistr/earliest.exe via uad to in2
    ../pfiles.sh: writing ../files.sh via uad to in3
    ##==> Checking file: "../../../magistr/earliest.exe"
    ##==>>>> VIRUS POSSIBLE IN FILE: "../../../magistr/earliest.exe"
    ##==>>>> VIRUS ID: CVDL Earliest
    ##==>>>> VIRUS END OFFSET: 190, matched: x00x000x00x00x00x00@x00x00x10x00x00x00x02
    ##==> Checking file: "../files.sh"
    ##==> Number of possible virus infections found in file "../files.sh": 0
    ##==> Number of possible virus infections found in file "../../../magistr/earliest.exe": 1
    ../pfiles.sh: writing ../pfiles.sh via uad to in3
    ../pfiles.sh: writing ../pipes.sh via uad to in2
    ##==> Checking file: "mac.zip" -> "mac-bad.vdl"
    ##==> Checking file: "mac.zip" -> "mac1.vdl"
    ##==> Checking file: "mac.zip" -> "mac1a.txt"
    ##==> Checking file: "mac.zip" -> "mac1b.txt"
    ##==> Checking file: "mac.zip" -> "mac2.txt"
    ##==> Checking file: "mac.zip" -> "mac2.vdl"
    ##==> Number of possible virus infections found in file "mac.zip": 0
    ##==> Checking file: "../pfiles.sh"
    ##==> Number of possible virus infections found in file "../pfiles.sh": 0
    ##==> Checking file: "../pipes.sh"
    ##==> Number of possible virus infections found in file "../pipes.sh": 0
    ../pfiles.sh: writing tuna.tar via uad to in2
    ../pfiles.sh: writing data.gz via uad to in3
    ##==> Checking file: "tuna.tar" -> "tuna.msg"
    ##==> Checking file: "tuna.tar" -> "tuna.newout"
    ##==> Checking file: "tuna.tar" -> "tuna.txt"
    ##==> Checking file: "tuna.tar" -> "tuna.vdl"
    ##==> Number of possible virus infections found in file "tuna.tar": 0
    ##==> Checking file: "data.gz" -> ""
    ##==> Number of possible virus infections found in file "data.gz": 0

Note that after initially feeding one file each to in1, in2, and in3, the
remaining files are distributed to the daemons as they become available; in
this case the order was in3, in2, in2, in3 for the remaining four files.
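
The distribution order seen in this run follows from the pid-file locking
described in section 3: a daemon is skipped while a pid file exists for it,
and becomes eligible again once that file is gone.  A minimal sketch of the
selection step is given below.  The pid file names (pid1, pid2, ...), the
function name next_free, and the fake pids are assumptions made purely for
illustration; this README does not say what pfiles.sh actually calls them.

```shell
#!/bin/sh
# Hypothetical sketch of picking a free daemon by checking for pid files.
# The "pidN" naming and next_free are illustrative, not taken from pfiles.sh.

PIPEDIR=$(mktemp -d)             # scratch directory for this example
echo 3 > "$PIPEDIR/NUM"          # pipes.sh records the daemon count in NUM

# Print the number of the first daemon without a pid file; fail if all
# daemons are busy.
next_free() {
    num=$(cat "$PIPEDIR/NUM")
    i=1
    while [ "$i" -le "$num" ]; do
        if [ ! -f "$PIPEDIR/pid$i" ]; then
            echo "$i"
            return 0
        fi
        i=$((i + 1))
    done
    return 1
}

# Mark daemons 1 and 2 as busy, as if a tail process were reading each output.
echo 1001 > "$PIPEDIR/pid1"
echo 1002 > "$PIPEDIR/pid2"
first=$(next_free)               # only daemon 3 is free at this point

# Daemon 2 finishes: removing its pid file makes it available again.
rm -f "$PIPEDIR/pid2"
second=$(next_free)

echo "first=$first second=$second"
rm -rf "$PIPEDIR"
```

In this sketch the check and the later creation of the pid file are separate
steps, which is only safe when a single dispatcher makes all the choices;
several independent dispatchers sharing the same daemons would need an atomic
claim instead.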