Copyright © August 2001 by CyberSoft, Incorporated.
Permission is granted to any individual or institution to use, copy, or redistribute this document so long as it is not sold for profit, and provided that it is reproduced whole and this copyright notice is retained.
CVDL is the CyberSoft Virus Description Language. It is used to define patterns for virus scanning by the VFind Security ToolKit from CyberSoft, Incorporated.
The presentation here is a tutorial on using CVDL to create your own patterns. It starts with some simple examples, then presents an overview of all of the CVDL operators.
** VFind™ Version 11.3.0 or higher is required **
The format for specifying a VDL is:
:name, definition#
The VDL name can consist of any characters, including spaces, but cannot contain a comma, since the comma separates the name from the definition. The VDL definition consists of a sequence of patterns and operators. The definition is terminated by the # character.
The simplest patterns are literal strings, e.g. "abc" or "\x01\x4d\xaa", where "\x.." is used to specify bytes in hex. Prefixing a string with a tilde (~) specifies case-insensitive matching, e.g. ~"abc" will match "abc", "ABC", "AbC", etc.
The simplest operators are the high-level logic operators: AND, OR, NOT, XOR.
The ex1 example VDL uses some strings and logic operators:
:ex1, "pets" AND "cat" OR "dog" #
The precedence of the logic operators is, in order of decreasing precedence: NOT, AND, XOR, OR. Parentheses can be used to group operations into a desired execution order irrespective of operator precedence.
Due to the higher precedence of AND as compared to OR, VDL ex1 from above is equivalent to:
:ex1, ("pets" AND "cat") OR "dog" #
This matches data that contains either the word pets anywhere and the word cat anywhere, or the word dog anywhere.
If the intention is to match data containing the word pets anywhere, and either the word dog or the word cat anywhere, the VDL should be written like this instead:
:ex2, "pets" AND ("cat" OR "dog") #
This is equivalent to the more verbose form:
:ex2, ("pets" AND "cat") OR ("pets" AND "dog") #
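The difference between ex1 and ex2 can be checked with a small Python model. This is only a sketch of the logic, not VFind's implementation: it assumes that each quoted string simply means "present anywhere in the data", and the function names are made up for the illustration.

```python
def has(data: bytes, word: bytes) -> bool:
    """Model of a literal string pattern: present anywhere in the data."""
    return word in data


def ex1(data: bytes) -> bool:
    # Python's `and` binds tighter than `or`, just as AND outranks OR in CVDL.
    return has(data, b"pets") and has(data, b"cat") or has(data, b"dog")


def ex2(data: bytes) -> bool:
    # Parentheses force the OR to be evaluated first.
    return has(data, b"pets") and (has(data, b"cat") or has(data, b"dog"))


def ex2_verbose(data: bytes) -> bool:
    # The distributed form of ex2.
    return (has(data, b"pets") and has(data, b"cat")) or (
        has(data, b"pets") and has(data, b"dog")
    )


sample = b"my dog barks"
print(ex1(sample), ex2(sample))  # True False
```

The sample contains dog but not pets, so ex1 matches and ex2 does not.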
XOR is the exclusive-OR operator, which by definition can be expressed using AND and OR.
For example, VDL ex3:
; guns or bullets, but not both
:ex3, ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns") #
Can be expressed more efficiently using XOR:
:ex3, "guns" XOR "bullets" # ; guns or bullets, but not both
ex3 also illustrates the use of the semicolon (;) in VDL files to create comments which extend to the end of the line.
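That the two forms of ex3 agree can be confirmed by enumerating the four possible cases. The Python sketch below models only the boolean logic; the two booleans stand for "guns is present" and "bullets is present", and the function names are hypothetical.

```python
from itertools import product


def ex3_long(guns: bool, bullets: bool) -> bool:
    # ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns")
    return (guns and not bullets) or (bullets and not guns)


def ex3_xor(guns: bool, bullets: bool) -> bool:
    # "guns" XOR "bullets": true when exactly one of the two is present
    return guns != bullets


# Compare both forms over every combination of present / absent.
table = [
    (g, b, ex3_long(g, b), ex3_xor(g, b))
    for g, b in product([False, True], repeat=2)
]
for row in table:
    print(row)
```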
NOT by itself seems to be a strange logical operator for pattern matching, since it means that we have a match if some pattern is not present.
However, consider a situation where all files or email must contain a certain notice, and we want to detect the lack of that notice. An example pattern is:
:ex4, NOT "Copyright 2001, CyberSoft, Inc." #
VDL pattern elements are concatenated into larger patterns by using a comma, for example:
"abc", "def"
is the same as:
"abcdef"
In a VDL definition the comma means: followed by.
You can specify an offset range for concatenation of strings and other VDL patterns using the @ operator, for example:
"abc", @0-10, "def"
will match "abc" followed by "def" at an offset of anywhere from 0 to 10 bytes from the end of "abc". So this will match "abcdef", "abcXdef", ..., "abcXXXXXXXXXXdef", where X represents any byte.
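For readers who know regular expressions, the offset range above behaves roughly like a bounded "any byte" gap. The Python sketch below is an analogy under that assumption, not a description of how VFind evaluates the @ operator.

```python
import re

# "abc", @0-10, "def" modelled as: abc, then 0 to 10 arbitrary bytes, then def.
# re.DOTALL lets "." match every byte value, including newline.
ABC_GAP_DEF = re.compile(rb"abc.{0,10}def", re.DOTALL)


def matches(data: bytes) -> bool:
    return ABC_GAP_DEF.search(data) is not None


print(matches(b"abcdef"))                    # True
print(matches(b"abc" + b"X" * 10 + b"def"))  # True
print(matches(b"abc" + b"X" * 11 + b"def"))  # False
```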
There are two special operators, WS0 and WS1, for offsets consisting of white space characters, where white space is defined as " " (blank), "\t" (tab), or "\\\n" (backslash newline):
WS0 - matches zero or more white space characters
WS1 - matches one or more white space characters
Examples:
The ABS operator is used to specify an absolute offset from the beginning of the scanned data to match a pattern.
Example VDLs:
The a1 VDL checks for a Bourne shell script file header.
The a2 VDL checks for "abc" followed by "def" within the next 20 bytes, and "01234" at absolute position 14.
The MS/VBA VDL uses ABS 0 to check for the 8-byte Microsoft signature header which appears at the very beginning of most Microsoft application files.
The ~~ operator was designed for matching phone numbers, and is an extension of the case-insensitive string matching ~ operator. ~~ performs case-insensitive string matching while skipping any white space and punctuation characters. White space and punctuation characters are those defined by the C library isspace() and ispunct() functions.
Examples:
~~"123-456-7890"
matches "(123)-456-7890" and "123.456.7890" and
"1 2 3 - 4 5 6 - 7 8 9 0", etc.
~~"800 FREE CAR"
matches "(800) - F r e e C a r !!!", etc.
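One way to picture the ~~ operator is to strip white space and punctuation from both the pattern and the data, fold case, and then look for the pattern. The Python sketch below is a model under that assumption only. It uses string.whitespace and string.punctuation as stand-ins for isspace() and ispunct() in the C locale, and the helper names are invented for the example.

```python
import string

# ASCII white space and punctuation, standing in for isspace() / ispunct().
_SKIP = set((string.whitespace + string.punctuation).encode("ascii"))


def squeeze(data: bytes) -> bytes:
    """Drop white space and punctuation, then lower-case what is left."""
    return bytes(b for b in data if b not in _SKIP).lower()


def loose_match(pattern: bytes, data: bytes) -> bool:
    """Model of ~~"pattern" applied to data."""
    return squeeze(pattern) in squeeze(data)


print(loose_match(b"123-456-7890", b"call (123)-456-7890 now"))
print(loose_match(b"800 FREE CAR", b"(800) - F r e e C a r !!!"))
```

Both calls print True, in line with the examples above.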
The \d+ operator matches one or more digits, and is named after the similar Perl operator.
This can be used, for example, to detect obfuscated URLs:
"http://", \d+, "/"
matches URLs like: http://3626287830/
"http://0", \d+, ".", "0", \d+, ".", "0", \d+, ".", "0", \d+, "/"
matches URLs like: http://00000325.0000030.00000341.00000116/
and can also be written using a macro:
$define zerod "0", \d+
"http://", $zerod, ".", $zerod, ".", $zerod, ".", $zerod, "/"
CVDL macros will be discussed in more detail later in the tutorial.
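The two URL patterns above can be approximated with Python regular expressions, and the sample hosts can be decoded to show which addresses they hide. This is an illustrative sketch: the regexes are analogies for the CVDL patterns, and the decoding relies on the long-standing convention that a host may be written as one 32-bit decimal number or as zero-prefixed octal octets.

```python
import ipaddress
import re

# "http://", \d+, "/"
DWORD_URL = re.compile(rb"http://\d+/")

# $define zerod "0", \d+   repeated four times with "." in between
ZEROD = rb"0\d+"
OCTAL_URL = re.compile(rb"http://" + rb"\.".join([ZEROD] * 4) + rb"/")


def decode_dword(host: str) -> str:
    """Read the host as a single 32-bit decimal number."""
    return str(ipaddress.ip_address(int(host)))


def decode_octal(host: str) -> str:
    """Read each dot-separated part as an octal number."""
    return ".".join(str(int(part, 8)) for part in host.split("."))


print(decode_dword("3626287830"))
print(decode_octal("00000325.0000030.00000341.00000116"))
```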
The ~# operator matches only digits, skipping all other characters, over a default maximum range of 30 bytes of scanned input data.
The maximum range of scanned input data can be specified by placing a number between ~# and the digit string.
Examples:
~#"code 1234 sub-code 567"
matches the digits 1234567 in sequence, regardless of any intervening
non-digit characters, over any 30 byte range of scanned input data,
e.g. it will match "1abc2efg34---5 6 7"
~#60"code 1234 sub-code 567"
As above, but over a maximum range of 60 bytes of input data.
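The behaviour described for ~# can be modelled in Python as follows. This is a sketch under stated assumptions: only the digits of the pattern are significant, any run of non-digit bytes may sit between them, and the "range" is taken to be the distance from the first matched digit to the last. How VFind actually measures the range is not stated here, and the function name is hypothetical.

```python
import re


def digits_only_match(pattern: bytes, data: bytes, max_range: int = 30) -> bool:
    """Model of ~#"pattern" (default range) or ~#N"pattern" (max_range=N)."""
    digits = re.findall(rb"\d", pattern)  # keep only the digits of the pattern
    if not digits:
        return False
    body = rb"\D*".join(digits)  # any non-digit bytes allowed between digits
    # A lookahead lets overlapping candidate matches be examined as well.
    finder = re.compile(rb"(?=(" + body + rb"))")
    return any(len(m.group(1)) <= max_range for m in finder.finditer(data))


print(digits_only_match(b"code 1234 sub-code 567", b"1abc2efg34---5 6 7"))
```

With a 40-byte filler between 1234 and 567 the default call no longer matches, while max_range=60 does.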
The low-level or operator | specifies the occurrence of patterns at a position relative to the preceding pattern in the scanned data.
For example:
"x", ("a" | "b"), "y"
matches "x", followed by "a" or "b", followed by "y" at some position in the scanned data.
Do not confuse the low-level | operator with the high-level OR operator. The high-level OR operator specifies the occurrence of patterns at any positions in the scanned data.
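The contrast between the two kinds of "or" can be sketched in Python. The model below is an analogy only: the low-level form is treated like a regular expression alternative at a fixed position, and the high-level form is written as the CVDL rule "x" AND ("a" OR "b") AND "y", where each string may appear anywhere. The function names are made up.

```python
import re

LOW_LEVEL = re.compile(rb"x[ab]y")


def low_level(data: bytes) -> bool:
    """Model of: "x", ("a" | "b"), "y"  (positions are adjacent)."""
    return LOW_LEVEL.search(data) is not None


def high_level(data: bytes) -> bool:
    """Model of: "x" AND ("a" OR "b") AND "y"  (positions are independent)."""
    return b"x" in data and (b"a" in data or b"b" in data) and b"y" in data


print(low_level(b"..xby.."), high_level(b"..xby.."))  # True True
print(low_level(b"y b x"), high_level(b"y b x"))      # False True
```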
Bytes, byte ranges, and complements of bytes and byte ranges can be specified using characters and decimal or hex integers. For example:
65
0x41
'\x41'
'A'
any of these matches a byte whose value is 65 (decimal)
'a'-'f'
matches a byte whose value is in the range 0x61-0x66
^0-10
matches a byte whose value is not in the range 0-10
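The byte notations above can be cross-checked in Python. The sketch assumes that ranges are inclusive at both ends; the helper names are invented for the illustration.

```python
# Four spellings of the same byte value, as listed above.
SAME_BYTE = {65, 0x41, ord("\x41"), ord("A")}


def in_range(byte: int, low: int, high: int) -> bool:
    """Model of low-high, assumed inclusive at both ends."""
    return low <= byte <= high


def not_in_range(byte: int, low: int, high: int) -> bool:
    """Model of ^low-high: any byte outside the range."""
    return not in_range(byte, low, high)


# 'a'-'f' expressed as numeric bounds.
A_TO_F = (ord("a"), ord("f"))

print(SAME_BYTE)                       # {65}
print([hex(bound) for bound in A_TO_F])  # ['0x61', '0x66']
```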
Fuzzy expressions are a convenient way of specifying ranges for bytes and strings. Fuzziness is specified using integers which represent plus and minus offsets.
Examples:
Recognized case-sensitive spellings for the fuzzy operator are: FUZZY, Fuzzy, fuzzy, FUZZ, Fuzz, fuzz
Multiple occurrences of bytes, byte ranges, strings, or case-insensitive strings can be specified by using [number] after the expression. For example:
15[20]
0xF[0x14]
either form matches 20 occurrences of the byte value 15
0-10[40]
matches a sequence of 40 bytes whose values are in the range 0..10
"X"[3]
matches "XXX"
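In Python terms, the [number] suffix corresponds to repeating a byte string or bounding a regular expression character class. The sketch below is an analogy for the three examples above, not VFind code.

```python
import re

# 15[20] and 0xF[0x14] describe the same 20-byte sequence.
FIFTEEN_X20 = bytes([15]) * 20
assert FIFTEEN_X20 == bytes([0xF]) * 0x14

# 0-10[40]: forty consecutive bytes, each in the range 0..10.
RANGE_X40 = re.compile(rb"[\x00-\x0a]{40}")

# "X"[3]
TRIPLE_X = b"X" * 3


def has_range_run(data: bytes) -> bool:
    return RANGE_X40.search(data) is not None


print(len(FIFTEEN_X20), TRIPLE_X)  # 20 b'XXX'
```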
A VDL macro is specified using $define as the first word on a line, and the entire macro definition must be contained all on one line.
The syntax is:
$define name value
where the line contains: optional leading white space, $define, white space, name, white space, value.
The macro name must start with a letter or underscore [a-zA-Z_], followed by 0 or more letters or digits. Macro names are case-sensitive. The value continues to the end of the line, and trailing white space is trimmed.
VDL macros are invoked by specifying their name after a $ character.
Macros are lexical tokens, which means that they cannot be confused with other tokens, e.g. strings. Thus:
"abc", $mac, ...
invokes the macro named mac, but:
"abc$mac", ...
does not invoke any macro, and is simply a literal string.
VDL macros can be nested to unlimited depth, so macros can refer to other macros in their definitions. Macros cannot be used in a VDL rule before being defined, but they can be used in other $define lines before being defined. Macros have per-file scope; macros defined in one VDL file do not carry over to other VDL files.
Examples:
$define pf1 $pets AND $food
$define pets "dog" OR "cat"
$define food "fish" OR "pie"
$define pf2 ($pets) AND ($food)
:v1, $pf1 AND "ate" #
:v2, "ate" AND $pf2 #
Note that pf1 resolves to:
"dog" OR "cat" AND "fish" OR "pie"
which is the same as:
"dog" OR ("cat" AND "fish") OR "pie"
but pf2 resolves to:
("dog" OR "cat") AND ("fish" OR "pie")
As with C/C++ #define macros, parentheses may be used in the VDL macro definition or invocation to ensure that the intended result is obtained.
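The textual nature of macro expansion can be reproduced with a few lines of Python. This is a minimal sketch, not the VFind macro processor: it assumes plain text substitution, treats quoted strings as opaque tokens (without handling escaped quotes), and adds a depth limit of its own to stop on circular definitions.

```python
import re

# Either a quoted string (left alone) or a $name macro reference.
TOKEN = re.compile(r'"[^"]*"|\$([A-Za-z_][A-Za-z0-9_]*)')

MACROS = {
    "pf1": "$pets AND $food",
    "pets": '"dog" OR "cat"',
    "food": '"fish" OR "pie"',
    "pf2": "($pets) AND ($food)",
}


def expand(text: str, macros: dict, depth: int = 0) -> str:
    """Recursively replace $name references outside of quoted strings."""
    if depth > 50:
        raise RecursionError("macro definitions appear to be circular")

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name is None:  # a quoted string: keep it verbatim
            return match.group(0)
        return expand(macros[name], macros, depth + 1)

    return TOKEN.sub(replace, text)


print(expand("$pf1", MACROS))
print(expand("$pf2", MACROS))
print(expand('"abc$mac"', MACROS))
```

The first two lines printed are the pf1 and pf2 expansions shown above; the third shows that $mac inside a string is left untouched.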
File type restriction directives may be specified in CVDL files. When specified outside of a VDL, these directives have VDL file scope and apply only to VDL rules that follow the directive in the same VDL file. When specified as part of a VDL, the directives apply only to that particular VDL. The file type restrictions apply only if VFind is reading SmartScan input from UAD, which reports the file types.
The directives and their meanings are:
If the SmartScan file type reported by UAD is "unknown", or if VFind is run standalone (without SmartScan input), then all VDL file type restrictions are ignored and everything is scanned.
Examples:
:v1,"..."# ; all file types
<"text"> ; only "text" file types for the following vdls
:v2,"..."#
:v3,"..."#
<!"HTML"> ; no "HTML" file types for the following vdls
:v4,"..."#
:v5, <"JPEG","GIF"> "..."# ; only "JPEG" and "GIF" file types for v5
:v6,"..."#
<> ; all file types for the following vdls
:v7,"..."#
Matching for file type restrictions is case-sensitive, and only requires that the VDL-restricted type be a substring of the SmartScan-reported type.
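The matching rule for a single directive can be summarized in a short Python model. It is a sketch built only from the behaviour described in this section: case-sensitive substring matching, negation with !, an empty directive meaning all types, and restrictions being ignored for "unknown" types or standalone runs. The function name, its parameters, and the sample type strings are hypothetical, and how several directives interact is not modelled.

```python
from typing import Optional, Sequence


def rule_applies(
    reported: Optional[str],
    types: Sequence[str] = (),
    negated: bool = False,
) -> bool:
    """Decide whether a rule under one directive scans a file.

    reported: SmartScan type from UAD, or None when VFind runs standalone.
    types:    the strings inside <...>; empty models <> (all file types).
    negated:  True models the <!"..."> form.
    """
    if reported is None or reported == "unknown":
        return True  # restrictions are ignored, everything is scanned
    if not types:
        return True
    # Case-sensitive substring test against the reported type.
    hit = any(t in reported for t in types)
    return not hit if negated else hit


print(rule_applies("ASCII text", ["text"]))                  # True
print(rule_applies("ASCII Text", ["text"]))                  # False
print(rule_applies("HTML document", ["HTML"], negated=True))  # False
```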
Versions for VDL files and rules can now be reported using an extension to the file type restriction syntax. If you use a string starting with version= in a file type restriction directive, whatever follows the = character in that string will be printed as an informative message about the version of the VDL file or rule.
Here is an example which specifies a version for the VDL file and a version for VDL rule `b':
% cat v.vdl
<"text","version=1.2.3">
:a, "abc"#
:b, <"version=9.9"> "bbb"#
% vfind --vdl=v.vdl hi
...
##==>> Loading VDL code from: v.vdl
##==>> All SmartScan file types disabled.
##==>> SmartScan file type `*text*' enabled.
##==>> VDL file `v.vdl' Version: 1.2.3
##==> VDL model for `a' loaded.
##==> VDL `b' Version: 9.9
##==> VDL model for `b' loaded.
##==> Checking file: "hi"
...
VFind uses a parallel search engine, so scanning run-time is mostly independent of the number of VDL rules.
The default VFind parallel search engine does not handle case-insensitive ~"..." VDL strings, so VDLs containing such strings generally run slower.
However, a special parallel search engine does handle case-insensitive string matching.
To specify a VDL file which mostly contains case-insensitive string matches, use the --vdlc= option instead of --vdl=.
All VDL rules triggered by the parallel search engines are then checked precisely by VFind, to avoid false detections.
The --vdlc= option only affects the scanning speed and does not affect the precision of virus detection.