CVDL Tutorial

Introduction

Copyright © August 2001 by CyberSoft, Incorporated.

Permission is granted to any individual or institution to use, copy, or redistribute this document so long as it is not sold for profit, and provided that it is reproduced whole and this copyright notice is retained.

CVDL is the CyberSoft Virus Description Language. It is used to define patterns for virus scanning by the VFind Security ToolKit from CyberSoft, Incorporated.

The presentation here is a tutorial on using CVDL to create your own patterns. It starts with some simple examples, then presents an overview of all of the CVDL operators.

** VFind™ Version 11.3.0 or higher is required **

VDL Format

The format for specifying a VDL is:

:name, definition#

The VDL name can consist of any characters, including spaces, but cannot contain a comma, since the comma separates the name from the definition. The VDL definition consists of a sequence of patterns and operators. The definition is terminated by the # character.

The simplest patterns are literal strings, e.g. "abc" or "\x01\x4d\xaa", where "\x.." is used to specify bytes in hex. Prefixing a string with a tilde (~) specifies case-insensitive matching, e.g. ~"abc" will match "abc", "ABC", "AbC", etc.

The simplest operators are the high-level logic operators: AND, OR, NOT, XOR.

String and Logic Examples

The ex1 example VDL uses some strings and logic operators:

:ex1, "pets" AND "cat" OR "dog" #

The precedence of the logic operators is, in order of decreasing precedence: NOT, AND, XOR, OR. Parentheses can be used to group operations into a desired execution order irrespective of operator precedence.

Due to the higher precedence of AND as compared to OR, VDL ex1 from above is equivalent to:

:ex1, ("pets" AND "cat") OR "dog" #

This matches data that contains either the word pets and the word cat, each anywhere, or the word dog anywhere.

If the intention is to match data containing the word pets anywhere, and either the word dog or the word cat anywhere, the VDL should be written like this instead:

:ex2, "pets" AND ("cat" OR "dog") #

This is equivalent to the more verbose form:

:ex2, ("pets" AND "cat") OR ("pets" AND "dog") #
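The difference between ex1 and ex2 can be checked with a small Python sketch. It only models the semantics described above and is not VFind code; the helper names has, ex1 and ex2 are made up for illustration.

```python
def has(data: bytes, word: bytes) -> bool:
    """Model a literal string pattern: true if word occurs anywhere in data."""
    return word in data


def ex1(data: bytes) -> bool:
    # "pets" AND "cat" OR "dog"  is read as  ("pets" AND "cat") OR "dog"
    return (has(data, b"pets") and has(data, b"cat")) or has(data, b"dog")


def ex2(data: bytes) -> bool:
    # "pets" AND ("cat" OR "dog")
    return has(data, b"pets") and (has(data, b"cat") or has(data, b"dog"))


sample = b"my dog barks"
print(ex1(sample), ex2(sample))
```

Data that mentions dog but not pets satisfies ex1 and fails ex2, which is exactly the case the parentheses are meant to control.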

XOR and NOT

XOR is the exclusive-OR operator, which by definition can be expressed using AND, OR, and NOT.

For example, VDL ex3:

; guns or bullets, but not both
:ex3, ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns") #

can be expressed more efficiently using XOR:

:ex3, "guns" XOR "bullets" # ; guns or bullets, but not both

ex3 also illustrates the use of the semicolon (;) in VDL files to create comments which extend to the end of the line.
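As a quick sanity check, the following Python sketch compares the two forms of ex3 over every combination of "guns present" and "bullets present". The function names are hypothetical and the booleans stand in for whether each string was found in the scanned data.

```python
from itertools import product


def ex3_long(guns: bool, bullets: bool) -> bool:
    # ("guns" AND NOT "bullets") OR ("bullets" AND NOT "guns")
    return (guns and not bullets) or (bullets and not guns)


def ex3_xor(guns: bool, bullets: bool) -> bool:
    # "guns" XOR "bullets"
    return guns != bullets


# Compare both forms across the full truth table.
for guns, bullets in product([False, True], repeat=2):
    assert ex3_long(guns, bullets) == ex3_xor(guns, bullets)
```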

NOT by itself seems to be a strange logical operator for pattern matching, since it means that we have a match if some pattern is not present.

However, consider a situation where all files or email must contain a certain notice, and we want to detect the lack of that notice. An example pattern is:

:ex4, NOT "Copyright 2001, CyberSoft, Inc." #

Concatenation and Offsets

VDL pattern elements are concatenated into larger patterns by using a comma, for example:

"abc", "def"

is the same as:

"abcdef"

In a VDL definition the comma means: followed by.

You can specify an offset range for concatenation of strings and other VDL patterns using the @ operator, for example:

"abc", @0-10, "def"

will match "abc" followed by "def" at an offset of anywhere from 0 to 10 bytes from the end of "abc". So this will match "abcdef", "abcXdef", ..., "abcXXXXXXXXXXdef", where X represents any byte.
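For readers who know regular expressions, the offset example behaves roughly like the Python pattern below. This is an analogy only, not a description of how VFind evaluates the rule; the names ABC_DEF and matches are illustrative.

```python
import re

# Rough regex analogue of:  "abc", @0-10, "def"
# DOTALL lets "." stand for any byte, including newline.
ABC_DEF = re.compile(rb"abc.{0,10}def", re.DOTALL)


def matches(data: bytes) -> bool:
    """True if "def" starts 0 to 10 bytes after the end of "abc"."""
    return ABC_DEF.search(data) is not None
```

With this model, ten intervening bytes still match, while eleven do not.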

Whitespace

There are two special operators, WS0 and WS1, for offsets consisting of white space characters, where white space is defined as " " (blank), "\t" (tab), or "\\\n" (backslash newline):

WS0 - matches zero or more white space characters
WS1 - matches one or more white space characters

Examples:

  • "/bin/rm", WS1, "-rf", WS1, "/"
  • "cat", WS0, ">>", WS0, "/etc/passwd"
  • ~"Bulletproof", WS1, ~"Web", WS1, ~"Hosting"
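The first example can be approximated with a Python regular expression built from the white space definition given above (blank, tab, or backslash newline). This is a sketch for intuition only; the names WS, WS0, WS1 and RM_RF are Python variables, not CVDL syntax.

```python
import re

# One white space unit: blank, tab, or backslash followed by newline.
WS = rb"(?:[ \t]|\\\n)"
WS0 = WS + rb"*"  # zero or more white space units
WS1 = WS + rb"+"  # one or more white space units

# Regex analogue of:  "/bin/rm", WS1, "-rf", WS1, "/"
RM_RF = re.compile(rb"/bin/rm" + WS1 + rb"-rf" + WS1 + rb"/")


def is_rm_rf_root(data: bytes) -> bool:
    return RM_RF.search(data) is not None
```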

Absolute Offsets

The ABS operator is used to specify an absolute offset from the beginning of the scanned data to match a pattern.

Example VDLs:

  • :a1, ABS 0, "#!", WS0, "/bin/sh" #
  • :a2, "abc", @0-20, "def" AND ABS 14, "01234" #
  • :MS/VBA, ABS 0, "\xD0\xCF\x11\xE0\xA1\xB1\x1A\xE1" AND "\xFE\xCA" #

The a1 VDL checks for a Bourne shell script file header.

The a2 VDL checks for "abc" followed by "def" within the next 20 bytes, and "01234" at absolute position 14.

The MS/VBA VDL uses ABS 0 to check for the 8-byte Microsoft signature header which appears at the very beginning of most Microsoft application files.
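In Python terms, ABS behaves like a prefix test at a fixed position. The sketch below models the MS/VBA rule under that reading; at_abs, ms_vba and MS_HEADER are hypothetical names, and the code illustrates the semantics rather than VFind's implementation.

```python
# The 8-byte header from the MS/VBA rule above.
MS_HEADER = bytes.fromhex("D0CF11E0A1B11AE1")


def at_abs(data: bytes, offset: int, pattern: bytes) -> bool:
    """Model  ABS offset, pattern : pattern must start exactly at offset."""
    return data.startswith(pattern, offset)


def ms_vba(data: bytes) -> bool:
    # ABS 0, "<header>"  AND  "\xFE\xCA" anywhere in the data
    return at_abs(data, 0, MS_HEADER) and b"\xFE\xCA" in data
```

Note that the header must sit at offset 0, while the second string may appear anywhere because it is joined with the high-level AND.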

Phone Numbers

The ~~ operator was designed for matching phone numbers, and is an extension of the case-insensitive string matching ~ operator. ~~ performs case-insensitive string matching while skipping any white space and punctuation characters. White space and punctuation characters are those defined by the C library isspace() and ispunct() functions.

Examples:

~~"123-456-7890"

matches "(123)-456-7890" and "123.456.7890" and
"1 2 3 - 4 5 6 - 7 8 9 0", etc.

~~"800 FREE CAR"

matches "(800) - F r e e  C a r !!!", etc.
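One way to picture the ~~ operator is to strip white space and punctuation from both the pattern and the data, fold case, and then compare. The Python sketch below does that using string.whitespace and string.punctuation, which correspond to isspace() and ispunct() in the C locale. The function names are made up, and treating the pattern's own punctuation as skippable is an assumption drawn from the examples above.

```python
import string

# Bytes to skip: ASCII white space plus ASCII punctuation.
_SKIP = (string.whitespace + string.punctuation).encode("ascii")


def squeeze(data: bytes) -> bytes:
    """Drop white space and punctuation, then fold to lower case."""
    return data.translate(None, _SKIP).lower()


def phone_match(pattern: bytes, data: bytes) -> bool:
    """Approximate  ~~"pattern"  against data."""
    return squeeze(pattern) in squeeze(data)
```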

Digits

The \d+ operator matches one or more digits, and is named after the similar Perl operator.

This can be used, for example, to detect obfuscated URLs:

"http://", \d+, "/"

matches URLs like:  http://3626287830/

"http://0", \d+, ".", "0", \d+, ".", "0", \d+, ".", "0", \d+, "/"

matches URLs like:  http://00000325.0000030.00000341.00000116/
and can also be written using a macro:

$define zerod  "0", \d+
"http://", $zerod, ".", $zerod, ".", $zerod, ".", $zerod, "/"

CVDL macros will be discussed in more detail later in the tutorial.
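The two URL patterns above correspond closely to the following Python regular expressions, shown here only as a way to experiment with the idea. DECIMAL_HOST, ZEROD and PADDED_HOST are illustrative Python names; reusing ZEROD mirrors what the $zerod macro does in the VDL.

```python
import re

# Analogue of:  "http://", \d+, "/"
DECIMAL_HOST = re.compile(rb"http://\d+/")

# Analogue of:  $define zerod  "0", \d+
ZEROD = rb"0\d+"

# Analogue of:  "http://", $zerod, ".", $zerod, ".", $zerod, ".", $zerod, "/"
PADDED_HOST = re.compile(rb"http://" + rb"\.".join([ZEROD] * 4) + rb"/")
```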

Only Digits

The ~# operator matches only digits, skipping all other characters, over a default maximum range of 30 bytes of scanned input data.

The maximum range of scanned input data can be specified by placing a number between ~# and the digit string.

Examples:

~#"code 1234 sub-code 567"

matches the digits 1234567 in sequence, regardless of any intervening
non-digit characters, over any 30 byte range of scanned input data,
e.g. it will match "1abc2efg34---5 6 7"

~#60"code 1234 sub-code 567"

As above, but over a maximum range of 60 bytes of input data.
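A minimal Python model of ~# is sketched below. It assumes that only the digits of the pattern matter, that non-digit bytes in the data are skipped, and that the whole match must fit inside a window of the given size; how VFind measures the range exactly is not stated here, so treat the window handling as an assumption. The function names are hypothetical.

```python
def digits_of(raw: bytes) -> bytes:
    """Keep only the ASCII digit bytes of raw."""
    return bytes(b for b in raw if 0x30 <= b <= 0x39)


def only_digits(pattern: bytes, data: bytes, limit: int = 30) -> bool:
    """Approximate  ~#limit"pattern"  (limit defaults to 30 bytes)."""
    want = digits_of(pattern)
    if not want:
        return False
    # Slide a window of `limit` bytes over the data and compare digits only.
    for start in range(len(data)):
        window = data[start:start + limit]
        if digits_of(window).startswith(want):
            return True
    return False
```

Under this model, only_digits(b"code 1234 sub-code 567", data, limit=60) corresponds to the ~#60 form.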

Low Level OR

The low-level OR operator (|) specifies the occurrence of patterns at a position relative to the preceding pattern in the scanned data.

For example:

"x", ("a" | "b"), "y"

matches "x", followed by "a" or "b", followed by "y" at some position in the scanned data.

Do not confuse the low-level | operator with the high-level OR operator. The high-level OR operator specifies the occurrence of patterns at any positions in the scanned data.
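The contrast can be illustrated in Python: the low-level form behaves like alternation inside one contiguous pattern, while the high-level form only asks whether each string occurs somewhere. Both functions below are illustrative models with made-up names, not VFind internals.

```python
import re

# Analogue of:  "x", ("a" | "b"), "y"   (position-relative)
LOW = re.compile(rb"x(?:a|b)y")


def low_level(data: bytes) -> bool:
    return LOW.search(data) is not None


def high_level(data: bytes) -> bool:
    # Analogue of:  "x" AND ("a" OR "b") AND "y"   (positions independent)
    return b"x" in data and (b"a" in data or b"b" in data) and b"y" in data
```

Data such as "y..b..x" satisfies the high-level form but not the low-level one, because the pieces are present but not adjacent and in order.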

Byte Expressions

Bytes, byte ranges, and complements of bytes and byte ranges can be specified using characters and decimal or hex integers. For example:

65
0x41
'\x41'
'A'

any of these matches a byte whose value is 65 (decimal)

'a'-'f'

matches a byte whose value is in the range 0x61-0x66

^0-10

matches a byte whose value is not in the range 0-10
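The byte values involved are easy to confirm in Python. The helpers in_range and not_in_range are hypothetical names used only to model a range and its complement, and inclusive bounds are assumed.

```python
# Four spellings of the same byte value, as in the list above.
assert 65 == 0x41 == ord("\x41") == ord("A")


def in_range(byte: int, low: int, high: int) -> bool:
    """Model a byte range such as 'a'-'f' (bounds assumed inclusive)."""
    return low <= byte <= high


def not_in_range(byte: int, low: int, high: int) -> bool:
    """Model a complemented range such as ^0-10."""
    return not in_range(byte, low, high)


# The character range 'a'-'f' in hex.
print(hex(ord("a")), hex(ord("f")))  # 0x61 0x66
```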

Fuzzy Expressions

Fuzzy expressions are a convenient way of specifying ranges for bytes and strings. Fuzziness is specified using integers which represent plus and minus offsets.

Examples:

  • FUZZY 2 100, FUZZY +-2 100, and FUZZY -2 +2 100 are all the same as: 98-102
  • FUZZY -2 +3 "cow" is the same as: 'a'-'f', 'm'-'r', 'u'-'z'

Recognized case-sensitive spellings for the fuzzy operator are: FUZZY, Fuzzy, fuzzy, FUZZ, Fuzz, fuzz
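The arithmetic behind the fuzzy examples can be reproduced with a short Python function. The name fuzzy and its argument order are invented for this sketch, and clamping the result to the 0-255 byte range is an assumption, since the behavior at the edges is not described here.

```python
def fuzzy(minus: int, plus: int, value):
    """Return one inclusive (low, high) byte range per byte of value.

    value is either a single byte value (int) or a byte string.
    """
    raw = bytes([value]) if isinstance(value, int) else value
    return [(max(0, b - minus), min(255, b + plus)) for b in raw]


# FUZZY 2 100  ->  98-102
print(fuzzy(2, 2, 100))

# FUZZY -2 +3 "cow"  ->  'a'-'f', 'm'-'r', 'u'-'z'
print([(chr(lo), chr(hi)) for lo, hi in fuzzy(2, 3, b"cow")])
```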

Repetition Expressions

Multiple occurrences of bytes, byte ranges, strings, or case-insensitive strings can be specified by using [number] after the expression. For example:

15[20]
0xF[0x14]

either form matches 20 occurrences of the byte value 15

0-10[40]

matches a sequence of 40 bytes whose values are in the range 0..10

"X"[3]

matches "XXX"
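The repetition examples have direct Python counterparts, shown below purely as an illustration. TWENTY_FIFTEENS and FORTY_LOW are names chosen for this sketch.

```python
import re

# 15[20] and 0xF[0x14]: twenty bytes of value 15.
TWENTY_FIFTEENS = bytes([15]) * 20
assert TWENTY_FIFTEENS == bytes([0xF]) * 0x14

# 0-10[40]: forty consecutive bytes, each with a value in the range 0..10.
FORTY_LOW = re.compile(rb"[\x00-\x0a]{40}")

# "X"[3]
assert b"X" * 3 == b"XXX"
```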

Defining VDL Macros

A VDL macro is specified using $define as the first word on a line, and the entire macro definition must be contained all on one line.

The syntax is:

$define name value

where the line contains: optional leading white space, $define, white space, name, white space, value.

The macro name must start with a letter or underscore ([a-zA-Z_]), followed by zero or more letters or digits. Macro names are case-sensitive. The value continues to the end of the line, and trailing white space is trimmed.

Using VDL Macros

VDL macros are invoked by specifying their name after a $ character.

Macros are lexical tokens, which means that they cannot be confused with other tokens, e.g. strings. Thus:

"abc", $mac, ...

invokes the macro named mac, but:

"abc$mac", ...

does not invoke any macro; it is simply a literal string.

VDL macros can be nested to unlimited depth, so macros can refer to other macros in their definition. Macros cannot be used in a VDL rule before being defined, but they can be used in other $define lines before being defined. Macros have per-file scope; macros defined in one VDL file do not carry over to other VDL files.

VDL Macro Examples

Examples:

$define pf1 $pets AND $food
$define pets "dog" OR "cat"
$define food "fish" OR "pie"
$define pf2 ($pets) AND ($food)
:v1, $pf1 AND "ate" #
:v2, "ate" AND $pf2 #

Note that pf1 resolves to:

"dog" OR "cat" AND "fish" OR "pie"

which is the same as:

"dog" OR ("cat" AND "fish") OR "pie"

but pf2 resolves to:

("dog" OR "cat") AND ("fish" OR "pie")

As with C/C++ #define macros, parentheses may be used in the VDL macro definition or invocation to ensure that the intended result is obtained.
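The textual nature of macro expansion can be imitated with a small Python expander. This is a sketch under stated assumptions, not the VFind parser: the expand function, the MACROS table and the round limit are all invented here, and underscores are assumed to be allowed anywhere in a macro name. Quoted strings are skipped so that "abc$mac" stays a literal string.

```python
import re

# Either a quoted string (left untouched) or a $name macro reference.
_TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\$([A-Za-z_][A-Za-z0-9_]*)')

MACROS = {
    "pf1": '$pets AND $food',
    "pets": '"dog" OR "cat"',
    "food": '"fish" OR "pie"',
    "pf2": '($pets) AND ($food)',
}


def expand(text: str, macros: dict, max_rounds: int = 100) -> str:
    """Substitute macro references repeatedly until nothing changes."""
    def substitute(match):
        name = match.group(1)
        # group(1) is None when the token is a quoted string.
        return match.group(0) if name is None else macros[name]

    for _ in range(max_rounds):
        expanded = _TOKEN.sub(substitute, text)
        if expanded == text:
            return text
        text = expanded
    raise ValueError("macro expansion did not terminate")


print(expand("$pf1", MACROS))
print(expand("$pf2", MACROS))
```

Because the substitution is purely textual, the parentheses in pf2 are what keep each OR grouped before the AND is applied.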

File Type Restriction Directives

File type restriction directives may be specified in CVDL files. When specified outside of a VDL, these directives have VDL file scope and apply only to VDL rules that follow the directive in the same VDL file. When specified as part of a VDL, the directives apply only to that particular VDL. The file type restrictions apply only if VFind is reading SmartScan input from UAD, which reports the file types.

The directives and their meanings are:

  • <"...",...>     specifies a list of file types to scan,
    i.e. scan only the file types specified.
  • <!"...",...>    specifies a list of file types to not scan,
    i.e. scan everything except for the file types specified.
  • <>              resets to scan everything.

If the SmartScan file type reported by UAD is "unknown", or if VFind is run standalone (without SmartScan input), then all VDL file type restrictions are ignored and everything is scanned.

File Type Restriction Examples

Examples:

:v1,"..."# ; all file types
<"text"> ; only "text" file types for the following vdls
:v2,"..."#
:v3,"..."#
<!"HTML"> ; no "HTML" file types for the following vdls
:v4,"..."#
:v5, <"JPEG","GIF"> "..."# ; only "JPEG" and "GIF" file types for v5
:v6,"..."#
<> ; all file types for the following vdls
:v7,"..."#

  • v1 will be used for all file types.
  • v2 and v3 will only be used for "text" file types.
  • v4 and v6 will only be used for non-"HTML" file types.
  • v5 will only be used for "JPEG" and "GIF" file types.
  • v7 will be used for all file types.

Matching for file type restrictions is case-sensitive, and only requires that the VDL-restricted type be a substring of the SmartScan-reported type.
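The restriction rules described in this section can be summarized as a small Python decision function. The name should_scan, its parameters, and the sample type strings used with it are hypothetical; only the case-sensitive substring rule and the handling of "unknown" and standalone runs come from the text above.

```python
from typing import Optional, Sequence


def should_scan(reported: Optional[str],
                only: Optional[Sequence[str]] = None,
                exclude: Optional[Sequence[str]] = None) -> bool:
    """Decide whether a VDL applies to a file of the reported type.

    only    models <"...",...>   (scan only these types)
    exclude models <!"...",...>  (scan everything except these types)
    Leaving both as None models <> (scan everything).
    reported is None when there is no SmartScan input.
    """
    if reported is None or reported == "unknown":
        return True  # restrictions are ignored
    if only is not None:
        return any(wanted in reported for wanted in only)
    if exclude is not None:
        return not any(banned in reported for banned in exclude)
    return True
```

For instance, should_scan("html page", exclude=["HTML"]) is true in this model because the comparison is case-sensitive.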

VDL Version Reporting

Versions for VDL files and rules can now be reported using an extension to the file type restriction syntax. If you use a string starting with version= in a file type restriction directive, whatever follows the = character in that string will be printed as an informative message about the version of the VDL file or rule.

Here is an example which specifies a version for the VDL file and a version for VDL rule `b':

% cat v.vdl
<"text","version=1.2.3">
:a, "abc"#
:b, <"version=9.9"> "bbb"#

% vfind --vdl=v.vdl hi
...
##==>> Loading VDL code from: v.vdl
##==>> All SmartScan file types disabled.
##==>> SmartScan file type `*text*' enabled.
##==>> VDL file `v.vdl' Version: 1.2.3
##==> VDL model for `a' loaded.
##==> VDL `b' Version: 9.9
##==> VDL model for `b' loaded.
##==> Checking file: "hi"
...

VFind --vdlc Option

VFind uses a parallel search engine, so scanning run-time is mostly independent of the number of VDL rules.

The default VFind parallel search engine does not handle case-insensitive ~"..." VDL strings, so VDLs containing such strings generally run slower.

There is, however, a special parallel search engine which does handle case-insensitive string matching.

To specify a VDL file which mostly contains case-insensitive string matches, use the --vdlc= option instead of --vdl=.

All VDL rules are checked precisely by VFind if triggered by the parallel search engines, to avoid false detects.

The --vdlc= option only affects the scanning speed and does not affect the precision of virus detection.