DOTE

Chain And Rate

Wednesday, July 4, 2012

Patterns of Activity: Signatures

A signature can be any unique feature that characterizes an email message, a web page, or a larger entity such as an entire web site. In almost all cases, signatures take the form of unique strings, such as a specific name or URL, but they can also be the organization of files in a directory or the structure of a URL. Strings are much easier to search for than these broader patterns, but both play a role in finding linked documents and sites.
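To make the idea of a structural signature concrete, here is a minimal sketch in Python that reduces a URL to its "shape". The normalization rules are assumptions chosen for illustration, not a standard method: the host and query string are dropped and every run of digits in the path is replaced with '#'. The function name url_shape and the example URLs are hypothetical.

```python
import re
from collections import Counter
from urllib.parse import urlsplit


def url_shape(url: str) -> str:
    """Reduce a URL to a structural signature (a hypothetical normalization).

    The scheme, hostname and query string are discarded and every run of
    digits in the path becomes '#', so URLs that differ only in host or in
    numeric IDs collapse to the same shape.
    """
    path = urlsplit(url).path.lower()
    return re.sub(r"\d+", "#", path)


# Made-up example URLs (documentation-only addresses and domains).
urls = [
    "http://198.51.100.7/secure/update/acct123/login.php?id=9",
    "http://example.net/secure/update/acct456/login.php",
    "http://example.org/about/index.html",
]

# Count how many URLs share each shape; a shape seen on many hosts is a
# candidate signature.
shapes = Counter(url_shape(u) for u in urls)
for shape, count in shapes.most_common():
    print(count, shape)
```

The first two URLs live on different hosts and carry different account numbers, yet both reduce to /secure/update/acct#/login.php, which is the kind of structural match that a plain string search would miss.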

Some examples of good signatures that illustrate their diversity:

Unique words
An unusual name of a person or location, or a word from a language other than the one used in the rest of the document.

IP addresses and hostnames
Addresses are inherently specific, but they tend to be changed frequently in spam messages.

Specific URLs and patterns within URLs
Although entire URLs may vary, the path to a document or the directory name may be conserved.

Mail message headers
In spam messages, headers are often varied in order to defeat filters, but similarities in their structure may define a unique signature.

Encoded images or data in email messages
Any part of a block of encoded data can serve as a unique signature for that block.

Directory listings on a web site
The names and sizes of files within a specific directory can serve as a unique signature.

A turn of phrase
An unusual or incorrect phrase within a block of text can stand out as a signature for that document. For example: "We receive many complaints concerning unsunctioned [sic] taking the money off the balance of our users recently."
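The directory-listing signature can be reduced to a single value that is easy to compare across sites by hashing the file names and sizes together. This is a minimal sketch, assuming you already have the listing as (name, size) pairs, for example parsed from an index page or read from a local mirror of the site. The helper names listing_signature and local_listing are hypothetical, and SHA-256 is simply one convenient choice of hash.

```python
import hashlib
import os
from typing import Iterable, List, Tuple


def listing_signature(entries: Iterable[Tuple[str, int]]) -> str:
    """Hash (name, size) pairs into one signature string.

    The pairs are sorted first, so the signature does not depend on the
    order in which the listing was read.
    """
    lines = sorted(f"{name}\t{size}" for name, size in entries)
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def local_listing(directory: str) -> List[Tuple[str, int]]:
    """Collect (name, size) pairs for the regular files in a local directory."""
    pairs = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        if os.path.isfile(path):
            pairs.append((name, os.path.getsize(path)))
    return pairs
```

Two mirrored directories that produce the same signature almost certainly hold the same set of files, while a single renamed or resized file changes the hash completely. That makes exact matches cheap to find, but it also means near-matches need a different comparison, such as counting shared (name, size) pairs.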

Searching with Signatures

Most signatures can be represented as regular expressions that can be used for searching through mail files or directories of web pages. The Unix grep command can be used to scan both of these, as long as care is taken to escape any characters that have special meaning to this command. It is a very efficient way to identify files that contain a match and can report the lines and line numbers where the matches are found. But in the case of email files, what you really need is a way to extract the individual messages that match, and grep cannot do this for you.
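As a rough illustration of why escaping matters, the sketch below does in Python roughly what grep -n -F does: it reports the line number and text of every line containing a literal signature. With literal=True the signature goes through re.escape(), so characters such as '.' and '?' match themselves; with literal=False it is treated as a regular expression. The function names and the script name sigsearch.py are hypothetical.

```python
import re
import sys


def matching_lines(signature, lines, literal=True):
    """Yield (line_number, line) for each line that contains the signature."""
    # re.escape() neutralizes regex metacharacters, like grep -F.
    pattern = re.compile(re.escape(signature) if literal else signature)
    for number, line in enumerate(lines, start=1):
        if pattern.search(line):
            yield number, line.rstrip("\n")


def grep_files(signature, paths, literal=True):
    """Print path:line_number:line for every match, in grep -n style."""
    for path in paths:
        with open(path, encoding="utf-8", errors="replace") as handle:
            for number, line in matching_lines(signature, handle, literal):
                print(f"{path}:{number}:{line}")


if __name__ == "__main__" and len(sys.argv) > 2:
    # Usage: python sigsearch.py SIGNATURE FILE...
    grep_files(sys.argv[1], sys.argv[2:])
```

The difference is easy to see with the signature login.php?id=. Searched literally, it matches the line "GET /login.php?id=7". Searched as a regular expression, it no longer matches that line, because '?' makes the preceding 'p' optional, and instead it matches the unrelated "GET /login.phpid=".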

Most email client programs allow you to search the content of messages, but they can be laborious to use and may not offer the flexibility that you need. A better option is a short script that splits a mail file into individual messages and writes out only those that match the signature. By examining those output messages, you can confirm that the signature pattern really is specific to that type of message.
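Extracting only the matching messages from a mail file can be sketched with Python's standard mailbox module, assuming the mail is stored in Unix mbox format. The function name extract_matching, the script name and the file names are hypothetical. Each message's raw bytes are searched (headers and body, still encoded), so the same signature can match a header, a phrase or a fragment of an encoded attachment.

```python
import mailbox
import re
import sys


def extract_matching(mbox_path, signature, out_path, literal=True):
    """Append every message in mbox_path that matches signature to out_path.

    Returns the number of messages written. If out_path already exists,
    matching messages are appended to it.
    """
    raw_sig = signature.encode("utf-8")
    pattern = re.compile(re.escape(raw_sig) if literal else raw_sig)
    source = mailbox.mbox(mbox_path, create=False)
    target = mailbox.mbox(out_path)
    target.lock()
    count = 0
    try:
        for key in source.keys():
            # Search the raw message bytes, as grep would see them.
            if pattern.search(source.get_bytes(key)):
                # get_message() keeps the original "From " line.
                target.add(source.get_message(key))
                count += 1
    finally:
        source.close()
        target.close()  # flushes the new messages and releases the lock
    return count


if __name__ == "__main__" and len(sys.argv) == 4:
    # Usage: python extract_matching.py INBOX.mbox SIGNATURE OUT.mbox
    n = extract_matching(sys.argv[1], sys.argv[2], sys.argv[3])
    print(f"{n} matching message(s) written to {sys.argv[3]}")
```

Because the output is itself an mbox file, it can be opened in an ordinary mail client, which makes it easy to read through the extracted messages and check for false positives.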


Happy hunting, nerds! ^_^