In the last few months we have seen plenty of activity in the fight against spam. The Lycos Europe "Make Love Not Spam" screen saver, the heated debate on Sender ID and Microsoft's intellectual property rights, and the release of the MS Exchange 2003 Intelligent Message Filter (IMF) are a few that spring to mind. The outlook for this year promises a continuation of the trend. The upcoming Exchange SP2 support for Sender ID is one development to watch out for. Anti-spam is certainly making news, and one way or another everyone is grabbing a share of the attention. With all this buzz, it can be hard to identify the right approach to mitigating the spam plague.
In this article I will look at a layered anti-spam protection strategy. Layering is a technique widely adopted in securing IT systems. Anti-virus is one case in point. MS Exchange and other mail architectures commonly adopt server-side anti-virus filters as the first line of defense. Desktop anti-virus then serves as the second line, covering mail, the file system and other points of entry to the machine.
Spam is typically considered a less dangerous beast than viruses. This is why viruses get a zero-tolerance approach while spam is treated more permissively. Still, the ever-increasing spam load and the endless creativity of spammers are raising the demand for a more effective anti-spam strategy. This strategy must be based on a good understanding of the available technologies and how they can be combined.
Filtering Stages
A mail transaction goes through two stages of information exchange between the sending and receiving ends. The first stage is the SMTP protocol conversation. At this point, the information available includes the sending host's IP address, the sender address and the recipient addresses. During the second stage the mail content (subject, body, and attachments) is delivered. Anti-spam solutions differ depending on the stage at which filtering is applied. The setup is illustrated below:
Content Filters
Until a few months ago, anti-spam applications were predominantly based on analyzing mail content. The application processes the mail body looking for keywords and other characteristics typical of spam. The technology behind this content analysis differs from one solution to another, ranging from static keyword databases to self-learning engines that automatically adjust to the specific characteristics of the business. On-line updates then further enhance the process. This technology has been around for some time and is quite mature. The MS Exchange IMF works on this concept. In all cases, spam is identified only after the mail has been received into the organization.
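As a rough illustration of the static keyword approach described above, the following Python sketch scores a message against a small weighted keyword table. The keywords, weights and threshold are invented for the example; commercial filters, including the IMF, use their own proprietary and far more elaborate analysis.

```python
import re

# Hypothetical keyword weights; a real product ships and updates its own database.
SPAM_KEYWORDS = {"viagra": 3.0, "free": 1.0, "winner": 2.0, "click here": 1.5}
SPAM_THRESHOLD = 3.0  # assumed cut-off, would be tuned per organization


def spam_score(subject: str, body: str) -> float:
    """Sum the weights of every whole-word keyword hit in subject and body."""
    text = f"{subject}\n{body}".lower()
    score = 0.0
    for phrase, weight in SPAM_KEYWORDS.items():
        # \b keeps "free" from matching inside words such as "freedom".
        pattern = r"\b" + re.escape(phrase) + r"\b"
        score += weight * len(re.findall(pattern, text))
    return score


def is_spam(subject: str, body: str) -> bool:
    """Classify the message as spam once the score reaches the threshold."""
    return spam_score(subject, body) >= SPAM_THRESHOLD
```

Note that both functions need the subject and body, which is why this kind of filter can only run once the whole message has been accepted.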
SMTP Protocol Filters
SMTP protocol filtering is the area where we have seen the most activity recently. A few of these filtering technologies are listed below. The list is not exhaustive; my goal is to identify technologies representative of this type.
- Simple IP, sender and recipient access/deny lists
- SPF and Sender ID anti-spoofing
- Real-time block lists (RBL)
- Business Reputation Services
Spoofed sender information is a typical characteristic of spam. SPF and Sender ID are the most relevant countermeasures. See references for more details.
RBLs are on-line lists of IP addresses classified as sources of spam. RBLs must be used with care. Most importantly, one has to look into the information-gathering process adopted by the list provider. Some providers occasionally do list legitimate senders as spammers.
Business Reputation Services are on-line databases that identify a source of spam by analyzing all available SMTP protocol data. This typically includes the IP, sender and recipient addresses. Compared to IP-based RBLs, the decision-making process is much more informed. An emerging standard for these services is the Server Index Query Protocol (SIQ). As a development consultant I had the opportunity to develop an MS Exchange plug-in for such a product. See more details on SIQ in the references section.
Exchange 2003 also includes SMTP protocol filtering support out of the box. This covers IP access/deny lists, real-time block lists (RBLs), and recipient/sender filtering. With the upcoming SP2, support will be extended to include Sender ID as well.
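To make the RBL lookup more concrete, here is a minimal Python sketch of the usual DNS-based query convention: the octets of the connecting IP address are reversed and prepended to the list's zone name, and a successful DNS answer means the address is listed. The zone name `dnsbl.example.org` is a placeholder, not a real provider, and the sketch handles IPv4 only.

```python
import ipaddress
import socket


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets followed by the list zone."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError if ip is not valid IPv4
    reversed_octets = ".".join(reversed(str(addr).split(".")))
    return f"{reversed_octets}.{zone}"


def is_listed(ip: str, zone: str) -> bool:
    """Return True if the block list answers for this address.

    Any resolver failure is treated as "not listed" here, which is a
    simplification: a production filter would tell a genuine negative
    answer apart from a timeout or an unreachable list provider.
    """
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        return False
```

For example, `dnsbl_query_name("192.0.2.99", "dnsbl.example.org")` yields `99.2.0.192.dnsbl.example.org`. Only the connecting IP address is needed, so the check can run before any mail content is transferred.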
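The simplest of the SMTP protocol filters, the access/deny list, can be sketched in a few lines of Python. This is an illustration of the idea only, not how Exchange implements it; the networks, the sender domain and the three verdict strings are all made up for the example.

```python
import ipaddress

# Hypothetical, manually maintained lists (documentation address ranges).
ALLOW_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]
DENY_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
DENY_SENDER_DOMAINS = {"spam.example"}


def check_envelope(client_ip: str, mail_from: str) -> str:
    """Return "accept", "reject" or "continue" using SMTP protocol data only."""
    addr = ipaddress.ip_address(client_ip)
    # Explicitly trusted hosts skip all further list checks.
    if any(addr in net for net in ALLOW_NETWORKS):
        return "accept"
    if any(addr in net for net in DENY_NETWORKS):
        return "reject"
    # Compare the domain part of the sender address against the deny list.
    domain = mail_from.rpartition("@")[2].lower()
    if domain in DENY_SENDER_DOMAINS:
        return "reject"
    # No list matched: hand the mail over to the next filtering layer.
    return "continue"
```

The "continue" verdict is what makes layering possible: a mail that no list recognizes is passed on to the next, more expensive filter instead of being accepted outright.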
The following illustration shows how the different types of filters plug into the two stages of the mail transaction.
SPAM Overheads
It is quite obvious that unfiltered spam is a productivity problem for end recipients. Unfortunately that's not the end of it. Filtered spam is also a burden. Spam, like any other mail, consumes storage, bandwidth, processing power and administrative resources. For example, mailbox size limits must account for the filtered spam held in the Outlook Junk Email folder.
To get a better picture of what spam costs us day to day, I will consider two factors:
- Resources consumed for a filter to analyze mails.
- Resources consumed as a result of the filtering action taken.
SPAM Overheads - Filter Technology
Every filtering technology requires resources to perform its task. This is the starting point for seeing how much overhead can be attributed to spam.
| Filter Type | Filtering Technology | Resource Overhead |
| --- | --- | --- |
| Content Filter | Content Analysis | The technology behind content filters is proprietary and the resource consumption varies from one vendor to another. In general content filters need to at least extract the mail body and then analyze the content against a keyword database. This process can also include self-learning filters and on-line updates. Typically processing power is the main resource consumed. Because of the wide variety in implementation, evaluation is the only way to quantify this overhead. Bandwidth is only consumed for on-line updates and normally is not relevant. |
| SMTP Filter | Simple IP, sender, recipients access/deny lists | This primitive filter is commonly considered to be the fastest. It simply classifies as spam whatever matches its internal database. This requires minimal processing and no extra bandwidth consumption. It certainly requires administrative maintenance in order to manually populate its filtering rules. For this reason the scope is very limited. |
| SMTP Filter | SPF, Sender ID, RBLs, Business Reputation Services | Filtering relies on information available on-line. DNS or SIQ queries are submitted in order to classify mails. Hence bandwidth is the main resource consumed by these filters. Processing power consumption is minimal. The construction of queries and interpretation of responses is typically very fast. Caching is critical to technologies relying on online resources. This can minimize the number of queries issued. The effectiveness of caching, and the ability to fine tune it, varies from one product to another. Again one has to draw his conclusions through product evaluation. |
Note that this table should not be read as a comprehensive comparison of filtering technologies. Here I am focusing on just one aspect, resource consumption. For example, SMTP access/deny lists are certainly low in consumption, but no organization can rely on them as its only filtering technology.
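The caching point made in the table is worth a small sketch. The Python class below wraps any on-line lookup (a DNS or SIQ query, say) and remembers each verdict for a fixed time, so repeated mails from the same source do not trigger repeated queries. The class name, the fixed time-to-live and the `misses` counter are assumptions for the example; real products differ in how, and how well, they cache.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Remember lookup verdicts for ttl_seconds to cut down on on-line queries."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self._lookup = lookup      # the expensive on-line query being wrapped
        self._ttl = ttl_seconds
        self._clock = clock        # injectable so the cache can be tested
        self._entries: Dict[str, Tuple[float, bool]] = {}
        self.misses = 0            # number of queries actually sent on-line

    def get(self, key: str) -> bool:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]        # still fresh: no bandwidth consumed
        self.misses += 1
        verdict = self._lookup(key)
        self._entries[key] = (now + self._ttl, verdict)
        return verdict
```

Comparing `misses` with the total number of mails processed gives a rough measure of how much bandwidth the cache is saving, which is the kind of figure worth collecting during a product evaluation.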