WinDeveloper IMF Tune


Hardening Anti-SPAM Protection

Alexander Zammit

Software Development Consultant. Involved in the development of various Enterprise software solutions. Today focused on Blockchain and DLT technologies.

  • Published: Feb 01, 2005
  • Category: Anti-Spam

Spam is a moving target. Countermeasures need to be flexible in order to keep up. In this article we look at the overheads incurred by spam as a measure of whether the current spam protection is adequate. Different filtering technologies are studied within this context. Finally, a layered filtering approach is proposed as a response to this challenge.

In the last few months we have seen plenty of activity in the fight against spam. The Lycos Europe "Make Love Not Spam" screen saver, the heated debate on Sender ID and Microsoft's intellectual property rights, and the release of the MS Exchange 2003 Intelligent Message Filter (IMF) are a few examples that spring to mind. The outlook for this year promises a continuation of the trend. The upcoming Exchange SP2 support for Sender ID is one development to watch out for. Anti-spam is certainly making news and, one way or another, everyone is grabbing a share of it. With all this buzz, it can be hard to identify the right approach to mitigating the plague that is spam.

In this article I will look at a layered anti-spam protection strategy. Layering is a technique widely adopted in securing IT systems. Anti-virus is one case in point. MS Exchange and other mail architectures commonly adopt server side anti-virus filters as the first line of defense. Desktop anti-virus then serves as the second line covering mail, file system and other points of entry to the machine.

Spam is typically considered a less dangerous beast than viruses. This justifies a zero-tolerance approach to viruses and a more permissive approach towards spam. Still, the ever-increasing spam load and the endless creativity of spammers are raising the demand for a more effective anti-spam strategy. This strategy must be based on a good understanding of the available technologies and how these can be combined.

Filtering Stages

A mail transaction undergoes two stages of information exchange between the sending and receiving ends. The first stage involves an SMTP protocol conversation. At this point, information available includes the sending end IP, sender address and recipient addresses. During the second stage the mail content (subject, body, and attachments) is delivered. Anti-spam solutions differ depending on the stage at which filtering is applied. The setup is illustrated below:

[Figure: SMTP Protocol Conversation and Mail Content Delivery]
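The two stages can also be pictured as two sets of data that become available one after the other. The following is only a minimal sketch to make that split concrete. The class and function names (SmtpEnvelope, MailContent, data_available) are made up for this illustration and do not belong to any product.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class SmtpEnvelope:
    """Stage 1: data known from the SMTP protocol conversation."""
    client_ip: str
    sender: str
    recipients: List[str]


@dataclass
class MailContent:
    """Stage 2: data known only once the mail content is delivered."""
    subject: str
    body: str
    attachments: List[str] = field(default_factory=list)


def data_available(stage: int) -> List[str]:
    """Return the field names a filter can inspect at the given stage."""
    envelope_fields = list(SmtpEnvelope.__dataclass_fields__)
    if stage == 1:
        return envelope_fields
    # At stage 2 the envelope data is still known, plus the content.
    return envelope_fields + list(MailContent.__dataclass_fields__)
```

A filter plugged in at stage 1 can only work with the first three fields, whereas a filter at stage 2 sees everything.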

Content Filters

Up to a few months ago anti-spam applications were predominantly based on analyzing the mail content. The application would process the mail body looking for keywords and other characteristics typical of spam. The technology behind this content analysis differs from one solution to another. This varies from static keyword databases, to self-learning engines that automatically adjust to the specific business characteristics. On-line updates would then further enhance the process. This technology has been around for some time now and is quite mature. The MS Exchange IMF works on this concept. In all cases spam is identified after the mail is received into the organization.
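As a rough illustration of the simplest case mentioned above, a static keyword database, the sketch below scores a mail by summing keyword weights. The keywords and weights are invented for this example. Real content filters are proprietary and far more elaborate, so this should not be read as how the IMF or any other product works.

```python
import re

# Hypothetical keyword weights, invented for this sketch. A real filter
# would rely on a much larger, regularly updated database.
KEYWORD_WEIGHTS = {"replicas": 3, "rolex": 2, "cartier": 2, "omega": 1}


def content_score(subject: str, body: str) -> int:
    """Sum the weights of known spam keywords found in subject and body."""
    words = re.findall(r"[a-z]+", f"{subject} {body}".lower())
    return sum(KEYWORD_WEIGHTS.get(word, 0) for word in words)
```

The higher the score, the more the mail resembles spam. Where to draw the line between spam and ham is a tuning decision left to the administrator.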

SMTP Protocol Filters

SMTP protocol filtering is the area where we have seen most activity recently. Some filtering technologies follow shortly. Note that here I am not aiming to give an exhaustive list. My goal is that of identifying some technologies representative of this type.

  • Simple IP, sender, recipients access/deny lists
  • SPF and Sender ID anti-spoofing
  • Real-time block lists (RBL)
  • Business Reputation Services

Spoofed sender information is a typical characteristic of spam. SPF and Sender ID are the most relevant countermeasures. See the references for more details.
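To give a feel for how an anti-spoofing check decides, here is a heavily simplified sketch that evaluates only the ip4 and all mechanisms of an SPF record. It assumes the record text has already been fetched from DNS. The function name check_spf_ip4 is hypothetical, and a real SPF checker must also handle the other mechanisms (a, mx, include and so on), which this sketch silently skips.

```python
import ipaddress


def check_spf_ip4(spf_record: str, client_ip: str) -> str:
    """Evaluate only the ip4 and all mechanisms of an SPF record.

    Returns "pass", "fail", "softfail", "neutral" or "none".
    Any other mechanism is ignored (a simplification of this sketch).
    """
    qualifiers = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}
    ip = ipaddress.ip_address(client_ip)
    terms = spf_record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # no usable SPF record
    for term in terms[1:]:
        qualifier = "+"  # the default qualifier when none is given
        if term[0] in qualifiers:
            qualifier, term = term[0], term[1:]
        if term.lower() == "all":
            return qualifiers[qualifier]
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return qualifiers[qualifier]
    return "neutral"  # nothing matched
```

For example, with the record "v=spf1 ip4:192.0.2.0/24 -all" a connection from inside 192.0.2.0/24 passes, while any other sending IP fails and the sender address can be treated as spoofed.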

RBLs are on-line lists of IP addresses classified as being a source of spam. RBLs must be used with care. Most importantly one has to look into the information gathering process adopted by the list provider. Some providers occasionally do list legitimate senders as spammers.
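DNS-based block lists are commonly queried by reversing the octets of the connecting IP and appending the list's zone name. The sketch below shows this for IPv4. The zone dnsbl.example.org is a placeholder, not a real list, and the function names are chosen for this illustration only.

```python
import ipaddress
import socket


def rbl_query_name(client_ip: str, zone: str) -> str:
    """Build the DNS name to query: reversed IPv4 octets plus the zone."""
    octets = str(ipaddress.IPv4Address(client_ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(client_ip: str, zone: str) -> bool:
    """Return True if the list answers with an address record for this IP."""
    try:
        socket.gethostbyname(rbl_query_name(client_ip, zone))
        return True
    except socket.gaierror:
        # No such name: the IP is not listed. Note that this sketch also
        # treats DNS failures as "not listed", so the mail is let through.
        return False
```

Calling rbl_query_name("192.0.2.99", "dnsbl.example.org") builds the name 99.2.0.192.dnsbl.example.org. Whether a listing should lead to rejection or just marking depends on how far the list provider is trusted.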

Business Reputation Services are on-line databases that identify a source of spam by analyzing all available SMTP protocol data. This typically includes the IP, sender and recipient addresses. When compared to IP based RBLs the decision making process is much more informed. An emerging standard for these services is the Server Index Query Protocol (SIQ). As a development consultant I had the opportunity to develop an MS Exchange plug-in for such a product. See more details on SIQ in the references section.

Exchange 2003 also includes SMTP protocol filtering support out of the box. This covers IP access/deny lists, real-time block lists (RBLs), and recipient/sender filtering. With the upcoming SP2, support will be extended to include Sender ID as well.

The following illustration shows how different types of filters plug into the mail information.

[Figure: Filters plug into mail information]

SPAM Overheads

It is quite obvious that unfiltered spam causes a productivity problem for end-recipients. Unfortunately that's not the end of it. Filtered spam is also a burden. Spam, like any other mail, consumes storage, bandwidth, processing power and administrative resources. For example, filtered spam in the Outlook Junk Email folder has to be accounted for when setting mailbox size limits.

To see better how much spam is costing us day-to-day, I will consider two factors:

  1. Resources consumed for a filter to analyze mails.
  2. Resources consumed as a result of the filtering action taken.

SPAM Overheads - Filter Technology

Every filtering technology obviously requires resources to perform its task. This is the starting point for us to see how much overhead can be attributed to spam.

Filter Type | Filtering Technology | Resources Overheads

Content Filter | Content Analysis

The technology behind content filters is proprietary and the resource consumption varies from one vendor to another. In general content filters need to at least extract the mail body and then analyze the content against a keyword database. This process can also include self-learning filters and on-line updates.

Typically processing power is the main resource consumed. Because of the wide variety in implementation, evaluation is the only way to quantify this overhead. Bandwidth is only consumed for on-line updates and normally is not relevant.

SMTP Filter | Simple IP, sender, recipients access/deny lists

This primitive filter is commonly considered to be the fastest. It simply classifies as spam whatever matches its internal database. This requires minimal processing and no extra bandwidth consumption.

It certainly requires administrative maintenance in order to manually populate its filtering rules. For this reason the scope is very limited.

SMTP Filter | SPF, Sender ID, RBLs, Business Reputation Services

Filtering relies on information available on-line. DNS or SIQ queries are submitted in order to classify mails. Hence bandwidth is the main resource consumed by these filters. Processing power consumption is minimal, since the construction of queries and the interpretation of responses are typically very fast.

Caching is critical to technologies relying on on-line resources, as it can minimize the number of queries issued. The effectiveness of caching, and the ability to fine-tune it, varies from one product to another. Again, conclusions have to be drawn through product evaluation.
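The effect of caching on the number of on-line queries can be sketched as follows. This is an assumed, minimal design (a fixed time-to-live per entry) and not the way any particular product caches. The class name TtlCache and its parameters are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class TtlCache:
    """Keep lookup results for a fixed time to avoid repeated queries."""

    def __init__(self, lookup: Callable[[str], bool], ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._lookup = lookup          # the expensive on-line query
        self._ttl = ttl_seconds        # how long a result stays valid
        self._clock = clock            # injectable so it can be tested
        self._entries: Dict[str, Tuple[float, bool]] = {}
        self.queries_issued = 0        # counts queries that went on-line

    def get(self, key: str) -> bool:
        now = self._clock()
        cached = self._entries.get(key)
        if cached is not None and now - cached[0] < self._ttl:
            return cached[1]           # still fresh: no query goes out
        self.queries_issued += 1
        result = self._lookup(key)
        self._entries[key] = (now, result)
        return result
```

Wrapping a block list or reputation lookup in such a cache means that a burst of mails from the same IP costs a single query. The time-to-live is the tuning knob: too short and bandwidth is wasted, too long and stale classifications linger.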

Note that one should not consider the entries in this table to be a comprehensive comparison of filtering technologies. Here I am just focusing on one aspect, resource consumption. For example SMTP access/deny lists are certainly low in consumption but no organization can rely on them as the only filtering technology adopted.

SPAM Overheads - Filtering Action

Once spam is identified the possible counter measures available depend on the filtering stage. One of the following actions is typically taken:

  1. Accept & Mark: Accept the mail, mark it as spam and let it go to the recipient mailbox. Spam may then be separated from ham by posting it to the Outlook Junk Mail folder, through insertion of subject prefixes or similar techniques.
  2. Accept & Delete: Accept the mail and delete it so as not to reach the recipient mailbox.
  3. Accept, Delete & Archive: Accept the mail, delete it and archive it to a central repository.
  4. Accept, Delete & NDR: Accept the mail, delete it, and send a non-delivery report to the sender.
  5. SMTP Reject: Reject the mail at SMTP Protocol level blocking the mail from being delivered.

The type of action taken determines how many extra overheads spam will manage to incur. Clearly SMTP Protocol rejection is only an option to filters integrating at the protocol level.
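The relationship between filtering stage and available actions can be written down as a small lookup. The sketch assumes, in line with the statement above, that only SMTP rejection is tied to the protocol level and that a protocol filter may otherwise take any of the accept-based actions. The names Stage, Action and available_actions are invented for this example.

```python
from enum import Enum
from typing import List


class Stage(Enum):
    SMTP_PROTOCOL = "SMTP protocol conversation"
    CONTENT = "mail content delivered"


class Action(Enum):
    ACCEPT_AND_MARK = "Accept & Mark"
    ACCEPT_AND_DELETE = "Accept & Delete"
    ACCEPT_DELETE_AND_ARCHIVE = "Accept, Delete & Archive"
    ACCEPT_DELETE_AND_NDR = "Accept, Delete & NDR"
    SMTP_REJECT = "SMTP Reject"


def available_actions(stage: Stage) -> List[Action]:
    """List the actions open to a filter integrating at the given stage."""
    if stage is Stage.SMTP_PROTOCOL:
        return list(Action)
    # Once the mail has been accepted, rejecting at SMTP level is gone.
    return [action for action in Action if action is not Action.SMTP_REJECT]
```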

Action Type | Resources Overheads
Accept & Mark

Highest Overheads. The action does not save us anything in terms of storage, bandwidth and processing power. Administrative resources are still wasted in server management but saved in end-user support. Although sorted, end-users still have access to spam and time is wasted in reviewing it.

Accept & Delete

As the mail is accepted, resources are consumed up to the point of deletion. Bandwidth and processing power are consumed on the servers at the network perimeter, where the filtering action is taken, but are saved on the back-end servers. Storage and administrative maintenance are saved. The end-recipient never sees the mail, which eliminates any further loss in productivity.

Accept, Delete & Archive

Same overheads as Accept & Delete plus some more due to archiving. Storage is still required but since the archive is centralized this does not affect the end-user mailbox.

Typically archiving only makes sense if a proper review procedure is in place. This introduces a new administrative burden, but one that should be much smaller than the total loss in productivity incurred when reviewing is left up to the end-user.

Accept, Delete & NDR

Same overheads as Accept & Delete plus some more due to the NDR. Extra bandwidth and processing power are consumed in order to generate the NDR.

Although commonly available, this type of action is best avoided. For example, if a DoS attack is underway the NDRs will further stress the servers.

SMTP Reject

Lowest Overheads. Rejecting at SMTP level gives the largest savings in terms of resources. The action leads to minimal bandwidth and processing power consumption.

Layered anti-spam protection

It is obvious that one cannot look at overheads in isolation without considering the end result. The key role of anti-spam remains that of catching the largest number of spam mails with minimal false positives. Also, most of these technologies can and should be adopted together. Certainly no one can count on adopting Access/Deny lists or SPF/Sender ID exclusively. These technologies are not even meant to be used in that way.

Many organizations today only perform 'Accept & Mark' filtering actions. This is in line with the classic play-safe approach when handling spam. With the increase in spam load, a more aggressive attitude might be appropriate. Monitoring the overheads incurred by spam is the correct way to determine when new countermeasures are appropriate.

'Accept & Mark' filtering actions are certainly a necessity. It is not always easy to tell if a mail is spam or ham. Hence in such cases human review is the ultimate technology available in our arsenal. Nevertheless today's filters are in a position to practically classify a good number of mails with 100% certainty. Let's look at a trivial example. If you work for an IT company and get a mail with this subject, would you have any doubt whether this is spam?

"PIAGET, ROLEX, CARTIER Replicas - Expensive Look, Not Expensive Price - LOUIS VUITTON, OMEGA, LONGINES"

It is also fair to expect that any serious content-based anti-spam filter would have no doubts either (at least once the filter has had enough time to catch up with the latest spam trends). I gave an example based on mail content since it is easy for everyone to understand, but the same situation exists in SMTP protocol filtering.

Organizations finding 'Accept & Mark' not to be enough will have to move on and start getting rid of some of these mails. If a protocol filter classifies mail with very high certainty then it's worth considering rejection. If the protocol filter is a bit uncertain, mark the mail and let it through. In a layered setup other anti-spam filters are in place, which will have their go at classifying the same mail. Again, the action taken by the next filter should reflect the level of certainty.

So, layered anti-spam protection is composed of a number of filters. Protocol filters would be placed right at the edge of the network perimeter, possibly also followed by content filters. Each layer should give three types of result:

  • SPAM high degree of certainty - Reject/Delete Mail
  • SPAM certainty level not high enough - Mark Mail
  • HAM

[Figure: Layered Anti-Spam protection]
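The three result types and the way layers hand mail on to each other can be sketched as below. This is an illustration under assumptions rather than a reference design: each layer is modelled as a function returning a spam certainty between 0.0 and 1.0, and the thresholds 0.95 and 0.5 are arbitrary placeholders that would need tuning during evaluation.

```python
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    REJECT = "SPAM high degree of certainty - Reject/Delete Mail"
    MARK = "SPAM certainty level not high enough - Mark Mail"
    HAM = "HAM"


# A layer takes a mail (here simply a dict of fields) and returns its
# spam certainty between 0.0 and 1.0. Both are assumptions of the sketch.
Layer = Callable[[Dict[str, str]], float]


def layer_verdict(certainty: float, reject_at: float = 0.95,
                  mark_at: float = 0.5) -> Verdict:
    """Map one layer's certainty onto the three result types."""
    if certainty >= reject_at:
        return Verdict.REJECT
    if certainty >= mark_at:
        return Verdict.MARK
    return Verdict.HAM


def run_layers(mail: Dict[str, str], layers: List[Layer]) -> Verdict:
    """Run layers in order; stop at the first reject, remember any mark."""
    result = Verdict.HAM
    for layer in layers:
        verdict = layer_verdict(layer(mail))
        if verdict is Verdict.REJECT:
            return Verdict.REJECT  # later layers never see this mail
        if verdict is Verdict.MARK:
            result = Verdict.MARK  # keep going, the next layer has its go
    return result
```

Placing the protocol filters first in the list mirrors the setup described above: whatever they reject with high certainty never reaches the more expensive content filters.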

Putting such a system in place certainly requires a good evaluation and deployment procedure. The administrator has to be confident in the filters being adopted, since losing business to spam filtering is not an option. Rejection and deletion actions have to be applied gradually. Finally, if your filter is classifying mails as spam with a high degree of certainty when they are not, look for alternative filters. There are certainly some good products that can do a better job.

References

Meng Weng Wong's SPF site:
http://spf.pobox.com/

Microsoft's Sender ID site:
http://www.microsoft.com/mscorp/twc/privacy/spam/senderid/default.mspx

Business Reputation Services SIQ Protocol internet draft:
http://www.ietf.org/internet-drafts/draft-irtf-asrg-iar-howe-siq-00.txt

Copyright © 2005 - 2024 All rights reserved. ExchangeInbox.com is not affiliated with Microsoft Corporation