White and black lists are standard spam filters. Their typically simple interface provides a quick way to identify emails as legitimate or spam.
Any piece of information within the email, or provided by the SMTP protocol, could be useful for identifying emails. The information used for this purpose often includes the originating host IP, sender address, recipient addresses, email subject and email body.
In general, configuring these lists is very intuitive. It just involves identifying a piece of data by which an email may be classified with a reasonably high degree of certainty. This is then inserted in the appropriate white/black list. However, it is best not to get carried away. It is not unusual to come across administrators trying to block all spam through black lists. Today we look at how to use these filters effectively. We identify the criteria to follow and some things to avoid.
Fundamentals
There are some very basic rules that are worth remembering when configuring white and black lists.
If we white list all legitimate emails, everything else is spam.
Safeguarding the delivery of legitimate emails is more important than ensuring no spam reaches the inbox.
The effectiveness of a white/black list is dependent on the reliability of the email information against which it is applied.
White listing all legitimate emails is obviously not practical. However, the first point stresses the ripple effect that is achievable. These lists are typically the first stage of a multi-layered filtering setup. Let's say we are able to immediately identify a good proportion of legitimate emails and spare them from going through further filtering. The filtering layers that follow will see fewer legitimate emails, so the risk of false positives is immediately lower. In turn, this allows for more aggressive spam filtering.
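To make the ripple effect concrete, here is a minimal sketch of a layered setup in Python. Everything in it is an assumption made for illustration: the Email class, the filter names and the keyword check merely stand in for whatever layers a real product provides. The point is only that a white list match ends processing before the more aggressive layers ever see the email.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Email:
    source_ip: str
    subject: str
    body: str


# A layer returns "ham", "spam", or None when it has no opinion.
Filter = Callable[[Email], Optional[str]]


def whitelist_filter(allowed_ips: set) -> Filter:
    """First layer: emails from white listed IPs skip all later layers."""
    def check(email: Email) -> Optional[str]:
        return "ham" if email.source_ip in allowed_ips else None
    return check


def keyword_filter(keywords: list) -> Filter:
    """Hypothetical stand-in for a later, more aggressive content filter."""
    def check(email: Email) -> Optional[str]:
        text = (email.subject + " " + email.body).lower()
        return "spam" if any(k in text for k in keywords) else None
    return check


def classify(email: Email, layers: List[Filter]) -> str:
    """Run the layers in order and stop at the first verdict."""
    for layer in layers:
        verdict = layer(email)
        if verdict is not None:
            return verdict
    # No layer objected: deliver, since losing legitimate email is worse
    # than letting some spam through.
    return "ham"


layers = [whitelist_filter({"192.0.2.10"}), keyword_filter(["free offer"])]
report = Email("192.0.2.10", "Free offer campaign report", "Weekly figures")
print(classify(report, layers))  # ham
```

Without the white list layer, the report in this example would have been caught by the keyword layer, which is exactly the kind of false positive the first stage is meant to prevent.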
The second point makes our priorities clear. A spam filtering system must keep false positives (misclassification of legitimate emails) to a minimum. This is true even at the cost of leaving some spam unfiltered.
The last point reminds us of spoofing. Spammers are always trying to give a legitimate look to their emails. A white/black list needs to target the information that leads to the most reliable matching results.
White/Black Listing by IP
IP white listing is great when dealing with hosts known to only generate legitimate emails. There are various applications that use email for reporting, to manage workflows, to deliver faxes, etc. These are typically perfect candidates for this list.
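As a rough illustration, an IP list check could look like the following Python sketch. It relies only on the standard ipaddress module. The function names and the addresses are assumptions chosen for the example, and checking the white list before the black list is a design choice that follows from the priority we give to legitimate email.

```python
import ipaddress


def parse_networks(entries):
    """Turn strings such as "192.0.2.10" or "198.51.100.0/24" into networks."""
    return [ipaddress.ip_network(entry, strict=False) for entry in entries]


def check_ip(ip, whitelist, blacklist):
    """Return "whitelisted", "blacklisted" or "unknown" for a sending IP."""
    address = ipaddress.ip_address(ip)
    # The white list is consulted first so that a broad black list range
    # can never block a host we explicitly trust.
    if any(address in network for network in whitelist):
        return "whitelisted"
    if any(address in network for network in blacklist):
        return "blacklisted"
    return "unknown"


# Hypothetical entries: a fax server we trust and two blocked ranges.
whitelist = parse_networks(["192.0.2.10"])
blacklist = parse_networks(["192.0.2.0/24", "198.51.100.0/24"])

print(check_ip("192.0.2.10", whitelist, blacklist))    # whitelisted
print(check_ip("198.51.100.7", whitelist, blacklist))  # blacklisted
print(check_ip("203.0.113.1", whitelist, blacklist))   # unknown
```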
The IP is widely regarded to be the most reliable piece of information available to us. It is directly determined from the connection established between sending and receiving hosts. However, this is only possible as long as the sending host connects directly to the host enforcing the IP white/black list. Otherwise, the IP may be determined through the Received email header.
Each host involved in routing the email from source to destination adds a Received header to the email. This header includes the IP of the host from which the email was received. Thus a complete set of Received headers builds up, tracing the delivery route.
As an example, here are the headers taken from a spam email. The first header is the one inserted last.
Received: from pcp0010520431pcs.prshng01.fl.comcast.net ([69.244.215.149]) by serv-box1.exchangeinbox.com with Microsoft SMTPSVC(6.0.3790.211); Thu, 10 Feb 2005 20:02:30 +0100
Received: from gateway-r.comcast.net by pcp0010520431pcs.prshng01.fl.comcast.net with HTTP; Thu, 10 Feb 2005 11:38:26 -0600
Received: from 154.5.87.115 by pcp0010520431pcs.prshng01.fl.comcast.net [69.244.215.149] with Microsoft SMTPSVC(5.0.2195.6713); Thu, 10 Feb 2005 11:37:30 -0600
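The route recorded by these headers can be read back programmatically. The Python sketch below parses the three headers above with the standard email module and lists the sending IP recorded in each, oldest hop first. The Subject line, the body and the function names are assumptions added only to make the message complete, and the pattern used to spot IPv4 addresses is deliberately simple.

```python
import re
from email import message_from_string

# The three Received headers above, plus a made-up subject and body.
RAW = (
    "Received: from pcp0010520431pcs.prshng01.fl.comcast.net ([69.244.215.149]) "
    "by serv-box1.exchangeinbox.com with Microsoft SMTPSVC(6.0.3790.211); "
    "Thu, 10 Feb 2005 20:02:30 +0100\n"
    "Received: from gateway-r.comcast.net by pcp0010520431pcs.prshng01.fl.comcast.net "
    "with HTTP; Thu, 10 Feb 2005 11:38:26 -0600\n"
    "Received: from 154.5.87.115 by pcp0010520431pcs.prshng01.fl.comcast.net "
    "[69.244.215.149] with Microsoft SMTPSVC(5.0.2195.6713); "
    "Thu, 10 Feb 2005 11:37:30 -0600\n"
    "Subject: example\n"
    "\n"
    "body\n"
)

# Simple IPv4 pattern; it does not validate that each part is at most 255.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sender_ip(received_value):
    """Return the IP found in the 'from' part of a Received header, if any."""
    from_part = received_value.split(" by ", 1)[0]
    match = IP_PATTERN.search(from_part)
    return match.group(0) if match else None


def delivery_route(raw_message):
    """List the sending IP recorded at each hop, oldest hop first."""
    headers = message_from_string(raw_message).get_all("Received") or []
    # Headers are stored newest first, so reverse them to follow the route.
    return [sender_ip(header) for header in reversed(headers)]


print(delivery_route(RAW))  # ['154.5.87.115', None, '69.244.215.149']
```

The middle hop yields None because that header names the sending host without giving its IP.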
One may wonder about the trustworthiness of Received headers. After all, can't headers be forged? As the following diagram illustrates, this is not a problem:
Here the spam filter (Host B) sits behind a perimeter server (Host A) that handles internet-originating email. The connection IP will always be that of Host A and is of no use to us. We are instead interested in the IP of the last foreign host (Host 2) connecting to our perimeter server. This is what Host A sees and inserts in its Received header. Furthermore, since Host A is our own perimeter server, the Received header is certainly reliable.
The inability to forge IPs is one of the reasons why so many anti-spam technologies rely on this piece of information. However, there is a catch worth keeping in mind: IPs may change. In the case of white listing, this can be an issue, especially when the white listed host is not under our control.
The issue of changing IPs is a lot more acute in the case of IP black listing. Today spammers make use of zombie machines. These are hacked machines used for the delivery of spam and malware. There are so many such machines that spammers may not reuse the same one for a very long time.