Sometimes I come across organizations that employ a long list of DNSBLs for filtering spam. Personally, I question whether using more of the same is all that useful, so today I want to expand on my opinion on this topic.
My main argument is not specific to DNSBLs; indeed, I could write the same about other filter types. However, using many DNSBLs is the practice I come across most often, which makes it a good example.
More of the Same...
Let's first talk about spam filtering in general. What makes a particular filter unique? I need to get into this bit of theory to better explain what I mean by "more of the same".
- The Technology - Each filter is based on some technology that allows it to classify emails as spam or legitimate. In the case of DNSBLs, the underlying technology determines how IPs are gathered for listing, how incorrect entries are removed, etc.
For example, many DNSBLs use honeypots: mailboxes to which no legitimate email should ever be addressed, so hosts sending to a honeypot are automatically listed. Another typical DNSBL technology is the automatic testing of SMTP servers for open relays and known exploits.
If we look at content filters, some are based on algorithms that learn from the emails an organization is receiving. The Exchange IMF/Content Filter is based on data downloaded from the Microsoft Updates service.
- Analyzed Email Information - This refers to the type of information the filter processes in order to classify each email. DNSBLs base their classification on the sender's IP address, which is obtained directly from the SMTP connection or extracted from the Received headers.
Content filters normally base their classification on the email body and headers, and sometimes also on the attachment content.
- The Implementation - The choice of technology largely determines how a filter is supposed to work. However, at the end of the day it all boils down to how the filter is actually coded and kept up to date by its developers.
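The following toy sketch in Python makes the first two ingredients more concrete. It shows a honeypot-fed list (the technology) that classifies by sender IP (the analyzed information). It assumes the topmost Received header was written by our own gateway, so the bracketed IPv4 address in it belongs to the connecting host. The honeypot addresses, class and function names are all hypothetical. A real DNSBL would also need IPv6 support, delisting and protection against forged headers.

```python
from __future__ import annotations

import ipaddress
import re
from email import message_from_string

# Hypothetical honeypot mailboxes: no legitimate email should ever reach them.
HONEYPOTS = {"trap1@example.com", "trap2@example.com"}

# An IPv4 address in square brackets, as many MTAs write it in the "from"
# clause of a Received header, e.g. "from host (host [203.0.113.5])".
BRACKETED_IPV4 = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")


def sender_ip_from_received(raw_message: str) -> str | None:
    """Return the connecting IP recorded in the topmost Received header.

    Assumes that header was added by our own gateway. Returns None when the
    header is missing or holds no valid bracketed IPv4 address.
    """
    received = message_from_string(raw_message).get_all("Received") or []
    if not received:
        return None
    match = BRACKETED_IPV4.search(received[0])
    if match is None:
        return None
    try:
        return str(ipaddress.IPv4Address(match.group(1)))
    except ipaddress.AddressValueError:
        return None


class HoneypotList:
    """Toy DNSBL data store fed only by honeypot hits (no delisting)."""

    def __init__(self) -> None:
        self.listed: set[str] = set()

    def observe(self, sender_ip: str, recipients: list[str]) -> None:
        # Any host that sends to a honeypot address is listed automatically.
        if any(r.lower() in HONEYPOTS for r in recipients):
            self.listed.add(sender_ip)

    def is_listed(self, ip: str) -> bool:
        return ip in self.listed
```

Take a message whose topmost header reads `from mail.example.net (mail.example.net [203.0.113.5])` and that is addressed to `trap1@example.com`. Feeding it in lists 203.0.113.5. The same host sending only to ordinary mailboxes would stay unlisted.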
From the above it should be easy to see how two independently developed filters might have a lot in common. Filters based on the same Technologies very often Analyze the same email information, leaving the Implementation as the main differentiator.
If we entrust our emails to a filter, we do so because we believe it is well implemented. We wouldn't employ a filter we don't trust, right?
So the question becomes: is it worth deploying multiple filters whose key difference is the implementation? I don't see any significant benefit in doing that.
Are All DNSBLs the Same?
Not all DNSBLs fall into the same category. All DNSBLs use DNS to expose their service, but not all use the same IP gathering criteria and technologies.
For example, the list provided by Backscatterer is very different from those provided by Spamhaus. Here is Backscatterer's listing policy, taken from their site:
Every IP which backscatters (Sending misdirected bounces or misdirected autoresponders or sender callouts) will be listed for the next 4 weeks here.
Note: I would personally never use Backscatterer to block emails, but that's another story...
On the other hand, if we look at the zen.spamhaus.org list, we find that it is composed of four lists. Here is the description of each, taken from their site:
SBL - Direct UBE sources, spam operations & spam services
CSS - Direct snowshoe spam sources detected via automation
XBL - CBL + customized NJABL. 3rd party exploits (proxies, trojans, etc.)
PBL - End-user Non-MTA IP addresses set by ISP outbound mail policy
The four sub-lists within zen.spamhaus.org are themselves a good example of how not all DNSBLs are the same.
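Whatever their listing criteria, DNSBLs are all queried the same way, which is part of what makes them so easy to stack. The Python sketch below builds the query name by reversing the IPv4 octets and appending the list's zone. It then maps the answers from zen.spamhaus.org to the sub-lists above. The return-code table reflects my understanding of Spamhaus's documented codes and should be treated as an assumption to verify. The function names are hypothetical, and the resolver is a parameter so the logic can be exercised without network access.

```python
from __future__ import annotations

import ipaddress
import socket
from typing import Callable

# Assumed mapping of zen.spamhaus.org answers to its sub-lists; check it
# against Spamhaus's current return-code documentation before relying on it.
ZEN_RETURN_CODES = {
    "127.0.0.2": "SBL",
    "127.0.0.3": "CSS",
    "127.0.0.4": "XBL",
    "127.0.0.5": "XBL",
    "127.0.0.6": "XBL",
    "127.0.0.7": "XBL",
    "127.0.0.10": "PBL",
    "127.0.0.11": "PBL",
}


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the IPv4 octets and append the list's zone.

    192.0.2.99 checked against zen.spamhaus.org becomes
    99.2.0.192.zen.spamhaus.org.
    """
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def system_resolver(name: str) -> list[str]:
    """Return all A records for name, or [] if the name does not resolve."""
    try:
        return socket.gethostbyname_ex(name)[2]
    except (socket.gaierror, socket.herror):
        # No record means the IP is not listed.
        return []


def zen_sublists(ip: str,
                 resolve: Callable[[str], list[str]] = system_resolver) -> list[str]:
    """Names of the zen.spamhaus.org sub-lists that list ip (possibly several)."""
    answers = resolve(dnsbl_query_name(ip, "zen.spamhaus.org"))
    return sorted({ZEN_RETURN_CODES.get(a, "unknown") for a in answers})
```

One IP can sit in several sub-lists at once, so zen may return more than one A record. That is why the answers are collected into a set rather than read as a single value.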
So which DNSBLs ARE the Same?
We are now ready to get to the point: most freely available DNSBLs are indeed based on very similar IP gathering technologies and listing criteria.
Here I picked three examples: Spamhaus, SpamCop and the free Barracuda DNSBL. We have already looked at what Spamhaus includes, so let's see how Barracuda describes their list:
http://www.barracudacentral.org/rbl/listing-methodology
When email is received, the connection is automatically analyzed to determine if the connecting machine is either an open proxy or a node in a spam-generating botnet. If either is true, the IP address is immediately added... The Barracuda Reputation System detects spam by using honeypots, special addresses created to receive only spam and do not belong to any real user and through analysis of captive spyware protocol activity...
The same can be said about SpamCop; just check their description for details:
http://www.spamcop.net/fom-serve/cache/297.html
What comes out very clearly is that all of these are doing their best to gather the same set of IP addresses, and you can rest assured that there will be a huge overlap between the three. What won't overlap are:
- False classifications
- The very latest IPs (one provider might discover an IP before the others)
So the main benefit of using many DNSBLs is the earlier discovery of the latest IPs, while every additional list also brings its own false classifications.
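A small calculation illustrates the effect of that overlap. The Python sketch below consults hypothetical lists in order and counts, for each one, only the IPs that no earlier list had already flagged. The lists, IPs and their spam/legitimate labels are invented for illustration. They are not measurements of any real DNSBL.

```python
from __future__ import annotations


def marginal_catches(
    lists: list[tuple[str, set[str]]],
    spam_ips: set[str],
    legit_ips: set[str],
) -> list[tuple[str, int, int]]:
    """For DNSBLs consulted in order, report what each one adds.

    Returns (name, new_spam_caught, new_false_positives) per list, counting
    only IPs that no earlier list had already flagged.
    """
    already_flagged: set[str] = set()
    report = []
    for name, listed in lists:
        new = listed - already_flagged
        report.append((name, len(new & spam_ips), len(new & legit_ips)))
        already_flagged |= listed
    return report


# Invented example data: four spam sources and two legitimate servers.
SPAM = {"203.0.113.1", "203.0.113.2", "203.0.113.3", "203.0.113.4"}
LEGIT = {"198.51.100.1", "198.51.100.2"}
LISTS = [
    ("A", {"203.0.113.1", "203.0.113.2", "203.0.113.3"}),
    ("B", {"203.0.113.1", "203.0.113.2", "203.0.113.4", "198.51.100.1"}),
    ("C", {"203.0.113.2", "203.0.113.3", "198.51.100.2"}),
]

print(marginal_catches(LISTS, SPAM, LEGIT))
# [('A', 3, 0), ('B', 1, 1), ('C', 0, 1)]
```

In this made-up case, list B catches one spam IP the other lists had not seen yet, at the price of one false positive. List C adds nothing but its own false positive.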
The Alternative is...
The alternative to having more of the same is having a good mix. Every spam filtering technology has its own strengths and weaknesses. Ideally, a filter should only deal with the category of spam against which it is most effective.
Having weaknesses is not a big problem as long as our opponent is not allowed to take advantage of them. If one filtering technology is undecided on how to classify an email, it is best to leave the email unclassified and let other filters decide.
Similar filters have the same strengths and weaknesses. Quite obviously, we can expect better filtering results if we hand the email over to filters with different characteristics.
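One way to let each technology handle only what it is good at is a chain in which every filter may answer "unsure" and pass the email on. The Python sketch below assumes exactly that design. The Verdict type, the three toy filters and their rules are hypothetical placeholders, not a description of any real product.

```python
from enum import Enum
from typing import Callable, Iterable


class Verdict(Enum):
    SPAM = "spam"
    LEGIT = "legit"
    UNSURE = "unsure"


Email = dict  # toy representation: {"sender": ..., "sender_ip": ..., "body": ...}
Filter = Callable[[Email], Verdict]


def classify(email: Email, filters: Iterable[Filter],
             default: Verdict = Verdict.LEGIT) -> Verdict:
    """Ask each filter in turn; the first confident answer wins."""
    for check in filters:
        verdict = check(email)
        if verdict is not Verdict.UNSURE:
            return verdict
    # No filter was confident: fall back to the default verdict.
    return default


# Hypothetical filters, each confident only within its own strength.
def partner_allow_list(email: Email) -> Verdict:
    return Verdict.LEGIT if email["sender"].endswith("@partner.example") else Verdict.UNSURE


def ip_reputation(email: Email) -> Verdict:
    return Verdict.SPAM if email["sender_ip"] in {"203.0.113.5"} else Verdict.UNSURE


def content_check(email: Email) -> Verdict:
    return Verdict.SPAM if "cheap pills" in email["body"].lower() else Verdict.UNSURE


CHAIN = [partner_allow_list, ip_reputation, content_check]
```

The order is itself a design choice. Here a known partner is accepted even if its IP has a bad reputation, and the content check only sees what the IP-based filter could not decide.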
Final Tips
If there is one thing I have learnt on this subject, it is that there are many opinions and no magic formula. What I presented here is an overall approach to spam filtering.
The best filtering solution is the one that manages to bring together different filtering technologies and make them work together as one filter. Getting a set of technologies to act as a team is easier said than done. One key weakness I see in the Exchange filters provided out of the box is that they are not very smart when it comes to working together. Having a mix of technologies doesn't automatically guarantee that you get the full benefit that mix is able to offer.