White/Black Listing by Sender
The sender address is one of the least reliable pieces of information in an email. To see how easy it is to spoof, just configure an Outlook Express email account with an arbitrary sender address.
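To illustrate, the following minimal Python sketch (standard library only; the addresses are placeholders) builds a message with an arbitrary From header. Nothing in the message format checks that the sender actually owns the address, and an SMTP client would submit the message with that header unchanged.

```python
from email.message import EmailMessage


def build_message(sender: str, recipient: str, subject: str, body: str) -> EmailMessage:
    """Build a message whose From header is whatever string the caller supplies."""
    msg = EmailMessage()
    msg["From"] = sender  # no check that the sender really owns this address
    msg["To"] = recipient
    msg["Subject"] = subject
    msg.set_content(body)
    return msg


msg = build_message("ceo@example.com", "staff@example.org", "Hello", "Any sender address works.")
print(msg["From"])  # ceo@example.com
```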
Spoofing greatly reduces the usefulness of sender black listing. However, there are still cases where blocking by sender is effective. Often this filter is used to block a broader class of unwanted emails rather than just spam. For example, sender blocking does a good job of filtering "legitimate" newsletters. By definition, a newsletter that follows a proper subscription/cancellation procedure is not spam. However, this is a minor detail once an organization decides to disallow its distribution.
A similar example is blocking emails by domain suffix through wildcards. If an organization is not interested in emails from certain countries, it can block all senders whose address ends with the country suffix. Again, this blocks a whole class of emails rather than targeting spam directly.
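A sender black list of this kind can be sketched as a list of exact addresses plus wildcard patterns. The Python sketch below is an illustration under stated assumptions, not any product's implementation: the list entries are hypothetical, and ".xx" stands in for a real country suffix. Since the sender address can be spoofed, a match only catches senders who are not hiding their identity.

```python
from email.utils import parseaddr
from fnmatch import fnmatchcase

# Hypothetical sender black list: exact addresses and wildcard patterns.
SENDER_BLACKLIST = [
    "news@newsletter.example.com",  # a "legitimate" newsletter the organization disallows
    "*.xx",                         # any sender whose address ends with the placeholder suffix .xx
]


def is_sender_blocked(from_header: str, patterns=SENDER_BLACKLIST) -> bool:
    """Return True if the sender address matches any black-list entry (case-insensitive)."""
    _, address = parseaddr(from_header)  # strip the display name, keep the address
    address = address.lower()
    # An entry without wildcards behaves as an exact match.
    return any(fnmatchcase(address, pattern.lower()) for pattern in patterns)


print(is_sender_blocked("Weekly News <news@newsletter.example.com>"))  # True
print(is_sender_blocked("bob@shop.xx"))                                # True
print(is_sender_blocked("alice@example.com"))                          # False
```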
The bottom line is that with sender white and black listing, we are usually identifying senders who are not hiding their identity. Spammers rarely fall into this category.
White/Black Listing by Recipient
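In general, we can divide recipients into two categories:
Valid addresses, belonging to existing mailboxes of the organization
Invalid addresses, either non-existent addresses in the organization's domains or addresses in foreign domains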
Clearly, the first step is to eliminate the second category of recipient addresses. This is not the job of classic recipient black listing. Instead, such addresses are filtered by rejecting recipients not present in Active Directory, a feature supported out of the box in Exchange 2003. Furthermore, addresses in foreign domains are blocked by disallowing relaying.
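In Exchange 2003 this is a configuration setting rather than code, but the decision logic can be sketched in a few lines of Python. Here ACCEPTED_DOMAINS and DIRECTORY are hypothetical stand-ins for the organization's own domains and the addresses present in Active Directory.

```python
# Hypothetical stand-ins for the organization's domains and Active Directory.
ACCEPTED_DOMAINS = {"example.com"}
DIRECTORY = {"alice@example.com", "feedback@example.com"}


def check_recipient(address: str) -> str:
    """Classify an inbound recipient address before any list-based filtering."""
    address = address.lower()
    domain = address.rpartition("@")[2]
    if domain not in ACCEPTED_DOMAINS:
        return "reject: relaying denied"    # foreign domain
    if address not in DIRECTORY:
        return "reject: unknown recipient"  # not present in the directory
    return "accept"                         # valid recipient, passed on to the filters


print(check_recipient("Alice@Example.com"))  # accept
print(check_recipient("bob@example.com"))    # reject: unknown recipient
print(check_recipient("carol@other.org"))    # reject: relaying denied
```

Only addresses that come out as "accept" are left for the recipient white and black lists to deal with.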
Recipient lists are meant to deal with addresses in the first category. White listing a valid recipient disables spam filtering for that mailbox, allowing all emails to reach the inbox. This is sometimes done for mailboxes that receive very critical emails. If we cannot risk a single false positive, not even one in a million, then this list does the job.
Likewise, a recipient black list blocks the delivery of all emails addressed to a specific recipient. This is commonly used for mailboxes that are meant for internal use only.
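Both recipient lists can be sketched as a check that runs before the verdict of the core spam filter is applied. In this Python sketch the mailbox addresses are hypothetical, and core_filter_says_spam stands in for whatever verdict the real filter produces. It assumes the check is applied to inbound internet mail only, so internal senders can still reach the internal-only mailbox.

```python
# Hypothetical recipient lists; the addresses are placeholders.
RECIPIENT_WHITELIST = {"orders@example.com"}   # critical mailbox: never filtered
RECIPIENT_BLACKLIST = {"payroll@example.com"}  # internal-only mailbox


def route_inbound(recipient: str, core_filter_says_spam: bool) -> str:
    """Decide the fate of an inbound (external) message for one recipient."""
    r = recipient.lower()
    if r in RECIPIENT_WHITELIST:
        return "inbox"    # spam filtering bypassed entirely
    if r in RECIPIENT_BLACKLIST:
        return "blocked"  # external mail to an internal-only mailbox is refused
    return "junk" if core_filter_says_spam else "inbox"


print(route_inbound("Orders@example.com", core_filter_says_spam=True))    # inbox
print(route_inbound("payroll@example.com", core_filter_says_spam=False))  # blocked
print(route_inbound("alice@example.com", core_filter_says_spam=True))     # junk
```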
Recipient black lists can also be handy in combination with other white lists. Consider a mailbox that is only meant to receive emails generated by a web-hosted feedback form with a fixed subject. White lists should always take priority over black lists. Thus, by white listing the email subject and black listing the mailbox address, we effectively block all emails except those matching the feedback form subject. Of course, this works best if the subject is fairly unique.
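The priority rule can be sketched as an ordering of checks: the white list is consulted first, so a match there wins over the recipient black list. The subject and mailbox below are hypothetical placeholders, and anything matching neither list is handed to the normal spam filter.

```python
# Hypothetical lists for a feedback-form mailbox.
SUBJECT_WHITELIST = {"website feedback form"}      # the form's fixed subject
RECIPIENT_BLACKLIST = {"feedback@example.com"}


def decide(recipient: str, subject: str) -> str:
    # White lists are checked first, so they take priority over black lists.
    if subject.strip().lower() in SUBJECT_WHITELIST:
        return "allow"
    if recipient.lower() in RECIPIENT_BLACKLIST:
        return "block"
    return "filter"  # neither list matched: leave it to the core spam filter


print(decide("feedback@example.com", "Website Feedback Form"))  # allow
print(decide("feedback@example.com", "Cheap watches"))          # block
print(decide("alice@example.com", "Lunch?"))                    # filter
```

Because the subject is the only thing standing between the mailbox and every other sender, a generic subject such as "Feedback" would let far more through than a distinctive one.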
White/Black Listing by Subject/Body
There are no limitations on what the email subject and body may contain. This is the key characteristic to deal with when populating lists that target email content.
Trying to black list all possible permutations of a single keyword is a hopeless task. A quick look at how the word pharmacy is written in the latest spam wave should make this clear (PHAxquRMACY, PHoizARMA, etc.).
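A minimal Python sketch of the problem: a case-insensitive keyword match catches the plain word but misses both variants quoted above, and every new random insertion would need its own list entry. The list and subjects are made up for illustration.

```python
BLACKLIST = ["pharmacy"]


def naive_match(text: str) -> bool:
    """Case-insensitive substring match against the black list."""
    lowered = text.lower()
    return any(keyword in lowered for keyword in BLACKLIST)


print(naive_match("Cheap PHARMACY deals"))     # True
print(naive_match("Cheap PHAxquRMACY deals"))  # False
print(naive_match("Cheap PHoizARMA deals"))    # False
```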
Furthermore, spammers can rely on a whole slew of other tricks, including images and hidden text. This is why complex technologies like SmartScreen, the technology behind the Microsoft Intelligent Message Filter (IMF), were developed. The bulk of content-based filtering is therefore best left to engines built specifically for this purpose. The role of black listing should be to fine-tune the core filter.
Content white listing can be useful when spammers start targeting your business sphere. If you happen to sell the same products spammers are pushing, white listing can instruct the core content filter to let those emails through. Otherwise, filtering technologies not based on content provide an alternative solution.
Content white listing can also be used as a precaution. For example, we could simply list the names of the products and services our organization provides, without waiting for false positives to occur. This works quite well as long as you do not sell the big brand names targeted by spammers.
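One way to picture lists that fine-tune a core filter is as an adjustment to the score the engine has already produced. The Python sketch below assumes a hypothetical 0 to 9 score (loosely modeled on the spam confidence level used by IMF) and hypothetical list entries; it is an illustration of the idea, not how IMF itself is configured. The white list is checked first so a product name wins over a black-listed phrase.

```python
import re

# Hypothetical lists: our own product names, and a phrase slipping past the core filter.
CONTENT_WHITELIST = ["acme widget pro"]
CONTENT_BLACKLIST = ["limited time offer"]


def contains_phrase(text: str, phrase: str) -> bool:
    """Case-insensitive match of the whole phrase on word boundaries."""
    return re.search(r"\b" + re.escape(phrase) + r"\b", text, re.IGNORECASE) is not None


def adjusted_score(text: str, core_score: int) -> int:
    """Adjust a hypothetical 0-9 core filter score using the content lists."""
    if any(contains_phrase(text, p) for p in CONTENT_WHITELIST):
        return 0          # white list: let the email through
    if any(contains_phrase(text, p) for p in CONTENT_BLACKLIST):
        return 9          # black list: trap what the core filter missed
    return core_score     # otherwise trust the core filter


print(adjusted_score("Your Acme Widget Pro order has shipped", 8))  # 0
print(adjusted_score("Limited time offer!", 3))                     # 9
print(adjusted_score("Meeting notes", 4))                           # 4
```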
The most important aspect of content white/black listing is the selection of keywords and phrases to be matched. As anyone who uses Google knows, short generic phrases should be avoided. The same rule applies here:
Multi-word phrases should be preferred over single keywords
Single keywords should be longer than 5 characters
Many short words are also substrings of other words, so use whole-word matching whenever possible
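The rules above can be sketched in Python. The keyword "cialis" is a well-known case: it is longer than 5 characters, yet it occurs inside "specialist", so a plain substring match produces a false positive while a whole-word match (using the regular expression word boundary \b) does not. The review_entry helper is hypothetical and simply flags list entries that break the first two rules.

```python
import re


def substring_match(text: str, keyword: str) -> bool:
    return keyword.lower() in text.lower()


def whole_word_match(text: str, keyword: str) -> bool:
    # \b anchors the keyword on word boundaries, so it cannot match inside a longer word.
    return re.search(r"\b" + re.escape(keyword) + r"\b", text, re.IGNORECASE) is not None


def review_entry(entry: str) -> list:
    """Flag list entries that break the keyword selection rules of thumb."""
    issues = []
    if len(entry.split()) == 1:
        issues.append("single keyword: prefer a multi-word phrase")
        if len(entry) <= 5:
            issues.append("keyword is 5 characters or shorter")
    return issues


text = "Please forward this to a specialist."
print(substring_match(text, "cialis"))   # True (a false positive)
print(whole_word_match(text, "cialis"))  # False
print(review_entry("cheap"))             # both rules broken
print(review_entry("online pharmacy"))   # []
```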
These are the basic keyword selection rules. Each filter may also provide additional functionality for more accurate content matching, so it is well worth checking the filter documentation. Getting these keywords wrong is a lot easier than many think.
Final Tips
The different pieces of information extracted from an email vary in reliability. Understanding these characteristics allows us to avoid the tricks employed by spammers.
Armed with this knowledge, white lists become an effective tool for letting legitimate emails bypass filtering, while black lists allow us to quickly trap spam that would otherwise get through unfiltered.