What is a spam filter?
A spam filter is a piece of software that
scans a message to determine whether or not it is spam. Most
spam filters work in a very similar way, using a set of rules to try
to work out what the message is about and who it is from. Some are more
successful than others.
Why do spam filters fail?
Just about every spam filter on the market
today uses a combination of the following methods to stop spam:
Challenge-Response
The first method on the list is challenge-response filtering. A
challenge-response filter will not accept an e-mail message from
someone who has not been pre-approved. When a message arrives from a new sender,
a spam filter using this method automatically replies to the sender, asking
them to validate themselves. The reasoning behind this is that if the sender
were a spammer, they simply would not have the time to validate themselves to
the millions of mailboxes they have sent their message to.
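The challenge-response flow can be sketched in a few lines of Python. This is a toy model under stated assumptions, not any real product's implementation: the class and method names are hypothetical, the challenge is simply recorded in an `outbox` list rather than e-mailed, and a random token stands in for whatever verification step a real system uses.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ChallengeResponseFilter:
    """Hypothetical toy filter: mail from unknown senders is held until they verify."""
    approved: set[str] = field(default_factory=set)             # pre-approved senders
    pending: dict[str, str] = field(default_factory=dict)       # token -> sender
    held: dict[str, list[str]] = field(default_factory=dict)    # token -> held messages
    outbox: list[tuple[str, str]] = field(default_factory=list) # (recipient, challenge text)

    def receive(self, sender: str, body: str) -> list[str]:
        """Return the messages delivered to the inbox right now (possibly none)."""
        if sender in self.approved:
            return [body]
        # Reuse the existing token if this sender already has a challenge outstanding.
        token = next((t for t, s in self.pending.items() if s == sender), None)
        if token is None:
            token = secrets.token_hex(8)
            self.pending[token] = sender
            # The challenge goes to the claimed sender address, which may be forged.
            self.outbox.append((sender, f"Reply with code {token} to deliver your mail"))
        self.held.setdefault(token, []).append(body)
        return []

    def verify(self, token: str) -> list[str]:
        """The sender answered the challenge: approve them and release held mail."""
        sender = self.pending.pop(token, None)
        if sender is None:
            return []
        self.approved.add(sender)
        return self.held.pop(token, [])
```

Note that `receive` sends its challenge to whatever sender address the incoming message claims, which is the root of the invalid bounce-back problem listed first below.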
There are several problems with challenge-response systems:
1. The spam filter will happily respond to spam or virus-generated messages,
which often use a false sender address. The innocent person whose e-mail
address the spammer has used will then receive hundreds of these
challenge-response requests! These messages are often termed invalid
bounce-backs: they are sent to the wrong address because a spammer or virus
has faked the sender's address.
2. If you have just ordered an item online, the store will usually send you an
automated e-mail with your receipt. A challenge-response filter will
intercept this message and send the store a message asking it to
authenticate! The problem is that the receipt was sent from an automated
mailbox at the e-commerce store, which will be unable to authenticate itself
with the spam filter, leaving the message sitting in limbo.
3. Even if the new sender is an actual person sending a legitimate e-mail
message, the challenge sent back to them may well get caught in that person's
own spam filter, again leaving the message sitting in limbo!
4. Challenge-response systems often require the sender to enter a code
displayed as an image on a web page. This step verifies that they are an
actual person and not a piece of software trying to fake it. The problem is
that these images can sometimes be difficult to read, especially for people
with reading disabilities. Authenticating just to send you an e-mail message
may be too much trouble, and the sender may give up and contact someone else
instead, perhaps a competitor!
So, in general, the challenge-response method is not a solution to the problem
of spam. It can help a little, but it has some serious drawbacks, some of which
cause considerable disruption on the Internet by creating problems for the
people who receive the challenge-response e-mails.
Rules-based scans
Rules-based systems are the original spam filters.
They started life by simply looking for keywords or phrases in the message and
blocking it when a match was found. Simple rules-based
spam blockers are very poor at filtering out spam: they will often
block a legitimate message while also letting through a good percentage of the
actual spam.
If, for example, the spam filter rules dictated
that “viagra” was a blocked word, a spammer could easily work
around this by sending a message with the word changed to “v1agra” (changing
the i to a 1). This is a very simplistic overview: today's rules
are much better and would pick up on this simple change, but spammers are
constantly learning new ways around rules-based engines.
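To illustrate the “v1agra” example, here is a minimal Python sketch comparing a literal keyword match with one that first undoes a few look-alike character substitutions. The blocked-word list and the substitution table are assumptions made for illustration; real rule engines use far larger, regularly updated rule sets.

```python
import re

BLOCKED_WORDS = {"viagra"}  # hypothetical rule set

# Assumed (and deliberately incomplete) table mapping look-alike characters to letters.
LOOKALIKES = str.maketrans({"1": "i", "!": "i", "|": "i", "0": "o",
                            "3": "e", "@": "a", "$": "s"})

TOKEN = re.compile(r"[a-z0-9@$!|]+")  # words, keeping the look-alike symbols


def naive_match(text: str) -> bool:
    """Block only if a blocked word appears literally."""
    return any(word in BLOCKED_WORDS for word in TOKEN.findall(text.lower()))


def normalized_match(text: str) -> bool:
    """Undo simple character substitutions before checking the word list."""
    return any(word.translate(LOOKALIKES) in BLOCKED_WORDS
               for word in TOKEN.findall(text.lower()))
```

The normalised check catches “v1agra”, but a spammer can still slip past it with something like “v-i-a-g-r-a”, and every extra substitution added to the table also raises the risk of turning an innocent word into a blocked one.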
A good spammer will test his message against a variety of these rules-based
spam blockers and tune it until it bypasses them. The problem
then works the other way: as the rules become stricter, they run the risk of
blocking legitimate e-mail as well as spam. Many filters now work by trying
to establish the theme of a message, giving it a score based on the kinds of
grammar and sentences it uses. If the message reaches a set score, it is
blocked.
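A score-based filter of this kind can be sketched as a list of weighted rules plus a threshold. The rules, weights and the threshold of 4.0 below are hypothetical; they only show how several weak signals add up to a blocking decision.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    pattern: str  # regular expression, matched case-insensitively
    score: float  # points added when the pattern matches


# Hypothetical rules and weights, for illustration only.
RULES = [
    Rule("free_money", r"\bfree\s+(money|cash)\b", 2.5),
    Rule("act_now", r"\bact\s+now\b", 1.5),
    Rule("many_exclamations", r"!{3,}", 1.0),
    Rule("unsubscribe_link", r"\bunsubscribe\b", 0.5),
]
THRESHOLD = 4.0  # hypothetical cut-off score


def score_message(text: str) -> tuple[float, list[str]]:
    """Add up the scores of every matching rule; return the total and the rule names."""
    hits = [rule for rule in RULES if re.search(rule.pattern, text, re.IGNORECASE)]
    return sum(rule.score for rule in hits), [rule.name for rule in hits]


def is_spam(text: str, threshold: float = THRESHOLD) -> bool:
    """Block the message once its total score reaches the threshold."""
    total, _ = score_message(text)
    return total >= threshold
```

Lowering the threshold blocks more spam but also more legitimate e-mail; raising it does the opposite. No single setting gets both right, which is the trade-off described in the next paragraph.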
Although the scoring system may work fairly well at blocking the majority of
spam, it will never stop 100% of it, and it will also block a small percentage
of legitimate e-mail. If you are using the spam filter on a business account, a
rules-based system may quite regularly move sales-type messages into your
junk mail folder.
Overall, rules-based spam filters are probably the easiest for spammers to fool,
and they need regular software updates to keep up. They can work fairly well,
but usually need a fair amount of use before they tune themselves to your type
of e-mail, learning which senders are good and which are bad. Even then, they
will never be 100% effective at either blocking spam or letting good e-mail
through!
Global black lists
Every e-mail sent on the Internet contains some tracking information (its
“Received” header lines) that helps show where it has come from. The IP address
of the machine that sent each message can also be logged.
Global black lists are lists of known spammers. They usually list the
IP (Internet Protocol) address that a message has come
from. An e-mail server using spam-filtering software can then check each
message against the black list as it arrives. If the IP address appears on the
black list, the mail server simply rejects the message instead of accepting it.
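Many black lists are published over DNS (so-called DNS-based black lists, or DNSBLs). In the simplest case, the mail server reverses the octets of the connecting IPv4 address, appends the list's zone name and performs an ordinary DNS lookup: an address record in reply means the IP is listed. The sketch below assumes that convention; the zone `bl.example` is a placeholder rather than a real list, and the resolver is passed in as a parameter so the logic can be tested without network access.

```python
import ipaddress
import socket
from typing import Callable


def dnsbl_query_name(ip: str, zone: str) -> str:
    """Reverse the octets of an IPv4 address and append the list's zone,
    e.g. 192.0.2.99 -> 99.2.0.192.<zone>."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError if not IPv4
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip: str, zone: str,
              resolve: Callable[[str], str] = socket.gethostbyname) -> bool:
    """True if the zone returns an address record for this IP."""
    try:
        resolve(dnsbl_query_name(ip, zone))
        return True
    except (socket.gaierror, socket.herror):
        return False  # lookup failed, normally meaning "not on this list"
```

In a real server the check would typically run when the sending machine connects, so a listed address can be refused before the message itself is transferred.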
In general, black lists are a very good idea. They give a definitive list of
known spammer addresses and can be used very easily to block spam without
wasting bandwidth, because the message can be refused before its content is
transferred.
The problem is that these lists are based on reports by end-users, so a spammer
may well get a few thousand (or even millions of) messages sent out before he is
listed on a black list. Spammers often send their e-mail through different
addresses, and they sometimes even use hacked networks to send it through other
people's machines (and addresses), which can then lead to legitimate people
being black listed.
Another problem is that sometimes one or two people may be
responsible for deciding who goes on a black list. Who is to say whether
someone is a spammer or not? Perhaps someone sends e-mail to a few people
by mistake and is reported for spamming; they could then end up on a black
list, with all their future e-mail blocked because of a simple mistake.
Some black lists can even end up black listing entire ISPs (Internet Service
Providers), blocking e-mail for all of their customers because a few have
abused the system!
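A black list entry does not have to be a single address; it can cover a whole network range. This sketch uses Python's standard `ipaddress` module with made-up entries drawn from the reserved documentation ranges, and shows how one entry for a provider's address block catches every customer on it, guilty or not.

```python
import ipaddress

# Hypothetical black list using reserved documentation address ranges.
BLACK_LIST = [
    ipaddress.ip_network("198.51.100.23/32"),  # one known spamming machine
    ipaddress.ip_network("203.0.113.0/24"),    # an entire (made-up) ISP address block
]


def is_blacklisted(ip: str) -> bool:
    """True if the address falls inside any listed network."""
    addr = ipaddress.ip_address(ip)
    return any(addr in network for network in BLACK_LIST)
```

An innocent customer at 203.0.113.77 is blocked just as surely as the spammer who caused the listing.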
In reality, mistakes when adding addresses to black lists do not happen very
often, but they can happen, and that can be disastrous for a small company.
A further problem is that black lists take time to update, and until they are
updated, a spammer's messages will get through the filter.
Bayesian Analysis
Bayesian analysis is a relatively new method of filtering spam e-mail. It uses
mathematical formulae to analyse the content of a message, learning from the
user which messages are valid and which are spam.
A Bayesian spam filter relies on two things to
work effectively:
1. How well the Bayesian analysis formulae have been implemented
2. How good a sample of data it has to work with
How well the formulae have been implemented is key to the spam filter. The
underlying formula is well known (it was originally devised by Thomas Bayes,
1702-1761), but implementing it in software more than two centuries after it
originated is another matter!
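For reference, the relationship at the heart of these filters is Bayes' theorem. Applied to a single word w, with “ham” meaning legitimate mail, it gives the probability that a message is spam given that it contains w. How a particular filter combines many words is an implementation choice; the second line shows the common “naive” assumption that words occur independently of each other, a simplification rather than a description of any specific product.

```latex
P(\text{spam} \mid w) = \frac{P(w \mid \text{spam})\,P(\text{spam})}
                             {P(w \mid \text{spam})\,P(\text{spam}) + P(w \mid \text{ham})\,P(\text{ham})}

P(\text{spam} \mid w_1, \dots, w_n) \propto P(\text{spam}) \prod_{i=1}^{n} P(w_i \mid \text{spam})
```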
The other key item for the spam filtering process is the sample of data.
Because Bayesian analysis learns by example, it relies heavily on the sample of
data it starts with. Some spam filters come with reasonably large
databases built on a good sample of spam e-mail, but they all need a period of
training.
During this training period, it is up to the user to tell the system which
spam e-mails it has let through and which legitimate e-mails it has wrongly
blocked. The system then stores the mathematical signatures of these messages
for the next time a similar-looking message arrives.
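This training loop can be sketched as a small naive Bayes classifier in Python. Here the stored “signature” is simply a count of how many spam and legitimate messages each word has appeared in. The class name, the tokenizer, the add-one smoothing and the 0.9 blocking threshold are all assumptions chosen for illustration, and the model is simplified: it only considers words present in a message and skips words it has never seen.

```python
import math
import re
from collections import Counter


class NaiveBayesFilter:
    """Hypothetical, simplified naive Bayes spam filter trained by user feedback."""

    def __init__(self) -> None:
        # For each label: in how many training messages each word appeared.
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        # For each label: how many training messages have been seen.
        self.message_counts = {"spam": 0, "ham": 0}

    @staticmethod
    def tokenize(text: str) -> set[str]:
        # Lower-case words as a set, so each word counts once per message.
        return set(re.findall(r"[a-z0-9']+", text.lower()))

    def train(self, text: str, label: str) -> None:
        """Record a message the user has marked as 'spam' or 'ham' (legitimate)."""
        self.message_counts[label] += 1
        self.word_counts[label].update(self.tokenize(text))

    def spam_probability(self, text: str) -> float:
        """Estimate P(spam | words in message) with Bayes' theorem in log space."""
        total = sum(self.message_counts.values())
        known = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        log_score = {}
        for label in ("spam", "ham"):
            n = self.message_counts[label]
            score = math.log((n + 1) / (total + 2))  # smoothed prior P(label)
            for word in self.tokenize(text) & known:  # skip never-seen words
                # Smoothed P(word appears | label).
                score += math.log((self.word_counts[label][word] + 1) / (n + 2))
            log_score[label] = score
        # Normalise the two scores into a probability (subtracting the max avoids underflow).
        top = max(log_score.values())
        spam = math.exp(log_score["spam"] - top)
        ham = math.exp(log_score["ham"] - top)
        return spam / (spam + ham)

    def is_spam(self, text: str, threshold: float = 0.9) -> bool:
        return self.spam_probability(text) >= threshold
```

Each time the user corrects a mistake, calling `train` with the right label updates the counts, which is why a filter like this improves as it sees more of the user's own mail.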
To be effective, a Bayesian spam filter will
have to be trained with hundreds of messages that are specific to the end user.
Even once trained, the filter will still make the occasional mistake, either
blocking legitimate e-mail or allowing spam through.
As with rules-based spam filters, spammers can also fool Bayesian
software. Spammers are well paid for their work and will have the very
latest technology to test their e-mails against, tweaking each message just
enough to bypass the majority of filters.
Please see The ClearMyMail
Difference.
About ClearMyMail
ClearMyMail is the world's only 100% guaranteed spam blocker, stopping all spam and other unwanted emails before they reach your computer.