The Whys and The Hows of Email Spam Filters

Ketevan Bostoganashvili

1 year ago

Spam filters keep away most of the spam and phishing emails circulating online. It’s thanks to them that our inboxes don’t get cluttered with emails from wealthy princes promising to donate their whole fortune.

But how do spam filters identify and stop spam? Read on to find out.

What are email spam filters, and why should you use them?

Spam filters are just like regular, real-life filters, but instead of your coffee, they filter messages. They act as barriers between you and malicious actors.

In technical terms, a spam filter is an application or software that analyzes incoming emails to detect unsolicited, harmful, ‘spammy’, or malware-infected messages. It then quarantines or rejects such emails, or moves them to a junk folder.

By doing so, it protects users from:

Having their personal or client information stolen;
Their devices being used as spam bots;
Receiving unwanted emails and being unable to see important messages;
Cyberattacks executed using emails.

Spam filters can also work the other way around. Outbound email filters won’t allow you to send emails that contain suspicious elements, such as a missing subject line, excessive links, or overuse of spam-triggering words. Such emails will bounce back, and you’ll receive non-delivery reports.

False positives and negatives

False positives occur when an email spam filter identifies a perfectly legitimate email as spam. This can happen to organizations sending important transactional emails or businesses sending marketing campaigns.

In that case, password reset, email verification, or welcome emails will end up in the spam folder.

False negatives occur when a spam email slips through the filter and reaches the inbox. In that case, the user can mark the email as spam manually so that similar emails are classified as spam in the future.
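
To make the two error types concrete, here is a minimal Python sketch that counts them on a small labeled sample. The function name `error_rates` and the sample data are illustrative assumptions, not taken from any real filter.

```python
def error_rates(actual, predicted):
    """Return (false_positive_rate, false_negative_rate).

    Both arguments are sequences of booleans, where True means "spam".
    """
    # False positive: a legitimate email that the filter flagged as spam.
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    # False negative: a spam email that the filter let through.
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    ham_total = sum(1 for a in actual if not a)
    spam_total = sum(1 for a in actual if a)
    fpr = fp / ham_total if ham_total else 0.0
    fnr = fn / spam_total if spam_total else 0.0
    return fpr, fnr


# Hypothetical sample: four legitimate emails followed by four spam emails.
actual = [False, False, False, False, True, True, True, True]
predicted = [False, True, False, False, True, True, False, True]
fpr, fnr = error_rates(actual, predicted)
```

In this sample, one legitimate email lands in spam and one spam email reaches the inbox, so both rates come out at 25%.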

What are the types of spam filters?

We can differentiate between the types of spam filters based on their deployment and the factors they evaluate.

Types of spam filters based on deployment

Based on deployment, spam filters can be on-premises, cloud-based, or software-based.

On-premises or gateway spam filters

On-premises spam filters are physical devices that operate using specific pre-defined rules. They sieve through inbound emails and, based on configuration, block, quarantine, or delete spam messages.

Why these are great

They have a high level of confidentiality, as emails remain on your local network and servers once they arrive at the filtering border. This makes them suitable for organizations that process sensitive or confidential data;
They provide more control as administrators can configure all the necessary details themselves;
These are easier and faster to troubleshoot if anything goes south. You won’t have to deal with third-party providers – your administrators can inspect and fix the issues immediately.

Where they fall short

On-premises email spam filters are usually expensive to deploy. While those costs decrease over time, the initial chunk of expenses can be hard to cover for smaller businesses;
They are hard to scale as they require additional physical appliances when the load increases;
They may not be effective in combating new forms of spam and phishing emails.

Cloud-based, server-side, or hosted spam filters

Cloud-based spam filters are hosted in the cloud rather than on a physical server in your own network. Deployment involves pointing your domain’s MX records to the spam filtering solution, which then acts as a relay that sorts through emails before they reach your network.

Why these are great

They operate using multiple data centers worldwide, ensuring uninterrupted operations;
They are easier to scale or downgrade if your business needs increase/decrease over time. Plus, they can handle large volumes of emails without any additional equipment;
They are compatible with almost every device and operating system;
They can be taught to detect even the newest types of junk mail.

Where they fall short

The cost of cloud-based spam filters can increase significantly with the growth of processing needs.

Software-based or client-side spam filters

Software-based spam filters are applications that you download and install on your own machine. They are tied to the device they run on, so they have to be installed on each computer separately. They can be used alongside email management applications, which provide greater control over unwanted emails.

Why these are great

They provide an extra layer of security when paired with email service providers’ (ESPs’) spam filters;
Can be customized based on the requirements of the specific machine and its usage;
Can be very hard to get through.

Where they fall short

No longer popular as ESPs’ spam filters are becoming stronger;
They filter the emails once they are delivered to the user’s device, not on the network level.

Types of spam filters based on the factors they analyze

With the deployment out of the way, let’s differentiate between the types of spam filters based on the factors they check.

Keep in mind that spam filtering technologies are constantly developing. As we speak, researchers are running tests to improve machine learning algorithms and find the best solutions for detecting all sorts of spam emails. The types of filters we cover below are the most common and have proven to be effective.

Content filters

As you’d guess from the name, content filters analyze the content of the emails to determine their legitimacy. They examine all parts of the message, including the headers, subject line, footer, links, and images.

The idea behind such filters is that spammers usually use the same words in most emails. They have a specific vocabulary designed to evoke various emotions in the recipient, such as a sense of urgency, fear, or the desire to grab the best deal.

As a result, the recipients may get lured into opening emails or clicking suspicious links.

So, content spam filters will look for the words that spammers usually exploit.
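
As a rough illustration, a keyword-based content check could look like the sketch below. The phrase list, the weights, and the function name `content_score` are all hypothetical. Real content filters rely on far larger vocabularies and also inspect links and images.

```python
import re

# Hypothetical trigger phrases with illustrative weights.
TRIGGER_WEIGHTS = {
    "free": 1.0,
    "winner": 2.0,
    "best deal": 1.5,
    "double your income": 3.0,
}


def content_score(subject, body):
    """Sum the weights of every trigger phrase found in the subject or body."""
    text = f"{subject} {body}".lower()
    score = 0.0
    for phrase, weight in TRIGGER_WEIGHTS.items():
        # \b keeps "free" from matching inside words such as "freedom".
        hits = re.findall(rf"\b{re.escape(phrase)}\b", text)
        score += weight * len(hits)
    return score


spam_score = content_score("You are a WINNER", "Claim your free prize. Free shipping!")
ham_score = content_score("Meeting notes", "Agenda attached, see you at 10.")
```

A filter built this way would compare the score against a threshold. The same sketch also shows the weakness covered below: a legitimate newsletter that says ‘free’ a few times collects points too.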

Why these are great

They can quickly recognize ‘classic’ spam emails;
Can be customized to block specific content the user isn’t interested in;
They can be constantly improved to recognize newer spam words.

Where they fall short

They may block legitimate and important emails just because they contain certain spam words (for example, ‘free’ or ‘best deal’ are pretty common in legitimate marketing emails).

Header filters

Header filters check email metadata to find inconsistent and falsified information. They typically check the following factors:

Inconsistencies in the SMTP transaction, such as an improperly configured From field;
The number of recipients;
The legitimacy of the sending email domain (if the email is coming from mailtramp.io instead of mailtrap.io, for example);
The legitimacy of the IP address to detect known spammers.
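
Some of the checks above can be sketched with Python’s standard `email` package. This is a simplified illustration, not how any particular provider does it: the function name `header_findings` and the `max_recipients` limit are assumptions, and a differing Return-Path domain is only a hint, since legitimate senders sometimes use a separate bounce domain.

```python
from email import message_from_string
from email.utils import getaddresses, parseaddr


def header_findings(raw_message, max_recipients=50):
    """Return a list of human-readable findings for a raw email message."""
    msg = message_from_string(raw_message)
    findings = []

    from_addr = parseaddr(msg.get("From", ""))[1]
    if "@" not in from_addr:
        findings.append("missing or malformed From address")

    # A Return-Path domain that differs from the From domain can hint at spoofing.
    return_path = parseaddr(msg.get("Return-Path", ""))[1]
    if "@" in from_addr and "@" in return_path:
        from_domain = from_addr.rsplit("@", 1)[1].lower()
        return_domain = return_path.rsplit("@", 1)[1].lower()
        if from_domain != return_domain:
            findings.append("From and Return-Path domains differ")

    recipients = getaddresses(msg.get_all("To", []) + msg.get_all("Cc", []))
    if len(recipients) > max_recipients:
        findings.append("too many recipients")

    return findings


# Hypothetical message that claims to come from mailtrap.io but bounces to mailtramp.io.
raw = (
    "Return-Path: <bounce@mailtramp.io>\n"
    "From: Support <support@mailtrap.io>\n"
    "To: user@example.com\n"
    "Subject: Password reset\n"
    "\n"
    "Hello\n"
)
findings = header_findings(raw)
```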

Why these are great

They spot spammers trying to spoof emails and mimic legitimate senders even if the content isn’t ‘spammy’;
They analyze header info, which usually goes unnoticed by recipients.

Where they fall short

The spam can slip through if the header metadata is correct.

Blacklist filters

Blacklist or block list email filters check the sender’s IP address against Domain Name System blocklists (DNSBLs). They immediately block emails coming from senders whose IP addresses appear in any of the well-known blacklists, such as Barracuda, Spamrbl IMP-SPAM, PSBL, and others.
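
A DNSBL lookup is an ordinary DNS query: the octets of the IPv4 address are reversed and prepended to the blocklist’s zone, and an answer means the address is listed. The sketch below assumes a placeholder zone, `dnsbl.example.org`. Each real blocklist publishes its own zone name and usage terms.

```python
import ipaddress
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up: reversed IPv4 octets followed by the zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")
    return ".".join(reversed(octets)) + "." + zone


def is_listed(ip, zone):
    """Return True if the blocklist zone resolves a record for this IP."""
    try:
        socket.gethostbyname(dnsbl_query_name(ip, zone))
        return True
    except socket.gaierror:
        # The name did not resolve: the IP is not listed (or the lookup failed).
        return False


# 192.0.2.10 is a documentation address; the zone is a placeholder.
query = dnsbl_query_name("192.0.2.10", "dnsbl.example.org")
```

Here `query` is `10.2.0.192.dnsbl.example.org`. Calling `is_listed` performs a live DNS lookup, so it only gives meaningful results against a real blocklist zone.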

Run a quick scan on your own IP address and domain with our free IP Blocklist Checker and Domain Blocklist Checker tools.

Why these are great

They keep away most known spammers;
They are frequently updated to catch even newly identified spam sources.

Where they fall short

They can’t detect spam if the sender’s IP address or domain was changed recently and hasn’t been blacklisted yet.

Machine learning algorithms

Machine learning (ML) algorithms are commonly used in email spam filtering technologies to classify emails into spam and non-spam. Most of these algorithms are supervised machine-learning methods.

These models use an existing dataset, which they get trained on. Based on the training data sets, they can make predictions about new emails and successfully classify them into spam and ham (non-spam).

Common supervised ML algorithms include Naive Bayes, Neural Networks, Decision Trees, and others. While not perfect, Naive Bayes has proven to be one of the most effective for spam filtering.

Filters that operate using the Naive Bayes algorithm are called Bayesian filters.

Certain words have a higher probability of appearing in spam emails. Based on this notion, Bayesian filters are taught about words with a high spam probability.

When an email arrives, the filter combines the spam probabilities (or likelihood functions) of all the words in the email. If the resulting ratio of spam to non-spam likelihood is high enough, the email will be considered spam. The filter also learns new spam words from each email the user marks as spam.
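
Here is a minimal sketch of that idea in plain Python. It assumes whitespace tokenization, Laplace smoothing, and at least one training message per class. The class name `NaiveBayesFilter` and the training sentences are made up for illustration.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.message_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Update word counts with one labeled message ('spam' or 'ham')."""
        self.word_counts[label].update(text.lower().split())
        self.message_counts[label] += 1

    def log_odds(self, text):
        """Return log P(spam | text) - log P(ham | text); above 0 favors spam."""
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        total_msgs = sum(self.message_counts.values())
        scores = {}
        for label in ("spam", "ham"):
            total_words = sum(self.word_counts[label].values())
            # Start from the class prior, then add each word's smoothed log-likelihood.
            score = math.log(self.message_counts[label] / total_msgs)
            for word in text.lower().split():
                count = self.word_counts[label][word] + 1  # Laplace smoothing
                score += math.log(count / (total_words + len(vocab)))
            scores[label] = score
        return scores["spam"] - scores["ham"]


nb = NaiveBayesFilter()
nb.train("win free money now", "spam")
nb.train("free cash bonus winner", "spam")
nb.train("meeting agenda for monday", "ham")
nb.train("your invoice is attached", "ham")

spam_like = nb.log_odds("free money bonus")
ham_like = nb.log_odds("monday meeting invoice")
```

A positive result means the words look more like the spam examples, a negative one more like ham. Calling `train` again whenever a user marks a message as spam is what lets such a filter keep learning.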

Why these are great

They are highly customizable to the user’s needs;
They can reach around 98% accuracy in spam filtering;
They can constantly be trained to learn new spam words.

Where they fall short

They are susceptible to Bayesian poisoning. This happens when spammers use random legitimate words to confuse the filters and decrease the spam score;
They are also unable to detect text transformed into images or letters substituted with digits or symbols (for example, ‘m0ney’ instead of ‘money’);
Most ML filters struggle with multilingual spam detection.

Rule-based filters

Moving on to the rule-based filters, these bad boys are pretty self-explanatory. They filter messages based on pre-defined rules, such as specific words, senders, or even phrases. Emails meeting one of the set rules will automatically be sent to the spam folder.
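
A rule-based filter can be pictured as a list of conditions checked one by one. In the sketch below, the blocked sender, the blocked phrases, and the `route` function are all hypothetical examples of user-defined rules.

```python
# Hypothetical user-defined rules: each one is a (description, predicate) pair.
BLOCKED_SENDERS = {"promo@spam.example"}
BLOCKED_PHRASES = ("cash bonus", "act now")

RULES = [
    ("blocked sender", lambda m: m["from"].lower() in BLOCKED_SENDERS),
    ("blocked phrase", lambda m: any(p in m["body"].lower() for p in BLOCKED_PHRASES)),
]


def route(message):
    """Return 'spam' as soon as any rule matches, otherwise 'inbox'."""
    for _description, matches in RULES:
        if matches(message):
            return "spam"
    return "inbox"


blocked = route({"from": "promo@spam.example", "body": "Hello there"})
allowed = route({"from": "friend@example.com", "body": "Lunch tomorrow?"})
```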

Why these are great

They allow users to block emails from specific senders automatically;
They are highly customizable to the user’s needs and preferences.

Where they fall short

Spam emails that don’t contain the triggers of the rule-based filters won’t get blocked.

Language and country filters

Language filters are designed to block emails written in a language different from the recipient’s native language. Since spammers tend to target people worldwide, spam emails written in foreign languages are common.

Similarly, country filters allow users to block emails coming from foreign countries.

Why these are great

They can be customized to block emails in specific languages or originating from specific countries;
They are easy to implement and manage.

Where they fall short

These aren’t sufficient for spam protection;
They may block legitimate business (or other) emails just because they are written in a foreign language or are sent from abroad.

Source authentication filters

Source authentication spam filters check the authentication protocols of the sender’s domain. Since spammers change email addresses and domains frequently, they might not have authentication protocols in place, such as SPF, DKIM, and DMARC.

Source authentication filters check the MX and A records to determine whether the domain is legitimate or not. If such records don’t exist, they will send emails straight to the spam folder.
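
Receiving servers commonly record the outcome of SPF, DKIM, and DMARC checks in an Authentication-Results header. The sketch below reads such a header and applies a deliberately strict policy that requires all three to pass. The function names and the sample header values are assumptions, and real receivers apply more nuanced policies.

```python
import re


def auth_results(header_value):
    """Extract spf/dkim/dmarc verdicts from an Authentication-Results header value."""
    verdicts = {}
    for method, result in re.findall(r"\b(spf|dkim|dmarc)=(\w+)", header_value.lower()):
        # Keep the first verdict seen for each method.
        verdicts.setdefault(method, result)
    return verdicts


def passes_authentication(header_value):
    """Treat a message as authenticated only if all three checks report 'pass'."""
    verdicts = auth_results(header_value)
    return all(verdicts.get(m) == "pass" for m in ("spf", "dkim", "dmarc"))


# Hypothetical header values for an authenticated and an unauthenticated sender.
good = (
    "mx.example.net; spf=pass smtp.mailfrom=example.org; "
    "dkim=pass header.d=example.org; dmarc=pass header.from=example.org"
)
bad = (
    "mx.example.net; spf=softfail smtp.mailfrom=example.org; "
    "dkim=none; dmarc=fail header.from=example.org"
)
good_ok = passes_authentication(good)
bad_ok = passes_authentication(bad)
```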

Why these are great

The implementation doesn’t require a lot of resources, since checking authentication protocols is a common practice used by Internet Service Providers (ISPs);
The accuracy of such filters is relatively high;
They can detect spam emails quickly.

Where they fall short

They can’t detect spam sent from properly authenticated domains with a good sending history.

Challenge-response filters

When an email arrives, challenge-response filters send a reply containing a specific challenge to the sender. If the sender’s domain is legitimate, the reply is received, and the challenge is solved, the sender is considered legitimate and the original email is delivered.

A challenge-response filter is based on two main ideas:

Spammers usually use invalid return path email addresses;
Spammers send emails in large volumes, making it hard for them to solve the challenges in bulk.

The challenges may include sending an unaltered reply, completing a CAPTCHA, clicking a link, etc.
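
The hold-and-release logic behind such a filter can be sketched as follows. This is only an in-memory illustration: the class name `ChallengeQueue` is made up, the challenge is a random token the sender must send back, and actually emailing the challenge is left out.

```python
import secrets


class ChallengeQueue:
    """Hold messages from unknown senders until they answer a challenge."""

    def __init__(self):
        self.pending = {}     # token -> (sender, message)
        self.trusted = set()  # senders who have already solved a challenge

    def receive(self, sender, message):
        """Deliver mail from trusted senders; otherwise hold it and return a token."""
        if sender in self.trusted:
            return "delivered", None
        token = secrets.token_urlsafe(16)
        self.pending[token] = (sender, message)
        return "held", token

    def answer(self, sender, token):
        """Release the held message if the reply carries the right token."""
        held = self.pending.get(token)
        if held is None or held[0] != sender:
            return None
        del self.pending[token]
        self.trusted.add(sender)
        return held[1]


queue = ChallengeQueue()
status, token = queue.receive("alice@example.com", "Hello!")
released = queue.answer("alice@example.com", token)
```

A spammer using an invalid return path never receives the token, so the held message is never released.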

Why these are great

They can easily filter out common spam emails;
Can be implemented by organizations or individuals whose emails are frequently forged.

Where they fall short

Challenge-response filters are no longer necessary after the spread of the SPF, DKIM, and DMARC authentication protocols, which are now the first step in detecting forged emails;
Challenge-response filters may hinder the delivery of solicited bulk emails or transactional messages.

How do spam filters work?

Typically, email providers and ISPs don’t rely on a single type of filter. It’s the combination of various filtering technologies that creates a strong barrier between the recipients and ransomware-filled emails.

A standard spam filtering scheme is like an onion with layers of filters to ensure email security. And as you peel back each layer, fewer and fewer emails remain on their way to the inbox.

The emails will first go through the content filters that will conduct keyword analyses to identify spam. Then, the header filters will examine the metadata. Blacklist filters will query DNSBLs to verify if any of the sender IPs were blacklisted.

At this point, the rule-based filters will come into play. They will apply pre-defined rules set by the user in their email client. The last stage is a challenge-response filter that will conduct verification.

As for supervised ML algorithms, the models are first fed both ham and spam emails to learn the differences between them. Then, once deployed, they classify new incoming emails as ham or spam.
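
The layered flow described above can be sketched as a list of stages that each email passes through in order. The stage checks below are hypothetical stand-ins for the real filters, and `run_pipeline` is an illustrative name.

```python
def run_pipeline(message, stages):
    """Pass a message through each (name, check) stage in order.

    A check returns True when it flags the message. The result is the name of
    the first stage that flagged it, or None if the message reached the inbox.
    """
    for name, check in stages:
        if check(message):
            return name
    return None


# Hypothetical stand-ins for the real filters, in the order described above.
STAGES = [
    ("content", lambda m: "double your income" in m["body"].lower()),
    ("header", lambda m: not m.get("from")),
    ("blacklist", lambda m: m.get("ip") in {"203.0.113.7"}),
    ("rules", lambda m: m.get("from") in {"promo@spam.example"}),
]

clean = run_pipeline(
    {"from": "friend@example.com", "ip": "198.51.100.1", "body": "See you soon"}, STAGES
)
listed = run_pipeline(
    {"from": "someone@example.com", "ip": "203.0.113.7", "body": "Hi"}, STAGES
)
```

Because the function returns at the first stage that flags a message, later (and often more expensive) layers only see the emails that earlier layers let through.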

How do email service providers’ spam filters work?

Email service providers such as Gmail, Google Workspace, Microsoft 365 (formerly Office 365), Outlook, Yahoo!, AOL, Hotmail, and others don’t reveal exactly how they block spam.

If they did, spammers would be quick to adapt their strategies. Apart from general information and recommendations, we don’t know how their spam filters work.

Here’s what we do know:

Gmail spam filters are primarily based on machine learning algorithms. They use user feedback and spam complaints to improve spam detection constantly. Spam filters examine factors such as IP addresses, authentication protocols behind bulk email sender domains, and domains and subdomains themselves.
Microsoft 365 users with Exchange Online mailboxes, as well as users of standalone Exchange Online Protection (EOP) without those mailboxes, rely on EOP’s anti-spam protection. The anti-spam technologies EOP uses include connection filtering (which filters spam based on the IP Allow list, IP Block list, and safe list) and content filtering. EOP assigns each message a spam confidence level (SCL) and classifies it with a verdict such as Spam, High Confidence Spam, Phishing, High Confidence Phishing, or Bulk. There’s also an Advanced Spam Filter (ASF) option, which is very aggressive, and messages it filters can’t be reported as false positives.
Outlook also has a Junk Email Filter, but it’s set to No Automatic Filtering by default. To prevent spam and phishing attacks, users can choose between three levels of protection: Low, High, and Safe Lists Only. The Safe Lists are a sort of whitelist for senders and recipients, as their emails never get filtered.
Yahoo! and AOL use the standard set of spam filters with the ability to filter unwanted emails manually.

What triggers spam filters?

Spam filters will have different triggers based on their type, but, generally, the most common triggers include the following:

‘Spammy’ words such as ‘free’, ‘earn’, ‘money’, ‘winner’, etc., or any words of sexual character;
Phrases such as ‘double your income’, ‘earn $$$ in a single day’, ‘cash bonus’…
Excessive use of punctuation, such as double (or even triple) question marks (???), exclamation points (!!!!), etc.
Excessive use of links and call-to-action (CTA) buttons;
Too many images (more than 40% of the message content);
ALL CAPS WRITING (yes, these are annoying to reasonable humans and spam filters alike);
Excessive use of symbols and characters such as $$, %%, @@…
Link text that shows one URL but leads to a different page;
Previous history of sending spam email messages;
Bad sender reputation;
Low email engagement rates;
Broken HTML tags and elements;
Large number of recipients;
The lack of an unsubscribe link or of the user’s permission to send them commercial emails;
And improperly configured (or non-existent) authentication protocols.
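
A few of the surface-level triggers from this list are easy to measure. The sketch below is only an illustration: the function name `trigger_signals` and the chosen signals are assumptions, and it doesn’t cover reputation, engagement, or authentication.

```python
import re


def trigger_signals(subject, body):
    """Measure a few of the surface-level triggers listed above."""
    text = f"{subject}\n{body}"
    letters = [c for c in text if c.isalpha()]
    upper = sum(1 for c in letters if c.isupper())
    return {
        # Share of letters that are uppercase (0.0 when there are no letters).
        "caps_ratio": upper / len(letters) if letters else 0.0,
        # Runs of two or more "!" or "?" characters.
        "punctuation_runs": len(re.findall(r"[!?]{2,}", text)),
        # Runs of two or more "$", "%", or "@" characters.
        "symbol_runs": len(re.findall(r"[$%@]{2,}", text)),
        # Number of http:// or https:// links.
        "link_count": len(re.findall(r"https?://", text, flags=re.IGNORECASE)),
    }


signals = trigger_signals(
    "EARN $$$ NOW!!!", "Visit http://a.example and http://b.example???"
)
```

A filter would typically combine signals like these with the sender’s reputation and authentication results instead of acting on any single one.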

The most popular spam filter software

You have multiple options if you’re looking for spam filter software. The most popular and reliable spam filters include (but are not limited to):

Apache SpamAssassin – an open-source spam filter that employs multiple filtering technologies such as Bayesian filters, text and header analysis, DNS filtering and blocklists, etc. Suitable for Windows, macOS, and Linux. Requires dev knowledge for installation.
Barracuda Email Security Gateway – an on-premises or virtual spam filter that protects your emails from phishing, malware, DoS, and other threats. Uses the data from Barracuda Central to identify known spammers.
Avanan – a cloud-based spam filtering service that protects your Gmail and Microsoft 365 email accounts from spam, phishing, and malware. Uses AI technology. Suitable for small businesses and enterprises. Avanan can be bought as a standalone email protection software or email and collaboration tools protection.
SpamTitan – a robust solution offering gateway and cloud-based solutions for email protection. Compatible with Microsoft Exchange servers and Microsoft 365.

The choice of spam filter software will hugely depend on the size of your business, technical resources, number of mailboxes, and specific needs. Read our blog post on spam checkers to find more options with detailed reviews.

Avoiding spam filters when sending emails

Is it enough to avoid spam trigger words to deliver messages to the inboxes? Not really. A lot goes into achieving high email deliverability rates, especially if you’re sending marketing emails to email lists.

To prevent emails from ending up in the spam folder, you should:

Maintain an impeccable sender reputation;
Optimize email content and subject lines, maintain a good image-to-text ratio, and avoid excessive links;
Ask for recipients’ permission before sending them emails;
Maintain high engagement rates.
Comply with the latest sender requirements from Google and Yahoo.

For more details and tips, read our dedicated blog post or watch this video.

One of the most important steps you can take to avoid spam filters is using a reliable email infrastructure. This includes not only a sending solution but also an email testing tool to reduce the spam score before targeting recipients.

Mailtrap is an Email Delivery Platform that offers both solutions: Mailtrap Email Testing for debugging emails in staging and Mailtrap Email Sending for sending emails in production.

Mailtrap Email Testing is an Email Sandbox that captures all the SMTP traffic and allows you to closely inspect the spam score of captured emails, check HTML elements, troubleshoot sending issues, or view tech info to see SMTP transaction information.

Email Testing’s Spam Checker feature is particularly useful if you want to prevent emails from going to spam.

The checker verifies the content of your emails using the SpamAssassin filter, assigning a specific score. Anything below 5 is considered optimal, while anything above that threshold will most likely go straight to the spam folder.

But a score alone doesn’t tell you what’s wrong with your emails. So, the Spam Checker also provides information about the rules that contributed the most to the score. These rules can flag missing header data or the presence of an external image, for example. With this information, you can make the necessary fixes and test your emails again until the score drops below 5.
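
When SpamAssassin itself processes a message, it can write its verdict into an X-Spam-Status header that lists the score, the threshold, and the names of the rules that fired. The sketch below parses a sample value of that shape. The sample was written for illustration and isn’t output from Mailtrap, and `parse_spam_status` is a hypothetical helper.

```python
import re


def parse_spam_status(value):
    """Parse an X-Spam-Status style value into score, threshold, and rule names."""
    score = float(re.search(r"score=(-?\d+(?:\.\d+)?)", value).group(1))
    required = float(re.search(r"required=(-?\d+(?:\.\d+)?)", value).group(1))
    tests_match = re.search(r"tests=(\S*)", value)
    tests = [t for t in tests_match.group(1).split(",") if t] if tests_match else []
    return {
        "score": score,
        "required": required,
        # The message counts as spam once the score reaches the threshold.
        "is_spam": score >= required,
        "tests": tests,
    }


# Sample value written for illustration; rule names follow SpamAssassin's naming style.
sample = "Yes, score=6.3 required=5.0 tests=HTML_IMAGE_ONLY_08,MISSING_DATE,MISSING_MID"
status = parse_spam_status(sample)
```

Listing the rule names next to the score is what makes the result actionable: each name points to a specific fix, such as adding a missing header.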

Additionally, Email Testing’s Spam Checker provides a Blacklist Report. It analyzes the most popular blacklists to check if your IP address has been blacklisted in any of them. If you did get blacklisted, you can click the name of the blacklist to open its website and follow the instructions there to get delisted.

And to take things further, you can also inspect HTML Source and HTML Check tabs to find HTML elements that might not be supported by various clients.

After refining your emails in staging, you can send them to recipients with Mailtrap Email Sending. It’s an email infrastructure with high deliverability rates by design.

Email Sending provides automatically generated SPF, DKIM, and DMARC records for your sending domain for easy verification. This is the first step in building your reputation as a trustworthy sender.

Once you send out the emails, you can analyze their performance with actionable analytics offering robust monitoring capabilities. You can control the state of your infrastructure with drill-down reports and helicopter-view dashboards.

That’s it! Now you know how spam filters work and how to avoid them. Keep an eye on our blog and YouTube channel to learn more about email deliverability.

If you’re curious about the role AI plays in spotting spam, also make sure to give our tutorial a watch.