Article

feb2003.tar

Swimming with the Spammers

Phillip Tiburcio

For a long time, my various email accounts did their work quietly and effectively; then I began receiving a large volume of spam every day. I was being crushed under the weight of spam, so I fought back with a personal email server built entirely with free tools and reduced my spam intake by 95%. Although this article describes a personal email setup, the tools mentioned can be applied to other environments.

Bill of Materials

My plan was to build a server that would download all my email accounts, scrub them for spam, sort them into appropriate mail folders, then make everything available via imap/ssl or encrypted Webmail. The integration of several free tools makes this possible on my personal Linux server.

Spamassasin and Vipul's Razor scrub mail for spam using a distributed spam-signature Internet hive mind. A free Perl script called Gotmail can download messages from a Hotmail account via HTTP. Another script called Fetchyahoo does the same for any Yahoo account. Fetchmail rounds out the bunch by downloading mail from various POP accounts I've collected through the years, and Squirrelmail provides a Webmail gateway using Apache and PHP. A few procmail rules sort the incoming mail into appropriate subfolders and relegate spam suspects to a separate folder. An old Pentium III machine running Red Hat Linux was pressed into service to act as the foundation for this host of utilities. Any of the various BSD flavors or Solaris would have worked as well.

Network Setup

A DSL connection provides the Internet connection to my home LAN via a Linksys DSL Router. I configured the DSL router to allow inbound SMTP, ssh, http, and https traffic to the Linux machine. Note that some revisions of the Linksys firmware have terrible upload performance unless you tune the Maximum Transmission Unit (MTU) size parameter in its configuration. A size of 1492 worked well for me. I suggest you not configure these inbound services until your server is installed, tested, and patched for security purposes.

Domain Setup

I decided to go with my own domain name, registered via DomainDiscover. ZoneEdit.com will host up to five domains on their DNS servers free of charge. ZoneEdit also supports dynamic DNS updates via a Perl script they provide. This is key if you do not have a static IP address from your ISP. Note that ZoneEdit is not a domain registrar; it merely provides DNS servers and a decent Web administration interface.

Once my domain was registered and properly auto-updating in ZoneEdit, I created an MX record. Thus, I can receive email directly into my own domain, while still downloading all my legacy accounts and scrubbing everything for spam.

Email Setup

The overall message flow goes like this:

1. Email comes into various Internet accounts and is collected via command-line utilities on a cron schedule.

2. Messages are forwarded to a local user account via SMTP.

3. Courier SMTP (http://www.courier-mta.org) picks up these messages (as well as messages sent directly to my domain) and passes them to Procmail for local delivery.

4. Procmail pipes messages through SpamAssassin, which modifies messages that are detected as spam.

5. Modified messages are flagged by procmail and sorted into a "spam" folder.

6. Clean email gets sorted into folders according to its original destination.

7. Courier IMAP serves up the email for local LAN clients.

8. Squirrelmail running on Apache/PHP presents a Webmail interface over ssl and accesses mail accounts via local IMAP.

I installed the Courier email system to serve SMTP and IMAP on my server. There are several freely available alternatives for mail servers (e.g., qmail and sendmail), but I selected Courier because it comes with an integrated IMAP server and uses Maildirs for mail storage instead of the traditional MBOX format. Courier also comes with a Webmail component that is a bit ugly and short on features compared to the SMTP and IMAP components. Each Courier subsystem can be installed independently and each communicates via standard protocols, so it was easy to substitute into a different Webmail system.

Courier can be compiled from source, but this is not recommended. Compiling and configuring Courier is a long, tedious process, and the developers suggest you find a binary distribution and use that instead. After going through the process myself, I definitely agree.

Once Courier is installed, set "procmail" as the local Mail Delivery Agent (MDA). You can do this by editing the courierd configuration file and setting:

DEFAULTDELIVERY="| /usr/lib/courier/bin/preline /usr/bin/procmail"

This will cause Courier to hand over all incoming email to procmail for final delivery, a fact we will make extensive use of shortly. Your paths may vary depending on how Courier was installed.

A Note on Courier Subfolders

You need to make sure that each user on your system has a valid ~/Maildir directory with the right structure. Courier comes with a program called "maildirmake" that will create a proper Maildir directory structure. On a Red Hat system, you can add a Maildir in /etc/skel and all new users created with the adduser command will be set up properly.

Note that if you want to use command-line tools to create Inbox subfolders that appear via IMAP, you must create another Maildir format directory structure in your ~/Maildir folder, named with a leading ".". Thus, if you want an IMAP subfolder called "MailingLists", you must use maildirmake to create ~/Maildir/.MailingLists/. To create a sub-subfolder (e.g., a "BugTraq" mail folder that is a subfolder of "MailingLists"), you must prepend the name of the parent folder, chained with a ".". Therefore, to make a BugTraq subfolder of MailingLists, you must use maildirmake to create ~/Maildir/.MailingLists.BugTraq/. Do not make actual subdirectory structures.

Mail Collection

Once I verified that I was receiving email at my new domain and that IMAP was working on the local LAN, I installed gotmail into /usr/local/bin. Gotmail (http://ssl.usu.edu/paul/gotmail/) is a single file Perl script that accesses Hotmail via http (using curl, another command-line utility), reads each mail message, and forwards that message to another address that you specify. All configurations for gotmail are stored in ~/.gotmailrc. Take care that file permissions are restrictive enough on .gotmailrc, since it must contain your Hotmail username and password. I configured gotmail to forward incoming Hotmail to my local account on the server. However, gotmail has several useful options to archive all your Hotmail into an MBOX.

Fetchyahoo (http://fetchyahoo.twizzler.org/) operates on a similar concept but is geared to download Yahoo mail via http. Fetchmail is another venerable UNIX command-line utility that fetches mail via POP. I use fetchmail to download mail from my mac.com account. Each of these utilities can be configured on a per-user basis, and can be set to run via cron at regular intervals. All of them forward incoming messages to the local UNIX account via SMTP. Hotmail tends to change its Web page layout every few months, so be sure to stay informed of any updates to Gotmail. If you subscribe to these services' POP offerings, you can simply use fetchmail for everything and avoid any incompatibilities.

Spam Filtering

At this point, I had mail from many different sources being collected on a regular basis into my account on the Linux machine. All told, I was receiving more than 90 spam messages per day. This is where SpamAssassin (http://spamassassin.taint.org) came in to save the day. SpamAssassin is a Perl script that analyzes mail according to many rules. Each rule "hit" has a score value associated with it. If the overall score for a particular message goes over a configurable threshold, the email is tagged as spam. This system works incredibly well. In my experience, the default configuration is 99% accurate with very few false positives. Rules match on text commonly found in commercial email, whether or not the source domain is in the Realtime Blackhole List (RBL), and many other items.

Each tagged message gets modified in two ways: the subject line gets the string "*****SPAM*****" prepended to it, and the body of the message is prepended with a report indicating which rules were triggered. For example, here's the epitaph on a recently "assassinated" message:

Subject: *****SPAM***** Automated Credit Repair that Works! Free Sign Up!

SPAM: ------------------- Start SpamAssassin results --------------------
SPAM: This mail is probably spam.  The original message has been altered
SPAM: so you can recognise or block similar unwanted mail in future, using
SPAM: the built-in mail filtering support in your mail reader.
SPAM:
SPAM: Content analysis details:   (10.4 hits, 5 required)
SPAM: Hit! (0.8 points)  Subject has an exclamation mark
SPAM: Hit! (0.8 points)  BODY: /^[^<]{199,}$/m
SPAM: Hit! (2.5 points)  BODY: Link to a URL containing "remove"
SPAM: Hit! (3.3 points)  BODY: /click here.{0,100}<\/a>/is
SPAM: Hit! (3 points)    Listed in Razor, see http://razor.sourceforge.net/
SPAM:
SPAM: ------------------- End of SpamAssassin results -------------------

Vipul's Razor (http://razor.sourceforge.net) is a set of Perl scripts that can scan mail as a plug-in to SpamAssassin. Vipul's Razor creates a fingerprint of each incoming message and checks it against an Internet database of spam message fingerprints. Users can report spam mail that is not yet in the database to help other users identify similar mail in the future. By this collective effort, widely distributed commercial email can be detected. Although this mechanism is not foolproof, it can increase SpamAssassin's accuracy when used as a plug-in. Details on how to configure both tools are available at their respective Web sites.

Both SpamAssassin and Vipul's Razor require several supporting Perl modules. The best way to install these is to run:

perl -MCPAN -e shell

to start an interactive installation shell prompt. Then type:

install <perl module>

The CPAN shell will locate the desired Perl module on the Internet, then download and install it along with any dependencies it requires.

To have SpamAssassin start tagging mail, you need to pipe all your mail through it using Procmail. If you are scrubbing a lot of mail, instantiating Perl for each incoming message could degrade performance. Luckily, SpamAssassin also features a "daemon" mode, where you load up a single instance of Perl/SpamAssassin in server mode, then use a small stub program to submit text for evaluation. This can drastically affect performance for busy servers. There are many configuration options available; however, my experience is that SpamAssassin works nearly flawlessly "out of the box."

Mail Sorting with Procmail

To activate SpamAssassin for all users on your system, put the following block in your /etc/procmailrc file:

PATH=/bin:/usr/bin:/usr/local/bin
SHELL=/bin/sh
VERBOSE=on
LOGFILE=$HOME/proc.'date +%y-%m-%d'
LOGABSTRACT=all
INCLUDERC=$HOME/.procmail-preprocess
:0fw:spam.lock
      | spamassassin -P

      :0e
      {
         EXITCODE=$?
      }

      :0:  
      * ^Subject:.*\*\*\*\*SPAM\*\*\*\*
      $HOME/Maildir/.spam/

INCLUDERC=$HOME/.procmail

:0
*
$HOME/Maildir/

Keep in mind that the ~/Maildir/.spam/ directory must already exist and be in proper Maildir format.

Messages sent to certain mailing lists tend to trigger false positives in SpamAssassin due to the envelope structure, so certain exceptions must be made. Procmail reads /etc/procmailrc upon startup for global mail-handling rules. Typically the /etc/procmailrc references a .procmail in the user's home directory for custom rules. I changed the procmailrc flow to run a ~/.procmail-preprocess file in each user's home directory (the first INCLUDERC line), followed by the global SpamAssassin block, then to execute ~/.procmailrc (the last INCLUDERC line). Rules that tightly match certain mailing lists to which I belong are put in ~/.procmail-preprocess so that SpamAssassin does not see them. Procmail is a powerful mail-handling tool, and a complete discussion of how to configure it is beyond the scope of this article. However, if you like to learn by example, take a look at these excerpts from my configuration. The following block in my .procmail-preprocess will sort any mail that has the string "LISTSERV@beethoven.us.\checkpoint.com" anywhere in the header, and pipe it into my Inbox/Lists/Security mail folder:

:0 H
* .*LISTSERV@beethoven\.us\.checkpoint\.com.*
$HOME/Maildir/.Lists.Security/

Rules that sort incoming scrubbed mail into per-original-account folders get put in the ~/.procmailrc. This block in my ~/.procmail will stuff any mail sent to my mac.com account (and subsequently fetched by fetchmail) into its own subfolder:

:0:
* ^TO_.*tiburcio@mac\.com
$HOME/Maildir/.Accounts.Tiburcio@mac/

Any email from the local cron process is automatically trashed by this code:

:0:
* ^From:.*Cron
/dev/null

Don't forget the trailing slash when indicating a Maildir-style destination folder, otherwise procmail will assume you want MBOX output.

Beware that certain e-card messages will also get sorted as spam. I had to make an exception in my .procmail-preprocess for users who like to send me such mail. Refreshingly, SpamAssassin skillfully catches chain letters and joke messages sent from people that I do know. Procmail could be configured to immediately delete detected spam, but I prefer to sort it all into a separate folder for manual deletion later. Sometimes clean email messages (such as email receipts for Web purchases) get tagged, so it's worthwhile to scan the subject lines in the spam folder before emptying it.

Web Sites and Webmail

After the mail and spam system was debugged and working, the next step was to serve up the mailboxes via Webmail. Squirrelmail (http://www.squirrelmail.org) is an excellent, free collection of PHP scripts that get the job done. I created a virtual host in Apache just for Webmail (i.e., mail.<domain>.com). Squirrelmail requires very little by way of configuration, and comes with a small terminal-based control panel. At minimum all you need to do is enter the IP address of your IMAP instance into the control panel. Color and font schemes are selectable on a per-user basis from the Web interface. If you are handy with HTML and CSS, you can create your own "skins." The Web pages it generates are largely free of JavaScript and images, and download quickly.

I installed mod_ssl and created a homebrew SSL certificate per instructions at http://www.modssl.org. This allowed me to access my Webmail domain via https for greater security. The locally generated certificate means that the first time a Web browser encounters your site a pop-up message will appear, warning that the certificate cannot be authenticated. Clicking "OK" to accept the certificate will allow you into the site. You can alternatively purchase a real certificate and avoid this error message entirely.

Remote Administration and Access Remote adminstration is accomplished via ssh. While my DSL provider occasionally changes my IP address, I do have a static hostname courtesy of ZoneEdit. Squirrelmail is nice, however I like to use a full-featured mail client whenever I can. To this end, I set up an ssh tunnel to access the SMTP and IMAP ports on my server remotely. Since I run Mac OS X on my laptop, I can use the following SSH command:

ssh -C -L 25:192.168.1.3:25 -L 143:192.168.1.3:143 <domain.com>

192.168.1.3 is the local Ethernet IP address of my Linux server, whereas <domain.com> resolves to its public IP address. The "-C" option indicates compression should be used, which results in a substantial speed improvement on my 128K DSL uplink. The "-L" sections indicate that the ssh client program running on my laptop should start listening on port 143 and send any data received over the ssh link to the server for delivery to port 143 on 192.168.1.3 (the server itself). My laptop's mail client is then directed to connect to 127.0.0.1 for IMAP. The second "-L" command does the same for port 25 (outgoing SMTP). This gives me an encrypted session for all email traffic, and I don't have to open up IMAP ports on my Linksys. Outgoing email is also relayed automatically, because from the server's perspective it appears to be a local connection.

This mini-vpn works wherever ssh traffic is allowed and lets me use a more robust email client with offline capabilities from my laptop. Webmail is still useful for those situations where I am at a guest computer or where outgoing ssh traffic is denied by a firewall.

Conclusion

All told, this setup took about a weekend to complete. Most of that time, however, was spent installing Red Hat and compiling Courier from scratch (a wretched task if there ever was one). You can probably save some time by using whatever SMTP and IMAP server solutions come with your distribution, provided you can insert Procmail into the local delivery chain. The reduction in spam was well worth my efforts, however, and the bonus of having all my mail aggregated and available in a multitude of formats has been very useful.

Phillip Tiburcio is an independent consultant in Chicago. He graduated from Rensselaer Polytechnic Institute in 1995 with a degree in Computer Systems Engineering. Phillip has been in the systems administration/integration field for the past eight years, working mainly with Solaris, Windows NT 4.0 (with Terminal Server Edition/Citrix Metaframe), and Windows 2000. He has been working with Linux for about eight years, starting with Slackware, and currently with Red Hat. Phillip can be contacted at: phillip@tiburcio.info.