FOSS reigns supreme at NPSF-SANG

June 1, 2008

“Udyogam Purushalakshanam” goes an old saying (translation: Employment is the defining virtue of a man). I’ve been busier than I can ever remember, and I’m enjoying it. At NPSF, each day brings new challenges and learning opportunities. I have been principally in charge of our (C-DAC Pune’s) contribution to the EU-IndiaGrid. Managing a cluster is a tough task, and a grid even more so. Be it bugs in the installation scripts or services that mysteriously conk out and refuse to restart, I have had my hands full with fire-fighting, but it has been fun.

My responsibilities took a slight detour recently, as C-DAC employees have been deluged with massive amounts of spam. Since my boss also heads the SANG team, which is in charge of mail administration, the ball landed in his court. As I had spent some time understanding Sendmail’s arcane rulesets and hacking through its source, he assigned this task to me. I had to acquaint myself with MailScanner, an open-source mail-security package that makes it easy to integrate multiple anti-virus and anti-spam tools. For spam detection it uses SpamAssassin, another open-source package. MailScanner can also run several virus scanners in series. In our setup, we are using ClamAV, a free antivirus.

Here’s how it works:

MailScanner runs two instances of Sendmail on the host machine. The first instance accepts mail for both local and remote delivery and simply queues it in a special incoming queue (/var/spool/mqueue.in).
MailScanner then checks Real-time Blackhole Lists (RBLs) and other DNS blacklists to see whether the mail was sent by a known offender. These lists are extensive and constantly updated. Once the sender is found to be clean (or at least not yet blacklisted), MailScanner hands the message to SpamAssassin, which scans its contents and assigns it a spam score. If the score is below the spam threshold, the mail is then scanned for viruses and other dangerous content. Enter M/s ClamAV, Avast et al. Once the mail has been deemed safe, MailScanner moves it into the standard mail queue (/var/spool/mqueue), where the second Sendmail instance picks it up and delivers it to the final recipient.
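The flow above can be sketched in Python as a toy router. This is a minimal illustration and not MailScanner’s code. The stage order follows the description above, `dnsbl.example` is a placeholder zone, and `toy_score` and `toy_scanner` are hypothetical stand-ins for SpamAssassin and ClamAV. The part that reflects real practice is the DNSBL query name: the sender’s IPv4 octets are reversed and prefixed to the list’s zone, and the address counts as listed if that name resolves. The threshold of 5.0 matches SpamAssassin’s default `required_score`.

```python
import ipaddress
import socket
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Message:
    sender_ip: str
    body: str


def dnsbl_query_name(ip: str, zone: str) -> str:
    """A DNSBL is queried with the sender's IPv4 octets reversed, under the list's zone."""
    octets = str(ipaddress.IPv4Address(ip)).split(".")  # ValueError on a malformed IP
    return ".".join(reversed(octets)) + "." + zone


def dns_listed(name: str) -> bool:
    """Live check (not used in the demo): a listed address resolves, typically to 127.0.0.x."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror ("no such name") is a subclass of OSError
        return False


def route(msg: Message,
          zones: Iterable[str],
          is_listed: Callable[[str], bool],
          spam_score: Callable[[Message], float],
          virus_scanners: Iterable[Callable[[Message], bool]],
          threshold: float = 5.0) -> str:
    # Stage 1: stop mail from senders that appear on any blacklist.
    for zone in zones:
        if is_listed(dnsbl_query_name(msg.sender_ip, zone)):
            return "blacklisted"
    # Stage 2: content scoring; a score at or above the threshold counts as spam.
    if spam_score(msg) >= threshold:
        return "spam"
    # Stage 3: virus scanners run in series; a hit from any one of them stops delivery.
    for scan in virus_scanners:
        if scan(msg):
            return "infected"
    # Clean: the message goes to /var/spool/mqueue for the delivering Sendmail.
    return "mqueue"


# Hypothetical stand-ins for SpamAssassin and ClamAV, for illustration only.
def toy_score(msg: Message) -> float:
    return 3.0 * msg.body.lower().count("viagra")


def toy_scanner(msg: Message) -> bool:
    return "EICAR" in msg.body


# Pretend DNSBL contents: only 203.0.113.7 is listed.
listed = {dnsbl_query_name("203.0.113.7", "dnsbl.example")}

for m in [Message("203.0.113.7", "hello"),
          Message("198.51.100.1", "cheap viagra viagra"),
          Message("198.51.100.1", "attachment: EICAR test"),
          Message("198.51.100.1", "minutes of the meeting")]:
    # prints, one per line: blacklisted, spam, infected, mqueue
    print(route(m, ["dnsbl.example"], lambda name: name in listed, toy_score, [toy_scanner]))
```

Passing the lookup in as a function keeps the sketch testable offline. In a live setting, `dns_listed` would perform the actual DNS query.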

USP of SpamAssassin:

It has to be the Bayesian filter. SpamAssassin makes its spam-scoring technique adaptive: by analyzing our mail, it can ‘learn’ what kind of mail is ‘normal’ for us and what is not. All we need to do is bunch all our spam in one lot and all clean mail in another (yeah, it takes a bit of effort) and ask it to learn from them. The Bayesian filter needs to be fed at least 200 clean and 200 spam messages before SpamAssassin lets it change its perception of what is spam and what is not. This orientation is very beneficial, because without it a spam filter may mark a lot of genuine mail as spam. The real objective of a spam filter is not so much detecting spam as not marking genuine mail as spam (false positives). SpamAssassin claims about 95% spam detection with a 0.06% chance of false positives, or roughly 6 misfiled genuine mails in every 10,000. Nobody likes spam, but nobody likes missing important mail either. Techniques like Bayesian filtering make effective spam detection much easier.
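The learning step can be illustrated with a plain naive-Bayes classifier in Python. This is a simplified sketch and not SpamAssassin’s implementation, which tokenizes mail far more carefully and combines token probabilities differently. The class and function names are hypothetical. The sketch does mirror the 200/200 rule described above, which corresponds to SpamAssassin’s `bayes_min_spam_num` and `bayes_min_ham_num` settings: until both counts are reached, the filter returns `None` and stays out of the scoring. In SpamAssassin itself, the two lots are fed in with `sa-learn --spam` and `sa-learn --ham`.

```python
import math
import re
from collections import Counter
from typing import Optional, Set

MIN_SPAM = 200  # mirrors the 200 spam / 200 clean minimum described above
MIN_HAM = 200


def tokens(text: str) -> Set[str]:
    # Lower-cased word set; each token is counted at most once per message.
    return set(re.findall(r"[a-z0-9']+", text.lower()))


class NaiveBayesFilter:
    """Hypothetical, simplified naive-Bayes spam filter (not SpamAssassin's code)."""

    def __init__(self) -> None:
        self.spam_counts: Counter = Counter()  # messages containing each token, spam lot
        self.ham_counts: Counter = Counter()   # messages containing each token, clean lot
        self.n_spam = 0
        self.n_ham = 0

    def learn(self, text: str, is_spam: bool) -> None:
        if is_spam:
            self.spam_counts.update(tokens(text))
            self.n_spam += 1
        else:
            self.ham_counts.update(tokens(text))
            self.n_ham += 1

    def ready(self) -> bool:
        return self.n_spam >= MIN_SPAM and self.n_ham >= MIN_HAM

    def spam_probability(self, text: str) -> Optional[float]:
        if not self.ready():
            return None  # not enough training yet, so the filter has no say
        total = self.n_spam + self.n_ham
        log_spam = math.log(self.n_spam / total)  # class priors
        log_ham = math.log(self.n_ham / total)
        for tok in tokens(text):
            # Laplace-smoothed fraction of each lot's messages that contain the token.
            log_spam += math.log((self.spam_counts[tok] + 1) / (self.n_spam + 2))
            log_ham += math.log((self.ham_counts[tok] + 1) / (self.n_ham + 2))
        # Convert the log-odds into a probability; cap the exponent to avoid overflow.
        return 1.0 / (1.0 + math.exp(min(log_ham - log_spam, 700.0)))


f = NaiveBayesFilter()
for i in range(200):
    f.learn(f"cheap pills win money now {i}", is_spam=True)
print(f.spam_probability("cheap pills now"))  # None: no clean mail learned yet
for i in range(200):
    f.learn(f"grid cluster meeting notes {i}", is_spam=False)
print(f.spam_probability("cheap pills now"))     # close to 1
print(f.spam_probability("grid meeting notes"))  # close to 0
```

Laplace smoothing (the `+ 1` and `+ 2`) keeps a word never seen in one lot from forcing that lot’s probability to zero.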

If any of my readers has used or is acquainted with MailScanner or any other open-source email-security/anti-spam suite, I would love to hear about his/her experience and observations.