Mail and the way it processed [#27737]

Submitted by Ilia on Thu, 05/30/2013 - 03:59

Hi!

Could you please be kind enough to answer a few mail-related questions?

How does it work when a domain has several users? Where is the other users' mail stored? How do you access it if the Linux user is different? Is it possible to change the home directory to something else?
Is there a scheduled script that could go through a specific folder in Maildir to learn spam messages that were manually moved to Spam folder?
How to switch from using saslauthd/PLAIN to using md5_crypt with Postfix/Dovecot?

Sincerely, Ilia

Status:

Active

Comments

Submitted by andreychek on Thu, 05/30/2013 - 09:15 Comment #1

How does it work when a domain has several users? Where is the other users' mail stored? How do you access it if the Linux user is different? Is it possible to change the home directory to something else?

New users are located in /home/DOMAIN/users/USERNAME.

Email is stored in Maildir format in /home/DOMAIN/users/USERNAME/Maildir.

To change where they're stored, you can go into System Settings -> Virtualmin Config -> Defaults for new domains, and there you can change "Subdirectory for mailbox user home directories" to your preferred Subdirectory.

Is there a scheduled script that could go through a specific folder in Maildir to learn spam messages that were manually moved to Spam folder?

No, the only way to do that is to send the spam message to the spamtrap@domain.tld address.
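
For anyone who wants to script the spamtrap route, here is a minimal Python sketch of wrapping a raw message as an attachment addressed to the trap. It assumes the trap accepts messages forwarded as message/rfc822 attachments; the function name and the sample addresses are made up for illustration, and actually sending the result (e.g. via smtplib) is left out.

```python
from email import message_from_bytes, policy
from email.message import EmailMessage


def build_trap_forward(raw_message: bytes, sender: str, trap_address: str) -> EmailMessage:
    """Wrap a raw message as a message/rfc822 attachment addressed to the trap."""
    original = message_from_bytes(raw_message, policy=policy.default)
    forward = EmailMessage()
    forward["From"] = sender
    forward["To"] = trap_address
    forward["Subject"] = "Fwd: " + str(original.get("Subject", "(no subject)"))
    forward.set_content("Forwarded for spam training.")
    # Attaching the parsed message keeps its original headers intact,
    # which is what a learner needs to see.
    forward.add_attachment(original)
    return forward


# Hypothetical sample message and addresses, for illustration only.
sample = b"From: spammer@example.net\r\nSubject: Cheap pills\r\n\r\nBuy now\r\n"
fwd = build_trap_forward(sample, "user@domain.tld", "spamtrap@domain.tld")
print(fwd["To"], [part.get_content_type() for part in fwd.get_payload()])
```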

How to switch from using saslauthd/PLAIN to using md5_crypt with Postfix/Dovecot?

This isn't normally something you would need to change, as you could always just have your users log in using TLS/SSL.

Reviewing the Postfix docs, it doesn't appear that there's a simple way to change that.

Submitted by Ilia on Mon, 06/03/2013 - 05:03 Comment #2

Andrey, thanks for your reply!

Could you please add a function that runs periodically, alongside the other Virtualmin cron jobs, to learn from spam messages?

Why? Because that is the best way to make spam handling work across all mail clients (Thunderbird, Kaiten Mail, Roundcube, etc.). All you have to do is set the spam directory in each of those clients. If a message in your Spam folder is not marked as spam, it will be analyzed with SpamAssassin.

The same approach could be used to periodically run a script over your Inbox, so that any message there that is marked as spam is also analyzed and learned as ham.

I wrote a small script in PHP to give you an example of how I do it for spam messages. For ham the approach is analogous!

Please add this feature to Virtualmin and, if you can, share a Perl or Bash script with me.

My script does the work perfectly fine, though it's in PHP. Example:

<?php
    // Check whether the spam folder holds messages without the [SPAM] label (e.g. manually moved from INBOX)
    exec("grep -L '\\[SPAM\\]' /home/domainexample.ru/Maildir/.Junk/cur/* 2> /dev/null", $spam_messages);
    
    if (!empty($spam_messages)) {
        
        $sa_learn = '/usr/bin/sa-learn --spam';
        foreach ($spam_messages as $spam_message) {
            // Learn a message that we believe is spam (path quoted for the shell)
            $sa_learn .= ' ' . escapeshellarg($spam_message);
            $marked_as_spam = file_get_contents($spam_message);
            // Add the [SPAM] flag to the first Subject header only, not to matches in the body
            $marked_as_spam = preg_replace('/^Subject:/mi', 'Subject: [SPAM]', $marked_as_spam, 1);
            // Write the bytes back unchanged; re-encoding could corrupt non-Latin-1 messages
            file_put_contents($spam_message, $marked_as_spam);
        }
        // Logging the results
        $sa_learn .= ' > ' . '/home/domainexample.ru/.spamassassin/logs/' . date('d.m.Y-G:i') . '_analyzer.log';
        // Execute the analyzer in the background and keep its PID
        $pid = shell_exec("nohup $sa_learn 2> /dev/null & echo $!");
        // Clean the cached messages in Dovecot
        shell_exec("rm -f /var/lib/dovecot-virtualmin/index/domainexample.ru/.Junk/*");
    }
?>

Sincerely, Ilia

Submitted by Ilia on Mon, 06/03/2013 - 05:04 Comment #3

Any ideas? Could you add it to the default Virtualmin functionality?

P.S. Virtualmin.com went down for around 24 hours? What happened? ;)

Submitted by andreychek on Mon, 06/03/2013 - 05:48 Comment #4

Thanks for the feature suggestion!

I'm not sure if we'll be able to implement that, though I'm planning to talk to Jamie about it.

There are a lot of challenges in making a change like that for everyone, since that's a big change from how things work now.

So if we did decide to add that, it would likely be a few Virtualmin versions before that could be properly planned and tested.

Submitted by aitte on Sat, 06/08/2013 - 14:23 Comment #5

Part of me likes the idea, and part of me dislikes it.

The good: If we find a message with [SPAM] in inbox then assume it was moved out of Junk and mark as ham and remove [SPAM] from Subject. If we find a message without [SPAM] in Junk then assume it was manually moved into Junk and mark as spam and add [SPAM] to subject. It would have pretty reliable automation abilities this way and be very intuitive.
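
The rule in "the good" can be written down as a small decision function. This is only a Python sketch of the idea as described here, not anything Virtualmin ships; the folder names "Junk" and "INBOX" and the action labels are assumptions chosen for illustration.

```python
from email import message_from_string
from typing import Optional

SPAM_TAG = "[SPAM]"


def training_action(folder: str, raw_message: str) -> Optional[str]:
    """Decide what to learn from a message based on its folder and Subject tag."""
    subject = message_from_string(raw_message).get("Subject", "")
    tagged = SPAM_TAG in subject
    if folder == "Junk" and not tagged:
        # Untagged mail in Junk: assume the user moved it there by hand.
        return "learn-spam"
    if folder == "INBOX" and tagged:
        # Tagged mail in the inbox: assume the user rescued it from Junk.
        return "learn-ham"
    # Everything else is already where the filter put it.
    return None


print(training_action("Junk", "Subject: hello\n\nbody"))
print(training_action("INBOX", "Subject: [SPAM] hello\n\nbody"))
```

Note that the second branch is exactly what "the bad #7" exploits: a sender controls the Subject line, so the tag alone is not proof that the user moved anything.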

The bad #1: Dovecot and email clients cache folder indexes and mailfile lengths; in fact, each mail message in Maildir format contains the filesize as part of the filename, and editing the Subject-line in the raw message file breaks that, causing cache and file-size mismatches.

The bad #2: Email clients will have cached the old subject lines and will not see the [SPAM] tag get added/removed, since they are looking at locally cached email subjects.

The bad #3: Messages would be processed on a cron job so it would take hours for the user to see their [SPAM] tag get added/deleted from the subject.

The bad #4: Messages would be processed on a cron job so it would take hours before the messages are scanned. The user might delete them from Junk/Inbox way before that, assuming the job was immediately performed the second they moved the message to a different folder.

The bad #5: Scanning maildir folders for every user on every domain with potentially millions of emails, and opening each file and scanning the subject line on a cron job is a HEAVY, HEAVY job. Not a good idea AT ALL. Think of the server memory requirements and disk load.

The bad #6: The sa-learn would only run on messages manually moved into Junk; not on messages already in the folder and assumed to be spam. Normally you might want to sa-learn those too, to help SpamAssassin learn the traits of the junkmail to be even more certain about raising its junkscore next time.

The bad #7: Anyone can email you a message with a [SPAM] subject header into the inbox. This would cause the script to assume that it's a message that WE have moved from Junk to Inbox, and will ham-learn it, thereby removing their spamscore.
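
To make "the bad #1" concrete, here is a small Python sketch that compares the size recorded in a Maildir filename with the size of the file on disk. It assumes Dovecot-style filenames carrying a ",S=<bytes>" field; the sample filename is invented, and other delivery agents may not record the size at all.

```python
import os
import re
import tempfile
from typing import Optional

# Dovecot-style size field inside the filename, e.g. "...host,S=1234,W=1260:2,S".
_SIZE_FIELD = re.compile(r",S=(\d+)")


def recorded_size(filename: str) -> Optional[int]:
    """Return the size stored in the filename, or None if there is no S= field."""
    match = _SIZE_FIELD.search(os.path.basename(filename))
    return int(match.group(1)) if match else None


def size_mismatch(path: str) -> bool:
    """True if the filename records a size that differs from the file on disk."""
    expected = recorded_size(path)
    return expected is not None and expected != os.path.getsize(path)


# Demonstration with a throwaway file: editing the content in place breaks the match.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "1371550000.M1P1.host,S=5,W=5")
with open(demo_path, "wb") as handle:
    handle.write(b"12345")
before_edit = size_mismatch(demo_path)
with open(demo_path, "ab") as handle:
    handle.write(b" [SPAM]")
after_edit = size_mismatch(demo_path)
print(before_edit, after_edit)
```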

The only solution to ALL of these issues (if folder-based processing is to be used), is to NOT use Inbox/Junk at all, but to use two special folders: SASpam and SAHam. If any email is moved into them, the ham/spam learn is executed. In case of the SASpam folder, we delete the mailfiles after the cron job is done, and in case of the SAHam folder we remove the [SPAM] tag, update the filesizes in the filename, generate a new unique mailid for the filename to defeat cache (risky if the new ID causes a clash though) and move them to the Inbox maildir (this move+rename gets rid of the Dovecot/client cache-mismatch issue). The user just has to dump emails into either folder and wait until the folders empty themselves (meaning processing is done). However, it's kinda ugly to have two folders sitting there at all times, AND if you accidentally move messages into the wrong one then you are screwed.

There is of course a similar mechanism already in virtualmin: Forward the spam/ham contents to spamtrap/hamtrap@domain.com and it will be processed, if that feature has been enabled on the domain. This is less convenient than just moving files between folders, but much safer. It is also very fast, because the real user and these aliases exist on the same mail server, meaning that the forwarded messages are immediately passed between mailboxes on the same server without having to travel the internet (apart from the initial upload from the mail client).

Other solutions are to install the sa-learn plugin for Roundcube and do all marking from that GUI. Or to do it from Usermin.

I dunno... Part of me likes your idea but it has way too many risks and issues.

What's really needed is an extension to the IMAP protocol to tell the server "MARK/UNMARK AS SPAM", along with a new spam/not spam metadata field. This way the Dovecot/Courier server does all the work, immediately, and it won't matter what folder things are in. Anyone feel like writing an RFC, waiting 5 years for it to be standardized and another 5 years for clients/servers to appear? :)

Submitted by aitte on Sat, 06/08/2013 - 14:32 Comment #6

There is actually a very long 2009 thread on the Internet Engineering Task Force mailing list, about extending IMAP:

www.ietf.org/mail-archive/web/imapext/current/msg00110.html

Needless to say, nothing came of it. It was deemed way too complex. "Should clients do a simple binary yes/no spam toggle? Should it be based on folders where the learning is immediately triggered upon moving messages in/out of a Junk folder? Should it be more gradual like server-spamscore of 2 + client-spamscore of 2 = total spamscore of 4 = message is spam on both client and server and should be learned as spam? Should it be bidirectional so that server and clients exchange bayesian filters? Should it be done as an extension or as a keyword or a trigger or what?" - etc etc etc.

One of the guys said it best: "If this proposal had come 10 years ago when we were standardizing extensions, it would have been easy to do. But now, the installed IMAP client base is too large for us to change things too much."

But I am sure something will happen in this arena within another 10-20 years. So if you still have spam then, you might wanna check if IMAP has been extended. ;-P

Submitted by jrhosting on Tue, 06/18/2013 - 15:35 Comment #7

There is a Dovecot plugin called antispam. We use it and have designated several IMAP folders as spam folders. If a message is moved into one of them, dspam learns it as spam (sa-learn would work too, I think), and if it is moved out again, it is treated as ham and relearned.

http://wiki2.dovecot.org/Plugins/Antispam

There are SpamAssassin examples there too. This might be way easier than running a script with additional dependencies.

Hope this helps..

Submitted by aitte on Tue, 06/18/2013 - 16:39 Comment #8

@jrhosting: That's pretty cool. I just worry that it triggers every time an incoming message is filed into the Junk folder automatically. In that case, any incoming email that SpamAssassin guesses is spam, is put in the Junk folder, triggering the Antispam plugin to do a sa-learn on it, thus boosting the spam score of what MAY have been misidentified ham.

But if it only triggers when an actual email client moves emails, then it sounds really useful.

Submitted by andreychek on Tue, 06/18/2013 - 18:10 Comment #9

The Dovecot antispam plugin is something we've been exploring. It is indeed pretty cool!

Actions are only performed during the "IMAP COPY" within Dovecot, so it's not called during the initial email delivery... only if the email client moves it into the Spam folder using IMAP. Or, another action is triggered by moving email out of the spam folder, which would allow messages to be tagged as ham.
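
As a rough model of that behaviour (a Python sketch of the decision as described in this thread, not the plugin's actual code), the outcome of a copy depends only on whether the source or the destination is the spam folder. The folder names and the optional Trash handling are assumptions for illustration.

```python
from typing import Optional, Tuple


def classify_copy(source: str, destination: str, spam: str = "Junk",
                  trash: Tuple[str, ...] = ("Trash",)) -> Optional[str]:
    """Return "spam", "ham" or None for an IMAP COPY between two folders."""
    source_is_spam = source == spam
    destination_is_spam = destination == spam
    if source_is_spam == destination_is_spam:
        # Neither side (or both sides) is the spam folder: nothing to learn.
        return None
    other_folder = source if destination_is_spam else destination
    if other_folder in trash:
        # Moves between Spam and Trash say nothing about the message.
        return None
    return "spam" if destination_is_spam else "ham"


print(classify_copy("INBOX", "Junk"))
print(classify_copy("Junk", "INBOX"))
```

Initial delivery never appears here because it is not an IMAP COPY, which is the point made above.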

What exactly occurs when those events are triggered is configurable -- we're working on a resource friendly way to handle that.

The Dovecot antispam plugin isn't likely to become "official" within Virtualmin installs in the near future, but we are hoping to write up some instructions on how to set it up.

Submitted by aitte on Wed, 06/19/2013 - 03:12 Comment #10

Oh wow, that's excellent. Since it's only done on "IMAP COPY", I might actually implement this on all servers so that users can just move emails. Thanks for the tip andrey and jrhosting! I'll first have to make sure it doesn't act on Sieve "fileinto" events either, but I don't think it would since I believe fileinto is a literal move as opposed to IMAP's copy+delete original.

Submitted by aitte on Wed, 06/19/2013 - 04:39 Comment #11

Dovecot-Antispam was a real bitch to get to build on the outdated Dovecot-2.0.9 version on CentOS 6, since the official Dovecot is up to 2.2. I figured out which commits introduced build errors and reverted them.

Here's my CentOS 6 installation guide:

- First of all, install Mercurial (required to download the Antispam plugin sources), Dovecot-Devel (provides /usr/share/aclocal/dovecot.m4 and various development headers), and the three autotools/libtool requirements needed to create the build-script.
# yum install mercurial dovecot-devel autoconf automake libtool
- Now clone the Dovecot-Antispam repository into the root user's home folder (we want to keep the repo around so that we can update it later with "hg pull", and the root user's home is as good a location as any).
# cd /root
# hg clone http://hg.dovecot.org/dovecot-antispam-plugin
# cd dovecot-antispam-plugin
- Revert a patch (http://hg.dovecot.org/dovecot-antispam-plugin/rev/5e8351bcfb29) that is incompatible with the older Dovecot version's internal structs, to prevent build errors:
# sed -i "s/\(ctx->copying\)_via_save/\\1/" src/mailbox.c
- Short-circuit a failing prerequisite check (from http://hg.dovecot.org/dovecot-antispam-plugin/rev/5ebc6aae4d7c) so that it always takes the else-path and does the right thing for our build version.
# sed -i "s/if defined.DOVECOT_PREREQ. && DOVECOT_PREREQ.2,2./if 1==2/" src/signature-log.c
- Now create the automake structure (this is the command that requires autotools and libtool to be installed).
# ./autogen.sh
- Time to configure, build and install the plugin. (Exclude the --with-dovecot part if you aren't running 64-bit CentOS.)
# ./configure --prefix=/usr --with-dovecot=/usr/lib64/dovecot
# make -j4
# make install
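
If you want to see what the two sed substitutions above do before touching the sources, here is a Python sketch that applies the same patterns to sample lines. The sample lines are hypothetical stand-ins, not copied from the plugin; the helper mimics sed's default of replacing only the first match on each line.

```python
import re


def sed_substitute(pattern: str, replacement: str, text: str) -> str:
    """Replace the first match on every line, like sed's s/// without the g flag."""
    return "\n".join(re.sub(pattern, replacement, line, count=1)
                     for line in text.split("\n"))


# sed's \(...\) group becomes (...) in Python; \1 refers back to it in both.
COPYING_PATTERN = r"(ctx->copying)_via_save"
# The dots stand in for the parentheses, exactly as in the sed command.
PREREQ_PATTERN = r"if defined.DOVECOT_PREREQ. && DOVECOT_PREREQ.2,2."

# Hypothetical sample lines for illustration only.
print(sed_substitute(COPYING_PATTERN, r"\1", "if (ctx->copying_via_save)"))
print(sed_substitute(PREREQ_PATTERN, "if 1==2",
                     "#if defined(DOVECOT_PREREQ) && DOVECOT_PREREQ(2,2)"))
```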

Configuration is left up to you.

Submitted by aitte on Thu, 06/20/2013 - 12:11 Comment #12

Turns out that getting the Antispam plugin to talk to spamc (the SpamAssassin spamd client) was non-trivial. Why? Because the plugin gives absolutely zero error-logging when the command fails, so I had to waste 3 hours trying about 10-15 different things until I figured out why the plugin wasn't calling the command properly.

The Antispam v2-plugin docs are very bad, because the current maintainer that has taken over the plugin and made it work with Dovecot 2 is clearly inexperienced and has stripped all kinds of useful information and examples that were there in the Antispam v1 plugin, making configuration a bit of a blind guess. Several things show his inexperience, such as the way he renamed the very logically named "pipe" backend (capable of passing the email to any program via STDIN pipes), to "sendmail", as if that was the only program it was capable of working with, clearly showing that he's a bit of a newbie. I also found indication that the original Antispam author dislikes this guy and didn't sanction the Dovecot 2 version. Anyway, to make matters worse, the plugin isn't very popular so it's very rare to find configuration examples anywhere online to compare mine against. I found one person using it with dspam and one with sa-learn but nobody successfully using it with spamc. I've now figured out how to do the latter.

Here are the two most important pieces of information that I have now learned:

- The Antispam plugin executes the external command with the UID/GID of the connected mail user, NOT as root (despite Dovecot itself running as root). Therefore, you do not need "sudo" in your external command to make it run with the mail user's permissions; that's taken care of automatically.
- The spam/notspam parameter can only take ONE word. That's it. You CANNOT specify antispam_mail_spam = "-L spam" or similar. The workaround is to provide the -L flag as part of the generic argument list.
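
To illustrate the second point, here is a Python sketch of how I understand the final command line to be put together: the binary, then the semicolon-separated arguments, then the single spam/notspam word. It models the behaviour described in this post and is not taken from the plugin's source; the function name is made up.

```python
from typing import List


def build_training_command(binary: str, args: str, action_word: str) -> List[str]:
    """Assemble argv: binary, semicolon-separated args, then the action word last."""
    argument_list = [arg for arg in args.split(";") if arg]
    # The action word is appended as ONE argument, so "-L" must already be
    # the last entry of the generic argument list.
    return [binary] + argument_list + [action_word]


command = build_training_command("/usr/bin/spamc", "-d;localhost;-s;2097152;-L", "spam")
print(" ".join(command))
```

With "-L" at the end of the argument list, the appended word lands right after it, which is why the one-word restriction stops being a problem.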

I've been on quite a journey, even going as far as talking directly to Dovecot's lead programmer to figure out some of how its internal parameter processing works. In the end, it came down to the two issues above.

Here is my finished configuration for a CentOS 6 system which runs spamc/spamd (far, far more efficient than sa-learn, since it doesn't have to load the full SpamAssassin engine every time it has to learn/unlearn a single email). This is for people using SpamAssassin with real home directories for each email user (NOT for virtual email users; for those, you probably have to use the spamc -u parameter which I mention below).

This configuration properly executes spamc as the exact email user, so that only the correct user's Bayesian filters are updated.

All of the comments are by me.

/etc/sysconfig/spamassassin

- Add "-l" (lowercase "L") to the spamd options to enable the "allow tell" function (required for spam/ham learning), so that it looks like the 2nd line below (shows before/after):
SPAMDOPTIONS="-d -c -m5 -H"
SPAMDOPTIONS="-d -l -c -m5 -H"

Remember to restart spamd so that the changes take effect.

/etc/dovecot/conf.d/90-antispam.conf

plugin {
  # Tell Antispam to use the Mailtrain/Pipe backend, which pipes the message to an external program.
  antispam_backend = mailtrain
  
  # Which folders to treat as the Spam and Trash folders.
  # By telling the plugin the name of the Trash folder, it will be able to properly ignore Spam->Trash and Trash->Spam moves.
  # Note that the "Deleted *" wildcard matches "Deleted Messages" and "Deleted Items", used by iPhones and some webmail clients.
  antispam_spam = Junk
  antispam_trash_pattern_ignorecase = Trash;Deleted *
  
  # Unsure folders are all folders that are on the client's local hard drive or on other email accounts,
  # where it's impossible to know the source/destination of the move. Moves to/from unsure folders are
  # therefore ignored by the Antispam plugin. By default, all external folders are treated as Unsure,
  # but here you can add extra folders that are actually on the account itself, if desired.
  #antispam_unsure = SomeExtraFolder
  
  # Make sure that we never handle APPENDs to the spam folder(s), since it's impossible to know the source
  # of the off-server message, and we would therefore introduce all kinds of issues such as training the
  # server with data that it has never encountered and will never encounter, and we also wouldn't be able
  # to detect and ignore external Trash->Spam transitions since we don't know the source.
  # The default is No, but we re-inforce the option in case the default ever changes.
  antispam_allow_append_to_spam = no
  
  # Configure the MAILTRAIN backend to pipe the message to spamc.
  # The Antispam plugin executes the given command with the UID/GID of the connected email user; not as root.
  # Therefore, there is no need for the -u username parameter. Spamc will automatically figure out the username.
  # Args:
  # -d localhost = connect to local spamd.
  # -s 2097152 = accept messages up to 2 MiB in size (normally spamc ignores anything over 500 KiB;
  #   spam is rarely that large, but we might have to re-classify misidentified ham which could sometimes exceed 500 KiB).
  # [NOT USED] -u Username (the unix username of the mailbox owner, which tells spamd to properly update
  #   that user's personal Bayesian filters; however, we let spamc translate the executing UID on its own).
  # -L spam/ham (the training action to take on the provided email).
  # NOTE: We provide the -L parameter as part of the generic argument list, because the plugin doesn't support
  #   passing spaces in the final parameter, so we can't do "-L spam" or "-L;spam", but it's okay because the
  #   spam/notspam words are appended as the final parameter to the command, immediately after the "-L" parameter.
  antispam_mail_sendmail = /usr/bin/spamc
  antispam_mail_sendmail_args = "-d;localhost;-s;2097152;-L"
  antispam_mail_spam = "spam"
  antispam_mail_notspam = "ham"
}

/etc/dovecot/conf.d/20-imap.conf

protocol imap {
  mail_plugins = $mail_plugins [...] antispam
}

Where [...] stands for any other plugins you had enabled.

Submitted by aitte on Fri, 06/21/2013 - 11:04 Comment #13

It's been running for a few days now, and I've been double- and triple-checking that each action is correctly interpreted and trained as ham/spam as intended and that there are zero mistakes or errors.

It was a success. I can now recommend the above setup to any other admins that want a resource-efficient and very intuitive spamc/spamd training method.

Submitted by jrhosting on Sun, 06/23/2013 - 13:13 Comment #14

Well, our setup differs a lot from the standard Webmin/Virtualmin Pro kind of setup. We use dspam instead of SpamAssassin, which is a success but a PITA to set up at first. Once it runs it's groovy though!

I would like to see a virtualmin friendly kind of configuration for that, possibly an option that can be toggled to be used instead (though with the various OS's around the admin should get the plugin going by himself).

Thanks Remko