big quota bug in email lookup

/usr/libexec/webmin/virtual-server/lookup-domain.pl

this script checks the user quota and returns a code that makes /etc/procmailrc refuse email if the user is too close to quota, and makes procmail exit with an error code that makes postfix bounce the email back to the sender...

well...

you forgot about soft quotas + grace period...

so if my user has hit 110% of soft quota (has no hard quota) but has 1 week left to go below 100% before we enforce it... i dont want virtualmin to be denying his emails.

this needs to be fixed so it allows email while still in the grace period of the soft quota

so check:

  1. user/group has reached hard quota? deny the email. has no hard quota? check #2.
  2. user/group has reached soft quota, and grace period is expired, deny the email.
  3. user/group has reached soft quota, but still has grace period left, allow the email.

failure to do this made me almost crazy until i figured out why email bounced even tho my soft quota grace period was not expired yet :O

Status: Active

Comments

Virtualmin by default only looks at and sets hard quotas, so any lower soft quotas should have no effect.

In your system's /etc/webmin/virtual-server/config file, what is the hard_quotas= line set to?

hard_quotas=0

In server templates I did "disk quotas type: soft"

every user has 2gb soft quota + unlimited hard quota

because hard quota is not good for me, it causes a bounce if someone makes a tiny mistake. better that they have 7 days to get below quota again...

virtualmin is stopping legitimate mail because it only checks the quota and doesnt understand grace time :/

grace time is 7 days so people can be above their quota as much as they want for 7 days, then it becomes a hard quota automatically...

but virtualmin just sees the quota is over limit, doesnt check/understand grace time, returns an error code, and procmail returns the error to postfix which bounces the mail...

Ok, I see now - this is really a Virtualmin bug, as when soft quotas are in use it shouldn't be bouncing mail like that. I will fix this in the next release.

fantastic.

i think this is the best method/order to test:

step 1: if user/group no hard/soft quota, accept immediately
step 2: if user/group has hard quota + its been reached, deny the email.
step 3: if user/group has soft quota + its been reached, check grace time.
step 4: if grace time still has time left, accept the email
step 5: if grace time has no time left (has become hard quota), deny the email
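
The five steps above can be sketched as one small function. This is only an illustration, not code from lookup-domain.pl: the name accept_mail and its parameters are hypothetical, a limit of 0 is assumed to mean "no limit", and grace_expiry is assumed to be a unix timestamp (0 when no grace timer is running).

```python
def accept_mail(used: int, soft: int, hard: int,
                grace_expiry: int, now: int) -> bool:
    """Return True if mail should be accepted (hypothetical helper).

    used, soft and hard are in the same unit (e.g. blocks).
    A limit of 0 is assumed to mean "no limit".
    grace_expiry is a unix timestamp, 0 if no grace timer is running.
    """
    # step 1: no hard and no soft quota, accept immediately
    if soft == 0 and hard == 0:
        return True
    # step 2: hard quota set and reached, deny
    if hard and used >= hard:
        return False
    # step 3: soft quota set and exceeded, look at the grace time
    if soft and used > soft:
        # step 4: grace time left, accept
        # step 5: grace time used up (acts like a hard quota), deny
        # assumption: a missing timer (0) while over soft counts as expired
        return grace_expiry > now
    # under every limit
    return True
```

The same function could be called twice per delivery, once with the user's numbers and once with the numbers of the user's group, accepting only if both calls return True.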

remember to check both the user + maybe also its group(s) since group quota can be full too, but maybe checking groups too needs too much processing power...

the "quota --raw-grace user" shows the raw unix timestamp of when grace expires.

   -p, --raw-grace
          When user is in grace period, report time in seconds since epoch
          when his grace time runs out (or has run out). Field is '0' when
          no grace time is in effect. This is especially useful when parsing
          output by a script.

so...

quota --user [user] --raw-grace
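
A rough sketch of how a script could read the grace timestamp out of that command's output. The SAMPLE text below is an assumption about the usual column layout (filesystem, blocks, quota, limit, grace, then the same for files), not captured output, and the names QuotaRow and parse_raw_grace are made up for illustration. Lines that quota wraps because of a long device name are not handled here.

```python
from typing import List, NamedTuple


class QuotaRow(NamedTuple):
    filesystem: str
    blocks_used: int
    soft_limit: int
    hard_limit: int
    grace_expiry: int  # unix timestamp, 0 when no grace time is in effect


# Illustrative sample only; the layout of real output may differ.
SAMPLE = """Disk quotas for user alice (uid 1001):
     Filesystem  blocks   quota   limit   grace   files   quota   limit   grace
      /dev/sda3 2252800* 2097152       0 1360000000    1200       0       0       0
"""


def parse_raw_grace(output: str) -> List[QuotaRow]:
    """Parse the block-quota columns of `quota --raw-grace` style output."""
    rows = []
    for line in output.splitlines():
        fields = line.split()
        # data rows are assumed to have exactly 9 columns
        if len(fields) != 9:
            continue
        try:
            rows.append(QuotaRow(
                filesystem=fields[0],
                # a trailing "*" marks a value that is over quota
                blocks_used=int(fields[1].rstrip("*")),
                soft_limit=int(fields[2]),
                hard_limit=int(fields[3]),
                grace_expiry=int(fields[4]),
            ))
        except ValueError:
            # header row or anything else that is not numeric
            continue
    return rows
```

With a row in hand, comparing grace_expiry against the current time tells whether the user is still inside the grace period.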

for groups i dunno... there is "quota --group [group]" but i hope there is a faster way than checking every group for a user 1 by 1.

there must be some fast way to check groups too because http://wiki2.dovecot.org/Quota/FS manages to have a plugin that checks group quotas for logged in user...

http://www.dovecot.org/releases/2.1/dovecot-2.1.15.tar.gz

look in src/plugins/quota/quota-fs.c

look in this function:

fs_quota_get_linux(struct fs_quota_root *root, bool group, bool bytes,
       uint64_t *value_r, uint64_t *limit_r)

oh... i see... it checks main grp for user, not extra groups. okay that is a fast way!

so when mail comes in, check user quota + its main grp quota!

(it checks uid + gid quotas for user)

reopen just in case so you see last 2 comments

We aren't likely to add support for grace time checking any time soon, as the default quota setup used by Virtualmin doesn't make use of soft quotas or grace times. But I will keep this bug open to look into it, if there is additional demand from users.

the reasons i switched to soft quotas were many:

  1. denial of service filling inbox leads to bounces if hard quotas are used. soft quotas give time to react and fix.

  2. when users have hard quotas they cannot delete their own email to get back below quota, because imap clients connect, copy the message to trash, and mark original as to-be-deleted. this copy fails on hard quotas.

  3. i use dovecot-deliver from procmail to put mail into user mailboxes, so that sieve mail filters can be run before delivery. the problems with a hard quota are many... first is that lookup-domain.pl doesnt detect fast enough that hard quota is exceeded so processing moves on to dovecot... then, dovecot detects that hard quota is full and tries to send a bounce, but it FAILS to queue the bounce because the users hard quota is so full that it cant even write to mail queue... so, with hard quotas, the lookup-domain.pl reacts too slow to a full quota, dovecot executes but can't deliver the message, and it cant send a bounce. so the message is forever deleted and no bounce is sent :O

  4. if a user is on vacation their mailbox could fill up. hard quotas lead to lost emails.

  5. users often install stuff that takes up a lot of space (like when compiling) and then delete the temp folders. soft quotas allow this. hard quotas are trouble.

  6. the best setup is to have a soft quota of like 2gb + hard quota of like 4gb so they can exceed their normal 2gb quota temporarily but will be stopped at 4gb, but virtualmin only supports 1 or the other, so i had to choose soft quotas because they are the best of the choices.

so... why cant you add grace time checking? is it too hard to add the lookup call? you only need to add 2 calls and look at the unix timestamp it says as grace time (see above post, it's easy...) and if timestamp is in the future then the user can still receive mail... what is so hard about adding that check?

what is the job of lookup-domain.pl anyway? what if i delete it from my procmail so that dovecot will be responsible for checking grace time instead. because at least dovecot understands what grace time is. i could make dovecot return an error code and set EXITCODE=$? and postfix would understand... the only reason i still use procmail is because virtualmin is so integrated with it..

i looked a bit in the code and it looks like its only job is to return a status code for if the user can receive mail or not... does it do anything more like log or status etc that is required? can i safely delete it? what do i lose if i delete it? do i break stuff?

It may be added, it just takes time, and there's not many people using quotas the way you're using them.

We'll keep our ears open though, and as demand for this feature grows, we'll look into adding it.

Lookup-domain is responsible for working out which domain a user is in, and thus selecting the appropriate procmail include file for per-domain spamassassin rules. It also performs quota checks to reject email before they actually run out of quota, so that dovecot can still be used to delete messages.

You don't have to use it though - it is possible to instead have an /etc/procmailrc file that calls spamassassin directly, and doesn't call lookup-domain at all.

ahh i see, i have an idea... would this work?

LOGFILE=/var/log/procmail.log
TRAP=/etc/webmin/virtual-server/procmail-logger.pl
:0wi
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
EXITCODE=$?
:0
* ?/usr/bin/test "$EXITCODE" = "73"
/dev/null
EXITCODE=0
:0
* ?/usr/bin/test "$VIRTUALMIN" != ""
{
INCLUDERC=/etc/webmin/virtual-server/procmail/$VIRTUALMIN
}
DEFAULT=$HOME/Maildir/
ORGMAIL=$HOME/Maildir/
SHELL=/bin/sh
DELIVER=/usr/libexec/dovecot/deliver
DROPPRIVS=yes
:0
* ^X-Spam-Status: Yes
| $DELIVER -m Junk
EXITCODE=$?
:0 E
| $DELIVER
EXITCODE=$?
:0
$DEFAULT

that's my current /etc/procmailrc

my idea is that maybe i can change it so lookup-domain will still find per-domain procmailrc (to allow per-domain spam/virus scan disable/enable), but to edit it so procmail doesnt exit if lookup-domain returns an error.

i am not sure where to edit...

maybe this:

:0wi -- edit flags here to make it not abort on error
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
EXITCODE=$? -- remove this (we are not gonna exit so dont set return code)

i know enough about procmail to know "w=wait for exec and check return code, i=ignore pipe error" so... it seems i would have to edit the "wi" flags to something else that makes procmail ignore the return code of lookup-domain.

and i know that the VIRTUALMIN var is filled with stdout return value of execution...

so i think i am on the right track to fix this with least damage possible... i want to keep per-domain spam/virus toggle but not quota checks.

(i hope the lookup-domain script returns procmailrc path even if user is over quota btw... otherwise this plan is going to fail...)

something that confuses me a lot is that i try to run:

/etc/webmin/virtual-server/lookup-domain.pl someusername

and it just sits there, no progress. if i hit ctrl+c then $? is 130 and stdout is empty...

i wanted to test what it returns but it doesnt wanna spit out its secret :P

The secret is that it expects to get the email being processed on stdin, from which it can work out the message size. If you are just running it for testing, you can instead run :

/etc/webmin/virtual-server/lookup-domain.pl someusername </dev/null

thanks... damn, tested with a user with a 10mb file and...

[root@test home]# /etc/webmin/virtual-server/lookup-domain.pl foo.yo.com <test && echo $?
Disk quota for foo.yo.com of 1024 blocks has been reached.

no service id returned... so i cant just edit procmailrc to grab server id (for per-server procmailrc include) and ignore its return... i must instead somehow patch lookup-domain.pl to never fail quota check...

so i need to change /usr/libexec/webmin/virtual-server/lookup-domain.pl to always set quota to "UNLIMITED" regardless of lookup-domain-daemon or local answer

okay my patches are done...

sed -i 's/\(.*did.*dname.*spam.*spamc.*quotaleft.*fromdaemon.*\)/\1\n\t\$quotaleft = "UNLIMITED";/' /usr/libexec/webmin/virtual-server/lookup-domain.pl

sed -i 's/\(\$uquota \*= \$bsize;\)/\1\n\$quota = 0; \$uquota = 0;/' /usr/libexec/webmin/virtual-server/lookup-domain.pl

they remove all quota checks in lookup-domain.pl for daemon + local modes

so, at least now i can survive until the day virtualmin understands quotas better

then again, i may never move back to letting virtualmin check quotas, because it's very slow at reacting to disk space used changing. people can be over quota and the daemon/lookup script doesnt notice it until much later (a cron job updates its internal quota perhaps instead of always checking latest quota?). better to let dovecot do it all and let that detect the latest up to date quota at time of delivery.

so i guess these patches will be permanent on my server ;) i wish virtualmin was better in this regard so i didnt have to do this..

lol, i discovered a crazy thing in virtualmin...

:0wi
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
EXITCODE=$?
:0
* ?/usr/bin/test "$EXITCODE" = "73"
/dev/null
EXITCODE=0

you crazy man! :P

lookup-domain.pl returns 73 if user is over or near quota

this is stored in EXITCODE

next, you check if EXITCODE is 73 (CANT CREATE FILE) and if so deliver the email to /dev/null and set exit code to 0... this tells postfix that the email was delivered ok even though it was deleted...

but... we can do better!

i suggest this code instead:

:0wi
VIRTUALMIN=|/etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
EXITCODE=$?
:0
* ?/usr/bin/test "$EXITCODE" = "75"
HOST

(also change lookup-domain to return code 75 instead of 73)

this makes it so procmail aborts (HOST) if code is 75, and returns to postfix, which sees code 75 ("temporary fail") and will keep mail in retry queue (http://www.postfix.org/QSHAPE_README.html#active_queue) for a week or something. if user still has no quota after a week, it bounces the message to the sender.

here are the valid sysexits codes. postfix will retry on EX_TEMPFAIL, everything else is treated as a permanent error.

#define EX_OK        0    /* successful termination */

#define EX_USAGE    64    /* command line usage error */
#define EX_DATAERR    65    /* data format error */
#define EX_NOINPUT    66    /* cannot open input */
#define EX_NOUSER    67    /* addressee unknown */
#define EX_NOHOST    68    /* host name unknown */
#define EX_UNAVAILABLE    69    /* service unavailable */
#define EX_SOFTWARE    70    /* internal software error */
#define EX_OSERR    71    /* system error (e.g., can't fork) */
#define EX_OSFILE    72    /* critical OS file missing */
#define EX_CANTCREAT    73    /* can't create (user) output file */
#define EX_IOERR    74    /* input/output error */
#define EX_TEMPFAIL    75    /* temp failure; user is invited to retry */
#define EX_PROTOCOL    76    /* remote error in protocol */
#define EX_NOPERM    77    /* permission denied */
#define EX_CONFIG    78    /* configuration error */

how long messages are kept in the deferred queue is decided by "maximal_queue_lifetime", which defaults to 5 days.
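
To make the rule concrete, here is a tiny sketch of the mapping as it is described in this thread: 0 means delivered, 75 means deferred, anything else means bounced. It is a simplified model of what was said above, not a copy of Postfix's own table, and the function name postfix_disposition is made up for illustration.

```python
EX_OK = 0
EX_CANTCREAT = 73
EX_TEMPFAIL = 75


def postfix_disposition(exit_code: int) -> str:
    """Map a delivery agent exit code to what happens to the message.

    Simplified model based on the description in this thread.
    """
    if exit_code == EX_OK:
        return "delivered"
    if exit_code == EX_TEMPFAIL:
        # kept in the deferred queue and retried until
        # maximal_queue_lifetime runs out
        return "deferred"
    # every other code is treated as a permanent error here
    return "bounced"
```

So the current code 73 lands in the "bounced" branch right away, while 75 gives the message up to maximal_queue_lifetime to get through.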

here you go! i have improved /etc/procmailrc for you.

LOGFILE=/var/log/procmail.log
SHELL=/bin/sh
TRAP=/etc/webmin/virtual-server/procmail-logger.pl
# virtualmin-specific processing
:0
{
  # check user quota and retrieve per-domain procmail id
  :0 wi
  {
    VIRTUALMIN=| /etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
    EXITCODE=$?
  }

  # exit with code 75 (EX_TEMPFAIL) if user quota exceeded
  # this puts the mail in the postfix mail queue where it will
  # retry at regular intervals for up to 5 days (by default)
  # after which it bounces if the user still doesn't have space
  :0
  * EXITCODE ?? ^75$
  /dev/null # makes procmail give up and exit with the current code

  # reset current exit code to 0 before proceeding
  EXITCODE=0

  # execute the per-domain procmail file if one existed
  # this performs spam and virus scanning for the domain
  :0 E
  * VIRTUALMIN ?? ^[0-9]+$
  {
    INCLUDERC=/etc/webmin/virtual-server/procmail/$VIRTUALMIN
  }
}
# deliver to the user's maildir
DEFAULT=$HOME/Maildir/
ORGMAIL=$HOME/Maildir/
DROPPRIVS=yes
:0
$DEFAULT

this is a little faster (uses IF/ELSE instead of 2 tests, uses built-in variable test instead of spawning /usr/bin/test and feeding it the email just to figure out if a domain id is contained in $VIRTUALMIN) and has a bunch of small fixes (most importantly code 75 so email doesnt hard-bounce without any retries).

now all you need to do is make this the default /etc/procmailrc contents + make lookup-domain.pl return code 75 instead of 73, and people will enjoy queued emails for up to 5 days when their quota is full, gives them chance to react and fix their quota!

you're welcome to edit this to pieces to remove the comments and make it fit your style!

Actually, the way the current /etc/procmailrc is setup, mail is bounced when the user is over quota, not sent to /dev/null .

This is because processing stops if EXITCODE is 73 (with the recipe to /dev/null), but then procmail itself exits with status 73 which tells the mail server that delivery failed. The message will then be bounced back to the sender.

Exiting with status 75 is an interesting idea, but this seems to be different behavior from what Postfix would do if procmail wasn't involved and it was writing to mail files directly.

alright, but at least look at the method i used to speed things up.

this is the original:

:0
* ?/usr/bin/test "$VIRTUALMIN" != ""

it spawns /usr/bin/test in a shell, feeds it the variables, feeds it the email on STDIN, and just generally sucks!

use this instead:

:0
* VIRTUALMIN ?? ^[0-9]+$

same if you want to check another value like:

:0
* EXITCODE ?? ^75$

and yes, tempfail is the recommended method. code 75 is the only one that tells postfix to put the email in the deferred queue, where it will be retried for up to 5 days (by default, can be changed in postfix conf with "maximal_queue_lifetime = 14d" for instance).

if postfix still hasnt succeeded after 5 days, it bounces back to the sender.

this is better than code 73 = instabounce, because that leads to lost emails. a lot of email is auto-generated and will not be retrying delivery when they get a bounce back. so users could lose serial numbers, site logins, etc, all kinds of important emails. it is better to use code 75 and at least give them a chance to get quota under control again.

by the way...

EXITCODE=0 doesnt do what you think it does. ;)

i learned that today.

it forces that return code even if $HOME/Maildir/ writing failed

so anything after the "EXITCODE=0" line in /etc/procmailrc will return EX_OK no matter how broken or failed it was. this leads to lost emails.

the ONLY way to avoid this is to do like this:

EXITCODE=0
# deliver to the user's maildir
DEFAULT=$HOME/Maildir/
ORGMAIL=$HOME/Maildir/
DROPPRIVS=yes
:0
$DEFAULT
# return code 73 (EX_CANTCREAT) if the default delivery failed
EXITCODE=73
:0
/dev/null

this also has a bonus in that it fixes a flaw in your /etc/procmailrc, which is that it added ":0 $DEFAULT" to the bottom of the file if we choose the option "disable user filters". well, the flaw is that if the :0 $DEFAULT write fails, then execution proceeds to the ~/.procmailrc file anyway. so we must stop everything with /dev/null.

by the way, please please change lookup-domain.pl a tiny bit:

currently:

  • if quota is full, returns error 73, writes error string to stderr
  • if quota is not full, returns 0 + writes "129939232" on stdout

wanted:

  • always write "12321939239" on stdout (even if quota is full)
  • if quota is full, write error string to stderr
  • if quota is full, return 75, otherwise 0

that way, it is easy for me and anyone else to simply edit procmailrc to ignore the return code, but still get the benefit of receiving the per-domain ID, without having to patch lookup-domain.pl the way I have done...

means i don't have to re-patch the file every time virtualmin packages update. i currently use these patches:

sed -i 's/\(.*did.*dname.*spam.*spamc.*quotaleft.*fromdaemon.*\)/\1\n\t\$quotaleft = "UNLIMITED";/' /usr/libexec/webmin/virtual-server/lookup-domain.pl

sed -i 's/\(\$uquota \*= \$bsize;\)/\1\n\$quota = 0; \$uquota = 0;/' /usr/libexec/webmin/virtual-server/lookup-domain.pl

I will change it to always write the domain ID on stdout, even if the quota is full .. will that do what you need?

yes thanks, that solves everything. that way i can get the domain id, but ignore the return code.

that small fix to lookup-domain keeps things working even when people are in their soft quota grace period. :)

i read all the code of lookup-domain.pl and saw that most of it related to not printing domain-id if quota was reached. so as a bonus, that file will be much simpler now:

  1. retrieve domain id + quota via local or daemon calls
  2. print domain id no matter what
  3. check quota and if reached, print error on stderr and exit with code 75 (EX_TEMPFAIL) <-- remember code 75 instead of 73 so users get 5 days of local Postfix deferred queue chances per email rather than instantly losing it if they are over quota ;) [alternatively you could just return 0 / 1 from lookup-domain.pl, and use that status code in /etc/procmailrc to then decide to return 73 or 75 from procmailrc based on a virtualmin GUI setting of "instantly bounce if users are over quotas" vs "put email in local queue and attempt re-delivery for 5 days (Postfix's maximal_queue_lifetime setting defaults to 5 days)"]
  4. else (no quota/not reached) exit with code 0 (EX_OK)
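
As a sketch of the flow in the four steps above (in Python rather than the Perl of the real script, and with made-up names lookup_result and emit), the important part is that the domain id is produced before the quota decision, so the caller always gets it:

```python
import sys

EX_OK = 0
EX_TEMPFAIL = 75


def lookup_result(domain_id: str, user: str, over_quota: bool):
    """Return (stdout_text, stderr_text, exit_code) for one lookup.

    Hypothetical helper: how domain_id and over_quota are obtained
    (local or daemon call) is outside this sketch.
    """
    # step 2: the domain id is printed no matter what
    out = domain_id + "\n"
    if over_quota:
        # step 3: error on stderr, temporary failure exit code
        err = "Disk quota for %s has been reached.\n" % user
        return out, err, EX_TEMPFAIL
    # step 4: no quota or not reached
    return out, "", EX_OK


def emit(domain_id: str, user: str, over_quota: bool) -> int:
    """Write the result to the real streams and return the exit code."""
    out, err, code = lookup_result(domain_id, user, over_quota)
    sys.stdout.write(out)
    sys.stderr.write(err)
    return code
```

A caller that wants the quota check can look at the exit code, and one that only wants the per-domain id can ignore it and read stdout.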

Regarding code 73 vs 75 ... what does Postfix do normally when a user is over quota, if procmail is not being used to deliver mail?

well, first a bit about code 75 (EX_TEMPFAIL): it's a special code, because it's the only one that makes postfix put an email in the deferred queue for retries. all other error codes lead to instant bounce.

for this reason, local delivery agents (LDAs) like dovecot-deliver and procmail and so on usually have a config option or error code mechanism for what to return in case delivery fails due to quota...

in dovecot, you set "quota_full_tempfail = yes" (the default is no) to make it return 75.

upon seeing code 75, postfix then puts the email in the deferred queue and retries in X seconds, then again in twice that amount, and again in twice that, etc etc, so it goes progressively longer between retries, until finally giving up 5 days later (the default).
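
The doubling described above can be sketched like this. It is a simplified model of the description in this post, not Postfix's actual scheduler: the real retry intervals are bounded by the minimal_backoff_time and maximal_backoff_time settings, which the optional max_delay argument only roughly imitates. The function name retry_offsets and the 300 second starting value in the usage note are illustrative assumptions.

```python
from typing import List, Optional


def retry_offsets(first_delay: int, lifetime: int,
                  max_delay: Optional[int] = None) -> List[int]:
    """Seconds after arrival at which each retry happens.

    The wait doubles after every attempt (optionally capped at
    max_delay) and retries stop once lifetime seconds have passed.
    """
    offsets = []
    delay = first_delay
    elapsed = first_delay
    while elapsed <= lifetime:
        offsets.append(elapsed)
        delay *= 2
        if max_delay is not None:
            delay = min(delay, max_delay)
        elapsed += delay
    return offsets
```

For example, retry_offsets(300, 5 * 86400) starts with 300, 900, 2100, 4500 seconds after arrival and stops before the 5 day mark, which is when the bounce would be sent.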

this is a smart thing to do because most emails that bounce will never be re-sent to the person they were intended for, making all bounces potentially very serious data losses that the person will not even notice, so everything should be done not to bounce, especially if they are only doing something as non-serious as temporarily being over quota. hence - why most LDAs offer the option to return tempfail so that quota errors are retried until delivery succeeds.

if delivery still fails with code 75 after 5 days, postfix gives up and returns a message saying that the message was undeliverable due to "temporary failure (code 75)". if the LDA had instead been set to immediately return a fatal error code, postfix would have bounced immediately with something like "code 73: can't write to disk"

so, there's no extra clarity in the bounce messages that would favor one over the other. it's simply a matter of choosing 73 if you want instant bounce or 75 if you want to lose as little email as possible to the great bit bucket in the sky.

now as for postfix if using its own internal delivery mechanism, it has this option:

virtual_overquota_bounce = yes

if set to yes, then the message is immediately bounced. if set to no, the message is put in the deferred queue. in either case, postfix is the author of the bounce message and is capable of generating a better bounce message description, but that doesn't matter in virtualmin's case since it uses procmail.

and the default? postfix's default for that option: "no". postfix prefers deferring messages, rather than instant bounce.

but wait a minute, why did the option have the "virtual_" prefix? ah, you are an astute reader! i just told you about the option for the virtual portion of postfix.

as for postfix itself, at the plain, barest level, without virtual boxes, and just plain delivery to exact users?

drumroll... deferred delivery, if it fails due to quota, postfix-local will keep retrying until maximal_queue_lifetime

enjoy!

here's a small todo:

  1. make lookup-domain return 0 for under quota, 1 for over quota
  2. add a virtualmin config option for "when user is over quota: [bounce immediately] [put in deferred mail queue]" (with a clickable label for a help section that describes that the deferred mail queue in Postfix lasts for 5 days by default)
  3. default to "put in deferred mail queue"
  4. make /etc/procmailrc look for code 1 and exit with 73 (EX_CANTCREAT) if user has chosen "bounce immediately" or 75 (EX_TEMPFAIL) if user has chosen "put in deferred mail queue". this part of procmailrc gets overwritten every time the user changes the "what to do when user is over quota" option. basically just changing the number of the exit code.
  5. add an option to the postfix-server module which lets us edit the maximal_queue_lifetime option (default = 5d)

for point 4, the code is simple:

LOGFILE=/var/log/procmail.log
SHELL=/bin/sh
TRAP=/etc/webmin/virtual-server/procmail-logger.pl
# retrieve virtualmin domain id and check user quota
VIRTUALMIN=| /etc/webmin/virtual-server/lookup-domain.pl $LOGNAME
OVERQUOTA=$?
# check if quota is exceeded
:0
* OVERQUOTA ?? ^1$
{
  # return code 75 (EX_TEMPFAIL) if the user is over quota
  EXITCODE=75
  :0
  /dev/null
}
# include per-domain procmail file for spam/virus scanning
:0
* VIRTUALMIN ?? ^[0-9]+$
{
  INCLUDERC=/etc/webmin/virtual-server/procmail/$VIRTUALMIN
}
# deliver to the user's maildir
DEFAULT=$HOME/Maildir/
ORGMAIL=$HOME/Maildir/
DROPPRIVS=yes
EXITCODE=0
:0
$DEFAULT
# return code 75 (EX_TEMPFAIL) if the default delivery failed
EXITCODE=75
:0
/dev/null

the code above is complete and working and includes every fix/trick I've mentioned in this thread (such as not needlessly slowing down by spawning extra shells all over the place just to run /usr/bin/test to look at variable values, and the fact that one must manually set an EXITCODE at all times even when default delivery fails, because procmail doesn't do it, and postfix would then receive code 0 thinking the message was delivered).

you have my permission to use it, and i recommend that you include comments of some sort (similar to mine) so that system tweakers have an easier job ripping out the parts they don't want, like removing the overquota==1 check if using an alternative delivery system (cough cough).

the "what to do when user is over quota" virtualmin option should only affect the FIRST, quota-related EXITCODE. as for the latter one: if default delivery failed, we don't know what caused it and should always return code 75 so that it can be retried. that's what postfix wants us to do.
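
Put as a small sketch, the two exit code decisions look like this. The function name procmail_exit_code and the bounce_immediately flag (standing in for the proposed virtualmin option) are hypothetical; the point is only that the option affects the quota case and never the failed-delivery case.

```python
EX_OK = 0
EX_CANTCREAT = 73
EX_TEMPFAIL = 75


def procmail_exit_code(over_quota: bool, bounce_immediately: bool,
                       delivered: bool) -> int:
    """Exit code that /etc/procmailrc should end up returning.

    bounce_immediately stands in for the proposed
    "when user is over quota" option.
    """
    if over_quota:
        # FIRST exit code: follows the admin's choice
        return EX_CANTCREAT if bounce_immediately else EX_TEMPFAIL
    if not delivered:
        # second exit code: cause unknown, so always ask for a retry
        return EX_TEMPFAIL
    return EX_OK
```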

An option to select the exit code sounds like a good idea .. I will look into adding this for the next release.

excellent, and again, you have my permission to use the extensively tested procmail code above. it does everything correctly and as efficiently as possible. at the very least, you must stop spawning "/usr/bin/test" shells for your variable tests, and you must carefully watch and update the exitcode variable since it isn't automatically set by procmail in case of failure. my code takes care of all of that.

The next Virtualmin release will allow you to change the exit code from lookup-domain, so that mail is queued rather than bounced.

That is fantastic. So now users can choose whether to bounce or queue, and the lookup-domain returns a value even if the user is over quota. Awesome.

what's the status on the procmail script? that was the last remaining thing. Did you implement the more efficient variable checking I showed above? please get rid of the "/usr/bin/test" spawning ;) Note that every time you use that method, you are spawning a separate bash shell and piping the email contents to /usr/bin/test, just to check a variable! ;)

And I outlined some other big and small issues above, specifically that procmail won't return a failure error code on $DEFAULT delivery failure unless it's given a hand (explicitly setting EXITCODE). it only sets its own error code if execution falls through to $ORGMAIL and that also fails.

Also, when Virtualmin is told not to execute user's procmailrc files, it inserts "$DEFAULT", but that in itself won't avoid processing user's ~/.procmailrc files if $DEFAULT fails to deliver. To truly prevent ~/.procmailrc from running, one must deliver to $DEFAULT and if that fails, set an error code and deliver to /dev/null.

Basically just read over the whole script I've posted one or two posts up, and you'll see comments that explain it all.

I can't guarantee that I'm going to make all those suggested changes - for example, the overhead of running test seems pretty small.

Regarding delivery failures to $DEFAULT - in what cases could that happen (other than a mis-configuration, like a missing home dir) ?

"the overhead of running test seems pretty small." It's not. It's an order of magnitude slower than the built-in procmail regex engine used in my script. Did you not read it? It's literally a one line change to use the built-in procmail test engine.

As for $DEFAULT failing, it'd be if the folder is full, missing or has the wrong permissions. Rare indeed.

So I looked into why I did it that way, and it turned out that older procmail versions (such as the one on Solaris) don't support the ?? syntax. Even if running the /usr/bin/test command is an order of magnitude slower than doing it internally, this is going to be tiny compared to the amount of time spent on spam and virus scanning for each email. Even my slowest test system can run over 1000 test commands per second.

So it doesn't seem worth the risk of potential breakage to switch. Supporting both would just make Virtualmin more complex, and the code is already insanely complicated. In general we would prefer to keep the code simple and reliable, rather than trying to squeeze every last bit of performance out of it.

That said, you can safely edit /etc/procmailrc yourself to make your own changes or optimizations.

Ah so it's one of those things. And yeah I already have a very different procmail script.