Submitted by FL- on Wed, 09/09/2020 - 13:35
I'm not sure if this is exactly the same as https://www.virtualmin.com/node/2941. It doesn't seem to be directly related to a timeout, but a program failure is causing the same result: mail being lost.
In this case, an unknown issue caused clam or the wrapper to be killed. Even so, the filter shouldn't default to marking the message as a virus and sending it to /dev/null. It should perhaps quarantine it instead, especially since there's no bounce, no warning (other than in procmail.log after the fact) and no way to recover the mail. For now I've disabled virus scanning as a workaround.
from procmail.log:
procmail: Program failure (-15) of "/etc/webmin/virtual-server/clam-wrapper.pl"
Time:* From:* To:* User:* Size:* Dest:/dev/null Mode:Virus
sh: line 1: 1203 Killed
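Since the only trace of a dropped message is in procmail.log, a short script can list every virus-mode delivery to /dev/null that follows a filter program failure. This is a minimal sketch, assuming the whitespace-separated Key:value fields shown in the excerpt above (the real Time field may be formatted differently) and assuming a program failure applies to the next logged delivery only.

```python
import re
from typing import Iterable

# procmail's report of a filter that exited abnormally, e.g.
#   procmail: Program failure (-15) of "/etc/webmin/virtual-server/clam-wrapper.pl"
FAILURE_RE = re.compile(r'Program failure \((-?\d+)\) of "([^"]+)"')


def parse_fields(line: str) -> dict:
    """Split a 'Key:value Key:value ...' delivery line into a dict."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition(":")
        if sep:
            fields[key] = value
    return fields


def find_dropped(lines: Iterable[str]) -> list:
    """Return (failure_code, program, fields) for each Mode:Virus delivery to
    /dev/null whose delivery attempt was preceded by a program failure."""
    dropped = []
    pending = None  # (code, program) of the last unconsumed program failure
    for line in lines:
        match = FAILURE_RE.search(line)
        if match:
            pending = (int(match.group(1)), match.group(2))
            continue
        fields = parse_fields(line)
        if "Dest" not in fields:
            continue  # not a delivery line (e.g. "sh: line 1: ... Killed")
        if fields["Dest"] == "/dev/null" and fields.get("Mode") == "Virus" and pending:
            dropped.append((*pending, fields))
        pending = None  # each failure is attributed to the next delivery only
    return dropped
```

It could be run as `find_dropped(open("/var/log/procmail.log"))`, assuming that is where the log lives on the system in question. A Mode:Virus line with no preceding failure is left out, since that is presumably a genuine detection.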
Status: Active
Comments
Submitted by JamieCameron on Wed, 09/09/2020 - 19:06 Comment #1
That's odd, as the reason we have that clam-wrapper.pl script in the config is to prevent exactly this failure mode. Is there anything in the logs indicating why it is failing, or why the clamdscan command that it runs is failing?
Submitted by FL- on Wed, 09/09/2020 - 20:30 Comment #2
The system ran out of memory, and the oom-killer ran and killed clam. I had the dashboard open at the time, and the load average jumped to over 50 before it stopped responding. From the logs, it looks like clam grew to several GB and consumed all the available memory. It isn't a low-memory install: there's swap, and it's a lightly used testing system. So far I haven't found any reasons in the logs, or really anything abnormal, besides running out of memory.
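To confirm which process the OOM killer actually chose (clam itself or the wrapper), the kernel's own report in the system log can be searched. A minimal sketch, assuming the usual kernel wording `Killed process <pid> (<name>)`; older kernels first log `Kill process ... or sacrifice child` and then a separate `Killed process` line, and only the latter is matched so each kill is counted once.

```python
import re
from typing import Iterable

# Kernel OOM report, e.g. "Out of memory: Killed process 4242 (clamd) total-vm:..."
OOM_RE = re.compile(r"Killed process (\d+) \(([^)]+)\)")


def oom_victims(lines: Iterable[str]) -> list:
    """Return (pid, command) for every process the OOM killer reports killing."""
    victims = []
    for line in lines:
        match = OOM_RE.search(line)
        if match:
            victims.append((int(match.group(1)), match.group(2)))
    return victims
```

For example, `oom_victims(open("/var/log/messages"))` on a system that logs kernel messages there.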
Submitted by JamieCameron on Fri, 09/11/2020 - 17:55 Comment #3
Wow - was there a particular email that triggered this?
The OOM killer just kills the largest process, which may not be the one that caused the lack of RAM if something else caused hundreds of other small processes to be spawned.
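More precisely, the kernel picks the process with the highest badness score, which it exposes as /proc/<pid>/oom_score (higher means more likely to be killed). On current kernels the score is driven mainly by memory use and can be shifted through oom_score_adj. A minimal sketch, assuming a Linux host, for ranking processes by that score while the scanner is running:

```python
import os


def oom_ranking(limit: int = 5) -> list:
    """Return up to `limit` (oom_score, pid, comm) tuples, highest score first."""
    if not os.path.isdir("/proc"):
        return []  # not Linux, or /proc not mounted
    ranking = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(f"/proc/{entry}/oom_score") as f:
                score = int(f.read().strip())
            with open(f"/proc/{entry}/comm") as f:
                comm = f.read().strip()
        except (OSError, ValueError):
            continue  # process exited meanwhile, or its files are unreadable
        ranking.append((score, int(entry), comm))
    ranking.sort(reverse=True)
    return ranking[:limit]
```

Printing `oom_ranking()` while a message is being scanned shows which process would be chosen first if memory ran out at that moment.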
Submitted by FL- on Sun, 09/13/2020 - 19:19 Comment #4
I went through the rest of the old procmail logs and found it had happened once before. Checking the corresponding messages log, both times oom-killer killed clam. One of the messages was a short message from a colleague and the other was an order confirmation from a major retailer, so neither was large or atypical, and neither should have set off clam. The process list dumped in the logs after running out of memory shows only the expected processes: not hundreds of anything, and nothing else using significant memory.
Submitted by JamieCameron on Sun, 09/13/2020 - 19:46 Comment #5
Looking at the original message, it looks like the script clam-wrapper.pl was killed, which is unusual as it's a very small Perl script that exists only to exit cleanly if the clamscan command dies. Could something else have killed it, other than a lack of memory?
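The behavior described here can be sketched in Python. This is a hypothetical equivalent, not the actual clam-wrapper.pl: it assumes the wrapper passes the scanner's exit status through unchanged when the scanner finishes normally, and the scanner command line is supplied by the caller.

```python
import subprocess


def run_scanner(cmd: list) -> int:
    """Run the virus scanner and return the status the filter should report.

    A normal exit status is passed through unchanged. If the scanner is
    killed by a signal (Python reports this as a negative returncode),
    report 0 so the message is treated as clean rather than discarded.
    """
    proc = subprocess.run(cmd)  # stdin (the message) is inherited
    if proc.returncode < 0:
        return 0
    return proc.returncode
```

Because the substitution of 0 happens inside the wrapper, it protects only against the scanner dying; it cannot help if the wrapper process itself is killed.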
Submitted by FL- on Sat, 09/19/2020 - 15:42 Comment #6
No, I don't think something else is killing clam. The system is a stock Virtualmin install. I wanted to wait a few days before responding to see if anything else happened, but with virus scanning disabled the system hasn't run out of memory. There have been no changes to the system other than disabling virus scanning, and no changes to usage.
I've done some testing and I think I understand what's happening just not the why yet:
When clam itself gets terminated, clam-wrapper.pl works as intended and sends an exit code of 0 to procmail. When clam-wrapper.pl itself is terminated, an exit code of 143 or 137 (from SIGTERM or SIGKILL respectively) is passed to procmail, and the procmail recipe interprets any non-zero return as a virus found, then sends the message to /dev/null.
Submitted by JamieCameron on Sat, 09/19/2020 - 19:08 Comment #7
You're right that if clam-wrapper.pl is terminated, procmail will drop the message. What I'm wondering is why it would be terminated when it's a fairly small process that doesn't use much RAM.
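The failure mode from the last two comments can be reproduced with a throwaway child process. A POSIX shell reports a process killed by signal N as exit status 128 + N, so SIGTERM (15) gives 143 and SIGKILL (9) gives 137, matching the codes observed above. The `classify` function is a hypothetical fail-safe policy in the spirit of the original report, not Virtualmin's procmail recipe: it assumes the filter follows clamdscan's convention of 1 for "virus found" and treats every other non-zero status as a reason to quarantine rather than discard.

```python
import signal
import subprocess


def shell_status(returncode: int) -> int:
    """Map a subprocess returncode to the exit status a POSIX shell reports.

    Python reports death by signal N as -N; the shell reports 128 + N.
    """
    return 128 - returncode if returncode < 0 else returncode


def classify(status: int) -> str:
    """Hypothetical fail-safe policy: only an explicit 'virus found' status
    discards mail; anything else unexpected is quarantined."""
    if status == 0:
        return "deliver"
    if status == 1:
        return "discard"  # scanner reported a virus
    return "quarantine"  # e.g. 2 (scanner error) or 128 + N (killed by signal N)


# Simulate a filter killed by SIGTERM and by SIGKILL (the shell kills itself)
for sig in (signal.SIGTERM, signal.SIGKILL):
    proc = subprocess.run(["sh", "-c", f"kill -{sig.name[3:]} $$"])
    status = shell_status(proc.returncode)
    print(sig.name, status, classify(status))
# prints:
# SIGTERM 143 quarantine
# SIGKILL 137 quarantine
```

With a recipe that treats any non-zero status as a virus, both simulated kills would be discarded; under this sketch they are quarantined and remain recoverable.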