Hi everyone!
Last night one of my domains started to "vanish" new mails.
The maillog shows that spamassassin thinks that disk quota exceeded. (Which is weird because I've checked total disk quota, administrator quota, I've checked the whole filesystem with find / -user USERNAME
du -hs /home/DOMAIN shows 2.7GB data, 10GB disk quota)
This is however not the important part because I've turned off the quota (unlimited) and the mails started to flow BUT...
Where are those mails which arrived during the period where quota was an issue? maillog shows it is delivered but it's not the case (sorry for the long log excerpt but maybe someone clever finds the answer in it) (i've replaced the actual username and the domain with USERNAME and DOMAIN for privacy protection)
Apr 30 06:53:20 vm postfix/qmgr[2158]: 3DB951A3027: from=<illetekigazolas@allamkincstar.gov.hu>, size=50563, nrcpt=1 (queue active)
Apr 30 06:53:20 vm postfix/smtpd[447]: disconnect from mail.allamkincstar.gov.hu[84.206.45.194]
Apr 30 06:53:20 vm spamd[13186]: spamd: connection from localhost [127.0.0.1] at port 33580
Apr 30 06:53:20 vm spamd[13186]: spamd: setuid to USERNAME.DOMAIN succeeded
Apr 30 06:53:20 vm spamd[13186]: spamd: processing message <201304300453.r3U4rKD8027789@nlcrypto> for USERNAME.DOMAIN:547
Apr 30 06:53:20 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:21 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:22 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:23 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:24 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:25 vm spamd[13186]: bayes: partial write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal (1400 of 3624), recovering
Apr 30 06:53:25 vm spamd[13186]: bayes: cannot write to bayes journal /home/DOMAIN/homes/USERNAME/.spamassassin/bayes_journal, aborting!
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting eval via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting eval via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Exiting subroutine via last at /usr/share/perl5/Mail/SpamAssassin/BayesStore/DBM.pm line 1156.
Apr 30 06:53:25 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: Disk quota exceeded at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixN
FSSafe.pm line 88.
Apr 30 06:53:27 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:27 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:29 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:30 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:30 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:32 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:33 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:34 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:35 vm spamd[13186]: Error writing to /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.p
m line 88.
Apr 30 06:53:35 vm spamd[13186]: plugin: eval failed: bayes: (in learn) error closing /home/DOMAIN/homes/USERNAME/.spamassassin/bayes.lock.vm.rewaresoft.hu.13186: Disk quota exceeded at /usr/share/
perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 118.
Apr 30 06:53:35 vm spamd[13186]: spamd: clean message (0.0/5.0) for USERNAME.DOMAIN:547 in 14.4 seconds, 50111 bytes.
Apr 30 06:53:35 vm spamd[13186]: spamd: result: . 0 - scantime=14.4,size=50111,user=USERNAME.DOMAIN,uid=547,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=33580,mid=<201304300453.r3U4rKD8
027789@nlcrypto>,autolearn=unavailable
Apr 30 06:53:35 vm postfix/local[452]: 3DB951A3027: to=<USERNAME.DOMAIN@vm.rewaresoft.hu>, orig_to=<USERNAME@DOMAIN>, relay=local, delay=15, delays=0.06/0.01/0/15, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
Apr 30 06:53:35 vm postfix/qmgr[2158]: 3DB951A3027: removed
So it "seems" that despite the quota problem in spamassassin the mail is delivered to the procmail-wrapper but where did it go? It's not in the users maildir. (checked in every subfolder, including spam, trash..etc.) it's not in /var/spool/postfix (i've checked all the queues like defer,deferred,active..etc.) it's not rejected so the sender mail server thinks it's delivered (so it won't try again) ....is it really just vanished?
The domain in question is a law office and there were time sensitive payment receipt mails from online government registration pages and if it's gone...well...I think a couple of heads will be falling...including mine :(
If you need any further information, log..etc., please let me know.
I'm getting quite desperate here, I appreciate any help!
Thanks, Andrew
Comments
Submitted by andreychek on Tue, 04/30/2013 - 07:50 Comment #1
Howdy -- when you see this line here:
to=<USERNAME.DOMAIN@vm.rewaresoft.hu>, orig_to=<USERNAME@DOMAIN>, relay=local, delay=15, delays=0.06/0.01/0/15, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
That means it was delivered to procmail.
The next step is to see what procmail did with it.
You can determine that by looking in /var/log/procmail.log. The procmail logs will show what exactly procmail did with an email -- whether it delivered it to a particular directory, or whether it was bounced.
Submitted by tothand on Tue, 04/30/2013 - 08:54 Comment #2
Hi Andreychek,
Thank you for the quick reply.
I can see the mail in the procmail log like this:
Time:1367297196 From:illetekigazolas@allamkincstar.gov.hu To:USERNAME@DOMAIN User:USERNAME.DOMAIN Size:50418 Dest:/etc/webmin/virtual-server/clam-wrapper.pl /usr/bin/clamdscan Mode:None procmail: Quota exceeded while writing "/home/DOMAIN/homes/USERNAME/Maildir/tmp/1367297615.453_0.vm.rewaresoft.hu" procmail: Quota exceeded while writing "/home/DOMAIN/homes/USERNAME/Maildir/tmp/1367297615.453_1.vm.rewaresoft.hu" From illetekigazolas@allamkincstar.gov.hu Tue Apr 30 06:53:20 2013 Subject: =?ISO-8859-2?Q?Illet=E9kbefizet=E9si_igazol=E1s?= Folder: Bounced 50224
What is the Bounced folder in procmail terms? Is that means that procmail bounced it and then postfix delivered the standard bounce message to the sender?
Thanks, Andrew
Submitted by andreychek on Tue, 04/30/2013 - 09:27 Comment #3
Yeah I believe the recipient would have received a Bounce message of some sort.
In the procmail logs, shortly after this:
Apr 30 06:53:35 vm postfix/local[452]: 3DB951A3027: to=<USERNAME.DOMAIN@vm.rewaresoft.hu>, orig_to=<USERNAME@DOMAIN>, relay=local, delay=15, delays=0.06/0.01/0/15, dsn=2.0.0, status=sent
(delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
Apr 30 06:53:35 vm postfix/qmgr[2158]: 3DB951A3027: removed
Do you see an outgoing message to the person who sent the above message?
Submitted by tothand on Thu, 05/02/2013 - 00:57 Comment #4
No sign of any outgoing mails to those addresses where the messages in question came from. I've checked the procmail.log and the maillog also and I've checked multiple mails where I saw the "Disk quota exceeded" warning.
I might be on a completely wrong track (I'm not a seasoned procmail user), but there is this in the procmail config:
:0
* ?/usr/bin/test "$EXITCODE" = "73"
/dev/null
EXITCODE=0
and 73 means: Can't create (user) output file, according to sysexits.h
[root@vm log]# grep 73 /usr/include/sysexits.h
#define EX_CANTCREAT 73 /* can't create (user) output file */
From my point of view this means that procmail(-wrapper) will return with zero exit code (to postfix) in case it can't create the user output file so postfix thinks everything is fine although it isn't.
I've found a mini-faq about procmail () and it says: (The mechanism when you use this from Procmail is that Procmail quits with the specified error code, and the calling program is expected to catch that and act accordingly, i.e. generate a bounce message in most situations. If you are calling Procmail from a program which doesn't do this, there will obviously be no bounce. So if you test Procmail from your shell prompt and you see Bounced or Requeued in the log, it only really means Procmail quit with a non-zero exit code.)
Could it be the reason why the bounce message hasn't been sent?
Submitted by andreychek on Thu, 05/02/2013 - 07:59 Comment #5
Do you see any of those messages in the mail queue? It's not the default, but it's possible to configure Virtualmin to queue them rather than bounce them, when the quota is full.
You can determine what messages are in the mail queue by running "mailq".
Submitted by tothand on Mon, 05/06/2013 - 03:27 Comment #6
Thanks, but I couldn't find the messages in the mail queue.
Where can I find the settings for this behavior? (what should happen with the mails when quota is full)
Submitted by tothand on Sun, 12/22/2013 - 05:46 Comment #7
Hi,
It's been more than 7 months since I wrote and I was very careful with quotas but it happened again when /home ran out of disk space for a few hours during the night (yeah I know, silly me for letting it but that's another story)
The flow is the same: - mail comes in - postfix hands it to spamassassin - SA can't write the bayes file but returns with "not spam" - postfix hands the mail to procmail-wrapper - procmail can't deliver (no space left) instead puts it in Bounced, but the bounce mail never sent to the sender and the mail is lost forever with no sign of trouble (except the user who knows that someone sent him a letter but it never arrived)
Is there any way configure procmail to send the bounce mail in case of "no space left"?
Thanks, Andrew
Submitted by JamieCameron on Sun, 12/22/2013 - 17:11 Comment #8
So was the system out of disk space, or did the user exceed his quota? Virtualmin should handle the latter (by detecting the over quota situation at delivery time), but it doesn't yet detect if the whole filesystem is out of space.
Submitted by tothand on Sun, 12/22/2013 - 17:25 Comment #9
Sorry I wasn't specific enough.
/home ran out of space but / had plenty so I thought that the mails were sitting in the mail queue until the mailbox delivery is possible or it will send back a bounce message.
I understand that if / ran out of space then postfix can't write the spool files but in case the user ran out of space (either because of quota or /home is full) I expect postfix/procmail to handle this glitch.
As you can see at the beginning of the thread I had an issue with the quota but now I had the /home problem, the result is the same: mails lost without a trace.
Submitted by JamieCameron on Sun, 12/22/2013 - 23:54 Comment #10
Ok, I see ... it's possible that lack of space on /home could cause a problem.
Can you post what got logged to
/var/log/maillog
and/var/log/procmail.log
when email was received and bounced like this?Submitted by tothand on Mon, 12/23/2013 - 03:26 Comment #11
Maillog:
Dec 17 09:03:40 server postfix/qmgr[2036]: 872A01A2AD5: from=<SENDER@gmail.com>, size=1720, nrcpt=1 (queue active)
Dec 17 09:03:40 server postfix/smtpd[18062]: disconnect from mail-ie0-f173.google.com[209.85.223.173]
Dec 17 09:03:41 server spamd[17597]: spamd: connection from localhost [127.0.0.1] at port 51486
Dec 17 09:03:41 server spamd[17597]: spamd: setuid to USER@DOMAIN.hu succeeded
Dec 17 09:03:41 server spamd[17597]: spamd: processing message <CAO9apamOAvyoGpL-NMinj0D-EWmFm351oKWU5Xb_bEdUeeGFJQ@mail.gmail.com> for USER@DOMAIN.hu:687
Dec 17 09:03:45 server spamd[17597]: Error writing to /home/hungarogomb/homes/USER/.spamassassin/bayes.lock.server.DOMAIN.hu.17597: No space left on device at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 88.
Dec 17 09:03:46 server spamd[17597]: Error writing to /home/hungarogomb/homes/USER/.spamassassin/bayes.lock.server.DOMAIN.hu.17597: File exists at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 88.
Dec 17 09:03:46 server spamd[17597]: plugin: eval failed: bayes: (in learn) error closing /home/hungarogomb/homes/USER/.spamassassin/bayes.lock.server.DOMAIN.hu.17597: No space left on device at /usr/share/perl5/Mail/SpamAssassin/Locker/UnixNFSSafe.pm line 118.
Dec 17 09:03:46 server spamd[17597]: spamd: clean message (-2.7/5.0) for USER@DOMAIN.hu:687 in 5.8 seconds, 1852 bytes.
Dec 17 09:03:46 server spamd[17597]: spamd: result: . -2 - BAYES_00,DKIM_SIGNED,DKIM_VALID,DKIM_VALID_AU,FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_LOW scantime=5.8,size=1852,user=USER@DOMAIN.hu,uid=687,required_score=5.0,rhost=localhost,raddr=127.0.0.1,rport=51486,mid=<CAO9apamOAvyoGpL-NMinj0D-EWmFm351oKWU5Xb_bEdUeeGFJQ@mail.gmail.com>,bayes=0.000000,autolearn=unavailable
Dec 17 09:03:46 server postfix/local[18138]: 872A01A2AD5: to=<USER-DOMAIN.hu@server.DOMAIN.hu>, orig_to=<USER@DOMAIN.hu>, relay=local, delay=6.6, delays=0.4/0/0/6.2, dsn=2.0.0, status=sent (delivered to command: /usr/bin/procmail-wrapper -o -a $DOMAIN -d $LOGNAME)
Dec 17 09:03:46 server postfix/qmgr[2036]: 872A01A2AD5: removed
The procmail log has been rotated since, so I don't have them, but I remember that it had a line: Folder: * * Bounced * *
And that's the last sign of that message. No outgoing mail which suggest that it's a bounce message (and the sender never got an error message).
Sorry that I can't be more helpful with the procmail log, but look back in the thread it's almost the same except it's not the quota that exceeded but the /home ran out of space.
Submitted by JamieCameron on Tue, 12/24/2013 - 01:49 Comment #12
Interesting, it looks like spamd actually worked fine despite the lack of disk space. I guess the real failure was at the procmail level, but it's hard to tell without the logs..