The following is an example of the errors several of our users have been encountering. So far they are all from the same site (same /home/agacademy/ folder location). Some users are not getting any errors, but it seems like more and more are coming up with problems.
Usermin shows messages in the inbox, but does not list them on the page. SquirrelMail gives an "ERROR: Connection dropped by IMAP server." error
The /var/logs/mail.log file shows the following errors at the time the inbox is opened in either SquirrelMail or Usermin:
Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): rename(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net, /home/agacademy/homes/jristy/Maildir/cur/1285958554.26604_0.mx0.tech-shield.net:2,) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle
We restarted Dovecot and there is no change to the error message.
What must we do to get these errors resolved?
Comments
Submitted by andreychek on Mon, 10/04/2010 - 12:45 Comment #1
Howdy -- so it sounds like you're using NFS there... is just your /home partition mounted via NFS?
Looking through the Dovecot docs, it looks like there's a handful of things you may want to try tweaking in your Dovecot config in order to get it to play nice with NFS.
A variety of possible setups are described here, along with the settings required to make them work properly:
http://wiki.dovecot.org/NFS
Submitted by dkuhlman on Tue, 10/05/2010 - 13:36 Comment #2
Thats the odd thing... we are not using NFS.
# mount
/dev/sda1 on / type ext3 (rw,grpquota,errors=remount-ro,usrquota)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,noexec,nosuid)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/var/tmpFS on /tmp type ext3 (rw,noexec,nosuid)
Submitted by andreychek on Tue, 10/05/2010 - 14:09 Comment #3
Hrm, we've never run into anything like that before :-)
Poking around on Google, the only instance I can find of what you're describing occurred when someone had filesystem corruption due to a problem with the hard drive. That person rebooted into single user mode, ran fsck on the disk, and the issue was resolved.
My suggestion would be to try that and see if that helps.
The only other option I can think of would be if a non-standard option was somehow enabled in the Dovecot config. For example, if mail_nfs_storage were set to "yes" when not using NFS, could could potentially cause strange issues. If running a fsck doesn't help, I might suggest reviewing your dovecot config to make sure nothing unusual is enabled.
Submitted by sgrayban on Tue, 10/05/2010 - 15:42 Comment #4
Jamie something is going wrong here. The issue is that for some reason the owner is not being set correctly all the time. It looks like every so often the owner gets changed to someone else as if dovecot is getting confused.
Submitted by sgrayban on Tue, 10/05/2010 - 15:44 Comment #5
He is a ls output of such issue
-rw------- 1 lvanyo.agacademy agacademy 5956 2010-09-29 11:23 1285777387.27196_0.mx0.tech-shield.net:2,S
-rw------- 1 mmontgomery.agacademy agacademy 18116 2010-10-04 07:57 1285779226.32583_0.mx0.tech-shield.net:2,S
-rw------- 1 lvanyo.agacademy agacademy 6227 2010-09-29 12:19 1285780789.5178_0.mx0.tech-shield.net:2,S
-rw------- 1 lvanyo.agacademy agacademy 4047 2010-09-29 14:07 1285787272.23081_0.mx0.tech-shield.net:2,S
-rw------- 0 kkuehn.agacademy agacademy 4039 2010-10-04 08:10 1285788735.27194_0.mx0.tech-shield.net:2,S
-rw------- 1 ngileslauer.agacademy agacademy 19207 2010-10-03 16:13 1285789949.30587_0.mx0.tech-shield.net:2,S
where the real user should be set to lvanyo.agacademy
Submitted by JamieCameron on Tue, 10/05/2010 - 15:53 Comment #6
Dovecot doesn't set ownership on mail files though - only procmail does.
Try searching for one of those files in
/var/log/procmail.log
, with a command like :grep 1285789949.30587_0.mx0.tech-shield.net:2,S /var/log/procmail.log
Submitted by sgrayban on Tue, 10/05/2010 - 16:00 Comment #7
OK I found one...
This was on the 4th
# grep "1285851404.28189" /var/log/procmail.log*
/var/log/procmail.log.5: Folder: /home/agacademy/homes/rwessels/Maildir/new/1285851404.28189_ 96980
/var/log/procmail.log.5:Time:1285851404 From:bmeyer@agacademy.com To:apowers@agacademy.com User:rwessels.agacademy Size:97032 Dest:/home/agacademy/homes/rwessels/Maildir/new/1285851404.28189_0.mx0.tech-shield.net Mode:None
That was set the wrong user....
-rw------- 0 lnelson.agacademy agacademy 9089 2010-10-04 07:51 1285851404.28189_0.mx0.tech-shield.net:2,S
even though procmail set the user as rwessels.agacademy
Submitted by JamieCameron on Tue, 10/05/2010 - 16:07 Comment #8
Do the users
rwessels.agacademy
andlnelson.agacademy
have the same UID?Submitted by sgrayban on Tue, 10/05/2010 - 16:08 Comment #9
Nope
# id rwessels.agacademy
uid=1202(rwessels.agacademy) gid=1075(agacademy) groups=1075(agacademy)
# id lnelson.agacademy
uid=1189(lnelson.agacademy) gid=1075(agacademy) groups=1075(agacademy)
Submitted by JamieCameron on Tue, 10/05/2010 - 16:22 Comment #10
If new email comes in, does it get the wrong ownership right away?
Submitted by dkuhlman on Tue, 10/05/2010 - 16:23 Comment #11
Can we mark this private?
Submitted by JamieCameron on Tue, 10/05/2010 - 16:26 Comment #12
Marked as private.
Submitted by dkuhlman on Tue, 10/05/2010 - 16:29 Comment #13
Can we have it be private and still give Scott G access :)
Submitted by JamieCameron on Tue, 10/05/2010 - 16:37 Comment #14
Sorry, no ..
Submitted by sgrayban on Tue, 10/05/2010 - 16:38 Comment #15
I think I can assign it to me and still set it private.
I am not sure if it does is set the wrong permission right away. But I think it does by looking at the original error.
Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): rename(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net, /home/agacademy/homes/jristy/Maildir/cur/1285958554.26604_0.mx0.tech-shield.net:2,) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle
Submitted by JamieCameron on Tue, 10/05/2010 - 16:39 Comment #16
Any chance I could login to this system and see what is happening?
Submitted by sgrayban on Tue, 10/05/2010 - 16:40 Comment #17
What I think I will do is start a tail on procmail and mail.log and start grepping for this and see if the error pops up.
Submitted by sgrayban on Tue, 10/05/2010 - 16:40 Comment #18
sure.. ill set the support login for you
Submitted by sgrayban on Tue, 10/05/2010 - 16:42 Comment #19
OK I set the support login up -- look for email related to mx0.tech-shield.net
Submitted by JamieCameron on Tue, 10/05/2010 - 19:07 Comment #20
This is quite odd - it looks like email is being moved around outside of Virtualmin or Dovecot. For example, these lines from the postfix log record the delivery of a message to ejohnson@agacademy.com :
Oct 3 02:51:21 mx0 postgrey[16608]: action=pass, reason=triplet found, delay=741, client_name=mta725.e.bordersstores.com, client_address=38.107.100.157, sender=bo-bxt0f74axsb41rau9rsexcbhh689a4@b.e.borders.com, recipient=ejohnson@agacademy.com
Oct 3 02:51:21 mx0 postfix/smtpd[24836]: 14B864C4216: client=mta725.e.bordersstores.com[38.107.100.157]
Oct 3 02:51:21 mx0 postfix/cleanup[24833]: 14B864C4216: message-id=<bxt0f74axsb41rau9rsexcbhh689a4.3935726848.9191@mta725.e.borders.com>
Oct 3 02:51:21 mx0 postfix/qmgr[5339]: 14B864C4216: from=<bo-bxt0f74axsb41rau9rsexcbhh689a4@b.e.borders.com>, size=12063, nrcpt=1 (queue active)
Oct 3 02:51:21 mx0 postfix/smtpd[24836]: disconnect from mta725.e.bordersstores.com[38.107.100.157]
According to procmail.log it was delivered to the file :
/home/agacademy/homes/ejohnson/Maildir/new/1286092284.24928_0.mx0.tech-shield.net
But the actual location of the file now is :
/home/agacademy/homes/kkromann/Maildir/cur/1285967444.19186_0.mx0.tech-shield.net:2,S
with the wrong permission ownership .. it is still owned by
ejohnson.agacademy
.Has anyone been moving mail files around manually?
Submitted by sgrayban on Tue, 10/05/2010 - 19:16 Comment #21
I haven't touched any thing
Submitted by JamieCameron on Tue, 10/05/2010 - 19:31 Comment #22
Has anyone edited that user or domain in any way? I see some entries in
/var/webmin/miniserv.log
by the userafsa
Submitted by sgrayban on Tue, 10/05/2010 - 19:36 Comment #23
Not our fault if they broke it
Submitted by JamieCameron on Tue, 10/05/2010 - 19:38 Comment #24
My guess is either that some action in virtualmin triggered this (like editing a user or domain), or some other tool was used to move messages between mail directories.
Submitted by sgrayban on Tue, 10/05/2010 - 20:54 Comment #25
The logs only show that afsa modified 2 mailboxes and non of those are the ones having this issue.
I am stumped -- could be from all the mail users that account has ? Looks like a 100 accounts or so.
Submitted by sgrayban on Tue, 10/05/2010 - 21:00 Comment #26
I have even looked at the user ejohnson mail filters to see if anything odd was put in there. Nothing that remotely suggests why it would deliver to kkromann.
Submitted by JamieCameron on Tue, 10/05/2010 - 23:12 Comment #27
I don't think this is a filter issue - the mail file was clearly moved to a new location.
Submitted by sgrayban on Tue, 10/05/2010 - 23:16 Comment #28
Well I don't know how that would even be possible unless they had root access and that isn't possible either since you have to be in the wheel group to use su or sudo and I don't think Dan would do this.
Submitted by JamieCameron on Tue, 10/05/2010 - 23:20 Comment #29
The owner of that domain could also do this.
Submitted by sgrayban on Wed, 10/06/2010 - 00:24 Comment #30
That might be the way then cause I know that neither I or Dan would do this.
Soi know this couldnt be a VM issue ?
Submitted by JamieCameron on Wed, 10/06/2010 - 00:28 Comment #31
I can't see how it could be a virtualmin issue ... at least I've never seen this happen before.
There isn't even really code in virtualmin that could cause mail to be moved like this.
Submitted by sgrayban on Wed, 10/06/2010 - 00:42 Comment #32
Weird... well set the ticket closed then
Submitted by JamieCameron on Wed, 10/06/2010 - 00:54 Comment #33
Ok .. the fix for now is to chown those files back to the correct owners.
Please let us know if you see this again.
Submitted by sgrayban on Thu, 10/07/2010 - 18:17 Comment #34
It happened again -- this time it was a dovecot indexing file.
lvanyo had wrong permissions again.... and I had to rebuild the dovecot index
-rw------- 1 lvanyo.agacademy agacademy 26524 2010-10-01 13:06 dovecot.index.log
-rw------- 1 ngileslauer.agacademy agacademy 39113 2010-10-04 07:48 dovecot-uidlist
Submitted by JamieCameron on Thu, 10/07/2010 - 18:29 Comment #35
That's crazy!
Do you know roughly what time this happened at? If so, are there any entries in
/var/webmin/miniserv.log
from around that time?Submitted by sgrayban on Thu, 10/07/2010 - 18:37 Comment #36
Nothing... absolutely no reason in syslog or any log why or how this happened.
Submitted by sgrayban on Thu, 10/07/2010 - 18:55 Comment #37
I greped everything from dovecot-uidlist to date to time and I couldn't find one single thing that caused this.
Submitted by JamieCameron on Thu, 10/07/2010 - 19:23 Comment #38
So this time I am 100% sure this is not a Virtualmin bug, as there is no place in the code that ever modifies
dovecot-uidlist
. Either you are seeing a Dovecot bug, or some other application on the system is moving files around.Submitted by sgrayban on Thu, 10/07/2010 - 19:29 Comment #39
I'm at a loss here -- this is the only server I am seeing this on. I am going to verify that Dan doesn't have anyone else that is using root access to do this. It's the only way this could happen and it sure as hell isn't me.
We can close this now for good and you can make it private again.