Dovecot "Stale NFS file handle" errors

The following is an example of the errors several of our users have been encountering. So far they are all from the same site (same /home/agacademy/ folder location). Some users are not getting any errors, but it seems like more and more are coming up with problems.

Usermin shows messages in the inbox, but does not list them on the page. SquirrelMail gives an "ERROR: Connection dropped by IMAP server." error

The /var/logs/mail.log file shows the following errors at the time the inbox is opened in either SquirrelMail or Usermin:

Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): rename(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net, /home/agacademy/homes/jristy/Maildir/cur/1285958554.26604_0.mx0.tech-shield.net:2,) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle

We restarted Dovecot and there is no change to the error message.

What must we do to get these errors resolved?

Status: 
Active

Comments

Howdy -- so it sounds like you're using NFS there... is just your /home partition mounted via NFS?

Looking through the Dovecot docs, it looks like there's a handful of things you may want to try tweaking in your Dovecot config in order to get it to play nice with NFS.

A variety of possible setups are described here, along with the settings required to make them work properly:

http://wiki.dovecot.org/NFS

Thats the odd thing... we are not using NFS.

# mount
/dev/sda1 on / type ext3 (rw,grpquota,errors=remount-ro,usrquota)
tmpfs on /lib/init/rw type tmpfs (rw,nosuid,mode=0755)
proc on /proc type proc (rw,noexec,nosuid,nodev)
sysfs on /sys type sysfs (rw,noexec,nosuid,nodev)
procbususb on /proc/bus/usb type usbfs (rw)
udev on /dev type tmpfs (rw,mode=0755)
tmpfs on /dev/shm type tmpfs (rw,noexec,nosuid)
devpts on /dev/pts type devpts (rw,noexec,nosuid,gid=5,mode=620)
/var/tmpFS on /tmp type ext3 (rw,noexec,nosuid)

Hrm, we've never run into anything like that before :-)

Poking around on Google, the only instance I can find of what you're describing occurred when someone had filesystem corruption due to a problem with the hard drive. That person rebooted into single user mode, ran fsck on the disk, and the issue was resolved.

My suggestion would be to try that and see if that helps.

The only other option I can think of would be if a non-standard option was somehow enabled in the Dovecot config. For example, if mail_nfs_storage were set to "yes" when not using NFS, could could potentially cause strange issues. If running a fsck doesn't help, I might suggest reviewing your dovecot config to make sure nothing unusual is enabled.

Jamie something is going wrong here. The issue is that for some reason the owner is not being set correctly all the time. It looks like every so often the owner gets changed to someone else as if dovecot is getting confused.

He is a ls output of such issue

-rw------- 1 lvanyo.agacademy      agacademy   5956 2010-09-29 11:23 1285777387.27196_0.mx0.tech-shield.net:2,S
-rw------- 1 mmontgomery.agacademy agacademy  18116 2010-10-04 07:57 1285779226.32583_0.mx0.tech-shield.net:2,S
-rw------- 1 lvanyo.agacademy      agacademy   6227 2010-09-29 12:19 1285780789.5178_0.mx0.tech-shield.net:2,S
-rw------- 1 lvanyo.agacademy      agacademy   4047 2010-09-29 14:07 1285787272.23081_0.mx0.tech-shield.net:2,S
-rw------- 0 kkuehn.agacademy      agacademy   4039 2010-10-04 08:10 1285788735.27194_0.mx0.tech-shield.net:2,S
-rw------- 1 ngileslauer.agacademy agacademy  19207 2010-10-03 16:13 1285789949.30587_0.mx0.tech-shield.net:2,S

where the real user should be set to lvanyo.agacademy

Dovecot doesn't set ownership on mail files though - only procmail does.

Try searching for one of those files in /var/log/procmail.log , with a command like :

grep 1285789949.30587_0.mx0.tech-shield.net:2,S /var/log/procmail.log

OK I found one...

This was on the 4th

# grep "1285851404.28189" /var/log/procmail.log*
/var/log/procmail.log.5:  Folder: /home/agacademy/homes/rwessels/Maildir/new/1285851404.28189_    96980
/var/log/procmail.log.5:Time:1285851404 From:bmeyer@agacademy.com To:apowers@agacademy.com User:rwessels.agacademy Size:97032 Dest:/home/agacademy/homes/rwessels/Maildir/new/1285851404.28189_0.mx0.tech-shield.net Mode:None

That was set the wrong user....

-rw------- 0 lnelson.agacademy  agacademy   9089 2010-10-04 07:51 1285851404.28189_0.mx0.tech-shield.net:2,S

even though procmail set the user as rwessels.agacademy

Do the users rwessels.agacademy and lnelson.agacademy have the same UID?

Nope

# id rwessels.agacademy
uid=1202(rwessels.agacademy) gid=1075(agacademy) groups=1075(agacademy)

# id lnelson.agacademy
uid=1189(lnelson.agacademy) gid=1075(agacademy) groups=1075(agacademy)

If new email comes in, does it get the wrong ownership right away?

Can we have it be private and still give Scott G access :)

I think I can assign it to me and still set it private.

I am not sure if it does is set the wrong permission right away. But I think it does by looking at the original error.

Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): rename(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net, /home/agacademy/homes/jristy/Maildir/cur/1285958554.26604_0.mx0.tech-shield.net:2,) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle Oct 4 11:10:05 mx0 dovecot: IMAP(jristy.agacademy): open(/home/agacademy/homes/jristy/Maildir/new/1285958554.26604_0.mx0.tech-shield.net) failed: Stale NFS file handle

Any chance I could login to this system and see what is happening?

What I think I will do is start a tail on procmail and mail.log and start grepping for this and see if the error pops up.

sure.. ill set the support login for you

OK I set the support login up -- look for email related to mx0.tech-shield.net

This is quite odd - it looks like email is being moved around outside of Virtualmin or Dovecot. For example, these lines from the postfix log record the delivery of a message to ejohnson@agacademy.com :

Oct  3 02:51:21 mx0 postgrey[16608]: action=pass, reason=triplet found, delay=741, client_name=mta725.e.bordersstores.com, client_address=38.107.100.157, sender=bo-bxt0f74axsb41rau9rsexcbhh689a4@b.e.borders.com, recipient=ejohnson@agacademy.com
Oct  3 02:51:21 mx0 postfix/smtpd[24836]: 14B864C4216: client=mta725.e.bordersstores.com[38.107.100.157]
Oct  3 02:51:21 mx0 postfix/cleanup[24833]: 14B864C4216: message-id=<bxt0f74axsb41rau9rsexcbhh689a4.3935726848.9191@mta725.e.borders.com>
Oct  3 02:51:21 mx0 postfix/qmgr[5339]: 14B864C4216: from=<bo-bxt0f74axsb41rau9rsexcbhh689a4@b.e.borders.com>, size=12063, nrcpt=1 (queue active)
Oct  3 02:51:21 mx0 postfix/smtpd[24836]: disconnect from mta725.e.bordersstores.com[38.107.100.157]

According to procmail.log it was delivered to the file :

/home/agacademy/homes/ejohnson/Maildir/new/1286092284.24928_0.mx0.tech-shield.net

But the actual location of the file now is :

/home/agacademy/homes/kkromann/Maildir/cur/1285967444.19186_0.mx0.tech-shield.net:2,S

with the wrong permission ownership .. it is still owned by ejohnson.agacademy .

Has anyone been moving mail files around manually?

I haven't touched any thing

Has anyone edited that user or domain in any way? I see some entries in /var/webmin/miniserv.log by the user afsa

Not our fault if they broke it

My guess is either that some action in virtualmin triggered this (like editing a user or domain), or some other tool was used to move messages between mail directories.

The logs only show that afsa modified 2 mailboxes and non of those are the ones having this issue.

I am stumped -- could be from all the mail users that account has ? Looks like a 100 accounts or so.

I have even looked at the user ejohnson mail filters to see if anything odd was put in there. Nothing that remotely suggests why it would deliver to kkromann.

I don't think this is a filter issue - the mail file was clearly moved to a new location.

Well I don't know how that would even be possible unless they had root access and that isn't possible either since you have to be in the wheel group to use su or sudo and I don't think Dan would do this.

The owner of that domain could also do this.

That might be the way then cause I know that neither I or Dan would do this.

Soi know this couldnt be a VM issue ?

I can't see how it could be a virtualmin issue ... at least I've never seen this happen before.

There isn't even really code in virtualmin that could cause mail to be moved like this.

Weird... well set the ticket closed then

Ok .. the fix for now is to chown those files back to the correct owners.

Please let us know if you see this again.

It happened again -- this time it was a dovecot indexing file.

lvanyo had wrong permissions again.... and I had to rebuild the dovecot index

-rw------- 1 lvanyo.agacademy agacademy 26524 2010-10-01 13:06 dovecot.index.log
-rw------- 1 ngileslauer.agacademy agacademy 39113 2010-10-04 07:48 dovecot-uidlist

That's crazy!

Do you know roughly what time this happened at? If so, are there any entries in /var/webmin/miniserv.log from around that time?

Nothing... absolutely no reason in syslog or any log why or how this happened.

I greped everything from dovecot-uidlist to date to time and I couldn't find one single thing that caused this.

So this time I am 100% sure this is not a Virtualmin bug, as there is no place in the code that ever modifies dovecot-uidlist . Either you are seeing a Dovecot bug, or some other application on the system is moving files around.

I'm at a loss here -- this is the only server I am seeing this on. I am going to verify that Dan doesn't have anyone else that is using root access to do this. It's the only way this could happen and it sure as hell isn't me.

We can close this now for good and you can make it private again.