Lock files broken with Debian 5.0

Since our last system upgrade, the *.lock files aren't getting unlinked correctly, leaving stale lock files behind.

Something is going wrong, because it's not just one server I work on; it's all of the Debian servers.

Status: Active

Comments

Fortunately, those .lock files contain a PID, so even if they are left around after the locking process dies, they will be ignored by other processes trying to take the lock.
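To illustrate the idea of a PID-bearing lock file, here is a minimal Python sketch of how a process might decide whether an existing lock is still held. This is not Webmin's own code (which is Perl); the function name `lock_holder_alive` is hypothetical, and treating an empty or non-numeric lock file as "not held" is an assumption made for this example.

```python
import os


def lock_holder_alive(lock_path):
    """Return True if the PID stored in lock_path belongs to a running process."""
    try:
        with open(lock_path) as handle:
            text = handle.read().strip()
    except FileNotFoundError:
        return False  # no lock file, so nothing holds the lock

    if not text.isdigit():
        return False  # assumption: an empty or garbled lock file is treated as stale

    pid = int(text)
    if pid <= 0:
        return False  # PID 0 would address a whole process group, not one process

    try:
        os.kill(pid, 0)  # signal 0 only checks that the process exists
    except ProcessLookupError:
        return False  # no such process: the lock is stale
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True
```

A caller would only wait on the lock when `lock_holder_alive(path)` returns True, and could otherwise remove the stale file and take the lock itself.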

However, they should get cleaned up automatically. Was this upgrade a full upgrade to Debian 5, or just an update of some packages?

Just an update of some packages, and the lock files had no PID or any other info in them. I'll test this again, though, because one of my clients that uses CM was also having this issue.

Was Perl among the packages updated?

One thing to check for is a lack of disk space on the root filesystem. This might allow .lock files to be created, but only as empty files.
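A quick way to check both symptoms at once is sketched below in Python: it reports the free space on a filesystem and lists zero-length .lock files in a directory. The helper names `percent_free` and `empty_lock_files` are hypothetical and exist only for this example.

```python
import os
import shutil


def percent_free(path="/"):
    """Return the percentage of free space on the filesystem that holds path."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:
        return 0.0  # avoid dividing by zero on pseudo-filesystems
    return 100.0 * usage.free / usage.total


def empty_lock_files(directory):
    """Return sorted paths of zero-length *.lock files directly inside directory."""
    found = []
    with os.scandir(directory) as entries:
        for entry in entries:
            if (entry.name.endswith(".lock")
                    and entry.is_file()
                    and entry.stat().st_size == 0):
                found.append(entry.path)
    return sorted(found)
```

For example, `percent_free("/")` followed by `empty_lock_files("/etc")` would show whether the root filesystem is nearly full and whether any empty lock files are sitting in /etc.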

Can you reliably reproduce this? If you create a mailbox, does an /etc/passwd.lock file get left behind, and if so, what does it contain?

I'm going to test this more today and see if I can pinpoint the reason.

OK, got a lock file error today.

Error: Failed to lock file /etc/webmin/server-manager/virtualmin-licences after 5 minutes
Error
-----
Failed to lock file /etc/webmin/server-manager/virtualmin-licences after 5 minutes
-----
Call Stack Trace
In file ../web-lib-funcs.pl at line 4333 calling WebminCore::error
In file server-manager-lib-funcs.pl at line 6274 calling WebminCore::lock_file
In file server-manager-lib-funcs.pl at line 6466 calling server_manager::list_virtualmin_licences
In file /usr/share/webmin/server-manager/licence.pl at line 17 calling server_manager::fetch_virtualmin_licences

# cat /etc/webmin/server-manager/virtualmin-licences.lock
32087

That was the right PID too, so I have no idea what's going on. Only one instance was running as well.

What process has PID 32087?

Like I said, it was the correct PID:

root     32087  0.0  0.6  26712 25176 ?        Ss   Oct01   0:00 /usr/bin/perl /usr/share/webmin/server-manager/licence.pl

That runs every hour from cron, but if it hangs (perhaps on a network connection timeout) it will hold onto the lock. I would suggest just killing it, then running /usr/share/webmin/server-manager/licence.pl and seeing how long it takes.

I ran that file and it took about 3 seconds. Maybe we need to use a shorter timeout, like 30-60 seconds?

I will add a timeout on the HTTP request that script calls.
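As a rough illustration of what such a timeout looks like, here is a Python sketch using the standard library. It is not the actual change made to licence.pl (which is Perl); the function name `fetch_with_timeout` and the 30-second default are assumptions for the example, with the default taken from the range suggested above.

```python
import http.client
import urllib.request


def fetch_with_timeout(url, timeout_seconds=30):
    """Return the response body as bytes, or None if the request fails or stalls."""
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return response.read()
    except (OSError, http.client.HTTPException):
        # URLError and socket timeouts are both subclasses of OSError, so a
        # hung server ends up here instead of blocking the caller forever.
        return None
```

Note that this timeout applies to each blocking socket operation rather than to the request as a whole, so a server that trickles data slowly could still take longer than `timeout_seconds` in total. For a job that holds a lock while it runs, the point is simply that a stalled connection no longer keeps the lock indefinitely.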

Good -- maybe that will fix one of the issues with the lock files. I'm going to test the other issue today.