Virtualmin cron jobs killing server weekly

Topic locked

#1 Mon, 01/17/2011 - 11:58

maxslug

Virtualmin cron jobs killing server weekly

Hi All,

I'm having a problem where Virtualmin is launching a zillion cron jobs that are killing my system (see attached). Server hostnames and directories have been changed to "SITE1", etc. The log is the output of the script I posted on this thread: https://www.virtualmin.com/node/16835

I'm running CentOS 5.5 and all packages are up to date. I'm not sure why it's happening now, because I've been happily running Virtualmin for a couple of years on this box and not much has changed lately. I'm hosting about 30 low-traffic domains.

These are the most common processes:

/usr/libexec/webmin/virtual-server/collectinfo.pl
/usr/libexec/webmin/virtual-server/spamconfig.pl

Ideas, Jamie, Help! :-)

Thanks, -m

#2 Mon, 01/17/2011 - 11:59

maxslug

added log attachment

#3 Mon, 01/17/2011 - 12:11

andreychek

It looks like at least part of the trouble is that the cron jobs are never finishing or exiting.

Do you receive any output (especially errors) if you manually run this command:

/usr/libexec/webmin/virtual-server/collectinfo.pl

Also, does the command actually exit and return you to the command line?

Lastly -- are you on a dedicated server or a VPS?

-Eric
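
A quick way to see whether jobs are piling up is to count how many copies of each command line are currently running. This is a minimal sketch, assuming a Linux box with the usual procps `ps`; the helper name `count_by_command` is hypothetical and not part of Virtualmin.

```shell
# Read one command line per row on stdin and print each distinct command
# with the number of times it occurs, most frequent first.
count_by_command() {
    sort | uniq -c | sort -rn
}

# Example (not run here): list the ten most common running commands.
# "args=" prints the full command line with no header row.
# ps -eo args= | count_by_command | head -n 10
```

If the same collectinfo.pl or spamconfig.pl line shows up with a large count, earlier runs are probably still alive when cron starts the next one.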

#4 Mon, 01/17/2011 - 12:31

maxslug

Hi Eric, thanks for the speedy reply.

When I run collectinfo.pl from /root it bombs out immediately :

[root@SERVER ~]# /usr/libexec/webmin/virtual-server/collectinfo.pl
Can't locate ./virtual-server-lib.pl in @INC (@INC contains: /usr/lib64/perl5/site_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/site_perl/5.8.8 /usr/lib/perl5/site_perl /usr/lib64/perl5/vendor_perl/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/vendor_perl/5.8.8 /usr/lib/perl5/vendor_perl /usr/lib64/perl5/5.8.8/x86_64-linux-thread-multi /usr/lib/perl5/5.8.8 .) at /usr/libexec/webmin/virtual-server/collectinfo.pl line 8.
[root@SERVER ~]#

I am running on a VPS using Xen. Do you suspect a shared load problem?

-m
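
The error above says Perl could not find ./virtual-server-lib.pl, which is a path relative to the current directory, so the failure from /root may only reflect where the script was started and not the underlying problem. Here is a minimal sketch, assuming the script expects to be launched from its own directory; the helper name `run_in_own_dir` is hypothetical.

```shell
# Run a script from the directory it lives in. The cd happens inside a
# subshell, so the caller's working directory is left unchanged.
run_in_own_dir() {
    local script=$1
    shift
    (
        cd "$(dirname "$script")" || exit 1
        "./$(basename "$script")" "$@"
    )
}

# Example (not run here):
# run_in_own_dir /usr/libexec/webmin/virtual-server/collectinfo.pl
```

If the script still fails or hangs when started this way, that output is more likely to be relevant to the cron problem.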

#5 Tue, 01/18/2011 - 22:42

andreychek

Hmm, maybe, but I'm not certain. Whatever it is, it's a strange issue, I don't think I've ever seen that before :-)

How much RAM does your Xen VPS have available to it? You can determine that by running "free -m".

And the RAM on your VPS -- is that dedicated, or is it shared in some way amongst other VPS's?

I've occasionally seen odd things when too many VPS's are contending for RAM, though I'm not sure if that occurs with Xen or not.

-Eric
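
On Linux, memory totals can also be read directly from /proc/meminfo, which is convenient for logging from a script. This is a minimal sketch; the helper name `mem_kb` and its optional file argument are assumptions made for illustration.

```shell
# Print the value (in kB) of one /proc/meminfo field such as MemTotal,
# MemFree or SwapFree. A different file can be given as the second
# argument, which is mainly useful for testing.
mem_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Example (not run here):
# echo "total: $(mem_kb MemTotal) kB, free: $(mem_kb MemFree) kB"
```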

#6 Wed, 01/19/2011 - 11:21

ronald

Competing for RAM would, to my knowledge at least, not occur with Xen.
Xen gives you dedicated RAM, whereas OpenVZ gives you shared RAM.
On Xen you could have burstable RAM coming from swap.
Swap could be shared, though that is unlikely.

#7 Wed, 01/19/2011 - 11:29

maxslug

ronald is right, Xen dedicates the memory. It's only CPU that you contend for.

I'm thinking it might be the webalizer job(s) using up all the memory, which then triggers the process killer. I disabled it and we'll see what happens.

#8 Tue, 02/01/2011 - 15:46

maxslug

It's either bw.pl or collectinfo.pl that is using up all the available memory and causing the machine to thrash until death.

Any ideas why either of those is failing for me? Is there some sort of "cleanup" I could do on the box?

-m

#9 Tue, 02/01/2011 - 16:58

maxslug

OK so here's my current hack to see if it helps.

I've created a script that invokes 'ulimit -v' in order to limit how much mem each job can take :

/root/bin/run_ulimit

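#!/bin/bash
# Usage: run_ulimit LIMIT_KB COMMAND [ARGS...]
# Caps the virtual memory of COMMAND so a runaway job fails on its own
# instead of exhausting the machine.

limit=$1
shift
ulimit -v "$limit"   # limit is in kB
exec "$@"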

And now I add "/root/bin/run_ulimit 512000" before each command in root's crontab. Hopefully I will now just get an "out of memory" email instead of a crash.

-m
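
One way to confirm that a wrapper like this really applies the limit is to ask the child process which limit it sees. Here is a minimal sketch, assuming bash on Linux; `run_limited` is a hypothetical function-form equivalent of the script above, not something Virtualmin provides.

```shell
# Run a command with its virtual memory capped at LIMIT_KB kilobytes.
# The limit is set inside a subshell, so the calling shell keeps its
# own limits.
run_limited() {
    local limit_kb=$1
    shift
    (
        ulimit -v "$limit_kb" || exit 1
        exec "$@"
    )
}

# Example (not run here): the child reports the limit it inherited.
# run_limited 512000 sh -c 'ulimit -v'
```

Because `ulimit -v` limits address space, not resident memory, a job can hit the cap earlier than its real memory use would suggest, so the number may need some tuning.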

#10 Tue, 02/01/2011 - 17:13

andreychek

One thing you might try is to have bw.pl and collectinfo.pl run less frequently.

By default, bw.pl runs every hour. You can change that in System Settings -> Bandwidth Monitoring.

Also, collectinfo.pl runs much more frequently -- every 5 minutes I believe. You can change that in System Settings -> Virtualmin Config -> Status Collection, and set "Interval between status collection job runs".

If you make those run less frequently (neither is a necessary component for your server; they just keep stats up to date), I'd be curious whether you run into fewer problems.

-Eric
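
To see which schedule is currently in effect for these two jobs, root's crontab can be filtered for their names. This is a minimal sketch; the helper name `show_stats_jobs` is hypothetical, and the exact paths in your crontab may differ from the ones mentioned in this thread.

```shell
# Read crontab text on stdin and print the active (uncommented) entries
# that mention collectinfo.pl or bw.pl.
show_stats_jobs() {
    grep -E '(collectinfo|bw)\.pl' | grep -v '^[[:space:]]*#'
}

# Example (not run here):
# crontab -l -u root | show_stats_jobs
```

The first five fields of each printed line are the schedule; for instance, `*/5` in the first (minute) field means the job runs every five minutes.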

#11 Fri, 07/22/2011 - 10:15

cpruitt

maxslug, Did you ever come to a resolution on this issue? We've been having crashes on our Ubuntu server 10.04.2 once a week and (being new to virtualmin and Linux in general) I'm a bit perplexed on how to even troubleshoot the issue. My first thought was that some cron job was running (which prompted Google to lead me here) but I'd love to know if you had a definitive solution you might be willing to share.

/var/log/messages shows nothing of any significance. The server just hangs and the next log messages are from the reboot. I'm currently looking into which cron jobs virtualmin is running weekly (and possibly how to alter or stop them from running), but with this issue seeming so similar I wanted to see how it ended up for you.

#12 Fri, 07/22/2011 - 10:57

maxslug

Hi cpruitt, I never did root-cause this. My best guess is that it had to do with an I/O overload on the Xen VPS I was using, which was causing cron jobs not to finish. "iostat -a" tells a different story than top. A series of hardware and software upgrades made the problem disappear, with no active resolution on my part.

-m

#13 Fri, 07/22/2011 - 13:12 (Reply to #12)

cpruitt

Thanks for the followup. When you say "A series of hardware and software upgrades" are you referring to upgrades you made or upgrades made by a VPS hosting provider? In our case this is a dedicated box (albeit higher-end-desktop grade, not server grade hardware) in our own data center so VPS is not a factor. Just wondering if you knew of any specific upgrades that were made or hardware components that were replaced.

Really wish I knew the cause of this... been driving me batty for a few weeks now and any "fix" I try takes me a week to test. :-/

#14 Fri, 07/22/2011 - 13:16

maxslug

Nothing specific, unfortunately. I moved to a new dom0 and upgraded to the latest CentOS, and it went away.
