System grinding to a halt

#1 Thu, 04/16/2009 - 03:35
croombs


Hi

I have 3 systems that have all been working for 12+ months with no major issues. Today one of them is just dying on its feet.

In /var/log/procmail.log I can see:

WARNING: System limit for file size is lower than engine->maxscansize

Timeout connecting to lookup-domain-daemon.pl

Virus scanner failed to response within 30 seconds

Currently, with postfix turned off for over 10 minutes, this is what I am seeing. I cannot capture it with postfix on because the load goes to 65+ and keeps rising; the box normally sits at a load of 0.1 to 0.5.

Tasks: 199 total,   1 running, 189 sleeping,   0 stopped,   9 zombie
Cpu(s):  3.0%us,  2.0%sy,  0.0%ni,  0.0%id, 94.0%wa,  0.0%hi,  1.0%si,  0.0%st
Mem:   1025044k total,   994448k used,    30596k free,     3580k buffers
Swap:  2031608k total,  1568416k used,   463192k free,    36740k cached

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM     TIME+ COMMAND
 7360 thomas    18   0 22052 6200 4544 D  1.0  0.6   0:00.04 php-cgi
 7363 thomas    18   0 22048 6084 4500 D  1.0  0.6   0:00.03 php-cgi
 6999 just-ser  18   0  116m 111m  992 D  0.7 11.1   0:05.35 clamscan
 6494 steven.j  18   0 83548  18m  644 D  0.3  1.8   0:04.47 clamscan
 7022 576       18   0 79968  42m  820 D  0.3  4.2   0:03.67 clamscan
 7208 root      18   0     0    0    0 Z  0.3  0.0   0:01.18 collectinfo.pl <defunct>
 7347 root      15   0  2324 1080  804 R  0.3  0.1   0:00.15 top
 7354 tblogs    18   0 22048 6080 4436 D  0.3  0.6   0:00.04 php-cgi
    1 root      15   0  2064  408  384 S  0.0  0.0   0:00.57 init
    2 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 migration/0
    3 root      34  19     0    0    0 S  0.0  0.0   0:00.00 ksoftirqd/0
    4 root      RT  -5     0    0    0 S  0.0  0.0   0:00.00 watchdog/0
    5 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 events/0
    6 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khelper
    7 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 kthread
   10 root      10  -5     0    0    0 S  0.0  0.0   0:00.97 kblockd/0
   11 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kacpid
   99 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 cqueue/0
  102 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 khubd
  104 root      10  -5     0    0    0 S  0.0  0.0   0:00.00 kseriod
  169 root      10  -5     0    0    0 D  0.0  0.0   0:03.55 kswapd0
  170 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 aio/0
  322 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kpsmoused
  345 root      13  -5     0    0    0 S  0.0  0.0   0:00.00 ata/0
  346 root      16  -5     0    0    0 S  0.0  0.0   0:00.00 ata_aux
  349 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_0
  350 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 scsi_eh_1
  363 root      16  -5     0    0    0 S  0.0  0.0   0:00.00 ksnapd
  366 root      10  -5     0    0    0 D  0.0  0.0   0:01.10 kjournald
  393 root      11  -5     0    0    0 S  0.0  0.0   0:00.00 kauditd
  427 root      21  -4  2240  324  320 S  0.0  0.0   0:00.38 udevd
 1208 root      20  -5     0    0    0 S  0.0  0.0   0:00.00 kmpathd/0
 1230 root      12  -5     0    0    0 S  0.0  0.0   0:00.00 kjournald
 1669 root      16   0  1716  420  376 D  0.0  0.0   0:00.20 syslogd
 1672 root      15   0  1668  288  284 S  0.0  0.0   0:00.00 klogd
 1698 named     25   0 41288 3232 1132 S  0.0  0.3   0:01.11 named
 1734 dbus      16   0  2748  296  292 S  0.0  0.0   0:00.00 dbus-daemon
 1756 root      25   0  9380  652  532 S  0.0  0.1   0:00.02 automount
 1780 root      18   0  1668  344  340 S  0.0  0.0   0:00.00 acpid
+++++ many more lines
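
For anyone reading a dump like this, the two figures that stand out are the I/O wait share on the Cpu(s) line and how much swap is in use. Here is a minimal sketch, assuming the older top summary format shown in this post (the function names are made up for illustration), that pulls both numbers out:

```python
import re


def iowait_percent(cpu_line: str) -> float:
    """Return the %wa (I/O wait) value from a top 'Cpu(s):' summary line."""
    match = re.search(r"([\d.]+)%wa", cpu_line)
    if match is None:
        raise ValueError("no %wa field found in line")
    return float(match.group(1))


def swap_used_fraction(swap_line: str) -> float:
    """Return used/total swap from a top 'Swap:' summary line (values in k)."""
    match = re.search(r"Swap:\s*(\d+)k total,\s*(\d+)k used", swap_line)
    if match is None:
        raise ValueError("no swap totals found in line")
    total, used = int(match.group(1)), int(match.group(2))
    if total == 0:
        return 0.0
    return used / total


if __name__ == "__main__":
    # Lines copied from the top output in this post.
    cpu = "Cpu(s): 3.0%us, 2.0%sy, 0.0%ni, 0.0%id, 94.0%wa, 0.0%hi, 1.0%si, 0.0%st"
    swap = "Swap: 2031608k total, 1568416k used, 463192k free, 36740k cached"
    print(f"iowait: {iowait_percent(cpu):.1f}%")
    print(f"swap in use: {swap_used_fraction(swap):.0%}")
```

High I/O wait together with heavy swap use and several clamscan processes in state D would be consistent with the box running short of memory, though the numbers alone do not prove that.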

So far I can see no issues in any other log files (still looking).

The system is CentOS 5 with Virtualmin, fully updated, hosting approx. 30 domains.

Any clues?

Thanks for any help/clues you can give.

Denis

Thu, 04/16/2009 - 04:56
andreychek

Do you have a lot of email in your mail queue?

If you type "mailq", what does the last line of output say?
-Eric
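
If you want to check that last line from a script rather than by eye, here is a rough sketch. It assumes Postfix's mailq, whose final line is either "Mail queue is empty" or a summary of the form "-- N Kbytes in M Requests."; the function name is hypothetical and the sample numbers in the usage lines are made up.

```python
import re
from typing import Optional, Tuple

# Summary line printed by Postfix mailq when the queue is not empty.
_SUMMARY = re.compile(r"^-- (\d+) Kbytes in (\d+) Requests?\.$")


def parse_mailq_summary(last_line: str) -> Optional[Tuple[int, int]]:
    """Return (kbytes, requests) from the last line of mailq output.

    Returns (0, 0) for an empty queue and None if the line is not recognised.
    """
    line = last_line.strip()
    if line == "Mail queue is empty":
        return (0, 0)
    match = _SUMMARY.match(line)
    if match is None:
        return None
    return (int(match.group(1)), int(match.group(2)))


if __name__ == "__main__":
    # Made-up sample lines, not taken from the affected server.
    for sample in ("-- 1234 Kbytes in 56 Requests.", "Mail queue is empty"):
        print(sample, "->", parse_mailq_summary(sample))
```

A large request count here would point at a queue backlog feeding the scanner, which is one possible reason for the load climbing as soon as postfix is started.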

Thu, 04/16/2009 - 05:16 (Reply to #2)
ronald

"Tasks: 199 total, 1 running, 189 sleeping, 0 stopped, 9 zombie"

The first thing I would do is kill off the zombie processes. They can really mess up the chi on a server.
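
A zombie has already exited and is only waiting for its parent to collect its exit status, so in practice the thing to look at is the parent process. Here is a minimal sketch, assuming a Linux /proc filesystem (the helper names are made up for illustration), that lists zombies together with their parent PIDs:

```python
import os
from typing import List, Tuple


def parse_stat(stat_text: str) -> Tuple[int, str, str, int]:
    """Extract (pid, command, state, ppid) from the text of /proc/<pid>/stat.

    The command name is wrapped in parentheses and may itself contain spaces
    or parentheses, so split on the first '(' and the last ')'.
    """
    open_paren = stat_text.index("(")
    close_paren = stat_text.rindex(")")
    pid = int(stat_text[:open_paren])
    command = stat_text[open_paren + 1:close_paren]
    fields = stat_text[close_paren + 1:].split()
    return pid, command, fields[0], int(fields[1])


def list_zombies(proc_root: str = "/proc") -> List[Tuple[int, str, int]]:
    """Return (pid, command, ppid) for every process in state Z."""
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc_root, entry, "stat")) as handle:
                pid, command, state, ppid = parse_stat(handle.read())
        except (OSError, ValueError, IndexError):
            continue  # process vanished or stat was unreadable
        if state == "Z":
            zombies.append((pid, command, ppid))
    return zombies


if __name__ == "__main__":
    for pid, command, ppid in list_zombies():
        print(f"zombie {pid} ({command}), parent {ppid}")
```

If most of the zombies share one parent, that parent (or whatever is blocking it) is the more likely place to find the real problem.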

Thu, 04/16/2009 - 12:09
Joe

You probably want to switch to the daemon version of ClamAV, clamd with clamdscan (on the Email Messages -> Spam and Virus Scanning page). But that may or may not resolve this issue. It seems like the problem is that clam is responding too slowly, but that may be an illusion.

--

Check out the forum guidelines!

Thu, 04/16/2009 - 12:11 (Reply to #4)
Joe

Also, there was an update to ClamAV just a couple of days ago, which may be slower than the older version. Clam tends to get slower and bigger with every release (which makes sense, I guess, since it's scanning for more viruses in every release), and maybe this one pushed it off the deep end for your system and workload. Switching to clamd would resolve that.


Thu, 04/16/2009 - 12:16 (Reply to #5)
andreychek

Sorry, I forgot to post a link saying as much, but this one was fixed up in the Bug Tracker :-)

http://www.virtualmin.com/index.php?option=com_flyspray&amp;Itemid=82&am...

It was indeed clamscan vs clamd; moving to clamd fixed it up.
-Eric

Wed, 03/10/2010 - 08:46
dgillard

Hi Eric

Are you able to track this bugtracker item down? The link doesn't seem to work anymore. We seem to have the same problem on one of our machines, where it appears spamassassin just isn't altering the headers of the emails. In procmail.log we are seeing the engine->maxscansize warning.

Dave
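
The warning text suggests the scanner is comparing the per-process file size limit (the value `ulimit -f` reports) against its configured maximum scan size. That is only my reading of the message, not something confirmed in this thread. Here is a small sketch for checking the limit from Python on a Unix system; the 100 MB figure is a placeholder assumption, not a value taken from any ClamAV configuration.

```python
import resource

# Placeholder for the scanner's maximum scan size. This is an assumption
# for illustration; check your own ClamAV settings for the real value.
ASSUMED_MAX_SCAN_SIZE = 100 * 1024 * 1024


def fsize_limit_too_low(soft_limit: int, max_scan_size: int) -> bool:
    """Return True if a finite file size limit is below max_scan_size."""
    if soft_limit == resource.RLIM_INFINITY:
        return False  # unlimited, so it cannot be lower
    return soft_limit < max_scan_size


if __name__ == "__main__":
    # RLIMIT_FSIZE is the largest file this process may create, in bytes.
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)
    print("soft limit:", "unlimited" if soft == resource.RLIM_INFINITY else soft)
    print("hard limit:", "unlimited" if hard == resource.RLIM_INFINITY else hard)
    if fsize_limit_too_low(soft, ASSUMED_MAX_SCAN_SIZE):
        print("file size limit is lower than the assumed max scan size")
```

Note that the limit seen from an interactive shell may differ from the one the mail delivery process runs under, so this is best run in the same context as the scanner.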

Wed, 03/10/2010 - 09:05 (Reply to #7)
andreychek

Yeah, it's tough finding those old bugtracker issues :-)

However, based on the notes I had written in this forum thread, it looks like the issue had been solved by moving to the ClamAV daemon, rather than using the command line scanner.

You can make that change in Email Messages -> Spam and Virus Scanning, and set "Virus scanning program" to "Server Scanner".

-Eric
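
After switching to the server scanner, it can be worth confirming that clamd is actually answering. clamd replies PONG to a PING command on its socket; below is a minimal sketch of that check. The socket path in the usage comment is an assumption and differs between distributions, so check the LocalSocket setting in your clamd configuration.

```python
import socket


def clamd_ping(conn: socket.socket, timeout: float = 5.0) -> bool:
    """Send PING over an already connected socket and expect PONG back."""
    conn.settimeout(timeout)
    conn.sendall(b"PING\n")
    reply = conn.recv(64)
    return reply.strip() == b"PONG"


def ping_unix_socket(path: str) -> bool:
    """Connect to clamd's local Unix socket at `path` and ping it."""
    with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as conn:
        conn.connect(path)
        return clamd_ping(conn)


# Usage (the path is an assumption; substitute your own LocalSocket value):
#   ping_unix_socket("/var/run/clamav/clamd.sock")
```

If the ping fails or times out, mail will still be delayed by the scanner, so it is worth checking before turning postfix back on.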

Topic locked