Whole server ground to a halt, apparently because monitor.pl took 1.5 GB of real RAM?!

One of our servers was massively overloaded at peak hours an hour ago because swapping kicked in, and from the top output below it looks like the cause was the monitor.pl process eating up lots of RAM:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND          
31725 root      20   0 1656m 1.5g 3576 D    8 43.3   0:14.00 monitor.pl                               

As an emergency measure we killed that process, but we then had to take the server down and finally reboot it, because the heavy swapping left it unable to recover.

How can monitor.pl even use that much RAM?

That server usually runs fine with 2.5 GB in use and has 3.5 GB of RAM allocated. But the combined RAM usage, and most probably the disk access contention generated by that process, just killed the whole server.

By the way, "Collect all available package updates" was off, if that matters (for collectinfo.pl).

I'm stumped and worried that this may happen again on that high-traffic server.

Status: Active

Comments

Howdy -- we may need to get Jamie's input on the specifics there as to why it's using such a decent chunk of RAM.

In the meantime though, you may want to start out by temporarily disabling the status monitoring feature (In System Settings -> Features and Plugins -> Status Monitoring). That should prevent monitor.pl from firing up again until you re-enable it.

A few questions to help get an idea of what's going on --

  • How many Virtual Servers do you have on your system?

  • What is the output of "free" on your server now?

  • What version of Virtualmin are you using -- are you by chance using the latest, 3.76?

  • Are you using Cloudmin on this server? If so, which version of that do you have?

Thanks!

How many Virtual Servers do you have on your system?

  • 17 if my maths are right: 1 alias, 1 disabled top-level server, and the remaining 15 are active top-level domains (of which only 1 is really high-traffic and 2 are medium-traffic). Virtualmin reports:
  • Licensed domains: 50, domains left: 33

What is the output of "free" on your server now?

             total       used       free     shared    buffers     cached
Mem:       3578188    3249788     328400          0      40204    1829420
-/+ buffers/cache:    1380164    2198024
Swap:      4194296          8    4194288
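
For anyone reading the numbers above: the "-/+ buffers/cache" row can be reproduced from the "Mem:" row, which is a quick way to see how much RAM applications really hold as opposed to page cache. The following is only a minimal sketch, assuming the older `free` layout shown here (six columns, values in kB); the function names are made up for illustration.

```python
def parse_free(text):
    """Parse the 'Mem:' row of old-style `free` output (values in kB)."""
    for line in text.splitlines():
        if line.startswith("Mem:"):
            fields = [int(value) for value in line.split()[1:7]]
            names = ("total", "used", "free", "shared", "buffers", "cached")
            return dict(zip(names, fields))
    raise ValueError("no 'Mem:' row found")


def real_usage(mem):
    """Return (kB held by applications, kB available without swapping)."""
    used_by_apps = mem["used"] - mem["buffers"] - mem["cached"]
    available = mem["free"] + mem["buffers"] + mem["cached"]
    return used_by_apps, available


# The output pasted in this thread.
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       3578188    3249788     328400          0      40204    1829420
-/+ buffers/cache:    1380164    2198024
Swap:      4194296          8    4194288
"""

if __name__ == "__main__":
    apps, avail = real_usage(parse_free(FREE_OUTPUT))
    print(f"used by applications: {apps} kB, available: {avail} kB")
```

With the figures above this gives 1380164 kB used by applications and 2198024 kB available, matching the second row of the `free` output, so a single process growing to roughly 1.5 GB would consume most of that headroom.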

What version of Virtualmin are you using -- are you by chance using the latest, 3.76?

  • yes, 3.76 Pro

Are you using Cloudmin on this server? If so, which version of that do you have?

  • No, this server neither runs Cloudmin nor is controlled remotely by Cloudmin.

Using 1.5G of RAM is crazy, as all monitor.pl does is check the status of various servers and websites.

What monitors do you have defined at Webmin -> Others -> System and Server Status? A screenshot would be useful..

Also, if you kill monitor.pl and re-run it manually, does it use up 1.5G of RAM again?
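
If it helps with that manual re-run, the peak memory of a single run can be measured without watching top. This is just a rough sketch, assuming a Linux host (where `ru_maxrss` is reported in kilobytes) and a Python interpreter on the server; the helper name `peak_rss_kb` is hypothetical, and the path to monitor.pl has to be supplied for your own installation.

```python
import resource
import subprocess
import sys


def peak_rss_kb(argv):
    """Run argv to completion and return (exit code, peak RSS in kB).

    On Linux, ru_maxrss for RUSAGE_CHILDREN is in kilobytes and covers the
    largest child waited for so far, so use a fresh interpreter for each
    measurement.
    """
    proc = subprocess.run(
        argv, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
    )
    usage = resource.getrusage(resource.RUSAGE_CHILDREN)
    return proc.returncode, usage.ru_maxrss


if __name__ == "__main__":
    if len(sys.argv) > 1:
        code, peak = peak_rss_kb(sys.argv[1:])
        print(f"exit code {code}, peak RSS {peak} kB")
    else:
        print("usage: peak_rss.py COMMAND [ARGS...]")
```

Saved as, say, peak_rss.py, it would be called with the full path of monitor.pl as its argument. A peak anywhere near 1.5 million kB would reproduce the problem; a few tens of thousands would suggest the spike depended on something specific to that moment.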

I never saw it at 1.5 Gigs.

Our "System and Server Status" page (all green except the last one):

Monitoring                        On host   Status
Website site1.com                 Local     green
Website site1.com (SSL)           Local     green
Postfix Server                    Local     green
BIND DNS Server                   Local     green
Website site2.com (SSL)           Local     green
Apache Webserver                  Local     green
MySQL Database Server             Local     green
Website site2.com                 Local     green
PostgreSQL Database Server        Local     down

PostgreSQL shows as down, but that's because it isn't needed and isn't completely installed. I haven't looked into why yet; since it's off and unneeded on that server, it's on the to-do list.

In case it matters, here is the error from trying to finish configuring PostgreSQL through aptitude U:

Setting up postgresql-8.3 (8.3.9-0ubuntu8.04) ...
* Starting PostgreSQL 8.3 database server
* The PostgreSQL server failed to start. Please check the log output:
2010-01-21 00:37:16 CET FATAL:  could not load server certificate file "server.crt": Permission denied
   ...fail!
invoke-rc.d: initscript postgresql-8.3, action "start" failed.
dpkg: error processing postgresql-8.3 (--configure):
subprocess post-installation script returned error exit status 1
dpkg: dependency problems prevent configuration of postgresql:
postgresql depends on postgresql-8.3; however:
  Package postgresql-8.3 is not configured yet.
dpkg: error processing postgresql (--configure):
dependency problems - leaving unconfigured

That's from a plain Virtualmin Pro installation on a new 64-bit Ubuntu 8.04 LTS server (a Xen instance). I remember we had to comment out a line in a PostgreSQL configuration file on another server to get it running, but I didn't remember right away which one. You may want to fix that in a future Virtualmin installer.

Looks like this: http://www.mail-archive.com/ubuntu-bugs@lists.ubuntu.com/msg1462324.html

Tried this: https://bugs.launchpad.net/ubuntu/+source/postgresql-8.3/+bug/370422

sudo adduser postgres ssl-cert
The user `postgres' is already a member of `ssl-cert'.

Finally remembered and changed this line to false:

ssl = true                              # (change requires restart)

in /etc/postgresql/8.3/main/postgresql.conf (since our firewall doesn't allow external database access, that is OK for now, but the real bug is elsewhere)

Again, I doubt it's related, but I'm including it just in case, for completeness.

So monitor.pl doesn't spike up to 1.5GB anymore?

I wonder, did perhaps any of the sites being monitored have a huge file as their index page?

As I said, I didn't see any spikes either before or after that single event. We continuously monitor and log our servers, and 1.5 GB of RAM use would show on the graphs...

The homepages are pretty normal, and I doubt the pages could have returned more than 8 MB, as that's our limit for HTTP responses in mod_security.
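
To double-check the size of what a monitored page actually returns, something like the following could be run against each site. It is only a sketch with made-up helper names (`count_bytes`, `body_size`) and an arbitrary 16 MB read cap; it says nothing about how monitor.pl itself fetches pages.

```python
import urllib.request


def count_bytes(stream, cap, chunk=64 * 1024):
    """Read at most cap bytes from a file-like object.

    Returns (bytes_read, truncated), where truncated is True if the
    stream still had data after cap bytes.
    """
    read = 0
    while read < cap:
        data = stream.read(min(chunk, cap - read))
        if not data:
            return read, False
        read += len(data)
    # Cap reached: one more byte tells us whether the body continues.
    return read, bool(stream.read(1))


def body_size(url, cap=16 * 1024 * 1024, timeout=10):
    """GET url and report the response body size, never reading past cap."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return count_bytes(resp, cap)


if __name__ == "__main__":
    import sys

    for url in sys.argv[1:]:
        size, truncated = body_size(url)
        suffix = " (truncated at cap)" if truncated else ""
        print(f"{url}: {size} bytes{suffix}")
```

Calling it with the monitored URLs (for example the site1.com and site2.com homepages, plain and SSL) would show whether any of them comes close to the 8 MB limit.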

Hmm .. it is kind of hard to debug this then, unless it happens repeatedly (not that we'd really want that).