Virtualmin crashes when I try to create a new site

Problem: All sites go down when adding a new Virtual Server. I still have SSH access, and top shows an apache2 process sitting at 100% CPU.

Quick Fix: I can recover all the sites by killing the runaway apache2 process and then starting Apache back up again:

# top
# ps ax | grep -v grep | grep apache2
# kill PID
# /etc/init.d/apache2 start

where PID is the apache2 process taking up all the CPU.
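
For what it's worth, the PID lookup could be scripted instead of read off top by eye. This is only a sketch: the helper name busiest_pid is made up for illustration, and it assumes the procps version of ps that ships with Ubuntu. Note that ps reports %CPU averaged over the lifetime of the process, so the number may not match top exactly.

```shell
#!/bin/bash
# busiest_pid (hypothetical helper): reads "PID %CPU" pairs on stdin
# and prints the PID that has the highest %CPU value.
busiest_pid() {
  awk 'NR == 1 || $2 + 0 > max { max = $2 + 0; pid = $1 }
       END { if (pid != "") print pid }'
}

# Intended use on the server (assumes procps ps; not run here):
#   ps -C apache2 -o pid=,pcpu= | busiest_pid
```

The printed PID is the one to pass to kill before starting Apache again.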

The network looks like this:

ns1.bluerubyhosting.com --> 173.165.128.42 (1:1 NAT) --> 192.168.7.5 --> eth0 on main virtualmin
ns2.bluerubyhosting.com --> 173.165.128.45 (1:1 NAT) --> 192.168.7.6 --> eth1 on main virtualmin

Please help! Thanks

Status: 
Closed (fixed)

Comments

Howdy -- are the authentication details of your slave DNS server correct?

The error I see in your error logs shows this:

Login to RPC server as root rejected

That most commonly occurs when the root password, or the IP address, are incorrect.

Before, my second (NS2) Virtualmin (192.168.7.6) had a different password than my main NS1 Virtualmin (192.168.7.5). Now that both IPs go to the same Virtualmin, the login password for NS2 should be the same as for NS1. How/where do I set that? I think that is the problem.

The same error, Login to RPC server as root rejected

is generated when I go to Webmin > Cluster Webmin Servers and try to add 192.168.7.6 as a second server.

Here's more information. When I try to create a new virtual site, all the other sites go down, but I can still SSH into the Ubuntu server. When I run the "top" command I see this:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
6781 www-data  20   0  399m  11m  632 R  100  0.1   2:38.80 apache2

I can see that apache2 is using 100% of the CPU. I can see which command that process is running with this:

# ps ax | grep -v grep | grep apache
6781 ?        R      4:44 /usr/sbin/apache2 -k start

So basically apache2 is trying to start and sits at 100% CPU crashing all the other sites.

So if NS1 and NS2 are the same machine, you don't actually need a cluster slave setup - otherwise Virtualmin will try to create the slave DNS zone on the same system as the master.

Am I correct that you actually only have a single physical system?

Ya that's right I now have a single physical system. I went ahead and deleted all traces of the second server (192.168.7.6) from Webmin > Webmin Server Index, Cluster Webmin server, Cluster Usermin Servers. After doing that I tried creating the virtual site again and I have a different output. (see attached) The original error "Login to RPC server as root rejected" went away but I still have the problem with the process "apache2 -k start" maxing out the CPU and crashing all the other sites.

Before, I really did have a second server (192.168.7.6), and I forgot to remove it from the cluster when I decided to just add a second NIC to my main server and assign it 192.168.7.6.

So if you try to restart Apache with /etc/init.d/apache2 start , does it actually start up, or does it just crash? If the latter, what gets logged to the Apache error log?

I tried killing the apache2 process running at 100% cpu and starting it back up,
# /etc/init.d/apache2 start
* Starting web server apache2 [Sun Dec 08 13:02:28 2013] [warn] NameVirtualHost 173.165.128.42:80 has no VirtualHosts

I have attached the error log.

When you killed Apache, and started it on the command line -- that appears to have worked properly, that's just a warning that you saw. Are your websites working for you at that point?

Regarding the CPU issue -- the logs show some unusual errors that Apache is generating.

Do you have any unusual modules enabled, perhaps modules from third party repositories?

It's possible that an Apache module is mis-behaving.

What output do you receive if you run this command:

ls /etc/apache2/mods-enabled

Also, what output does this show:

dpkg -l apache2

The websites work when I reboot the system. I can recreate the problem by just trying to "create a virtual server".

I just barely rebooted and ran both of those commands. (see attached).

Hmm, I don't see any unusual modules enabled there.

Do you see an influx of bandwidth, that corresponds with the Apache CPU load you're seeing? I'm curious if that's related to traffic, rather than a misbehaving module.

I'm using monitis to monitor the server. See the picture of today's graph, attached. The top graph is pings, so the red dots are when the server appeared to be down. That lines up directly with when the CPU load shot up to 100%.

Thanks for the video! Yeah, I understand what's occurring, I just don't know why that might happen.

There's nothing Virtualmin does that should cause that sort of behavior... Virtualmin just adds VirtualHost content for the new domain, and then restarts Apache.

What if you run this command on that Apache process:

strace -p PID > apache_strace.txt 2>&1

Substitute the Apache process's process ID in place of the "PID" above.

Then, after 5-10 seconds, kill that process if it doesn't end automatically.

Could you attach the resulting file (apache_strace.txt)?

Thanks! I've sent the relevant bits over to Jamie, let's see what he can make of it.

I'm going to post it below for future reference -- the messages below repeat throughout the entire file:

Process 22466 attached - interrupt to quit
gettimeofday({1386543157, 496531}, NULL) = 0
gettimeofday({1386543157, 496717}, NULL) = 0
gettimeofday({1386543157, 496886}, NULL) = 0
poll([{fd=67, events=POLLIN}], 1, 3000) = 1 ([{fd=67, revents=POLLHUP}])
read(67, "", 13160)                     = 0
gettimeofday({1386543157, 497414}, NULL) = 0
gettimeofday({1386543157, 497568}, NULL) = 0
gettimeofday({1386543157, 497725}, NULL) = 0
gettimeofday({1386543157, 497869}, NULL) = 0
poll([{fd=67, events=POLLIN}], 1, 3000) = 1 ([{fd=67, revents=POLLHUP}])
read(67, "", 13160)                     = 0
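
For future reference, a quick way to see which calls dominate a trace like this is to count them by name. This is just a sketch: count_syscalls is a made-up helper name, and it assumes every system call line starts with the call name followed by "(", as in the excerpt above.

```shell
#!/bin/bash
# count_syscalls (hypothetical helper): reads strace output on stdin and
# prints "COUNT NAME" for each system call, most frequent first.
# Lines that do not start with "name(" (such as the "Process ... attached"
# banner) are ignored.
count_syscalls() {
  awk -F'(' '/^[a-z_0-9]+\(/ { seen[$1]++ }
             END { for (name in seen) print seen[name], name }' |
    sort -k1,1nr -k2,2
}

# Intended use (not run here):
#   count_syscalls < apache_strace.txt
```

A trace made up almost entirely of gettimeofday, poll returning POLLHUP, and read returning 0 would be consistent with the process spinning on a file descriptor whose other end has already been closed.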

So I had a look, and it seems that just running apache2ctl graceful is enough to trigger this problem ... which suggests it is actually some kind of Apache bug.

As a work-around, I configured Virtualmin to not use that command - instead, it restarts Apache to apply config changes.

I just confirmed the problem is fixed. I can add/delete sites without apache crashing the other sites. Thanks Jamie!

Great! Now as to why apache2ctl causes Apache to hang, I don't know ..

I still have the issue with apache2 crashing. Now it's just random and I don't know what triggers it. Here's my quick fix for the problem: I wrote a script that runs every 20 minutes and checks whether Apache has crashed. Basically, I know it crashed if there's only one apache2 process running. Here's my script:

#!/bin/bash
# Watchdog: if only one apache2 process is left, assume Apache has hung,
# kill it, and start Apache again.
LOG=/home/chris/scripts/apache_crash_log

# Show the current apache2 processes, then count them.
ps aux | grep -v grep | grep apache2
BROKE=$(ps aux | grep -v grep | grep apache2 | wc -l)
# awk splits on runs of spaces; cut -d" " returns an empty field when
# ps pads the columns with more than one space.
PID=$(ps aux | grep -v grep | grep apache2 | tail -n 1 | awk '{print $2}')
echo "$BROKE"
echo "$PID"

if [ "$BROKE" -eq 1 ]
then
  date >> "$LOG"
  echo "PID=$PID" >> "$LOG"
  echo "server crashed...sites down!"
  kill "$PID"
  /etc/init.d/apache2 stop
  sleep 2
  /etc/init.d/apache2 start
  echo "sites should be back soon..."
  # Give Apache a few seconds to spawn its workers before listing them.
  sleep 5
  ps aux | grep -v grep | grep apache2
  sleep 1
  # Broadcast the crash log to logged-in users.
  wall "$LOG"
  echo "--------------------------------" >> "$LOG"
else
  echo "everything looks good!"
fi

I tested this in a real situation and it fixed the problem. I went to Webmin > Cluster > Cluster Cron Jobs and added this script as a cron job that runs every 20 minutes. I don't know what else to do, but if it works I'm happy. What do you think?
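
One possible refinement of the "only one apache2 process" check is to count on the command name alone, so that grep, or a script whose name happens to contain "apache2", is never counted. A rough sketch follows; count_procs is a made-up helper name, and the ps invocation in the comment assumes procps.

```shell
#!/bin/bash
# count_procs (hypothetical helper): reads one command per line on stdin and
# prints how many lines have the given name as their first word.
count_procs() {
  awk -v name="$1" '$1 == name { n++ } END { print n + 0 }'
}

# Intended use on the server (not run here):
#   BROKE=$(ps -e -o comm= | count_procs apache2)
```

A result of 1 would then correspond to the crashed state described above.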