This just started happening today. I noticed it right after installing the latest virtualmin 'wbt' update. Might be related, or not - just as a data point.
Upon trying to create a new virtual top-level domain, the system starts the creation script and then hangs at the DNS portion. The /etc/named.conf file never gets populated with the new data.
This is not unique to the example domain (d23.me); it also applies to other domains. Something appears to be stuck, either with configuration or with permissions, but it's not something that I can troubleshoot or figure out.
Setting Up Virtual Server
In domain d23.me
Creating administration group d23.me .. .. done
Creating administration user d23.me .. .. done
Creating aliases for administration user .. .. done
Adding administration user to groups .. .. done
Creating home directory .. .. done
Creating mailbox for administration user .. .. done
Adding new DNS zone ..
Submitted by HarryZink on Wed, 01/26/2011 - 08:24 Comment #1
There are .lock files in the /var/named directory for each of the domains I attempted to create during this process. The .lock files just contain what appears to be a process number. I'm not sure deleting them would have any effect, since even brand new domain creation acts the same.
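For anyone wanting to check which lock files are stale, here is a minimal sketch. It assumes each .lock file holds nothing but a PID in plain text, as described above; the function names are illustrative and are not part of Webmin or Virtualmin.

```python
import os
from pathlib import Path


def read_lock_pid(lock_path):
    """Return the PID stored in a lock file, or None if it is unreadable or not a positive integer."""
    try:
        pid = int(Path(lock_path).read_text().strip())
    except (OSError, ValueError):
        return None
    return pid if pid > 0 else None


def pid_is_alive(pid):
    """Check whether a process with this PID exists. Signal 0 tests existence without affecting the process."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False
    except PermissionError:
        # The process exists but belongs to another user.
        return True
    return True


def find_locks(directory):
    """List (path, pid, alive) for every *.lock file directly inside the directory."""
    base = Path(directory)
    if not base.is_dir():
        return []
    results = []
    for lock in sorted(base.glob("*.lock")):
        pid = read_lock_pid(lock)
        alive = pid is not None and pid_is_alive(pid)
        results.append((str(lock), pid, alive))
    return results


if __name__ == "__main__":
    # /var/named is the directory mentioned in this report; adjust for your layout.
    for path, pid, alive in find_locks("/var/named"):
        state = "held by a running process" if alive else "stale or unreadable"
        print(f"{path}: pid={pid} ({state})")
```

A lock whose PID is still running points at the process to investigate; a lock whose PID is gone is a leftover. The script only reports and never deletes anything.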
Submitted by HarryZink on Wed, 01/26/2011 - 08:38 Comment #2
Addendum - when creating a sub-domain, the same freeze happens, albeit at 'restarting slave DNS servers'
Setting Up Virtual Server
In domain d23.fizbin.com
Creating home directory .. .. done
Adding records to DNS zone fizbin.com .. .. done
Adding to email domains list .. .. done
Adding default mail aliases .. .. done
Adding new virtual website .. .. done
Performing other Apache configuration .. .. done
Setting up scheduled Webalizer reporting .. .. done
Setting up log file rotation .. .. done
Creating MySQL database d23 .. .. done
Setting up spam filtering .. .. done
Setting up virus filtering .. .. done
Creating status monitor for website .. .. done
Setting up AWstats reporting .. .. done
Setting up password protection for AWstats .. .. done
Adding DAV directives to website configuration .. .. done
Adding DAV account for server administrator .. .. done
Overriding proxying for path /dav/ .. .. done
Adding Mailman alias and redirects to website configuration .. .. done
Re-starting DNS server .. .. done
Re-starting slave DNS servers ..
Are your DNS slaves live? (This shouldn't prevent things from working, and it'd be a bug we'd want to fix, but the fact that the problem always happens during DNS makes me think maybe one or more slaves are not responding and we're not handling that condition correctly.)
Anything relevant in /var/log/messages?
I'll try to login and have a look shortly (I'm not on a machine that has my private key at the moment, but I think I have it setup on one of our remote servers, so I can probably go through one of those).
Submitted by JamieCameron on Wed, 01/26/2011 - 13:04 Comment #4
The first issue sounds like a locking problem.
Does /etc/named.conf.lock or /etc/bind/named.conf.lock exist, and if so what process's PID is in it?
Submitted by HarryZink on Wed, 01/26/2011 - 18:51 Comment #5
/var/named/chroot/etc/named.conf.lock exists, and contains the same number as the domainName.lock files contained.
Just /etc/named.conf.lock doesn't exist.
/etc/bind/named.conf.lock does not exist either, nor does /var/named/chroot/etc/named.conf.lock
Submitted by HarryZink on Wed, 01/26/2011 - 18:57 Comment #6
process ID 4087 is this, by the way:
4087 ? S 0:00 /usr/libexec/webmin/bind8/mass_delete.cgi
I killed it.
I deleted the
I restarted named
Submitted by JamieCameron on Wed, 01/26/2011 - 19:00 Comment #7
Did killing that process fix the domain creation problem?
Submitted by HarryZink on Wed, 01/26/2011 - 19:03 Comment #8
It does appear that there's a problem with my DNS slaves, as they don't contain any updates.
Crap! Trying to track down what's going on with those.
Though, as you said, that shouldn't lock up the virtualmin process.
Submitted by HarryZink on Wed, 01/26/2011 - 19:30 Comment #9
update: when trying to delete the partially created test domains, now it gets stuck during the AWstats configuration file and Cron job process.
HELP! My system has just become unusable. Furthermore, I removed the DNS clusters / slaves, which made no difference, apparently.
update: It does seem to progress beyond it, but here's the error - the problem seems to be a general failure of process locking:
In domain testdomain.com
Deleting mail aliases .. .. done
Deleting AWstats configuration file and Cron job .. .. AWstats reporting failed! : virtualmin-awstats::feature_delete failed : Failed to lock file /etc/httpd/conf/httpd.conf after 5 minutes at /usr/libexec/webmin/web-lib-funcs.pl line 1340.
Removing DAV directives from website configuration ..
First up: Don't panic. You'll do something stupid if you panic. ;-)
This is beginning to look like an actual system problem, and not mere configuration issues or bugs in Virtualmin.
Check for disk and memory problems first. Look in dmesg for disk I/O errors or OOM killer messages or segmentation faults or similar.
If nothing shows up there, check the SMART status of your disks. If SMART is not enabled, enable it and trigger a diagnostic. There is a Webmin module for SMART.
Also, check top to be sure you aren't simply running out of available memory (though the errors would probably be different if that were the case) or ending up swap thrashing to the point that things are timing out.
If it's not disk or memory, then we can proceed to checking other stuff.
Whenever I see a bunch of random seeming errors from software I know isn't generally random, I pretty much always assume something is wrong with the system itself. That may not be the case here, but we should check it before going further and possibly breaking other stuff.
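As a rough aid for the dmesg check suggested above, here is a small sketch that filters kernel log text for the kinds of messages mentioned (disk I/O errors, OOM killer activity, segmentation faults). The pattern list is an assumption about common message wording, not an exhaustive or authoritative set, and the function name is made up for illustration.

```python
import re

# Assumed fragments of typical kernel messages; extend as needed for your kernel version.
SUSPECT = re.compile(r"i/o error|oom-killer|out of memory|segfault", re.IGNORECASE)


def suspicious_lines(log_text):
    """Return the lines of log_text that match any of the assumed trouble patterns."""
    return [line for line in log_text.splitlines() if SUSPECT.search(line)]
```

One way to use it is to capture the output of dmesg (for example with `subprocess.run(["dmesg"], capture_output=True, text=True).stdout`) and pass that string to `suspicious_lines`. An empty result does not prove the hardware is healthy; it only means none of these patterns appeared.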
Submitted by HarryZink on Thu, 01/27/2011 - 01:16 Comment #11
The system's fine - I checked all the hardware.
Things appear to be working again, after I manually deleted all the leftovers from the various broken prior efforts, and once I removed the DNS slaves (and then recreated one DNS slave - it appears my ns2 slave simply won't sync or allow connections - a problem for another day).
In summary - is it possible that this was caused by the virtualmin update, by any chance?
Probably not caused by the update; I don't recall any changes in this version that would have touched any of the virtual server creation code (and we haven't had any similar bugs reported). It seems more likely it was triggered by the slave being offline, which would be a bug, but it could be a bug that's been around for ages.
I'll hand this one over to Jamie to see if he is able to reproduce it. It'd definitely be something we'd want to get fixed, if it's reproducible.
Submitted by Locutus on Thu, 01/27/2011 - 04:26 Comment #13
Maybe I can add a little thing here that might provide further insight.
On my experimental installation (hostname lyra), which I recently upgraded to 3.83, I had a somewhat similar effect. Though not with virtual domain creation, but also DNS cluster slave related.
The action I performed was "moving to another cluster slave". Before that, I had used the same slave BIND (on host gemini) for my production and my experimental Virtualmin, and now I installed a little extra Linux VM (host ara, with just Webmin on it) as an experimental DNS slave.
Then on lyra I removed gemini from the "Webmin Servers Index" list. I added ara to that list, and tested the connection by "remote-controlling" ara's Webmin thru lyra. Worked okay.
Then, when trying to add ara as a DNS cluster slave in Webmin's BIND module, it simply kept timing out at the "adding slave" stage, as if the slave was not responding. Though said slave's Webmin did remote-control just fine, as well as respond to DNS queries.
Unfortunately there was no useful information in Webmin's logs, and after removing and re-adding ara in the server index list a few times, it finally worked.
Sorry that I can't give more detailed or "stable" information on this, since I didn't test it further (I mostly credited the effect to lyra being an experimental VM which has seen quite some stuff in its short life already. ;) )
Yet when seeing this report here now, I figured that there might be some issue with the DNS cluster slave thing since the 3.83 update. I can sure test this some more if desired.
Submitted by JamieCameron on Thu, 01/27/2011 - 13:47 Comment #14
So it looks like the domain creation hung at the step of notifying the remote slave DNS servers, leaving locks in place and leftover config file entries.
Does your slave system have ports 10000 - 10010 open on its firewall?
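To answer the firewall question from the master's side, a minimal sketch like the following can probe the port range. The function name is hypothetical, and the result labels are a simplification: a "timeout" usually suggests a firewall silently dropping packets, while "refused" means the host answered but nothing was accepting connections on that port.

```python
import socket


def probe_ports(host, ports, timeout=3.0):
    """Try a TCP connection to each port and report 'open', 'refused', 'timeout' or 'error'."""
    results = {}
    for port in ports:
        try:
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = "open"
        except ConnectionRefusedError:
            results[port] = "refused"
        except socket.timeout:
            results[port] = "timeout"
        except OSError:
            # DNS failure, unreachable network, and similar conditions.
            results[port] = "error"
    return results
```

For example, `probe_ports("ns2.example.com", range(10000, 10011))` covers ports 10000 through 10010; the hostname here is a placeholder for your own slave. Run it from the master, since that is the direction the connections take.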