Suddenly cannot create VPS systems.

I suddenly seem to not be able to create KVM systems. I'm not sure if this is related to the recent Cloudmin 8 upgrade or what -

Every time I attempt to create a KVM system on one of my servers, cloudmin just hangs after clicking the submit button --- no error message, no "waiting for a lock" --- it just seems to sit forever. I've tried several of my servers, and several different server images.

However I CAN restore an image from a backup (well, I think I can, it's taking forever) but I cannot seem to create a new server.

Any suggestions for debugging this?

Status: 
Closed (fixed)

Comments

In case it makes any possible difference --- the CLoudmin server is running in a CentOS 5 VPS -- and the servers I am trying to create a new VPS on are CentOS 6.

A slight change to this -- turns out that it's not really that the VPS systems are not getting created -- it's just that it takes as much as a 1/2 hour before the creation actually starts, then things suddenly wake up and it proceeds forward. I'm bewildered. But this is not as serious as the other issue I filed this AM.

So does it display ANY message before the delay happens?

Also, do you have a really large IP allocation range configured? One possible cause is that Cloudmin is spending a lot of time checking for free addresses in that range to assign to the new VPS.

No message at all -- it just hangs for a long time, then finally starts doing something. it seems to be coming and going.

The IP address range is not what I'd consider huge -- there's a class C of private addresses, and 3 small IP ranges of public addresses - something like 300 addresses altogether.,

One thing you could try to see what Cloudmin is stuck on is to find the PID of the create_system.cgi script when creation is hung, and then run the strace -p command on it.

THanks, I'll give that a whirl. Probably not today though ---

okay so --- there's a create.cgi script that keeps cycling, apparently on the time zone ---

time(NULL) = 1422653120
wait4(-1, 0x7fff549a4d74, WNOHANG, NULL) = 0
time(NULL) = 1422653120
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0
stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=3543, ...}) = 0

/etc/localtime appears to be correct - it's a sym link to /usr/share/zoneinfo/America/Chicago

No, sorry, that's miniserv that seems to be hung there -- it's spawning a series of create.cgi's

The create.cgi's are sort of hung at a select -- something like select(8, [7], NULL, NULL, {143, 516000}

Then it seems to spawn ANOTHER batch of create.cgi -- and they seem to be doing this:

select(8, [7], NULL, NULL, {170, 852000}) = 1 (in [7], left {168, 0})
time(NULL) = 1422654099
select(8, [7], NULL, NULL, {168, 0}) = 1 (in [7], left {168, 0})
time(NULL) = 1422654099
select(8, [7], NULL, NULL, {168, 0}) = 1 (in [7], left {168, 0})
read(7, "From 192.168.3.189 icmp_seq=190 "..., 4096) = 183
time(NULL) = 1422654099
select(8, [7], NULL, NULL, {168, 0}) = 1 (in [7], left {164, 0})
time(NULL) = 1422654103
select(8, [7], NULL, NULL, {164, 0}) = 1 (in [7], left {164, 0})
time(NULL) = 1422654103
select(8, [7], NULL, NULL, {164, 0}) = 1 (in [7], left {164, 0})
read(7, "From 192.168.3.189 icmp_seq=193 "..., 4096) = 183
time(NULL) = 1422654103
select(8, [7], NULL, NULL, {164, 0}) = 1 (in [7], left {160, 996000})
time(NULL) = 1422654106
select(8, [7], NULL, NULL, {161, 0}) = 1 (in [7], left {161, 0})
time(NULL) = 1422654106
select(8, [7], NULL, NULL, {161, 0}) = 1 (in [7], left {161, 0})
read(7, "From 192.168.3.189 icmp_seq=196 "..., 4096) = 183
time(NULL) = 1422654106
select(8, [7], NULL, NULL, {161, 0}) = 1 (in [7], left {157, 996000})
time(NULL) = 1422654109

I don't right off hand know why it's trying to talk to this IP -- it's the private IP of my development server -- but that should not really be having anything to do with creating this VPS ---

Finally - after that all goes on for quite a while it does this:

read(7, "From 192.168.3.189 icmp_seq=355 "..., 4096) = 183
time(NULL) = 1422654264
select(8, [7], NULL, NULL, {3, 0}) = 0 (Timeout)
kill(6140, SIGTERM) = 0
time(NULL) = 1422654267
lseek(7, 17593, SEEK_SET) = -1 ESPIPE (Illegal seek)
close(7) = 0
rt_sigaction(SIGHUP, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGINT, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGQUIT, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
wait4(6140, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGTERM}], 0, NULL) = 6140
--- SIGCHLD (Child exited) @ 0 (0) ---
rt_sigaction(SIGHUP, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, NULL, 8) = 0
rt_sigaction(SIGINT, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, NULL, 8) = 0
rt_sigaction(SIGQUIT, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, NULL, 8) = 0
open("/scratch/utiltmp/270693_5722_2_create.cgi/192.168.3.40", O_WRONLY|O_CREAT|O_TRUNC, 0666) = 7
ioctl(7, SNDCTL_TMR_TIMEBASE or TCGETS, 0x7fff549a4930) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(7, 0, SEEK_CUR) = 0
fstat(7, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
fcntl(7, F_SETFD, FD_CLOEXEC) = 0
write(7, "ARRAY,HASH%2CVAL%252Cgateway%2CV"..., 182) = 182
close(7) = 0
rt_sigaction(SIG_0, NULL, {SIG_DFL, [], 0}, 8) = -1 EINVAL (Invalid argument)
rt_sigaction(SIGHUP, NULL, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGINT, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGQUIT, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGILL, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTRAP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGBUS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGFPE, NULL, {0x1, [FPE], SA_RESTORER|SA_RESTART, 0x2aef21ade030}, 8) = 0
rt_sigaction(SIGKILL, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR1, NULL, {0x1, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGSEGV, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGUSR2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPIPE, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGALRM, NULL, {0x2aef208dbea0, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGALRM, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, {0x2aef208dbea0, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGTERM, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGSTKFLT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGCONT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSTOP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTSTP, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTTIN, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGTTOU, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGURG, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXCPU, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGXFSZ, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGVTALRM, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPROF, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGWINCH, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGPWR, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_2, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_3, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_4, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_5, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_6, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_7, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_8, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_9, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_10, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_11, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_12, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_13, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_14, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_15, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_16, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_17, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_18, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_19, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_20, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_21, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_22, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_23, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_24, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_25, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_26, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_27, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_28, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_29, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_30, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_31, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGRT_32, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGABRT, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGCHLD, NULL, {SIG_DFL, [], SA_RESTORER, 0x2aef2189fca0}, 8) = 0
rt_sigaction(SIGIO, NULL, {SIG_DFL, [], 0}, 8) = 0
rt_sigaction(SIGSYS, NULL, {SIG_DFL, [], 0}, 8) = 0
close(12) = 0
close(13) = 0
close(6) = 0
close(5) = 0
close(10) = 0
close(4) = 0

And then creates the VPS ---

Ok, it looks like it is hung up on trying to ping a bunch of IPs to check if they are free.

If you enter an IP to use for the new VM explicitly on the creation form, does the process go any faster?

even stranger --- if I put in the IP addresses and netmasks manually, it won't create the system at all saying

Failed to create system : No IP addresses entered

But of course I DID put in the IP addresses and click the box for entering them manually.

Could you post a screenshot of what you're seeing on the system creation form?

Real soon now ---

Another oddity here is that doing things like shutting down and/or rebooting systems also seems to have a really long delay involved.

I have no idea if this might shed any light or not but --- another odd thing that has happened is that there is no longer a virtualmin link on the cloudmin server page - just webmin and cloudmin.

I've been trying to attach a screenshot --- but the site keeps complaining about a validation error - the images really are not very big (about 150k) but I can't upload them.

Ok, so presumably in the "IP addresses for KVM instance" section you chose "Manually entered addresses below" and entered at least one IP?

If I leave it blank and then let it select from the available IP pools, then I see the "waiting for a long time" behavior - but then it goes and creates the VPS properly.

If I put in a specific IP (or pair of IP's) then the creation fails immediately, saying that I didn't put an IP in.

If it would be any help in the process I'd be glad to let you log in to the server.

Yes, remote access to your system would be really useful, as I can't re-produce either problem (the delay or manual IP entering) on our test systems.

Well, it's always something.

I was trying to get the cloudmin server set up so you could log in, so I disabled two-factor authentication on the webmin configuration page. Now I cannot log in to the server through webmin at all. I tried deleting the two two-factor auth lines in miniserv.conf as well but no joy.

So -- how do I get back to being able to log in again?

Okay I am back in on the server. I see the error of my ways, I probably should have disabled two-factor for root before I disabled it for webmin.

Ok, I see the issue now - you had Cloudmin configured to wait 60 seconds for status pings (on the Cloudmin Configuration page), which due to a bug also applies to pings used to find free IPs. I will fix this in the next release, but until then you can work around the issue by lowering the wait time to something like 5 seconds.

Ah -- that's MUCH better -- thank you. I thought I was going nuts.