Virtualmin freezes on "Restoring Apache virtual host configuration"

Hi,

I am attempting to restore a Virtualmin backup created on a CentOS 6.5 system onto a new system running a fresh copy of CentOS 7 and Virtualmin 4.13. I have configured Virtualmin properly, imported my server templates, etc., and am now at the restoration phase. However, the restore script, which I am running through the command line (restore-domain), seems to be stuck at "Restoring Apache virtual host configuration." It has been stuck here for over 10 minutes with the system's CPU usage quite high.

The server to which I am restoring is a Linode virtual server with an SSD and 1 GB of RAM. It normally runs very fast, so I don't know why the restore process would get stuck.

The command I am running is: virtualmin restore-domain --source /vm/ --all-domains --all-features --skip-warnings --continue-on-error

What can I do to resolve this problem?

Thank you, -Logan

Status: 
Closed (fixed)

Comments

Howdy -- well, restoring a domain can take some time, depending on the size and contents of the particular Virtual Server.

It sounds like it's doing something at the moment -- what is the output of this command? That will help us see what it's currently up to:

ps auxwf

Also, we noticed that your Virtualmin Pro license recently expired. We hope you'll consider renewing that! If you had any questions, comments, or concerns regarding Virtualmin Pro, please feel free to let us know.

I have been planning to renew my Virtualmin Pro license for a while now; I just haven't gotten around to it yet because I have been making some infrastructure changes and switching Web hosting providers, hence the need for the current domain restore. I only have 23 virtual servers, and their total size is under 500 MB, so since I am running on an SSD I didn't expect the restore to take very long.

Here is the output from the given command:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
root         2  0.0  0.0      0     0 ?        S    Jan11   0:00 [kthreadd]
root         3  0.0  0.0      0     0 ?        S    Jan11   0:05  \_ [ksoftirqd/0]
root         5  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [kworker/0:0H]
root         7  0.0  0.0      0     0 ?        S    Jan11   0:46  \_ [rcu_sched]
root         8  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [rcu_bh]
root         9  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [migration/0]
root        10  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [khelper]
root        11  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [kdevtmpfs]
root        12  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [netns]
root        13  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [perf]
root        16  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [xenwatch]
root        18  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [xenbus]
root        23  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [writeback]
root        25  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [crypto]
root        27  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [bioset]
root        29  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [kblockd]
root       183  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [md]
root       184  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [rpciod]
root       393  0.0  0.0      0     0 ?        S    Jan11   0:08  \_ [kswapd0]
root       395  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [fsnotify_mark]
root       396  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [ecryptfs-kthrea]
root       398  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [nfsiod]
root       399  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [cifsiod]
root       401  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [jfsIO]
root       403  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [jfsCommit]
root       404  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [jfsSync]
root       406  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [xfsalloc]
root       408  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [xfs_mru_cache]
root       410  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [xfslogd]
root       411  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [glock_workqueue]
root       413  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [delete_workqueu]
root       415  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [gfs_recovery]
root       567  0.0  0.0      0     0 ?        S    Jan11   0:00  \_ [khvcd]
root      1270  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [bioset]
root      1272  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [drbd-reissue]
root      1274  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [rbd]
root      1287  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [kpsmoused]
root      1289  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [raid5wq]
root      1290  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [dm_bufio_cache]
root      1292  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [ipv6_addrconf]
root      1294  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [ceph-msgr]
root      1295  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [bioset]
root      1367  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [deferwq]
root      1369  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [reiserfs/xvda]
root      1371  0.0  0.0      0     0 ?        S<   Jan11   0:04  \_ [kworker/0:1H]
root      1372  0.0  0.0      0     0 ?        S    Jan11   0:03  \_ [jbd2/xvda-8]
root      1373  0.0  0.0      0     0 ?        S<   Jan11   0:00  \_ [ext4-rsv-conver]
root      1398  0.0  0.0      0     0 ?        S    Jan11   0:02  \_ [kauditd]
root      3555  0.0  0.0      0     0 ?        S    Jan13   0:01  \_ [kworker/u2:0]
root       573  0.0  0.0      0     0 ?        S    10:29   0:00  \_ [kworker/0:3]
root      4827  0.0  0.0      0     0 ?        S    11:03   0:00  \_ [kworker/u2:1]
root      8195  0.0  0.0      0     0 ?        S    11:15   0:00  \_ [kworker/0:1]
root      8518  0.0  0.0      0     0 ?        S    11:25   0:00  \_ [kworker/0:0]
root         1  0.0  0.4 199256  5020 ?        Ss   Jan11   0:24 /usr/lib/systemd/systemd --system --deserialize 23
root      1396  0.0  0.4  54316  4436 ?        Ss   Jan11   1:57 /usr/lib/systemd/systemd-journald
root      1495  0.0  0.0 116680    92 ?        S<sl Jan11   0:19 /sbin/auditd -n
avahi     1571  0.0  0.0  28084   116 ?        Ss   Jan11   0:01 avahi-daemon: running [linux.local]
avahi     1585  0.0  0.0  27952    20 ?        S    Jan11   0:00  \_ avahi-daemon: chroot helper
root      1584  0.0  0.1 459256  1584 ?        Ssl  Jan11   0:19 /usr/sbin/NetworkManager --no-daemon
root      2470  0.0  0.0 104208   388 ?        S    Jan11   0:00  \_ /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-helper -pf /var/run/
chrony    1586  0.0  0.0  24732   128 ?        S    Jan11   0:00 /usr/sbin/chronyd -u chrony
root      1595  0.0  0.0 549984   496 ?        Ssl  Jan11   0:26 /usr/bin/python -Es /usr/sbin/tuned -l -P
dbus      1602  0.0  0.0  26832   532 ?        Ss   Jan11   0:14 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile -
root      1641  0.0  0.0 110012     8 hvc0     Ss+  Jan11   0:00 /sbin/agetty --keep-baud hvc0 115200 38400 9600
root      1647  0.0  0.0 110012     8 tty1     Ss+  Jan11   0:00 /sbin/agetty --noclear tty1
root      1663  0.0  0.0   6492     0 ?        Ss   Jan11   0:01 /sbin/iprinit --daemon
root      1676  0.0  0.0   6492     0 ?        Ss   Jan11   0:01 /sbin/iprupdate --daemon
root      1721  0.0  0.0  39132     0 ?        Ss   Jan11   0:00 /sbin/iprdump --daemon
polkitd   2427  0.0  0.0 514004   408 ?        Ssl  Jan11   0:03 /usr/lib/polkit-1/polkitd --no-debug
root      2659  0.0  0.0  82796   404 ?        Ss   Jan11   0:20 /usr/sbin/sshd -D
root     30205  0.0  0.3 137620  3440 ?        Ss   Jan13   0:00  \_ sshd: lmerrill [priv]
lmerrill 30215  0.0  0.2 137620  2448 ?        S    Jan13   0:00  |   \_ sshd: lmerrill@pts/1
lmerrill 30216  0.0  0.2 117456  2692 pts/1    Ss   Jan13   0:00  |       \_ -sh
root     30244  0.0  0.2 184856  2780 pts/1    S    Jan13   0:00  |           \_ su -
root     30245  0.0  0.2 115360  2804 pts/1    S+   Jan13   0:00  |               \_ -bash
root      2884  0.0  0.3 137620  3788 ?        Ss   10:55   0:00  \_ sshd: lmerrill [priv]
lmerrill  2968  0.0  0.2 137620  2472 ?        S    10:56   0:00  |   \_ sshd: lmerrill@pts/0
lmerrill  2969  0.0  0.3 117456  3132 pts/0    Ss   10:56   0:00  |       \_ -sh
root      3007  0.0  0.3 184860  3224 pts/0    S    10:57   0:00  |           \_ su -
root      3008  0.0  0.3 115360  3364 pts/0    S    10:57   0:00  |               \_ -bash
root      4710 80.7 12.1 315332 123496 pts/0   R+   11:03  17:23  |                   \_ /usr/libexec/webmin/virtual-server/restore-
root      8486  0.4  0.7 137620  7292 ?        Ss   11:24   0:00  \_ sshd: lmerrill [priv]
lmerrill  8488  0.0  0.3 137620  4004 ?        S    11:24   0:00      \_ sshd: lmerrill@pts/2
lmerrill  8489  0.0  0.3 117456  3500 pts/2    Ss   11:24   0:00          \_ -sh
root      8542  0.1  0.4 184864  4104 pts/2    S    11:25   0:00              \_ su -
root      8543  0.0  0.3 115360  3188 pts/2    S    11:25   0:00                  \_ -bash
root      8563  0.0  0.2 125620  2712 pts/2    R+   11:25   0:00                      \_ ps auxwf
root     13656  0.0  0.1 126336  1992 ?        Ss   Jan11   0:03 /usr/sbin/crond -n
root     29467  0.0  0.2 170956  2440 ?        S    10:00   0:00  \_ /usr/sbin/CROND -n
root     29469  0.0  0.2 113124  2136 ?        Ss   10:00   0:00      \_ /bin/bash /usr/share/clamav/freshclam-sleep
root     29471  0.0  0.0 107900   220 ?        S    10:00   0:00          \_ sleep 8753
root     16510  0.0  0.0  15628   148 ?        Ss   Jan11   0:00 /usr/sbin/dovecot
dovecot  16511  0.0  0.0   9320   268 ?        S    Jan11   0:00  \_ dovecot/anvil
root     16512  0.0  0.0   9448   560 ?        S    Jan11   0:00  \_ dovecot/log
mailman  17196  0.0  0.0 221208    72 ?        Ss   Jan11   0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s start
mailman  17197  0.0  0.0 221104   488 ?        S    Jan11   0:27  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=ArchRunne
mailman  17198  0.0  0.0 221112   528 ?        S    Jan11   0:24  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=BounceRun
mailman  17199  0.0  0.0 221112   480 ?        S    Jan11   0:24  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=CommandRu
mailman  17200  0.0  0.0 221072   528 ?        S    Jan11   0:32  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=IncomingR
mailman  17201  0.0  0.0 221128   472 ?        S    Jan11   0:28  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=NewsRunne
mailman  17202  0.0  0.0 221400   496 ?        S    Jan11   0:28  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=OutgoingR
mailman  17203  0.0  0.0 221072   484 ?        S    Jan11   0:31  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=VirginRun
mailman  17204  0.0  0.0 221072   484 ?        S    Jan11   0:00  \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=RetryRunn
root     20745  0.0  0.7 316120  8096 ?        Ssl  Jan11   0:32 /usr/sbin/rsyslogd -n
root     19171  0.0  8.0 363668 81604 ?        Ssl  Jan12   2:21 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
root     13687  0.0  0.3 298716  3580 ?        Ssl  Jan13   0:03 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
nslcd    14362  0.0  0.0 438552   416 ?        Ssl  Jan13   0:07 /usr/sbin/nslcd
root     17253  0.0  0.0 113256   184 ?        S    Jan13   0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/v
mysql    17399  0.1  2.9 937304 30184 ?        Sl   Jan13   1:15  \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plug
root     18122  0.0  0.0  40600   200 ?        Ss   Jan13   0:00 /usr/lib/systemd/systemd-udevd
root     18345  0.0  0.2  36792  2424 ?        Ss   Jan13   0:01 /usr/lib/systemd/systemd-logind
root     19650  0.1  1.3 305272 13684 ?        Ssl  Jan13   0:53 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
root     19685  0.0  1.2 307232 12564 ?        Ssl  Jan13   0:08 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
nobody   20214  0.0  0.0 189752   948 ?        Ss   Jan13   0:01 proftpd: (accepting connections)
root     22135  0.0  0.2  91964  2300 ?        Ss   Jan13   0:00 /usr/libexec/postfix/master -w
postfix  22137  0.0  0.1  94236  1804 ?        S    Jan13   0:00  \_ qmgr -l -t fifo -u
postfix  10991  0.0  0.2  94180  2888 ?        S    02:58   0:00  \_ tlsmgr -l -t unix -u
postfix  30247  0.0  0.3  94168  3164 ?        S    10:20   0:00  \_ pickup -l -t fifo -u
root     22532  0.0  3.7  92852 38364 ?        Ss   Jan13   0:00 /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl
postgrey 23723  0.0  1.4 171860 14252 ?        Ss   Jan13   0:00 /usr/sbin/postgrey --unix=/var/spool/postfix/postgrey/socket --pidf
root     30755  0.0  1.9  94312 19424 ?        Ss   00:06   0:07 /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.
named     7957  0.0  2.1 166304 21592 ?        Ssl  11:07   0:00 /usr/sbin/named -u named

It is indeed still stuck on "Restoring Apache virtual host configuration." What's worse is that this is the second virtual server being restored; the first went through this step without any problems, and that server was larger in total file size than the one currently being restored. Thank you, -Logan

Hmm, I don't see anything in that process output indicating what the restore is doing, which is what I had hoped it would show us.

We may need to take a peek at the strace output -- what does this command show:

strace -p 123496

(if it says the command isn't available, you may need to first run "yum install strace")
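If the output scrolls by too quickly to read, it can also help to write it to a file and follow any child processes -- something along these lines, where the log path is just an example and <pid> is the restore process ID from your ps output:

strace -f -o /tmp/restore-strace.log -p <pid>

The tail of that file should then show the last thing the process was doing.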

Hi,

I installed strace via yum and then ran the command; here is its output:

strace: attach: ptrace(PTRACE_ATTACH, ...): No such process

Thanks, -Logan

Shoot, I grabbed the wrong process ID for that process... it should actually be "4710".

What is the output of this command:

strace -p 4710

That is the process ID for your restore-domain command.
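If you want to double-check which PID to attach to, something like this should print it directly (assuming the full command line contains "restore-domain", as it does in your ps output):

pgrep -f restore-domain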

Hi,

I ran the command and it said "Process 4710 attached", but nothing else after that. It is still running; it has been about 5 minutes now, but there is still no output.

For reference, Webmin reports that process 4710, restore-domain.pl, is using 92.4% of CPU.

Thanks, -Logan

Update: strace is still running but no new output has appeared. Similarly, restore-domain.pl is still running but has also not moved, and CPU usage is now at 93.0%.

Thanks, -Logan

If strace isn't showing any system call activity (activity would indicate, for example, that it is waiting on a lock), then the cause might be an infinite loop in the Virtualmin code that is triggered by something in your Apache config.
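For reference, a couple of quick checks along these lines (using 4710, the PID from your ps output) can help confirm that it is spinning in user space rather than blocked in the kernel:

grep State /proc/4710/status
cat /proc/4710/wchan

A state of "R (running)" together with high CPU and no strace output points to a busy loop, and wchan prints 0 when the process isn't sleeping inside a kernel call.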

If possible, I would like to see the backup file you are trying to restore for this domain.

Hi,

I stopped the restore process a few hours ago and restarted it, this time running strace on it from the start. Everything seemed to move along just fine until the "Restoring Apache virtual host configuration" part.

Here are the last few lines from the strace output:

stat("/home/audioshout/fcgi-bin/php5.fcgi", {st_mode=S_IFREG|0755, st_size=202, ...}) = 0 geteuid() = 0 pipe([5, 10]) = 0 ioctl(5, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffb1f32510) = -1 ENOTTY (Inappropriate ioctl for device) lseek(5, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) ioctl(10, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffb1f32510) = -1 ENOTTY (Inappropriate ioctl for device) lseek(10, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl(5, F_SETFD, FD_CLOEXEC) = 0 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe46c29aa10) = 13998 close(10) = 0 read(5, "OK\n", 8192) = 3 read(5, "#!/bin/bash\n", 8192) = 12 read(5, "PHPRC=$PWD/../etc/php5.6\n", 8192) = 25 read(5, "export PHPRC\n", 8192) = 13 read(5, "umask 022\n", 8192) = 10 read(5, "PHP_FCGI_MAX_REQUESTS=500\n", 8192) = 26 read(5, "export PHP_FCGI_MAX_REQUESTS\n", 8192) = 29 read(5, "SCRIPT_FILENAME=$PATH_TRANSLATED"..., 8192) = 33 read(5, "export SCRIPT_FILENAME\n", 8192) = 23 read(5, "exec /opt/rh/php56/bin/php-cgi\n", 8192) = 31 read(5, "", 8192) = 0 close(5) = 0 kill(13998, SIGKILL) = 0 wait4(13998, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 13998 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=13998, si_status=SIGKILL, si_utime=2, si_stime=3} ---

As far as sending the backup file, I'm not exactly sure how I would do that because each virtual server is in its own TAR file, but I don't currently have a public Web server available to put the file on. I can give you SSH access to the server though if you'd like to take a look there at the backup files.

Thanks, -Logan

Hi, just wanted to see if any progress has been made on this so far. Because this very same restore worked on my other machine running CentOS 6 and Virtualmin 4.11, I was considering downgrading the current server from Virtualmin 4.13 to 4.11 if we can't get this resolved, as this has already caused more downtime than I would have liked. I look forward to seeing whether you're able to find and fix the issue. Thanks so much!

I'm taking a look at this now..

So I don't have to do a full restore, which domain does the process hang on?

Honestly, it varies each time I run the command, for some weird reason. Usually it's audioshout.com, though; it tends to be the second or third domain the restore tries if you run the full command I gave in my e-mail.
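If it helps with reproducing it, restoring just that one domain should hit the same spot -- something along these lines, assuming --domain is accepted alongside the other flags I've been using:

virtualmin restore-domain --source /vm/ --domain audioshout.com --all-features --skip-warnings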

I am also logged in as root now, in case you need to send a quick message via wall or something like that.

Thanks!

Ok, I think I've fixed it ... give the restore another try now.

Update: The restore has now gone farther than before, but seems to be stuck on "Restoring SpamAssassin and Procmail configurations" for the virtual server cloud.onefridge.net. Not sure if it's hung or just taking an abnormally long time, as this step usually completed in only a few seconds for the other servers. Will update again if things change.

Update 2: Never mind, looks to be caused by Virtualmin being unable to connect to my remote LDAP server, which I am checking now.
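(In case it's useful to anyone else: a basic reachability query against the LDAP host, along these lines with the hostname swapped in for your own, is enough to confirm whether the server is answering:

ldapsearch -x -H ldap://your-ldap-host -b "" -s base namingContexts

It should return the server's naming contexts if the connection works.)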

Update 3: Just checked the remote LDAP server and restarted the Virtualmin restore, let's see if things go OK this time. Also rebooted the Virtualmin machine. Will post if anything else goes wrong. Thanks.

Everything works now!! I'm not sure what you did, Jamie, but you fixed the issue. The restore process completed successfully and Virtualmin is humming along quite nicely now. Thank you so much for resolving the problem. What was the specific issue, anyway?

Thanks!

Great! Turns out that there was a bug in Virtualmin that was only triggered in a certain rare case, which caused it to go into an infinite loop on restores!

This fix will also be included in the next release.

Automatically closed -- issue fixed for 2 weeks with no activity.