Hi,
I am attempting to restore a Virtualmin backup created on a CentOS 6.5 system onto a new system running a fresh copy of CentOS 7 and Virtualmin 4.13. I have configured Virtualmin properly, imported my server templates, and so on, and am now at the restoration phase. However, the restore script, which I am running through the command line (restore-domain), seems to be stuck at "Restoring Apache virtual host configuration." It has been stuck there for over 10 minutes, with the system's CPU usage quite high.
The server I am restoring to is a Linode virtual server with an SSD and 1 GB of RAM. It runs very fast, so I don't know why the restore process would get stuck.
The command I am running is: virtualmin restore-domain --source /vm/ --all-domains --all-features --skip-warnings --continue-on-error
What can I do to resolve this problem?
Thank you, -Logan
Comments
Submitted by andreychek on Wed, 01/14/2015 - 10:23 Comment #1
Howdy -- well, restoring a domain can take some time, depending on the size and contents of the particular Virtual Server.
It sounds like it's doing something at the moment. What is the output of this command? That will help us see what it's currently up to:
ps auxwf
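If the full listing is hard to wade through, you can also narrow it to just the restore process with something along these lines (the brackets simply keep grep from matching itself):

ps auxwf | grep "[r]estore-domain"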
Also, we noticed that your Virtualmin Pro license recently expired. We hope you'll consider renewing it! If you have any questions, comments, or concerns regarding Virtualmin Pro, please feel free to let us know.
Submitted by airshock on Wed, 01/14/2015 - 10:30 Comment #2
I have been planning to renew my Virtualmin Pro license for a while now; I just haven't gotten around to it yet because I have been trying to make some infrastructure changes and switch Web hosting providers, hence the need for the current domain restore. I only have 23 virtual servers, and their total size is under 500 MB, so since I am running on an SSD I didn't think the restore would take very long.
Here is the output from the given command:
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 2 0.0 0.0 0 0 ? S Jan11 0:00 [kthreadd]
root 3 0.0 0.0 0 0 ? S Jan11 0:05 \_ [ksoftirqd/0]
root 5 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [kworker/0:0H]
root 7 0.0 0.0 0 0 ? S Jan11 0:46 \_ [rcu_sched]
root 8 0.0 0.0 0 0 ? S Jan11 0:00 \_ [rcu_bh]
root 9 0.0 0.0 0 0 ? S Jan11 0:00 \_ [migration/0]
root 10 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [khelper]
root 11 0.0 0.0 0 0 ? S Jan11 0:00 \_ [kdevtmpfs]
root 12 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [netns]
root 13 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [perf]
root 16 0.0 0.0 0 0 ? S Jan11 0:00 \_ [xenwatch]
root 18 0.0 0.0 0 0 ? S Jan11 0:00 \_ [xenbus]
root 23 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [writeback]
root 25 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [crypto]
root 27 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [bioset]
root 29 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [kblockd]
root 183 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [md]
root 184 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [rpciod]
root 393 0.0 0.0 0 0 ? S Jan11 0:08 \_ [kswapd0]
root 395 0.0 0.0 0 0 ? S Jan11 0:00 \_ [fsnotify_mark]
root 396 0.0 0.0 0 0 ? S Jan11 0:00 \_ [ecryptfs-kthrea]
root 398 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [nfsiod]
root 399 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [cifsiod]
root 401 0.0 0.0 0 0 ? S Jan11 0:00 \_ [jfsIO]
root 403 0.0 0.0 0 0 ? S Jan11 0:00 \_ [jfsCommit]
root 404 0.0 0.0 0 0 ? S Jan11 0:00 \_ [jfsSync]
root 406 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [xfsalloc]
root 408 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [xfs_mru_cache]
root 410 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [xfslogd]
root 411 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [glock_workqueue]
root 413 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [delete_workqueu]
root 415 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [gfs_recovery]
root 567 0.0 0.0 0 0 ? S Jan11 0:00 \_ [khvcd]
root 1270 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [bioset]
root 1272 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [drbd-reissue]
root 1274 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [rbd]
root 1287 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [kpsmoused]
root 1289 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [raid5wq]
root 1290 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [dm_bufio_cache]
root 1292 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [ipv6_addrconf]
root 1294 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [ceph-msgr]
root 1295 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [bioset]
root 1367 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [deferwq]
root 1369 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [reiserfs/xvda]
root 1371 0.0 0.0 0 0 ? S< Jan11 0:04 \_ [kworker/0:1H]
root 1372 0.0 0.0 0 0 ? S Jan11 0:03 \_ [jbd2/xvda-8]
root 1373 0.0 0.0 0 0 ? S< Jan11 0:00 \_ [ext4-rsv-conver]
root 1398 0.0 0.0 0 0 ? S Jan11 0:02 \_ [kauditd]
root 3555 0.0 0.0 0 0 ? S Jan13 0:01 \_ [kworker/u2:0]
root 573 0.0 0.0 0 0 ? S 10:29 0:00 \_ [kworker/0:3]
root 4827 0.0 0.0 0 0 ? S 11:03 0:00 \_ [kworker/u2:1]
root 8195 0.0 0.0 0 0 ? S 11:15 0:00 \_ [kworker/0:1]
root 8518 0.0 0.0 0 0 ? S 11:25 0:00 \_ [kworker/0:0]
root 1 0.0 0.4 199256 5020 ? Ss Jan11 0:24 /usr/lib/systemd/systemd --system --deserialize 23
root 1396 0.0 0.4 54316 4436 ? Ss Jan11 1:57 /usr/lib/systemd/systemd-journald
root 1495 0.0 0.0 116680 92 ? S<sl Jan11 0:19 /sbin/auditd -n
avahi 1571 0.0 0.0 28084 116 ? Ss Jan11 0:01 avahi-daemon: running [linux.local]
avahi 1585 0.0 0.0 27952 20 ? S Jan11 0:00 \_ avahi-daemon: chroot helper
root 1584 0.0 0.1 459256 1584 ? Ssl Jan11 0:19 /usr/sbin/NetworkManager --no-daemon
root 2470 0.0 0.0 104208 388 ? S Jan11 0:00 \_ /sbin/dhclient -d -sf /usr/libexec/nm-dhcp-helper -pf /var/run/
chrony 1586 0.0 0.0 24732 128 ? S Jan11 0:00 /usr/sbin/chronyd -u chrony
root 1595 0.0 0.0 549984 496 ? Ssl Jan11 0:26 /usr/bin/python -Es /usr/sbin/tuned -l -P
dbus 1602 0.0 0.0 26832 532 ? Ss Jan11 0:14 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile -
root 1641 0.0 0.0 110012 8 hvc0 Ss+ Jan11 0:00 /sbin/agetty --keep-baud hvc0 115200 38400 9600
root 1647 0.0 0.0 110012 8 tty1 Ss+ Jan11 0:00 /sbin/agetty --noclear tty1
root 1663 0.0 0.0 6492 0 ? Ss Jan11 0:01 /sbin/iprinit --daemon
root 1676 0.0 0.0 6492 0 ? Ss Jan11 0:01 /sbin/iprupdate --daemon
root 1721 0.0 0.0 39132 0 ? Ss Jan11 0:00 /sbin/iprdump --daemon
polkitd 2427 0.0 0.0 514004 408 ? Ssl Jan11 0:03 /usr/lib/polkit-1/polkitd --no-debug
root 2659 0.0 0.0 82796 404 ? Ss Jan11 0:20 /usr/sbin/sshd -D
root 30205 0.0 0.3 137620 3440 ? Ss Jan13 0:00 \_ sshd: lmerrill [priv]
lmerrill 30215 0.0 0.2 137620 2448 ? S Jan13 0:00 | \_ sshd: lmerrill@pts/1
lmerrill 30216 0.0 0.2 117456 2692 pts/1 Ss Jan13 0:00 | \_ -sh
root 30244 0.0 0.2 184856 2780 pts/1 S Jan13 0:00 | \_ su -
root 30245 0.0 0.2 115360 2804 pts/1 S+ Jan13 0:00 | \_ -bash
root 2884 0.0 0.3 137620 3788 ? Ss 10:55 0:00 \_ sshd: lmerrill [priv]
lmerrill 2968 0.0 0.2 137620 2472 ? S 10:56 0:00 | \_ sshd: lmerrill@pts/0
lmerrill 2969 0.0 0.3 117456 3132 pts/0 Ss 10:56 0:00 | \_ -sh
root 3007 0.0 0.3 184860 3224 pts/0 S 10:57 0:00 | \_ su -
root 3008 0.0 0.3 115360 3364 pts/0 S 10:57 0:00 | \_ -bash
root 4710 80.7 12.1 315332 123496 pts/0 R+ 11:03 17:23 | \_ /usr/libexec/webmin/virtual-server/restore-
root 8486 0.4 0.7 137620 7292 ? Ss 11:24 0:00 \_ sshd: lmerrill [priv]
lmerrill 8488 0.0 0.3 137620 4004 ? S 11:24 0:00 \_ sshd: lmerrill@pts/2
lmerrill 8489 0.0 0.3 117456 3500 pts/2 Ss 11:24 0:00 \_ -sh
root 8542 0.1 0.4 184864 4104 pts/2 S 11:25 0:00 \_ su -
root 8543 0.0 0.3 115360 3188 pts/2 S 11:25 0:00 \_ -bash
root 8563 0.0 0.2 125620 2712 pts/2 R+ 11:25 0:00 \_ ps auxwf
root 13656 0.0 0.1 126336 1992 ? Ss Jan11 0:03 /usr/sbin/crond -n
root 29467 0.0 0.2 170956 2440 ? S 10:00 0:00 \_ /usr/sbin/CROND -n
root 29469 0.0 0.2 113124 2136 ? Ss 10:00 0:00 \_ /bin/bash /usr/share/clamav/freshclam-sleep
root 29471 0.0 0.0 107900 220 ? S 10:00 0:00 \_ sleep 8753
root 16510 0.0 0.0 15628 148 ? Ss Jan11 0:00 /usr/sbin/dovecot
dovecot 16511 0.0 0.0 9320 268 ? S Jan11 0:00 \_ dovecot/anvil
root 16512 0.0 0.0 9448 560 ? S Jan11 0:00 \_ dovecot/log
mailman 17196 0.0 0.0 221208 72 ? Ss Jan11 0:00 /usr/bin/python /usr/lib/mailman/bin/mailmanctl -s start
mailman 17197 0.0 0.0 221104 488 ? S Jan11 0:27 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=ArchRunne
mailman 17198 0.0 0.0 221112 528 ? S Jan11 0:24 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=BounceRun
mailman 17199 0.0 0.0 221112 480 ? S Jan11 0:24 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=CommandRu
mailman 17200 0.0 0.0 221072 528 ? S Jan11 0:32 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=IncomingR
mailman 17201 0.0 0.0 221128 472 ? S Jan11 0:28 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=NewsRunne
mailman 17202 0.0 0.0 221400 496 ? S Jan11 0:28 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=OutgoingR
mailman 17203 0.0 0.0 221072 484 ? S Jan11 0:31 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=VirginRun
mailman 17204 0.0 0.0 221072 484 ? S Jan11 0:00 \_ /usr/bin/python /usr/lib/mailman/bin/qrunner --runner=RetryRunn
root 20745 0.0 0.7 316120 8096 ? Ssl Jan11 0:32 /usr/sbin/rsyslogd -n
root 19171 0.0 8.0 363668 81604 ? Ssl Jan12 2:21 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
root 13687 0.0 0.3 298716 3580 ? Ssl Jan13 0:03 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
nslcd 14362 0.0 0.0 438552 416 ? Ssl Jan13 0:07 /usr/sbin/nslcd
root 17253 0.0 0.0 113256 184 ? S Jan13 0:00 /bin/sh /usr/bin/mysqld_safe --datadir=/var/lib/mysql --pid-file=/v
mysql 17399 0.1 2.9 937304 30184 ? Sl Jan13 1:15 \_ /usr/sbin/mysqld --basedir=/usr --datadir=/var/lib/mysql --plug
root 18122 0.0 0.0 40600 200 ? Ss Jan13 0:00 /usr/lib/systemd/systemd-udevd
root 18345 0.0 0.2 36792 2424 ? Ss Jan13 0:01 /usr/lib/systemd/systemd-logind
root 19650 0.1 1.3 305272 13684 ? Ssl Jan13 0:53 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
root 19685 0.0 1.2 307232 12564 ? Ssl Jan13 0:08 /usr/sbin/glusterfs --volfile-server=192.168.182.197 --volfile-id=/
nobody 20214 0.0 0.0 189752 948 ? Ss Jan13 0:01 proftpd: (accepting connections)
root 22135 0.0 0.2 91964 2300 ? Ss Jan13 0:00 /usr/libexec/postfix/master -w
postfix 22137 0.0 0.1 94236 1804 ? S Jan13 0:00 \_ qmgr -l -t fifo -u
postfix 10991 0.0 0.2 94180 2888 ? S 02:58 0:00 \_ tlsmgr -l -t unix -u
postfix 30247 0.0 0.3 94168 3164 ? S 10:20 0:00 \_ pickup -l -t fifo -u
root 22532 0.0 3.7 92852 38364 ? Ss Jan13 0:00 /usr/libexec/webmin/virtual-server/lookup-domain-daemon.pl
postgrey 23723 0.0 1.4 171860 14252 ? Ss Jan13 0:00 /usr/sbin/postgrey --unix=/var/spool/postfix/postgrey/socket --pidf
root 30755 0.0 1.9 94312 19424 ? Ss 00:06 0:07 /usr/bin/perl /usr/libexec/webmin/miniserv.pl /etc/webmin/miniserv.
named 7957 0.0 2.1 166304 21592 ? Ssl 11:07 0:00 /usr/sbin/named -u named
It is indeed still stuck on "Restoring Apache virtual host configuration." What's worse is that this is the second virtual server being restored; the first went through this step without any problems, and that server was larger in total file size than the one currently being restored. Thank you, -Logan
Submitted by andreychek on Wed, 01/14/2015 - 10:44 Comment #3
Hmm, I don't see anything in the process output there that indicates what the restore is doing, which is what I had hoped it would show us.
We may need to take a peek at the strace output -- what does this command show:
strace -p 123496
(if it says the command isn't available, you may need to first run "yum install strace")
Submitted by airshock on Wed, 01/14/2015 - 10:47 Comment #4
Hi,
I installed strace via yum and then ran the command, here is its output:
strace: attach: ptrace(PTRACE_ATTACH, ...): No such process
Thanks, -Logan
Submitted by andreychek on Wed, 01/14/2015 - 11:18 Comment #5
Shoot, I grabbed the wrong process ID for that process... it should actually be "4710".
What is the output of this command:
strace -p 4710
That is the process ID for your restore-domain command.
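In general, you can also look that PID up directly with something like:

pgrep -fl restore-domain

The -f flag matches against the full command line, which is needed here since the script lives under the /usr/libexec/webmin/ directory.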
Submitted by airshock on Wed, 01/14/2015 - 11:27 Comment #6
Hi,
I ran the command and it said "Process 4710 attached", but nothing else after that. It's still running (it's been about 5 minutes now), but there is still no output.
For reference, Webmin reports that process 4710, restore-domain.pl, is using 92.4% of CPU.
Thanks, -Logan
Submitted by airshock on Wed, 01/14/2015 - 12:26 Comment #7
Update: strace is still running but no new output has been added. Similarly, restore-domain.pl is still running but has also not moved, and CPU usage is now at 93.0%.
Thanks, -Logan
Submitted by JamieCameron on Wed, 01/14/2015 - 19:08 Comment #8
If strace isn't showing any system call activity (if the process were waiting on a lock, for example, you would see it blocked in a call), then the cause might be an infinite loop in the Virtualmin code that is triggered by something in your Apache config.
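One rough way to confirm that (assuming the restore is still running as PID 4710) is to check the process state -- an "R" there means it is spinning on the CPU rather than sleeping or blocked on I/O:

grep ^State /proc/4710/status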
If possible, I would like to see the backup file you are trying to restore for this domain.
Submitted by airshock on Wed, 01/14/2015 - 20:38 Comment #9
Hi,
I stopped the restore process a few hours ago and restarted it, this time attaching strace to it right away. Everything seemed to move along just fine until the "Restoring Apache virtual host configuration" part.
Here are the last few lines from the strace output:
stat("/home/audioshout/fcgi-bin/php5.fcgi", {st_mode=S_IFREG|0755, st_size=202, ...}) = 0 geteuid() = 0 pipe([5, 10]) = 0 ioctl(5, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffb1f32510) = -1 ENOTTY (Inappropriate ioctl for device) lseek(5, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) ioctl(10, SNDCTL_TMR_TIMEBASE or SNDRV_TIMER_IOCTL_NEXT_DEVICE or TCGETS, 0x7fffb1f32510) = -1 ENOTTY (Inappropriate ioctl for device) lseek(10, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek) fcntl(5, F_SETFD, FD_CLOEXEC) = 0 fcntl(10, F_SETFD, FD_CLOEXEC) = 0 clone(child_stack=0, flags=CLONE_CHILD_CLEARTID|CLONE_CHILD_SETTID|SIGCHLD, child_tidptr=0x7fe46c29aa10) = 13998 close(10) = 0 read(5, "OK\n", 8192) = 3 read(5, "#!/bin/bash\n", 8192) = 12 read(5, "PHPRC=$PWD/../etc/php5.6\n", 8192) = 25 read(5, "export PHPRC\n", 8192) = 13 read(5, "umask 022\n", 8192) = 10 read(5, "PHP_FCGI_MAX_REQUESTS=500\n", 8192) = 26 read(5, "export PHP_FCGI_MAX_REQUESTS\n", 8192) = 29 read(5, "SCRIPT_FILENAME=$PATH_TRANSLATED"..., 8192) = 33 read(5, "export SCRIPT_FILENAME\n", 8192) = 23 read(5, "exec /opt/rh/php56/bin/php-cgi\n", 8192) = 31 read(5, "", 8192) = 0 close(5) = 0 kill(13998, SIGKILL) = 0 wait4(13998, [{WIFSIGNALED(s) && WTERMSIG(s) == SIGKILL}], 0, NULL) = 13998 --- SIGCHLD {si_signo=SIGCHLD, si_code=CLD_KILLED, si_pid=13998, si_status=SIGKILL, si_utime=2, si_stime=3} ---
As far as sending the backup file, I'm not exactly sure how I would do that: each virtual server is in its own TAR file, and I don't currently have a public Web server available to put the files on. I can give you SSH access to the server, though, if you'd like to look at the backup files there.
Thanks, -Logan
Submitted by JamieCameron on Wed, 01/14/2015 - 22:34 Comment #10
SSH access would be really useful - you can email me at jcameron@virtualmin.com
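Alternatively, if it's easier, you could copy just that one domain's backup archive somewhere I can fetch it, for example with scp (the filename and destination below are only placeholders):

scp /vm/audioshout.com.tar someuser@somehost:/tmp/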
Submitted by airshock on Thu, 01/15/2015 - 09:00 Comment #11
Good morning Jamie,
I just sent you an e-mail with SSH details. It will be coming from loganmeril@gmail.com.
Thanks, -Logan
Submitted by airshock on Thu, 01/15/2015 - 15:28 Comment #12
Hi, just wanted to see if any progress has been made on this so far. Because this very same restore worked on my other machine running CentOS 6 and Virtualmin 4.11, I was considering downgrading the current server from Virtualmin 4.13 to 4.11 if we can't get this resolved, as this has already caused more downtime than I would have liked. I look forward to seeing whether you're able to find and fix the issue. Thanks so much!
Submitted by JamieCameron on Thu, 01/15/2015 - 19:52 Comment #13
I'm taking a look at this now..
So I don't have to do a full restore, which domain does the process hang on?
Submitted by airshock on Thu, 01/15/2015 - 20:04 Comment #14
Honestly, it varies each time I run the command, for some weird reason. Usually it's audioshout.com, though, and it's typically the second or third domain it tries to restore if you run the full command I gave in my e-mail.
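If it helps, I believe the hang can be reproduced without a full restore by restoring just that one domain on its own, something along the lines of:

virtualmin restore-domain --source /vm/ --domain audioshout.com --all-features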
I am also logged in as root now, if you need to send a quick message via wall or something like that.
Thanks!
Submitted by JamieCameron on Thu, 01/15/2015 - 20:57 Comment #15
Ok, I think I've fixed it ... give the restore another try now.
Submitted by airshock on Thu, 01/15/2015 - 20:59 Comment #16
Restoring now....
Submitted by airshock on Thu, 01/15/2015 - 21:20 Comment #17
Update: The restore has now gone farther than before, but seems to be stuck on "Restoring SpamAssassin and Procmail configurations" for the virtual server cloud.onefridge.net. Not sure if it's hung or just taking an abnormally long time, as this step completed in only a few seconds for the other servers. Will update again if things change.
Submitted by airshock on Thu, 01/15/2015 - 21:33 Comment #18
Update 2: Never mind; it looks to be caused by Virtualmin being unable to connect to my remote LDAP server, which I am checking now.
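To rule out basic connectivity, I'm testing the connection with something along these lines (the hostname and base DN here are just placeholders):

ldapsearch -x -H ldap://ldap.example.com -b "dc=example,dc=com" -s base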
Submitted by airshock on Thu, 01/15/2015 - 21:53 Comment #19
Update 3: Just checked the remote LDAP server and restarted the Virtualmin restore; let's see if things go OK this time. I also rebooted the Virtualmin machine. Will post if anything else goes wrong. Thanks.
Submitted by airshock on Fri, 01/16/2015 - 12:17 Comment #20
Everything works now!! I'm not sure what you did, Jamie, but you fixed the issue. The restore process completed successfully and Virtualmin is humming along quite nicely now. Thank you so much for resolving the problem. What was the specific issue, anyway?
Thanks!
Submitted by JamieCameron on Fri, 01/16/2015 - 16:53 Comment #21
Great! Turns out that there was a bug in Virtualmin that was only triggered in a certain rare case, which caused it to go into an infinite loop on restores!
This fix will also be included in the next release.
Submitted by Issues on Fri, 01/30/2015 - 16:56 Comment #22
Automatically closed -- issue fixed for 2 weeks with no activity.