Hey guys
I have a configuration with 4 servers, made up of a cloudmin (connect) server and 3 web servers.
Virtual servers exist on web101 and I'm trying to ask cloudmin to replicate the websites to web102 and web103. This process fails every time, with the error:
Starting replication from web101 of Virtualmin settings ..
Finding source and destination systems ..
.. found source web101 and 2 destinations
Refreshing domains on source system ..
.. done
Creating temporary directories ..
.. done
Backing up 4 virtual servers on source system ..
.. created backup of 711.95 kB
Transferring backups to destination systems ..
.. done
Restoring backups on destination systems ..
.. 0 restores succeeded, 2 failed
Failed to restore on web102 : Failed to read backup file : /tmp/.webmin/105207_54742_1_fastrpc.cgi/ssltest.local.tar.gz : Not a valid tar or tar.gz file
Failed to restore on web103 : Failed to read backup file : /tmp/.webmin/953777_63257_1_fastrpc.cgi/ssltest.local.tar.gz : Not a valid tar or tar.gz file
Removing excess virtual servers on destinations ..
.. no domains need deletion on web102
.. no domains need deletion on web103
Replication failed - see the output above for the reason why.
So I've interrupted the process to grab the tar files that cloudmin stores on the 2 destination servers, inside the /tmp/.webmin directory. In the example above there are 4 tar files related to the virtual servers to replicate and then 1 virtualmin settings tar file. When I try to extract these myself it's evident that the tar files are invalid / corrupt.
# tar -zxvf benchmark.local.tar.gz
./
tar: Skipping to next header
gzip: stdin: invalid compressed data--crc error
gzip: stdin: invalid compressed data--length error
tar: Child returned status 1
tar: Error is not recoverable: exiting now
#
# tar zxvf cloudmin101.local.tar.gz
gzip: stdin: invalid compressed data--format violated
./
./.backup/
./.usermin/
tar: Skipping to next header
tar: Child returned status 1
tar: Error is not recoverable: exiting now
My initial thought was that this was caused by an interruption during the transfer / dd process. However, the machines are all VMs on the same host at present (during testing), connected to a single physical switch. The network is rock solid, achieves 3 Gbps between the machines, and never drops any packets.
Is there any chance someone could point me in the right direction to troubleshoot this further? Cheers
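For anyone reproducing this, a minimal sketch of checking every transferred archive in one pass rather than extracting each by hand. It relies only on `gzip -t`, which verifies the compressed stream (including the CRC and length that failed above) without unpacking anything. The function name `check_archives` is made up for illustration, and the directory argument stands in for whichever session directory Cloudmin created under /tmp/.webmin.

```shell
#!/bin/sh
# Hypothetical helper: report which .tar.gz files in a directory pass or
# fail gzip's built-in integrity test. Nothing is extracted.
check_archives() {
    dir=$1
    for f in "$dir"/*.tar.gz; do
        [ -e "$f" ] || continue            # glob matched nothing
        if gzip -t "$f" 2>/dev/null; then
            echo "OK      $(basename "$f")"
        else
            echo "CORRUPT $(basename "$f")"
        fi
    done
}

# Example call, using a session directory from the error output above:
#   check_archives /tmp/.webmin/105207_54742_1_fastrpc.cgi
```

Note that this only tests the gzip layer. An archive that passes could still contain a damaged tar stream, so `tar -tzf` remains the stricter check.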
Comments
Submitted by VirtualNoob on Tue, 10/25/2016 - 12:02 Comment #1
It's worth also adding that the files do vary in size too...
[root@web103 953777_63257_1_fastrpc.cgi]# ls -alh
total 728K
drwxr-x--- 4 502 502 4.0K Oct 25 17:18 .
drwxr-xr-x 4 root root 72 Oct 25 17:18 ..
drwxrwxrwx 2 root root 6 Oct 25 17:18 .backup
-rw-r--r-- 1 root root 26K Oct 25 17:18 benchmark.local.tar.gz
-rw-r--r-- 1 root root 27K Oct 25 17:18 cloudmin101.local.tar.gz
-rw-r--r-- 1 root root 27K Oct 25 17:18 replication.local.tar.gz
-rw-r--r-- 1 root root 630K Oct 25 17:18 ssltest.local.tar.gz
drwx------ 2 502 502 6 Jul 15 17:42 .usermin
-rw-r--r-- 1 root root 5.4K Oct 25 17:18 virtualmin.tar.gz
Submitted by JamieCameron on Tue, 10/25/2016 - 22:36 Comment #2
That's quite unusual - it seems like the ssh transfer of those files failed silently, resulting in a corrupt or truncated file.
Is the destination system perhaps out of disk space?
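One way to tell a truncated transfer from a corrupted one is to compare the size and checksum of the same archive on the source and on each destination. The sketch below assumes `sha256sum` from GNU coreutils is available on all nodes. The names `fingerprint` and `same_file` are invented for this example and are not part of Cloudmin or Virtualmin.

```shell
#!/bin/sh
# Hypothetical helper: print "<size in bytes> <sha256>" for a file. Run it
# on the source and on each destination, then compare the lines.
fingerprint() {
    size=$(wc -c < "$1" | tr -d ' ')
    sum=$(sha256sum "$1" | awk '{print $1}')
    echo "$size $sum"
}

# Hypothetical helper: succeed only if two local copies have the same
# size and checksum (useful after copying both files to one machine).
same_file() {
    [ "$(fingerprint "$1")" = "$(fingerprint "$2")" ]
}
```

A smaller size on the destination would point to truncation. An equal size with a different checksum would point to the contents being altered in transit.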
Submitted by VirtualNoob on Wed, 10/26/2016 - 02:35 Comment #3
Hey Jamie
Nope, these are all brand new systems. /tmp is part of the root filesystem and has a 124GB xfs within a logical volume. Total disk usage on the entire machine is 3GB.
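The free-space question can be answered the same way on every node with a one-line check. This is only a sketch: `free_kb` is a made-up name, and it assumes a `df` that supports the POSIX `-P` and `-k` options.

```shell
#!/bin/sh
# Hypothetical helper: print the free space, in kilobytes, on the
# filesystem that holds the given path.
free_kb() {
    df -Pk "$1" | awk 'NR==2 {print $4}'
}

# Example call on each source and destination node:
#   free_kb /tmp
```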
Submitted by JamieCameron on Thu, 10/27/2016 - 00:12 Comment #4
On the source system, which backup compression format do you have selected in Virtualmin? This is visible at System Settings -> Module Configuration -> Backup and restore -> Backup compression format.
Submitted by VirtualNoob on Thu, 10/27/2016 - 10:44 Comment #5
Backup compression format is currently gzip on the source system.
Submitted by JamieCameron on Sat, 10/29/2016 - 00:41 Comment #6
And on the destination too?
Submitted by VirtualNoob on Mon, 10/31/2016 - 08:05 Comment #7
Yep, on all 3 nodes it's gzip.
Submitted by JamieCameron on Mon, 10/31/2016 - 23:25 Comment #8
If on the source system you just make a regular Virtualmin backup of one of the domains, scp it to the destination and then restore, does it work OK?
Submitted by VirtualNoob on Tue, 11/01/2016 - 04:32 Comment #9
Yeah, manual backup, SCP and restore via the virtualmin web interface works as expected.
Still getting the same errors with a newly created virtual server on the source system:
Failed to restore on web102 : Failed to read backup file : /tmp/.webmin/862446_16406_1_fastrpc.cgi/benchmark.local.tar.gz : Not a valid tar or tar.gz file
Failed to restore on web103 : Failed to read backup file : /tmp/.webmin/38937_19047_1_fastrpc.cgi/benchmark.local.tar.gz : Not a valid tar or tar.gz file
Submitted by JamieCameron on Tue, 11/01/2016 - 17:03 Comment #10
Is /tmp perhaps full on the source or destination system?
Submitted by VirtualNoob on Tue, 11/01/2016 - 17:09 Comment #11
Nope. I made sure I ran the manual backup, SCP and restore from that location to be sure. But as I said in an earlier post, the systems are all Hyper-V nodes with 127GB root file systems and only ~3GB of data on each.
Submitted by JamieCameron on Thu, 11/03/2016 - 00:07 Comment #12
Any chance we could login to this system to see what's going wrong?
Submitted by VirtualNoob on Thu, 11/03/2016 - 06:08 Comment #13
Erm yeah, I'll have to grant VPN access and then logins for the 3 virtualmin nodes and the cloudmin node. Could you let me know where to send the credentials and I'll set it all up for you.
All of the nodes I grant access to have zero data on them and so you can't hurt anything. Feel free to reboot, or whatever else you may need to do as long as you let me know exactly what any potential fix might have been.
Thanks
Submitted by JamieCameron on Fri, 11/04/2016 - 00:19 Comment #14
Please send me login details at jcameron@virtualmin.com
Submitted by VirtualNoob on Fri, 11/04/2016 - 01:00 Comment #15
I've just sent those through to you.
Submitted by VirtualNoob on Wed, 11/09/2016 - 06:44 Comment #16
Hey Jamie
I can see that you've not yet connected to the VPN using the details I've emailed over. Could you please give me an update on this, and also confirm the credentials were safely received by you last Friday.
Thanks
Submitted by JamieCameron on Sat, 11/12/2016 - 01:15 Comment #17
I didn't get your email - what address did you send it from?
Submitted by VirtualNoob on Sun, 11/13/2016 - 03:27 Comment #18
Hi Jamie
I've just sent a further copy to both your Virtualmin and webmin mailboxes. It was sent from chris at domain_removed dot co dot uk. Please let me know if it was received this time.
Thanks
Submitted by JamieCameron on Sun, 11/13/2016 - 23:10 Comment #19
Well, that was very interesting - after much debugging I found the problem, which was a corner case in which transferred files could be corrupted but only when logging in as a sudo-capable user instead of root. I've fixed this on your system, and will include the fix in the next Cloudmin release.
Submitted by JamieCameron on Sun, 11/13/2016 - 23:10 Comment #20
Submitted by VirtualNoob on Mon, 11/14/2016 - 01:03 Comment #21
That's brilliant news Jamie, thanks again. I'm quite relieved it was a bug and not a misconfiguration on our part.