Submitted by Franco Nogarin on Wed, 12/02/2015 - 10:41 Pro Licensee
Hi, my Cloudmin hosts back up my VMs over SSH to a NAS system. One of my hosts died last night when its logical volume blew up. When I try to restore the backup to a new host, it fails, saying that the reason for failure is:
Host system smvmcloud.hardware is down
I have all my backups this way, but if I lose a host and cannot restore, what good is the backup? Anyhow, how can I restore my backups? My enterprise is totally down and I am in a remote city with no physical access.
Status:
Active
Comments
Submitted by andreychek on Wed, 12/02/2015 - 11:01 Comment #1
There shouldn't be a problem with your backups.
You mentioned that this is a new host -- are there any other VMs running on this host at the moment?
What is "smvmcloud.hardware"? Is that the new host you are attempting to use?
Is it possible that it's offline, or that firewall rules are preventing Cloudmin from communicating with it? You'd want to make sure that ports 22 and 10000-10010 are all available and not being blocked.
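A quick way to test the ports mentioned above from the Cloudmin master is a small TCP connect check. This is only a sketch: the function name `check_ports` is made up for illustration, and a successful connect shows only that something is listening on the port, not that Webmin or SSH is healthy.

```python
import socket


def check_ports(host, ports, timeout=2.0):
    """Return {port: True/False} depending on whether a TCP connect succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the host and opens a TCP connection
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # covers refused connections, timeouts, and DNS failures
            results[port] = False
    return results


# Ports named in the comment above: SSH plus the 10000-10010 range
CLOUDMIN_PORTS = [22] + list(range(10000, 10011))
```

For example, `check_ports("smvmbackup", CLOUDMIN_PORTS)` would report which of those ports accept connections on the alternate host (substitute whatever hostname or IP the master actually uses for it).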
Submitted by cruiskeen on Wed, 12/02/2015 - 11:06 Comment #2
Does Cloudmin think that smvmcloud is down? Have you tried refreshing the status for smvmcloud?
Submitted by Franco Nogarin on Wed, 12/02/2015 - 11:42 Pro Licensee Comment #3
Hi, there is no "new" host as in brand new; I meant an alternate host.
So smvmcloud had several VMs on it, backed up regularly. Last night the host smvmcloud failed when it lost its entire logical volume.
Now I wish to restore the backed-up VMs that used to reside on smvmcloud to an alternate host, in this case to host smvmbackup. When I do, it gives the above error message. I have checked, and valid backup files are on the NAS where they need to be.
So the backup is being restored to a different host, and the one that is down is listed as the reason for the restore failure.
See the attached images showing the valid backup, restore parameters, and failure message. Franco
Submitted by Franco Nogarin on Wed, 12/02/2015 - 11:52 Pro Licensee Comment #4
@cruiskeen
Yes, because it is down. As I mentioned, it had a total failure.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 11:56 Pro Licensee Comment #5
@andreychek Not a problem with firewalls. The systems are on the same subnet with no firewalls or security issues: all VMs are on the same logical subnet, and all hosts share a physical subnet. Franco
Submitted by JamieCameron on Wed, 12/02/2015 - 12:01 Comment #6
Does the VM that you are trying to restore still appear in Cloudmin on the List Managed Systems page? If so, the problem here is that Cloudmin wants to restore it on the host it thinks the VM is currently on.
The solution is to select the VM on the List Managed Systems page and click the "De-Register" button. Then re-try the restore, making sure to select the new host on the restore page.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 12:15 Pro Licensee Comment #7
@JamieCameron
I followed your instructions, and when I tried to restore I got:
System to restore has been deleted, and no system details were included in the backup
I chose the least important VM in the group, so it was not a critical loss, but it's gone now. :(
Submitted by Franco Nogarin on Wed, 12/02/2015 - 12:17 Pro Licensee Comment #8
FYI, the Cloudmin master is: operating system Ubuntu Linux 12.04.1, Webmin version 1.770, Cloudmin version 8.3.
Submitted by JamieCameron on Wed, 12/02/2015 - 12:21 Comment #9
If you copied the backup files from another system, make sure you copied all the files that begin with the hostname of the VM - there should be one large file for the disk image, and a few smaller files that contain meta-information.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 12:48 Pro Licensee Comment #10
@JamieCameron
All backups go over SSH to a NAS. The files in the restore are:
-rw-r--r--+ 1 root root 1.2K Dec  2 04:03 nms.virtual.disks
-rw-rw-rw-+ 1 root root 6.7G Dec  2 04:03 nms.virtual.gz
-rw-------+ 1 root root 1.9K Nov 30 19:31 nms.virtual.serv
The backup/restore functions work fine for fully functional hosts. I am retesting the system to prove it is functional for backup and restore, and will post the results as soon as I have them.
Submitted by JamieCameron on Wed, 12/02/2015 - 13:03 Comment #11
OK, if you aren't restoring directly from the directory on the NAS, make sure you copy all those files (especially the .serv file) to the location you are restoring from.
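Before retrying a restore from a copied location, it may help to confirm that every file for the VM made it across. The sketch below assumes the three suffixes seen in the listing above (`.gz`, `.serv`, `.disks`) are the ones that matter; other VM types or Cloudmin versions may write different or additional files. The function name is hypothetical.

```python
from pathlib import Path

# Suffixes taken from the nms.virtual listing in this thread (an assumption,
# not a documented list of everything Cloudmin can write)
EXPECTED_SUFFIXES = (".gz", ".serv", ".disks")


def missing_backup_files(backup_dir, vm_name, suffixes=EXPECTED_SUFFIXES):
    """Return the expected backup file names that are absent from backup_dir."""
    base = Path(backup_dir)
    return [
        vm_name + suffix
        for suffix in suffixes
        if not (base / (vm_name + suffix)).is_file()
    ]
```

Calling `missing_backup_files("/root", "nms.virtual")` returns an empty list when all three files are present, and otherwise names the ones that still need copying.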
Submitted by Franco Nogarin on Wed, 12/02/2015 - 14:21 Pro Licensee Comment #12
I am restoring directly from the NAS, but I will try to copy the files to the new host and restore from there.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 14:40 Pro Licensee Comment #13
Restoring directly from a folder on the host works, so I have a path to restore all my VMs. However, I would still like to fix the problem that prevents me from restoring my VMs directly from the NAS.
Submitted by JamieCameron on Wed, 12/02/2015 - 15:19 Comment #14
Yeah, the original problem is a Cloudmin bug - you shouldn't need to de-register the VM to do a restore in this case. It gets tricky because if the host was only temporarily down and Cloudmin created a new instance on a different host, you could end up with duplicates.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 16:19 Pro Licensee Comment #15
OK, good show folks. I am still struggling to bring up my entire enterprise, but doing it manually and locally is working.
Submitted by Franco Nogarin on Wed, 12/02/2015 - 21:26 Pro Licensee Comment #16
Sorry, but I spoke too soon. While this technique seems to restore the VM, the result is unusable. Here is the typical output I get when restoring; ultimately the VM fires up but is never pingable. I suspect it's booting from a blank drive. Here are the restore messages:
Restoring 1 systems from /root on system smvmmaster.hardware ..
Finding systems to restore ..
.. found 1 systems
Working out backup sources ..
.. found 1 usable sources
Re-creating missing system geoserver.virtual ..
Copying 647.61 MB image file to system smvmmaster.hardware ..
.. done, and added to host cache
Creating virtual system with KVM ..
.. creation started.
Waiting for creation to complete ..
.. creation has completed successfully.
Fixing root disk device in fstab file ..
.. device fix failed : No Linux partitions found in disk image /dev/data/geoserver_virtual_img
Fixing root disk device in Grub configuration file ..
.. no Grub configuration file found!
Removing missing disks from fstab file ..
.. cleanup failed : Failed to read fstab file : No Linux partitions found in disk image /dev/data/geoserver_virtual_img
Expanding filesystem to 20 GB ..
.. failed to find primary disk
Mounting new instance's filesystem ..
.. failed : No Linux partitions found in disk image /dev/data/geoserver_virtual_img
Adding DHCP entries ..
Adding DHCP entry for 192.168.80.29 with Ethernet address 02:54:00:63:9E:29 ..
.. done
.. all done
Adding DNS entry geoserver.virtual. for IP address 192.168.80.29 ..
.. done
Fetching current status ..
.. status successfully retrieved (Down)
Re-fetching current status of host system smvmmaster.hardware ..
.. status successfully retrieved (Webmin)
Enabling system at host boot time ..
.. done
.. done
Re-creating 1 additional disks for system geoserver.virtual ..
.. done
Resizing disk LV geoserver_virtual_img from 668 MB to 15 GB, to match backup ..
.. resize failed : parted -s \/dev\/data\/geoserver_virtual_img rm 2 failed : Warning: /dev/dm-16 contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted -- possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table? Error: Both the primary and backup GPT tables are corrupt. Try making a fresh table, and using Parted's rescue feature to recover partitions. . The restore may not complete properly, or may not make use of all space on the disk.
Restoring geoserver.virtual from /root/geoserver.virtual.gz on system smvmmaster.hardware ..
Restoring copies of disks for geoserver.virtual .................................................................................. ................................................................................ ....
.. restored backup
Restores of 1 systems completed successfully.
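One way to test the blank-drive suspicion is to look at the first two sectors of the restored volume for a partition-table signature. This is a rough sketch, not anything Cloudmin itself does: it assumes 512-byte sectors, the function name is made up, and reading a device such as /dev/data/geoserver_virtual_img needs root. Note that in the log above the "No Linux partitions found" messages appear before the "Restoring copies of disks" step, so a check like this is probably only informative once the restore has finished.

```python
def inspect_partition_table(path, sector_size=512):
    """Report basic partition-table signatures found at the start of a disk image."""
    with open(path, "rb") as f:
        head = f.read(sector_size * 2)
    return {
        # MBR (and GPT protective MBR) ends sector 0 with the bytes 0x55 0xAA
        "mbr_signature": head[510:512] == b"\x55\xaa",
        # A GPT header starts sector 1 with the ASCII signature "EFI PART"
        "gpt_header": head[sector_size:sector_size + 8] == b"EFI PART",
        # All-zero leading sectors suggest nothing was written to the disk
        "blank": len(head) > 0 and not any(head),
    }
```

If `blank` comes back true after the restore has completed, the disk contents most likely never reached the volume; if either signature is present, the problem is more likely in the partitions or filesystem than in an empty disk.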
Submitted by Franco Nogarin on Wed, 12/02/2015 - 22:34 Pro Licensee Comment #17
Somehow (I don't understand it) this has resolved itself. I am continuing to restore VMs. Thanks.