Unable to restore backups

Hi, my Cloudmin hosts back up my VMs over SSH to a NAS. One of my hosts died last night when its logical volume blew up. When I try to restore the backup to a new host, it fails, saying that the reason for failure is:

Host system smvmcloud.hardware is down

All my backups are done this way, but if I lose a host and cannot restore, what good is the backup? Anyhow, how can I restore my backups? My enterprise is totally down and I am in a remote city with no physical access.

Status: 
Active

Comments

There shouldn't be a problem with your backups.

You mentioned that this is a new host. Are there any other VMs running on this host at the moment?

What is "smvmcloud.hardware"? Is that the new host you are attempting to use?

Is it possible that it's offline, or that firewall rules are preventing Cloudmin from communicating with it? You'd want to make sure that ports 22 and 10000-10010 are all open and not being blocked.
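A quick way to test the port advice above from the Cloudmin master is a plain TCP connect to each port. The sketch below is a generic reachability check, not a Cloudmin tool; the function name `check_ports` is hypothetical, and a successful connect only shows that something is listening, not that Cloudmin or Webmin is healthy.

```python
import socket


def check_ports(host, ports, timeout=3.0):
    """Return a dict mapping each TCP port to True if a connection succeeds."""
    results = {}
    for port in ports:
        try:
            # create_connection resolves the name and attempts a TCP handshake
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            # Covers refused connections, timeouts and name-resolution failures
            results[port] = False
    return results
```

For example, `check_ports("smvmcloud.hardware", [22] + list(range(10000, 10011)))` would report SSH plus the 10000-10010 range. A host that is completely down, as in this thread, will show every port as closed.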

Does Cloudmin think that smvmcloud is down? Have you tried refreshing the status for smvmcloud?

Hi, there is no "new" host as in brand new; I meant an alternate host.

So smvmcloud had several VMs on it, backed up regularly. Last night the host smvmcloud failed when it lost its entire logical volume.

Now I wish to restore the backed-up VMs that used to reside on smvmcloud to an alternate host, in this case the host smvmbackup. When I do, it gives the above error message. I have checked, and valid backup files are on the NAS where they need to be.

So the backup is being restored to a different host, yet the host that is down is listed as the reason for the restore failure.

See the attached images showing the valid backup, the restore parameters and the failure message. Franco

@cruiskeen
Yes, because it is down; as I mentioned, it suffered a total failure.

@andreychek It's not a firewall problem. The systems are on the same subnet with no firewalls or security restrictions: all VMs are on the same logical subnet, and all hosts share a physical subnet. Franco

Does the VM that you are trying to restore still appear in Cloudmin on the List Managed Systems page? If so, the problem here is that Cloudmin wants to restore it on the host it thinks the VM is currently on.

The solution is to select the VM on the List Managed Systems page and click the "De-Register" button. Then re-try the restore, making sure to select the new host on the restore page.

@JamieCameron

I followed your instructions, and when I tried to restore I got:

System to restore has been deleted, and no system details were included in the backup

I chose the least important VM in the group, so it was not a critical loss, but it's gone now. :(

FYI, the Cloudmin master is running: Operating system Ubuntu Linux 12.04.1, Webmin version 1.770, Cloudmin version 8.3

If you copied the backup files from another system, make sure you copied all the files that begin with the hostname of the VM - there should be one large file for the disk image, and a few smaller files that contain meta-information.
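To check that a copied backup set is complete, you can list every file that starts with the VM's hostname and see which one is the large disk image. This is a minimal sketch under two assumptions: the files are named `<hostname>.<suffix>` (as in the `nms.virtual.*` listing later in this thread), and the largest file is the disk image. The function name `find_backup_files` is hypothetical and this is not how Cloudmin itself locates backups.

```python
import os


def find_backup_files(backup_dir, vm_hostname):
    """Return (image_file, metadata_files) for files named after vm_hostname.

    Heuristic only: the largest matching file is assumed to be the disk
    image and the rest are assumed to hold meta-information.
    """
    prefix = vm_hostname + "."  # trailing dot avoids matching e.g. "nms.virtual2"
    matches = [
        name for name in os.listdir(backup_dir)
        if name.startswith(prefix)
        and os.path.isfile(os.path.join(backup_dir, name))
    ]
    if not matches:
        return None, []
    # Largest file first; that one is treated as the disk image
    matches.sort(key=lambda n: os.path.getsize(os.path.join(backup_dir, n)),
                 reverse=True)
    return matches[0], sorted(matches[1:])
```

If the metadata list comes back empty, only the disk image was copied, which matches the "no system details were included in the backup" symptom described above.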

@JamieCameron

All backups are made over SSH to a NAS; the files in the restore directory are:

-rw-r--r--+ 1 root root 1.2K Dec  2 04:03 nms.virtual.disks
-rw-rw-rw-+ 1 root root 6.7G Dec  2 04:03 nms.virtual.gz
-rw-------+ 1 root root 1.9K Nov 30 19:31 nms.virtual.serv

The backup/restore functions work fine for fully functional hosts. I am retesting the system to prove that backup and restore work, and will post the results as soon as I have them.

OK, if you aren't restoring directly from the directory on the NAS, make sure you copy all those files (especially the .serv file) to the location you are restoring from.
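Copying the whole set in one go, and refusing to start if the `.serv` file is missing, avoids restoring from an incomplete directory. The sketch below assumes the `<hostname>.<suffix>` naming seen in the listing above; `copy_backup_set` and its `required` parameter are hypothetical names, not part of Cloudmin.

```python
import os
import shutil


def copy_backup_set(src_dir, dest_dir, vm_hostname, required=(".serv",)):
    """Copy every file whose name starts with '<vm_hostname>.' into dest_dir.

    Raises FileNotFoundError before copying anything if a required file
    (by default '<vm_hostname>.serv') is missing from src_dir.
    """
    prefix = vm_hostname + "."
    names = sorted(
        n for n in os.listdir(src_dir)
        if n.startswith(prefix) and os.path.isfile(os.path.join(src_dir, n))
    )
    for suffix in required:
        if vm_hostname + suffix not in names:
            raise FileNotFoundError(f"{vm_hostname}{suffix} missing in {src_dir}")
    os.makedirs(dest_dir, exist_ok=True)
    for n in names:
        # copy2 also preserves modification times and permission bits
        shutil.copy2(os.path.join(src_dir, n), os.path.join(dest_dir, n))
    return names
```

For a 6.7G image, copying with `scp` or `rsync` from the NAS may be more practical; the point of the sketch is the completeness check before the restore.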

I am restoring directly from the NAS, but I will try copying the files to the new host and restoring from there.

Restoring from a folder on the host works, so I have a way to restore all my VMs. However, I would still like to fix the problem that prevents me from restoring my VMs directly from the NAS.

Yeah, the original problem is a Cloudmin bug - you shouldn't need to de-register the VM to do a restore in this case. It gets tricky because if the host was only temporarily down and Cloudmin created a new instance on a different host, you could end up with duplicates.

OK, good show, folks. I am still struggling to bring up my entire enterprise, but doing it manually and locally is working.

Sorry, but I spoke too soon. While this technique seems to restore the VM, the result is unusable: the VM fires up but is never pingable, and I suspect it's booting to a blank drive. Here are the typical messages I get when restoring:

Restoring 1 systems from /root on system smvmmaster.hardware ..

Finding systems to restore ..
.. found 1 systems
Working out backup sources ..
.. found 1 usable sources

Re-creating missing system geoserver.virtual ..
Copying 647.61 MB image file to system smvmmaster.hardware ..
.. done, and added to host cache
Creating virtual system with KVM ..
.. creation started.

Waiting for creation to complete ..
.. creation has completed successfully.

Fixing root disk device in fstab file ..
.. device fix failed : No Linux partitions found in disk image /dev/data/geoserver_virtual_img

Fixing root disk device in Grub configuration file ..
.. no Grub configuration file found!

Removing missing disks from fstab file ..
.. cleanup failed : Failed to read fstab file : No Linux partitions found in disk image /dev/data/geoserver_virtual_img

Expanding filesystem to 20 GB ..
.. failed to find primary disk

Mounting new instance's filesystem ..
.. failed : No Linux partitions found in disk image /dev/data/geoserver_virtual_img

Adding DHCP entries ..
Adding DHCP entry for 192.168.80.29 with Ethernet address 02:54:00:63:9E:29 ..
.. done
.. all done
Adding DNS entry geoserver.virtual. for IP address 192.168.80.29 ..
.. done

Fetching current status ..
.. status successfully retrieved (Down)

Re-fetching current status of host system smvmmaster.hardware ..
.. status successfully retrieved (Webmin)

Enabling system at host boot time ..
.. done

.. done
Re-creating 1 additional disks for system geoserver.virtual ..
.. done

Resizing disk LV geoserver_virtual_img from 668 MB to 15 GB, to match backup ..
.. resize failed : parted -s \/dev\/data\/geoserver_virtual_img rm 2 failed : Warning: /dev/dm-16 contains GPT signatures, indicating that it has a GPT table. However, it does not have a valid fake msdos partition table, as it should. Perhaps it was corrupted -- possibly by a program that doesn't understand GPT partition tables. Or perhaps you deleted the GPT table, and are now using an msdos partition table. Is this a GPT partition table? Error: Both the primary and backup GPT tables are corrupt. Try making a fresh table, and using Parted's rescue feature to recover partitions. . The restore may not complete properly, or may not make use of all space on the disk.

Restoring geoserver.virtual from /root/geoserver.virtual.gz on system smvmmaster.hardware ..
Restoring copies of disks for geoserver.virtual .................................................................................. ................................................................................ ....
.. restored backup
Restores of 1 systems completed successfully.
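The log above reports "No Linux partitions found" and a parted warning that the GPT tables are corrupt and the protective ("fake msdos") MBR is missing. A read-only look at the image's signature bytes can show whether the restored LV holds a partition table at all. This sketch assumes 512-byte logical sectors and only checks magic bytes (the 0x55AA MBR signature, the 0xEE protective partition type, and the `EFI PART` header signature); it does not verify GPT CRCs, so a found signature does not prove the table is valid. `inspect_partition_signatures` is a hypothetical name, not a Cloudmin or parted function.

```python
import os

SECTOR = 512  # assumption: 512-byte logical sectors


def inspect_partition_signatures(path):
    """Read-only check of MBR and GPT header signatures in a disk image.

    Only magic bytes are checked; GPT header CRCs are NOT verified.
    """
    with open(path, "rb") as f:
        head = f.read(2 * SECTOR)
        size = f.seek(0, os.SEEK_END)  # also works on Linux block devices
        last = b""
        if size >= 3 * SECTOR:
            # The backup GPT header lives in the last sector of the disk
            f.seek(size - SECTOR)
            last = f.read(SECTOR)
    mbr_ok = len(head) >= SECTOR and head[510:512] == b"\x55\xaa"
    return {
        "mbr_boot_signature": mbr_ok,
        # Type byte of the first MBR partition entry; 0xEE marks a protective MBR
        "protective_mbr": mbr_ok and head[450] == 0xEE,
        "primary_gpt_header": head[SECTOR:SECTOR + 8] == b"EFI PART",
        "backup_gpt_header": last[:8] == b"EFI PART",
    }
```

Run as root against `/dev/data/geoserver_virtual_img`, all four values being False would be consistent with the "blank drive" suspicion, while GPT headers present without a protective MBR would match the parted warning.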

Somehow this has resolved itself, though I don't understand why. I am continuing to restore VMs. Thanks.