Unable to restore backup of Citrix Xen VM

Restoring 1 systems from destinations from host systems ..

Finding systems to restore ..
.. found 1 systems

Working out backup sources ..
.. found 1 usable sources

Restoring Migrator from /backups/cloudmin/2011-11-18/Migrator.gz on Cloudmin master ..
.. restore failed : Failed to connect VDI c8160941-b307-42c8-a3f4-aff5e8f93086 to host 962afe18-5cf1-4f74-9f33-047bf9351b99 : A device with the name given already exists on the selected VM device: xvdb

Restores of 0 systems completed, but 1 failed!

Migrator : Failed to connect VDI c8160941-b307-42c8-a3f4-aff5e8f93086 to host 962afe18-5cf1-4f74-9f33-047bf9351b99 : A device with the name given already exists on the selected VM device: xvdb

Status: 
Closed (fixed)

Comments

Just an update, the backup process is also broken now. I have no idea what has happened. Since it last worked, I have rebooted our test cluster (both Citrix Xen Servers). Perhaps some device or folder is missing. Everything else is working normally.

I just tried running a backup and got this:

Backing up 5 systems to /backups/cloudmin/%Y-%m-%d on Cloudmin master ..

Finding systems to backup ..
.. found 5 systems

Working out backup destinations ..
.. found 5 usable destinations

Backing up Migrator to /backups/cloudmin/2011-11-18/Migrator.gz on Cloudmin master ..
.. backup failed : Failed to connect VDI c8160941-b307-42c8-a3f4-aff5e8f93086 to host 962afe18-5cf1-4f74-9f33-047bf9351b99 : A device with the name given already exists on the selected VM device: xvdb

Backing up VM3 to /backups/cloudmin/2011-11-18/VM3.tar.gz on Cloudmin master ..

....

Does rebooting the host system perhaps fix this? It looks like some device from the VM is still connected to the host ..

Thanks for the suggestion, but I already tried this to no avail.

So from what I can gather poking around the Citrix XenServer system, the VDI is the virtual disk of the "Migrator" VM which it's trying to back up and the Host ID corresponds to the UUID of the control domain on the XenServer box:

# xe vdi-list uuid=c8160941-b307-42c8-a3f4-aff5e8f93086
uuid ( RO)                : c8160941-b307-42c8-a3f4-aff5e8f93086
          name-label ( RW): VM 2 0
    name-description ( RW): Created by template provisioner
             sr-uuid ( RO): d4f8f4fa-68d2-f895-afab-93da9af51e00
        virtual-size ( RO): 8589934592
            sharable ( RO): false
           read-only ( RO): false

# xe vm-list uuid=962afe18-5cf1-4f74-9f33-047bf9351b99
uuid ( RO)           : 962afe18-5cf1-4f74-9f33-047bf9351b99
     name-label ( RW): Control domain on host: ams1-xs-1.anu.net
    power-state ( RO): running

But when I look at the disks on the control domain, I only see sda, the SATA raid that the OS is installed on:

[root@ams1-xs-1 ~]# fdisk -l

WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.


Disk /dev/sda: 399.9 GB, 399988752384 bytes
256 heads, 63 sectors/track, 48439 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1   *           1       48440   390614015+  ee  EFI GPT

Mounted filesystems are just the local RAID and the NFS shares for the guests:

[root@ams1-xs-1 ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             4.0G  2.0G  1.9G  52% /
none                  380M     0  380M   0% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
                       44M   44M     0 100% /var/xen/xc-install
10.1.101.80:/Xen/d4f8f4fa-68d2-f895-afab-93da9af51e00
                      100G   32G   69G  32% /var/run/sr-mount/d4f8f4fa-68d2-f895-afab-93da9af51e00
10.1.101.80:/ISOs      10G  3.0G  7.1G  30% /var/run/sr-mount/8f1ac82b-7816-eab7-3fdd-7343d5e16350

Can you tell me what Cloudmin is actually trying to do when that error is produced, so I could try it manually?

My guess is that the virtual disk is still attached to the host system.

You can get the host's UUID by running the command xe vm-list and looking for the "Control domain" VM.

You can then get the UUIDs of attached VBDs with the command xe vbd-list vm-uuid=XXX where XXX is the UUID of the VM.

You can then detach it from the host with the commands xe vbd-unplug uuid=YYYY and xe vbd-destroy uuid=YYYY , where YYYY is the VBD's UUID.
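
The steps above could be scripted roughly as follows. This is only a sketch: the function name plan_vbd_cleanup and the XE override variable are made up for illustration, and it assumes the xe CLI supports --minimal and the currently-attached VBD parameter. It prints the commands rather than running them, so the list can be reviewed first.

```shell
#!/bin/sh
# Sketch: print the xe commands needed to remove VBDs left on a VM
# (here, the control domain). Nothing is executed by this function.
# XE can be overridden to point at a stub; it defaults to the real CLI.
XE="${XE:-xe}"

# $1 = UUID of the VM whose VBDs should be cleaned up (hypothetical helper)
plan_vbd_cleanup() {
    vm_uuid="$1"
    # --minimal prints the matching UUIDs as one comma-separated line
    for vbd in $("$XE" vbd-list vm-uuid="$vm_uuid" --minimal | tr ',' ' '); do
        attached=$("$XE" vbd-param-get uuid="$vbd" param-name=currently-attached)
        # Unplugging a VBD that is not plugged reports an error,
        # so only plan an unplug for VBDs that are currently attached
        if [ "$attached" = "true" ]; then
            echo "xe vbd-unplug uuid=$vbd"
        fi
        echo "xe vbd-destroy uuid=$vbd"
    done
}
```

Calling plan_vbd_cleanup with the control domain's UUID prints one command per line. Check that every listed VBD really is a leftover before running any of them.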

[root@ams1-xs-1 ~]# xe vbd-list vm-uuid=962afe18-5cf1-4f74-9f33-047bf9351b99
uuid ( RO)             : 0a373c01-64bb-136a-80c3-e5e7243cba37
          vm-uuid ( RO): 962afe18-5cf1-4f74-9f33-047bf9351b99
    vm-name-label ( RO): Control domain on host: ams1-xs-1.anu.net
         vdi-uuid ( RO): 40ade37f-4513-44eb-8317-f07ec9430e80
            empty ( RO): false
           device ( RO): xvdb

[root@ams1-xs-1 ~]# xe vbd-unplug uuid=0a373c01-64bb-136a-80c3-e5e7243cba37
The device is not currently attached
device: 0a373c01-64bb-136a-80c3-e5e7243cba37

[root@ams1-xs-1 ~]# xe vbd-destroy uuid=0a373c01-64bb-136a-80c3-e5e7243cba37
[root@ams1-xs-1 ~]#

The destroy command worked, and I have just run a successful restore! I will check now that backups work too.

Thanks!!

Sorry, still not completely solved. The problem appears to be that the backup process is failing due to lack of disk space:

System               Result                                        Size     
-------------------- --------------------------------------------- ----------
Migrator             OK                                            986.32 MB
VM3                  Failed to create copy of LVM disk  : dd: writ -        
ams1-mgr-1.anu.net   Failed to connect VDI 3892dcca-8225-4b4a-bc2c -        
ams1-vpn-1.anu.net   Failed to connect VDI c7eaa7a3-303e-4ea6-9dfb -        
win7.anu.net         Failed to connect VDI 8f049719-0d05-4b32-a9a6 -        

Finding systems to backup ..
. found 5 systems

Working out backup destinations ..
. found 5 usable destinations

Backing up Migrator to /backups/cloudmin/2011-11-20/Migrator.gz on Cloudmin master ..
   Compressing LVM disk for Migrator via SSH to cloudmin.anu.net ..
. created backup of 986.32 MB

Backing up VM3 to /backups/cloudmin/2011-11-20/VM3.tar.gz on Cloudmin master ..
   Creating copy of LVM disk for VM3 ..
. backup failed : Failed to create copy of LVM disk  : dd: writing `/tmp/.webmin/505568_1_fastrpc.cgi/8961866a-71b0-4691-81a6-560388b2ba6b': No space left on device
65751+0 records in
65750+0 records out
2154500096 bytes (2.2 GB) copied, 41.0787 seconds, 52.4 MB/s

VM3 has an 8GB sparse virtual disk that is 44% full, so approx. 3.5GB of data. My backup destination is on an NFS share and has 37GB free:

[root@cloudmin ~]# df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/mapper/VG0-Root   18G  1.4G   16G   9% /
/dev/xvda1             99M   14M   80M  15% /boot
tmpfs                 512M     0  512M   0% /dev/shm
10.1.101.214:/home    1.1T  992G   37G  97% /backups

I think what may be happening is that Cloudmin is trying to back up to a temporary file on the Citrix XenServer host machine, which only has a 4GB root partition with about 2GB free. Could that be the case? That would explain why the first VM backup with less than 1GB data succeeds, but the 2nd fails.

Is this expected functionality? How can I get around this?
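
One way to test the suspicion above is to compare the free space in the temp directory with the amount of data to be copied before starting a backup. A minimal sketch, where has_space_for is a hypothetical helper and the amount of temp space Cloudmin really needs is an assumption (it may be closer to the full virtual size than to the used data):

```shell
#!/bin/sh
# Succeed if directory $1 has at least $2 bytes free, according to df.
has_space_for() {
    dir="$1"; need_bytes="$2"
    # df -P prints one POSIX-format line per filesystem; -k uses 1024-byte blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR==2 {print $4}')
    # Round the requirement up to whole kilobytes
    need_kb=$(( (need_bytes + 1023) / 1024 ))
    [ "$avail_kb" -ge "$need_kb" ]
}
```

For example, has_space_for /tmp 3779571220 || echo "not enough temp space" checks for roughly 44% of the 8589934592-byte disk; on a host with about 2GB free on its root partition this should report a shortfall.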

Cloudmin should actually be able to directly copy a VM's disks to the destination system via SSH, and avoid the need to save a local temp file..

However, this isn't currently possible for VMs that have multiple disks. How many does your VM3 system have, and are any used for swap?

OK, that is the problem. VM3 has 2 disks attached.

As a workaround, can I specify a temp directory on my NFS share for those (rare) VMs with multiple drives?

We are planning on moving ~100 VMs onto a Citrix XenServer/Cloudmin setup over the next couple months, and a few of those will need multiple drives attached.

The work-around is to edit the file /etc/webmin/server-manager/config on the Cloudmin master system, and add the line :

img_temp=/tmp/cloudmin

This will specify the temp directory that will be used on all host systems. On your Citrix hosts with limited disk space, you can symlink /tmp/cloudmin to a directory that has plenty of space.

The proper fix for this will be in the next Cloudmin release, which uses a different backup format that doesn't ever need to use temp space on the host system.
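
The workaround described above could be applied with something like the following. This is a sketch only: set_img_temp and link_temp_dir are hypothetical helper names, the first is meant for the Cloudmin master and the second for each host, and the directory with more space is whatever suits your setup.

```shell
#!/bin/sh
# Set (or replace) the img_temp line in a Cloudmin config file.
# $1 = config file, $2 = temp directory to use on the host systems
set_img_temp() {
    config="$1"; dir="$2"
    tmp=$(mktemp) || return 1
    # Drop any existing img_temp line, then append the new one
    grep -v '^img_temp=' "$config" > "$tmp"
    echo "img_temp=$dir" >> "$tmp"
    # Write back in place so the file keeps its owner and permissions
    cat "$tmp" > "$config" && rm -f "$tmp"
}

# Make $2 a symlink to $1, creating the target directory if needed.
# Assumes $2 does not already exist as a real directory.
link_temp_dir() {
    target="$1"; link="$2"
    mkdir -p "$target" || return 1
    # -n replaces an existing symlink instead of following it
    ln -sfn "$target" "$link"
}
```

On the master this would be set_img_temp /etc/webmin/server-manager/config /tmp/cloudmin, and on a host with a small root partition link_temp_dir followed by a directory on a larger filesystem and /tmp/cloudmin.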

Thanks for the workaround.

What's the ETA on the next release? Can we help test? Would rather fix this properly than work around. We're still in beta stage with a handful of non-critical VMs on the new setup.

If you are interested, I should be able to send you a beta version with the proper fix later today?

That would be great. It's almost midnight local time so no hurry, I won't be looking at it until tomorrow anyway. Thanks!

Ok, I have sent you a new Cloudmin RPM package via email ..

I just updated with the RPM and gave it a whirl. This time it is the Migrator VM which has 2 block devices attached (xvda, xvdb) and the backup worked perfectly!

In the backup dir I see two files for Migrator:

/backups/cloudmin/2011-11-22:
total 7.5G
-rw-r--r-- 1 root root 594M Nov 22 11:16 ams1-mgr-1.anu.net.gz
-rw-r--r-- 1 root root 393M Nov 22 11:19 ams1-vpn-1.anu.net.gz
-rw-r--r-- 1 root root 987M Nov 22 11:04 Migrator.gz
-rw-r--r-- 1 root root 362M Nov 22 11:07 Migrator.gz.1
-rw-r--r-- 1 root root 348M Nov 22 11:11 VM3.gz
-rw-r--r-- 1 root root 4.8G Nov 22 11:41 win7.anu.net.gz

Do you have plans to extend the interface to allow restoring just selected drives on a system? That would be a really useful feature.

Many thanks for all your support on this issue.

OK, that looks good. The new backup format puts each disk into a separate backup file ..
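
To see which per-disk files belong to one system, a small helper like this could be used. It is a sketch that assumes the naming seen in the listing above (NAME.gz for the first disk, NAME.gz.1 for the next) continues in the same way for further disks; list_backup_files is a hypothetical name.

```shell
#!/bin/sh
# Print the backup files for system $2 found in backup directory $1.
list_backup_files() {
    dir="$1"; name="$2"
    # First disk is NAME.gz; additional disks are assumed to be NAME.gz.N
    for f in "$dir/$name.gz" "$dir/$name.gz".*; do
        # An unmatched glob stays literal, so skip names that do not exist
        [ -e "$f" ] && echo "$f"
    done
    return 0
}
```

For example, list_backup_files /backups/cloudmin/2011-11-22 Migrator would print the paths of Migrator.gz and Migrator.gz.1 from the listing above.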

Single-disk restores are a nice idea - I'll look into it.