Submitted by chriswik on Fri, 11/18/2011 - 06:06 Pro Licensee
Restoring 1 systems from destinations from host systems ..
Finding systems to restore ..
.. found 1 systems
Working out backup sources ..
.. found 1 usable sources
Restoring Migrator from /backups/cloudmin/2011-11-18/Migrator.gz on Cloudmin master ..
.. restore failed : Failed to connect VDI c8160941-b307-42c8-a3f4-aff5e8f93086 to host 962afe18-5cf1-4f74-9f33-047bf9351b99 : A device with the name given already exists on the selected VM device: xvdb
Restores of 0 systems completed, but 1 failed!
Migrator : Failed to connect VDI c8160941-b307-42c8-a3f4-aff5e8f93086 to host 962afe18-5cf1-4f74-9f33-047bf9351b99 : A device with the name given already exists on the selected VM device: xvdb
Status:
Closed (fixed)
Comments
Submitted by andreychek on Fri, 11/18/2011 - 09:24 Comment #1
Submitted by chriswik on Fri, 11/18/2011 - 10:44 Pro Licensee Comment #2
Just an update, the backup process is also broken now. I have no idea what has happened. Since it last worked, I have rebooted our test cluster (both Citrix Xen Servers). Perhaps some device or folder is missing. Everything else is working normally.
I just tried running a backup and got this:
Backing up 5 systems to /backups/cloudmin/%Y-%m-%d on Cloudmin master ..
....
Submitted by JamieCameron on Fri, 11/18/2011 - 12:31 Comment #3
Does rebooting the host system perhaps fix this? It looks like some device from the VM is still connected to the host ..
Submitted by chriswik on Fri, 11/18/2011 - 15:13 Pro Licensee Comment #4
Thanks for the suggestion, but I already tried this to no avail.
So from what I can gather poking around the Citrix XenServer system, the VDI is the virtual disk of the "Migrator" VM it's trying to back up, and the host ID corresponds to the UUID of the control domain on the XenServer box:
# xe vdi-list uuid=c8160941-b307-42c8-a3f4-aff5e8f93086
uuid ( RO) : c8160941-b307-42c8-a3f4-aff5e8f93086
name-label ( RW): VM 2 0
name-description ( RW): Created by template provisioner
sr-uuid ( RO): d4f8f4fa-68d2-f895-afab-93da9af51e00
virtual-size ( RO): 8589934592
sharable ( RO): false
read-only ( RO): false
# xe vm-list uuid=962afe18-5cf1-4f74-9f33-047bf9351b99
uuid ( RO) : 962afe18-5cf1-4f74-9f33-047bf9351b99
name-label ( RW): Control domain on host: ams1-xs-1.anu.net
power-state ( RO): running
But when I look at the disks on the control domain, I only see sda, the SATA RAID that the OS is installed on:
[root@ams1-xs-1 ~]# fdisk -l
WARNING: GPT (GUID Partition Table) detected on '/dev/sda'! The util fdisk doesn't support GPT. Use GNU Parted.
Disk /dev/sda: 399.9 GB, 399988752384 bytes
256 heads, 63 sectors/track, 48439 cylinders
Units = cylinders of 16128 * 512 = 8257536 bytes
Device Boot Start End Blocks Id System
/dev/sda1 * 1 48440 390614015+ ee EFI GPT
Mounted filesystems are just the local RAID and the NFS shares for the guests:
[root@ams1-xs-1 ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 4.0G 2.0G 1.9G 52% /
none 380M 0 380M 0% /dev/shm
/opt/xensource/packages/iso/XenCenter.iso
44M 44M 0 100% /var/xen/xc-install
10.1.101.80:/Xen/d4f8f4fa-68d2-f895-afab-93da9af51e00
100G 32G 69G 32% /var/run/sr-mount/d4f8f4fa-68d2-f895-afab-93da9af51e00
10.1.101.80:/ISOs 10G 3.0G 7.1G 30% /var/run/sr-mount/8f1ac82b-7816-eab7-3fdd-7343d5e16350
Can you tell me what Cloudmin is actually trying to do when that error is produced, so I could try it manually?
Submitted by JamieCameron on Sat, 11/19/2011 - 00:19 Comment #5
My guess is that the virtual disk is still attached to the host system.
You can get the host's UUID with the command
xe vm-list
, and look for the "Control domain" VM. You can then get the UUIDs of attached VBDs with the command
xe vbd-list vm-uuid=XXX
where XXX is the UUID of the VM. You can then detach it from the host with the commands
xe vbd-unplug uuid=YYYY
and
xe vbd-destroy uuid=YYYY
where YYYY is the VBD's UUID.
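Putting that together, the whole sequence would look something like this when run on the XenServer host (XXX and YYYY are just placeholders for the real UUIDs):
# Find the control domain's UUID
xe vm-list is-control-domain=true --minimal
# List the VBDs attached to it, and note the uuid of the stale entry (device xvdb in your case)
xe vbd-list vm-uuid=XXX
# Unplug the stale VBD (it may report that it is not currently attached), then remove it
xe vbd-unplug uuid=YYYY
xe vbd-destroy uuid=YYYY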
Submitted by chriswik on Sat, 11/19/2011 - 04:04 Pro Licensee Comment #6
[root@ams1-xs-1 ~]# xe vbd-list vm-uuid=962afe18-5cf1-4f74-9f33-047bf9351b99
uuid ( RO) : 0a373c01-64bb-136a-80c3-e5e7243cba37
vm-uuid ( RO): 962afe18-5cf1-4f74-9f33-047bf9351b99
vm-name-label ( RO): Control domain on host: ams1-xs-1.anu.net
vdi-uuid ( RO): 40ade37f-4513-44eb-8317-f07ec9430e80
empty ( RO): false
device ( RO): xvdb
[root@ams1-xs-1 ~]# xe vbd-unplug uuid=0a373c01-64bb-136a-80c3-e5e7243cba37
The device is not currently attached
device: 0a373c01-64bb-136a-80c3-e5e7243cba37
[root@ams1-xs-1 ~]# xe vbd-destroy uuid=0a373c01-64bb-136a-80c3-e5e7243cba37
[root@ams1-xs-1 ~]#
The destroy command worked, and I have just run a successful restore! I will check now that backups work too.
Thanks!!
Submitted by chriswik on Sat, 11/19/2011 - 04:05 Pro Licensee Comment #7
Submitted by chriswik on Mon, 11/21/2011 - 04:19 Pro Licensee Comment #8
Sorry, still not completely solved. The problem appears to be that the backup process is failing due to lack of disk space:
System Result Size
-------------------- --------------------------------------------- ----------
Migrator OK 986.32 MB
VM3 Failed to create copy of LVM disk : dd: writ -
ams1-mgr-1.anu.net Failed to connect VDI 3892dcca-8225-4b4a-bc2c -
ams1-vpn-1.anu.net Failed to connect VDI c7eaa7a3-303e-4ea6-9dfb -
win7.anu.net Failed to connect VDI 8f049719-0d05-4b32-a9a6 -
Finding systems to backup ..
. found 5 systems
Working out backup destinations ..
. found 5 usable destinations
Backing up Migrator to /backups/cloudmin/2011-11-20/Migrator.gz on Cloudmin master ..
Compressing LVM disk for Migrator via SSH to cloudmin.anu.net ..
. created backup of 986.32 MB
Backing up VM3 to /backups/cloudmin/2011-11-20/VM3.tar.gz on Cloudmin master ..
Creating copy of LVM disk for VM3 ..
. backup failed : Failed to create copy of LVM disk : dd: writing `/tmp/.webmin/505568_1_fastrpc.cgi/8961866a-71b0-4691-81a6-560388b2ba6b': No space left on device
65751+0 records in
65750+0 records out
2154500096 bytes (2.2 GB) copied, 41.0787 seconds, 52.4 MB/s
VM3 has an 8GB virtual disk, a sparse image that is 44% full, so approx. 3.5GB of data. My backup destination is on an NFS share and has 37GB free:
[root@cloudmin ~]# df -h
Filesystem Size Used Avail Use% Mounted on
/dev/mapper/VG0-Root 18G 1.4G 16G 9% /
/dev/xvda1 99M 14M 80M 15% /boot
tmpfs 512M 0 512M 0% /dev/shm
10.1.101.214:/home 1.1T 992G 37G 97% /backups
I think what may be happening is that Cloudmin is trying to back up to a temporary file on the Citrix XenServer host machine, which only has a 4GB root partition with about 2GB free. Could that be the case? That would explain why the first VM backup with less than 1GB data succeeds, but the 2nd fails.
Is this expected functionality? How can I get around this?
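A rough way to check whether that's where the space is going on the XenServer host (using the paths from the dd error above) would be:
# How much space is left on the filesystem holding the temp copy, and
# whether any partial copies were left behind by the failed backups
df -h /tmp
du -sh /tmp/.webmin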
Submitted by JamieCameron on Mon, 11/21/2011 - 15:05 Comment #9
Cloudmin should actually be able to directly copy a VM's disks to the destination system via SSH, and avoid the need to save a local temp file..
However, this isn't currently possible for VMs that have multiple disks. How many does your VM3 system have, and are any used for swap?
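If it's quicker from the command line, something like this on the XenServer host should show the attached disks (the name-label here is just the one from this ticket):
# List only the disk (not CD) VBDs attached to the VM3 guest
xe vbd-list vm-name-label=VM3 type=Disk params=device,vdi-uuid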
Submitted by chriswik on Mon, 11/21/2011 - 15:31 Pro Licensee Comment #10
OK, that is the problem. VM3 has 2 disks attached.
As a workaround, can I specify a temp directory on my NFS share for those (rare) VMs with multiple drives?
We are planning on moving ~100 VMs onto a Citrix XenServer/Cloudmin setup over the next couple months, and a few of those will need multiple drives attached.
Submitted by JamieCameron on Mon, 11/21/2011 - 15:57 Comment #11
The work-around is to edit the file
/etc/webmin/server-manager/config
on the Cloudmin master system, and add the line:
img_temp=/tmp/cloudmin
This will specify the temp directory that will be used on all host systems. On your Citrix hosts with limited disk space, you can symlink
/tmp/cloudmin
to a directory that has plenty of space.
The proper fix for this will be in the next Cloudmin release, which uses a different backup format that never needs temp space on the host system.
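Roughly, the whole workaround would look like this (the SR mount point below is just an example taken from your earlier df output - any location with enough free space will do):
# On the Cloudmin master: set the temp directory used on all host systems
echo 'img_temp=/tmp/cloudmin' >> /etc/webmin/server-manager/config

# On each Citrix host with a small root partition: point /tmp/cloudmin at
# a directory with plenty of space, e.g. on the NFS-backed SR
mkdir -p /var/run/sr-mount/d4f8f4fa-68d2-f895-afab-93da9af51e00/cloudmin-tmp
ln -s /var/run/sr-mount/d4f8f4fa-68d2-f895-afab-93da9af51e00/cloudmin-tmp /tmp/cloudmin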
Submitted by chriswik on Mon, 11/21/2011 - 16:02 Pro Licensee Comment #12
Thanks for the workaround.
What's the ETA on the next release? Can we help test? We'd rather fix this properly than work around it. We're still in the beta stage with a handful of non-critical VMs on the new setup.
Submitted by JamieCameron on Mon, 11/21/2011 - 16:48 Comment #13
If you are interested, I should be able to send you a beta version with the proper fix later today?
Submitted by chriswik on Mon, 11/21/2011 - 16:50 Pro Licensee Comment #14
That would be great. It's almost midnight local time so no hurry, I won't be looking at it until tomorrow anyway. Thanks!
Submitted by JamieCameron on Mon, 11/21/2011 - 23:45 Comment #15
Ok, I have sent you a new Cloudmin RPM package via email ..
Submitted by chriswik on Tue, 11/22/2011 - 05:05 Pro Licensee Comment #16
I just updated with the RPM and gave it a whirl. This time it was the Migrator VM that had 2 block devices attached (xvda, xvdb), and the backup worked perfectly!
In the backup dir I see two files for Migrator, one per disk:
/backups/cloudmin/2011-11-22:
total 7.5G
-rw-r--r-- 1 root root 594M Nov 22 11:16 ams1-mgr-1.anu.net.gz
-rw-r--r-- 1 root root 393M Nov 22 11:19 ams1-vpn-1.anu.net.gz
-rw-r--r-- 1 root root 987M Nov 22 11:04 Migrator.gz
-rw-r--r-- 1 root root 362M Nov 22 11:07 Migrator.gz.1
-rw-r--r-- 1 root root 348M Nov 22 11:11 VM3.gz
-rw-r--r-- 1 root root 4.8G Nov 22 11:41 win7.anu.net.gz
Do you have plans to extend the interface to allow restoring just selected drives on a system? That would be a really useful feature.
Many thanks for all your support on this issue.
Submitted by JamieCameron on Tue, 11/22/2011 - 11:49 Comment #17
OK, that looks good. The new backup format puts each disk into a separate backup file ..
Single-disk restores are a nice idea - I'll look into it.