live migration doesn't work

Live Migration doesn't work as I hoped it would.

Look here:

Moving Xen system from usaxen03 to master1 ..

Checking if live migration is possible .. .. not possible when system will be shut down for moving

Shutting down on original host system (because LVM volume group is different) .. .. done

Copying disk images to new host system ..

  Copying disk /dev/fluid02/rudi1_img ..
  .. copied image of 1024 MB

  Copying disk /dev/fluid02/rudi1_swap ..
  .. copied image of 512 MB

.. copied 2 disk images totalling 1.50 GB

Fixing and copying Xen config file /xen/rudi1.cfg .. .. copied to /xen/rudi1.cfg

Deleting disk and config files from original host .. .. done

Configuring Xen instance for VNC console access .. .. added on dynamic port

Starting up on new host system .. Re-fetching system status .. .. done. New status is : Down

.. startup failed :

PTY PID: 5902 Using config file "/xen/rudi1.cfg". Error: Kernel image does not exist: /xen/vmlinuz-vm2-xenU

Status: 
Active

Comments

So it looks like there are two separate issues here:

  1. Live migration won't work unless the LVM volume group has the same name on both systems, or else the device names for disk images would change.

  2. The kernel file is missing on the new host system. This looks like a Cloudmin bug, but the work-around is to create a new Xen instance on the new host system, which will create the missing kernel file. You can then start up the migrated instance.

  1. Ok, I see where you're going with this, BUT, what about an option to check and see which LVM volume has space on the new host, and modify the xen config file accordingly, using the new LVM volume instead?

  2. This isn't really a good work-around. It sounds workable, but it doesn't actually work: since the existing VM name is already registered, I can't create another VM (even on the new server) with the same name. CloudMin won't allow it.

  1. The Xen config does get modified correctly in this case .. however, that won't help with live migration, as Xen doesn't have any ability to use a different disk device path during a live move.

  2. You don't have to create a new VM with the same name - just create any VM on the new host system, then delete it.
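
Both issues discussed above (a volume group name that differs between hosts, and a kernel image that is missing on the target) could in principle be caught before a move. Below is a rough sketch of such a check, not how Cloudmin itself does it. The function names `parse_xen_config` and `preflight` are made up for illustration, the `sda1`/`sda2` device names in the sample config are assumptions, and it assumes LVM-backed disks appear as `phy:/dev/<vg>/<lv>` entries like the paths in the log above.

```python
import os
import re

# Sample config text; the sda1/sda2 device names are assumed for illustration.
SAMPLE_CFG = '''kernel = "/xen/vmlinuz-vm2-xenU"
disk = ['phy:/dev/fluid02/rudi1_img,sda1,w', 'phy:/dev/fluid02/rudi1_swap,sda2,w']
'''

def parse_xen_config(text):
    """Return (kernel_path_or_None, sorted list of volume group names)."""
    kernel_match = re.search(
        r'^\s*kernel\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    kernel = kernel_match.group(1) if kernel_match else None
    # Only matches the /dev/<vg>/<lv> form; /dev/mapper/... paths are not handled.
    groups = sorted(set(re.findall(r'phy:/dev/([^/,\'"]+)/[^/,\'"]+', text)))
    return kernel, groups

def preflight(text, target_groups, kernel_exists=os.path.exists):
    """List problems that would block starting the guest on the target host.

    target_groups: volume group names known to exist on the target.
    kernel_exists: callable to test for the kernel file (run it on the target).
    """
    kernel, groups = parse_xen_config(text)
    problems = []
    for vg in groups:
        if vg not in target_groups:
            problems.append(f"volume group {vg} not present on target")
    if kernel and not kernel_exists(kernel):
        problems.append(f"kernel image missing on target: {kernel}")
    return problems

if __name__ == "__main__":
    # The volume group name passed here is a placeholder for the target's real one.
    for problem in preflight(SAMPLE_CFG, {"some_other_vg"}):
        print(problem)
```

The list of volume groups on the target would have to be collected separately (for example from the output of `vgs` on that host), and the kernel check only means something if it runs on the target host itself.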

So what do I need to do to get a fool-proof, no-downtime live migration setup between 2 Xen servers?

They need to both share the same storage, typically Clustered LVM. Also, the volume group name needs to be the same on both systems.

Thanx Jamie,

So, how do I set this up? What recommendations do you have that work with CloudMin? I don't see anything like that in the documentation.

And what about the fact that the 2 servers could be in different data centers? How would I perform a live migration in that case?

We have servers in 4 different data centers, and sometimes a client wants to move a VPS to a different DC - what do you suggest?

From what I understand of Cluster LVM, it can't be run across multiple datacenters .. so live migration isn't possible. However, I haven't actually ever set it up myself.

The closest you can get when moving between datacenters is to have Cloudmin freeze the state of the system, transfer across the disk files, then un-freeze it again on the new host. This isn't quite live migration in that it doesn't happen instantly, but it avoids the need to shut down a system.
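
To summarise the options described so far in this thread, here is a small sketch of the decision. It is only an illustration of the reasoning, not Cloudmin's actual logic, and the function name, parameter names and returned labels are all made up for this example.

```python
def choose_move_strategy(shared_storage, same_vg_name, can_freeze):
    """Pick how a Xen guest could be moved between two hosts.

    shared_storage: both hosts see the same storage (e.g. clustered LVM).
    same_vg_name:   the volume group has the same name on both hosts.
    can_freeze:     the guest's state can be saved and restored.
    """
    if shared_storage and same_vg_name:
        # Disk device paths stay identical, so a true live move is possible.
        return "live"
    if can_freeze:
        # Freeze state, copy the disks across, un-freeze on the new host.
        return "freeze-and-copy"
    # Fallback: shut down, copy the disks, start up on the new host.
    return "shutdown-and-copy"
```

For two hosts in different datacenters, `shared_storage` would be false, so the best available result is "freeze-and-copy".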

Hi Jamie,

Ok, so by the sounds of it, this isn't something you have set up before, and you don't know whether CloudMin can do it either?

Has anyone set up clustered LVM + CloudMin before?

Is it possible for CloudMin to work with SAN servers?

Other customers have used Cloudmin with cluster LVM - it basically behaves the same as regular LVM, but allows multiple systems to access the same volume group.

Any SAN that works with Linux LVM will work with Cloudmin. However, most customers just go with single-system LVM, as the additional administrative overhead of setting up some kind of storage cluster isn't worth it, given that the only real gain is a small reduction in down-time when moving Xen guests.

Hi Jamie,

I hear you, but this does raise a bit of a concern then :) How does one do a live migration, without a SAN in place?

And do you have any documentation / references / suggestions for setting up a SAN? Apart from the fact that it could lead to an IO bottleneck with many VMs, and that a SAN setup would require more expensive hardware, what other "pitfalls" would there be in using a SAN?

And wouldn't this help to make the whole CloudMin setup more redundant?

I can't really comment much on SAN setup, as I've never done it - Cloudmin can make use of a SAN, but doesn't yet do the work of setting one up for you. Some kind of shared storage like this is needed for live migration to work. Another alternative is NFS, which does work and requires less specialized hardware, but at the cost of performance.

I would suggest reading up on "cluster lvm" for more information.

OK, so live migration and, more importantly, high availability aren't currently possible with CloudMin.

Live migration is possible, but only if you have shared storage set up, which has to be done outside of Cloudmin.