Network connection lost during VM move

Hello,

We were transferring a KVM virtual machine from one host to another.

For some reason, the second host lost network connection during the transfer.

Now we have a situation where the LVM volume wasn't completely copied to the second host, but Cloudmin "thinks" the VM is already on the second host.

The LVM volume is still present on the first host.

What would be the procedure to retry the transfer?

Status: 
Closed (fixed)

Comments

Idea: could we manually restart the LVM volume transfer via dd?

What would be the commands to issue on both servers?
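
For reference, a manual copy of this kind is usually done by piping dd through ssh. The following is only a rough sketch under stated assumptions: the volume path /dev/vg0/myvm_img and the host name second-host are hypothetical placeholders, the VM is shut down, and a destination volume of the same size already exists at the same path. It copies the data only; it does not change Cloudmin's record of which host the VM lives on.

```shell
#!/bin/sh
# Sketch only: the volume path and host name used below are hypothetical.

# Print the SHA-256 of a file or block device, without the file name.
volume_checksum() {
    sha256sum "$1" | cut -d' ' -f1
}

# Stream a stopped VM's volume to the same path on another host over ssh.
# The destination volume must already exist and have the same size.
copy_volume_to_host() {
    src=$1
    host=$2
    dd if="$src" bs=4M | ssh "$host" "dd of='$src' bs=4M conv=fsync"
}

# Copy, then compare checksums on both ends; returns non-zero on mismatch.
copy_and_verify() {
    src=$1
    host=$2
    copy_volume_to_host "$src" "$host" || return 1
    local_sum=$(volume_checksum "$src")
    remote_sum=$(ssh "$host" "sha256sum '$src'" | cut -d' ' -f1)
    [ "$local_sum" = "$remote_sum" ]
}

# Example call (hypothetical names), run as root on the first host:
#   copy_and_verify /dev/vg0/myvm_img second-host
```

The checksum comparison is there because a dropped connection can leave a partial copy behind without any obvious error, so a finished dd alone does not prove the volume arrived intact.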

One thing you could do is tell Cloudmin that the KVM instance is still on the original system, so that you can re-try the move to the new system.

This can be done from the command line with a command like:

vm2 move-system --host your-kvm-instance --dest original-host --manual

Yes, it worked. The VM is back on the first host. I could start it to verify its filesystem. Now I will start the transfer over again.

Thanks!

Suggestion: shouldn't Cloudmin change the VM's config files only after the move procedure has completed?

Yes, that is what is supposed to happen. However, if the transfer were to fail in such a way that it appeared to succeed, Cloudmin's configs might be improperly updated.

Do you happen to have a record of exactly what appeared on the move page?

Unfortunately, I didn't save the output...

Here's the event sequence:

The network connection was dropped between the two hosts. The transfer stalled, but Cloudmin was still showing the same message.

I ran "ps aux | grep dd" and saw that the dd processes were still up on both hosts. So I killed those processes (there is no "Interrupt transfer" button in Cloudmin...).

Immediately after that, Cloudmin displayed a message, something like "Transfer completed" and "Parent host unavailable", in red.

The LVM volume was still present on the first host and also appeared on the second host.

The VM's IP address was from the second host's range, and the VM was listed as hosted on the second host.

Hope it helps.

That seems like a Cloudmin bug. It should have detected the transfer as having failed after dd was killed.

Do you happen to recall what it was displaying just before you killed dd?

Not exactly, but I think it was still displaying the message saying that Cloudmin was transferring the filesystem.

I didn't see an error message, besides the one saying the parent host was unreachable.

Sorry I didn't save the output...

OK, I will take a closer look at the code and figure out why this failed transfer was considered successful.

But since the original issue has been fixed, I'll close this bug.

Automatically closed -- issue fixed for 2 weeks with no activity.