Cloudmin doesn't always cleanly remove a KVM instance; for me the removal fails to clean up properly about 90% of the time.

When I try a manual cleanup after the fact from the Webmin GUI (the logical volume(s) is/are still there), it fails with:

Failed to delete logical volume : Logical volume KVM_guests/mon3_cloudmin_brantham_ca_img is used by another device.

If I then try lvremove or lvchange -an on /dev/KVM_guests/mon3_cloudmin_brantham_ca_img, it fails with the same error:

Logical volume KVM_guests/mon3_cloudmin_brantham_ca_img is used by another device.
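As far as I can tell (my interpretation, not from any documentation), "used by another device" from LVM means another device-mapper node is stacked on top of the LV, which lsof/fuser won't necessarily show. The stacking can be inspected with dmsetup, e.g.:

# show name, open count and dependencies for every mapper device
dmsetup info -c
# show the device-mapper stacking as a tree
dmsetup ls --tree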

If I then go into parted, I'll see there's at least one partition on it. I remove the partition(s), and then quit parted.
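For reference, this is roughly the sequence I use (the partition number is just from my run):

parted /dev/KVM_guests/mon3_cloudmin_brantham_ca_img
(parted) print
(parted) rm 1
(parted) quit

In hindsight this only edits the on-disk partition table; it doesn't tear down any /dev/mapper partition mappings, which I suspect is why it doesn't help.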

I try it again. Still no dice.

I check /dev/mapper:

lrwxrwxrwx 1 root root    8 Nov 26 08:21 KVM_guests-mon3_cloudmin_brantham_ca_img -> ../dm-13
brw-rw---- 1 root disk 253, 14 Nov 26 08:09 KVM_guests-mon3_cloudmin_brantham_ca_imgp2

It appears /dev/mapper sees a partition that parted doesn't see (KVM_guests-mon3_cloudmin_brantham_ca_imgp2).
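To confirm that the leftover p2 mapping is what's holding the LV open, sysfs can be checked (dm-13 matches the symlink target above; dm-14 would be the 253,14 device):

# list the dm nodes stacked on top of the LV's dm-13 node
ls /sys/block/dm-13/holders
# if the stale imgp2 mapping is the culprit, dm-14 should show up here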

There's no reference to 253,13 (dm-13) or 253,14 (dm-14) in the lsof output, either.
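Presumably (this is an assumption on my part; I don't know exactly how Cloudmin creates the mapping, but kpartx is the usual tool) the stale partition mapping could be torn down directly instead of rebooting:

# remove all partition mappings for the LV, verbosely
kpartx -dv /dev/KVM_guests/mon3_cloudmin_brantham_ca_img
# or remove just the one stale mapping by name
dmsetup remove KVM_guests-mon3_cloudmin_brantham_ca_imgp2
# after which the lvremove should go through
lvremove /dev/KVM_guests/mon3_cloudmin_brantham_ca_img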

So far the only reliable fix I've found is to reboot the Cloudmin server. Obviously that's a horrible solution: it means taking down the host OS and every virtual machine on it, then bringing it all back up, disturbing numerous customers.

The thing is, Cloudmin should be closing out any open files, mounts, and partition mappings on the logical volumes when I remove the instance. If it were, I wouldn't have to go through this exercise.

Before it's suggested: yes, the instance was shut down before I tried removing the virtual machine.

Status: 
Active

Comments

Yeah, this sounds like a bug - Cloudmin does try to clean up any mapper devices for partitions on the disks of VMs that it deletes to avoid exactly this problem, but that can fail if some other process is using the device.

If you run the fuser command on the device file for the disk or one of its partitions, does it return any process IDs?
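For reference, with the device names from the report above, that would be something like:

# report processes (verbosely) holding the LV or the stale partition mapping open
fuser -v /dev/KVM_guests/mon3_cloudmin_brantham_ca_img
fuser -v /dev/mapper/KVM_guests-mon3_cloudmin_brantham_ca_imgp2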