Cloudmin backups not working, issue error

Hi there,

I have Cloudmin set up to back up my instances to a backup server. I noticed that, even though I was not sent any alerts saying there was a problem, the backups were not being created. I went into the backup definition and told it to "backup now". This is what I see:

HTTP/1.0 500 Perl execution failed
Server: MiniServ/0.01
Date: Mon, 4 Jan 2010 13:50:54 GMT
Content-type: text/html
Connection: close

Error - Perl execution failed

Undefined subroutine &server_manager::execute_commmand_on_server called at /usr/libexec/webmin/server-manager/xen-type-lib.pl line 4699, line 4.

Any ideas?

Thanks!

Status: 
Closed (fixed)

Comments

Thanks - this is a typo in the Cloudmin code.

The fix is to edit the file /usr/libexec/webmin/server-manager/xen-type-lib.pl and remove one of the "m" characters from commmand; that is, change the function name to execute_command_on_server

I will include this fix in the next Cloudmin release (3.8)
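For anyone who would rather not edit the file by hand, the same one-character fix could be applied with sed along these lines. This is only a sketch: the fix_typo helper name and the .bak backup suffix are assumptions for illustration, not part of Cloudmin.

```shell
# Hypothetical helper: correct the misspelled function name in the given file,
# keeping a copy of the original next to it with a .bak suffix.
fix_typo() {
    sed -i.bak 's/execute_commmand_on_server/execute_command_on_server/g' "$1"
}

# On the Cloudmin master this would be run as root:
# fix_typo /usr/libexec/webmin/server-manager/xen-type-lib.pl
```

If something goes wrong, the untouched original is still available as xen-type-lib.pl.bak.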

OK, will do.

A key issue though: Shouldn't an error of some sort pop up somewhere? I had left the backups running and nothing ever seemed wrong until I tried to run the backup manually. Anyhow, just a thought...

Thanks.

In this case, because the failure is in the Cloudmin code the only place the error will appear is in root's mailbox.

A regular backup failure would send an email notification.

That makes sense.

OK, more issues,

I get the following:


Backing up amon.xen.domain.com to /backups/vps/xen/amon.xen.domain.com.tar.gz on Cloudmin master ..
Creating LVM snapshots for disks of amon.xen.domain.com ..
Creating copies of LVM disk for amon.xen.domain.com ..
TARing up copies of LVM disks for amon.xen.domain.com ..
.. backup failed : Failed to create TAR file of filesystem : gzip: stdout: No space left on device
Backups of 0 systems completed, but 1 failed!
amon.xen.domain.com : Failed to create TAR file of filesystem : gzip: stdout: No space left on device

My first question is which device is it referring to? The device on the host node? Or on the backup server?

I think it's the host node it's referring to. If that's the case, how do you back up instances that are more than half the size of the host node's disk space?

Thanks.

By the way, I checked the master's root email. There are no error messages there regarding the backup error.

Thanks.

It's the host node. Unfortunately, it needs to have enough disk space free to create a copy of the Xen instance's logical volumes. The only way to avoid this is to not use LVM, sorry .. I've looked into a number of possible ways to tar up the LVs directly, but none were possible :-(
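Before starting a backup, it is possible to check by hand whether the host node has room for the copy. The following is a minimal sketch, assuming the copy is written under a directory such as /tmp (the thread mentions /tmp further down); the has_free_kb helper is hypothetical and not something Cloudmin provides.

```shell
# Hypothetical helper: succeed only if the filesystem holding directory $1
# has at least $2 kilobytes available. With POSIX "df -Pk" output, the
# fourth column of the second line is the available space in 1K blocks.
has_free_kb() {
    avail=$(df -Pk "$1" | awk 'NR == 2 { print $4 }')
    [ "$avail" -ge "$2" ]
}

# Example: a 10 GiB logical volume needs about 10 * 1024 * 1024 KB of scratch space.
# has_free_kb /tmp 10485760 || echo "not enough room in /tmp for the copy"
```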

Not sure I understand the issue completely, but this seems like a major problem. We will effectively only be able to use half the available disk space on any one host server for virtual instances.

So you need enough disk space to create the LV snapshot AND enough disk space to tar up the snapshot. Is that right?

If we choose to not use LVM, what does that mean in terms of the overall backup process? How will it be different? How will the entire cloudmin setup be different for that matter without LVM?

What if you instead use rsync to copy an instance over to the backup server, then tar it over there? Just trying to think around this, this is a surprise for me at the moment... :-(

Thanks.

Hi Jamie,

I also notice that after the backups failed, the snapshots are still there.

You might want to also think through what to do with the snapshots if the backups fail. I assume the next step is to delete the snapshots.

lvremove /dev/VolGroup00/amon_swap_cloudmin_snap
lvremove /dev/VolGroup00/amon_img_cloudmin_snap

Does that look right to you?

Thanks!

OK, so this is what I am thinking.

You take the snapshot. The snapshot is much smaller than the original volume and only really grows as the original volume changes. As backups are done shortly after the snapshot, we anticipate that the snapshot will not be too big.

Then, you rsync the backup over to the backup server. This would make it so that you don't have to tar the file on the original host server. Once the files are on the backup server, then you can tar it to your heart's content. This of course means that the backups would be limited to ssh/rsync type backups. FTP would be out of the question in that case...

Anyhow, just trying to help. I still hope you will answer my questions in previous posts.

Cheers, Pablo

You only need to have enough disk space free for your largest Xen instance, since they are backed up one by one. Most users have a number of small Xen guests on each host node, so this isn't too expensive.

The idea of using rsync to copy the snapshot is a good one, but will rsync actually work on device files like LVM logical volumes? I think that it won't ... certainly tar doesn't, as it just saves the block special file in the archive instead. That's why Cloudmin has to first copy it to a temporary file.

Regarding those leftover LVM snapshots, they may be a side-effect of the first problem you reported, where the backup failed without being able to clean up. If you delete them and re-run a backup, do they re-appear?

Leftover LVM snapshots: I deleted the snapshots, then re-ran the backup and after failure checked again, the snapshots are indeed still there. I have to manually delete them. So it looks like something is not catching the error in the backup script and taking care of those snapshots.
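The behaviour being asked for here (remove the snapshot whether or not the backup step succeeds, and still report the original failure) can be sketched in shell as follows. This is an illustration only, not Cloudmin's actual code: the run and backup_with_cleanup helpers and the DRY_RUN switch are assumptions.

```shell
# Sketch only: "run" prints commands when DRY_RUN=1 instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical wrapper: $1 is the snapshot device, the remaining arguments
# are the command that performs the actual backup step.
backup_with_cleanup() {
    snap=$1; shift
    "$@"                      # run the backup step
    status=$?                 # remember whether it failed
    run lvremove -f "$snap"   # always remove the snapshot afterwards
    return "$status"          # report the backup step's own result
}
```

With DRY_RUN=1, calling backup_with_cleanup /dev/VolGroup00/amon_img_cloudmin_snap false prints the lvremove command and returns a non-zero status, which mirrors the failing case described above.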

Rsync: I was more thinking of rsyncing the contents of the LVM snapshot, not the device file itself. So, if you were to mount the snapshot as a file system and then rsync the contents across to the backup server, that would effectively perform the backup, no?

Thanks!

I've considered another possible solution here - if your Xen guests have only one disk, Cloudmin could create a compressed backup file directly from the LVM snapshot, which would use up much less space on the host system. I'll see if I can implement that for the next release.

Regarding the leftover snapshots, I found a bug that can cause this, and will fix it in the next Cloudmin release.

The next Cloudmin release (3.7) will include a change in the backup format to reduce the amount of temporary disk space needed, for single-disk Xen instances on LVM.

Hi Jamie,

I am curious just how much space will be required for the temporary space. As I have it now, I only really have enough space for the base OS and a little extra. I allocate the rest of the space I need for each VM as I require it through LVM.

Also, you mentioned this new piece will be available with Cloudmin 3.7, but I am already on Cloudmin 3.7. A bit confusing :-)

Thanks for taking on this issue so proactively. I hope I can continue to provide you with valuable feedback.

Cheers!

Sorry, I meant the 3.8 release.

That will reduce disk space needed by creating a compressed file directly from the logical volume, which should be much smaller than the un-compressed volume. In fact, the current Cloudmin release creates both un-compressed and compressed copies on the host system.

Hi Jamie,

I still don't think that's the optimal solution. I really wish you would consider mounting the snapshot and rsyncing or FTPing or whatever the files over to the backup server from there.

Basically, having to write something to disk seems dangerous for servers that are close to disk capacity. Again, my Xen node has limited disk space in the boot device. All leftover space is waiting to be allocated as LVM volumes. That being the case, I am asking how much space your method will require.

I have done some Google searching and found that this is the usual way of handling LVM snapshot backups to remote servers: create snapshot -> mount it -> back up the files on the mount.
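As a rough sketch of that sequence, assuming the logical volume holds a filesystem directly rather than a partitioned disk image: the helper names, the 1G snapshot size, the /mnt mount point and the DRY_RUN switch below are all assumptions for illustration, and none of this is Cloudmin code.

```shell
# Sketch only: "run" prints commands when DRY_RUN=1 instead of executing them.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then echo "$*"; else "$@"; fi
}

# Hypothetical file-level backup of one logical volume.
#   $1 volume group, $2 logical volume, $3 rsync destination
snapshot_rsync_backup() {
    vg=$1; lv=$2; dest=$3
    snap="${lv}_backup_snap"
    mnt="/mnt/$snap"

    # 1. Create the snapshot (the size only has to hold changes made during the backup).
    run lvcreate --snapshot --size 1G --name "$snap" "/dev/$vg/$lv" || return 1
    run mkdir -p "$mnt"
    # 2. Mount it read-only and copy the files across.
    if run mount -o ro "/dev/$vg/$snap" "$mnt"; then
        run rsync -a "$mnt/" "$dest"
        status=$?
        run umount "$mnt"
    else
        status=1
    fi
    # 3. Remove the snapshot whether or not the copy worked.
    run lvremove -f "/dev/$vg/$snap"
    return "$status"
}
```

Running it with DRY_RUN=1 first, for example snapshot_rsync_backup VolGroup00 amon_img backup@host:/backups/amon/, shows the commands without touching any volumes.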

Regardless of what you decide, thanks again for taking the time to work on this.

Thanks!

There is another optimization I could perform, which is to pipe the gzipped file directly into ssh to copy to the target system. The command would be something like :

dd if=/dev/volumegroup/xen_lv | gzip -c | ssh user@host "cat >/backup/xeninstance.gz"

Jamie, that sounds like exactly the right solution. You would not need to write anything at all to the local disk. Awesome.

Of course, I imagine you have to think about how to catch errors... ;-)
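On catching errors: in a plain shell pipeline only the exit status of the last command is reported, so a failing dd would go unnoticed. One way around that, assuming bash is available, is the pipefail option. The stream_backup helper below is a hypothetical sketch, not what Cloudmin does internally.

```shell
# Requires bash (for "set -o pipefail").
# Hypothetical helper: stream a device (or file) through gzip into a sink
# command and fail if ANY stage of the pipeline fails.
#   $1 source device, remaining arguments: command that stores its stdin
stream_backup() (
    # The function body is a subshell, so pipefail does not leak to the caller.
    set -o pipefail
    src=$1; shift
    dd if="$src" bs=1048576 2>/dev/null | gzip -c | "$@"
)

# Example, using the placeholder names from the command above:
# stream_backup /dev/volumegroup/xen_lv ssh user@host "cat >/backup/xeninstance.gz" \
#     || echo "backup failed"
```

Note that a partial file may still be left on the target when the pipeline fails, so the caller would want to remove or rename it in that case.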

But absolutely, great approach!

Thanks again.

I'll work on this for Cloudmin 3.8 ..

Cloudmin 3.9 will support direct backups via SSH with no intermediate files at all, for Xen instances with a single disk and Xen instances not using LVM.

Automatically closed -- issue fixed for 2 weeks with no activity.

Hi Jamie,

I just wanted to let you know that the backups are not working. I am still getting disk space errors after updating Cloudmin to 3.9. Here is my log:

Backing up 1 systems to destinations from host systems ..

  Finding systems to backup ..
  .. found 1 systems

  Working out backup destinations ..
  .. found 1 usable destinations

  Backing up amon.xen.demonio.com to /backups/vps/xen/amon.xen.demonio.com.tar.gz on Cloudmin master ..
        Creating LVM snapshots for disks of amon.xen.demonio.com ..
        Creating copies of LVM disk for amon.xen.demonio.com ..
        Removing LVM snapshots for disks of amon.xen.demonio.com ..
  .. backup failed : Failed to create copy of LVM disk /dev/VolGroup00/amon_img_cloudmin_snap : dd: writing `/tmp/.webmin/687618_1_fastrpc.cgi/amon_img': No space left on device 164466+0 records in 164465+0 records out 5389209600 bytes (5.4 GB) copied, 131.074 seconds, 41.1 MB/s

Backups of 0 systems completed, but 1 failed!

amon.xen.demonio.com : Failed to create copy of LVM disk /dev/VolGroup00/amon_img_cloudmin_snap : dd: writing `/tmp/.webmin/687618_1_fastrpc.cgi/amon_img': No space left on device 164466+0 records in 164465+0 records out 5389209600 bytes (5.4 GB) copied, 131.074 seconds, 41.1 MB/s

Can you help?

Have you tried updating to 4.0, which is now out? It fixes a few bugs in this area ..

Thanks Jamie, that fixed it! Cheers, Pablo

I got that in Virtualmin when trying to restore multiple domains. One fails with this error:

Restoring AWstats configuration file .. .. done

Re-creating mail and FTP users ..

HTTP/1.0 500 Perl execution failed
Server: MiniServ/0.01
Date: Tue, 4 May 2010 02:00:41 GMT
Content-type: text/html
Connection: close

Error - Perl execution failed

Modification of non-creatable array value attempted, subscript -1 at /usr/share/webmin/postfix/postfix-lib.pl line 488.

any ideas?

netzsolutions.de - that looks like a separate bug, related to Virtualmin and not Cloudmin (which is what this bug report was originally about).

Could you open a separate ticket for this awstats issue?

Automatically closed -- issue fixed for 2 weeks with no activity.

Hi Jamie,

I wanted to revisit this issue once again.

It turns out that I am creating VMs that have more than one disk attached to them. The backups do not work since this fix was specifically made for single-disk VMs.

Is there any chance that the backup routine can be looked at once again to try to get around the disk space issues?

Thank you.

Is this second disk just for swap, or do you have it mounted as an actual filesystem?

It is actually mounted as a file system.

Ok .. so in that case, if you have multiple disks on LVM, Cloudmin cannot create a single backup tar file that contains all disk images. This means it needs to make a copy of each to local disk (typically under /tmp) on the host system, then tar up those files.

Because there isn't really any easy solution to this issue, my suggestion would be to make your /tmp filesystem much larger if possible - perhaps even make it a mount from the same LVM volume group you use for virtual system disks.

Hi Jamie,

Thanks for your reply.

Let me ask a general question:

If a VM has more than one mounted file system, will I still be able to move the server across Xen hosts?

If so, wouldn't it be possible to use a similar mechanism for backups?

I guess the question is: If I move the VM to another VM host, is it also necessary to have a large /tmp folder?

I imagine there's a reasonable explanation for why it's not the same thing, but I am really hoping that we can find a way to work around the problem.

Thanks for a great product!

Cheers, Pablo

Yes, moves are supported for VMs that have multiple disks, because the move is done one disk at a time.

For backups Cloudmin needs to create a single file, which is why a different method has to be used.

Is there any reason why you can't have a larger /tmp filesystem on your host systems?

Jamie,

The reason I can't increase /tmp is simply because we do not have enough disk space. As I wrote previously in this ticket, based on the current backup scheme, our host servers would have to have 50% of disk space dedicated to backup /tmp space. This is very limiting.

For example, on one host I have a VM that is 100GB total size, yet the disk space total is 120GB. This means I only have 20GB left over for performing backups in /tmp. This is where the problem lies. I need a way around this problem and that's why I am hoping for something like the single server backup method which does not require /tmp space.

Thanks for giving this some thought.

Fair enough .. I can see how this could be an issue.

We are working on code changes to support a new multi-file backup format that would avoid this issue, which will hopefully be in the next Cloudmin release.

Awesome news Jamie! If you need any help testing please let me know, I will be glad to help.

Best! Pablo

The next Cloudmin release will use a separate file per disk, and no temp space on the host system.
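To illustrate the general idea of one backup file per disk with no temp space on the host, here is a minimal shell sketch. It is not the Cloudmin implementation: the store and backup_each_disk helpers, the file naming, and the user@host and /backup placeholders are all assumptions.

```shell
# Hypothetical sink: save stdin as file $1 on the backup server.
# user@host and /backup are placeholders.
store() {
    ssh user@host "cat >/backup/$1"
}

# Hypothetical sketch: compress each disk into its own remote file,
# streaming through the sink so nothing is written to local temp space.
#   $1 name prefix, remaining arguments: disk devices
backup_each_disk() {
    prefix=$1; shift
    for disk in "$@"; do
        name="$prefix.$(basename "$disk").gz"
        # Without pipefail, only the sink's exit status is checked here.
        dd if="$disk" bs=1048576 2>/dev/null | gzip -c | store "$name" || return 1
    done
}

# Example with two disks (snapshots would normally be taken first):
# backup_each_disk amon /dev/VolGroup00/amon_img /dev/VolGroup00/amon_swap
```

Because each disk is handled on its own, the host never needs more free space than the stream buffers use, regardless of how many disks the VM has.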

Awesome news Jamie!

Thanks for giving this priority.

Cheers, Pablo

Automatically closed -- issue fixed for 2 weeks with no activity.