Backup working, but appears to hit some timeout for a retry and leaves backup.pl looking like it's still running

The backup works, but it appears to hit some timeout for a retry, leaving backup.pl looking as if it is still processing when it is not doing anything. It also leaves the files in the temporary directory.

I am thinking that the virtual server files are between 700 MB and 1.4 GB. I wonder whether backup.pl cannot wait that long, has not heard back, and so just waits forever.

Is there a setting to fix the wait time, or a way to restart backup.pl from where it left off in the process?

So the FTP upload completes successfully, but then it doesn't continue to the next virtual server.

Status: Active

Comments

These are the processes that keep running:

root 1111 0.0 0.0 3956 576 ? Ss Feb26 0:00 /bin/sh -c /etc/webmin/virtual-server/backup.pl --id 135157000213219
root 1114 0.0 2.9 136736 91852 ? S Feb26 0:31 /usr/bin/perl /usr/share/webmin/virtual-server/backup.pl --id 135157000213219
root 3423 0.0 0.0 3956 576 ? Ss Feb23 0:00 /bin/sh -c /etc/webmin/virtual-server/backup.pl --id 135157019113481
root 3426 0.0 2.9 136700 91436 ? S Feb23 0:35 /usr/bin/perl /usr/share/webmin/virtual-server/backup.pl --id 135157019113481
root 10227 0.0 0.0 3956 576 ? Ss Feb24 0:00 /bin/sh -c /etc/webmin/virtual-server/backup.pl --id 135157023713578
root 10230 0.0 2.9 136696 91424 ? S Feb24 0:30 /usr/bin/perl /usr/share/webmin/virtual-server/backup.pl --id 135157023713578
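For anyone wanting to spot these leftover processes automatically, here is a minimal Python sketch that picks the perl backup.pl entries out of `ps aux` style lines like the ones above. The function names `parse_ps_line` and `stale_backups` are made up for illustration and are not part of Virtualmin; the sketch assumes the standard 11-column `ps aux` layout.

```python
def parse_ps_line(line):
    """Split one `ps aux` line into a dict; the command keeps its spaces."""
    fields = line.split(None, 10)
    if len(fields) < 11:
        raise ValueError("not a ps aux line: %r" % line)
    user, pid, cpu, mem, vsz, rss, tty, stat, start, cpu_time, command = fields
    return {
        "user": user,
        "pid": int(pid),  # raises ValueError on the header line
        "stat": stat,
        "start": start,
        "time": cpu_time,
        "command": command,
    }


def stale_backups(ps_lines, script="backup.pl"):
    """Return the perl processes running `script`, skipping the /bin/sh wrappers."""
    found = []
    for line in ps_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = parse_ps_line(line)
        except ValueError:
            continue  # header or malformed line
        command = entry["command"]
        if command.startswith("/usr/bin/perl") and script in command:
            found.append(entry)
    return found
```

Feeding it the output of `ps aux` line by line returns one entry per hung job, with the PID and start date, which makes it easy to see how many days a backup has been sitting there.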

So just to clarify -- you're saying that your backups are working just fine... but that after it's completed, you have the above backup.pl processes running still?

Sort of. It seems to stop at the virtual server that it last completed, which appears to be a large one (700 MB+). I checked the FTP server and the file is there, but then no further virtual servers are backed up and backup.pl just stays running.

Hope that helps.

Also, no completed/failed email arrives until the system limit of 3 backup.pl processes running at one time is reached.

That's all definitely odd, and it may be a bug, but FTP backups should be able to handle files of that size. In theory, anyhow :-)

There's another test I'd be interested in seeing -- is there any chance you could go into Backup and Restore -> Backup Domains, and generate a backup of just one of your larger domains?

I'd be curious to see what output is generated by the backup process in Virtualmin. If you could copy and paste that output into here, that'd be great.

Also, how long does it take to complete?

Which makes me think of one other thing -- if you go into Backup and Restore -> Backup Logs, do you see any errors regarding the backups that you've tried in the past?

OK, so I kicked off a single backup of 1 virtual server at 2:14, and the timestamp on the FTP file is 2:18. The file size is 1.3 GB, it's a tar.gz, and it was uploaded over a LAN.

It appears that the timestamps on the .info and .dom files have not changed.

It appears that backup.pl is still running.

There appears to be no backup log for me to copy, as the backup has not finished. Or has the backup.pl process not created it?

Any ideas?

Well, I suppose I was curious what output you got in the GUI if you were to kick that off from within Virtualmin.

However, there is another thing we can look at. When you run a "ps", and you see that backup.pl process running there, take note of the process id (PID) of the backup.pl.

Then, if you run "strace -p pid" when it appears to be stuck, what output do you receive?

This is what I see.

strace -p 10987
Process 10987 attached - interrupt to quit
read(4,

"have to ctrl c to get out"

^C
Process 10987 detached

If I do a manual backup in Virtualmin so that it displays on the screen, it does the smaller ones fine, then gets to a big one and...

It gets to uploading the file and never changes from there. I see the file being uploaded, then nothing. It never moves on to the next virtual server.

Anything you would like me to check?

This is getting urgent for our production servers.

Sorry for the delay. I'll get back to you shortly.

That strace suggests that the backup.pl command is waiting to read from something.

Could you run the command ps auxwwww and attach the output to this bug?

Also, I'd be interested to see the output from lsof -p 10987
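As a side note, on Linux the same information lsof reports for a single descriptor can be read straight from /proc. The sketch below is only an illustration (the helper names `describe_fd` and `is_socket` are invented here); it resolves a descriptor number, such as the 4 in the `read(4,` line from strace, to whatever it points at. It needs to run as root or as the owner of the process.

```python
import os


def describe_fd(pid, fd):
    """Return the target of /proc/<pid>/fd/<fd>.

    Typical results are a file path, 'pipe:[inode]' or 'socket:[inode]'.
    Linux only; raises OSError if the process or descriptor does not exist.
    """
    return os.readlink("/proc/%d/fd/%d" % (pid, fd))


def is_socket(target):
    """True if a describe_fd() result refers to a network or Unix socket."""
    return target.startswith("socket:[")
```

For example, `is_socket(describe_fd(10987, 4))` returning True would mean the process is blocked reading from a socket rather than from a file or pipe.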

Thanks - from that lsof output, it looks like the backup process is hanging waiting on a response from the remote FTP server.
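To illustrate the kind of hang described here, the following Python sketch contrasts a read that waits forever with one that gives up after a timeout. It is not how backup.pl is implemented (that is Perl), and `read_reply` is a made-up helper; it only shows the general behaviour of a client waiting on a reply that the server never sends.

```python
import socket


def read_reply(sock, timeout_seconds):
    """Read one chunk from sock, giving up after timeout_seconds.

    Returns the bytes received, or None if the peer stayed silent.
    Without the settimeout() call, recv() would block indefinitely,
    which is what a process stuck in read() looks like under strace.
    """
    sock.settimeout(timeout_seconds)
    try:
        return sock.recv(1024)
    except socket.timeout:
        return None
```

A client that gets None back can log a failure and move on to the next item instead of sitting in read() for days.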

Does anything get logged to the log files on the FTP server (like /var/log/messages) when the backup fails at this point?

This is all I can see. It looks OK, but then the backup just does not continue.

(000025)11/03/2013 23:08:55 p.m. - (not logged in) (124.198.)> Connected, sending welcome message...
(000025)11/03/2013 23:08:55 p.m. - (not logged in) (124.198.)> 220 FileZilla Server version 0.9.41
(000025)11/03/2013 23:08:55 p.m. - (not logged in) (124.198.)> USER backup
(000025)11/03/2013 23:08:55 p.m. - (not logged in) (124.198.)> 331 Password required for backup
(000025)11/03/2013 23:08:55 p.m. - (not logged in) (124.198.)> PASS ****
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 230 Logged on
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> TYPE I
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 200 Type set to I
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> PASV
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 227 Entering Passive Mode
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> STOR 124.198.sun/demo11.wers.co.nz.tar.gz.dom
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 150 Connection accepted
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 226 Transfer OK
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> QUIT
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> 221 Goodbye
(000025)11/03/2013 23:08:55 p.m. - backup (124.198.)> disconnected.
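With a longer server log, it can be tedious to check by eye whether every upload was acknowledged. Below is a rough Python sketch that scans lines in the format shown above and lists any STOR command that was never followed by a 226 reply in the same session. The regular expression is inferred only from this excerpt, and `unconfirmed_uploads` is a hypothetical helper, so treat it as a starting point rather than a parser for every FileZilla Server log.

```python
import re

# Layout assumed from the excerpt: (session)timestamp - user (ip)> message
LOG_LINE = re.compile(
    r"^\((?P<session>\d+)\)(?P<when>.+?) - (?P<user>.+?) "
    r"\((?P<ip>[^)>]*)\)?> (?P<msg>.*)$"
)


def unconfirmed_uploads(lines):
    """Return paths from STOR commands that never got a 226 in their session."""
    pending = {}  # session id -> path of the upload still awaiting 226
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue
        session = match.group("session")
        msg = match.group("msg")
        if msg.startswith("STOR "):
            pending[session] = msg[len("STOR "):]
        elif msg.startswith("226"):
            pending.pop(session, None)
    return sorted(pending.values())
```

An empty result for the session that carried the large tar.gz would suggest the server did send its final reply, pointing the investigation at whatever sits between the two machines.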

Would it be possible for you to try a backup to another FTP server and see if the same hang happens?