Amazon S3 scheduled backup failed

Hello,

We switched our scheduled backup from FTP to S3 and we got the following error last night:

Uploading archive to Amazon's S3 service .. .. upload failed! Upload failed : HTTP/1.1 411 Length Required

What could be the possible cause?

Thanks

MD

Status: 
Closed (fixed)

Comments

Does this happen again if you re-try the backup?

Also, what did you enter in the field for the S3 backup destination?

Well, I just successfully backed up one virtual server. The full backup takes 1 hour, so I am reluctant to start it during the daytime.

That being said, looking into S3 in detail, the backup partially worked last night: 1/3 of the virtual servers made it onto Amazon S3. All others failed (we have 48 websites and 110 domains on the machine).

The S3 backup destination is: bucketname/servername/backup--%Y-%m-%d. Both bucketname and servername exist and work fine.

Our 2 other Virtualmin dedicated servers backed up to Amazon fine last night; just this one server failed.

I am wondering if you retry on failure when uploading to S3. We've been using S3 in production in our applications and found that upload failures are frequent, so retry logic is required due to Amazon's eventual consistency model.

Yes, Virtualmin does re-try up to 3 times (by default) to upload to S3.

Are you backing up each domain to a separate file?

Also, how large is the largest domain you are backing up? For big files (more than 5 GB), Amazon requires a different API call to upload.
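To illustrate the size threshold mentioned above: a single S3 PUT is capped at 5 GB, and anything larger must go through the multipart upload API. A sketch of the decision (part size of 100 MiB is an assumption for illustration; the actual multipart calls are not shown):

```python
SINGLE_PUT_LIMIT = 5 * 1024 ** 3  # 5 GiB: cap for a single S3 PUT

def plan_upload(size_bytes, part_size=100 * 1024 ** 2):
    """Decide between a single PUT and a multipart upload.

    Returns ("put", 1) for objects within the single-PUT limit, or
    ("multipart", n_parts) where n_parts is the number of part_size
    chunks needed. Only the decision is sketched here; the actual
    CreateMultipartUpload / UploadPart / CompleteMultipartUpload
    sequence is omitted.
    """
    if size_bytes <= SINGLE_PUT_LIMIT:
        return ("put", 1)
    n_parts = -(-size_bytes // part_size)  # ceiling division
    return ("multipart", n_parts)
```

A 500 MB domain archive, like the largest one in this thread, falls comfortably under the single-PUT limit.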

OK. We'll see tomorrow if we get the same error.

We do backup each virtual server to a separate file.

The largest domain is 500 MB.

M

That should work fine then.

When this error happens, is it for the first domain in the backup, or do some succeed and then others fail?

Ok - we'll see in the morning.

When it failed: some succeeded and some failed, in no specific order. It seemed fairly randomly distributed.

What would that indicate? Network spottiness on our side? (it's possible)

We have two destinations: S3 + Local NFS. The local NFS backup worked fine, so I am not sure it was network related.

What is the command that gets executed for the backup?

It failed again.

It's frustrating. Let's compare two servers: "server success" and "server failure".

The log for "server success" shows a successful transfer message after each virtual server's backup.

The log for "server failure" does not show any transfer action after each backup of a virtual server.

BOTH scheduled backups are EXACTLY the same on both servers, except the destination folder within the S3 bucket.

The checkbox "transfer each server" seems to not work at all on "server failure"

I deleted the schedule on "server failure" and recreated it exactly like the one on the good server.

What can cause this issue?

MD

I am adding a screenshot of our backup settings - without the private info.

I just restarted a backup and then cancelled it, because it did the same thing: it wasn't reporting a transfer after each virtual server's backup.

Can you please advise?

Thanks

So do you have two systems that are both writing to the same bucket at the same time?

If possible, could you attach the output from a backup that failed? I'd like to see what messages were logged and when..

Yes, actually, we have 3 Virtualmin systems that back up to the same bucket at the same time. 2 of them work fine - no problem. 1 fails.

I would prefer to email you the backup log as it contains sensitive info.

Where can I send the log?

MD

Yes, you can email it to me at jcameron@virtualmin.com

Also, have you tried staggering the backups so they don't all happen at the exact same time?

I just emailed you two backup logs from last night: a successful one and a failed one, plus screenshots of the two scheduled backups (identical).

Yes, the backups are staggered: Server 1 (failure) at midnight. Server 2 (successful) at 1:30 AM

Could this be a problem because we have both S3 and a local file as backup destinations, causing the backup logic to upload everything at once at the end of the backup?

Hi Jamie,

Thanks for your help, and private emails. We followed your advice and split up the backup and it worked better.

That being said, of the 120 domains we're backing up, there are 4 aliased domains that fail with the following errors, not consecutively (they happen throughout, not one after the other):

Creating backup for virtual server xxxx.com ..
    Copying virtual server configuration ..
    .. done

    Copying Apache aliases ..
    Uploading archive to Amazon's S3 service ..
    .. upload failed! Upload failed : HTTP/1.1 411 Length Required

.. completed in 31 seconds

From digging into the Amazon S3 documentation for 411 Length Required, I am wondering if the system is uploading a zero-length file, which would trigger the error. As I said above, these are all aliased domains, not the real websites, but it's still a bit unsettling.
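For what it's worth, HTTP 411 strictly means the request was sent without a Content-Length header, which a client can end up doing when it can't determine the size of a missing or empty source file. A pre-upload sanity check along these lines would catch that case (a hypothetical helper, not part of Virtualmin):

```python
import os

def check_archive(path):
    """Sanity-check a backup archive before attempting an S3 upload.

    HTTP 411 Length Required means the request lacked a Content-Length
    header, which a client may fail to set when the source file is
    missing or empty. Returns the file size, usable as Content-Length.
    """
    if not os.path.exists(path):
        raise ValueError("archive missing: %s" % path)
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError("archive is empty: %s" % path)
    return size
```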

Are you using the s3-bash scripts? Maybe there's a bug in them? Our other servers don't have this issue, so I'm wondering what's different about this particular system.

Regardless, could we get more info about the specific file that failed to upload: the .gz, the .dom, the .info ... all of them?

Cheers

MD

That error means that the domain's tar.gz file failed to upload. If you check the backup of the same alias domain to local disk, are the tar.gz files zero size?

I can't tell - I turned off the second backup locally. Too much CPU time spent zipping.

I think the upload to Amazon is not the problem, because it fails every night on the SAME 4 domains. That's highly unlikely to be network related.

I think the backup is creating zero-size backup files for these 4 domains. Here's the log I get for the 4 aliases that fail to backup:

Creating backup for virtual server AAAA.info ..
    Copying virtual server configuration ..
    .. done

    Copying Apache aliases ..
    Uploading archive to Amazon's S3 service ..
    .. upload failed! Upload failed : HTTP/1.1 411 Length Required

.. completed in 32 seconds

Below is a log for all the other aliases that are successfully backed up. Notice how the log above does not include the 'home directory' tarring shown below. This is very consistent: none of the failed backups contain these lines in the log.

Creating backup for virtual server BBBBB.net ..
    Copying virtual server configuration ..
    .. done

    Copying Apache aliases ..
    Creating TAR file of home directory ..
    .. done

    Uploading archive to Amazon's S3 service ..
    .. done

.. completed in 0 seconds

Any ideas? Would you like to get a login into our system to check things out? We're fine with that.

Cheers

MD

Yes, a remote login to debug this would be very useful. You can contact me at jcameron@virtualmin.com with the details. I'd need remote root SSH access though..

Right on. I'll pass it on to you then.

Thanks

MD

Thanks for the login - I found the problem, which was that you had some alias domains (the ones that failed to backup) which Virtualmin thought didn't have home directories, even though they actually did!

I have fixed this bug on your system, and will include the fix in the next release. Also, I made the error message clearer for this failure mode.

No problem, awesome. I know it's sometimes easier to get your hands on a system than to exchange 100 troubleshooting emails.

Thank you very much

Best Regards

MD

What was the fix, Jamie? I am seeing the same issue and it's been bugging me all week.

Jamie, thank you for your help. The nightly backup to S3 worked without errors. Very happy customer :)

MD

The work-around to fix this is to make sure any failing alias domains have a home directory enabled. This can be done on the Edit Virtual Server page, or with the enable-feature API command.
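From the command line, that would look something like the following (assuming the standard `virtualmin` CLI, where `--dir` is the home directory feature; `failing-alias.com` is a placeholder for the actual alias domain):

```shell
# Enable the home directory feature for a failing alias domain
virtualmin enable-feature --domain failing-alias.com --dir
```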

Hmm, maybe this isn't the same issue...

Let me paste what I get for the errors I see:

Creating incremental TAR file of home directory ..
.. done

.. completed in 1 seconds

That's it. It doesn't upload to S3, there is plenty of quota, and the home directory is there. I can't figure out what it's doing, and enabling debugging doesn't produce anything useful either.

Scott - could you post or email me the full backup output? I don't see any mention of S3 in the snippet you posted ..

I didn't miss pasting it... it just isn't there. That's the end of the log for the domain that's giving me an error.

So there is no mention of S3 anywhere in the backup output??

Not for the domains that are failing. It's weird: there's no mention of an upload, not even an attempt, and the backup took seconds instead of minutes. And they aren't over quota, or even close to it.

The only way I can see the upload failing to happen is if there is a failure somewhere earlier in the backup process for that domain ..

Like I said, there is nothing in any log that tells me what the issue is... it simply skips the backup and fails.

Automatically closed -- issue fixed for 2 weeks with no activity.