Amazon S3 backup and 400 error

Hello,

Over the last few weeks, I've had some backups fail with an "HTTP/1.1 400 Bad Request" response when backing up to Amazon S3. I know that one of these days, the backup I need will be one that didn't make it to Amazon! :-)

Just wondering if it is possible to programmatically retry the upload a set number of times (3?) before moving on without copying the backup to Amazon? Just to ensure that the backup makes it to S3 when some fluke like the 400 response happens.

Thanks! Alan

Status: 
Closed (fixed)

Comments

Actually, Virtualmin already retries S3 operations at least 3 times to avoid this kind of transient issue - but if S3 is down for long enough, that may not help.

At what stage of the backup process do you get this error?

I have it set to upload each server as its own file right after it makes the backup each day ("Transfer each virtual server after it is backed up" is checked). It's been happening randomly, about once a week, and only affects one of the backup files per group of uploaded files.

Here's the chunk around the failure:

... Backing up Webmin ACL files .. .. done

Creating TAR file of home directory .. .. done

Uploading archive to Amazon's S3 service .. .. upload failed! Upload failed : HTTP/1.1 400 Bad Request

.. completed in 33 minutes, 3 seconds

Creating backup for virtual server... ...

Thanks! Alan

One thing you might want to try is increasing the number of times Virtualmin will re-try an S3 upload internally. This can be done at System Settings -> Virtualmin Configuration -> Backup and restore, using the "Number of times to re-try FTP or S3 uploads" field.

Didn't realize that existed. Will give it a shot. Thanks! Alan

I've figured out the issue with this after several more failed backups. One of the backup files is 7.8 GB, and it seems that Amazon only allows a maximum file size of 5 GB. Will Virtualmin soon be able to split backups into multiple files? I'm in the process of writing my own post-backup script, but it would be better if Virtualmin could handle this natively.

Thanks! Alan
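As an aside, one possible post-backup workaround for the 5 GB single-upload limit is to split the archive before sending it. This is only a rough sketch, with placeholder paths, keys, chunk size, and bucket name:

# split the archive into 4 GB pieces named <archive>.aa, <archive>.ab, ...
split -b 4G /backup/quickstopauctions.com.tar.gz /backup/quickstopauctions.com.tar.gz.

# upload each piece separately (reassemble after download with: cat <archive>.* > <archive>)
for part in /backup/quickstopauctions.com.tar.gz.*; do
  virtualmin upload-s3-file --access-key xxxx --secret-key yyyy --source "$part" --bucket your-bucket-name
done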

That should be possible to support - I will look into implementing this for the 3.93 release.

Multi-part backup support has been implemented, and will be included in Virtualmin 3.93.

Automatically closed -- issue fixed for 2 weeks with no activity.

I updated to 3.93, but my backup is failing for the large server (7+ GB).

I'm now getting this:

Uploading archive to Amazon's S3 service .. .. upload failed! HTTP connection to s3.amazonaws.com:443 for /hartlessbydesign/weekly-2012-07-23/quickstopauctions.com.tar.gz?uploads failed : Failed to lookup IP address for s3.amazonaws.com

.. completed in 47 minutes, 8 seconds

Then the backup moves on to the next server, but then fails with:

Creating backup for virtual server three17.org .. Copying virtual server configuration .. Backup failed : Failed to open /etc/webmin/virtual-server/domains/13074868043080 for writing : Too many open files

I do not receive a failed-backup email or anything (that's how I noticed something was wrong: after upgrading, I stopped receiving emails for the full backups). That led me to run a full backup via the browser, which displayed those errors.

Please let me know if I can provide anything to get this resolved.

Thanks! Alan

That looks like a different issue - in particular, the error about "s3.amazonaws.com" seems like a DNS problem.

Does your backup fail with this same error all the time?

Yes:

Uploading archive to Amazon's S3 service .. .. upload failed! HTTP connection to s3.amazonaws.com:443 for /hartlessbydesign/weekly-2012-07-23/quickstopauctions.com.tar.gz?uploads failed : Failed to lookup IP address for s3.amazonaws.com

.. completed in 46 minutes, 10 seconds

Creating backup for virtual server three17.org .. Copying virtual server configuration .. Backup failed : Failed to open /etc/webmin/virtual-server/domains/13074868043080 for writing : Too many open files

What happens if you run the following command:

host s3.amazonaws.com

Also, what are the contents of your /etc/resolv.conf file?

host s3.amazonaws.com

s3.amazonaws.com is an alias for s3.geo.amazonaws.com.
s3.geo.amazonaws.com is an alias for s3-1.amazonaws.com.
s3-1.amazonaws.com has address 207.171.163.205

resolv.conf:

nameserver 72.14.188.5
nameserver 72.14.179.5
nameserver 127.0.0.1
domain members.linode.com
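To rule out the local resolver on 127.0.0.1, it may also be worth querying each nameserver from resolv.conf directly, for example:

host s3.amazonaws.com 127.0.0.1
host s3.amazonaws.com 72.14.188.5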

When the backup is done, does each domain get uploaded after it is backed up, or do they all get uploaded at the end of the process?

It's set to upload to S3 after each domain is backed up. Normally (as it is configured to do), if one upload fails, the backup continues on. But in this case, it just quits the backup process after the failed S3 upload attempt. If I remove the large domain from the backup profile, the backup completes.

Could you try backing up that large domain to a local directory, and then uploading it to S3 using the Virtualmin command-line API? That will let us determine if the problem is with the S3 code or the backup code.

The upload command looks like this:

virtualmin upload-s3-file --access-key xxxx --secret-key yyyy --source /backup/quickstopauctions.com.tar.gz --bucket your-bucket-name

I would be interested to know what that outputs..

This is what I get from the output:

ERROR: Abort failed : Original error : Part 1021 failed at 5347737600 : HTTP connection to s3.amazonaws.com:443 for /mybucket/quickstopauctions.com.tar.gz?partNumber=1021&uploadId=ocECY13gUrS6HLBYyaMAASXHKK7rwE9a5NjTOnaDSLWJogBjr2W29Mo9ojXUjC2Y_05E_obca5osFl6K5m_6Tw-- failed : Failed to lookup IP address for s3.amazonaws.com

One thing you could try is increasing the number of times Virtualmin will attempt the upload of an S3 file part - this may help avoid the DNS lookup errors. To do this, go to System Settings -> Virtualmin Configuration -> Backup and restore and change "Number of times to re-try FTP or S3 uploads" to something like 30.

I changed it to 30 and tried again but still had the same result.

Thanks, Alan

Did you re-try using the upload-s3-file command, or by doing a backup? The config change I mentioned only applies to backups ..

This is very unusual, as I can't reproduce this on my test systems...

I wonder if the cause is really an intermittent DNS issue. You can take DNS out of the equation by adding a line to your /etc/hosts file like:

207.171.187.117  s3.amazonaws.com

And then re-try the upload-s3-file command.
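One way to add that entry and confirm it is being used (note that the IP above is just an example, and Amazon's addresses can change over time):

echo '207.171.187.117 s3.amazonaws.com' >> /etc/hosts
getent hosts s3.amazonaws.com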

Same error after adding the line to the hosts file.

I'm in the process of testing s3cmd 1.1.0 beta2, which has added multipart support, to see if I have the same issue. (I've removed the line from the hosts file while testing s3cmd.) This should tell me whether it is a DNS issue, since s3cmd should in theory fail too if it is.

I'll report back once it's finished or fails.

Thanks! Alan
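For reference, a basic large-file test with s3cmd 1.1.0 looks roughly like this (the bucket name is a placeholder; the 1.1.x betas switch to multipart automatically for files above the chunk size):

s3cmd --configure
s3cmd put /backup/quickstopauctions.com.tar.gz s3://your-bucket-name/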

s3cmd crashed as well. A thought occurred to me after that: the firewall. I have csf/lfd installed. I disabled lfd and the backup works. I need to dig in to see if lfd is blocking an IP address or blocking scripts. Maybe if I whitelist Amazon's IPs it'll work?

Thanks, Alan
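If csf/lfd is interfering, a couple of quick checks might narrow it down (the address below is just the S3 IP resolved earlier):

# look for recent lfd blocks
grep -i block /var/log/lfd.log | tail -n 20

# ask csf whether a given address is matched by any rule
csf -g 207.171.163.205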

Yes, IP blocking could certainly cause problems like this - especially if it impacts DNS lookups.

I hate to resurrect this, but I don't think the firewall is the issue.

Using a patched version of s3cmd (beta 2 plus a fix for a bug that caused the upload to fail), I'm able to upload a large backup to S3 with no problem. So right now I'm using a post-backup script that runs s3cmd to upload the backup file Virtualmin creates.

Trying with virtualmin upload-s3-file, I get:

ERROR: Abort failed : Original error : Part 1021 failed at 5347737600 : HTTP connection to s3.amazonaws.com:443 for /hbdbackupweekly/quickstopauctions.com.tar.gz?partNumber=1021&uploadId=fo3CzncrpoJZlG7ARBrrcYMU16SBJWAdgA26SNAVJKFxRshhxIEl7HtIShP3dEH79IP87yBUQHRgqSAeEA4QTQ-- failed : Failed to lookup IP address for s3.amazonaws.com

I have the number of times to retry set to like 100. Any ideas?

Thanks! Alan

That's really odd - do you still have s3.amazonaws.com in your /etc/hosts file, to rule out possible DNS server failures?

That's really odd - in that case, a DNS lookup for s3.amazonaws.com should never fail.

I think I would have to login to your system myself to see what is going wrong here. Let me know if that is possible..

How could I securely get you the credentials?

Thanks, Alan

One way to do that would be to enable Remote Support in the Virtualmin Support module. That temporarily enables SSH logins using an SSH key, so no passwords need to be distributed.

Details on doing that are here:

http://www.virtualmin.com/documentation/system/support

Alternatively, you could email him your root password, perhaps using just a temporary one... but the Remote Support module is a more secure way of doing that.

I am getting these same errors, and I don't think it's a DNS problem. I have Virtualmin configured to upload each domain as it is finished. I only see the IP lookup error for the site with the large backup (about 20 GB); the next domain in the sequence always fails with the "Too many open files" message. I'm running Virtualmin 3.94 Pro on Ubuntu 10.04.4 on an Amazon AWS instance.

Ah, that explains it .. the "Too many open files" error indicates that your system is out of file descriptors, which can also cause DNS lookup failures.
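A quick way to confirm a descriptor leak while a backup is running (the PID here is a placeholder for the Webmin/backup process):

# count of file descriptors held by one process
ls /proc/<pid>/fd | wc -l

# system-wide allocated vs. maximum file handles
cat /proc/sys/fs/file-nr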

I had a look at the Virtualmin code, and found a bug that could cause this. You can fix it on your system by editing the file /usr/{share,libexec}/webmin/virtual-server/s3-lib.pl and changing line 723 from:

&close_http_connection($out);

to:

&close_http_connection($h);

Then run /etc/webmin/restart

I would also change the ulimits in /etc/security/limits.conf, as the "Too many open files" error comes directly from that setting.
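For example, raising the limit for root in /etc/security/limits.conf might look like this (the values are illustrative, and a re-login or service restart is typically needed before they take effect):

root  soft  nofile  8192
root  hard  nofile  16384

The active limit can be checked afterwards with ulimit -n.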

My backups started working again after a reboot, though that may only be a temporary fix. I've made the change to s3-lib.pl, so hopefully the issue won't reappear.

Thanks

Cool .. The fix will be included in the next Virtualmin release.

Automatically closed -- issue fixed for 2 weeks with no activity.