"Failed to restore on worker2 : --except-feature must come after --all-features" in Virtualmin Replication

Virtualmin Replication in Cloudmin Pro appears to build a malformed back-end command when restoring backups on destination systems. Here's the output I'm getting this morning:

Transferring backups to destination systems .. .. done

Restoring backups on destination systems .. .. 0 restores succeeded, 1 failed

Failed to restore on worker2 : --except-feature must come after --all-features

Removing excess virtual servers on destinations .. .. no domains need deletion on worker2

Status: Closed (fixed)

Comments

That's odd, because the Cloudmin code already puts --except-feature after --all-features.

Which Cloudmin version are you running there?

We're using Cloudmin v8.4 Pro on top of Virtualmin GPL v5.0.

OK, those are the latest versions. Can you check on the destination system at Webmin -> Webmin Actions Log what API command call is logged when the restore happens? It should show the full restore-domain call with all args.

Interesting. I don't see one at all from last night, in spite of getting an e-mail saying that the replication failed. I don't see anything in yesterday's logs either.

But when I do the virtualmin replication manually, the replication succeeds, I see a log for the action, and apparently the --except-feature switch is in the right place. Here's the remote API call:

Called remote API restore-domain.pl --replication --all-features --source /tmp/.webmin/672366_9950_1_fastrpc.cgi --option "dir delete 1" --no-reuid --virtualmin config --virtualmin templates --virtualmin custom --virtualmin scheds --virtualmin chroot --skip-warnings --domain[MANY DOMAINS REDACTED] --except-feature dir --except-feature mail --except-feature logrotate
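For anyone comparing logged calls, the ordering rule named in the error message can be checked mechanically. The following is only a sketch of that rule as the message states it, not Virtualmin's actual validation code; the function name `except_feature_order_ok` is made up for illustration, and the sample command is a shortened form of the logged call above.

```python
import shlex

def except_feature_order_ok(command_line):
    """Return True if every --except-feature appears after --all-features.

    This mirrors the rule stated in the error message; it is an
    assumption about the check, not a copy of Virtualmin's code.
    """
    args = shlex.split(command_line)
    if "--except-feature" not in args:
        return True   # nothing excluded, so nothing to order
    if "--all-features" not in args:
        return False  # exclusions without --all-features
    # Compare the first occurrence of each flag.
    return args.index("--except-feature") > args.index("--all-features")

# Shortened version of the logged manual call (domains omitted).
logged = ('restore-domain.pl --replication --all-features '
          '--option "dir delete 1" --no-reuid --skip-warnings '
          '--except-feature dir --except-feature mail '
          '--except-feature logrotate')
print(except_feature_order_ok(logged))
```

A call with the two flags swapped would return False, which is the situation the error message describes.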

Where does it save these cron jobs anyway? I can't find them in the system cron or in the user cronjobs either.

They use webmin's internal cron feature - the configs are stored under /etc/webmin/webmincron
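To look at those scheduled jobs directly, something like the sketch below could be used. It assumes the files under that directory use Webmin's usual key=value line format, which I have not verified for every version; the function name `read_webmincron_jobs` is hypothetical.

```python
from pathlib import Path

def read_webmincron_jobs(base="/etc/webmin/webmincron"):
    """Collect key=value settings from every file under the given directory.

    Assumes one key=value pair per line; lines without '=' are skipped.
    Returns a dict mapping each file's relative path to its settings.
    """
    jobs = {}
    root = Path(base)
    if not root.is_dir():
        return jobs
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        try:
            text = path.read_text(errors="replace")
        except OSError:
            continue  # unreadable file, e.g. when not running as root
        settings = {}
        for line in text.splitlines():
            key, sep, value = line.partition("=")
            if sep:
                settings[key.strip()] = value.strip()
        jobs[str(path.relative_to(root))] = settings
    return jobs
```

Run as root, `read_webmincron_jobs()` returns one entry per job file, which should make it easier to spot the entry that triggers the sync.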

So the exact same backup succeeds when run manually, but fails on schedule?!

Yes, that's exactly what happens.

Assuming that vsync.pl is the program that Cloudmin runs when it syncs virtual servers, it doesn't seem like the arguments mean much or have any bearing on the error I'm getting.

Yes, vsync.pl is the program run from webmin's cron to do the sync.

I wonder if perhaps you still have an older version of the code cached in memory. Try running /etc/webmin/restart as root to force an update.

Ok, I've given that a shot and I'll see what my e-mail says in the morning.

Well, that looks like it did the trick. I could swear I've restarted this server recently, but hey, there we are. I now see last night's replication in worker2's Webmin Actions Log.

Ok, cool. I'm also surprised that this restart didn't happen when Cloudmin was upgraded, but it's hard to debug that after the fact.

Status: Active » Fixed

Nope! I just got a new one this morning with the same error message "Failed to restore on worker2 : --except-feature must come after --all-features"

So it intermittently fails for the same domains on different days?

No, I think I may have just accidentally deleted it that one day. It's coming up every day now.

Can you attach a screenshot of the sync form with all sections open? I'd like to see what settings you have enabled, especially for replication of global Virtualmin settings.

Can you also post what's under "Advanced replication settings" ?

Where's that? I can't find that page anywhere in Cloudmin or Virtualmin.

Could you test un-checking all the "Virtualmin global settings to replicate", and see if the same problem happens?

Nope, I'm still getting that error "Failed to restore on worker2 : --except-feature must come after --all-features"

Any chance we could login to your Cloudmin system to see what is going wrong here?

I had a look at your system, and it looks like all the recent syncs succeeded OK (based on the logs from your destination system).

Are you still seeing failures?

Yes. Here's the full text of the e-mail I got, including headers:

Return-path: <webmin@worker1.lightspeed.ca>
Envelope-to: sysop@lightspeed.ca
Delivery-date: Tue, 29 Mar 2016 00:05:10 -0700
Received: from root by pop.lightspeed.ca with local (Exim 4.82)
(envelope-from <webmin@worker1.lightspeed.ca>)
id 1akniA-00084i-Kn
for sysop@lightspeed.ca; Tue, 29 Mar 2016 00:05:10 -0700
From: webmin@worker1.lightspeed.ca
To: sysop@lightspeed.ca
Cc:
Subject: Failed replication of 103 virtual servers from worker1.lightspeed.ca
Date: Tue, 29 Mar 2016 00:05:10 -0700 (PDT)
Message-Id: <1459235110.24794.1@worker1.lightspeed.ca>
X-Spam-Flag: UNKNOWN
X-Delivered-To: sysop@lightspeed.ca ernied@lightspeed.ca
X-Message-Age: 0

Finding source and destination systems ..
.. found source worker1.lightspeed.ca and destination worker2

Refreshing domains on source system ..
.. done

Creating temporary directories ..
.. done

Backing up 103 virtual servers on source system ..
.. created backup of 29.69 MB

Transferring backups to destination systems ..
.. done

Restoring backups on destination systems ..
.. 0 restores succeeded, 1 failed

Failed to restore on worker2 : --except-feature must come after --all-features

Removing excess virtual servers on destinations ..
.. no domains need deletion on worker2

OK, I finally see the bug here. It happens only when the restore API in Virtualmin is called remotely (as it is for a domain sync) and some features are excluded from replication. It will be fixed in the next Virtualmin release. Until then, as a workaround, you can change your sync config to include the features you want, rather than excluding the ones you don't want.
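To illustrate the difference between the two styles at the command-line level, here is a rough sketch that turns an exclude list into the equivalent explicit include list. The feature names in `features` are only an example set, not the list from this system, and `include_args` is a hypothetical helper; I am assuming `--feature` is the explicit counterpart of `--all-features` in the restore command.

```python
def include_args(all_features, excluded):
    """Build explicit --feature arguments for every feature not excluded."""
    args = []
    for feature in all_features:
        if feature not in excluded:
            args += ["--feature", feature]
    return args

# Example feature set for illustration only; the real set depends on the server.
features = ["dir", "unix", "dns", "web", "mail", "logrotate", "mysql"]
# The three features excluded in the logged call earlier in this thread.
excluded = {"dir", "mail", "logrotate"}

print(" ".join(include_args(features, excluded)))
# prints: --feature unix --feature dns --feature web --feature mysql
```

With an explicit include list there is no `--except-feature` in the call at all, so the ordering check that fails in the remote case is never triggered.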

That's great news! Thanks for finding this bug and getting it fixed!