Cloudmin Replication Fails

Hi,

We have 5 Cloudmin hosts and would like to shut down the master. Replication is configured, but the slave hosts don't show any of the configured cluster servers; only the master shows all of them. How can we move the master role to another server?

Best Regards, Paulo Cardoso

Status: Closed (fixed)

Comments

Do you mean that on the replica system, none of the VMs and/or physical systems managed by Cloudmin show up in the left menu?

Hi,

Exactly. On one of the replicas I can see some very old VMs, but each of the other 2 shows only its own server and none of the VMs on this host or on other hosts. The replicas also show this message in the yellow bar at the top: "This system is a Cloudmin replica, but is not replicating from any master system yet. No changes to managed systems or global settings are allowed."

I read somewhere that a replication slave can't have any VMs on it. Is that true?

Is replication actually working? It seems like it may have stalled and isn't copying new VM details over.

How can I check it? I've tried to add a new replica server, but the message is the same: "Replication already in progress". How can I force a manual replication? We really need to shut down the master server due to a hardware problem, but if we do that we will lose access to the cluster.

Hi,

Any news about this?

Title: Cloudmin Replication » Cloudmin Replication Fails

Check if there is any replicate.pl process running on your Cloudmin master system, and if so kill it. Then run /etc/webmin/server-manager/replicate.pl and see if it reports any errors.

Hi,

There was no process named replicate.pl running.

I couldn't find /etc/webmin/server-manager/replicate.pl, but I found /usr/libexec/webmin/server-manager/replicate.pl. If I run it, nothing happens, and there is still no replicate.pl process running.

Then I ran "cloudmin replicate" and nothing happened. Then I ran "cloudmin replicate --debug" and got:

host1.com Status: Failed Error: Replication already in progress
host2.com Status: Failed Error: Replication already in progress
host3.com Status: Failed Error: Replication already in progress
host4.com Status: Failed Error: Replication already in progress

After this I went to /etc/webmin/server-manager/ and found a replication-status.lock file with a very old date. I killed the process with the PID recorded in it and deleted the file. After that, I ran cloudmin replicate again; it took some time to finish, and I can see it has started syncing again. It is syncing now, so I will wait to see if it finishes.
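For anyone hitting the same "Replication already in progress" error, here is a minimal sketch of the lock check described above. It assumes the first line of the lock file holds a PID, which is what this thread suggests but is not confirmed against Cloudmin's documentation. The function name check_lock and its output strings are made up for illustration; it only reports and never kills or deletes anything.

```shell
#!/bin/sh
# Sketch only: report whether the PID recorded in a lock file is still running.
# Assumption: the first line of the lock file is a PID (not verified for Cloudmin).
check_lock() {
    lock_file=$1

    if [ ! -f "$lock_file" ]; then
        echo "no-lock"
        return 0
    fi

    # Keep only the digits from the first line, dropping whitespace or other text
    pid=$(head -n 1 "$lock_file" | tr -cd '0-9')
    if [ -z "$pid" ]; then
        echo "unreadable"
        return 0
    fi

    # ps prints the command line if the process exists, and nothing otherwise
    cmd=$(ps -p "$pid" -o args= 2>/dev/null)
    if [ -n "$cmd" ]; then
        echo "held-by $pid: $cmd"
    else
        echo "stale $pid"
    fi
}
```

Running check_lock /etc/webmin/server-manager/replication-status.lock as root on the master would show which process, if any, holds the lock. A "stale" result means the recorded process no longer exists; a "held-by" result shows the command line, so you can judge whether it is an unrelated or hung process before killing it and removing the file.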

Ok, that seems to explain it. What was the process whose PID you found in the replication-status.lock file?

I think it was /usr/libexec/webmin/server-manager/status.pl

But it doesn't look like it is fixed yet. I changed one of the replicas to standalone and it doesn't show any VMs. And we still see "This system is a Cloudmin replica, but is not replicating from any master system yet. No changes to managed systems or global settings are allowed." on the replicas.

Is there any chance we could log in to your Cloudmin master system to see what's going wrong here?

Sorry for the delay. After some hours it started to sync, and now we have all the configurations replicated to the other nodes. The problem is fixed. Thanks.

Status: Active » Fixed