I'm using the round-robin DNS feature in Cloudmin to balance DNS across two Varnish servers, which are front ends for multiple Drupal sites. Although this more or less works okay, one of the Varnish servers keeps getting taken out of service, and I get an email saying that one system has been turned off:
DNS roundrobin varnishbalance.cloudmin.cruiskeenconsulting.com has only 1 systems included, which is below the configured minimum of 2.
The following systems have been excluded from the roundrobin :
System hostname            IP address   Reason for exclusion
varnish3.cloudmin.cruiske  69.4.98.200  Status is No SSH
As far as I can see, nothing actually went wrong with SSH on varnish3, and I turned off SSH checking on the round-robin, but it apparently continues to check SSH whether the box is checked or not.
So this raises multiple questions:
a. How can I turn off SSH checking, since the box doesn't seem to actually work?
b. Is there some way to adjust the SSH check? I suspect it's just that the SSH check is at times slower than what Cloudmin expects, and it's timing out.
c. Any reasonable way to debug this?
d. Is there any way to get a message when the failed system comes back? I can only configure it to send an email on failure.
e. Is this logged anywhere?
Comments
Submitted by JamieCameron on Wed, 05/30/2012 - 16:59 Comment #1
This can happen if Cloudmin's standard status monitoring has detected that it cannot SSH to the system.
One work-around would be to include "No SSH" in the list of statuses of systems to include. This can be done on the Edit DNS Roundrobin page, in the "Conditions for systems to include" section.
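Separately from Cloudmin, the suspicion in the original post (that SSH on the excluded system is sometimes slow to answer) can be checked from the master by timing how long the SSH server takes to send its version banner. The following is only a rough sketch: it is not how Cloudmin performs its own status check, and the function names, the 10-second timeout and the sampling loop are all assumptions made for illustration.

```python
import socket
import time


def time_ssh_banner(host, port=22, timeout=10.0):
    """Time one TCP connection to an SSH port.

    Returns (elapsed_seconds, banner), where banner is None if the
    connection failed, timed out, or no banner was received.
    The timeout value is an assumption, not Cloudmin's setting.
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            # An SSH server sends a line such as "SSH-2.0-..." on connect.
            raw = sock.recv(256)
            banner = raw.decode("ascii", errors="replace").strip()
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return time.monotonic() - start, None
    return time.monotonic() - start, banner or None


def sample(host, attempts=5, pause=1.0):
    """Repeat the measurement a few times to spot intermittent slowness."""
    results = []
    for _ in range(attempts):
        results.append(time_ssh_banner(host))
        time.sleep(pause)
    return results
```

Calling `sample()` with the excluded system's hostname and printing the results over a few minutes should show whether the banner occasionally takes several seconds or fails outright, which would support the timeout theory.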
Submitted by cruiskeen on Wed, 05/30/2012 - 17:01 Comment #2
OOOH -- okay, "no ssh" didn't intuitively make a lot of sense to me, but you mean that overrides the normal monitoring???
Got it!
Submitted by JamieCameron on Wed, 05/30/2012 - 17:20 Comment #3
The "No SSH" option doesn't disable the regular monitoring, just the criteria for including a system in the roundrobin.
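To make the distinction concrete, here is a small conceptual model of the inclusion step. It is a sketch of the behaviour described in this thread, not Cloudmin's actual code: the function name, the "Running" status label, the second hostname and its IP address are hypothetical. Monitoring still assigns each system a status; the roundrobin only decides which statuses are acceptable.

```python
def select_for_roundrobin(systems, allowed_statuses, minimum):
    """Split systems into included/excluded based on their monitored status.

    systems is a list of (hostname, ip, status) tuples. The status comes
    from regular monitoring and is not changed here; only the set of
    acceptable statuses differs between roundrobin configurations.
    """
    included = [s for s in systems if s[2] in allowed_statuses]
    excluded = [s for s in systems if s[2] not in allowed_statuses]
    below_minimum = len(included) < minimum
    return included, excluded, below_minimum


# Hypothetical data: "varnish-other" and 192.0.2.10 are placeholders.
systems = [
    ("varnish-other", "192.0.2.10", "Running"),
    ("varnish3", "69.4.98.200", "No SSH"),
]

# Only "Running" accepted: varnish3 is excluded and the minimum is missed.
strict = select_for_roundrobin(systems, {"Running"}, minimum=2)

# "No SSH" also accepted: both systems stay in the roundrobin.
relaxed = select_for_roundrobin(systems, {"Running", "No SSH"}, minimum=2)
```

In the relaxed case varnish3 is still reported as "No SSH" by monitoring; it simply no longer disqualifies the system from the DNS rotation.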
Submitted by cruiskeen on Thu, 05/31/2012 - 10:34 Comment #4
Yup, understood that. Thanks. Working fine.
Submitted by JamieCameron on Thu, 05/31/2012 - 12:14 Comment #5
Submitted by Issues on Thu, 06/14/2012 - 12:18 Comment #6
Automatically closed -- issue fixed for 2 weeks with no activity.