False Monitoring Alerts [#10004]

Submitted by andreychek on Fri, 06/12/2009 - 09:36 Comment #1

Howdy -- until we get this resolved, you might consider preventing the status notifications from going to Virtual Server owners.

To do that, go into System settings -> Server Templates -> Default -> Status Monitoring.

From there, first enter an email address in "Additional email address for monitoring messages", and then set "Send email to server owner" to "No".

With a 10-second timeout, the check can occasionally time out, but in theory it should be fine at 120 seconds.
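
To make the timeout concrete, here is a minimal Python sketch of an HTTP "is the website up?" check with a configurable timeout. It is a hypothetical stand-in, not Virtualmin's actual monitor code. The function name `website_is_up` and the rule that any successful response counts as "up" are assumptions.

```python
import urllib.error
import urllib.request


def website_is_up(url: str, timeout: float = 10.0) -> bool:
    """Return True if `url` sends back a successful HTTP response within `timeout` seconds.

    Hypothetical stand-in for a "Website" status check, not Virtualmin's real code.
    """
    try:
        # urlopen applies `timeout` to the connection attempt and to each blocking socket operation.
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status < 400
    except urllib.error.HTTPError:
        # The server answered, but with an error status such as 500 or 503.
        return False
    except OSError:
        # No usable answer: DNS failure, refused connection, or the timeout expired.
        # (URLError and socket timeouts are both subclasses of OSError.)
        return False
```

A server that normally answers in well under a second but briefly stalls for longer than `timeout` will be reported as down, even though nothing is actually broken.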

Do you know of anything particularly CPU intensive going on around 6-7am?

Submitted by JamieCameron on Fri, 06/12/2009 - 11:42 Comment #2

You might also want to check the files logs/access_log and logs/error_log under the problem domains for entries from around the time those messages were sent - there may be messages indicating why the status check failed.
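
Rather than reading the whole access_log by eye, you can pull out just the entries near the alert time (for example 06:10 on 12 Jun 2009). This is a sketch that assumes Apache's default common/combined LogFormat timestamps. The function name `lines_near` and the 5-minute window are hypothetical choices, and error_log uses a different timestamp format that this does not parse.

```python
import re
from datetime import datetime, timedelta
from typing import Iterable, Iterator

# Matches the bracketed timestamp used by Apache's common/combined log formats,
# e.g. [12/Jun/2009:06:10:00 -0500]. Assumes the default LogFormat is in use.
TIMESTAMP = re.compile(r"\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4})\]")


def lines_near(lines: Iterable[str], alert_time: datetime,
               window: timedelta = timedelta(minutes=5)) -> Iterator[str]:
    """Yield access_log lines whose timestamp is within `window` of `alert_time`.

    `alert_time` must be timezone-aware, because the log timestamps carry an offset.
    """
    for line in lines:
        match = TIMESTAMP.search(line)
        if not match:
            continue  # skip lines without a recognisable timestamp
        stamp = datetime.strptime(match.group(1), "%d/%b/%Y:%H:%M:%S %z")
        if abs(stamp - alert_time) <= window:
            yield line
```

You could call it as `lines_near(open(path), alert)`, where `alert` is a timezone-aware datetime; comparing against a naive datetime raises a TypeError.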

Submitted by Tito on Fri, 06/12/2009 - 18:09 Comment #3

Thanks for the reply guys!!

I have checked every log and don't see anything going on around those times. I even ran sar and vmstat for those times, and nothing shows as critical. The 5-minute load average ranges from a minimum of 0.02 to a maximum of 0.21.

I even used "watch" to monitor certain logs, and nothing obvious popped up, though I could certainly be overlooking something.

I also changed the timeout from 10 to 60 seconds in the "Default Template", and set each of my other templates to use the "Default Template" settings, to see if that makes a difference.

Submitted by Tito on Fri, 06/12/2009 - 18:09 Comment #4

I also forgot to mention: in the meantime, I turned off the option to mail alerts to server owners, to see if this fixes anything.

Submitted by JamieCameron on Fri, 06/12/2009 - 18:15 Comment #5

In the emails you get when the site is reported as down, is there any more detail such as an HTTP error message?

Submitted by Tito on Fri, 06/12/2009 - 19:51 Comment #6

The message is the same in all emails for every domain:

Monitor on sld.mydomain.tld.com for 'Website sld.mydomain.sld' has detected that the service has gone down at 12/Jun/2009 06:10

That's the only thing in the body of the email and it's the same every time.

Also, just a thought: it would be nice if the system told you when a virtual server is back up, i.e. when a check succeeds after a failure.

I'm guessing this is what Mon is for? I guess I would have to read up on Mon.

Submitted by JamieCameron on Sat, 06/13/2009 - 00:27 Comment #7

You can have it send email when a service goes back up as well - just go to Webmin -> Others -> System and Server Status -> Scheduled Monitoring, and in the 'Send email when' field select 'When a service changes status'.
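
The "When a service changes status" option amounts to remembering each service's last state and reporting only transitions, so a recovery is reported as well as a failure. The sketch below is a hypothetical model of that idea, not Webmin's implementation. The function name `status_changes` and the choice to treat a never-before-seen service as previously up are assumptions.

```python
from typing import Dict, List


def status_changes(previous: Dict[str, bool], current: Dict[str, bool]) -> List[str]:
    """Return one message per service whose up/down state differs from the last run.

    Assumption: a service with no previous result is treated as having been up,
    so a first-time failure is still reported.
    """
    messages = []
    for service, is_up in sorted(current.items()):
        was_up = previous.get(service, True)
        if was_up == is_up:
            continue  # unchanged since the last check: nothing to send
        messages.append(f"{service} is {'back up' if is_up else 'down'}")
    return messages
```

Between runs, the caller would store `current` and pass it in as `previous` next time.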

As for the underlying cause, does it help if you go to Webmin -> Others -> System and Server Status -> whatever.com , and increase the 'Failures before reporting' field?

Submitted by Tito on Sat, 06/13/2009 - 07:11 Comment #8

Jamie,

I made the change to "When a service changes status". Thanks for that; I was not aware we had that option.

Also, I went back to each whatever.com monitor and increased the "Failures before reporting" field, and it looks like that may have worked, though it's too soon to tell. I haven't received any failures today, which is good. I was getting them daily :-)

I will give it a day or two (until Monday) to see if this helped and keep you posted.

Thanks again

Submitted by JamieCameron on Sat, 06/13/2009 - 10:18 Comment #9

Ok, let us know.

By the way, do you get the failure emails for all domains at the same time, or at different times?

Submitted by Tito on Sun, 06/14/2009 - 14:21 Comment #10

Jamie,

Still no failure emails so looking good so far. :-)

And yes, I get failures for all domains at the same time.

Submitted by JamieCameron on Sun, 06/14/2009 - 14:27 Comment #11

Was Apache perhaps restarted at that time? You can check by looking at the /var/log/httpd/error_log or /var/log/apache2/error_log file.
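
To look for a restart without paging through the whole file, you can scan error_log for Apache's stop/restart notices. This is a sketch: the marker phrases are typical Apache notice messages, but their exact wording varies between versions, and `restart_lines` and `scan_default_logs` are hypothetical helper names.

```python
from pathlib import Path
from typing import List

# Phrases Apache typically logs at notice level when it stops or restarts.
# Assumption: exact wording varies between Apache versions, so adjust as needed.
RESTART_MARKERS = (
    "caught SIGTERM",
    "SIGHUP received",
    "SIGUSR1 received",
    "resuming normal operations",
)

# The two locations mentioned above (Red Hat style and Debian style layouts).
CANDIDATE_LOGS = ("/var/log/httpd/error_log", "/var/log/apache2/error_log")


def restart_lines(text: str) -> List[str]:
    """Return the error_log lines that look like Apache stop/restart notices."""
    return [line for line in text.splitlines()
            if any(marker in line for marker in RESTART_MARKERS)]


def scan_default_logs() -> List[str]:
    """Check whichever of the usual error_log locations exists on this system."""
    hits: List[str] = []
    for path in map(Path, CANDIDATE_LOGS):
        if path.is_file():
            hits.extend(restart_lines(path.read_text(errors="replace")))
    return hits
```

Each matching line carries its own timestamp, so you can compare it directly against the time in the alert emails.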

Submitted by Tito on Tue, 06/16/2009 - 05:53 Comment #12

Jamie,

There were no Apache errors, nor any signs of Apache or other services shutting down. I had looked for this before, checked again, and found nothing.

But, I still haven't received any alerts since the change on Friday so this is good :-)

Submitted by JamieCameron on Tue, 06/16/2009 - 11:26 Comment #13

Ok, so this looks like some kind of transient failure at that time - perhaps the system was loaded enough that the HTTP request didn't return in time. To work around this in future, I will have Virtualmin set that failure count to 2 for all new domains.
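
The "Failures before reporting" setting behaves like a consecutive-failure counter: a single slow response is ignored, and only a run of failures triggers an alert. Here is a hypothetical model of that behaviour. The class name `FailureGate` and the alert-once-per-streak rule are assumptions, not Webmin's code.

```python
from dataclasses import dataclass


@dataclass
class FailureGate:
    """Suppress alerts until a check has failed `threshold` times in a row.

    Hypothetical model of the "Failures before reporting" field, not Webmin's code.
    """
    threshold: int = 2
    consecutive_failures: int = 0

    def record(self, check_passed: bool) -> bool:
        """Record one check result; return True only when an alert should be sent."""
        if check_passed:
            self.consecutive_failures = 0  # any success resets the streak
            return False
        self.consecutive_failures += 1
        # Alert once, on the failure that reaches the threshold (assumed behaviour).
        return self.consecutive_failures == self.threshold
```

With `threshold=1` every failure is reported. With 2, a one-off timeout at 06:10 followed by a successful check produces no email, while a site that stays down is still reported on the second check.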

Submitted by Issues on Tue, 06/30/2009 - 12:18 Comment #14

Automatically closed -- issue fixed for 2 weeks with no activity.