fcgid warning stopping http [#40771]

Submitted by methownet on Tue, 05/17/2016 - 19:53 Pro Licensee

I'm getting warnings that Apache is down briefly, and the log file is filling with these warnings:

[Tue May 17 17:31:33.301114 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 15884 graceful kill fail, sending SIGKILL
[Tue May 17 17:36:14.379820 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 16513 graceful kill fail, sending SIGKILL
[Tue May 17 17:36:44.407617 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 16516 graceful kill fail, sending SIGKILL
[Tue May 17 17:41:00.397407 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 17167 graceful kill fail, sending SIGKILL
[Tue May 17 17:41:10.576335 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 17184 graceful kill fail, sending SIGKILL
[Tue May 17 17:41:44.654235 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 17217 graceful kill fail, sending SIGKILL
[Tue May 17 17:42:14.734673 2016] [fcgid:warn] [pid 13017] mod_fcgid: process 17234 graceful kill fail, sending SIGKILL

Per suggestions elsewhere, I increased the timeout in apache2/mods-available to 60:

FcgidConnectTimeout 60

Yet the issue continues. What else can I try?

Status:

Active

Comments

Submitted by methownet on Tue, 05/17/2016 - 20:09 Pro Licensee Comment #1

It might be related to this site error:

[Tue May 17 18:04:05.189540 2016] [fcgid:warn] [pid 21006] [client xxx.xxx.xxxx.xxx:56151] mod_fcgid: read data timeout in 91 seconds, referer: http://websitehere.com/2014/03/19/4lightningpinewalkthrough/#comment-31412
[Tue May 17 18:04:05.189612 2016] [core:error] [pid 21006] [client xxx.xxx.xxxx.xxx:56151] End of script output before headers: index.php, referer: http://websitehere.com/2014/03/19/4lightningpinewalkthrough/#comment-31412
[Tue May 17 18:04:56.232711 2016] [fcgid:warn] [pid 21247] [client xxx.xxx.xxxx.xxx:61676] mod_fcgid: read data timeout in 91 seconds, referer: http://websitehere.com/
[Tue May 17 18:04:56.232820 2016] [core:error] [pid 21247] [client xxx.xxx.xxxx.xxx:61676] End of script output before headers: index.php, referer: http://websitehere.c

Submitted by andreychek on Tue, 05/17/2016 - 20:29 Comment #2

Howdy -- hmm, were the log rotations running at the time? Or was Apache being restarted then?

The errors you're receiving suggest that maybe Apache was being restarted at the time.

Submitted by methownet on Tue, 05/17/2016 - 21:04 Pro Licensee Comment #3

It appears to be restarting over and over again. There have been hundreds of graceful kill fail messages a day for several days.
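To put a number on "hundreds a day" and to see whether the warnings cluster around a particular hour, the mod_fcgid warnings can be tallied per hour. The following is only a minimal sketch: it assumes the Apache 2.4 error-log layout shown in the excerpts above, and the names `LINE_RE`, `classify`, and `tally` are made up for illustration.

```python
import re
from collections import Counter

# Matches error-log lines shaped like the excerpts in this thread, e.g.
# [Tue May 17 17:31:33.301114 2016] [fcgid:warn] [pid 13017] mod_fcgid: ...
LINE_RE = re.compile(
    r"\[\w{3} (?P<mon>\w{3})\s+(?P<day>\d{1,2}) "
    r"(?P<hour>\d{2}):\d{2}:\d{2}\.\d+ \d{4}\] "
    r"\[fcgid:warn\] .*?mod_fcgid: (?P<msg>.*)"
)


def classify(msg):
    """Group a mod_fcgid warning into one of the kinds seen in this thread."""
    if "graceful kill fail" in msg:
        return "graceful kill fail"
    if "read data timeout" in msg:
        return "read data timeout"
    return "other"


def tally(lines):
    """Count mod_fcgid warnings per (hour bucket, kind); other lines are skipped."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            bucket = f"{m['mon']} {m['day']} {m['hour']}:00"
            counts[(bucket, classify(m["msg"]))] += 1
    return counts
```

Passing an open log file to `tally()` and printing `sorted(counts.items())` gives one row per hour and warning kind. A spike at the same minute every hour or every day would point toward a scheduled job such as log rotation.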

Submitted by methownet on Tue, 05/17/2016 - 21:21 Pro Licensee Comment #4

I'll disable what may be the offending site and see if it stops there.

Submitted by methownet on Tue, 05/17/2016 - 22:28 Pro Licensee Comment #5

I'm getting an error message from my monitoring server that http is down on this server once an hour. Could this be related to a cron job? Disabling the offending site didn't seem to stop it.

Submitted by andreychek on Tue, 05/17/2016 - 22:35 Comment #6

The domain you shared in your logs above, "websitehere.com" -- one thing you may want to try is to go into Server Configuration -> Website Options, and there, try setting it to use CGI rather than FCGID.

First off, does that resolve the problem? And second, if you're still seeing any issues, can you check the logs again and see if there are any new errors?

CGI produces very good error messages, and may catch things that weren't being displayed previously.

Submitted by methownet on Wed, 05/18/2016 - 07:34 Pro Licensee Comment #7

I've monitored this through the night now, and what I'm seeing is this: Nagios sends me a notification every hour that http is down. However, when I test sites after receiving the warning, all sites on the server come up fine. Nagios is monitoring 50 devices and reports nothing else amiss on the network, but it shows that http has been down for 13 hours on that one server. It seems to be a communication issue between the upgraded server and Nagios.

Submitted by methownet on Wed, 05/18/2016 - 07:39 Pro Licensee Comment #8

The message returned from Nagios: HTTP CRITICAL: HTTP/1.1 500 Internal Server Error - 713 bytes in 0.002 second response time

Rebooting the Nagios server now...

Submitted by andreychek on Wed, 05/18/2016 - 08:15 Comment #9

Hmm, what URL is Nagios using to test?

What you may want to do is look at the log for that domain at the time Nagios is connecting to see what the 500 error is.
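One way to line up the log with the monitoring check is to pull out only the entries written within a short window around the time of the Nagios alert. This is a rough sketch, assuming the timestamp layout shown in the excerpts earlier in this thread and an English locale for the day and month names. The helper names `parse_stamp` and `lines_near` are hypothetical.

```python
import re
from datetime import datetime, timedelta

# Leading timestamp of an Apache 2.4 error-log line, e.g.
# [Tue May 17 18:04:05.189540 2016]
STAMP_RE = re.compile(r"^\[(\w{3} \w{3}\s+\d{1,2} \d{2}:\d{2}:\d{2}\.\d+ \d{4})\]")


def parse_stamp(line):
    """Return the line's timestamp as a datetime, or None if it has none."""
    m = STAMP_RE.match(line)
    if not m:
        return None
    return datetime.strptime(m.group(1), "%a %b %d %H:%M:%S.%f %Y")


def lines_near(lines, when, window=timedelta(seconds=30)):
    """Keep only the log lines stamped within `window` of `when`."""
    hits = []
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is not None and abs(stamp - when) <= window:
            hits.append(line)
    return hits
```

For example, `lines_near(open(path), datetime(2016, 5, 18, 7, 0, 0))` would return whatever was logged within 30 seconds of a 07:00 alert, where `path` is the error log of the domain that answers the check.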

Submitted by methownet on Wed, 05/18/2016 - 08:43 Pro Licensee Comment #10

Getting closer. Nagios is using the shared IP number. When I type that in on a browser, I get:

Internal Server Error

The server encountered an internal error or misconfiguration and was unable to complete your request. Please contact the server administrator at [no address given] to inform them of the time this error occurred, and the actions you performed just before this error.

When I do the same thing on any of my other servers I get the default website. How do I fix that?

Submitted by andreychek on Wed, 05/18/2016 - 08:46 Comment #11

Browsing to the IP address will indeed show the default website on your server.

The question is, which website is set as the default?

You can set the default website by going into Server Configuration -> Website Options, and setting the default website for the IP address.
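The difference between browsing to the bare IP and browsing to a site by name can be reproduced from a script: a request with no matching Host header is answered by the default website, while a request naming a hosted domain is answered by that domain. The sketch below uses only Python's standard library. The function names are made up, and `check_state` is only an approximation of how an HTTP monitoring check grades a status code, consistent with the "HTTP CRITICAL ... 500" message quoted above.

```python
import http.client


def probe(address, port=80, host_header=None, timeout=10):
    """GET / from `address` and return (status, reason).

    With host_header=None the Host header is just the address itself, so the
    server's default website answers, as it does for a check against the IP.
    """
    conn = http.client.HTTPConnection(address, port, timeout=timeout)
    try:
        headers = {"Host": host_header} if host_header else {}
        conn.request("GET", "/", headers=headers)
        resp = conn.getresponse()
        resp.read()  # drain the body so the connection closes cleanly
        return resp.status, resp.reason
    finally:
        conn.close()


def check_state(status):
    """Approximate monitoring verdict for an HTTP status (an assumption)."""
    if status >= 500:
        return "CRITICAL"
    if status >= 400:
        return "WARNING"
    return "OK"
```

Calling `probe("192.0.2.10")` and then `probe("192.0.2.10", host_header="websitehere.com")` (both values are placeholders) shows whether only the default website is failing. If the first call returns 500 and the second 200, the problem lies with whichever site is set as the default for that IP.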

Submitted by methownet on Wed, 05/18/2016 - 08:55 Pro Licensee Comment #12

That did it. It seems that the problem website, which I have disabled temporarily, was also set as the default website. When it went south yesterday, Nagios started reporting the error. Thanks so much. JH

Submitted by andreychek on Wed, 05/18/2016 - 09:12 Comment #13

Ah, excellent, I'm glad that's working as expected now!