Apache2 hangs after multiple php5-cgi segfaults

The system was upgraded from a long-working Debian Etch + Virtualmin installation using these steps: http://www.virtualmin.com/documentation/id,upgrading_debian_etch_to_lenn...

Since the upgrade, which completed successfully, the system has been unstable, with the following symptoms:

  • dmesg is full of php5-cgi segfaults; two kinds of errors appear:

php5-cgi[11154]: segfault at 00002b47e6aa7edb rip 00002aaaaaac31af rsp 00000000407ffc90 error 4
php5-cgi[3903]: segfault at 00002b868b0a5ed0 rip 00002b868b0a5ed0 rsp 0000000040800128 error 14

with a ratio of about 1:5 (67 of the former message, 303 of the latter).

  • after some time, one of the Apache2 processes hangs at roughly 100% CPU usage (varying between about 97% and 103%; this is a dual-core system).
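To reproduce the 67/303 tally on a live system, a small Python sketch like the one below could count the php5-cgi segfault messages by their `error` code. It assumes the kernel prints exactly the `segfault at ... rip ... rsp ... error N` format quoted above (newer kernels print `ip`/`sp`, which the pattern also accepts); the function name `count_segfaults` is hypothetical.

```python
import re
import subprocess
from collections import Counter

# Matches kernel segfault lines such as the ones quoted above, e.g.
# "php5-cgi[3903]: segfault at 00002b868b0a5ed0 rip ... rsp ... error 14".
# Assumption: older x86_64 kernels print "rip"/"rsp", newer ones "ip"/"sp";
# the optional "r" accepts both.
SEGFAULT_RE = re.compile(
    r"(?P<comm>[\w.-]+)\[(?P<pid>\d+)\]: segfault at [0-9a-f]+ "
    r"r?ip [0-9a-f]+ r?sp [0-9a-f]+ error (?P<error>[0-9a-f]+)"
)


def count_segfaults(log_text, program="php5-cgi"):
    """Count segfault messages for one program, keyed by the 'error' code."""
    counts = Counter()
    # finditer also copes with several messages fused onto one line.
    for m in SEGFAULT_RE.finditer(log_text):
        if m.group("comm") == program:
            counts[m.group("error")] += 1
    return counts


if __name__ == "__main__":
    # Reading the kernel log may require root on some systems.
    log = subprocess.run(["dmesg"], capture_output=True, text=True).stdout
    for code, n in sorted(count_segfaults(log).items()):
        print(f"error {code}: {n}")
```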

The Apache process that hangs is not the "root" Apache process, but rather the one spawning the FastCGI processes:

16908 ?        Ss     0:06 /usr/sbin/apache2 -k start
17153 ?        R      7:01  \_ /usr/sbin/apache2 -k start  # <<== this one
17169 ?        Z      0:01      \_ [php5-cgi] <defunct>
17180 ?        Z      0:01      \_ [php5-cgi] <defunct>
17245 ?        Z      0:05      \_ [php4-cgi] <defunct>
17259 ?        Z      0:00      \_ [php5-cgi] <defunct>
17260 ?        Z      0:01      \_ [php5-cgi] <defunct>
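The listing already hints that the marked child is spinning rather than blocked: it is in state R and has used 7:01 of CPU time against 0:06 for the parent. A hedged way to quantify that is to sample its CPU counters from /proc twice and compute usage as a percentage of one core (the same per-core convention top uses by default). This is a Linux-only sketch; the helper names are hypothetical.

```python
import os
import sys
import time


def cpu_ticks(stat_line):
    """Return utime + stime, in clock ticks, from a /proc/<pid>/stat line."""
    # Field 2 (the command name) is wrapped in parentheses and may contain
    # spaces, so only split the text after the last ')'.
    rest = stat_line[stat_line.rindex(")") + 2:].split()
    # rest[0] is field 3 (state); utime and stime are fields 14 and 15.
    return int(rest[11]) + int(rest[12])


def read_ticks(pid):
    with open(f"/proc/{pid}/stat") as f:
        return cpu_ticks(f.read())


def cpu_percent(pid, interval=5.0):
    """CPU used by pid over `interval` seconds, as a percentage of one core."""
    hz = os.sysconf("SC_CLK_TCK")
    before = read_ticks(pid)
    time.sleep(interval)
    after = read_ticks(pid)
    return 100.0 * (after - before) / (hz * interval)


if __name__ == "__main__":
    pid = int(sys.argv[1])  # e.g. 17153, the child marked in the listing
    print(f"PID {pid}: {cpu_percent(pid):.1f}% of one core")
```

A reading that stays near 100% on every run would be consistent with a single thread stuck in a busy loop, which matches the gdb findings.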

When I attach strace -p to this process, it shows no activity at all (no system calls), so I used gdb to track down the problem, and it looks like an infinite loop in mod_fcgid.so:

(gdb) bt
#0  0x00007fa83ac0bf1f in pm_main () from /usr/lib/apache2/modules/mod_fcgid.so
#1  0x00007fa83ac0ee22 in procmgr_post_config () from /usr/lib/apache2/modules/mod_fcgid.so
#2  0x00007fa83ac0ddff in ?? () from /usr/lib/apache2/modules/mod_fcgid.so
#3  0x0000000000438cf4 in ap_run_post_config ()
#4  0x0000000000425bbc in main ()
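A single backtrace only shows where the process happened to be at that instant. To make the infinite-loop diagnosis more convincing, one could take several backtraces a second apart and check that frame #0 stays in pm_main each time. The sketch below does that with gdb's batch mode (`-batch`, `-p`, `-ex`); it assumes gdb is installed and run as root, and the helper names are hypothetical.

```python
import subprocess
import sys
import time
from collections import Counter


def backtrace(pid):
    """Attach gdb to pid, print one backtrace, then detach (gdb exits)."""
    result = subprocess.run(
        ["gdb", "-batch", "-p", str(pid), "-ex", "bt"],
        capture_output=True, text=True,
    )
    return result.stdout


def top_frame(bt_text):
    """Return the function name in frame #0 of gdb 'bt' output, or None."""
    for line in bt_text.splitlines():
        if line.startswith("#0 "):
            if " in " in line:
                # "#0  0x00007fa83ac0bf1f in pm_main () from ..."
                return line.split(" in ", 1)[1].split(" (", 1)[0]
            # "#0  main () at file.c:12" (no address shown)
            return line.split()[1]
    return None


def sample_top_frames(pid, count=5, delay=1.0):
    """Take several backtraces and count which function is on top."""
    frames = Counter()
    for _ in range(count):
        frames[top_frame(backtrace(pid))] += 1
        time.sleep(delay)
    return frames


if __name__ == "__main__":
    print(sample_top_frames(int(sys.argv[1])))
```

If every sample reports pm_main, that strongly suggests the process manager is looping there rather than briefly passing through.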

In case you need further information, please do not hesitate to contact me.

Thank you

Dominik Pantůček, CTO Trustica s.r.o.



Submitted by Joe on Mon, 08/03/2009 - 04:43 Pro Licensee

We're possibly not the best people to ask about this. It looks like a bug specific to the Debian packages (we don't provide any of the packages in this scenario: Debian provides Apache, PHP, and mod_fcgid, and we don't modify them or provide custom versions on Lenny). Then again, we've never heard of these problems on Lenny, either, so it might be something specific to your system.

I have seen similar symptoms (though your good groundwork with gdb leads me to think this is not relevant to your case) when both mod_php version 4 and version 5 were loaded into the same Apache. So, it could be a module interaction issue. You might try disabling any modules you don't need just to minimize the potential for conflict and bugs.
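To check which modules are actually loaded, a sketch like the following could parse the output of `apache2ctl -M` and flag the case where mod_php4 and mod_php5 are both present. It assumes the usual `name_module (shared)` / `(static)` line format and the standard module identifiers `php4_module`, `php5_module` and `fcgid_module`; the helper names are hypothetical. Note that the php4-cgi process in the listing above is a separate FastCGI binary, not mod_php4, so this check only covers modules loaded into Apache itself.

```python
import subprocess


def loaded_modules(output):
    """Parse 'apache2ctl -M' output into a set of module identifiers."""
    mods = set()
    for line in output.splitlines():
        # Module lines look like " php5_module (shared)".
        parts = line.split()
        if len(parts) == 2 and parts[1] in ("(shared)", "(static)"):
            mods.add(parts[0])
    return mods


def php_conflict(mods):
    """True if both mod_php4 and mod_php5 are loaded at once."""
    return {"php4_module", "php5_module"} <= mods


if __name__ == "__main__":
    r = subprocess.run(["apache2ctl", "-M"], capture_output=True, text=True)
    # Read both streams, since some messages (e.g. "Syntax OK") go to stderr.
    mods = loaded_modules(r.stdout + r.stderr)
    print("fcgid_module loaded:", "fcgid_module" in mods)
    print("php4 and php5 both loaded:", php_conflict(mods))
```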

From there, I'd suggest checking the Debian bug tracker for any similar reports about mod_fcgid. We've not seen problems like this before, so it's not something that affects all versions of mod_fcgid; it seems to be just the one you have (again, though, we haven't heard from other Debian users either, so it might be specific to something in your deployment).

Also, are you sure you don't have hardware problems? In a large share of the cases where I've seen regular, persistent segfaults, the cause was faulty memory or overheating. Since these crashes consistently occur in one particular piece of code, that seems less likely, but it would be worth running memtest and checking the temperature of your components, if you have access to the console and can reboot to run tests, just to be sure.

Howdy. As Joe said, I'm not sure we can correct that, but I am curious which kernel you're running. What does the "uname -a" command output?

Also, as a stop-gap measure just to get things up and running, you might look into moving your sites to use CGI rather than FCGID. That may help get Apache online and processing requests.