Hi all,
More than 9 months ago I discovered a problem with graceful restarts on a default Virtualmin installation using the default PHP execution mode (FCGId), but only recently did I have the time to dig deeper and experiment.
What is the setup:
<?php
// Print one number per second for 30 seconds.
for ($i = 1; $i <= 30; $i++) {
    echo $i . "\n";
    sleep(1);
}
?>
What is the error:
Run the script in a browser, then do a graceful restart of Apache (service httpd graceful). After around 12 seconds you will see a "No data received" error in your browser (Chrome) and the following in the Apache error log for that virtual server:
(22)Invalid argument: mod_fcgid: can't lock process table in pid 25570
(the pid number will be different)
Further experiments show that this script gets forcefully killed before ending.
If you reduce the time the script executes to 5 seconds ($i <= 5), you'll get the same result, this time after 5 seconds.
Further experiments show this process completes, but you still get the errors both in the browser and the error log.
Do you get the same error? Test and post it here.
Dig:
It is actually a problem of mod_fcgid, not Virtualmin itself.
Graceful restarts are performed every time a virtual server is installed or deleted, or when settings concerning Apache are changed. In a shared hosting environment this could be every 2 minutes. Even without shared hosting it still happens pretty often. Every time you do a graceful restart (install/remove a server), all the running processes get killed, or at the very least you get a scary error in the browser.
My first tweak was to add a file write to the script, so that the file shows how far each run got and whether it completed or was killed first. That is how I got the results above.
Add this inside the loop:
file_put_contents("test.txt", "test run for: ".$i." seconds");
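For anyone who wants to repeat the test, here is a minimal sketch of the whole script with the file write in place. The function name run_counter, its parameters, the extra "completed" marker and the PHP_SAPI guard are my own additions for illustration and are not part of the original test. The guard only stops the 30-second run from starting when the file is executed from the command line.

```php
<?php
// Hypothetical helper (not from the original post): counts up to $seconds,
// overwriting $logFile on every iteration so the file always holds the last
// second the script reached before it finished or was killed.
function run_counter($seconds, $logFile, $delay = 1)
{
    for ($i = 1; $i <= $seconds; $i++) {
        echo $i . "\n";
        file_put_contents($logFile, "test run for: " . $i . " seconds");
        if ($delay > 0) {
            sleep($delay);
        }
    }
    // Only reached if the process was not terminated early.
    file_put_contents($logFile, "\ncompleted", FILE_APPEND);
}

// Start the 30-second run only when served by the web server (under FCGId
// the SAPI name is not "cli"), so the function can be tried from the CLI too.
if (PHP_SAPI !== 'cli') {
    run_counter(30, __DIR__ . "/test.txt");
}
```

After a graceful restart, open test.txt: if the last line is "completed" the script ran to the end, otherwise the number shows the second at which it stopped.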
So why 12 seconds, and where is this set? After some time I discovered that increasing FcgidErrorScanInterval to 60 lets the second process complete (but you still get the errors).
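For reference, the change described above could look like the fragment below. This is only a sketch: where the line goes is an assumption (whichever server-level config file loads mod_fcgid on your system), and the directive has to be set in the main server config, not inside a virtual host.

```apache
<IfModule mod_fcgid.c>
    # Interval in seconds between scans of the error list (module default: 3).
    # 60 is the value used in the experiment above.
    FcgidErrorScanInterval 60
</IfModule>
```

Restart Apache after the change so the mod_fcgid process manager picks up the new value.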
If you check the mod_fcgid code in fcgid_pm_main.c, the graceful restart should be handled by the function kill_all_subprocess(), but apparently scan_errorlist() is also executed even though there is a check for procmgr_must_exit(). The code is not very complex, but it is really messy, and I didn't quite understand it.
The error in the log "can't lock process table in pid 25570" probably means that some information about the process is destroyed immediately upon the graceful restart, so we will never get the result back.
Even if we get around the early termination of the processes by increasing FcgidErrorScanInterval, the second problem is actually bigger - all your users are going to see this error.
Do you get the same? So far I can propose to:
Thanks for your time testing and commenting!
I am still experiencing the problem with mod_fcgid 2.3.9 and Apache 2.2.15. Graceful Apache restarts leave PHP processes dangling. This was supposed to have been fixed in 2.3.7 (https://issues.apache.org/bugzilla/show_bug.cgi?id=50309).
I am waiting for Apache 2.4 so I can use php-fpm.