Nginx websites not found after reboot

I finally got Nginx working with LDAP, and it had been running for several days. Then I rebooted the system and could no longer connect to any sites. I was getting 502 and 404 errors, even though netstat -plunt shows that nginx is listening on the IPs.

I did a restore of all the domains and the sites are up and running again, but after a reboot, I get the same issue. The sites no longer respond until I do a restore.

What could be causing this, and how can I fix it?

Status: Active

Comments

Howdy -- what errors are you seeing in the logfiles for your various domains that aren't working?

Also, are you sure that it's Nginx that's running, and not Apache? If Apache happened to start up prior to Nginx, that could cause problems like what you're describing.

I looked at both the access and error logs for each domain and it doesn't show anything that pertains to the error.

Yes, I am 100% sure I am using nginx and not apache. Apache is not used at all and does not start.

I can give you root access if you want to check it out.

Sorry about that. Since I restored from backup, the log files were restored as well. After rebooting to reproduce the error, I checked the error log:

2014/06/25 05:14:26 [error] 3949#0: *3 connect() to unix:/var/php-nginx/139884701632276.sock/socket failed (111: Connection refused) while connecting to upstream, client: 108.162.225.74, server: natrika.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/php-nginx/139884701632276.sock/socket:", host: "natrika.com"
2014/06/25 05:15:17 [error] 3949#0: *10 connect() to unix:/var/php-nginx/139884701632276.sock/socket failed (111: Connection refused) while connecting to upstream, client: 108.162.225.74, server: natrika.com, request: "GET / HTTP/1.1", upstream: "fastcgi://unix:/var/php-nginx/139884701632276.sock/socket:", host: "natrika.com"

Nginx seems to be having problems connecting to the FastCGI socket.
After I restore, it works fine again.
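To see which upstream sockets are refusing connections, the error log can be filtered roughly like this. This is only a sketch: refused_sockets is a made-up helper name, and the log path you pass in depends on where your domain's error log actually lives.

```shell
# Hypothetical helper: print each unix socket path that nginx reported
# as "Connection refused" in the given error log, one per line.
refused_sockets() {
    grep 'failed (111: Connection refused)' "$1" |
        sed -n 's/.*connect() to unix:\([^ ]*\) failed.*/\1/p' |
        sort -u
}
```

Calling refused_sockets with the path to a domain's error log should list the sockets that have no PHP FastCGI process listening behind them.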

Hmm, yeah, that's an unusual error. I may need to get Jamie's input on what could cause that.

Are you saying that it's working now though, after performing a restore of some of your domains? Or are you still seeing this issue with some/all of your sites?

All my domains are working after restoring. After a reboot, all the sites will give me a 404. I think the fcgi-php5 is not starting up for each domain. But after restoring, it works again.

I'm not sure if it has anything to do with LDAP. I spent the last week getting LDAP up and running and migrating the accounts (delete and restore). The server had been up and running fine for the last few days until the reboot. That's all I can think of.

I suspect the restore is launching the missing process.

What is the output of this command:

ls /etc/init.d/

I also forgot about NFS, which I didn't finish setting up, so I disabled it on boot. But here is my output from /etc/init.d:

README apache2 bind9 bootlogs bootmisc.sh checkfs.sh checkroot-bootclean.sh checkroot.sh clamav-daemon clamav-freshclam cron dovecot fail2ban fetchmail halt hostname.sh hwclock.sh init.d.txt kbd keymap.sh killprocs kmod lookup-domain mailman milter-greylist modules_dep.sh motd mountall-bootclean.sh mountall.sh mountdevsubfs.sh mountkernfs.sh mountnfs-bootclean.sh mountnfs.sh mtab.sh mysql networking newrelic-sysmond nfs-common nginx nscd php-fcgi-10webdesign-com php-fcgi-4dub-com php-fcgi-airportshuttlefinder-com php-fcgi-chumsai-net php-fcgi-hotellinkmarketing-com php-fcgi-insureact-com php-fcgi-kiatikorn-com php-fcgi-limofinder-net php-fcgi-makitaezy-com php-fcgi-mchydrophonics-com php-fcgi-natrika-com php-fcgi-parkinglinks-com php-fcgi-rich-construction-com php-fcgi-tobacco-com php-fcgi-websitecafe-info plymouth plymouth-log postfix postgresql postgrey pppd-dns pptpd procps proftpd quota quotarpc rc rc.local rcS reboot rmnologin rpcbind rsync rsyslog samba saslauthd screen-cleanup sendmail sendsigs single skeleton slapd spamassassin ssh sudo udev udev-mtab umountfs umountnfs.sh umountroot urandom usermin varnish varnishlog varnishncsa vzreboot webmin wide-dhcpv6-client xinetd

Hmm, I'm wondering if perhaps those various "php-fcgi-*" init scripts aren't starting for some reason.

Can you verify in Webmin -> System -> Bootup and Shutdown that they're configured to start up at boot time in your current runlevel?

Also, I'd be curious if manually starting those resolves the problem you're running into.
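From a root shell, starting them all by hand could look something like the loop below. It is a rough sketch that assumes the scripts live in /etc/init.d and accept the usual "start" argument; start_php_fcgi is just an illustrative name.

```shell
# Hypothetical helper: run every php-fcgi-* init script with "start".
# The directory defaults to /etc/init.d but can be overridden.
start_php_fcgi() {
    init_dir="${1:-/etc/init.d}"
    for script in "$init_dir"/php-fcgi-*; do
        # Skip anything that is not executable (or an unmatched glob).
        [ -x "$script" ] || continue
        "$script" start || echo "failed: $script" >&2
    done
}
```

After running start_php_fcgi, restarting nginx (for example with "service nginx restart") may also be needed before the sites respond again.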

I just rebooted the system again and can confirm that the php-fcgi-* aren't starting up, but I can go to Bootup and Shutdown and manually start them, and then restart nginx, and the sites are up again.

I'm attaching a PDF of my Bootup and Shutdown screen. Perhaps you can suggest which services starting at boot are preventing the php-fcgi-* scripts from starting automatically.

I can give you root if you feel like looking around under the hood.

So I had a look, and I noticed that your PHP fcgi scripts are set to start very early in the boot order (16 instead of the usual 99). Did you perhaps re-order them in the Bootup and Shutdown module?

This could result in the PHP wrappers being started before the system is ready, for example before LDAP is available.
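On a Debian-style sysvinit layout, the boot order number is the two digits in the start link's name under /etc/rcN.d (for example S16php-fcgi-natrika-com in /etc/rc2.d). A small sketch for listing those numbers follows; it assumes runlevel 2 and that layout, and list_start_order is a made-up name.

```shell
# Hypothetical helper: print "<sequence> <service>" for each start link
# matching a name prefix in an rc directory (default /etc/rc2.d).
list_start_order() {
    rc_dir="${1:-/etc/rc2.d}"
    prefix="${2:-php-fcgi-}"
    for link in "$rc_dir"/S[0-9][0-9]"$prefix"*; do
        # Skip the literal pattern when nothing matches.
        [ -e "$link" ] || [ -L "$link" ] || continue
        name=$(basename "$link")
        rest=${name#S}            # e.g. 16php-fcgi-natrika-com
        service=${rest#??}        # e.g. php-fcgi-natrika-com
        seq=${rest%"$service"}    # e.g. 16
        printf '%s %s\n' "$seq" "$service"
    done
}
```

Running list_start_order with no arguments should show whether the php-fcgi scripts sit at 16 or at 99.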

Hi Jamie,

I'm not even aware of where to set the boot order, but I did change "Sort actions by" from name to boot order. I didn't think that affected the boot order, just the sorting. Can you advise where to change those values?

You can do this in the Bootup and Shutdown module - the simplest way is to disable the action at boot, and then re-enable it again.

Jamie, I just did what you suggested. But where do I check the values you referred to when you said: "I noticed that your PHP fcgi scripts are set to start very early in the boot order (16 instead of the usual 99)"?