These forums are locked and archived, but all topics have been migrated to the new forum. You can search for this topic on the new forum: Search for BIND exiting and shutting down for no apparent reason - random RCODE REFUSED on the new forum.
I have been seeing random BIND shut down and restart on Virtualmin GPL install. The shut down is always preceded by some error but random domain or IP entry. these IP addresses and domains are not on my server. The code message is often "error (unexpected RCODE REFUSED) resolving to some domain/IP"
If anyone can point me in the correct direction for finding the cause. I am assuming there is a bad setting somewhere. I am running CentOS6 with Virtualmin GPL
A sample log of a shutdown
Sep 25 06:01:06 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.225.70#53
Sep 25 06:01:07 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.224.69#53
Sep 25 06:01:09 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.224.69#53
Sep 25 06:01:09 ip-68-178-130-21 named[20231]: error (unexpected RCODE REFUSED) resolving '49.142.218.222.in-addr.arpa/PTR/IN': 202.103.225.70#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: shutting down
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: stopping command channel on 127.0.0.1#953
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: stopping command channel on ::1#953
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on ::#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 127.0.0.1#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 68.178.130.21#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 68.178.131.108#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: no longer listening on 50.62.34.152#53
Sep 25 06:02:17 ip-68-178-130-21 named[20231]: exiting
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static' '--disable-openssl-version-check' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-fixed-rrset' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS= -DDIG_SIGCHASE'
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: ----------------------------------------------------
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: BIND 9 is maintained by Internet Systems Consortium,
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: corporation. Support and training for BIND 9 are
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: available at https://www.isc.org/support
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: ----------------------------------------------------
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: adjusted limit on open files from 1024 to 1048576
Sep 25 06:07:46 ip-68-178-130-21 named[29722]: found 16 CPUs, using 16 worker threads
Another BIND Shutdown was this one - these IP addresses and domains are not on my server:
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: client 122.136.196.116#63897: query (cache) 'aa3247.com/A/IN' denied
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: client 122.136.196.116#3209: query (cache) 'aa3247.com/A/IN' denied
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: shutting down
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: stopping command channel on 127.0.0.1#953
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: stopping command channel on ::1#953
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on ::#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 127.0.0.1#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 68.178.130.21#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 68.178.131.108#53
Sep 25 05:46:25 ip-68-178-130-21 named[22235]: no longer listening on 50.62.34.152#53
Sep 25 05:46:27 ip-68-178-130-21 named[22235]: exiting
Howdy,
Are you by chance using an OpenVZ-based VPS?
If so, can you paste in the contents of your /proc/user_beancounters file?
-Eric
Yes it is.
lockedpages 0 0 9223372036854775807 9223372036854775807 0
privvmpages 799349 840633 1572864 1572864 0
shmpages 1206 1206 9223372036854775807 9223372036854775807 0
dummy 0 0 9223372036854775807 9223372036854775807 0
numproc 167 167 32567 32567 0
physpages 638774 678198 9223372036854775807 9223372036854775807 0
vmguarpages 0 0 786432 9223372036854775807 0
oomguarpages 638775 678199 9223372036854775807 9223372036854775807 0
numtcpsock 48 51 9223372036854775807 9223372036854775807 0
numflock 171 172 9223372036854775807 9223372036854775807 0
numpty 1 1 255 255 0
numsiginfo 6 7 1024 1024 0
tcpsndbuf 1465696 1520640 9223372036854775807 9223372036854775807 0
tcprcvbuf 786432 835584 9223372036854775807 9223372036854775807 0
othersockbuf 282656 565536 9223372036854775807 9223372036854775807 0
dgramrcvbuf 0 9344 9223372036854775807 9223372036854775807 0
numothersock 201 207 9223372036854775807 9223372036854775807 0
dcachesize 2355922 2400994 9223372036854775807 9223372036854775807 0
numfile 18951 18960 9223372036854775807 9223372036854775807 0
dummy 0 0 0 0 0
dummy 0 0 0 0 0
dummy 0 0 0 0 0
numiptent 14 14 9223372036854775807 9223372036854775807 0
David Calderwood - Euro-Pacific Digital Media
This time BIND just went down twice in a row but with yet another error code:
Sep 25 09:32:38 ip-68-178-130-21 named[29722]: error (unexpected RCODE SERVFAIL) resolving '228.195.58.213.in-addr.arpa/PTR/IN': 2001:67c:e0::6#53
Sep 25 09:35:16 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 25 09:35:19 ip-68-178-130-21 named[29722]: client 122.136.196.116#29758: query (cache) 'aa3247.com/A/IN' denied
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns.bta.net.cn/A/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns.bta.net.cn/AAAA/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns2.bta.net.cn/A/IN': 202.106.196.233#53
Sep 25 09:35:33 ip-68-178-130-21 named[29722]: error (unexpected RCODE REFUSED) resolving 'ns2.bta.net.cn/AAAA/IN': 202.106.196.233#53
Sep 25 09:35:56 ip-68-178-130-21 named[29722]: error (connection refused) resolving '130.36.99.91.psbl.surriel.com/A/IN': 82.94.250.75#53
Sep 25 09:37:41 ip-68-178-130-21 named[29722]: error (unexpected RCODE SERVFAIL) resolving '228.195.58.213.in-addr.arpa/PTR/IN': 193.0.9.6#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: shutting down
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: stopping command channel on 127.0.0.1#953
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: stopping command channel on ::1#953
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on ::#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 127.0.0.1#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 68.178.130.21#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 68.178.131.108#53
Sep 25 09:38:21 ip-68-178-130-21 named[29722]: no longer listening on 50.62.34.152#53
Sep 25 09:38:22 ip-68-178-130-21 named[29722]: exiting
David Calderwood - Euro-Pacific Digital Media
I don't think the error message and the exiting is directly related. In your first example, there's over a minute between these events. Can you check how much time passed between each of these incidences?
I'd also think that BIND is hitting some kind of resource limit. You might check the syslog for further errors, maybe OOM messages. Short of a programming bug, it should not simply exit like that. You might check with "tcpdump udp port 53" what the load of your BIND is.
Looks like about 20 items per second - not sure what is normal.
This happens about 4 times a day, but at no particular time. I've looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:
error (unexpected RCODE REFUSED)
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)
Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs
David Calderwood - Euro-Pacific Digital Media
Looks like about 20 items per second - not sure what is normal.
This happens about 4 times a day, but at no particular time. I've looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:
error (unexpected RCODE REFUSED)
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)
Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs
David Calderwood - Euro-Pacific Digital Media
Looks like about 20 items per second - not sure what is normal.
This happens about 4 times a day, but at no particular time. I've looked thru the logs, but the only consistency I see is right before BIND closes, there is always an error message before such as:
error (unexpected RCODE REFUSED)
error (unexpected RCODE SERVFAIL)
error (unexpected RCODE REFUSED)
Sometimes there can be 30-50 error messages within a few minutes all on the same PID with different IPs
David Calderwood - Euro-Pacific Digital Media
The resolver errors you see are reverse lookups, probably triggered by some service on your machine that wants to know the hostname for an IP address that is trying to connect (e.g. SSHD and Postfix can do that). You might want to check your syslog, auth-log or mail-log for an unusually high amount of connection attempts / login failures, which indicate dictionary attacks.
Still, BIND should not simply exit from something like this. It's meant to process a considerable amount of requests per second.
Did you check the syslog or kernel log for additional errors like OOMs? Otherwise, I'm fairly certain that it's a problem with the virtualization software in use for your server. What's your used and free memory? You might want to use the tool "atop" to record and playback performance data, that should give you an idea if something has a memory spike or similar.
No OOMs and nothing in the kernel logs.
However I did find spamassasin set to lookup DNS test mode (not sure that would matter) and did fine an email account that is trying to login every 3 seconds with an authentication failure in saslauthd
The only thing I have seen pulling a lot of CPU is clamav.
It's not a busy server - only 100 virtual servers.
David Calderwood - Euro-Pacific Digital Media
It's not uncommon that we see unusual issues and service that die unexpectedly on OpenVZ-based systems... but typically when that occurs, the "failcnt" field of the user_beancounters file shows some non-0 values in it.
On your system, they all appear to be "0".
How much RAM is in your server, and what is the current usage?
You can determine that by running this command:
free -m
The system has 3GB ram with 6GB burstable
]# free -m
total used free shared buffers cached
Mem: 3072 1708 1363 0 0 0
-/+ buffers/cache: 1708 1363
Swap: 0 0 0
David Calderwood - Euro-Pacific Digital Media
I can only repeat what I wrote last time, and what Eric said too now: We need to know what your RAM situation is.
free
andatop
are recommended to provide that information, and to check historical memory data when BIND crashes.This is really odd. Memory should be plenty, and no beancounter violation.
Two things I can think of right now: Intermittent memory problems. To find out about those, install "atop". Or the BIND version you're using (being an RC) might be buggy. Ubuntu officially uses 9.8.1, yours is 9.8.2-RC1.
You could try turning on BIND's own logging, including debug logging. Or you might want to check if there's an earlier, non-RC version of BIND available. Are you using third-party repositories for your CentOS? Those often contain versions of software that make trouble when used with Virtualmin.
The Centos iss a clean standard install and Virtualmin is the standard. It is a real problem as the server closes down about 6 times a day. Stopped twice again this morning. I searched the logs, and the same RCODE error then a BIND exit. I noticed that there is a bunch of messages all relating to the same PID. Don't know if it means anything.
Close at 7.03AM
Sep 26 07:03:38 ip-68-178-130-21 named[4010]: error (unexpected RCODE REFUSED) resolving '16.136.119.140.in-addr.arpa/PTR/IN': 2001:288:2001::8c77:36e#53
Sep 26 07:03:40 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 07:03:40 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 07:04:01 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 07:04:06 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 07:04:39 ip-68-178-130-21 named[4010]: error (unexpected RCODE SERVFAIL) resolving '65.37.114.42.in-addr.arpa/PTR/IN': 210.245.31.10#53
Sep 26 07:04:49 ip-68-178-130-21 named[4010]: client 122.136.196.116#45453: query (cache) 'aa3247.com/A/IN' denied
Sep 26 07:05:46 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 07:06:41 ip-68-178-130-21 named[4010]: client 122.136.196.116#61009: query (cache) 'aa3247.com/A/IN' denied
Sep 26 07:07:17 ip-68-178-130-21 named[4010]: client 122.136.196.116#45732: query (cache) 'aa3247.com/A/IN' denied
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: shutting down
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: stopping command channel on 127.0.0.1#953
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: stopping command channel on ::1#953
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on ::#53
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 127.0.0.1#53
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 68.178.130.21#53
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 68.178.131.108#53
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: no longer listening on 50.62.34.152#53
Sep 26 07:07:41 ip-68-178-130-21 named[4010]: exiting
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static' '--disable-openssl-version-check' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-fixed-rrset' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS= -DDIG_SIGCHASE'
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: ----------------------------------------------------
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: BIND 9 is maintained by Internet Systems Consortium,
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: corporation. Support and training for BIND 9 are
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: available at https://www.isc.org/support
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: ----------------------------------------------------
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: adjusted limit on open files from 1024 to 1048576
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: found 16 CPUs, using 16 worker threads
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using up to 4096 sockets
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: loading configuration from '/etc/named.conf'
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: reading built-in trusted keys from file '/etc/named.iscdlv.key'
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using default UDP/IPv4 port range: [1024, 65535]
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: using default UDP/IPv6 port range: [1024, 65535]
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv6 interfaces, port 53
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface lo, 127.0.0.1#53
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0, 68.178.130.21#53
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0:1, 68.178.131.108#53
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: listening on IPv4 interface eth0:2, 50.62.34.152#53
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: generating session key for dynamic DNS
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: sizing zone task pool based on 95 zones
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: set up managed keys zone for view _default, file '/var/named/dynamic/managed-keys.bind'
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: Warning: 'empty-zones-enable/disable-empty-zone' not set: disabling RFC 1918 empty zones
Sep 26 07:12:50 ip-68-178-130-21 named[7851]: automatic empty zone: 127.IN-ADDR.ARPA
Close at 9:22AM
Sep 26 09:21:14 ip-68-178-130-21 named[26623]: error (unexpected RCODE SERVFAIL) resolving '226.128.136.183.in-addr.arpa/PTR/IN': 202.96.103.36#53
Sep 26 09:21:28 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 09:21:52 ip-68-178-130-21 php-cgi: PHP Warning: Module 'mcrypt' already loaded in Unknown on line 0
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: received control channel command 'stop'
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: shutting down: flushing changes
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: stopping command channel on 127.0.0.1#953
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: stopping command channel on ::1#953
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on ::#53
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 127.0.0.1#53
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 68.178.130.21#53
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 68.178.131.108#53
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: no longer listening on 50.62.34.152#53
Sep 26 09:22:30 ip-68-178-130-21 named[26623]: exiting
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: starting BIND 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6.3 -u named
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: built with '--build=x86_64-redhat-linux-gnu' '--host=x86_64-redhat-linux-gnu' '--target=x86_64-redhat-linux-gnu' '--program-prefix=' '--prefix=/usr' '--exec-prefix=/usr' '--bindir=/usr/bin' '--sbindir=/usr/sbin' '--sysconfdir=/etc' '--datadir=/usr/share' '--includedir=/usr/include' '--libdir=/usr/lib64' '--libexecdir=/usr/libexec' '--sharedstatedir=/var/lib' '--mandir=/usr/share/man' '--infodir=/usr/share/info' '--with-libtool' '--localstatedir=/var' '--enable-threads' '--enable-ipv6' '--with-pic' '--disable-static' '--disable-openssl-version-check' '--with-dlz-ldap=yes' '--with-dlz-postgres=yes' '--with-dlz-mysql=yes' '--with-dlz-filesystem=yes' '--with-gssapi=yes' '--disable-isc-spnego' '--with-docbook-xsl=/usr/share/sgml/docbook/xsl-stylesheets' '--enable-fixed-rrset' 'build_alias=x86_64-redhat-linux-gnu' 'host_alias=x86_64-redhat-linux-gnu' 'target_alias=x86_64-redhat-linux-gnu' 'CFLAGS= -O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector --param=ssp-buffer-size=4 -m64 -mtune=generic' 'CPPFLAGS= -DDIG_SIGCHASE'
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: ----------------------------------------------------
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: BIND 9 is maintained by Internet Systems Consortium,
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: Inc. (ISC), a non-profit 501(c)(3) public-benefit
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: corporation. Support and training for BIND 9 are
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: available at https://www.isc.org/support
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: ----------------------------------------------------
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: adjusted limit on open files from 1024 to 1048576
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: found 16 CPUs, using 16 worker threads
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using up to 4096 sockets
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: loading configuration from '/etc/named.conf'
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: reading built-in trusted keys from file '/etc/named.iscdlv.key'
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using default UDP/IPv4 port range: [1024, 65535]
Sep 26 09:22:32 ip-68-178-130-21 named[3617]: using default UDP/IPv6 port range: [1024, 65535]
David Calderwood - Euro-Pacific Digital Media
So what about my suggestions, like atop, BIND debug log and downgrading to a stable non-RC version? I'm afraid I can't help you if you don't reply to my suggestions.
The BIND debug is already on, but it does not seem to be writing to the file. I will be willing to downgrade but the current stable version is 9.8.6 and I have installed 9.8.2-0.17.rc1.el6.3 which is not the latest. I do not have atop installed, but checked it out this morning and will install.
Thanks for the advice
David Calderwood - Euro-Pacific Digital Media
One setting question for BIND.
Do full recursive lookups for clients? It is set to Yes
Should it be NO?
David Calderwood - Euro-Pacific Digital Media
I noticed that my ram is at capacity.
#free -m
total used free shared buffers cached
Mem: 3072 3072 0 0 0 0
-/+ buffers/cache: 3072 0
Swap: 0 0 0
David Calderwood - Euro-Pacific Digital Media
You'd definitely want to limit recursive queries to your local server, and not allow it for any user on the internet. That's done with the "allow-recursion" directive, or there's also an edit box in Webmin's BIND module for it. So you should set "Do fully recursive lookups" to "Default", and in "Addresses and Topology", set "Allow recursive queries from" to "Listed..." and enter in the box
As for the RAM filling up, did that happen when your BIND crashed? You really need ATOP, to have historical data about processes. It could be a script going haywire and spawning many processes, or BIND itself using too much RAM. But you won't find that out without ATOP.
When you have it installed, go to
/etc/init.d/atop
and modify the line "DARGS=", the number at the end is the interval in seconds that atop uses to store resource data. You might want to lower the default of 10 minutes to e.g. one minute, for your debugging purposes.Thanks. Have changed settings and am installing atop. Let's see what happens.
David Calderwood - Euro-Pacific Digital Media
I installed atop and was able to track down the problem.
Seemed that the dkim-mil filter and SASLAUTHD were out of control and the dkim file was around 1GB and was using 100% CPU. Seemed to have spawned an entire city of processes (150+). Still not sure why. I stopped the service uninstalled and re-installed both programs and so far, so good.
The other program that was using 100% of CPU for 20 minutes at a time was bw.pl (bandwidth monitoring) which I didn't really need so I turned that service off.
David Calderwood - Euro-Pacific Digital Media
Good to hear you found the issue! You see, atop really helps in such cases. Also for me it has helped greatly in the past when nailing down processes running rampant. :)