Bind DNS Cluster Problems

#1 Wed, 05/06/2009 - 09:15
thedatabackup

I am unable to get the Bind Cluster feature to fill in the DNS records on a slave server. It creates the zone file on the slave but does not fill it with records, so the files are left empty. I have opened ports 10000:10100 between both servers. The slave has been added to the Webmin servers index. Any ideas?

I did see one of these messages for each domain in /var/log/messages

May 6 13:58:21 mx4 kernel: [ 1996.732132] audit(1241632701.314:612): type=1503 operation="inode_link" requested_mask="rw::rwl" denied_mask="::l" name="/var/cache/bind/db-TFCMVf8$f8J" name2="/var/cache/bind/dmain.org.hosts" pid=4519 profile="/usr/sbin/named" namespace="default"
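For anyone reading a message like this later: the `profile="/usr/sbin/named"` field suggests the denial came from a mandatory access control profile (on Ubuntu, most likely AppArmor) rather than from ordinary file permissions, and `denied_mask="::l"` points at the link step that gives the temporary `db-...` file its final name. As a rough sketch (the helper name `parse_audit_fields` is made up for illustration), the quoted fields can be pulled out of such a line like this:

```python
import re


def parse_audit_fields(line):
    """Return the key="value" pairs found in a kernel audit log line.

    Unquoted fields such as type=1503 or pid=4519 are deliberately skipped.
    """
    return dict(re.findall(r'(\w+)="([^"]*)"', line))


# Message body copied from the log line quoted above.
sample = (
    'audit(1241632701.314:612): type=1503 operation="inode_link" '
    'requested_mask="rw::rwl" denied_mask="::l" '
    'name="/var/cache/bind/db-TFCMVf8$f8J" '
    'name2="/var/cache/bind/dmain.org.hosts" pid=4519 '
    'profile="/usr/sbin/named" namespace="default"'
)

fields = parse_audit_fields(sample)
print(fields["operation"], fields["denied_mask"], fields["profile"])
```

Seeing which operation and which mask were denied narrows down what the profile would need to allow for the directory in question.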

Wed, 05/06/2009 - 09:22
andreychek

Hmm, if you type the command "getenforce" on the server that had the messages you're describing above, what output do you get?
-Eric

Wed, 05/06/2009 - 11:51 (Reply to #2)
thedatabackup

Master:
#getenforce
Disabled

Slave:
#getenforce
The program 'getenforce' is currently not installed

Wed, 05/06/2009 - 11:52 (Reply to #3)
thedatabackup

Master = Centos 5.3

Slave = Ubuntu Server 8

Wed, 05/06/2009 - 11:59 (Reply to #4)
andreychek

Interesting!

Which of those two servers was it that you saw that error in the messages file?
-Eric

Wed, 05/06/2009 - 12:02 (Reply to #5)
thedatabackup

Those messages were on the slave server.

Thu, 05/07/2009 - 07:26 (Reply to #6)
thedatabackup

I reloaded the Ubuntu server with CentOS and these are the errors now:

May 7 12:12:48 localhost named[18475]: zone domain.com/IN: Transfer started.
May 7 12:12:48 localhost named[18475]: transfer of 'domain.com/IN' from xx.xxx.xx.xxx#53: connected using xx.xxx.xx.xxx#44406
May 7 12:12:49 localhost named[18475]: transfer of 'domain.com/IN' from xx.xxx.xx.xxx#53: failed while receiving responses: REFUSED
May 7 12:12:49 localhost named[18475]: transfer of 'domain.com/IN' from xx.xxx.xx.xxx#53: end of transfer

Fri, 05/08/2009 - 08:25 (Reply to #7)
andreychek

Howdy,

I take it this error shows up on the master dns server?

DNS zone transfers occur on port 53 TCP, rather than the 53 UDP that DNS requests occur on.

Is port 53 TCP open on the slave DNS server, and is there an "allow-transfer" line in the named config file allowing access from the IP of your master DNS server?

Also, on the slave, you may see errors in the logfiles that explain the issue more clearly.
-Eric
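A quick way to test the TCP side of this is to attempt a plain TCP connection to port 53 from the server that requests the transfer. In a typical BIND setup the slave connects to the master, and the "allow-transfer" statement lives in the master's config and lists the slave's IP. The sketch below only checks reachability; the function name `tcp_port_open` is hypothetical and it says nothing about whether named will accept the transfer itself.

```python
import socket


def tcp_port_open(host, port=53, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Running `tcp_port_open("<master IP>")` on the slave should return True if the firewall and named's listener both allow TCP on port 53. A False here means the problem sits below BIND's access controls; a True combined with a REFUSED in the logs points back at "allow-transfer".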

Mon, 05/11/2009 - 05:36 (Reply to #8)
ronald

I've had these errors for a long time myself.
A temporary solution is to chmod 777 the directories on the slave. That is in /var/named or wherever the zones are created on your system (it may be chrooted).

In the end the solution was user/group names and rights. When a folder belongs to user root or user bind, then the user named can't write to it, as was the case for me.

the zones created end up with weird looking names like hj5yG6dsq4
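Before reaching for chmod 777, it may help to check whether the account named runs as can create files in the zone directory at all. The sketch below approximates the classic owner/group/other check. The function name `can_write` is made up here, and the check ignores ACLs, root's override, and SELinux/AppArmor, so treat a True as "permissions look fine", not as proof.

```python
import os
import stat


def can_write(path, uid, gids):
    """Approximate whether uid (with group ids gids) may create files in path.

    Creating a file needs both write and search (execute) permission on the
    directory. ACLs and MAC systems such as SELinux/AppArmor are not considered.
    """
    st = os.stat(path)
    if st.st_uid == uid:
        needed = stat.S_IWUSR | stat.S_IXUSR
    elif st.st_gid in gids:
        needed = stat.S_IWGRP | stat.S_IXGRP
    else:
        needed = stat.S_IWOTH | stat.S_IXOTH
    return st.st_mode & needed == needed
```

On a real system the uid and gid could come from `pwd.getpwnam("named")` (or "bind" on Debian-style systems; which account applies depends on the distribution), and the path would be the slave's zone directory, inside the chroot if one is used.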

Thu, 05/21/2009 - 23:15 (Reply to #9)
drhiii

Hello Joe,

Tx for your response here.

Yessir, I am still getting the non-auth response.

And I am with you on the idea that there may be new sanity checks that decided to reveal themselves in this upgrade across now three separate server environments. It was not Webmin, but also postfix and several apps that were upgraded. The config files remained the same. But now three totally separate setups, each one has exhibited the same response to the upgrades. And each one worked prior to all this.

I've walked through reconfigurations until my face melted, and had to step back for a day or so as it is forest/trees in my brain. But I agree with you that there appears to be a policy check that has come into play since the upgrades on these three separate installations.

Will return to this and triple triple triple check the zone records across master and slaves. It is perplexing. Very.

Appreciate your response to this. I will solve it, but it looks like I am in a fist fight with what will end up being something like a syntax error, a typo, probably something as simple as that.... big sigh.

Thu, 05/21/2009 - 02:03
drhiii

Hello. I have an interest in this thread as I've been mucking around for two days trying to solve this.

Had working primary and secondary DNS servers. Serviced thru webmin. Worked a peach. A few days ago after an update to the OS, it broke. Have been trying to get it working since.

Servers are "communicating" with each other. A zone will get created on the secondary, but no records get transferred. I have run the gamut of making sure the firewall is correct (it is), rndc, acls, reconfigured up the yin/yang. Am not sure why I keep getting the 'not authoritative' as I have followed everything I can find.

Ideas anyone?

Also, I cannot for the life of me figure out how to transfer all of the zones at once, once I can get the slave server responding properly. Am perplexed. Help?

May 21 04:46:29 tpl3 named[26957]: client 192.168.33.44#32908: received notify for zone '4455.tplweb.com': TSIG 'transfer': not authoritative
May 21 04:46:29 tpl3 named[26957]: received SIGHUP signal to reload zones
May 21 04:46:29 tpl3 named[26957]: loading configuration from '/etc/bind/named.conf'
May 21 04:46:29 tpl3 named[26957]: default max-cache-size (33554432) applies
May 21 04:46:29 tpl3 named[26957]: default max-cache-size (33554432) applies: view _bind
May 21 04:46:29 tpl3 named[26957]: reloading configuration succeeded
May 21 04:46:29 tpl3 named[26957]: zone 4455.xxyyzz.com/IN: has 0 SOA records
May 21 04:46:29 tpl3 named[26957]: zone 4455.xxyyzz.com/IN: has no NS records
May 21 04:46:29 tpl3 named[26957]: reloading zones succeeded
May 21 04:46:29 tpl3 named[26957]: zone 4455.xxyyzz.com/IN: Transfer started.
May 21 04:46:29 tpl3 named[26957]: transfer of '4455.xxyyzz.com/IN' from 75.148.112.153#53: connected using 192.168.33.44#37739
May 21 04:46:29 tpl3 named[26957]: zone 4455.xxyyzz.com/IN: transferred serial 1242902769: TSIG 'transfer'
May 21 04:46:29 tpl3 named[26957]: transfer of '4455.xxyyzz.com/IN' from 192.168.33.44#53: Transfer completed: 1 messages, 4 records, 243 bytes, 0.001 secs (243000 bytes/sec)
May 21 04:46:29 tpl3 named[26957]: zone 4455.xxyyzz.com/IN: sending notifies (serial 1242902769)

Thu, 05/21/2009 - 02:13 (Reply to #11)
drhiii

Adding to the above message... I can now create a zone, and it appears a single record gets added. But I cannot add additional records to it. It always stays at 1 record, regardless of whether I add two or twenty. And the other existing records never get updated.

Help, ideas?

Thu, 05/21/2009 - 14:22 (Reply to #12)
Joe

Are you still getting the non-authoritative error?

If so, you might double check to be sure the NS records for these zones match your master and slave. I'm just guessing, but maybe there's a new safety/sanity check in a new BIND release or something.
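One rough way to do that double check is to pull the NS targets out of the zone file on the master and compare them with the hostnames of the servers that are supposed to serve the zone. The sketch below is intentionally simplistic: it assumes one record per line, the helper name `ns_targets` is invented, and the `example.com` names are placeholders, not anything from this thread.

```python
def ns_targets(zone_text):
    """Collect NS targets from simple one-record-per-line zone text.

    An NS record carries exactly one data field, so the record type is the
    token just before the last one. Multi-line records are not handled.
    """
    targets = set()
    for raw in zone_text.splitlines():
        tokens = raw.split(";", 1)[0].split()  # drop trailing comments
        if len(tokens) >= 2 and tokens[-2].upper() == "NS":
            targets.add(tokens[-1].rstrip(".").lower())
    return targets


# Placeholder zone content for illustration only.
sample_zone = """\
$TTL 38400
@    IN  NS  ns1.example.com.
@    IN  NS  ns2.example.com.   ; the slave
ns1  IN  A   192.0.2.1
ns   IN  A   192.0.2.3
"""

expected = {"ns1.example.com", "ns2.example.com"}
missing = expected - ns_targets(sample_zone)
print(sorted(missing))
```

An empty `missing` set means every expected name server has an NS record in the zone; anything left over is a candidate for the mismatch.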


Sat, 06/13/2009 - 15:30
Tito

Has anyone been able to fix this issue? I am also getting empty zone files when zones are transferred from my master server (virtualmin) to my slave server (slave dns).

I am not getting any "non-authoritative" errors. Does anyone know why the zone files are coming up empty?

Sat, 06/13/2009 - 15:39 (Reply to #14)
andreychek

Are you getting any errors at all, on either the master system or the slave? If not, do you see any named messages at all?

In addition, you may also want to make sure SELinux is disabled; it's capable of causing random trouble :-)
-Eric

Sat, 06/13/2009 - 16:04
Tito

I just realized the following on my slave server:

When I go to a zone file via webmin and click on "apply zone" on the top right hand corner, I get this error: "NDC command failed : rndc: connect failed: 127.0.0.1#953: connection refused"

When I look at the named logs on the slave server, I see the following across various log files:

13-Jun-2009 15:46:50.506 zone mydomain.net/IN: refresh: failure trying master 192.192.192.45.192.192.192.45#53 (source 0.0.0.0#0): operation canceled
13-Jun-2009 15:46:50.506 zone mydomain.com/IN: refresh: failure trying master 192.192.192.45#53 (source 0.0.0.0#0): operation canceled
13-Jun-2009 15:46:50.506 zone mydomain.com/IN: refresh: failure trying master 192.168.30.45#53 (source 0.0.0.0#0): operation canceled
13-Jun-2009 15:46:50.506 zone mydomain.com/IN: refresh: failure trying master 192.192.30.45#53 (source 0.0.0.0#0): operation canceled

13-Jun-2009 15:24:43.784 no longer listening on 127.0.0.1#53
13-Jun-2009 15:24:43.784 no longer listening on 209.2.2.4#53
13-Jun-2009 15:46:50.505 no longer listening on 127.0.0.1#53
13-Jun-2009 15:46:50.505 no longer listening on 209.2.2.4#53

13-Jun-2009 15:46:03.867 transfer of 'mydomain.com/IN' from 209.2.2.4#53: failed to connect: connection refused
13-Jun-2009 15:46:03.867 transfer of 'mydomain.com/IN' from 209.2.2.4#53: end of transfer
13-Jun-2009 15:46:16.532 transfer of 'mydomain.net/IN' from 209.2.2.4#53: failed to connect: connection refused
13-Jun-2009 15:46:16.532 transfer of 'mydomain.net/IN' from 209.2.2.4#53: end of transfer

Sat, 06/13/2009 - 16:13
Tito

And yep, I made sure that port 53 (UDP and TCP) was open on both the master and slave servers, and also port 953 (TCP only) on both.

Sat, 06/13/2009 - 16:45 (Reply to #17)
andreychek

On the master server, what does this show:

netstat -an | grep :53

Also, in the named conf file, do you have any lines with "allow-transfer" in them?
-Eric

Sat, 06/13/2009 - 16:47
Tito

YEAAAHH!!!

Fixed...here is what I figured out and hopefully this will fix it for others:

On my slave server, from my previous post, you can tell that I was getting connection refused even though I had the proper ports open in iptables.

So, I decided to grep the heck out of /var/named for the word "Listen|listen" and "Port|port" and came across this line in my grep result:

"listen-on"

That line is in:

"/var/named/chroot/var/log/named"

Once you go into that file, search for "option" and within that bracket, change:

"listen-on { any; };"

To

"//listen-on { any; };"
"listen-on port 53 { 127.0.0.1; 209.2.2.4; };"

"That is not my real public IP ;-)"

Notice that all I did was comment out the original line and add an explicit one. Then, restart named and BAM!!...well, I can only speak for myself because it worked for me but hopefully it works for the rest of you.
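For anyone repeating the grep step, a small sketch like the following lists every "listen-on" statement in a config and marks whether it is active or commented out. The function name `listen_on_lines` is made up, the sample text mirrors the change described above, and only `//` and `#` line comments are recognised (not `/* ... */` blocks).

```python
def listen_on_lines(conf_text):
    """Return (line_number, is_active, statement) for each listen-on line.

    Only single-line statements and // or # line comments are recognised.
    """
    results = []
    for number, raw in enumerate(conf_text.splitlines(), start=1):
        stripped = raw.strip()
        commented = stripped.startswith(("//", "#"))
        body = stripped.lstrip("/#").strip()
        if body.startswith("listen-on"):
            results.append((number, not commented, body))
    return results


# Sample mirroring the edit described in the post; the IP is the poster's placeholder.
sample_conf = """\
options {
    //listen-on { any; };
    listen-on port 53 { 127.0.0.1; 209.2.2.4; };
};
"""

for number, active, statement in listen_on_lines(sample_conf):
    print(number, "active" if active else "commented", statement)
```

To run it against a real file, read the file's contents into a string and pass that to `listen_on_lines`. If no active line names the address the other server connects to, that would be consistent with the "connection refused" messages seen earlier.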

Sat, 06/13/2009 - 16:49
Tito

Eric,

Yep, I had the allow-transfer line in there already set to my localhost ip as well as my_pub_ip/16 so that wasn't it.

I very much appreciate your help though...thanks a bunch to you and the community!!!

Wed, 06/23/2010 - 13:17
bitworks

I have the same issue: one master and two slaves. It transfers the files, but no records. I tried changing permissions to 777, then removed the slave server from the cluster and deleted the zones from the slave.

Strange behavior is that the zone comes over as the right file name and then several minutes later changes to the db-XXkljsadf sort of format. I am suspecting that something might be running as root or having problems with the chroot shell.

Just retried this process again:

I see the zones created as the correct file names on the host. Everything seems to show up fine in Webmin and then a minute or so later I see this in the logs:

Jun 23 09:59:09 ns2 named[26700]: zone winds.org/IN: has 0 SOA records
Jun 23 09:59:09 ns2 named[26700]: zone winds.org/IN: has no NS records
Jun 23 09:59:09 ns2 named[26700]: zone winds.org/IN: saved '/var/winds.org' as '/var/db-VS7PCGTU'

These are imported zone files (on the master), but the same thing happens if I create a master zone manually.

Well, more troubleshooting reveals the problem must be on the master server. If I take the named.conf from the master, move it to another server running BIND without Webmin, and start it up, it runs fine and does the transfer. Something in Webmin is getting in the way.

Sort of got it halfway working by using the old master server in the webmin config, but it does not transfer from the new box. Is there something in the software that checks SOA records with an external server?

Fri, 12/24/2010 - 13:37
stelios

Any update on this? I've got the same problem as bitworks mentioned earlier.

Topic locked