Prevent OS updates from chrooting Bind [#53785]

Submitted by yngens on Fri, 09/29/2017 - 11:32

This issue happens quite regularly on Virtualmin servers. Although Bind is initially set up without chroot, system updates sometimes mess with the settings and websites become unavailable. We then have to go to Bind's settings, only to find that the "Is named.conf under chroot directory?" option has somehow been set to "Yes". This is especially painful when you are a hosting vendor with tons of Virtualmin servers. It is really high time for the *min team to be careful about pushing this kind of untested update. Please respect how Bind is initially set up and make further updates respect those settings.

Status:

Active

Comments

Submitted by JamieCameron on Fri, 09/29/2017 - 19:17 Comment #1

Was it an update to BIND which caused this, or an update to Webmin / Virtualmin?

Submitted by yngens on Fri, 09/29/2017 - 19:56 Comment #2

No idea which exact update causes this, but it happens regularly.

Submitted by yngens on Fri, 09/29/2017 - 19:57 Comment #3

Why would a Bind update change the existing configuration? So I believe it is the Webmin/Virtualmin updates.

Submitted by andreychek on Fri, 09/29/2017 - 20:20 Comment #4

Joe is looking into this, though the trouble is that so far we haven't been able to reproduce it, and we hadn't received other reports of that happening.

We're trying some different things to reproduce that, in the hopes of determining what's triggering the issue you saw.

Do you happen to know when the last time you saw this issue was? That is, what was the day, and if you know, a rough time?

And would it be possible for you to attach the /var/log/yum.log from a server you experienced this on?

What we'll do is use that day and time, and review the yum log, to try and determine a way to reproduce it.

Once we figure out how to reproduce it, we'll then sort out the best way to resolve it.

Submitted by yngens on Fri, 09/29/2017 - 22:42 Comment #5

I decided to report this after two different servers failed on our customers during the last 24 hours. There might be more such servers that we just have not heard about from our users yet. Generally, the websites on such servers keep running until Bind is somehow restarted, I guess, and only then does the issue reveal itself.

Ironically, after reverting the chroot setting to "No", removing all the DNS-related records and then regenerating them, everything was working fine. After reading your last reply on this thread I wanted to give you the output of /var/log/yum.log, but I can't SSH into either of those two servers. We would have to hard-reboot them, but since the users of those servers have already had more than enough inconvenience, we don't want to do so now.

Submitted by yngens on Fri, 09/29/2017 - 22:54 Comment #6

OMG, it is happening on every fucking Virtualmin server now. I've just tried to create a virtual server on a healthy server and it is giving:

.. BIND DNS domain failed! : Zone domain.com does not exist! at /usr/libexec/webmin/web-lib-funcs.pl line 1433.

"Joe is looking into this, though the trouble is that so far we haven't been able to reproduce it, and we hadn't received other reports of that happening."

Eric,

Please don't do it. You and I and everybody else know this fucking problem pops up regularly in the issues and forums. Here are just some examples:

https://www.virtualmin.com/node/21352
https://www.virtualmin.com/node/17017
https://www.virtualmin.com/node/20420
https://www.virtualmin.com/node/38967
https://www.virtualmin.com/node/16987
https://www.virtualmin.com/node/22811
https://www.virtualmin.com/node/257
https://www.virtualmin.com/node/22752
https://www.virtualmin.com/node/21870

Those are essentially the same problem. That is why I created this issue, asking you to finally troubleshoot this and prevent automatic modification of the initial Bind configuration settings.

Submitted by yngens on Fri, 09/29/2017 - 23:28 Comment #7

So I am now on one of these servers. When I go to Webmin > Servers > BIND DNS Server, it shows:

BIND DNS Server
BIND version 9.9, under chroot /var/named/chroot  
The primary configuration file for BIND /var/named/chroot/etc/named.conf does not exist, or is not valid. Create it?
 Setup nameserver for internal non-internet use only
  Setup as an internet name server, and download root server information
  Setup as an internet name server, but use Host Manager's older root server information
  Create Primary Configuration File and Start Nameserver

though we have never used chrooted Bind. So I am ignoring that page and going to Module Config, and there it is:

Chroot directory to run BIND under is set to /var/named/chroot

and

Is named.conf under chroot directory? is set to "Yes".

And here are the last entries in /var/log/yum.log:

Sep 14 04:30:42 Updated: mdadm.x86_64 4.0-5.el7
Sep 14 04:30:42 Updated: bind.x86_64 32:9.9.4-51.el7
Sep 14 04:30:43 Installed: grub2-tools-efi.x86_64 1:2.02-0.64.el7.centos
Sep 14 04:30:43 Updated: authconfig.x86_64 6.2.8-30.el7
Sep 14 04:30:43 Updated: sudo.x86_64 1.8.19p2-11.el7_4
Sep 14 04:30:43 Installed: systemd-libs.i686 219-42.el7_4.1
Sep 14 04:30:43 Installed: rdma-core.i686 13-7.el7
Sep 14 04:30:45 Erased: rdma
Sep 14 04:31:28 Erased: pygobject3-base
Sep 15 09:13:12 Updated: postgresql-libs.x86_64 9.2.23-1.el7_4
Sep 15 09:13:13 Updated: postgresql.x86_64 9.2.23-1.el7_4
Sep 15 09:13:14 Updated: postgresql-server.x86_64 9.2.23-1.el7_4
Sep 21 06:24:56 Updated: httpd-tools.x86_64 1:2.4.6-67.el7.centos.2.vm
Sep 21 06:24:57 Updated: httpd.x86_64 1:2.4.6-67.el7.centos.2.vm
Sep 21 06:24:57 Updated: mod_ssl.x86_64 2:2.4.6-67.el7.centos.2.vm
Sep 23 07:53:07 Updated: wbm-virtualmin-awstats.noarch 2:5.6-2

I am not sure whether the updates which took place on Sep 23rd, 21st and 15th affected this issue, but the last Bind update seems to have taken place on Sep 14 04:30:42:

Sep 14 04:30:42 Updated: bind.x86_64 32:9.9.4-51.el7

and if I run yum info bind.x86_64 then it shows:

yum info bind.x86_64
Installed Packages
Name        : bind
Arch        : x86_64
Epoch       : 32
Version     : 9.9.4
Release     : 51.el7
Size        : 4.3 M
Repo        : installed
From repo   : updates
Summary     : The Berkeley Internet Name Domain (BIND) DNS (Domain Name System) server
URL         : http://www.isc.org/products/BIND/
License     : ISC
Description : BIND (Berkeley Internet Name Domain) is an implementation of the DNS
            : (Domain Name System) protocols. BIND includes a DNS server (named),
            : which resolves host names to IP addresses; a resolver library
            : (routines for applications to use when interfacing with DNS); and
            : tools for verifying that the DNS server is operating properly.

so it seems to have been updated from the updates repo.
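A minimal sketch of how the last update of a single package could be pulled out of the yum log, assuming the line format shown in the excerpt above (date, time, "Updated:" or "Installed:", then name.arch and version). The function name last_package_update is hypothetical and not part of yum or Webmin.

```shell
#!/bin/sh
# Hypothetical helper: print the most recent "Updated:" or "Installed:" entry
# for one package from a yum log file.
# $1 = package name (for example bind)
# $2 = log file (defaults to /var/log/yum.log)
last_package_update() {
    package="$1"
    log_file="${2:-/var/log/yum.log}"
    # The trailing "\." keeps "bind" from also matching "bind-libs" and similar.
    grep -E " (Updated|Installed): ${package}\." "$log_file" | tail -n 1
}
```

Run as last_package_update bind against the excerpt above, this prints the Sep 14 04:30:42 line for bind.x86_64.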

But if Virtualmin has nothing to do with the update, then it must either:

catch this and automatically switch back to non-chrooted Bind; or
switch everybody over to chrooted Bind; or
completely take over Bind updates to prevent the existing Bind settings from being overridden.

Please understand that we are using non-chrooted Bind only because we always follow the *-min way of doing things. If this particular setting is outdated and has become a legacy one, then please make that clear to the Virtualmin community and help everybody switch to chrooted Bind, to avoid the kind of massive problems we are having now.

Submitted by yngens on Fri, 09/29/2017 - 23:30 Comment #8

Some relevant quotes from https://www.virtualmin.com/node/20420:

Submitted by andreychek on Mon, 12/12/2011 - 21:16 Comment #8: "Okay, that setting should work for the time being -- the next Webmin release should work around the problem that you're seeing though, so if you're interested in using the chroot setup, you should be able to when the next Webmin version comes out."

Submitted by JamieCameron on Mon, 02/27/2012 - 14:23 Comment #16: "Ok, I see the underlying issue now .. on CentOS 6, Webmin and Virtualmin are by default using a chroot directory when they shouldn't be. This also causes the code in the Virtualmin installer that fixes the listen directives in named.conf to fail. I will fix this in the next Webmin release. Till then, the work-around is to disable chroot at Webmin -> Servers -> BIND DNS Server -> Module Config."

Just imagine: that was back in 2012! Five long years ago! I just can't help wondering: aren't you guys ever tired of acting naive when you actually do know where the problem comes from, thanks to the many reports of the same issue over the years?!

Why don't you completely prevent Bind from being updated from other repositories and let the Virtualmin repository control it?!

Submitted by andreychek on Fri, 09/29/2017 - 23:44 Comment #9

Yup, we understand, and that's why we're trying to help you :-)

We run CentOS on our own servers here, and haven't ever run into that issue... and as I mentioned above, Joe is trying to reproduce what you described so we can help.

We understand it's happened on systems you're managing, but it's not happening on the majority of systems out there.

Joe and I talked about this today, and during installation, we're not aware of anything during the Virtualmin install that's changing the chroot settings.

It should be using whatever setup is already on the server.

If it's chroot it'll run with that, and if it's not chroot it'll run without.

As of now, we're unfortunately not sure what might be causing that, but we're looking into it, and going over how it could be prevented or otherwise dealt with.

Thank you for your yum logs, that does indeed help.

Submitted by JamieCameron on Sat, 09/30/2017 - 00:04 Comment #10

I think I know what may be causing this (and it's certainly not the same issue as in 2012!).

The cause appears to be that Webmin thinks that you're using a chroot directory when you really aren't, rather than the actual configuration of BIND being changed.

@yngens - can you post the contents of /etc/webmin/bind8/config on one of your CentOS 7 systems that isn't having this problem?

Submitted by yngens on Sat, 09/30/2017 - 00:17 Comment #11

"@yngens - can you post the contents of /etc/webmin/bind8/config on one of your CentOS 7 systems that isn't having this problem?"

Here it is:

cat /etc/webmin/bind8/config
updserial_man=1
keygen=dnssec-keygen
checkconf=named-checkconf
tmpl_dnssec=0
default_prins=
pid_file=/var/run/named.pid /run/named.pid
named_conf=/etc/named.conf
restart_cmd=restart
rev_must=0
soa_start=0
file_perms=
extra_reverse=
records_order=0
reversezonefilename_format=ZONE.rev
master_dir=/var/named
master_ttl=1
allow_comments=0
no_chroot=1
force_random=0
dnssec_period=21
named_path=/usr/sbin/named
whois_cmd=whois
file_owner=
ndc_cmd=ndc
named_group=
spf_record=0
show_list=1
rev_def=0
dnssectools_conf=/etc/dnssec-tools/dnssec-tools.conf
forwardzonefilename_format=ZONE.hosts
default_view=
rndcconf_cmd=rndc-confgen
start_cmd=systemctl start named
dnssectools_rollrec=/var/named/system.rollrec
rndc_conf=/etc/rndc.conf
signzone=dnssec-signzone
extra_forward=
ipv6_mode=1
slave_dir=/var/named/slaves
keys_dir=
soa_style=0
max_zones=50
largezones=0
other_slaves=1
dnssectools_keydir=/var/named/dtkeys
auto_chroot=sh -c '. /etc/sysconfig/named && echo "$ROOTDIR"'
updserial_def=0
relative_paths=0
no_pid_chroot=0
short_names=0
default_master=
chroot=
updserial_on=1
dnssectools_rollmgr_pidfile=/var/run/rollmgr.pid
allow_long=0
checkzone=named-checkzone
allow_wild=1
this_ip=
stop_cmd=systemctl stop named
named_user=
confirm_zone=1
by_view=0
tmpl_dnssec_dt=1
free_nets=
zones_file=
extra_slaves=
support_aaaa=1
confirm_rec=0
allow_underscore=1
rndc_cmd=rndc

but note that it's after I manually fixed the issue.

Submitted by yngens on Sat, 09/30/2017 - 00:21 Comment #12

I've just logged into an arbitrary server from our pool and, guess what, it is of course broken, just waiting for Bind to be restarted and a customer to run into trouble. Here is the output of the requested file:

cat /etc/webmin/bind8/config
dnssectools_keydir=/var/named/dtkeys
auto_chroot=sh -c '. /etc/sysconfig/named && echo "$ROOTDIR"'
updserial_man=1
keygen=dnssec-keygen
checkconf=named-checkconf
tmpl_dnssec=0
updserial_def=0
pid_file=/var/run/named.pid /run/named.pid
named_conf=/etc/named.conf
restart_cmd=restart
relative_paths=0
rev_must=0
soa_start=0
records_order=0
reversezonefilename_format=ZONE.rev
no_pid_chroot=0
short_names=0
master_dir=/var/named
master_ttl=1
chroot=/var/named/chroot
allow_comments=0
no_chroot=0
force_random=0
dnssec_period=21
updserial_on=1
named_path=/usr/sbin/named
whois_cmd=whois
dnssectools_rollmgr_pidfile=/var/run/rollmgr.pid
ndc_cmd=ndc
allow_long=0
checkzone=named-checkzone
allow_wild=1
spf_record=0
show_list=1
rev_def=0
stop_cmd=service named stop
dnssectools_conf=/etc/dnssec-tools/dnssec-tools.conf
confirm_zone=1
forwardzonefilename_format=ZONE.hosts
by_view=0
tmpl_dnssec_dt=1
rndcconf_cmd=rndc-confgen
start_cmd=service named start
dnssectools_rollrec=/var/named/system.rollrec
rndc_conf=/etc/rndc.conf
signzone=dnssec-signzone
support_aaaa=1
ipv6_mode=1
slave_dir=/var/named/slaves
confirm_rec=0
soa_style=0
max_zones=50
largezones=0
rndc_cmd=rndc
allow_underscore=1
other_slaves=1
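The two dumps are long and list their keys in different orders, which makes them hard to compare by eye. A rough sketch for listing only the keys whose values differ, assuming both files have been saved locally as plain key=value lines. The function name diff_config_keys and the file names in the usage note are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: compare two key=value config files and print the keys
# of the second file that are new or carry a different value.
# Keys that exist only in the first file are not reported.
# $1 = config from the healthy server, $2 = config from the broken server
diff_config_keys() {
    awk -F= '
        # First file: remember every key and its value.
        NR == FNR { healthy[$1] = substr($0, length($1) + 2); next }
        # Second file: report keys that are new or whose value changed.
        {
            value = substr($0, length($1) + 2)
            if (!($1 in healthy)) {
                print $1 ": (not set) -> [" value "]"
            } else if (healthy[$1] != value) {
                print $1 ": [" healthy[$1] "] -> [" value "]"
            }
        }
    ' "$1" "$2"
}
```

Called as diff_config_keys healthy.config broken.config on the two dumps above (without the cat lines), it should report chroot, no_chroot, start_cmd and stop_cmd.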

Submitted by yngens on Sat, 09/30/2017 - 00:31 Comment #13

Guys, whatever you say, I do believe it is the same issue that was discussed at https://www.virtualmin.com/node/20420 several years ago.

The solution is exactly the same: change no_chroot=0 to no_chroot=1, change chroot=/var/named/chroot to an empty chroot=, and restart Bind.
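The manual change described here can be sketched as a small script. This is only an illustration of the workaround from this thread, not an official Webmin fix. The function name disable_webmin_chroot is made up, and the default path is the /etc/webmin/bind8/config file quoted above.

```shell
#!/bin/sh
# Hypothetical helper: reset the two chroot keys in a Webmin BIND module
# config file, keeping a backup copy next to it.
# $1 = config file (defaults to /etc/webmin/bind8/config)
disable_webmin_chroot() {
    config_file="${1:-/etc/webmin/bind8/config}"
    # Keep the untouched original as <file>.bak before rewriting.
    cp -p "$config_file" "$config_file.bak" || return 1
    # The patterns are anchored at line start, so auto_chroot= is left alone
    # and ^chroot= cannot match no_chroot=.
    sed -e 's|^no_chroot=.*|no_chroot=1|' \
        -e 's|^chroot=.*|chroot=|' \
        "$config_file.bak" > "$config_file"
}
```

After running it, Bind still has to be restarted, for example with systemctl restart named on a CentOS 7 system.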

Submitted by JamieCameron on Sat, 09/30/2017 - 12:32 Comment #14

Ok, so the real issue is the chroot=/var/named/chroot line - if that's being changed on upgrade, that's a bug.

What does /etc/sysconfig/named contain on your system?

Submitted by yngens on Sat, 09/30/2017 - 14:08 Comment #15

cat /etc/sysconfig/named
# BIND named process options
# ~~~~~~~~~~~~~~~~~~~~~~~~~~
#
# OPTIONS="whatever"     --  These additional options will be passed to named
#                            at startup. Don't add -t here, enable proper
#                            -chroot.service unit file.
#                            Use of parameter -c is not supported here. Extend
#                            systemd named*.service instead. For more
#                            information please read the following KB article:
#                            https://access.redhat.com/articles/2986001
#
# DISABLE_ZONE_CHECKING  --  By default, service file calls named-checkzone
#                            utility for every zone to ensure all zones are
#                            valid before named starts. If you set this option
#                            to 'yes' then service file doesn't perform those
#                            checks.
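The auto_chroot entry in both config dumps is a shell command that sources /etc/sysconfig/named and echoes $ROOTDIR. Because the file above contains nothing but comments, that command prints an empty line here. A small sketch of the same check, wrapped in a hypothetical function named named_rootdir so it can be pointed at any file:

```shell
#!/bin/sh
# Hypothetical helper: print the ROOTDIR value that a sysconfig file sets,
# mirroring the auto_chroot command from the Webmin config dumps.
# $1 = sysconfig file (defaults to /etc/sysconfig/named)
named_rootdir() {
    sysconfig_file="${1:-/etc/sysconfig/named}"
    # Subshell: variables set by the sourced file do not leak into the caller,
    # and a ROOTDIR inherited from the environment is ignored.
    (
        unset ROOTDIR
        . "$sysconfig_file" && echo "$ROOTDIR"
    )
}
```

An empty result means the file does not set a chroot directory. A non-empty result is the directory the file asks named to run under.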

Submitted by JamieCameron on Sun, 10/01/2017 - 20:01 Comment #16

Ok, so chroot definitely isn't in use by BIND.

Do you happen to recall which Webmin version you upgraded from and to which triggered this?

Submitted by andreychek on Sun, 10/01/2017 - 21:17 Comment #17

I sent Jamie an email about this, and we're going over the packages that were recently updated, amongst other things.

The recent yum.log file contents are available in Comment #7 above.

Submitted by yngens on Sun, 10/01/2017 - 21:54 Comment #18

"Do you happen to recall which Webmin version you upgraded from and to which triggered this?"

Unfortunately, I don't know which Webmin version the systems were upgraded from, but I guess it was just the one before the current one. The current versions are:

Webmin version 1.852 
Usermin version 1.720
Virtualmin version   6.00

The only thing that might play a role here is that CentOS 7.x on these systems was not pulled from the Virtualmin repositories (in other words, these systems are not Cloudmin-deployed guest systems), so the Virtualmin setup file was pulled directly from http://software.virtualmin.com/gpl/scripts/install.sh and not from a Cloudmin template.

Submitted by JamieCameron on Sun, 10/01/2017 - 23:51 Comment #19

This should be in your YUM log - if you can roughly correlate the updates logged there with when the problem happened, we can figure out which version caused it. I'm interested to know because I'm pretty sure this is fixed in the latest code - at least, I can't reproduce this problem on any CentOS 7 test systems.

Submitted by yngens on Mon, 10/02/2017 - 02:42 Comment #20

I already posted the yum log and elaborated on the dates. Read above, James.

If you fixed this after I reported the problem here, then of course you won't be able to reproduce it anymore. I would be really glad to see more frequent acknowledgements on Virtualmin issue pages. Unfortunately, there are tons of "I can't reproduce this" notices from your team when in fact you know perfectly well the reported problems were there. Improve the way you interact with your userbase to make your products better.

Submitted by andreychek on Mon, 10/02/2017 - 09:09 Comment #21

yngens, we're trying to help, and there's some really puzzling aspects to this.

You mentioned in your posts above that the issue occurred within 24 hours of posting (which would have been on September 28/29).

However, the updates there had been installed many days prior to that. In fact, the update to BIND, one of the possible culprits, had been installed a full two weeks prior to that.

That's unfortunately not a clear chain of events; it's difficult to blame this issue on updates that were performed two weeks earlier.

Further, there are thousands of Webmin and Virtualmin installations on CentOS that are not experiencing this issue, including our own.

I'm sorry if we've repeated a question, but if the answer here was simple, we would have resolved it already :-)

Can you confirm that my description of the timeline above is correct? That is, that this issue started happening on or around September 28/29?

And can you look in your yum logs and let us know when Webmin was last updated? That is, when was Webmin upgraded to 1.852?
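The excerpt posted in Comment #7 contains no entry for the webmin package itself, so the upgrade to 1.852 may be recorded further back or in a rotated log. A sketch for listing Webmin-related entries across several log files follows. Only wbm-virtualmin-awstats appears in this thread; the package names webmin and usermin, the generic wbm- prefix, and the /var/log/yum.log* glob in the usage note are assumptions. The function name webmin_updates is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: list "Updated:" and "Installed:" entries for Webmin,
# Usermin and wbm-* module packages from one or more yum log files.
# Arguments: the log files to search
webmin_updates() {
    # -h suppresses the file name prefix when several files are given.
    grep -hE ' (Updated|Installed): (webmin|usermin|wbm-[^ ]*)\.' "$@"
}
```

It could be called as webmin_updates /var/log/yum.log* to include rotated logs, if any exist on the server.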