Let's Encrypt DNS Problem

#1 Thu, 08/22/2019 - 01:41
paulzag

Something's changed after running flawlessly since late 2017.

I suspect it may have something to do with DNS replication: perhaps the _acme-challenge.zagz.com TXT record isn't correct when it is served from a slave server?

Requesting a certificate for zagz.com from Let's Encrypt ..
.. request failed : Web-based validation failed : Failed to request certificate :
zagz.com challenge did not pass: dns :: DNS problem: query timed out looking up A for zagz.com
DNS-based validation failed : Failed to request certificate :
zagz.com challenge did not pass: DNS problem: SERVFAIL looking up TXT for _acme-challenge.zagz.com
$ dig A zagz.com @ns1.zagz.com

; <<>> DiG 9.10.6 <<>> A zagz.com @ns1.zagz.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 46174
;; flags: qr aa rd; QUERY: 1, ANSWER: 1, AUTHORITY: 4, ADDITIONAL: 5
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;zagz.com. IN A

;; ANSWER SECTION:
zagz.com. 60 IN A 107.174.101.239

;; AUTHORITY SECTION:
zagz.com. 60 IN NS ns1.zagz.com.
zagz.com. 60 IN NS ns4.zagz.com.
zagz.com. 60 IN NS ns3.zagz.com.
zagz.com. 60 IN NS ns2.zagz.com.

;; ADDITIONAL SECTION:
ns1.zagz.com. 60 IN A 107.174.101.239
ns2.zagz.com. 60 IN A 165.22.165.151
ns3.zagz.com. 60 IN A 198.46.129.251
ns4.zagz.com. 60 IN A 107.172.94.45

;; Query time: 334 msec
;; SERVER: 107.174.101.239#53(107.174.101.239)
;; WHEN: Thu Aug 22 16:26:32 AEST 2019
;; MSG SIZE  rcvd: 189
$ dig TXT _acme-challenge.zagz.com

; <<>> DiG 9.10.6 <<>> TXT _acme-challenge.zagz.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 4787
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;_acme-challenge.zagz.com. IN TXT

;; ANSWER SECTION:
_acme-challenge.zagz.com. 5 IN TXT "S2nPWyirMrx--JvuhXii1JM9EuvvoD-u7QOcDAjFWLI"

;; Query time: 252 msec
;; SERVER: 192.168.15.1#53(192.168.15.1)
;; WHEN: Thu Aug 22 16:28:46 AEST 2019
;; MSG SIZE  rcvd: 109
# grep known /var/log/virtualmin/ravioli.zagz.com_access_log
107.174.101.239 - - [21/Aug/2019:03:50:56 -0400] "GET /.well-known/acme-challenge/tblBjtLhoSV9sgAgDcnDkrXSSvew-XltlGpQUZg-b5k HTTP/1.1" 200 301 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:04:04:12 -0400] "GET /.well-known/acme-challenge/lsQuJ9uAlCiu753KWMZ7z1yUEeoPb5StHpcjRweVSzo HTTP/1.1" 200 301 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:12:38:06 -0400] "GET /.well-known/acme-challenge/nyOpxT1zSPDWU-2TeJByhHMVe5S8rJhBPE6m2WB0bXI HTTP/1.1" 200 301 "-" "Python-urllib/2.7"
# grep known /var/log/virtualmin/zagz.com_access_log
107.174.101.239 - - [21/Aug/2019:09:13:07 -0400] "GET /.well-known/acme-challenge/BgbU5yy4wMjUDkK6cDrUozLJBWSbUN-TjfeV7_DfHys HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:10:18:06 -0400] "GET /.well-known/acme-challenge/02rl4WNhsAdC-3L7xcc0r9XYmyP46N6Eo6M6BVSSB5k HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:10:29:18 -0400] "GET /.well-known/acme-challenge/mqlVxiz9al_klCp4CKWHyoCRfqVpuZP70hwsxUSashU HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:11:23:09 -0400] "GET /.well-known/acme-challenge/TK_vUcZxlAQDdZxP8JCeMPv0_rxEM_x9_RBHldH9s_Q HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:12:28:05 -0400] "GET /.well-known/acme-challenge/wLX3z3ALnDzvg004Aaa8ZCy0UtJA6SgkLIdjSYyu0RM HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:13:33:09 -0400] "GET /.well-known/acme-challenge/5lDzC03JL20aZlN7ydxwezP7_vVjtoWnAeyHIQeAp_Q HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:14:38:12 -0400] "GET /.well-known/acme-challenge/Q0Co3x6tF08A0Y8e7nePNKwBw7iJwROMBgFsLAz5f0k HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:15:43:05 -0400] "GET /.well-known/acme-challenge/tbogdWxj3l5D4NmqNoyMIBWpyDDo-4ZbdVDl11Bi_ho HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:16:48:08 -0400] "GET /.well-known/acme-challenge/_NpRqw23FGg0tJuPmtZu_se0wKA6a9YSkAPZe5SeS-g HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:17:53:12 -0400] "GET /.well-known/acme-challenge/GWp8ShtTKQOJrf-UoxhIW_pFeFvLApYymELg-lEFK2Y HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:18:58:07 -0400] "GET /.well-known/acme-challenge/9jvK1GtJ9Bi7obYJn2FMquOlg6cQ-T4rwlSQJKeMCDY HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:20:03:10 -0400] "GET /.well-known/acme-challenge/U5SdlYLcY7a3xbiMXLkMJU8D97bVMgt6mZGbc9YXKaI HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:21:08:05 -0400] "GET /.well-known/acme-challenge/_DdlEEvXaE77hdrkjxRwYlnnCaVHCzP4S_tt-FD1ktk HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:22:13:04 -0400] "GET /.well-known/acme-challenge/4Fe7jWSPFXBbIHOLAEPdOP4TR_ToGPqp_NK3e9weWOs HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [21/Aug/2019:23:18:08 -0400] "GET /.well-known/acme-challenge/RB7G6vfkF_2lvK4AK6E0BI0BDlOBSbTnxRbmhFvlJkw HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [22/Aug/2019:00:23:11 -0400] "GET /.well-known/acme-challenge/S6nS7Bu8N1EJQzrWkSOfF5uD-K055GH2bxG2vQdz-kk HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [22/Aug/2019:01:28:14 -0400] "GET /.well-known/acme-challenge/peqEIIPj6BOZ5ngShkDWXqcPTHJLtIpmX-u0DEFCEP4 HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [22/Aug/2019:02:33:13 -0400] "GET /.well-known/acme-challenge/um4NOCP5_iLA8WWI2LRFIKs4AsqIIh6Nu2FfapcbCAE HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [22/Aug/2019:03:38:08 -0400] "GET /.well-known/acme-challenge/xCNeRMGuyNHYiQh1JAq9jwgV09SiYvmP3NMM5udyF4Q HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
59.167.220.74 - - [22/Aug/2019:04:06:47 -0400] "GET /.well-known/acme-challenge/ HTTP/1.1" 301 524 "-" "Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Firefox/60.0"
107.174.101.239 - - [22/Aug/2019:04:43:10 -0400] "GET /.well-known/acme-challenge/WnP13ElasR15raDdqSNYuqdBbvoMAh0Orn9nKeQD3_c HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
107.174.101.239 - - [22/Aug/2019:05:48:14 -0400] "GET /.well-known/acme-challenge/-Ht74gCsLTw2PWgM_rUaq4WeZfaiE-tJPW1Ik9qxYfY HTTP/1.1" 301 573 "-" "Python-urllib/2.7"
$ dig TXT _acme-challenge.zagz.com @ns1.zagz.com

(and all the way to @ns4) works.
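Checking each nameserver by hand can be scripted. The following is only a rough sketch, assuming dig is installed and that the four nameserver names from the output above are the complete authoritative set. The helper names (query_txt, find_disagreements, report) are made up for illustration.

```python
import subprocess

# Nameservers taken from the AUTHORITY section of the dig output in this post.
NAMESERVERS = ["ns1.zagz.com", "ns2.zagz.com", "ns3.zagz.com", "ns4.zagz.com"]


def query_txt(name, server, timeout=5):
    """Ask one authoritative server directly for the TXT records of `name`."""
    out = subprocess.run(
        ["dig", "+short", f"+time={timeout}", "+tries=1", "TXT", name, f"@{server}"],
        capture_output=True, text=True, check=False,
    ).stdout
    values = []
    for line in out.splitlines():
        line = line.strip()
        # Skip blank lines and dig's own ";;" diagnostics (e.g. timeouts).
        if line and not line.startswith(";"):
            values.append(line.strip('"'))
    return tuple(sorted(values))


def find_disagreements(answers):
    """Given {server: tuple_of_txt_values}, return the servers whose answer
    differs from the first server's answer (used as the reference)."""
    servers = list(answers)
    if not servers:
        return []
    reference = answers[servers[0]]
    return [s for s in servers[1:] if answers[s] != reference]


def report(name="_acme-challenge.zagz.com"):
    """Query every nameserver and print which ones disagree with the first."""
    answers = {ns: query_txt(name, ns) for ns in NAMESERVERS}
    for ns, values in answers.items():
        print(ns, values or "(no answer)")
    print("out of sync:", find_disagreements(answers) or "none")
```

Calling report() while a renewal is in progress would show whether a slave is still serving an old challenge token. An empty answer from one server also counts as a disagreement.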

DNS checks at https://mxtoolbox.com all pass.

zagz.com is hosted on 107.174.101.239, which also has a separate virtual server called ravioli.zagz.com so that it can apply for its own Let's Encrypt SSL certificate, letting non-zagz.com domains have SSL mail.

Both renewing and requesting a certificate are now also broken for the ravioli.zagz.com virtual server. Other domains on this server occasionally fail to renew for a few hours but normally come good automatically.

It seems that http://zagz.com/.well-known/acme-challenge/ is redirected to HTTPS by my WordPress .htaccess and by something else as well.
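To see at a glance how each virtual server answers the challenge requests, the access log lines can be tallied by HTTP status. This is a small sketch that assumes the Apache combined log format shown in the grep output above. The function name challenge_statuses is hypothetical.

```python
import re
from collections import Counter

# Picks the request path and status code out of a combined-format log line.
LOG_RE = re.compile(r'"GET (?P<path>\S+) HTTP/[\d.]+" (?P<status>\d{3}) ')

CHALLENGE_PREFIX = "/.well-known/acme-challenge/"


def challenge_statuses(lines):
    """Count HTTP status codes for requests under the ACME challenge path."""
    counts = Counter()
    for line in lines:
        match = LOG_RE.search(line)
        if match and match.group("path").startswith(CHALLENGE_PREFIX):
            counts[int(match.group("status"))] += 1
    return counts


# Two lines copied from the logs above: one from each virtual server.
SAMPLE = [
    '107.174.101.239 - - [21/Aug/2019:03:50:56 -0400] "GET /.well-known/acme-challenge/tblBjtLhoSV9sgAgDcnDkrXSSvew-XltlGpQUZg-b5k HTTP/1.1" 200 301 "-" "Python-urllib/2.7"',
    '107.174.101.239 - - [21/Aug/2019:09:13:07 -0400] "GET /.well-known/acme-challenge/BgbU5yy4wMjUDkK6cDrUozLJBWSbUN-TjfeV7_DfHys HTTP/1.1" 301 573 "-" "Python-urllib/2.7"',
]

for status, count in sorted(challenge_statuses(SAMPLE).items()):
    print(status, count)
```

For the two sample lines this counts one 200 and one 301. As far as I know, Let's Encrypt's HTTP validation follows redirects, so a 301 to HTTPS is not necessarily the cause of a failure on its own.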

From ravioli.zagz.com Virtualmin > Server Configuration > SSL Certificate

Last successful renewal 06/23/2019 2:34 AM
Last failed renewal 08/22/2019 2:38 AM
Renewal failed due to Web-based validation failed : Failed to request certificate :
ravioli.zagz.com challenge did not pass: dns :: DNS problem: SERVFAIL looking up A for ravioli.zagz.com
Thu, 08/22/2019 - 21:24
paulzag

Fixed it.

For others with a similar problem...

I installed certbot for some added visibility to the process.

That led me to check https://letsdebug.net to get details of the problem. It returned

Test result for zagz.com using http-01
DNSLookupFailed
FATAL
A fatal issue occurred during the DNS lookup process for zagz.com/CAA.
DNS response for zagz.com had fatal DNSSEC issues: validation failure <zagz.com. CAA IN>: No DNSKEY record from 165.22.165.151 for key zagz.com. while building chain of trust
DNSLookupFailed
FATAL
A fatal issue occurred during the DNS lookup process for zagz.com/A.
DNS response for zagz.com had fatal DNSSEC issues: validation failure <zagz.com. A IN>: No DNSKEY record from 198.46.129.251 for key zagz.com. while building chain of trust
DNSLookupFailed
FATAL
A fatal issue occurred during the DNS lookup process for zagz.com/AAAA.
DNS response for zagz.com had fatal DNSSEC issues: validation failure <zagz.com. AAAA IN>: No DNSKEY record from 107.172.94.45 for key zagz.com. while building chain of trust
NoRecords
FATAL
No valid A or AAAA records could be ultimately resolved for zagz.com. This means that Let's Encrypt would not be able to to connect to your domain to perform HTTP validation, since it would not know where to connect to.
No A or AAAA records found.

That led me to look at DNSSEC. For now I deleted the DNSSEC key on my Virtualmin virtual server AND the DNSSEC DS entries at my registrar.

Let's Encrypt then worked like a charm. Maybe my DNSSEC key got changed in all the recent fiddling. So the next step is to re-enable DNSSEC and then renew the certificate.
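Before re-enabling DNSSEC, it may help to check that the DS entries at the registrar and the DNSKEY records on every nameserver agree that the zone is signed. The sketch below assumes dig is installed. It only checks that the records exist and does not compare key tags or digests. The names dig_short and classify_chain are invented for this example.

```python
import subprocess


def dig_short(rtype, name, server=None):
    """Return the non-empty answer lines of `dig +short` for one record type."""
    cmd = ["dig", "+short", "+time=5", "+tries=1", rtype, name]
    if server:
        # Ask the authoritative server directly, without recursion.
        cmd += ["+norec", f"@{server}"]
    out = subprocess.run(cmd, capture_output=True, text=True, check=False).stdout
    return [l.strip() for l in out.splitlines() if l.strip() and not l.startswith(";")]


def classify_chain(ds_records, dnskey_by_server):
    """Rough classification of the DS/DNSKEY situation.

    ds_records:        DS lines published by the parent zone (via the registrar)
    dnskey_by_server:  {nameserver: list of DNSKEY lines it returned}
    """
    if not ds_records:
        return "unsigned: no DS at the parent, so resolvers will not validate"
    missing = sorted(s for s, keys in dnskey_by_server.items() if not keys)
    if missing:
        return "broken: DS exists but no DNSKEY from " + ", ".join(missing)
    return "signed: DS present and every server returns a DNSKEY"


def check(zone, servers):
    """Fetch the records with dig and print the classification."""
    ds = dig_short("DS", zone)
    keys = {s: dig_short("DNSKEY", zone, s) for s in servers}
    print(classify_chain(ds, keys))
```

Something like check("zagz.com", ["ns1.zagz.com", "ns2.zagz.com", "ns3.zagz.com", "ns4.zagz.com"]) would then report the "broken" case that letsdebug.net described, where a DS record exists but a nameserver returns no DNSKEY.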

Fri, 09/06/2019 - 05:07
Iam-TJ

This is a MAJOR bug with the interaction of the lets-encrypt scripts and the DNS zone file which I've been witnessing recently. It affects a scenario where a domain (call it "example.com") is using Let's Encrypt for X509 certificates and uses BIND DNS server with the domain configured to use DNSSEC. The problem is that the certificate renewal fails due to a DNS "SERVFAIL" reported in "/var/log/letsencrypt/letsencrypt.log" which is in turn caused by a bad signature on the zone's SOA record.

I'm currently in a situation where two domains which should have been renewed by August 31st are still failing to renew due to this. Approximately every 65 minutes Virtualmin retries the renewal and fails again.

Every time Virtualmin tries to renew the X509 certificate it tampers with the DNS zone file (where from I've not pinpointed yet) such that the zone's SOA serial number is incremented BUT the changed SOA RR is not re-signed for DNSSEC. Consequently DNSSEC verification fails when Let's Encrypt starts the renewal process and it aborts. Various 3rd party DNSSEC checking tools report this breakage (e.g. http://dnsviz.net/ ).

My situation is further exacerbated because the server's BIND instance acts as the master/SOA for 6 slave servers (in this case the Linode public DNS servers). TTLs and notifies then cause the entire zone to get out of sync for a while. With Virtualmin re-running the renewal attempt every 65 minutes and changing the SOA serial number each time, the slave servers never get a chance to 'catch up' with the master.

After Virtualmin has tampered with the zone file by updating the SOA, I've been able to prove this manually using:
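One way to watch the slaves falling behind is to compare the SOA serial on the master with the serial on each slave. This is a minimal sketch, assuming dig is available. The function names are hypothetical, and it only tests for equality rather than using proper serial-number arithmetic.

```python
import subprocess


def soa_serial(dig_short_soa):
    """Extract the serial from one line of `dig +short SOA` output, which has
    the form: mname rname serial refresh retry expire minimum."""
    fields = dig_short_soa.split()
    if len(fields) < 7:
        raise ValueError(f"not an SOA answer: {dig_short_soa!r}")
    return int(fields[2])


def out_of_sync_slaves(master_serial, slave_serials):
    """Return the names of slaves whose serial differs from the master's."""
    return sorted(name for name, serial in slave_serials.items()
                  if serial != master_serial)


def fetch_serial(zone, server):
    """Ask one server directly for the zone's SOA and return its serial."""
    out = subprocess.run(
        ["dig", "+short", "+norec", "+time=5", "+tries=1", "SOA", zone, f"@{server}"],
        capture_output=True, text=True, check=False,
    ).stdout
    lines = [l for l in out.splitlines() if l.strip() and not l.startswith(";")]
    if not lines:
        raise RuntimeError(f"no SOA answer from {server}")
    return soa_serial(lines[0])
```

Running fetch_serial against the master and each slave shortly after a failed renewal, then passing the results to out_of_sync_slaves, would show which slaves have not yet transferred the newest zone.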

dnssec-verify -o example.com /etc/bind/example.com.hosts
Loading zone 'example.com' from file 'example.com.hosts'
Verifying the zone using the following algorithms: ECDSAP256SHA256.
No correct ECDSAP256SHA256 signature for example.com SOA
The zone is not fully signed for the following algorithms: ECDSAP256SHA256.
dnssec-verify: fatal: DNSSEC completeness test failed.


I can manually fix it with:

dnssec-signzone -o example.com /etc/bind/example.com.hosts
mv /etc/bind/example.com.hosts{.signed,}
rndc reload
rndc notify example.com


But if Virtualmin X509 certificate Let's Encrypt is set to auto-renew it kicks in again within 65 minutes and breaks it again.
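Until the root cause is found, the verify-then-re-sign steps from this post could be wrapped in a script and run after each renewal attempt. The sketch below assumes the BIND tools are on the PATH and simply reuses the commands quoted above. It prints the fix rather than running it, since overwriting a zone file automatically is risky. The function names are my own.

```python
import subprocess

# Phrases that appear in the failing dnssec-verify output quoted in this post.
FAILURE_MARKERS = ("No correct", "not fully signed", "fatal:")


def zone_is_fully_signed(verify_output):
    """True if none of the known failure phrases occur in the tool output."""
    return not any(marker in verify_output for marker in FAILURE_MARKERS)


def resign_commands(zone, zone_file):
    """The manual fix from this post, as a list of argument vectors."""
    return [
        ["dnssec-signzone", "-o", zone, zone_file],
        ["mv", zone_file + ".signed", zone_file],
        ["rndc", "reload"],
        ["rndc", "notify", zone],
    ]


def check_zone(zone, zone_file):
    """Run dnssec-verify and print the re-sign commands if the zone is broken."""
    result = subprocess.run(
        ["dnssec-verify", "-o", zone, zone_file],
        capture_output=True, text=True, check=False,
    )
    output = result.stdout + result.stderr
    if result.returncode == 0 and zone_is_fully_signed(output):
        print(f"{zone}: signatures look complete")
        return True
    print(f"{zone}: verification failed, suggested fix:")
    for cmd in resign_commands(zone, zone_file):
        print("  " + " ".join(cmd))
    return False
```

For example, check_zone("example.com", "/etc/bind/example.com.hosts") would report the SOA signature failure shown above and list the four commands to run by hand.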

I assume what is happening is the lets-encrypt scripts are re-adding the autoconfig and autodiscover hosts to the zone file and incrementing the SOA serial number... but why it then fails to correctly sign the SOA (the added hosts are signed correctly) I have no idea!

The server OS is Ubuntu 18.04.
