Certificate validation not working?

I set up a fly app many months ago with a custom domain and certificate. When I set it up, I used only A and AAAA records, no CNAME. When I set up the app, I did successfully issue the certificate.

For some reason, the certificate now doesn’t want to validate, despite the configuration still being correct.

It looks like we’re having problems connecting to your DNS provider for this hostname. We’re trying to figure out what’s up.

My DNS is also hosted on Fly. Not sure if that helps?

Ah yeah, that might be related. I’m not sure our workers are talking to UDP services hosted on Fly apps properly.

Your DNS server isn’t serving DNS over TCP. If you set that up, it might get you going while we figure out the UDP issue.

We went ahead and manually issued the certs for now while we get the UDP issue diagnosed.

Did you ever find a diagnosis for this issue? It looks like my certs are not automatically being reissued, and some will expire today.

FYI, I did set up my DNS server to serve DNS over TCP (earlier today). So far, that does not seem to have had any effect on certificate validation.

Can you check if your server is returning SERVFAIL sometimes? Perhaps this is an error with our own resolver, but that’s what we’re getting querying your hostname recursively, starting at the root DNS servers.

# dig +trace <redacted>.ch AAAA

; <<>> DiG 9.16.33-Debian <<>> +trace <redacted>.ch AAAA
;; global options: +cmd
.                       34277   IN      NS      a.root-servers.net.
.                       34277   IN      NS      b.root-servers.net.
.                       34277   IN      NS      c.root-servers.net.
.                       34277   IN      NS      d.root-servers.net.
.                       34277   IN      NS      e.root-servers.net.
.                       34277   IN      NS      f.root-servers.net.
.                       34277   IN      NS      g.root-servers.net.
.                       34277   IN      NS      h.root-servers.net.
.                       34277   IN      NS      i.root-servers.net.
.                       34277   IN      NS      j.root-servers.net.
.                       34277   IN      NS      k.root-servers.net.
.                       34277   IN      NS      l.root-servers.net.
.                       34277   IN      NS      m.root-servers.net.
.                       34277   IN      RRSIG   NS 8 0 518400 20221117170000 20221104160000 18733 . 21lYxNI1lClONDUORE/X5gNslhAZ4ptVNgAZsLwIGTMCAfm4fwk6XGTf YIgSmHKAicLlaV3yCO4sj90QZ/8P9D39GBcucFGNEbZ/Uo6ebnuaC83I bxoDsXfpMdJ1fUXn1V6DdbzIWLfwWP0r6wCkaoSiW0NlI+CbxEXzMtBn Q+aJVvfy7QmzVGvWAphcUbSov6hhZaVIZj+QOuf6ZEB0NFE8XQDopvGO ap/FTGAXA7fC+OP36m73jA9KEsVPvcdFUIDvDWtMD0+H7F9kZpOe/5YM 5sa6iiru+GL17y0SR+ydLtLwgg+ZZr95pxqzl66RLUo/FmQcg2vk+9nq fMbUJg==
;; Received 525 bytes from fdaa::3#53(fdaa::3) in 4 ms

ch.                     172800  IN      NS      a.nic.ch.
ch.                     172800  IN      NS      b.nic.ch.
ch.                     172800  IN      NS      d.nic.ch.
ch.                     172800  IN      NS      e.nic.ch.
ch.                     172800  IN      NS      f.nic.ch.
ch.                     86400   IN      DS      11648 13 2 761408E4182706F4DAED906F81B5B1677FE1752C2B0794FF3F262FA1 EF760519
ch.                     86400   IN      RRSIG   DS 8 1 86400 20221117170000 20221104160000 18733 . GeRiQwUHybHABgfi0vZol/AO/EC3+pNiBpM4sV0CcinxuP5Fx3hghKxL FSpf2zRHgRCXRA7l7zIU3AjAYr4xwKmdsnb4xTo3K7BcMpRgvweCGD2x EDWLV4j0aYY1ei4pQUjqEMv4E/bl2O989f0KQdMMce6/k6TNbP98aIkY PeR1G4zwbEpWO1fov92CV7w0eAbvqeoiLHuqKQEZKgejEY8/DyZYBzFP DUdNGzlXiRo07o8skVTzEwsHhU6TjWpNgkc7C/m8pfm3q47BCD4tr94z Hla4BqeIlVCJJj1QrUgyAJgKl6pypumDqWTJ6jULZX3ez/pX+zjx/JSQ 172zSA==
;; Received 682 bytes from 199.7.83.42#53(l.root-servers.net) in 16 ms

<redacted>.ch.         3600    IN      NS      ns2.<redacted>.ch.
<redacted>.ch.         3600    IN      NS      ns1.<redacted>.ch.
ND3E0CGF5OGC08781SOJKFRIOOGBGC7E.ch. 900 IN NSEC3 1 1 0 - ND3FVIJ6P2HTHSRMNP8BLKRQO4274IF6 NS SOA RRSIG DNSKEY NSEC3PARAM
ND3E0CGF5OGC08781SOJKFRIOOGBGC7E.ch. 900 IN RRSIG NSEC3 13 2 900 20221129065017 20221030060220 56913 ch. OdV8fMi68wGsAC72L7tsngzfYigAPsOtGKYsb5GDwPZJSI06utKnvFLW ZYOj+njCTxd3c4Fs6W6+iuwKVXK1Xg==
6CN7V4JSRUFTG3SC0DRMUQEDGN0N9LKA.ch. 900 IN NSEC3 1 1 0 - 6CN90GAS8UUQJ87UGAQIGO6DBBT761SK NS DS RRSIG
6CN7V4JSRUFTG3SC0DRMUQEDGN0N9LKA.ch. 900 IN RRSIG NSEC3 13 2 900 20221124063400 20221025113001 56913 ch. Ck+KCiSGNk1wKK7frDfqhBIoJI3pWDJkFlu81jACivXxeOZyOXgBjCkJ gndKg1bHHKFL26eDdp6d8Vv92Q0VcQ==
couldn't get address for 'ns2.<redacted>.ch': failure
couldn't get address for 'ns1.<redacted>.ch': failure
dig: couldn't get address for 'ns2.<redacted>.ch': no more

Thank you, this was an important insight! I was able to reproduce this behaviour from inside of a fly instance, so I am seeing the same thing as you.

I suspect that your DNS servers are also in fly instances (or with networking which is similar enough to a fly instance), which is why your DNS servers are having the same issue as a fly instance.

I’m trying to diagnose the problem, but I don’t have much insight into your infrastructure, so I’m going to throw some stuff out and hope that we can help each other diagnose this issue.

From what I can tell, the servfail is coming from your internal DNS server at fdaa::3. A tcpdump on port 53 (inside of a fly instance) shows:

15:50:31.548240 IP6 fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148 > fdaa::3.53: 43605+ A? ns2.<redacted>.ch. (37)
15:50:31.548260 IP6 fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148 > fdaa::3.53: 42580+ AAAA? ns2.<redacted>.ch. (37)
15:50:31.548380 IP6 fdaa::3.53 > fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148: 42580 ServFail 0/0/0 (37)
15:50:31.548380 IP6 fdaa::3.53 > fdaa:0:2aad:a7b:a992:6c7f:ecd3:2.57148: 43605 ServFail 0/0/0 (37)

I wonder why your DNS server is not able to serve records for my domain, but it can for others (e.g. google.com). I suspect that there is some connectivity issue between your nameservers and mine.
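For anyone who wants to check for SERVFAIL without reading tcpdump output, here is a minimal sketch that decodes the fixed 12-byte DNS header described in RFC 1035. The function name `parse_header` and the sample flag bits are my own assumptions for illustration; only the query ID, the ServFail status, and the 0/0/0 record counts come from the capture above.

```python
import struct

# Names for the response codes defined in RFC 1035 (values 0-5).
RCODES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte header of a raw DNS message."""
    if len(message) < 12:
        raise ValueError("DNS message is shorter than the 12-byte header")
    ident, flags, _qd, an, ns, ar = struct.unpack("!6H", message[:12])
    rcode = flags & 0x000F  # the low four bits of the flags word hold the RCODE
    return {
        "id": ident,
        "is_response": bool(flags & 0x8000),  # QR bit
        "rcode": RCODES.get(rcode, str(rcode)),
        "counts": (an, ns, ar),  # answer / authority / additional
    }


# Hypothetical header resembling the capture line "42580 ServFail 0/0/0".
# The flag bits (QR=1, RD=1, RA=1, RCODE=2 -> 0x8182) are assumed, not captured.
sample = struct.pack("!6H", 42580, 0x8182, 1, 0, 0, 0)
print(parse_header(sample))
```

An RCODE of 2 with all three record counts at zero matches what the resolver at fdaa::3 returned for my nameserver names.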

As I mentioned before, my DNS server is hosted on fly. According to this, UDP won’t work over IPv6 on fly, so that could be a partial explanation.

What I’ve also been able to find out is that I also don’t have IPv4 UDP connectivity between two fly instances:

Pinging my dns IP from my local machine:

> ping -c 2 213.188.208.5
PING 213.188.208.5 (213.188.208.5): 56 data bytes
64 bytes from 213.188.208.5: icmp_seq=0 ttl=52 time=24.915 ms
64 bytes from 213.188.208.5: icmp_seq=1 ttl=52 time=21.725 ms

--- 213.188.208.5 ping statistics ---
2 packets transmitted, 2 packets received, 0.0% packet loss
round-trip min/avg/max/stddev = 21.725/23.320/24.915/1.595 ms

Pinging my dns IP from inside a fly instance:

# in a fly instance
> ping -c 2 213.188.208.5
PING 213.188.208.5 (213.188.208.5) 56(84) bytes of data.

--- 213.188.208.5 ping statistics ---
2 packets transmitted, 0 received, 100% packet loss, time 1017ms

Similarly, an IPv4 DNS query from my machine works, but doesn’t work on fly:

From my machine:

> dig @213.188.208.5 <redacted>.ch NS

; <<>> DiG 9.10.6 <<>> @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42512
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 5
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;<redacted>.ch.		IN	NS

;; ANSWER SECTION:
<redacted>.ch.	86400	IN	NS	ns1.<redacted>.ch.
<redacted>.ch.	86400	IN	NS	ns2.<redacted>.ch.

;; ADDITIONAL SECTION:
ns1.<redacted>.ch.	86400	IN	A	213.188.208.5
ns1.<redacted>.ch.	86400	IN	AAAA	2a09:8280:1:9f8:b4a:f460:bfdb:21a4
ns2.<redacted>.ch.	86400	IN	A	213.188.209.57
ns2.<redacted>.ch.	86400	IN	AAAA	2a09:8280:1:a1f3:a957:bd27:d3a2:d9b5

;; Query time: 100 msec
;; SERVER: 213.188.208.5#53(213.188.208.5)
;; WHEN: Sat Nov 05 17:03:47 CET 2022
;; MSG SIZE  rcvd: 304

From fly instance:

dig @213.188.208.5 <redacted>.ch NS

; <<>> DiG 9.16.1-Ubuntu <<>> @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; connection timed out; no servers could be reached

But, the query from the fly instance does work when using tcp:

dig +tcp @213.188.208.5 <redacted>.ch NS

; <<>> DiG 9.16.1-Ubuntu <<>> +tcp @213.188.208.5 <redacted>.ch NS
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 23850
;; flags: qr aa rd; QUERY: 1, ANSWER: 2, AUTHORITY: 0, ADDITIONAL: 5
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 8af4501be56f92b3 (echoed)
;; QUESTION SECTION:
;<redacted>.ch.		IN	NS

;; ANSWER SECTION:
<redacted>.ch.	86400	IN	NS	ns1.<redacted>.ch.
<redacted>.ch.	86400	IN	NS	ns2.<redacted>.ch.

;; ADDITIONAL SECTION:
ns1.<redacted>.ch.	86400	IN	A	213.188.208.5
ns1.<redacted>.ch.	86400	IN	AAAA	2a09:8280:1:9f8:b4a:f460:bfdb:21a4
ns2.<redacted>.ch.	86400	IN	A	213.188.209.57
ns2.<redacted>.ch.	86400	IN	AAAA	2a09:8280:1:a1f3:a957:bd27:d3a2:d9b5

;; Query time: 171 msec
;; SERVER: 213.188.208.5#53(213.188.208.5)
;; WHEN: Sat Nov 05 16:01:19 UTC 2022
;; MSG SIZE  rcvd: 316
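The UDP versus TCP comparison above can also be scripted. This is a rough sketch using only the Python standard library, not a replacement for dig: it sends the same NS query over both transports and reports which one answers. The helper names (`build_query`, `query_udp`, `query_tcp`, `probe`) and the 3 second timeout are my own choices.

```python
import socket
import struct


def build_query(name: str, qtype: int = 2, ident: int = 0x1234) -> bytes:
    """Build a minimal DNS query (qtype 2 = NS, class IN, no recursion requested)."""
    header = struct.pack("!6H", ident, 0x0000, 1, 0, 0, 0)
    labels = b"".join(
        bytes([len(part)]) + part.encode("ascii")
        for part in name.rstrip(".").split(".")
    )
    return header + labels + b"\x00" + struct.pack("!2H", qtype, 1)


def query_udp(server: str, message: bytes, timeout: float = 3.0) -> bytes:
    """Send the query in a single UDP datagram to port 53 and wait for one reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(message, (server, 53))
        data, _ = sock.recvfrom(4096)
        return data


def recv_exact(sock: socket.socket, count: int) -> bytes:
    """Read exactly `count` bytes from a stream socket."""
    buf = b""
    while len(buf) < count:
        chunk = sock.recv(count - len(buf))
        if not chunk:
            raise ConnectionError("connection closed mid-message")
        buf += chunk
    return buf


def query_tcp(server: str, message: bytes, timeout: float = 3.0) -> bytes:
    """Send the query over TCP, where each message carries a 2-byte length prefix."""
    with socket.create_connection((server, 53), timeout=timeout) as sock:
        sock.sendall(struct.pack("!H", len(message)) + message)
        (length,) = struct.unpack("!H", recv_exact(sock, 2))
        return recv_exact(sock, length)


def probe(server: str, name: str) -> dict:
    """Report, per transport, whether the server answered the NS query."""
    message = build_query(name)
    results = {}
    for label, send in (("udp", query_udp), ("tcp", query_tcp)):
        try:
            results[label] = f"answered ({len(send(server, message))} bytes)"
        except OSError as exc:  # timeouts and refused connections land here
            results[label] = f"failed ({exc.__class__.__name__})"
    return results
```

Calling `probe("213.188.208.5", zone)` with your own zone name from inside a fly instance should, if the behaviour is the same as in the dig runs above, report a failure for "udp" and an answer for "tcp".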

Based on that analysis, I have the following questions:

  • Why is there no IPv4 UDP connectivity between fly instances?
  • Do you have an idea how long it will be until you have IPv6 UDP?
  • Is it possible for your DNS server to fall back to DNS over TCP on IPv4? Maybe it already does this, but there’s some other misconfiguration on my side causing problems…
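To make the last question concrete, this is roughly the fallback behaviour I have in mind. It is only a sketch with stub transports standing in for real sockets, and it says nothing about how Fly's resolver is actually implemented; `resolve_with_fallback` and the stub functions are hypothetical names. Note that standard resolvers normally retry over TCP when a UDP reply is truncated, whereas this sketch falls back on any transport error, including a timeout.

```python
from typing import Callable, List, Sequence, Tuple

# A transport takes (server, message) and returns the raw reply bytes.
Transport = Callable[[str, bytes], bytes]


def resolve_with_fallback(
    server: str,
    message: bytes,
    transports: Sequence[Tuple[str, Transport]],
) -> Tuple[str, bytes]:
    """Try each transport in order and return (name, reply) from the first that answers."""
    errors: List[str] = []
    for name, send in transports:
        try:
            return name, send(server, message)
        except OSError as exc:  # TimeoutError and ConnectionError are OSError subclasses
            errors.append(f"{name}: {exc.__class__.__name__}")
    raise OSError("all transports failed: " + ", ".join(errors))


# Stub transports that mimic what I see from inside a fly instance.
def udp_times_out(server: str, message: bytes) -> bytes:
    raise TimeoutError("no UDP reply")


def tcp_answers(server: str, message: bytes) -> bytes:
    return b"reply-over-tcp"


used, reply = resolve_with_fallback(
    "213.188.208.5",
    b"query",
    [("udp", udp_times_out), ("tcp", tcp_answers)],
)
print(used, reply)
```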

Good troubleshooting. The UDP issues between apps make sense. Our DNS resolver runs inside one of our apps.

The best I can do on a weekend like this is to manually create the authorization. This will give you a good certificate for ~3 months, which gives us some time to figure out the underlying issues.

Unfortunately, most DNS forwarding resolvers will use UDP even if the original request is TCP. That’s because authoritative servers usually only accept UDP.

I don’t pay Fly enough money to expect any support services on the weekend. Plus it’s my blog, which nobody (except me) cares about anyway :sweat_smile: I would appreciate it if you could follow up sometime during the week though.

I wanted to follow up on this issue. Am I using fly wrong, or is it expected that I don’t have UDP connectivity between fly instances?

There’s a network constraint breaking reachability between Fly apps over UDP.

We have a solution: we created an IP range that is routed differently. We could assign an IP from it to your app and everything should work again.

I’ll have more details for you today.

Amazing, that sounds great! Looking forward to hearing more.

Ok, I’ve assigned your DNS app one of these IPs. You can view it via fly ips list -a your-dns-app-name

You should delete all other entries (including IPv6; IPv6 over UDP is not supported anyway) so we only pick up that newly assigned IPv4 address.

Thank you! I’ve reconfigured my DNS servers, and everything’s working perfectly. :tada: