Around 10:15 UTC this morning I started getting errors that caused failed requests: a few minutes of 500 errors. The logs showed `EAI_AGAIN` from a Node.js `fetch` call, so the app was trying to make an HTTPS request and it failed.
Googling that error, I found:

> EAI_AGAIN appears to be a system error in the context of a failed DNS resolution. That rather points to whatever system resolver you have configured on your system.
Which got me thinking: what DNS resolver is configured on Fly apps? How do they look up DNS records? Presumably the host server handles it?
Were there any issues with that service in lhr today?
I haven’t changed any DNS entries or updated the app. It just stopped working, then started again, presumably once DNS could resolve again.
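For anyone else hitting this, here is a minimal sketch (TypeScript, since the app is Node.js) of treating `EAI_AGAIN` as transient and retrying. It assumes Node 18+'s built-in `fetch`, which rejects with a `TypeError` whose `cause` carries the `getaddrinfo` error code. The helpers `isTransientDnsError` and `withDnsRetry`, the retry count and the backoff are hypothetical choices, not something Fly or Node provides.

```typescript
// Minimal sketch: retry an async operation (e.g. a fetch) when DNS resolution
// fails transiently with EAI_AGAIN. Assumes Node 18+'s built-in fetch, which
// rejects with a TypeError whose `cause` carries the getaddrinfo error code.
// Helper names, retry count and backoff are hypothetical choices.

// Reads a `code` property from an unknown value, if it has one.
function errorCode(e: unknown): unknown {
  return typeof e === "object" && e !== null ? (e as { code?: unknown }).code : undefined;
}

// True if the error itself, or its `cause`, is a temporary DNS lookup failure.
function isTransientDnsError(err: unknown): boolean {
  const cause =
    typeof err === "object" && err !== null ? (err as { cause?: unknown }).cause : undefined;
  return errorCode(err) === "EAI_AGAIN" || errorCode(cause) === "EAI_AGAIN";
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Runs `op`, retrying up to `retries` extra times with linear backoff, but only
// when the failure is EAI_AGAIN; any other error is rethrown immediately.
async function withDnsRetry<T>(op: () => Promise<T>, retries = 3, backoffMs = 200): Promise<T> {
  let attempt = 0;
  while (true) {
    try {
      return await op();
    } catch (err) {
      if (attempt >= retries || !isTransientDnsError(err)) throw err;
      attempt++;
      await sleep(backoffMs * attempt);
    }
  }
}
```

In the app this would wrap the existing call, e.g. `await withDnsRetry(() => fetch(url))`. Retrying only smooths over short resolver blips like the few minutes described here; it won't help if DNS stays down.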
The /etc/resolv.conf inside instances points to our private network DNS server. Queries have to go through it so that private DNS names resolve without further configuration. Requests for anything outside the private network are proxied to Google’s DNS servers.
As a matter of fact, there’s this NetActuate System Status notice from one of our service providers. I don’t think it should cause EAI_AGAIN, or anything more than increased latency, though.
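To check which nameservers an instance's system resolver is actually using, here is a minimal sketch (TypeScript) that pulls the `nameserver` entries out of `/etc/resolv.conf` contents. On Linux, glibc's `getaddrinfo`, which Node's `dns.lookup` (and so `fetch`) goes through, reads this file for DNS lookups. The `parseNameservers` helper and the sample contents are hypothetical; the address is from the IPv6 documentation range, not Fly's real server.

```typescript
// Minimal sketch: list the nameservers an instance's system resolver will use.
// getaddrinfo (and so Node's dns.lookup, which fetch relies on) reads these
// from /etc/resolv.conf. The sample contents below are hypothetical.

function parseNameservers(resolvConf: string): string[] {
  return resolvConf
    .split("\n")
    .map((line) => line.trim())
    // skip blank lines and comments (# or ;)
    .filter((line) => line.length > 0 && !line.startsWith("#") && !line.startsWith(";"))
    .map((line) => line.split(/\s+/))
    .filter((fields) => fields[0] === "nameserver" && fields.length >= 2)
    .map((fields) => fields[1]);
}

const sample = [
  "# hypothetical resolv.conf",
  "nameserver 2001:db8::53",
  "options ndots:1",
].join("\n");

console.log(parseNameservers(sample)); // [ '2001:db8::53' ]
```

On an instance you could pass it `readFileSync("/etc/resolv.conf", "utf8")` from `node:fs`, or compare the result with `dns.getServers()` from `node:dns`.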
Does your private network DNS server do Happy Eyeballs ([RFC 6555](rfc6555))?
That is, the recursive resolver sends queries to the authoritative servers for both the IPv4 and IPv6 addresses of the FQDN at the same time and uses whichever answer comes back first.
Maybe this is already clear, but just to be sure: I’m not asking whether Fly’s DNS sends queries to Google DNS over IPv4 or IPv6, but whether your DNS server sends out two queries at once, one for A records and one for AAAA records.
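A minimal sketch (TypeScript) of the behaviour being asked about: start the A and AAAA queries at the same moment and use whichever succeeds first, falling back to the other if one fails. The resolvers are passed in, so `firstAnswer`, `firstSuccess` and the `Resolver` type are hypothetical names; with Node you could pass `dns.promises.resolve4` and `dns.promises.resolve6` from `node:dns`. (RFC 6555 itself describes a client racing IPv6 and IPv4 connection attempts; this sketch covers only the parallel-query part.)

```typescript
// Minimal sketch: send A and AAAA queries in parallel and take the first
// successful answer. Resolver functions are injected (hypothetical names);
// in Node they could be dns.promises.resolve4 / resolve6 from "node:dns".

type Resolver = (name: string) => Promise<string[]>;

interface Answer {
  family: 4 | 6; // 4 = A record answer, 6 = AAAA record answer
  addresses: string[];
}

// Resolves with the first promise to fulfil; rejects only if every one rejects.
function firstSuccess<T>(promises: Promise<T>[]): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    let failures = 0;
    for (const p of promises) {
      p.then(resolve, () => {
        failures++;
        if (failures === promises.length) reject(new Error("all queries failed"));
      });
    }
  });
}

function firstAnswer(name: string, resolveA: Resolver, resolveAAAA: Resolver): Promise<Answer> {
  // Both queries are started before either is awaited, so they run concurrently.
  const a = resolveA(name).then((addresses): Answer => ({ family: 4, addresses }));
  const aaaa = resolveAAAA(name).then((addresses): Answer => ({ family: 6, addresses }));
  return firstSuccess([a, aaaa]);
}
```

If a name has no AAAA records, Node's `resolve6` typically rejects (e.g. with `ENODATA`), so the A answer is used instead.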