EAI_AGAIN error (DNS resolution)


Around 10:15 UTC this morning I started getting errors that resulted in failed requests: a few minutes of 500 errors. In the logs the error was EAI_AGAIN from a Node.js fetch call, so the app was trying to make an HTTPS fetch and failed.

Googling that error I found:

EAI_AGAIN appears to be a system error in the context of a failed DNS resolution. That rather points to whatever system resolver you have configured on your system.

Which got me thinking: what DNS resolver is configured on Fly apps? How do they look up DNS records? Presumably the host server handles it?

Were there any issues with that service in lhr today?

I haven’t changed any DNS entries or updated the app. It just stopped working. Then started again, presumably when the DNS could resolve again.

The /etc/resolv.conf inside instances is set to our private network DNS server. We need to go through it to resolve private DNS queries without further configuration. If the request is for something outside the private network then it is proxied to Google DNS servers.
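To check what that looks like from inside a running instance, something like the following could be used. It is only a sketch: `parseNameservers` and `describeResolvers` are hypothetical helper names, and the addresses in the usage comment are documentation placeholders, not Fly's real resolver addresses. It reads the `nameserver` lines from `/etc/resolv.conf` and compares them with what Node reports via `dns.getServers()`.

```typescript
import { readFileSync } from "node:fs";
import { getServers } from "node:dns";

// Extract the addresses from `nameserver` lines, ignoring comments
// (anything after `#` or `;`) and other directives such as `options`.
function parseNameservers(resolvConf: string): string[] {
  const servers: string[] = [];
  for (const rawLine of resolvConf.split("\n")) {
    const line = rawLine.replace(/[#;].*$/, "").trim();
    const match = /^nameserver\s+(\S+)/.exec(line);
    if (match && match[1]) servers.push(match[1]);
  }
  return servers;
}

// Compare the file contents with the servers Node's `dns.resolve*` functions use.
// Note: `fetch` goes through `dns.lookup` (getaddrinfo), which consults the
// system configuration rather than this list directly.
function describeResolvers(path = "/etc/resolv.conf"): { fromFile: string[]; fromNode: string[] } {
  return {
    fromFile: parseNameservers(readFileSync(path, "utf8")),
    fromNode: getServers(),
  };
}

// Usage (inside the VM):
//   console.log(describeResolvers());
// With a file containing "nameserver 2001:db8::53", fromFile would be ["2001:db8::53"].
```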

As a matter of fact, there’s this NetActuate System Status from one of our service providers. I don’t think it should be causing EAI_AGAIN or anything more than increased latency though.

Thanks @jerome

Interesting, there was a fault in LHR today, so maybe that was linked.

I had a quick look at Node.js DNS and found things like LCMApps/dns-lookup-cache on GitHub (an implementation to speed up the Node.js "dns.lookup" method by avoiding the thread pool and using a DNS TTL cache for a particular hostname), but I’m not sure that would help with this particular issue. Ah well. If it happens again, I’ll follow up.

Probably wouldn’t have helped.

How long did it last? Approximately.

About five minutes. So not a disaster.

Just wondered if there was anything I could fix but sounds like it is out of my control.
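One thing that is within the app's control is retrying when the resolver reports a temporary failure, since EAI_AGAIN means "try again". A minimal sketch follows. `isTransientDnsError` and `withDnsRetry` are hypothetical helper names, and the sketch assumes that Node's built-in fetch exposes the underlying system error (with its `code`) on the thrown error's `cause`. A retry would not have covered a five-minute outage, only brief blips.

```typescript
// Error codes treated as temporary resolver failures (an assumption for this sketch).
const TRANSIENT_DNS_CODES = new Set(["EAI_AGAIN"]);

// Walk the error and up to two levels of `cause`, looking for a matching `code`.
function isTransientDnsError(err: unknown): boolean {
  let current: unknown = err;
  for (let depth = 0; depth < 3 && current != null; depth++) {
    const code = (current as { code?: unknown }).code;
    if (typeof code === "string" && TRANSIENT_DNS_CODES.has(code)) return true;
    current = (current as { cause?: unknown }).cause;
  }
  return false;
}

// Run `op`, retrying with exponential backoff only on transient DNS errors.
async function withDnsRetry<T>(
  op: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 200,
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await op();
    } catch (err) {
      // Give up on the last attempt or on any error that is not a DNS blip.
      if (attempt >= attempts || !isTransientDnsError(err)) throw err;
      const delayMs = baseDelayMs * 2 ** (attempt - 1);
      await new Promise((resolve) => setTimeout(resolve, delayMs));
    }
  }
}

// Usage:
//   const res = await withDnsRetry(() => fetch("https://example.com/"));
```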

Does your private network DNS server do happy eyeballs (RFC 6555)?
This means the DNS recursive resolver sends out the query to the authoritative servers for both the v4 and v6 address for the FQDN at the same time and uses whatever answer comes back first.

Maybe this is already clear, but just to be sure: I’m not asking if Fly’s DNS sends queries to Google DNS over v4 or v6, but if your DNS sends out two queries, one for A records and one for AAAA records.

The private DNS server is just a dumb forwarder; it’s just going to relay whatever the stub resolver in your VM generates.
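To see from inside the VM whether the A or the AAAA query is the one that fails, both can be issued separately. This is a rough sketch: `queryBothFamilies` and the `RecordResolver` interface are hypothetical names. It uses `dns.promises.resolve4` and `resolve6`, which query the configured nameservers directly and so follow a different code path from the `dns.lookup` call that fetch uses.

```typescript
import { promises as dnsPromises } from "node:dns";

// Minimal resolver shape so the function can be exercised with a fake.
interface RecordResolver {
  resolve4(hostname: string): Promise<string[]>;
  resolve6(hostname: string): Promise<string[]>;
}

// Default resolver backed by Node's promise-based DNS API.
const nodeResolver: RecordResolver = {
  resolve4: (hostname) => dnsPromises.resolve4(hostname),
  resolve6: (hostname) => dnsPromises.resolve6(hostname),
};

interface DualStackResult {
  a: string[] | Error;    // A records, or the error for that query
  aaaa: string[] | Error; // AAAA records, or the error for that query
}

// Issue the A and AAAA queries concurrently and report each outcome on its own,
// so a failure of one record type does not hide the result of the other.
async function queryBothFamilies(
  hostname: string,
  resolver: RecordResolver = nodeResolver,
): Promise<DualStackResult> {
  const settle = (p: Promise<string[]>): Promise<string[] | Error> =>
    p.catch((e: unknown) => (e instanceof Error ? e : new Error(String(e))));
  const [a, aaaa] = await Promise.all([
    settle(resolver.resolve4(hostname)),
    settle(resolver.resolve6(hostname)),
  ]);
  return { a, aaaa };
}

// Usage:
//   console.log(await queryBothFamilies("example.com"));
```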