Temporary failure in name resolution

Hi,

Some of our apps are seeing the error “Temporary failure in name resolution”.

Is there a network issue in LHR? We saw it at 10am today and again at 16:41, in the trusselltrust org.

Thanks,
Matt

Are you seeing this error in app logs or somewhere else?

There are no network issues that we know of. “Temporary failure in name resolution” usually means the DNS lookup isn’t working properly.
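“Temporary failure in name resolution” is the text glibc’s getaddrinfo uses for EAI_AGAIN: the resolver could not get an answer at that moment, which is different from “this name does not exist”. If the app can tolerate a short delay, retrying only that specific error is one way to ride out brief resolver blips. A minimal sketch, assuming Ruby’s stdlib and matching on the error message; the helper name with_dns_retry and the backoff values are hypothetical:

```ruby
require "socket"

# glibc's message for EAI_AGAIN ("try again later"), as seen in the app logs.
TEMP_DNS_FAILURE = "Temporary failure in name resolution"

# Hypothetical helper: run the block, retrying only on temporary DNS failures.
# Any other SocketError (e.g. "Name or service not known") is re-raised at once.
def with_dns_retry(attempts: 3, base_delay: 0.5)
  tries = 0
  begin
    tries += 1
    yield
  rescue SocketError => e
    raise unless e.message.include?(TEMP_DNS_FAILURE) && tries < attempts
    # Exponential backoff: with the defaults, sleeps 0.5s, then 1s.
    sleep(base_delay * 2**(tries - 1))
    retry
  end
end
```

Usage might look like `with_dns_retry { Net::HTTP.get(URI("https://example.com/")) }`; Net::HTTP re-raises the original SocketError with the getaddrinfo text appended, so the message check still matches. Retrying hides short blips only; it will not help if resolution is broken for minutes at a time.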

Yes, this was in our app logs. It happened against a few different services at the same time: Twilio, AWS RDS and a geocoding API. Is there a local DNS cache?

We run a caching resolver, yeah. It looks like Akamai had a DNS issue at around this time, notably affecting Apple: https://www.seattletimes.com/business/apple-suffers-widespread-outage-hitting-music-maps-and-icloud/

Our DNS resolvers haven’t had an outage, but Akamai’s issues would definitely affect multiple services. Most large companies use them for some level of DNS resolution.

Also, I’m not trying to be dismissive with the Apple link. I know it’s weird to see when you had problems with three entirely different companies. Akamai’s DNS service is ubiquitous, though, and something with that big an effect on Apple is very likely to be the same problem you’d see with Twilio/Amazon.

We’re seeing intermittent timeouts from Fly to the Google Geocoding APIs, and they again seem DNS-related, in LHR and the trussell trust org. @kurt

Same error: Temporary failure in name resolution.

There’s definitely a problem with DNS lookup on the instances:

irb(main):001:0> URI.open("http://google.com")

Traceback (most recent call last):
        1: from (irb):1
Net::OpenTimeout (execution expired)
irb(main):002:0>
irb(main):003:0> URI.open("http://apple.com")
Traceback (most recent call last):
        2: from (irb):1
        1: from (irb):2:in `rescue in irb_binding'
Net::OpenTimeout (execution expired)
irb(main):004:0> URI.open("http://fly.io")
=> #<Tempfile:/tmp/open-uri20220329-878-1qpv3c2>
irb(main):005:0> URI.open("http://apple.com")
Traceback (most recent call last):
        1: from (irb):5
IRB::Abort (abort then interrupt!)
irb(main):006:0> URI.open("http://fly.io")
=> #<Tempfile:/tmp/open-uri20220329-878-175w2c1>
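The session above fails with Net::OpenTimeout rather than the “Temporary failure in name resolution” text from the logs, which hints that the connection, not only the lookup, may be the problem. When several errors show up in logs at once, a small classifier can help separate them. A sketch, assuming Ruby’s stdlib exception classes; the helper name classify_network_error and its return symbols are hypothetical:

```ruby
require "net/http"
require "socket"

# Hypothetical helper: map a network exception to a coarse label for logging.
def classify_network_error(error)
  case error
  when SocketError
    # Usually raised by getaddrinfo: the hostname never became an address.
    :dns
  when Net::OpenTimeout
    # Net::HTTP could not open the TCP connection within open_timeout.
    # Depending on the Ruby version this window may include the DNS lookup,
    # so also check routing (e.g. IPv6) before blaming DNS.
    :open_timeout
  when Errno::ENETUNREACH, Errno::EHOSTUNREACH
    # The kernel has no route to the resolved address.
    :unreachable
  when Net::ReadTimeout
    # Connected, but the response did not arrive within read_timeout.
    :read_timeout
  else
    :other
  end
end
```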

I’m spot-checking your instances and don’t see any issues yet. What instance ID did you run that IRB session in?

Any of them, really. This one is from bb4ff58a, for example:

# ping google.com
PING google.com(lhr25s34-in-x0e.1e100.net (2a00:1450:4009:820::200e)) 56 data bytes

--- google.com ping statistics ---
6 packets transmitted, 0 received, 100% packet loss, time 131ms

Ah, that doesn’t look like a DNS issue, since it’s getting an address. Can you see if ping -4 works? It could be an IPv6 routing issue (these are more common than they should be).
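ping -4 and ping -6 compare the two address families with ICMP. A rough app-side equivalent is to resolve the name for one family at a time and attempt a TCP connection, which also exercises the port the app actually uses. A sketch, assuming Ruby’s stdlib socket library; probe_family, its return symbols and the 3-second default timeout are hypothetical:

```ruby
require "socket"

FAMILIES = { ipv4: Socket::AF_INET, ipv6: Socket::AF_INET6 }.freeze

# Hypothetical helper: resolve `host` for ONE address family and try a TCP
# connection to each returned address, a TCP-level cousin of ping -4 / ping -6.
# Returns :reachable, :unreachable or :no_address.
def probe_family(host, port, family, timeout: 3)
  addrs = Addrinfo.getaddrinfo(host, port, FAMILIES.fetch(family), Socket::SOCK_STREAM)
  addrs.each do |ai|
    begin
      # The block form closes the socket again, even on the early return.
      Socket.tcp(ai.ip_address, port, connect_timeout: timeout) { return :reachable }
    rescue SystemCallError, IOError
      next # refused, unreachable or timed out: try the next address
    end
  end
  :unreachable
rescue SocketError
  # getaddrinfo returned nothing for this family (or the lookup itself failed).
  :no_address
end
```

Run from an instance, something like `probe_family("google.com", 443, :ipv4)` next to `probe_family("google.com", 443, :ipv6)` returning :reachable and :unreachable respectively would point at IPv6 routing rather than DNS, consistent with the 100% packet loss in the ping output above.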

Hm, yes, you’re correct. Do you have any idea what we can do to disable IPv6 lookups, or to prefer IPv4 routing over IPv6?

We’re going to see if we can get the routing fixed. You’d need to configure your app to use a different resolver to disable IPv6 lookups. I’m not entirely sure how to do that in Ruby, but I think it’s possible.
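In Ruby, one way to stop depending on IPv6 is to do the lookup yourself: either ask getaddrinfo for IPv4 results only (AF_INET), or fetch all addresses and try the IPv4 ones first. A sketch, assuming Ruby’s stdlib socket library; ipv4_first and connect_preferring_ipv4 are hypothetical helpers, and wiring them into a particular client library (Net::HTTP, the AWS SDK, the Twilio gem) depends on the hooks each library exposes and is not shown:

```ruby
require "socket"

# Put IPv4 addresses ahead of IPv6 ones, keeping the resolver's order within each family.
def ipv4_first(addrinfos)
  v4, v6 = addrinfos.partition(&:ipv4?)
  v4 + v6
end

# Hypothetical helper: resolve `host`, then connect to the first address that
# accepts a TCP connection, trying IPv4 before IPv6.
# ipv4_only: true asks getaddrinfo for AF_INET results only, so no IPv6
# addresses come back at all.
def connect_preferring_ipv4(host, port, timeout: 3, ipv4_only: false)
  family = ipv4_only ? Socket::AF_INET : Socket::AF_UNSPEC
  addrs = Addrinfo.getaddrinfo(host, port, family, Socket::SOCK_STREAM)
  last_error = nil
  ipv4_first(addrs).each do |ai|
    begin
      # Connecting to the numeric address avoids a second DNS lookup.
      return Socket.tcp(ai.ip_address, port, connect_timeout: timeout)
    rescue SystemCallError, IOError => e
      last_error = e # e.g. ENETUNREACH or a timeout: move on to the next address
    end
  end
  raise last_error || SocketError.new("no addresses found for #{host}")
end
```

The caller owns the returned socket and must close it. A system-wide alternative on glibc-based images is to uncomment `precedence ::ffff:0:0/96  100` in /etc/gai.conf, which makes getaddrinfo sort IPv4 results first for every process without touching app code.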

We’ve changed IPv6 routing to improve this, so you might be good now.


Hi, we’re seeing this issue again on the same app: Temporary failure in name resolution. This time it happens when we connect to AWS OpenSearch.

We just restarted the DNS server. Maybe that’s what happened here? Is it still happening?


No, that must be it. The error rate is back to 0%.