DNS Resolution Failure (ENOTFOUND) for External Domains Inside Fly Container

Hi everyone,

I’m running into a persistent issue with outbound DNS resolution from within a Fly.io container running a Node.js/Remix application.

The Problem:

My application needs to connect to external APIs (e.g., api.fal.ai, but the issue seems general). These connections fail with Error: getaddrinfo ENOTFOUND api.fal.ai, raised from Node.js fetch calls or from the relevant SDK.
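For anyone trying to confirm that a failure like this really is DNS and not something further along, here is a minimal sketch that digs the underlying error out of a failed call. It assumes Node's built-in fetch, which reports "fetch failed" and keeps the getaddrinfo error on the cause property. The helper name findDnsFailure and the set of codes are my own choices for illustration, not something from the app in question.

```javascript
// Codes that Node attaches to DNS-related failures. ENOTFOUND is the one
// from the post; including the other two is an assumption.
const DNS_ERROR_CODES = new Set(['ENOTFOUND', 'EAI_AGAIN', 'ENODATA']);

// Walk err -> err.cause -> ... and return the first DNS failure found,
// or null if none of the errors in the chain carries a DNS code.
function findDnsFailure(err, maxDepth = 5) {
  let current = err;
  for (let depth = 0; current && depth < maxDepth; depth += 1) {
    if (DNS_ERROR_CODES.has(current.code)) {
      return {
        code: current.code,
        hostname: current.hostname ?? null,
        syscall: current.syscall ?? null,
      };
    }
    current = current.cause;
  }
  return null;
}

// Hypothetical usage around an outbound call:
// try {
//   await fetch('https://api.fal.ai/');
// } catch (err) {
//   console.error('DNS failure details:', findDnsFailure(err));
//   throw err;
// }
```

Logging the hostname from the error is useful because it shows exactly which name the process tried to resolve, which may differ from the one in the config.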

Debugging Steps Taken:

  1. SSH into the Instance: I used fly ssh console to access the running container.

  2. nslookup Tests:

  • Running nslookup api.fal.ai (which uses the default Fly internal DNS resolver, fdaa::3) returns no IP address. The output shows “Non-authoritative answer:” but lists no records under it.

  • Running nslookup api.fal.ai 8.8.8.8 (explicitly querying Google’s public DNS) also fails to return an IP address from within the container.

  3. Basic Internal DNS Check: Running nslookup 8.8.8.8 (a reverse lookup) does reach the internal Fly DNS server (fdaa::3) but, as expected, doesn’t return a useful hostname. This suggests the container can talk to the internal resolver and that only forward lookups for external domains are failing.
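The nslookup steps above can also be repeated from inside the Node process, which rules out differences between the shell and the runtime. This is a rough sketch using Node's dns module: lookup goes through getaddrinfo (the same path fetch uses), while resolve4 queries a DNS server directly. The function names probeHost and defaultProbes and the probe labels are made up for this example.

```javascript
const dns = require('node:dns');

// Build the three probes: getaddrinfo, the system resolver (whatever
// /etc/resolv.conf points at), and Google's public resolver at 8.8.8.8.
function defaultProbes() {
  const google = new dns.promises.Resolver();
  google.setServers(['8.8.8.8']);
  return {
    'getaddrinfo (same path as fetch)': (host) => dns.promises.lookup(host, { all: true }),
    'system resolver, A records': (host) => dns.promises.resolve4(host),
    '8.8.8.8, A records': (host) => google.resolve4(host),
  };
}

// Run every probe and collect either the answer or the error code,
// so one failing probe does not hide the results of the others.
async function probeHost(host, probes = defaultProbes()) {
  const results = {};
  for (const [label, run] of Object.entries(probes)) {
    try {
      results[label] = { ok: true, answer: await run(host) };
    } catch (err) {
      results[label] = { ok: false, code: err.code ?? String(err) };
    }
  }
  return results;
}

// Hypothetical usage from a one-off script inside the container:
// probeHost('api.fal.ai').then((r) => console.log(JSON.stringify(r, null, 2)));
```

If all three probes fail for one hostname but succeed for another well-known one, the problem is more likely the name itself than the container's networking.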

Context & Things Checked:

  • Local Environment: The application code and external API connections work perfectly fine when run from my local development machine.

  • fly.toml: My fly.toml is fairly standard, using a Node base image, exposing the correct internal port, standard health checks, and no unusual networking configurations. (Happy to share if relevant).

  • Fly Secrets: I’ve checked fly secrets list and confirmed there are no HTTP_PROXY, HTTPS_PROXY, or NO_PROXY environment variables set that could be interfering. I did remove an unused AWS_ENDPOINT_URL_S3 secret, but that didn’t resolve the issue.

  • App Status: The application itself starts correctly, passes its health checks, and serves basic requests. The failure occurs specifically when attempting outbound connections that require DNS resolution.
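The proxy check from the list above can also be done from inside the running process, since Fly secrets reach the app as environment variables. A small sketch follows. The name findProxyVars is hypothetical, and matching lowercase variants such as http_proxy is an assumption based on the fact that some tools read those too.

```javascript
// Variable names from the post, compared case-insensitively.
const PROXY_VAR_NAMES = ['HTTP_PROXY', 'HTTPS_PROXY', 'NO_PROXY'];

// Return the names (not the values, which may contain credentials)
// of any proxy-related variables present in the given environment.
function findProxyVars(env = process.env) {
  return Object.keys(env)
    .filter((name) => PROXY_VAR_NAMES.includes(name.toUpperCase()))
    .sort();
}

// Hypothetical usage at startup:
// console.log('proxy-related env vars:', findProxyVars());
```

An empty array here matches what fly secrets list showed and also covers variables set by the image or by fly.toml.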

Question:

Given that lookups for api.fal.ai fail through both Fly’s internal resolver and an external one (8.8.8.8) from within the container, what could be causing this DNS resolution failure within the Fly.io network environment for this specific instance? Are there any other common configuration issues or diagnostic steps I should try?

Thanks in advance for any insights!

api.fal.ai doesn’t seem to have any records; resolution also fails from my laptop:

~ $ dig api.fal.ai

; <<>> DiG 9.18.33-1~deb12u2-Debian <<>> api.fal.ai
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64530
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 65494
;; QUESTION SECTION:
;api.fal.ai.			IN	A

;; AUTHORITY SECTION:
fal.ai.			1800	IN	SOA	cecelia.ns.cloudflare.com. dns.cloudflare.com. 2369168044 10000 2400 604800 1800

;; Query time: 51 msec
;; SERVER: 127.0.0.53#53(127.0.0.53) (UDP)
;; WHEN: Thu Apr 03 08:33:15 EDT 2025
;; MSG SIZE  rcvd: 104

Are you sure it’s the correct domain? If so, I’d recommend checking with fal.ai to see why there are no records on their domain.
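To spell out how the dig output above reads: status NOERROR with ANSWER: 0 and only an SOA in the authority section is a "no data" response, meaning the resolver worked but returned no records of the queried type (A in this case). That is different from NXDOMAIN and from a resolver failure. Here is a rough sketch of that reading as code. The summarizeDig helper and its verdict strings are invented for illustration.

```javascript
// Pull the status and answer count out of dig's header lines and
// classify the response. Returns null if the header is not found.
function summarizeDig(output) {
  const status = /status: ([A-Z]+)/.exec(output);
  const answers = /ANSWER: (\d+)/.exec(output);
  if (!status || !answers) return null;

  const answerCount = Number(answers[1]);
  let verdict;
  if (status[1] === 'NXDOMAIN') {
    verdict = 'name does not exist';
  } else if (status[1] === 'NOERROR' && answerCount === 0) {
    verdict = 'no records of the queried type (NODATA)';
  } else if (status[1] === 'NOERROR') {
    verdict = 'resolved';
  } else {
    verdict = 'resolver or server error';
  }
  return { status: status[1], answerCount, verdict };
}

// Header lines copied from the dig output in this thread:
const sample = [
  ';; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 64530',
  ';; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1',
].join('\n');
console.log(summarizeDig(sample));
```

Since the same empty answer comes back from a laptop outside Fly, this points at the hostname and not at the container.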

I wonder if you created a local record in your /etc/hosts for this domain, and then forgot about it? It’s dead as a dodo for me too.
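The /etc/hosts theory is quick to test on the machine where things work. A minimal sketch that scans hosts-file text for a given name follows. The helper name findHostsEntries is hypothetical, and it assumes the usual format of an address followed by one or more names, with # starting a comment.

```javascript
// Return every address that the hosts-file text maps to `hostname`.
function findHostsEntries(hostsText, hostname) {
  const wanted = hostname.toLowerCase();
  const matches = [];
  for (const rawLine of hostsText.split(/\r?\n/)) {
    // Drop comments and surrounding whitespace, skip empty lines.
    const line = rawLine.split('#')[0].trim();
    if (!line) continue;
    const [address, ...names] = line.split(/\s+/);
    if (names.some((name) => name.toLowerCase() === wanted)) {
      matches.push(address);
    }
  }
  return matches;
}

// Hypothetical usage on the local development machine:
// const fs = require('node:fs');
// console.log(findHostsEntries(fs.readFileSync('/etc/hosts', 'utf8'), 'api.fal.ai'));
```

A non-empty result locally would explain why the same code resolves the name on a laptop but not in the container.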
