Hi everyone,
I’m running into a persistent issue with outbound DNS resolution from within a Fly.io container running a Node.js/Remix application.
The Problem:
My application needs to connect to external APIs (e.g., api.fal.ai, but the issue seems general). These connections fail with Error: getaddrinfo ENOTFOUND api.fal.ai, raised from within Node.js fetch and the relevant SDK calls.
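To make the failure easier to read in logs, here is a minimal sketch of how the underlying DNS error can be pulled out of a failed fetch. It assumes Node's built-in fetch, which typically throws a generic "fetch failed" error and keeps the system error (code, syscall, hostname) on err.cause. The names describeFetchError and fetchWithDiagnostics are hypothetical helpers, not part of any SDK.

```javascript
// Extract the DNS-related fields from an error thrown by fetch.
// Assumption: the system error sits on err.cause (as with Node's built-in
// fetch); if there is no cause, fall back to the error itself.
function describeFetchError(err) {
  const source = err && err.cause ? err.cause : err;
  return {
    message: err && err.message ? String(err.message) : '',
    code: source && source.code ? source.code : null,
    syscall: source && source.syscall ? source.syscall : null,
    hostname: source && source.hostname ? source.hostname : null,
  };
}

// Hypothetical wrapper: log the details, then rethrow so callers still see the failure.
async function fetchWithDiagnostics(url, options) {
  try {
    return await fetch(url, options);
  } catch (err) {
    console.error('outbound request failed:', url, describeFetchError(err));
    throw err;
  }
}
```

A code of ENOTFOUND with syscall getaddrinfo would confirm that the request never got past name resolution, as opposed to failing at the TCP or TLS stage.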
Debugging Steps Taken:
- SSH into the Instance: I used fly ssh console to access the running container.
- nslookup Tests:
  - Running nslookup api.fal.ai (which uses the default Fly internal DNS resolver, fdaa::3) fails to return an IP address. It prints “Non-authoritative answer:” but no address records.
  - Running nslookup api.fal.ai 8.8.8.8 (explicitly querying Google’s public DNS) also fails to return an IP address from within the container.
- Basic Internal DNS Check: Running nslookup 8.8.8.8 (a reverse lookup) does reach the internal Fly DNS server (fdaa::3) but, as expected, doesn’t return a useful hostname. This suggests the container can talk to the internal resolver, but forward lookups for external domains are failing.
Context & Things Checked:
- Local Environment: The application code and external API connections work perfectly fine when run from my local development machine.
- fly.toml: My fly.toml is fairly standard, using a Node base image, exposing the correct internal port, standard health checks, and no unusual networking configurations. (Happy to share if relevant).
- Fly Secrets: I’ve checked fly secrets list and confirmed there are no HTTP_PROXY, HTTPS_PROXY, or NO_PROXY environment variables set that could be interfering. I did remove an unused AWS_ENDPOINT_URL_S3 secret, but that didn’t resolve the issue.
- App Status: The application itself starts correctly, passes its health checks, and serves basic requests. The failure occurs specifically when attempting outbound connections that require DNS resolution.
Question:
Given that lookups for api.fal.ai fail through both the internal Fly resolver (fdaa::3) and an external one (8.8.8.8), what could be causing this DNS resolution failure within the Fly.io network environment for this specific instance? Are there any other common configuration issues or diagnostic steps I should try?
Thanks in advance for any insights!