Issues resolving and connecting to 'external' services.

I’m testing a simple Go (Gin) web app that ultimately connects to an externally-run database server and performs a couple of simple SQL queries, rendering the output as HTML tables.

Testing the app ‘locally’ (i.e. not on Fly) it works fine. I can and do also connect to the DB server remotely from other locations without issue.

Deploying to Fly however, I have two problems:

  1. DNS doesn’t seem to work: lookups of the DB server’s hostname from the Fly app fail. The domain uses Gandi for DNS and resolves everywhere else I can test it from, including checkers such as ‘whatsmydns.net’ and, of course, querying 8.8.8.8 directly.
    I ended up hardcoding the IP address.

  2. With the hardcoded IP, the connection works a few times after a deployment, a restart, or a move to another region.
    If I then retry after leaving it idle, say, overnight, I get connection errors:
    write tcp 172.19.5.90:53500->[external IP]:9440: write: connection timed out

The DB server at that IP is fine and working. It’s a ClickHouse server, running over TLS on port 9440.

I understand one solution (for both issues) might be to set up WireGuard on the DB server, but ‘fly wireguard create "My Name" dbservername’ gets ‘Error Could not resolve’ (on flyctl v0.0.325).

Edit: I may have resolved the second (connection) issue, but the DNS one remains.

Lookups from the Fly app to the DB server didn’t seem to work.

What do you see when you run flyctl ssh console -a <appname> and execute the following?

# query Google's public resolver for an IPv4 (A) record
nslookup -debug -type=A <my.db.host.tld> 8.8.8.8

# query Fly's DNS stub resolver for an IPv4 (A) record
nslookup -debug -type=A <my.db.host.tld> fdaa::3
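
If nslookup isn't available in the image, roughly the same comparison can be made from Go itself. This is only a sketch: `newResolver` is a hypothetical helper, `example.com` is a placeholder for the real DB hostname, and it assumes both resolvers are reachable on port 53 from inside the VM.

```go
package main

import (
	"context"
	"fmt"
	"net"
	"os"
	"time"
)

// newResolver (hypothetical helper) returns a resolver that sends every
// query to the given DNS server (host:port) instead of whatever is
// configured in /etc/resolv.conf.
func newResolver(server string) *net.Resolver {
	return &net.Resolver{
		// The Dial hook is only honoured by Go's built-in resolver.
		PreferGo: true,
		Dial: func(ctx context.Context, network, _ string) (net.Conn, error) {
			d := net.Dialer{Timeout: 3 * time.Second}
			return d.DialContext(ctx, network, server)
		},
	}
}

func main() {
	host := "example.com" // placeholder; pass the DB hostname as the first argument
	if len(os.Args) > 1 {
		host = os.Args[1]
	}

	// Google's public resolver and the Fly stub resolver mentioned above.
	servers := []string{
		net.JoinHostPort("8.8.8.8", "53"),
		net.JoinHostPort("fdaa::3", "53"),
	}

	for _, server := range servers {
		ctx, cancel := context.WithTimeout(context.Background(), 5*time.Second)
		addrs, err := newResolver(server).LookupIP(ctx, "ip4", host)
		cancel()
		if err != nil {
			fmt.Printf("%s via %s: lookup failed: %v\n", host, server, err)
			continue
		}
		fmt.Printf("%s via %s: %v\n", host, server, addrs)
	}
}
```

If one resolver answers and the other doesn't, that narrows down where the lookup is failing.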


If I then re-try after, say, overnight - I get connection issues:
write tcp 172.19.5.90:53500->[external IP]:9440: write: connection timed out

Without seeing the code, I can only speculate that the DB connection pooler is at fault. If you’re in JDK land, c3p0 is rock solid. Note, though, that distributed-by-default databases like Aurora and PlanetScale may require special handling for pooled connections; generic connection poolers like c3p0 might not cut it. It’s better to connect to those databases via their HTTP front ends than over ODBC or the like.

I’m not aware of the keepalive semantics for outgoing connections originating in Fly apps. Maybe the Fly engineers can chime in to clarify that bit. What I do know is that incoming TCP connections are terminated by the Fly load balancer if idle for more than 60s: Increasing idle timeout.