So, after diving into a pretty deep rabbit hole, I’ve managed to solve the issue, but am still baffled by what is causing it.
To recap: the exact same code and env vars work fine locally in a Docker container, but on Fly the module that connects to our Mongo instance throws Connection refused (os error 111).
So, I followed your suggestion to SSH into the instance and debug from there. First I had to comment out the part where the Mongo connection was erroring, as that was sending the instance into a crash loop. Then, once I had a stable SSH connection to the instance, the first thing I tried was connecting with the mongo shell, which worked perfectly.
Okay, next I ran Deno as a user inside the instance using the REPL, and confirmed that connecting to the same Mongo instance fails:
> const logDBClient = new MongoClient()
undefined
> await logDBClient.connect("mongodb://<username>:<password>@<Mongo Host DNS name>:<port>/")
Uncaught Error: MongoError: "Connection failed: Connection refused (os error 111)"
at MongoClient.connect (https://deno.land/x/mongo@v0.22.0/src/client.ts:93:15)
at async <anonymous>:2:1
Okay, next I tried using the IP of the mongo instance, which… worked!!!
So, that narrowed it down to something in the DNS, but that’s weird because the mongo shell resolved the name successfully, and I’d assume it’s using the same DNS as Deno within the instance.
Then I tried resolving the IP address using Deno.resolveDns(), which returned the right IP! So, Deno’s DNS resolving was working, unless the Deno.resolveDns() call in Deno is using a different subsystem than the Deno.connect() call which the deno_mongo library is using to connect to Mongo.
To do a sanity check, I then tried to Deno.connect() to the Mongo instance, which just opens a TCP connection to the instance on a port. And I found that when I put in the DNS host name of the instance, I got the error, but when I put in the IP of the instance, the connection worked!
> await Deno.connect({hostname: "<Mongo Host DNS name>", port: <port>})
Uncaught ConnectionRefused: Connection refused (os error 111)
at unwrapOpResult (deno:core/core.js:100:13)
at async Object.connect (deno:runtime/js/30_net.js:199:13)
at async <anonymous>:2:1
>
> await Deno.connect({hostname: "<Mongo Host IP>", port: <port>})
Conn {}
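For anyone wanting to repeat the hostname-vs-IP comparison above without typing it into the REPL each time, here is a minimal sketch. It looks up the A and AAAA records for a name and tries a TCP connection to every address individually, so you can see exactly which resolved addresses refuse the connection. The names probeAddresses, Resolver, Connector and ProbeResult are hypothetical helpers made up for this example, and the lookup and connect functions are passed in as parameters, so nothing here claims to mirror how Deno.connect() resolves names internally.

```typescript
// Hypothetical helper types (not part of Deno or deno_mongo).
type RecordType = "A" | "AAAA";
type Resolver = (name: string, type: RecordType) => Promise<string[]>;
type Connector = (hostname: string, port: number) => Promise<void>;

interface ProbeResult {
  address: string;
  type: RecordType;
  ok: boolean;
  error?: string;
}

// Resolve both record types for `name`, then attempt a connection to each
// returned address separately and record which ones succeed.
async function probeAddresses(
  name: string,
  port: number,
  resolve: Resolver,
  connect: Connector,
): Promise<ProbeResult[]> {
  const results: ProbeResult[] = [];
  for (const type of ["A", "AAAA"] as const) {
    let addresses: string[] = [];
    try {
      addresses = await resolve(name, type);
    } catch {
      // No records of this type, or the lookup itself failed: skip it.
      continue;
    }
    for (const address of addresses) {
      try {
        await connect(address, port);
        results.push({ address, type, ok: true });
      } catch (err) {
        const error = err instanceof Error ? err.message : String(err);
        results.push({ address, type, ok: false, error });
      }
    }
  }
  return results;
}

// Assumed wiring inside Deno (placeholders as in the post):
//   const resolve: Resolver = (n, t) => Deno.resolveDns(n, t);
//   const connect: Connector = async (hostname, port) => {
//     const conn = await Deno.connect({ hostname, port });
//     conn.close();
//   };
//   console.log(await probeAddresses("<Mongo Host DNS name>", <port>, resolve, connect));
```

If every address comes back ok while connecting by name still fails, that would point at the name lookup done inside the connect call rather than at the records themselves.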
Okay, so here’s where it gets weird.
Quick aside:
We use Cloudflare for our DNS, and in Cloudflare the <Mongo Host DNS name> is actually a CNAME to another DNS name which is the machine name of that droplet in DO. We set it up this way as we use instances in DO and in AWS Lightsail, and I wanted to delegate the DO DNS to DO, and the Lightsail DNS to Lightsail. So in our infra, machines hosted on DO get a machine name that looks like <foo>.digitalocean.<infradomain> and machines hosted on Lightsail get <bar>.lightsail.<infradomain>, where digitalocean.<infradomain> is delegated via NS to DO, and lightsail.<infradomain> is delegated via NS to Lightsail.
So, <Mongo Host DNS name> is CNAMEd to <mongo host machine>.digitalocean.<infradomain> in Cloudflare, which makes it easy for us to replace the machine without needing to reconfigure all our Mongo URIs.
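The CNAME setup described above can also be walked by hand, one hop at a time, to see where a lookup stops. This is only a sketch: followCnameChain and CnameLookup are hypothetical names invented for the example, the CNAME lookup is injected, and it assumes the lookup either throws or returns an empty list when a name has no CNAME record. Returned targets may carry a trailing dot, so that is stripped.

```typescript
// Hypothetical helper type: returns the CNAME targets for a name.
type CnameLookup = (name: string) => Promise<string[]>;

// Follow CNAME records starting at `name` and return every name visited,
// in order. Stops when there is no further CNAME, on a loop, or after maxHops.
async function followCnameChain(
  name: string,
  lookupCname: CnameLookup,
  maxHops = 8,
): Promise<string[]> {
  const chain = [name];
  let current = name;
  for (let hop = 0; hop < maxHops; hop++) {
    let targets: string[] = [];
    try {
      targets = await lookupCname(current);
    } catch {
      break; // Lookup failed or no CNAME record: treat as end of chain.
    }
    if (targets.length === 0) break;
    // Strip the trailing dot of a fully qualified name, if present.
    current = targets[0].replace(/\.$/, "");
    if (chain.includes(current)) break; // Guard against CNAME loops.
    chain.push(current);
  }
  return chain;
}

// Assumed wiring inside Deno (placeholder as in the post):
//   const lookup: CnameLookup = (n) => Deno.resolveDns(n, "CNAME");
//   console.log(await followCnameChain("<Mongo Host DNS name>", lookup));
```

The last name in the returned chain is the one whose A record should match the machine's IP, which makes it easy to compare what Fly sees against what a local machine sees.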
So, I created a test record in Cloudflare, baz.<infradomain>, which is not a CNAME but an A record pointing directly to the IP, tested it in Deno, and it worked.
So my solution in the meantime is to replace the CNAME with the direct IP.
But it bugs me that I haven’t been able to figure out why there’s this weird quirk in DNS resolution, which only happens with Deno on Fly when targeting DO machines. I tested all variations of the above locally, as well as against our instances hosted in Lightsail, and DNS resolution and connections worked perfectly there. So, I suspect something is weird about the way Deno resolves addresses hosted via a subdomain delegated to DigitalOcean when running on Fly.io infra, but I haven’t had time since to reproduce it or dig deeper.
Would appreciate thoughts on what could be happening, as I’m damn curious as well.