Deno app getting "Connection failed: Connection refused (os error 111)" when connecting to external Mongo

I wrote a simple Deno app to cache fly logs for our own debugging purposes. Looked at how flyctl does it and followed the same backoff strategy so hopefully it doesn’t affect your infra, but I’m facing a strange issue with running it on Fly.

When it’s running in Fly, the connection to our Mongo instance fails with:

2021-04-15T04:36:27.568Z <redacted instance id> sin [info]         throw new MongoError(`Connection failed: ${e.message || e}`);
2021-04-15T04:36:27.568Z <redacted instance id> sin [info] error: Uncaught (in promise) Error: MongoError: "Connection failed: Connection refused (os error 111)"
2021-04-15T04:36:27.569Z <redacted instance id> sin [info]               ^
2021-04-15T04:36:27.570Z <redacted instance id> sin [info]     at MongoClient.connect (https://deno.land/x/mongo@v0.22.0/src/client.ts:93:15)
2021-04-15T04:36:27.571Z <redacted instance id> sin [info]     at async file:///app/main.ts:68:1

What’s strange is that when I build the exact same code with the exact same Dockerfile and run it in Docker locally, passing in the same environment variables, it connects to the same Mongo instance without issue.

I’ve also confirmed by printing out the env vars in-app (which is not great security-wise, I know) that the environment is being set properly and the app can see it, and a fetch call to the fly.io API shows that outbound HTTP calls, at least, are not having any issues. So it seems likely that the problem is in how the Mongo driver I’m using connects to Mongo. But since it works without a hitch on my local Docker, I’m not sure how to start debugging this :sweat_smile:

Would appreciate any help / any ideas of what might be going wrong.

Thanks!

Where is Mongo running? Is it possible it’s trying to connect over IPv6 but Mongo is only listening on IPv4?

This mongo instance is running on a DO instance.

I did not think of that at all, so yes, it is possible. However, the DNS records are IPv4 only, so would the app still try to connect over IPv6?

Running drill <mongo host dns> returns only an A record with an IPv4 address.

Well that’s probably not it then! It would only use IPv6 if there was an AAAA record too.

The best way to debug this might be to boot your app, connect over ssh, and run some test scripts in your VM directly. That’s a super weird problem, though. It wouldn’t surprise me if it’s just a weird firewall setting on DigitalOcean.

Also, the really cool thing to do would be to create a WireGuard peer and have your DO droplet join your private network. :slight_smile:

Okay, will try that out. Also, good point on the WireGuard peer :muscle: I’ll need to adjust the image a bit for that, as I’m using hayd/distroless-deno as the Docker base image, which doesn’t have a shell in it (for security reasons, I think).

Actually considering moving the databases over to Fly once there’s a stable mongo replicaset implementation.

Will post back once I try that out!

Thanks!

So, after diving into a pretty deep rabbit hole, I’ve managed to solve the issue, but am still baffled by what is causing it.

To recap: the exact same code and env vars work fine in a local Docker container, but on Fly the module that connects to the Mongo instance throws a Connection refused (os error 111).

So, I followed your suggestion: I SSH’d into the instance and tried to debug from there. First I had to comment out the part where the Mongo connection was erroring, as that was putting the instance into a crash loop. Then, once I had a stable SSH connection to the instance, the first thing I tried was connecting with the mongo shell, which worked perfectly.

Okay, next I ran the Deno REPL as a user inside the instance and confirmed that connecting to the same Mongo instance fails:

> const logDBClient = new MongoClient()
undefined
> await logDBClient.connect("mongodb://<username>:<password>@<Mongo Host DNS name>:<port>/")
Uncaught Error: MongoError: "Connection failed: Connection refused (os error 111)"
    at MongoClient.connect (https://deno.land/x/mongo@v0.22.0/src/client.ts:93:15)
    at async <anonymous>:2:1

Okay, next I tried using the IP of the mongo instance, which… worked!!! :exploding_head:

So, that narrowed it down to something in DNS, which is weird because the mongo shell resolved the name successfully, and I’d assume it uses the same DNS as Deno within the instance.

Then I tried resolving the host name using Deno.resolveDns(), which returned the right IP! So Deno’s DNS resolution was working, unless Deno.resolveDns() uses a different subsystem than the Deno.connect() call that the deno_mongo library uses to connect to Mongo.

To do a sanity check, I then tried to Deno.connect() to the Mongo instance, which just opens a TCP connection to the instance on a port. And I found that when I put in the DNS host name of the instance, I got the error, but when I put in the IP of the instance, the connection worked!

> await Deno.connect({hostname: "<Mongo Host DNS name>", port: <port>})
Uncaught ConnectionRefused: Connection refused (os error 111)
    at unwrapOpResult (deno:core/core.js:100:13)
    at async Object.connect (deno:runtime/js/30_net.js:199:13)
    at async <anonymous>:2:1
> 
> await Deno.connect({hostname: "<Mongo Host IP>", port: <port>})
Conn {}
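
For anyone wanting to repeat this check outside the REPL, here is a minimal sketch of the same comparison: dial the hostname, then dial each address it resolves to. The names probe, tryDial, Resolve, and Dial are made up for illustration, and the resolver and dialer are passed in as parameters so the logic doesn’t depend on any particular runtime. The commented-out wiring to Deno.resolveDns and Deno.connect at the bottom is an assumption about how it could be hooked up, not something that was run in this thread.

```typescript
// Hypothetical helper types: the real work is delegated to whatever
// resolver and dialer the caller passes in.
type RecordType = "A" | "AAAA";
type Resolve = (host: string, type: RecordType) => Promise<string[]>;
type Dial = (hostname: string, port: number) => Promise<void>;

interface ProbeResult {
  target: string; // hostname or literal IP that was dialed
  ok: boolean;
  error?: string;
}

// Attempt one connection and record the outcome instead of throwing.
async function tryDial(dial: Dial, target: string, port: number): Promise<ProbeResult> {
  try {
    await dial(target, port);
    return { target, ok: true };
  } catch (e) {
    return { target, ok: false, error: e instanceof Error ? e.message : String(e) };
  }
}

// Dial the hostname itself, then every address it resolves to, so that a
// name-only failure stands out from a general network failure.
async function probe(host: string, port: number, resolve: Resolve, dial: Dial): Promise<ProbeResult[]> {
  const results = [await tryDial(dial, host, port)];
  for (const type of ["A", "AAAA"] as const) {
    // A lookup with no records may throw or return []; treat both as "none".
    const addrs = await resolve(host, type).catch(() => [] as string[]);
    for (const addr of addrs) results.push(await tryDial(dial, addr, port));
  }
  return results;
}

// Possible wiring in Deno (an assumption; resolveDns may need --unstable
// on older releases):
// const results = await probe(
//   "<Mongo Host DNS name>", port,
//   (h, t) => Deno.resolveDns(h, t),
//   async (hostname, p) => { (await Deno.connect({ hostname, port: p })).close(); },
// );
```

If the hostname entry fails while every resolved address succeeds, that reproduces the name-only failure shown in the REPL session.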

Okay, so here’s where it gets weird.

Quick aside:
We use Cloudflare for our DNS, and in Cloudflare the <Mongo Host DNS name> is actually a CNAME to another DNS name, which is the machine name of that droplet in DO. We set it up this way because we use instances in both DO and AWS Lightsail, and I wanted to delegate the DO DNS to DO and the Lightsail DNS to Lightsail. So in our infra, machines hosted on DO get a machine name that looks like <foo>.digitalocean.<infradomain> and machines hosted on Lightsail get <bar>.lightsail.<infradomain>, where digitalocean.<infradomain> is delegated via NS to DO, and lightsail.<infradomain> is delegated via NS to Lightsail.

So, <Mongo Host DNS name> is CNAMEd to <mongo host machine>.digitalocean.<infradomain> in Cloudflare, which makes it easy for us to replace the machine without needing to reconfigure all our Mongo URIs.
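
To make that record layout concrete, here is a toy model of the chain a resolver has to walk. It does not query real DNS. Every name in it (mongo.infra.example, db1.digitalocean.infra.example) and the address are made-up placeholders, and followChain is a hypothetical helper written just for this illustration.

```typescript
// Toy zone data mirroring the layout described above. All names and the
// address are placeholders, not the real records.
type DnsRecord =
  | { type: "CNAME"; target: string }
  | { type: "A"; address: string };

const zone = new Map<string, DnsRecord>([
  // Stable alias used in the Mongo URIs (hosted at Cloudflare).
  ["mongo.infra.example", { type: "CNAME", target: "db1.digitalocean.infra.example" }],
  // Machine name inside the subdomain delegated to DigitalOcean via NS.
  ["db1.digitalocean.infra.example", { type: "A", address: "203.0.113.10" }],
]);

// Follow CNAMEs until an A record is found. Returns the names visited and
// the final address; throws on a missing record or an overly long chain.
function followChain(
  name: string,
  records: Map<string, DnsRecord>,
  maxHops = 8,
): { chain: string[]; address: string } {
  const chain: string[] = [];
  let current = name;
  for (let hop = 0; hop <= maxHops; hop++) {
    chain.push(current);
    const rec = records.get(current);
    if (rec === undefined) throw new Error(`no record for ${current}`);
    if (rec.type === "A") return { chain, address: rec.address };
    current = rec.target;
  }
  throw new Error(`CNAME chain longer than ${maxHops} hops starting at ${name}`);
}

console.log(followChain("mongo.infra.example", zone));
```

The point of the model is only that connecting by the alias depends on two lookups across a delegation boundary, while connecting by IP depends on none.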

So, I created a test record in Cloudflare, baz.<infradomain>, which is not a CNAME but an A record pointing directly at the IP, tested it in Deno, and it worked :exploding_head:

So my solution, in the meantime, is to replace the CNAME with the direct IP.

But it bugs me that I haven’t been able to figure out why there’s this weird quirk in DNS resolution, which only happens with Deno on Fly when targeting DO machines. I tested all variations of the above locally, as well as against our instances hosted in Lightsail, and DNS resolution and connections worked perfectly there. So I suspect something is weird about the way Deno resolves addresses hosted via a subdomain delegated to DigitalOcean when running on Fly.io infra, but I haven’t had time since to reproduce it or dig deeper.

Would appreciate thoughts on what could be happening, as I’m damn curious as well.

That’s bonkers. VMs just use 8.8.8.8 to resolve addresses, so I’m guessing your local Docker was using a different nameserver? That’s the only difference I can think of. It’s especially weird that Deno.resolveDns worked.
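
One way to test the nameserver theory would be to run the same lookup through the system resolver and through 8.8.8.8 and compare the answers. A rough sketch, with made-up names (compareResolvers, Lookup). The Deno wiring in the trailing comment assumes Deno.resolveDns with its nameServer option and has not been run against this setup.

```typescript
// A lookup function is injected per resolver so the comparison logic
// stays independent of the runtime.
type Lookup = (host: string) => Promise<string[]>;

// Run the same lookup through several resolvers and report each answer.
// Addresses are sorted so ordering differences don't look like disagreement.
async function compareResolvers(
  host: string,
  resolvers: Record<string, Lookup>,
): Promise<Record<string, string>> {
  const out: Record<string, string> = {};
  for (const [label, lookup] of Object.entries(resolvers)) {
    out[label] = await lookup(host).then(
      (addrs) => [...addrs].sort().join(","),
      (e) => `error: ${e instanceof Error ? e.message : String(e)}`,
    );
  }
  return out;
}

// Possible wiring in Deno (an assumption, not tested here):
// const answers = await compareResolvers("<Mongo Host DNS name>", {
//   system: (h) => Deno.resolveDns(h, "A"),
//   google: (h) => Deno.resolveDns(h, "A", { nameServer: { ipAddr: "8.8.8.8" } }),
// });
// console.log(answers);
```

If both resolvers return the same address, the nameserver difference is probably not the cause and the problem is more likely in how the connect path resolves names.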

Incidentally, if you control your Mongo servers, you might consider using WireGuard peers to connect to them from a Fly app. It’s a much better way to connect to DBs since peers are private and encrypted.