External DB nxdomain

Hello,

I switched my Elixir app’s database from Fly’s PostgreSQL to an external one. It works from my Mac, but it looks like Fly cannot resolve the hostname. Any suggestions on how to handle it?

 Postgrex.Protocol (#PID<0.162.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (lin-8908-1914-pgsql-primary.servers.linodedb.net:5432): non-existing domain - :nxdomain

I tried the same config that is in the Fly release from my Mac, and it works without any issue. It also worked fine with Fly’s PostgreSQL.

Hi @Kriska,

Is this error happening during migrations? Or is this on application startup?

I’m not sure why it wouldn’t be able to resolve that domain. :thinking:

Both: during migrations and also on app startup. It’s pretty weird. Any suggestions? If not, we will have to leave Fly :frowning:

Can you tell us which app this is? Either here or an email to support@.

cookienovo/cookienovo is the org/app. The main motivation for a DB service outside of fly.io was another application that has to connect to the same DB. But it looks like it’s possible to set up a VPN with WireGuard so that an app outside of Fly can use it to connect to the DB. If you think that’s a good way to do it, I will remove the Linode DB. But I suggest keeping it alive until you figure out where the problem with resolving the hostname is.

Also, I have had problems with your DB for the last couple of days. I wrote an email to support too. The DB cannot start, and our app sees these errors:

	ERROR	cmd/keeper.go:719	cannot get configured pg parameters	{"error": "dial unix /tmp/.s.PGSQL.5433: connect: no such file or directory"}

and

Postgrex.Protocol (#PID<0.2161.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.cookienovo-db.internal:5432): non-existing domain - :nxdomain

Hi @Kriska,

This sounds like what happens when an app is not configured for IPv6 connections to the DB. Originally, I doubted that because the DB in question was outside of Fly.

With recently generated Phoenix apps, the configuration is there by default. If the app was generated with an older version of Phoenix then some config changes may be needed.

Fly’s internal networks are IPv6. Elixir/OTP needs some config help connecting there.

For instance, at the end of the Dockerfile, flyctl adds this config during fly launch.

# Appended by flyctl
ENV ECTO_IPV6 true
ENV ERL_AFLAGS "-proto_dist inet6_tcp"

Then in runtime.exs, some config similar to this should exist…

# ...
if config_env() == :prod do
  # ...

  maybe_ipv6 = if System.get_env("ECTO_IPV6"), do: [:inet6], else: []

  config :my_app, MyApp.Repo.Local,
    # ssl: true,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
    socket_options: maybe_ipv6
#...

It’s the IPv6 config I’m referring to.
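To see why the address family matters here, one option is to resolve the DB hostname once per family from an IEx session on the affected machine. This is only a sketch: `ResolveCheck` is a hypothetical helper name, and it assumes the app’s connection behaves like Erlang’s `:inet.getaddr/2` for the chosen family. If the `:inet` lookup succeeds while the `:inet6` lookup returns an error such as `{:error, :nxdomain}`, the host probably has no IPv6 address, and a connection forced to `:inet6` would be expected to fail the same way.

```elixir
# Hypothetical helper for illustration only; not part of Phoenix, Ecto or flyctl.
defmodule ResolveCheck do
  # Resolve `host` once per address family and return both raw results,
  # each being {:ok, ip_tuple} or {:error, reason}.
  def families(host) when is_binary(host) do
    charlist = String.to_charlist(host)

    %{
      inet: :inet.getaddr(charlist, :inet),
      inet6: :inet.getaddr(charlist, :inet6)
    }
  end
end

# Example call, using the hostname from the error message in this thread:
#   ResolveCheck.families("lin-8908-1914-pgsql-primary.servers.linodedb.net")
```

Running the same call against a `.internal` hostname should show the opposite picture, since Fly’s internal addresses are IPv6.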

Also, you can try generating a new Phoenix app and deploying that to Fly with a DB. Assuming that all works, you can destroy the app, but then you’ll have a working local example to compare with your project.

Hi,
It’s there. The app worked with the Fly DB over IPv6. I only changed the DB host, and it stopped working.

Then, when I moved back to Fly’s DB, it started working again. Again, I changed only the host (and the password).

But when I restarted the app at the weekend, it stopped working even with Fly’s DB, and I figured out that Fly’s DB was crashing. Nothing helped, so I lost data and had to create a new DB, which works.

I also had trouble with the builder machine at the weekend: there were timeouts and I could not build a release.

So I’m 100% sure there are some problems with networking in Fly based on these issues.

There are no issues with our network.

Is your external DB listening on IPv4 or IPv6? Internal Fly DBs listen on IPv6, so we configure Ecto apps with socket_options: [:inet6].

This will not allow your app to connect to databases over IPv4, though. You will need to remove the socket options line in runtime.exs. You may have to remove the other settings @Mark referenced, too.
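As a rough sketch of what that could look like, the fragment below keeps the IPv6 option behind the `ECTO_IPV6` variable instead of deleting the line, so the same release can point at either kind of database. `:my_app` and `MyApp.Repo` are placeholder names, and the `DATABASE_URL` handling is assumed rather than taken from the app in question.

```elixir
# config/runtime.exs (sketch; :my_app and MyApp.Repo are placeholder names)
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing"

  # Only an explicit "true" or "1" enables IPv6. A plain truthiness check
  # would also enable it for ECTO_IPV6=false, because any non-empty string
  # is truthy in Elixir.
  maybe_ipv6 = if System.get_env("ECTO_IPV6") in ~w(true 1), do: [:inet6], else: []

  config :my_app, MyApp.Repo,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10"),
    # [] leaves the address family at the default, which suits an IPv4-only host.
    socket_options: maybe_ipv6
end
```

With this in place, an IPv4-only external DB would also need the `ENV ECTO_IPV6 true` line removed from the Dockerfile (or the variable left unset), while a Fly-internal DB keeps it.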

Hi, yes, I did. I removed the IPv6 configs. And as I wrote, when I put the IPv6 configs back, it was not possible to connect even to your DB, which later crashed too, as you can see from the error in the Postgres logs. It’s the same thing I wrote in my support email on Monday.

Your DB on Fly ran out of disk space. If you run fly checks list -a cookienovo-db it will tell you why it’s in an unhealthy state. You will need to expand the volume to make your DB accessible again.

Are you still trying to connect to that Linode DB? I’m very confused about what current problems you’re facing.

Sorry, I didn’t know about the fly checks command. I expanded the disk and it helped. But Postgres is still not very healthy: I get a lot of connection errors from the VPS connected via WireGuard.

I stopped trying when it still didn’t work after 1-2 weeks. Our current setup is:

  1. db (scale 2) in fly
  2. web app in fly
  3. oban worker in linode connected via wireguard

When I check the status, I see one critical issue with the leader,

but when I run the checks list, I see this:

(Sorry for the poor screenshots.)

I’m not sure what these mean:
HTTP GET http://172.19.27.242:5500/flycheck/vm: 500 Internal Server Error Output: [✓]
It’s marked as passed, but the output says 500 Internal Server Error,
and

checkDisk: 37.22 GB (75.7%!)(MISSING) free space on /data/ (206.43µs)[✗]

What does that “MISSING” mean, please? The replica is OK; only the leader has these issues.

Is it because of CPU? I’m going to try increasing it and see.

Also, I forgot: when I scaled up to more CPU, the upgrade didn’t finish, and the replica is still on the old version. I had to scale to 0 replicas and start them again so that both “replicas” are on the same version. See the screenshot, which is from just now; maybe it will change after some time, but this behaviour has happened multiple times.