Failed to connect to database cluster (non-existing domain)

I’m trying out Fly’s managed DB, but I can’t get my Phoenix application to connect to it. I tried following the clustered live counter example (elixir-hiring-project/releases.exs at main · superfly/elixir-hiring-project · GitHub), but it doesn’t seem to include any DB-related configuration.

Error message:

[error] Postgrex.Protocol (#PID<0.379.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (app-postgres.internal:5432): non-existing domain - :nxdomain

I’m certain that DATABASE_URL was set properly, since I managed to log it. The database was created with flyctl postgres create -a app_name --postgres-app postgres_name. I also tried using the postgres user and password with the database name appended to the Postgres URL. Nothing works. This leads me to believe the problem may be related to how the DB cluster was set up, but I’m not sure.

The app instance and the DB cluster are in the same organization, but in different regions, because I ran into an error saying I’m only allowed to run one app per region.

# config/runtime.exs
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise """
      environment variable DATABASE_URL is missing.
      For example: ecto://USER:PASS@HOST/DATABASE
      """

  secret_key_base =
    System.get_env("SECRET_KEY_BASE") ||
      raise """
      environment variable SECRET_KEY_BASE is missing.
      You can generate one by calling: mix phx.gen.secret
      """

  config :app, App.Repo,
    ssl: true,
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

  config :app, AppWeb.Endpoint,
    server: true,
    http: [
      port: String.to_integer(System.get_env("PORT") || "4000"),
      transport_options: [socket_opts: [:inet6]]
    ],
    secret_key_base: secret_key_base
end

# config/prod.exs
import Config

config :app, AppWeb.Endpoint,
  load_from_system_env: true,
  http: [port: {:system, "PORT"}],
  url: [host: "***.fly.dev", port: 443], # Intentionally omitted
  force_ssl: [rewrite_on: [:x_forwarded_proto]],
  cache_static_manifest: "priv/static/cache_manifest.json"

Any pointers would be great! I’ve been stuck with this for 4 hours. :frowning:

Hello! We’re working on a guide for this. In the meantime, you need to make your repo config something like this:

  config :app, App.Repo,
    ssl: false,
    socket_options: [:inet6],
    url: database_url,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

This disables Postgres SSL (which isn’t necessary on Fly, because the network is encrypted) and enables IPv6. Can you see if that helps?
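A variant of the same repo config, sketched for config/runtime.exs. It keeps the :app / App.Repo names from this thread and reads DATABASE_URL at boot, but makes IPv6 opt-in through an ECTO_IPV6 environment variable. ECTO_IPV6 is an assumption here (newer Phoenix generators use a similar flag, so check your own runtime.exs). Only the presence of the variable is checked, so any value, even "false", turns IPv6 on.

```elixir
# config/runtime.exs (sketch)
import Config

if config_env() == :prod do
  database_url =
    System.get_env("DATABASE_URL") ||
      raise "environment variable DATABASE_URL is missing"

  # Assumed opt-in flag: when ECTO_IPV6 is set (to anything), Postgrex opens
  # IPv6 sockets, which the .internal hostnames discussed here need.
  maybe_ipv6 = if System.get_env("ECTO_IPV6"), do: [:inet6], else: []

  config :app, App.Repo,
    # SSL off, as suggested above, because Fly's network is already encrypted.
    ssl: false,
    url: database_url,
    socket_options: maybe_ipv6,
    pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")
end
```

Keeping this in runtime.exs rather than prod.exs means the values are read when the release boots, not when the image is built.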

@kurt Thank you for the prompt response. I will give this a shot. As for the restriction of only having one app per region, is this intended or a bug?

Oh, I missed that question! There’s no intentional restriction like that. Are you getting an error when you create a new app or when you attempt to deploy one?

It’s alright. I added that in an edit because I thought it might have been a factor.

So I have one app instance in nrt (Japan). I’m trying to spin up a db in that region as well. I tried it just now and this is what I got:

> fly postgres create
# Fill out info for cluster; nrt region

Launching... Error Can't create volume, application is already using 1 of 1 zones in nrt

Choosing a different region works fine though.

Oh, I see. That just means our provisioner thinks our other disk array in Tokyo is full. I’m pretty sure it’s not, so that’s definitely a bug. Give us a bit to troubleshoot.

Found and fixed. Will you give it another try? Sorry about that.

@kurt Thanks for taking the time to do this! I’ve tried the changes you mentioned previously, and they work.

I’m confused about how I should run migrations, though. I have a Procfile in rel/overlays/Procfile with the contents:

release: web: /app/bin/app eval "App.Release.migrate" && /app/bin/app start

I deployed, but it doesn’t seem to be invoked. I tried the SSH method to work around this, but I ran into this issue: IPv4 application internal network - #12 by thomas. Unfortunately, I couldn’t follow the last comment because I’m unfamiliar with WireGuard.

Edit: I messed up the release: web: prefix. I’ll deploy again.

We have experimental support for migrations! We’ve been getting the plumbing in place to make Phoenix apps work really well on Fly for the last month. Try adding this to your fly.toml:

[deploy]
release_command = "/app/bin/app eval 'App.Release.migrate'"

We don’t do anything with the release process in a Procfile, but that config will run migrations on deploy.
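The release_command above assumes that the release contains an App.Release.migrate/0 function, because Mix isn't available inside a built release. Below is a minimal sketch of such a module, modeled on the release-module pattern from the Phoenix deployment guides. The :app OTP app name and the App.Release module name come from this thread, so adjust them to your project.

```elixir
defmodule App.Release do
  @moduledoc """
  Tasks run from a built release, where Mix is not available, e.g.
  /app/bin/app eval 'App.Release.migrate'
  """
  @app :app

  # Runs all pending migrations for every repo listed under :ecto_repos.
  def migrate do
    load_app()

    for repo <- repos() do
      {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :up, all: true))
    end
  end

  # Rolls a single repo back down to the given migration version.
  def rollback(repo, version) do
    load_app()
    {:ok, _, _} = Ecto.Migrator.with_repo(repo, &Ecto.Migrator.run(&1, :down, to: version))
  end

  defp repos do
    Application.fetch_env!(@app, :ecto_repos)
  end

  # Loads (but does not start) the application so its config is available.
  defp load_app do
    Application.load(@app)
  end
end
```

Ecto.Migrator.with_repo/2 starts only the repo, runs the given function, and stops the repo again. That is why the eval can run migrations without booting the whole web application.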

I am getting the same error:

20:11:27.392 [error] Postgrex.Protocol (#PID<0.136.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (mkrandio-db.internal:5432): non-existing domain - :nxdomain

The database appears to be running:

mkrandio                    	personal	error
mkrandio-db                 	personal	running  	19h10m ago

This is a vanilla app created with Elixir 1.12.1 and Phoenix 1.6.0, following the instructions from the docs: fly launch, answer the prompts for creating the DB cluster, and fly deploy. I also turned off SSL as mentioned above, but I get the same error. IPv6 is also enabled.

Any ideas?

I’m having the same issue on a newly created fly.io Phoenix app. I’m simply allowing the app to read the DATABASE_URL secret set by fly.io, and I’ve set ssl: false and socket_options: [:inet6] on the Ecto repo config.

Elixir 1.12.3
Phoenix 1.6

Is this common? Is there a known fix or steps to debug?

What error are you getting? If it looks like this, it probably means the :inet6 option isn’t taking effect:

non-existing domain - :nxdomain

Is this Phoenix 1.6 but not 1.6.3? For versions below 1.6.3, you may need to set some environment variables: Deploy an Elixir Phoenix Application (pre v1.6.3)

Sorry, here’s a more complete description.

This is a freshly launched app that I have been developing with docker-compose with no problems. I haven’t modified the app’s fly.io configuration in any way except for adding some required env vars like HOSTNAME to fly.toml and required secrets like SECRET_KEY_BASE. I’m relying on the DATABASE_URL set by fly.io on fly db creation.

fly logs on the app shows:

ewr [info]01:32:19.864 [error] Postgrex.Protocol (#PID<0.1752.0>) failed to connect: ** (DBConnection.ConnectionError) tcp connect (top2.nearest.of.myapp-db.internal:5432): non-existing domain - :nxdomain

Package versions should be pretty standard and up-to-date:

elixir 1.12.3
phoenix 1.6.6
db_connection 2.4.0
ecto 3.7.1
ecto_sql 3.7.0
phoenix_ecto 4.4.0
postgrex 0.15.11

Phoenix repo config (prod.exs):

config :guilder, MyApp.Repo,
  ssl: false,
  socket_options: [:inet6]

Phoenix DB config (releases.exs):

db_url = System.fetch_env!("DATABASE_URL")

config :short_stuff, MyApp.Repo,
  url: db_url,
  pool_size: String.to_integer(System.get_env("POOL_SIZE") || "10")

(The Dockerfile does set ENV MIX_ENV=prod before RUN mix do compile, release. Setting ssl and socket_options in releases.exs does not change the result.)

One hypothesis I have is that either DATABASE_URL is not set in the build environment (the image is built in a remote fly.io environment rather than locally, right?), or that MIX_ENV is set in the build environment but not in the runtime environment.

I have attempted to connect to the DB with psql from the fly ssh console shell, but psql is not installed in that shell environment.

What are some other causes of this failure mode, or steps to debug?
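One way to check what the running release actually uses, rather than what the config files say, is to open a remote console with fly ssh console followed by /app/bin/app remote (the binary path follows this thread's /app/bin/app). Then fetch the repo config with Application.get_env/2 and inspect it. The DbCheck module below is a hypothetical helper, not part of Ecto or flyctl. It only examines the keyword list and does not touch the network. Because the repo is configured with url:, the raw config holds :url rather than :hostname, so the helper extracts the host from the URL.

```elixir
defmodule DbCheck do
  @moduledoc """
  Hypothetical helper for eyeballing an Ecto repo config keyword list,
  e.g. the result of Application.get_env(:app, App.Repo).
  """

  # Summarises the settings relevant to the :nxdomain error.
  def summarize(repo_config) when is_list(repo_config) do
    %{
      host: host(repo_config),
      ssl: Keyword.get(repo_config, :ssl, false),
      ipv6?: :inet6 in Keyword.get(repo_config, :socket_options, [])
    }
  end

  # Prefer an explicit :hostname; otherwise take the host out of :url.
  # Only the host is returned, never the password embedded in the URL.
  defp host(repo_config) do
    case Keyword.fetch(repo_config, :hostname) do
      {:ok, hostname} ->
        hostname

      :error ->
        case Keyword.get(repo_config, :url) do
          nil -> nil
          url -> URI.parse(url).host
        end
    end
  end
end
```

In the remote console, call DbCheck.summarize(Application.get_env(:app, App.Repo)). If it reports an .internal host with ipv6?: false, the :inet6 socket option is not reaching the repo at runtime.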

Oh yes, your hypothesis is correct. There’s no DATABASE_URL in the build environment, but the error sure makes it look like you’re getting a DATABASE_URL at boot time.

You may need to set socket_options in the releases.exs.

You can run fly pg connect -a <db-name> to get in with psql.

Yeah, it looks like you’ll have to move the MyApp.Repo config to runtime.exs. AFAIK, there’s no need to set MIX_ENV once the release has been built, so it’s really only useful in the build environment, but I could be misremembering.

Reading DATABASE_URL at build time in releases.exs does seem to have been a problem, but fixing it did not resolve the issue.

Consuming DATABASE_URL in runtime.exs instead of releases.exs still results in Postgrex throwing :nxdomain errors on deployment. I can confirm through logging that the DATABASE_URL env var is being found in runtime.exs and set on my Ecto repo config along with socket_options and ssl settings.

I’ll continue to debug with fly pg and update if I find a root cause.

Can you confirm you’ve done this?

Specifically, -proto_dist inet6_tcp might be necessary to allow AAAA record lookups. That error means it can’t find the IPv6 addresses for the host. :nxdomain almost always means that IPv6 isn’t fully enabled in the runtime and in Ecto.
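To see the IPv4/IPv6 difference directly, you can resolve the database host from the same remote console with Erlang's :inet.getaddr/2, once per address family. This is a minimal sketch. The module name is hypothetical, and the expectation that .internal names only have IPv6 (AAAA) records is inferred from the error described in this thread. Suppose the :inet lookup returns {:error, :nxdomain} while the :inet6 lookup returns {:ok, address}. In that case DNS is working, and the remaining problem is that Postgrex isn't being told to use IPv6.

```elixir
defmodule DnsCheck do
  # Hypothetical helper: resolve one host over IPv4 and IPv6 separately.
  # :inet.getaddr/2 expects a charlist host name, hence String.to_charlist/1.
  def lookup(host) when is_binary(host) do
    name = String.to_charlist(host)

    for family <- [:inet, :inet6], into: %{} do
      {family, :inet.getaddr(name, family)}
    end
  end
end
```

For example, DnsCheck.lookup("app-postgres.internal") returns a map with one {:ok, address} or {:error, reason} result under each of the :inet and :inet6 keys.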

I have the same problem. Have you figured it out?

I have spent a day trying to get my app to run on fly.io, and it hasn’t been easy with all these small issues.

I’m running into the same issue when trying to run self-hosted Plausible on fly.io. I tried the new Plausible release candidate v2.1.0-rc.0, which supports setting socket_options, and I also tried disabling SSL. It still doesn’t work.

Apparently there was a bug related to this, but it doesn’t seem to have been fixed. Any progress?