Issue with SSL Certificate for Postgres

My Steps:

  1. I created a PostgreSQL cluster on Fly.io using the following command:

fly postgres create

  2. Then, I added the SSL certificate for my domain with:

fly certs add ai-yp.fly.dev -a ai-yp

  3. To check the status of the certificate, I used:

fly certs show ai-yp.fly.dev -a ai-yp

  4. The certificate hasn’t been issued yet, and the output suggests I need to validate ownership by adding an AAAA record:

The certificate for ai-yp.fly.dev has not been issued yet.

Hostname = ai-yp.fly.dev
DNS Provider = flydns
Certificate Authority = Let’s Encrypt
Issued =
Added to App = 14 hours ago
Source = fly

You are creating a certificate for ai-yp.fly.dev
We are using lets_encrypt for this certificate.

You can validate your ownership of ai-yp.fly.dev by:

1: Adding an AAAA record to your DNS service which reads:

AAAA @ 

My Goal:

I want to connect to my PostgreSQL database from my Mac using psql with SSL encryption, so that the connection is secure. Here’s the command I plan to use for connecting:

psql "sslmode=require host=ai-yp.fly.dev port=5432 dbname=postgres user

What I’ve Tried:

I tried removing and re-adding the certificate with:

fly certs remove ai-yp.fly.dev -a ai-yp
fly certs add ai-yp.fly.dev -a ai-yp

Question:
What can I do to resolve this issue and successfully connect to PostgreSQL with SSL from my Mac?

Hi… In general, the app-name.fly.dev domains already have their certificates, and you, the user, needn’t do anything extra in that regard.

It looks like the actual problem might be that you’ve allocated an IPv4 address to the ai-yp app but (perhaps) haven’t yet adjusted its services settings.


Aside: It’s easier and safer to connect using flyctl proxy’s ad hoc WireGuard tunnel…

https://fly.io/docs/postgres/connecting/connecting-with-flyctl/

Unfortunately, the documentation doesn’t highlight this.
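
For anyone landing here later, a minimal sketch of the proxy route, assuming flyctl is installed and logged in. The local port 15432 and the helper name proxied_pg_uri are illustrative choices, not anything the docs mandate; the user and database names are the defaults mentioned earlier in the thread.

```shell
#!/bin/sh
# Sketch: connect through a flyctl WireGuard proxy instead of a public IP.
# LOCAL_PORT and proxied_pg_uri are illustrative assumptions.

LOCAL_PORT=15432

# Build the URI that psql should use while the proxy is running.
proxied_pg_uri() {
  # $1 = database user, $2 = database name
  printf 'postgres://%s@localhost:%s/%s\n' "$1" "$LOCAL_PORT" "$2"
}

# Terminal 1 (leave running): forward localhost:15432 to port 5432 of the app.
#   fly proxy 15432:5432 -a ai-yp
#
# Terminal 2: connect; psql prompts for the password.
#   psql "$(proxied_pg_uri postgres postgres)"

proxied_pg_uri postgres postgres
```

The tunnel itself is encrypted, so sslmode=require is not needed on this path.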


I configured and deployed the server settings, but I still cannot connect. I think it is configured correctly.
My fly.toml:

app = 'ai-yp'
primary_region = 'ams'

[env]
  FLY_SCALE_TO_ZERO = '2h'
  PRIMARY_REGION = 'ams'

[[mounts]]
  source = 'pg_data'
  destination = '/data'

[[services]]
  protocol = 'tcp'
  internal_port = 5432
  auto_start_machines = true

  [[services.ports]]
    port = 5432
    handlers = ['pg_tls']

  [services.concurrency]
    type = 'connections'
    hard_limit = 1000
    soft_limit = 1000

[[services]]
  protocol = 'tcp'
  internal_port = 5433
  auto_start_machines = true

  [[services.ports]]
    port = 5433
    handlers = ['pg_tls']

  [services.concurrency]
    type = 'connections'
    hard_limit = 1000
    soft_limit = 1000

[checks]
  [checks.pg]
    port = 5500
    type = 'http'
    interval = '15s'
    timeout = '10s'
    path = '/flycheck/pg'

  [checks.role]
    port = 5500
    type = 'http'
    interval = '15s'
    timeout = '10s'
    path = '/flycheck/role'

  [checks.vm]
    port = 5500
    type = 'http'
    interval = '15s'
    timeout = '10s'
    path = '/flycheck/vm'

[[metrics]]
  port = 9187
  path = '/metrics'

Speaking of flyctl proxy, I cannot use it because I need to share access with my colleagues, who prefer to use traditional methods.

Hm… I can see that you have an IPv4 address for that app—but can’t tell from here whether it’s dedicated or shared.

(It looks like pg_tls still does require a dedicated address.)


It might also be worth turning off FLY_SCALE_TO_ZERO temporarily, just to remove that from the list of possible complications.

It is dedicated

➜  fly fly ips list --app ai-yp
VERSION	IP                	TYPE                     	REGION	CREATED AT        
v4     	149.248.220.49    	public (dedicated, $2/mo)	global	Oct 14 2024 13:12	
v6     	fdaa:a:7ffa:0:1::2	private                  	global	Oct 14 2024 12:55
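
A rough sketch of how the dedicated-address check could be scripted against a table in the layout shown above. has_dedicated_v4 is a hypothetical helper, and it assumes the first column is the IP version and that the TYPE column contains the word "dedicated".

```shell
#!/bin/sh
# Sketch: succeed only if an `ips list` table contains a dedicated IPv4 row.
# Assumes the column layout shown above; has_dedicated_v4 is a hypothetical helper.

has_dedicated_v4() {
  # Reads the table on stdin. A row qualifies when its first field is "v4"
  # and the line mentions "dedicated"; the exit status is 0 if one was found.
  awk '$1 == "v4" && /dedicated/ { found = 1 } END { exit !found }'
}

# Usage (requires flyctl and access to the app):
#   fly ips list --app ai-yp | has_dedicated_v4 && echo "dedicated IPv4 present"
```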

Deleting FLY_SCALE_TO_ZERO also did not help for me.

Could I ask you to attempt to connect to my server with SSL? Maybe there is a problem with my network…

Certainly!

$ psql 'postgres://fly_forum_reader:dummy_password@ai-yp.fly.dev:5432/postgres?sslmode=require'
psql: error: SSL SYSCALL error: EOF detected

These EOFs during early SSL are indicative of glitches at the edge proxy that Fly maintains.

If the machine is definitely running and you’re not seeing any problems reported in the logs (fly logs -a ai-yp), then I would try temporarily creating a completely new database app. Sometimes the Fly.io infrastructure gets stale metadata, and it’s difficult for us ordinary users to diagnose when that’s happened…

Sorry you’re having so much trouble with this!

I’ve tried to create a new database with another location, same problem…

I have some information on my dashboard; can it help to determine the problem?


You can probably delete these certificates. At best, they’re doing nothing, and, at worst, they’re confusing the edge proxy…

Your other database app didn’t have custom certificates such as these listed, right? I’m really surprised that attempt failed…


The ams and arn regions have been having sporadic problems lately, now that I think about it. Where did you try the newest one?

Also, are you connecting through ams? You can see this in ~/.fly/config.yml, under wireguard_state. (Don’t post things from in there, incidentally, since it has encryption details.)

Deleted the certificate, but nothing has changed. My temporary server was in the ARN region; today I tried WAW and DEN, but the same problem persists.

Sorry that didn’t pan out… Several other users have been reporting difficulty connecting to their Postgres databases suddenly, and the official status page is reporting “degraded performance” for at least one aspect of the .fly.dev domains, so this may be a widespread, transient metadata glitch:

https://community.fly.io/t/i-just-switched-the-region-for-my-database-and-my-app-and-now-my-app-cant-connect-to-the-db/22303

https://community.fly.io/t/postgres-instance-refusing-connection-to-fly-app-flask/22297

https://community.fly.io/t/unable-to-reach-postgres-instance/22295


Edit: Three new ones that might also be broadly related, at least as far as plausibly being metadata discrepancies…

https://community.fly.io/t/machine-starts-and-stops-right-after/22307

https://community.fly.io/t/static-egress-ips-for-machines/22004/20

https://community.fly.io/t/machine-not-starting-automatically-when-receiving-requests/22308


Edit2: There’s a more specific, official status incident now…

https://status.flyio.net/incidents/ftd07gnytjl4

We are investigating increased proxy errors for apps communicating over Flycast internal networking

That’s a little different from the feature that you were trying, but it does involve closely related code, as I understand it. It’d probably be worth making another attempt once they’ve marked it fully over…


Looks like all the problems have been resolved, but my SSL SYSCALL error is still with me. Could I ask you to deploy a server like this one to test it?

Hm… I tried an abbreviated version with a throwaway database and IPv6—and unexpectedly also saw errors:

$ fly pg create --name minoan-saffron --initial-cluster-size 1 \
  --region ams --volume-size 1 --vm-size shared-cpu-1x
$ fly ips allocate-v6 -a minoan-saffron
$ fly ssh console -a different-app
# psql 'postgres://postgres:<right-password>@minoan-saffron.fly.dev:5432/?sslmode=require'
psql: error: SSL SYSCALL error: EOF detected
# psql 'postgres://postgres:<intentionally-wrong-password>@minoan-saffron.fly.dev:5432/?sslmode=disable'
psql: error: FATAL:  password authentication failed for user "postgres"
# # ...and the pg machine's logs *do* show 'password
# #  authentication failed for user "postgres"'.
# exit
$ fly config show -a minoan-saffron
.
.
.
  "services": [
    {
      "protocol": "tcp",
      "internal_port": 5432,
      "auto_start_machines": false,
      "ports": [
        {
          "port": 5432,
          "handlers": [
            "pg_tls"
          ]
        }
      ],
.
.
.
$ fly services list -a minoan-saffron
Services
PROTOCOL PORTS        HANDLERS FORCE HTTPS PROCESS GROUP REGIONS MACHINES 
TCP      5432 => 5432 [PG_TLS] False                     ams     1       
TCP      5433 => 5433 [PG_TLS] False                     ams     1 

This admittedly isn’t the verbatim procedure given in the docs, but the PG app does already have the pg_tls services, at the right ports.

Possibly this is the Fly edge proxy not liking the “hairpin” aspect… I don’t recall that being mentioned with anything other than UDP before, though…

Hi @mayailurus,

It’s not the “hairpinning” aspect that the proxy doesn’t like, but rather the fact that some older versions of psql do not send SNI to the TLS server, which is required by the proxy. AFAIK SNI support should be enabled by default for psql clients version 14 and later, but if not, you may try the sslsni connection option on the client side.
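
A quick way to check which side of that line a client falls on is to look at the major version in the psql --version banner. The sketch below assumes the usual banner format ("psql (PostgreSQL) 15.8 ..."); client_sends_sni is a hypothetical helper, not part of psql or flyctl.

```shell
#!/bin/sh
# Sketch: decide from a `psql --version` banner whether the client is new
# enough (major version 14 or later) to send SNI by default.
# client_sends_sni is a hypothetical helper.

client_sends_sni() {
  # $1 = a banner such as "psql (PostgreSQL) 15.8 (Debian 15.8-0+deb12u1)"
  major=$(printf '%s\n' "$1" | sed -n 's/^psql (PostgreSQL) \([0-9][0-9]*\).*/\1/p')
  # Fail if no version could be parsed, or if it is older than 14.
  [ -n "$major" ] && [ "$major" -ge 14 ]
}

# Usage:
#   if client_sends_sni "$(psql --version)"; then
#     echo "client should send SNI by default"
#   else
#     echo "psql is older than 14; upgrade the client"
#   fi
```

On clients that pass this check, SNI can also be requested explicitly by adding sslsni=1 next to sslmode=require in the connection string.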


Thanks for chiming in! That’s an excellent find… I was stubbornly trying to connect from Debian Bullseye, which is only PG v13.

Using Bookworm instead did indeed fix it:

# psql 'postgres://postgres:<right-password>@minoan-saffron.fly.dev:5432/?sslmode=require'
psql (15.8 (Debian 15.8-0+deb12u1), server 16.4 (Ubuntu 16.4-1.pgdg24.04+1))
WARNING: psql major version 15, server major version 16.
         Some psql features might not work.
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)
Type "help" for help.

postgres=# \conninfo
You are connected to database "postgres" as user "postgres"
on host "minoan-saffron.fly.dev" (address "2a09:8280:1::12:abcd:0") at port "5432".
SSL connection (protocol: TLSv1.3, cipher: TLS_AES_256_GCM_SHA384, compression: off)

It doesn’t look like that sslsni knob is available on v13, but this definitely does solve the main mystery.
