App suddenly can't connect to postgres DB?

Hello everyone,
I’ve had an app and a Postgres database running for about two years now, completely fine, no issues.
Today I noticed that my app can’t connect to the database anymore (it’s a Spring Boot app and connects to the Postgres database via the JDBC driver) and is thus stuck in a boot loop.

However, I noticed that on my Postgres instance a new volume has been created and the old one is pending deletion?? It sounds like some kind of automatic migration happened, which is now keeping my app from accessing the database.

If I proxy the database to my local PC, I can connect to it just fine and can read the data.
When I SSH into the app that connects to the DB, it resolves the database’s IPv6 address via the internal Fly.io domain just fine as well:

root@6e82926b022e98:/# getent hosts prognosticari-db.internal
fdaa:0:d5ef:a7b:caca:bf5f:adb7:2 prognosticari-db.internal

But the app still throws connection timeouts:

Caused by: java.net.SocketTimeoutException: connect timed out

Has anyone had anything similar happen, or does anyone know how I might fix this?

Thanks in advance

Hi… You should be able to detect a migration via fly m status. Those do happen automatically, sometimes.

It might help to post your JDBC connection string, since people occasionally resort to desperate measures with those, :sweat_smile:.

(Be sure to * out the password, of course, but show the full structure—particularly if you’re using an IPv6 numeric literal (fdaa::*).)
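
For reference, using the names already in this thread as placeholders, the two shapes would look roughly like this (the pgjdbc driver expects an IPv6 literal in square brackets):

jdbc:postgresql://prognosticari-db.internal:5432/postgres
jdbc:postgresql://[fdaa:0:d5ef:a7b:caca:bf5f:adb7:2]:5432/postgres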

Here’s the status of the machine that runs the DB:

Machine status

Machine ID: 3d8d3e1ce02368
Instance ID: 01K2QDQTXQNC0M53TPDD99R8VY
State: started
HostStatus: ok

VM
ID = 3d8d3e1ce02368
Instance ID = 01K2QDQTXQNC0M53TPDD99R8VY
State = started
Image = flyio/postgres:14.4 (v0.0.32)
Name = floral-sunset-7365
Private IP = fdaa:0:d5ef:a7b:caca:bf5f:adb7:2
Region = fra
Process Group =
CPU Kind = shared
vCPUs = 1
Memory = 256
Created = 2025-08-15T17:51:45Z
Updated = 2025-08-16T11:40:32Z
Entrypoint =
Command =
Volume = vol_vx2w1yx5o28oxejr

Checks [3/3]
NAME STATUS LAST UPDATED OUTPUT
pg passing 4h27m ago [✓] transactions: read/write (155.23µs)
[✓] connections: 11 used, 3 reserved, 300 max (2.84ms)
vm passing 4h26m ago [✓] checkDisk: 814.97 MB (82.7%) free space on /data/ (49.99µs)
[✓] checkLoad: load averages: 0.02 0.07 0.17 (44.81µs)
[✓] memory: system spent 174ms of the last 60s waiting on memory (37.83µs)
[✓] cpu: system spent 816ms of the last 60s waiting on cpu (22.17µs)
[✓] io: system spent 2.12s of the last 60s waiting on io (18.55µs)
role passing 4h27m ago leader

Event Logs
STATE EVENT SOURCE TIMESTAMP INFO
started start flyd 2025-08-16T13:40:32.72+02:00
starting restart flyd 2025-08-16T13:40:31.858+02:00
stopped exit flyd 2025-08-16T13:40:31.749+02:00 exit_code=0,oom_killed=false,requested_stop=true
stopping restart user 2025-08-16T13:40:30.814+02:00
started start flyd 2025-08-16T08:58:12.006+02:00

However, I don’t see anything regarding a migration - although, looking at the created/updated dates:
Created = 2025-08-15T17:51:45Z
Updated = 2025-08-16T11:40:32Z

it seems odd, as I’ve had this same instance up for over two years, but it looks like the machine was recreated yesterday by itself.

Looking at the logs of the machine from yesterday:

Machine logs

2025-08-15 19:51:58.568
cluster spec filename /fly/cluster-spec.json

2025-08-15 19:51:58.530
Machine created and started in 13.139s
2025-08-15 19:51:58.466
INFO [fly api proxy] listening at /.fly/api
2025-08-15 19:51:58.465
INFO Preparing to run: docker-entrypoint.sh start as root
2025-08-15 19:51:58.459
INFO Resized /data to 1069547520 bytes
2025-08-15 19:51:58.456
INFO Mounting /dev/vdc at /data w/ uid: 0, gid: 0 and chmod 0755
2025-08-15 19:51:58.456
/dev/vdc: clean, 1356/65280 files, 35334/261120 blocks
2025-08-15 19:51:58.454
INFO Checking filesystem on /data
2025-08-15 19:51:58.388
INFO Starting init (commit: 6c3309ba)…
2025-08-15 19:51:57.718
2025-08-15T17:51:57.718622637 [01K2QDQTXQNC0M53TPDD99R8VY:main] Running Firecracker v1.7.0
2025-08-15 19:51:57.650
Configuring firecracker
2025-08-15 19:51:57.026
Successfully prepared image registry-1.docker.io/flyio/postgres@sha256:9daaa15119742e5777f5480ef476024e8827016718b5b020ef33a5fb084b60e8 (11.625957647s)
2025-08-15 19:51:45.400
Pulling container image registry-1.docker.io/flyio/postgres@sha256:9daaa15119742e5777f5480ef476024e8827016718b5b020ef33a5fb084b60e8
2025-08-15 19:51:27.489
[27519659.762939] reboot: Restarting system
2025-08-15 19:51:26.479
Umounting /dev/vdb from /data
2025-08-15 19:51:26.479
Starting clean up.
2025-08-15 19:51:25.913
keeper | Process exited 0
2025-08-15 19:51:25.906
keeper | server stopped
2025-08-15 19:51:25.906
keeper | done
2025-08-15 19:51:25.852
keeper | 2025-08-15 17:51:25.852 UTC [596] LOG: database system is shut down
2025-08-15 19:51:25.845
proxy | exit status 130
2025-08-15 19:51:25.838
sentinel | Process exited 0
2025-08-15 19:51:25.828
keeper | 2025-08-15 17:51:25.828 UTC [599] LOG: shutting down
2025-08-15 19:51:25.828
keeper | 2025-08-15 17:51:25.827 UTC [596] LOG: background worker "logical replication launcher" (PID 605) exited with exit code 1
2025-08-15 19:51:25.824
keeper | waiting for server to shut down…2025-08-15 17:51:25.822 UTC [596] LOG: aborting any active transactions
2025-08-15 19:51:25.822
proxy | [WARNING] 226/175125 (538) : All workers exited. Exiting… (130)
2025-08-15 19:51:25.822
proxy | [ALERT] 226/175125 (538) : Current worker #1 (569) exited with code 130 (Interrupt)
2025-08-15 19:51:25.816
proxy | [NOTICE] 226/175125 (538) : path to executable is /usr/sbin/haproxy
2025-08-15 19:51:25.816
proxy | [NOTICE] 226/175125 (538) : haproxy version is 2.2.9-2+deb11u3
2025-08-15 19:51:25.816
exporter | signal: interrupt
2025-08-15 19:51:25.805
keeper | 2025-08-15 17:51:25.788 UTC [596] LOG: received fast shutdown request
2025-08-15 19:51:25.805
proxy | Stopping interrupt…
2025-08-15 19:51:25.805
sentinel | Stopping interrupt…
2025-08-15 19:51:25.805
keeper | Stopping interrupt…
2025-08-15 19:51:25.805
exporter | Stopping interrupt…
2025-08-15 19:51:25.805
supervisor stopping
2025-08-15 19:51:25.805
Got interrupt, stopping
2025-08-15 19:51:25.778
Sending signal SIGINT to main child process w/ PID 521

That’s where it seems to have pulled a container image. I would expect the image to already be present locally, so it shouldn’t need to pull one on a plain reboot. The fact that it did pull one (and that a new volume was created) makes me suspect a migration happened.

My connection string is:

postgres://postgres:<password>@prognosticari-db.internal:5432/postgres

Could it somehow be that, since the machine got recreated, it got assigned to a different network space than the one my app is in, making it invisible/inaccessible to the app? I’m just throwing out wild guesses, as I have no idea what else could’ve happened or how to fix it.

Hm… It looks like it might be truncating at 5 events, unfortunately. Do you see more in the dashboard?

(I see around four days’ worth in the dashboard for a Machine that only has 5 entries in fly m status.)

That wouldn’t be my first guess, but it’s not a bad thing to check, either. Try fly m list -a java-app-name and verify that its Machines’ addresses all begin with fdaa:0:d5ef: (like the database Machine’s).


There are two different ways of approaching this overall, I think…

You’re running one of the Stolon-based Fly Postgres images, which is “doubly deprecated”, as another user put it in the forum recently. This might be a good time to evaluate whether Managed Postgres would be a better choice now, :managed_pg_bee:.

(It isn’t automatically.)

Sticking with single-Machine Legacy Postgres will mean more of these data-recovery/app-rescue emergencies—and the next one may lose the volume entirely.


If you do want to continue debugging this, then the next step would be to install psql on one of the Java Machines and see whether that can get through. Doing so would remove JDBC as a wildcard.
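
If installing psql is awkward on the app image (on a Debian-based Java image it would typically be apt-get install postgresql-client, but that’s an assumption about your base image), a bare-bones JDBC probe run on the same Machine would at least take Spring Boot and its connection pool out of the picture. A minimal sketch, assuming the pgjdbc driver is already on the classpath and the password is exported as PGPASSWORD:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.Properties;

public class PgProbe {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.setProperty("user", "postgres");
        props.setProperty("password", System.getenv("PGPASSWORD")); // avoid hard-coding the password
        props.setProperty("connectTimeout", "10"); // seconds; fail fast instead of hanging

        String url = "jdbc:postgresql://prognosticari-db.internal:5432/postgres";
        try (Connection conn = DriverManager.getConnection(url, props);
             Statement st = conn.createStatement();
             ResultSet rs = st.executeQuery("SELECT 1")) {
            rs.next();
            System.out.println("Connected, SELECT 1 returned " + rs.getInt(1));
        }
    }
}

If that also times out after ten seconds, the problem is at the network level rather than anywhere in the Spring configuration.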

Basically, keep whittling down the unknowns bit by bit, using both Fly.io’s own diagnostics and the classic Linux ones via SSH.

Hope this helps a little!

This topic was automatically closed 7 days after the last reply. New replies are no longer allowed.