fly postgres create in FRA consistently failing

Trying to create a very simple Postgres cluster and it’s failing to come up.

$ fly postgres create 
? App name: invidious-db
? Select region: fra (Frankfurt, Germany)
? Select VM size: shared-cpu-1x - 256
? Volume size (GB): 10
Creating postgres cluster invidious-db in organization invidious
Postgres cluster invidious-db created
  Username:    postgres
  Password:    xxx
  Hostname:    invidious-db.internal
  Proxy Port:  5432
  PG Port: 5433
Save your credentials in a secure place, you won't be able to see them again!

Monitoring Deployment

2 desired, 2 placed, 0 healthy, 1 unhealthy [health checks: 6 total, 4 passing, 2 critical]
v0 failed - Failed due to unhealthy allocations
Failed Instances

==> Failure #1

Instance
  ID            = 6fb280e7                        
  Version       = 0                               
  Region        = fra                             
  Desired       = run                             
  Status        = running (leader)                
  Health Checks = 3 total, 2 passing, 1 critical  
  Restarts      = 0                               
  Created       = 4m59s ago                       

Recent Events
TIMESTAMP            TYPE       MESSAGE                 
2021-08-10T13:12:20Z Received   Task received by client 
2021-08-10T13:12:20Z Task Setup Building Task Directory 
2021-08-10T13:12:23Z Started    Task started by client  

Recent Logs
2021-08-10T13:16:56Z [info] keeper   | 2021-08-10T13:16:56.049Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-08-10T13:16:56Z [info] keeper   | 2021-08-10T13:16:56.051Z	INFO	cmd/keeper.go:1543	already master
2021-08-10T13:16:56Z [info] keeper   | 2021-08-10T13:16:56.068Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:16:56Z [info] keeper   | 2021-08-10T13:16:56.069Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:16:57Z [info] proxy    | 2021-08-10T13:16:57.021Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:16:57Z [info] proxy    | 2021-08-10T13:16:57.225Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:01Z [info] keeper   | 2021-08-10T13:17:01.203Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-08-10T13:17:01Z [info] keeper   | 2021-08-10T13:17:01.205Z	INFO	cmd/keeper.go:1543	already master
2021-08-10T13:17:01Z [info] keeper   | 2021-08-10T13:17:01.222Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:01Z [info] keeper   | 2021-08-10T13:17:01.223Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:02Z [info] proxy    | 2021-08-10T13:17:02.330Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:02Z [info] proxy    | 2021-08-10T13:17:02.567Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:06Z [info] keeper   | 2021-08-10T13:17:06.360Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-08-10T13:17:06Z [info] keeper   | 2021-08-10T13:17:06.362Z	INFO	cmd/keeper.go:1543	already master
2021-08-10T13:17:06Z [info] keeper   | 2021-08-10T13:17:06.382Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:06Z [info] keeper   | 2021-08-10T13:17:06.383Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:07Z [info] proxy    | 2021-08-10T13:17:07.671Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:07Z [info] proxy    | 2021-08-10T13:17:07.872Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:11Z [info] keeper   | 2021-08-10T13:17:11.520Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-08-10T13:17:11Z [info] keeper   | 2021-08-10T13:17:11.523Z	INFO	cmd/keeper.go:1543	already master
2021-08-10T13:17:11Z [info] keeper   | 2021-08-10T13:17:11.546Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:11Z [info] keeper   | 2021-08-10T13:17:11.547Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:12Z [info] proxy    | 2021-08-10T13:17:12.975Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:13Z [info] proxy    | 2021-08-10T13:17:13.180Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:16Z [info] keeper   | 2021-08-10T13:17:16.692Z	INFO	cmd/keeper.go:1505	our db requested role is master
2021-08-10T13:17:16Z [info] keeper   | 2021-08-10T13:17:16.695Z	INFO	cmd/keeper.go:1543	already master
2021-08-10T13:17:16Z [info] keeper   | 2021-08-10T13:17:16.713Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:16Z [info] keeper   | 2021-08-10T13:17:16.715Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:18Z [info] proxy    | 2021-08-10T13:17:18.285Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:18Z [info] proxy    | 2021-08-10T13:17:18.514Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}

==> Failure #2

Instance
  ID            = 77ccf3f3                        
  Version       = 0                               
  Region        = fra                             
  Desired       = run                             
  Status        = running (replica)               
  Health Checks = 3 total, 2 passing, 1 critical  
  Restarts      = 0                               
  Created       = 4m5s ago                        

Recent Events
TIMESTAMP            TYPE       MESSAGE                 
2021-08-10T13:13:13Z Received   Task received by client 
2021-08-10T13:13:13Z Task Setup Building Task Directory 
2021-08-10T13:13:16Z Started    Task started by client  

Recent Logs
2021-08-10T13:16:57Z [info] keeper   | 2021-08-10T13:16:57.017Z	INFO	cmd/keeper.go:1557	our db requested role is standby	{"followedDB": "f403469d"}
2021-08-10T13:16:57Z [info] keeper   | 2021-08-10T13:16:57.019Z	INFO	cmd/keeper.go:1576	already standby
2021-08-10T13:16:57Z [info] keeper   | 2021-08-10T13:16:57.035Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:16:57Z [info] keeper   | 2021-08-10T13:16:57.037Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:16:57Z [info] proxy    | 2021-08-10T13:16:57.085Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:16:57Z [info] proxy    | 2021-08-10T13:16:57.288Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:02Z [info] keeper   | 2021-08-10T13:17:02.173Z	INFO	cmd/keeper.go:1557	our db requested role is standby	{"followedDB": "f403469d"}
2021-08-10T13:17:02Z [info] keeper   | 2021-08-10T13:17:02.176Z	INFO	cmd/keeper.go:1576	already standby
2021-08-10T13:17:02Z [info] keeper   | 2021-08-10T13:17:02.192Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:02Z [info] keeper   | 2021-08-10T13:17:02.193Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:02Z [info] proxy    | 2021-08-10T13:17:02.393Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:02Z [info] proxy    | 2021-08-10T13:17:02.617Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:07Z [info] keeper   | 2021-08-10T13:17:07.330Z	INFO	cmd/keeper.go:1557	our db requested role is standby	{"followedDB": "f403469d"}
2021-08-10T13:17:07Z [info] keeper   | 2021-08-10T13:17:07.331Z	INFO	cmd/keeper.go:1576	already standby
2021-08-10T13:17:07Z [info] keeper   | 2021-08-10T13:17:07.349Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:07Z [info] keeper   | 2021-08-10T13:17:07.350Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:07Z [info] proxy    | 2021-08-10T13:17:07.722Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:07Z [info] proxy    | 2021-08-10T13:17:07.927Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:12Z [info] keeper   | 2021-08-10T13:17:12.558Z	INFO	cmd/keeper.go:1557	our db requested role is standby	{"followedDB": "f403469d"}
2021-08-10T13:17:12Z [info] keeper   | 2021-08-10T13:17:12.560Z	INFO	cmd/keeper.go:1576	already standby
2021-08-10T13:17:12Z [info] keeper   | 2021-08-10T13:17:12.577Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:12Z [info] keeper   | 2021-08-10T13:17:12.579Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:13Z [info] proxy    | 2021-08-10T13:17:13.033Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:13Z [info] proxy    | 2021-08-10T13:17:13.237Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:17Z [info] keeper   | 2021-08-10T13:17:17.717Z	INFO	cmd/keeper.go:1557	our db requested role is standby	{"followedDB": "f403469d"}
2021-08-10T13:17:17Z [info] keeper   | 2021-08-10T13:17:17.719Z	INFO	cmd/keeper.go:1576	already standby
2021-08-10T13:17:17Z [info] keeper   | 2021-08-10T13:17:17.735Z	INFO	cmd/keeper.go:1676	postgres parameters not changed
2021-08-10T13:17:17Z [info] keeper   | 2021-08-10T13:17:17.736Z	INFO	cmd/keeper.go:1703	postgres hba entries not changed
2021-08-10T13:17:18Z [info] proxy    | 2021-08-10T13:17:18.343Z	INFO	cmd/proxy.go:268	master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
2021-08-10T13:17:18Z [info] proxy    | 2021-08-10T13:17:18.583Z	INFO	cmd/proxy.go:286	proxying to master address	{"address": "[fdaa:0:3119:a7b:66:0:3230:2]:5433"}
***v0 failed - Failed due to unhealthy allocations and deploying as v1 

Troubleshooting guide at https://fly.io/docs/getting-started/troubleshooting/

Connect to postgres
Any app within the invidious organization can connect to postgres using the above credentials and the hostname "invidious-db.internal."
For example: postgres://postgres:xxx@invidious-db.internal:5432

See the postgres docs for more information on next steps, managing postgres, connecting from outside fly:  https://fly.io/docs/reference/postgres/

Could the issue be that there isn’t enough RAM headroom on the shared-cpu-1x instances?

It looks like that failed VM took >10s to start up. I’m checking to see if there was an actual problem or we just need to raise that timeout.

Postgres will eat up all the RAM you give it. 256MB isn’t much, but it’s fine for doing testing and development (and our clusters should work at that size).

FWIW, I’m having the same issue here (first time trying Fly). I tried spinning up a larger node size and the same issue occurred.

I also tried creating the cluster with dedicated-cpu-1x 2GB instances, and it’s still failing.

$ fly checks list --app invidious-db
Health Checks for invidious-db
NAME STATUS   ALLOCATION REGION TYPE   LAST UPDATED OUTPUT                         
vm   passing  ed8f348b   fra    SCRIPT 3m9s ago     [✓] 9.14 GB (93.5%) free space 
                                                    on /data/ [✓] load averages:   
                                                    0.00 0.04 0.04 [✓] memory:     
                                                    0.0s waiting over the last 60s 
                                                    [✓] cpu: 0.9s waiting over the 
                                                    last 60s [✓] io: 0.0s waiting  
                                                    over the last 60s              
pg   critical ed8f348b   fra    SCRIPT 7m2s ago     [✗] leader check: lookup       
                                                    invidious-db.internal on       
                                                    [fdaa::3]:53: lame referral    
role passing  ed8f348b   fra    SCRIPT 7m2s ago     leader                         
vm   passing  77e9aecb   fra    SCRIPT 5m4s ago     [✓] 9.18 GB (93.8%) free space 
                                                    on /data/ [✓] load averages:   
                                                    0.00 0.01 0.00 [✓] memory:     
                                                    0.0s waiting over the last 60s 
                                                    [✓] cpu: 1.0s waiting over the 
                                                    last 60s [✓] io: 0.0s waiting  
                                                    over the last 60s              
role passing  77e9aecb   fra    SCRIPT 1h20m ago    replica                        
pg   critical 77e9aecb   fra    SCRIPT 1h21m ago    [✗] leader check: lookup       
                                                    invidious-db.internal on       
                                                    [fdaa::3]:53: lame referral  

It’s definitely not RAM/CPU related. We’ve narrowed the issue down to DNS propagation: the DNS entries for new VMs seem to be taking too long to spread around. Give us a few minutes to figure out why and we’ll let you know when fra is good to go.
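
For anyone who wants to check from the client side whether the `.internal` name has propagated yet, here is a minimal sketch in Python. It is not part of flyctl or the Fly health checks; `wait_for_dns` and its parameters are hypothetical names made up for illustration, and it assumes it runs somewhere that can reach Fly's private DNS (for example, inside an app in the same organization). It only retries the lookup; it does not fix the underlying propagation delay.

```python
import socket
import time


def wait_for_dns(hostname, port=5432, attempts=10, delay=2.0,
                 resolve=socket.getaddrinfo, sleep=time.sleep):
    """Retry a DNS lookup until it succeeds or the attempts run out.

    Returns a sorted list of the unique addresses the name resolved to.
    Re-raises the last lookup error if every attempt fails.
    `resolve` and `sleep` are injectable so the loop can be tested offline.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(1, attempts + 1):
        try:
            results = resolve(hostname, port, type=socket.SOCK_STREAM)
            # Each result is (family, type, proto, canonname, sockaddr);
            # the address is the first element of sockaddr.
            return sorted({info[4][0] for info in results})
        except socket.gaierror as exc:
            last_error = exc
            if attempt < attempts:
                sleep(delay)  # wait before the next lookup
    raise last_error
```

Calling `wait_for_dns("invidious-db.internal")` before opening a database connection would block until the name resolves, or raise `socket.gaierror` once the attempts are exhausted.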

Same issue on lhr btw

Well I’m sure enjoying this particular Tuesday. :wink: We’re checking all the EU regions now.

OK, this is not a Postgres problem specifically. It’s a larger problem with our private DNS service. We’ve updated the status page here: Fly.io Status - Private DNS lookup failures for newly created VMs

This should be all cleared up now, will you give them another try? If you have a half-setup PG cluster you want us to clean up, post here and we’ll get it going. Otherwise you can just delete and re-create them.

I can confirm it’s all working nicely now. Running flyctl postgres create and then going straight into psql -h <db>.internal was a success. Magical product, thank you :sparkles:
