Config env vars seem to be missing during deploys (previously, suspected DNS issue)

Hello!

I’m using fly.io for a hackathon project (Elixir + PG, nothing special).

I’m using the GitHub Action for deployment (as specified in the docs) with the “superfly/flyctl-actions” action.

Problem:

When I do a deploy, I’m unable to access the app from my browser or curl.

Symptoms:

Curl hangs and then dies:

crertel@chris-laptop-getthru:~/work/getthru/deadvox$ curl -v https://dead-vox.fly.dev
*   Trying 188.93.145.147:443...
* TCP_NODELAY set
* Connected to dead-vox.fly.dev (188.93.145.147) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
*   CAfile: /etc/ssl/certs/ca-certificates.crt
  CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use h2
* Server certificate:
*  subject: CN=*.fly.dev
*  start date: Mar 28 23:29:07 2022 GMT
*  expire date: Jun 26 23:29:06 2022 GMT
*  subjectAltName: host "dead-vox.fly.dev" matched cert's "*.fly.dev"
*  issuer: C=US; O=Let's Encrypt; CN=R3
*  SSL certificate verify ok.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55625442ce30)
> GET / HTTP/2
> Host: dead-vox.fly.dev
> user-agent: curl/7.68.0
> accept: */*
> 
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* Connection state changed (MAX_CONCURRENT_STREAMS == 4294967295)!
* Empty reply from server
* Connection #0 to host dead-vox.fly.dev left intact
curl: (52) Empty reply from server

Ping works (and the reply matches our IPv4 allocation):

crertel@chris-laptop-getthru:~/work/getthru/deadvox$ ping dead-vox.fly.dev
PING dead-vox.fly.dev (188.93.145.147) 56(84) bytes of data.
64 bytes from 188.93.145.147 (188.93.145.147): icmp_seq=1 ttl=51 time=7.76 ms
64 bytes from 188.93.145.147 (188.93.145.147): icmp_seq=2 ttl=51 time=7.90 ms
64 bytes from 188.93.145.147 (188.93.145.147): icmp_seq=3 ttl=51 time=7.85 ms
^C
--- dead-vox.fly.dev ping statistics ---
3 packets transmitted, 3 received, 0% packet loss, time 2003ms
rtt min/avg/max/mdev = 7.755/7.834/7.896/0.058 ms

Inside the VM, the server is listening (beam.smp on port 4000):

root@97d86bd3:/# netstat -plant
Active Internet connections (servers and established)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name    
tcp        0      0 0.0.0.0:4369            0.0.0.0:*               LISTEN      554/epmd            
tcp6       0      0 :::4000                 :::*                    LISTEN      515/beam.smp        
tcp6       0      0 :::4369                 :::*                    LISTEN      554/epmd            
tcp6       0      0 fdaa:0:5906:a7b:1a:9:22 :::*                    LISTEN      516/hallpass        
tcp6       0      0 :::41979                :::*                    LISTEN      515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39888 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 2604:1380:4111:1e:42526 2a04:4e42::644:80       TIME_WAIT   -                   
tcp6       0      0 ::1:48086               ::1:4369                ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39906 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39894 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0    520 fdaa:0:5906:a7b:1a:9:22 fdaa:0:5906:a7b:1:48943 ESTABLISHED 516/hallpass        
tcp6       0      0 2604:1380:4111:1e:44908 2a04:4e42:3f::644:80    TIME_WAIT   -                   
tcp6       0      0 fdaa:0:5906:a7b:1:39882 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39916 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39890 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39892 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39884 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39898 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39880 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 2604:1380:4111:1e:44910 2a04:4e42:3f::644:80    TIME_WAIT   -                   
tcp6       0      0 fdaa:0:5906:a7b:1:39912 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39902 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 ::1:4369                ::1:48086               ESTABLISHED 554/epmd            
tcp6       0      0 fdaa:0:5906:a7b:1:39914 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39886 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39904 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39910 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39896 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39900 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39878 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp        
tcp6       0      0 fdaa:0:5906:a7b:1:39908 fdaa:0:5906:a7b:1a:5432 ESTABLISHED 515/beam.smp  

The app is responding internally:

root@97d86bd3:/# curl localhost:4000/heartbeat
{"upSince":"2022-04-07T16:26:07.554394Z","version":"7210486a4e81dee64a48c2665fcc3288bff12cb8"}root@97d86bd3:/# 

Digging in further, I think there’s also something screwy with the way env vars are getting set up:

fly.toml:

# fly.toml file generated for dead-vox on 2022-04-05T00:25:42-05:00

app = "dead-vox"

kill_signal = "SIGTERM"
kill_timeout = 10
processes = []

[deploy]
  release_command = "/app/bin/migrate"
  strategy = "immediate"

[env]
  PHX_HOST = "dead-vox.fly.dev"
  PORT = "8080"

[experimental]
  allowed_public_ports = []
  auto_rollback = true

[[services]]
  internal_port = 8080
  processes = ["app"]
  protocol = "tcp"
  script_checks = []

  [services.concurrency]
    hard_limit = 25
    soft_limit = 20
    type = "connections"

  [[services.ports]]
    force_https = true
    handlers = ["http"]
    port = 80

  [[services.ports]]
    handlers = ["tls", "http"]
    port = 443

  [[services.tcp_checks]]
    grace_period = "20s"
    interval = "15s"
    restart_limit = 0
    timeout = "2s"

#  [[services.http_checks]]
#    interval = 15000
#    grace_period = "30s"
#    method = "get"
#    path = "/heartbeat"
#    protocol = "http"
#    timeout = 2000
#    restart_limit = 5
#    [services.http_checks.headers]

Critically, I’m not seeing those env vars getting put into our VM:

# env | grep PORT
# env | grep PHX_HOST
# 
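One caveat on the check above: an interactive shell is not guaranteed to start with the same environment as the app process, so it may be worth reading the environment of the BEAM process itself. The following is a minimal sketch, assuming a Linux VM with /proc mounted. `show_vars` is a hypothetical helper name, and 515 is the beam.smp PID taken from the netstat output earlier in this post.

```shell
# show_vars FILE NAME...
# Prints NAME=value for each requested variable found in FILE, where FILE
# is in /proc/<pid>/environ format (entries separated by NUL bytes).
# Prints "NAME is not set" for any variable that is absent.
show_vars() {
  environ_file="$1"
  shift
  for name in "$@"; do
    # Turn NUL separators into newlines, then match an exact NAME= prefix.
    line=$(tr '\0' '\n' < "$environ_file" | grep "^${name}=" | head -n 1)
    if [ -n "$line" ]; then
      printf '%s\n' "$line"
    else
      printf '%s is not set\n' "$name"
    fi
  done
}

# Example (run inside the VM; 515 is the beam.smp PID from netstat):
#   show_vars /proc/515/environ PORT PHX_HOST
```

If the variables show up here but not in the shell, the problem is with the shell session rather than the deploy. If they are missing in both places, the [env] section really is not reaching the VM.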

Env vars from fly secrets are showing up, as is the commit hash injected via flyctl deploy --remote-only -e CI_COMMIT_SHA="67ddabc7e12f62fa5a917666c1725521ed4c6efa" invoked in our CI pipeline.

If those vars from the toml aren’t getting set, that would explain the app listening on the wrong port and dropping traffic, I guess. The question is: why aren’t they making it in?
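For what it's worth, the port mismatch can be checked mechanically: netstat shows beam.smp listening on 4000 (as far as I know, the fallback in Phoenix's generated runtime config when PORT is unset), while fly.toml routes traffic to internal_port 8080. The sketch below pulls internal_port out of a fly.toml and compares it with a port passed in by hand. `internal_port_of` and `check_port` are hypothetical helper names, and the parsing is a naive sed match that assumes a single `internal_port = N` line, not a real TOML parser.

```shell
# internal_port_of FILE
# Prints the first internal_port value found in a fly.toml-style file.
internal_port_of() {
  sed -n 's/^[[:space:]]*internal_port[[:space:]]*=[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$1" | head -n 1
}

# check_port FILE LISTEN_PORT
# Compares the port the app is actually listening on with internal_port.
# Returns non-zero on a mismatch.
check_port() {
  expected=$(internal_port_of "$1")
  if [ "$expected" = "$2" ]; then
    echo "ok: app listens on $2, matching internal_port"
  else
    echo "mismatch: app listens on $2 but internal_port is $expected"
    return 1
  fi
}

# Example, using the listening port seen in netstat:
#   check_port fly.toml 4000
```

With the fly.toml and netstat output from this post, the check would report a mismatch (4000 versus 8080), which is consistent with the proxy having nothing to talk to on 8080.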