Several weeks ago, I successfully deployed an app (an Envoy proxy) on Fly.
Running flyctl apps list, I see that the app is now in pending status:
NAME                              OWNER     STATUS   LATEST DEPLOY
envoy                             personal  pending  5m3s ago
fly-builder-twilight-river-4862   personal  running  6m15s ago
I tried restarting the app, and even redeploying, but I'm not getting any output.
It's stuck at:
deployment-1615830648: digest: sha256:47861d6ac855a4da512efb7da80f732d226a9ef332b2bfa5d66b09b57540369e size: 3034
--> Done Pushing Image
==> Creating Release
Release v124 created
Deploying to : envoy.fly.dev
Monitoring Deployment
You can detach the terminal anytime without stopping the deployment
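For reference, the restart and redeploy were roughly these commands (the exact invocation may have differed slightly):

flyctl restart envoy
flyctl deploy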
A quick flyctl status gives:
f status -a envoy
App
Name = envoy
Owner = personal
Version = 124
Status = pending
Hostname = envoy.fly.dev
Instances
ID VERSION REGION DESIRED STATUS HEALTH CHECKS RESTARTS CREATED
I also checked the regions, and they look fine:
Region Pool:
sjc
Backup Region:
lax
sea
Last logs from when I issued a restart:
2021-03-15T17:23:54.427Z 0c611037 sjc [info] [runner] exiting
2021-03-15T17:23:54.440Z 0c611037 sjc [info] Main child exited normally with code: 0
2021-03-15T17:23:54.441Z 0c611037 sjc [info] Reaped child process with pid: 516 and signal: SIGKILL, core dumped? false
It looks like it may just be taking a hair too long to respond to health checks. If you run flyctl status --all you’ll see a failed VM from 10 min or so ago. Let me see if I can get it running.
Have a look now? I increased the check grace period to 30s. You can set this in your fly.toml by adding grace_period = "30s" under your health check definition.
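For anyone hitting the same thing later, here's a minimal sketch of where that setting lives in fly.toml (the port, interval, and timeout values are placeholders, not this app's actual config):

[[services]]
  internal_port = 8443
  protocol = "tcp"

  # give the VM 30s after boot before failed checks count against it
  [[services.tcp_checks]]
    grace_period = "30s"
    interval = "15s"
    timeout = "2s"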
Thanks to the new ssh console, I'm able to see why the health check failed:
Envoy isn’t able to connect to my droplet:
traceroute to 192.241.212.157 (192.241.212.157), 30 hops max, 46 byte packets
1 172.19.2.137 (172.19.2.137) 0.092 ms 0.144 ms 0.110 ms
2 169.254.6.1 (169.254.6.1) 0.217 ms 169.254.6.0 (169.254.6.0) 0.197 ms 0.148 ms
3 10.253.32.38 (10.253.32.38) 0.163 ms 0.124 ms 10.253.32.34 (10.253.32.34) 0.284 ms
4 10.253.32.2 (10.253.32.2) 0.649 ms 0.678 ms 10.253.32.6 (10.253.32.6) 0.682 ms
5 0.et-0-0-7.bsr1.sv5.packet.net (198.16.4.102) 2.259 ms 3.093 ms 0.et-0-0-7.bsr2.sv5.packet.net (198.16.4.104) 1.279 ms
6 eqix-sv1.digitalocean.com (206.223.117.65) 1.806 ms 1.554 ms as14061.sfmix.org (206.197.187.10) 3.331 ms
7 138.197.244.236 (138.197.244.236) 3.117 ms * 3.585 ms
8 138.197.248.207 (138.197.248.207) 3.029 ms * *
9 * * *
10 * * *
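For context, the trace above was run from inside the instance over the new ssh console, roughly like this (app name as earlier in the thread):

flyctl ssh console -a envoy
# then, inside the VM:
traceroute 192.241.212.157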
I’ve verified from multiple ISPs that the droplet is indeed reachable, and there are no network firewalls configured.
Is this something you have visibility on your end?
This looks like a network problem on DigitalOcean's end, since the trace gets as far as their routers in the facility before dying. If that's the case, they'll likely have to fix it (although we are checking for a workaround).
One quick thing to try is running in a different region; traffic from LAX, for example, will likely go through a different peer.
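If it helps, changing the region pool is just a couple of flyctl commands, something along these lines (region codes are only examples):

flyctl regions set lax -a envoy
flyctl regions list -a envoy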
OK, this is not a problem with the whole IP; it's a problem with port 8443 specifically. Other ports on that IP work just fine. This is likely a firewall issue on our end; we'll get it fixed.
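A rough way to confirm that from inside the VM, assuming a netcat build that supports -vz (the non-8443 ports below are just examples; substitute whatever you know is open on the droplet):

# the IP itself answers on other ports...
nc -vz -w 5 192.241.212.157 22
nc -vz -w 5 192.241.212.157 443
# ...while 8443 specifically times out
nc -vz -w 5 192.241.212.157 8443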